
YamEnvironmental4720 t1_itpvllg wrote

You may want to take a look at the Random Forest algorithm, for instance one of Nando de Freitas's introductory YouTube lectures on the topic. The key word is entropy: the idea is to study how it changes when you split the sample points into those with a given variable's value below and above some threshold, respectively. You do this for every variable, and for each variable you also test several threshold values.
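A minimal sketch of that split search, assuming NumPy arrays X (samples by features) and y (class labels); the function names and structure here are my own, not taken from the lecture:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    """Return the (feature, threshold) pair minimising the weighted child entropy."""
    best = (None, None, np.inf)
    n_samples, n_features = X.shape
    for j in range(n_features):                 # try every variable
        for t in np.unique(X[:, j]):            # try candidate thresholds for it
            below, above = y[X[:, j] <= t], y[X[:, j] > t]
            if len(below) == 0 or len(above) == 0:
                continue
            score = (len(below) * entropy(below) + len(above) * entropy(above)) / n_samples
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]
```

A random forest repeats this kind of search when growing each tree, on bootstrapped samples and random subsets of the variables.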

1

ash-050 t1_ittne1q wrote

Thank you so much u/YamEnvironmental4720 for your reply. Would I get the same results if I used the trained model's feature importance?

1

YamEnvironmental4720 t1_ituam06 wrote

It depends on how you define importance. Entropy could be one such definition, but even in forest classifiers there are alternatives to entropy.
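For instance, scikit-learn's forest classifiers let you switch the split criterion. A small illustration (assuming scikit-learn is the library in question, and using a toy dataset of my own choosing):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Same data, two different impurity measures for choosing splits.
for criterion in ("entropy", "gini"):
    model = RandomForestClassifier(criterion=criterion, random_state=0).fit(X, y)
    print(criterion, model.feature_importances_)
```

The impurity-based feature_importances_ you get back depend on whichever criterion was used to grow the trees, which is why the two printed vectors can differ.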

1

ash-050 t1_iu3awlr wrote

Thank you so much. In my case, the alternatives are for regression.

1

YamEnvironmental4720 t1_iu3frfr wrote

OK, in that case there is the cost function, defined on the model's parameters, which measures the average distance from the sample points to your hypothesis. This is the average error the model makes for those fixed parameters. In the case of linear regression, the importance of a certain variable is given by the weight parameter attached to that variable.
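As a hedged illustration of that last point (synthetic data and a scikit-learn pipeline of my own choosing, not anything from your problem): on standardised features, the magnitude of each learned weight gives a rough measure of that variable's importance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Target depends strongly on feature 0, weakly on feature 1, not at all on feature 2.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(StandardScaler().fit_transform(X), y)
print(model.coef_)  # larger magnitude ~ more important variable
```

Standardising matters here: raw coefficients are only comparable when the variables are on comparable scales.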

If you are familiar with multivariable calculus, the cost's dependence on any fixed such parameter is given by the partial derivative of the cost function in that direction.
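Concretely, with the usual squared-error cost in Ng's notation (a standard formula, nothing specific to your data), where $h_\theta(x) = \theta^\top x$:

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2,
\qquad
\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}.
$$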

This is quite well explained in Andrew Ng's video lecture on linear regression: https://www.youtube.com/watch?v=pkJjoro-b5c&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=19.

1