localhoststream t1_ivjp65q wrote
Reply to [D] Simple Questions Thread by AutoModerator
In my work as a data scientist I use a lot of black-box algorithms such as gradient boosting, random forests and neural networks. One question I ALWAYS get from the business I'm building the models for is: which features are important? How does the model make its decisions? To answer that, I do the usual feature analysis: correlation matrices, partial dependence plots, MDI (mean decrease in impurity), model extraction. But I still feel like I'm not entirely able to answer questions such as which variables are the most important.
Now I was thinking of a new method to determine feature importance. First we need the trained model and the feature distributions. For a given feature, we sort its values and take the 11 values at the 0%, 10%, ..., 100% quantiles of its distribution. Next we take, for example, 1000 random states of the other features and, for each random state, test all 11 values of the selected feature. Across these 11 values, we count the number of times the predicted y-value (label) changes. After doing this for all features, we should have a ranking of feature importance, since a higher rate of changes indicates more influence on the label's outcome. With some minor adjustments this would also be applicable to discrete variables and continuous labels.
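A minimal NumPy sketch of the procedure described above, for a classifier. It assumes that a "random state" means a row drawn at random (with replacement) from the data, so the other features keep a realistic joint distribution, and that label changes are counted between consecutive quantile values. `label_change_importance` is a hypothetical name, and `predict` is any callable that maps an `(n, d)` array to `n` labels, for example the `.predict` method of a fitted scikit-learn classifier.

```python
import numpy as np


def label_change_importance(predict, X, n_states=1000, n_quantiles=11, seed=0):
    """Score each feature by how often the predicted label changes when that
    feature is swept over its quantiles while the other features stay fixed.

    predict: callable mapping an (n, d) array to n predicted labels.
    X:       (n, d) reference data used for quantiles and random states.
    Returns a length-d array: mean number of label changes per random state.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Assumption: a "random state" is a row sampled (with replacement) from X.
    states = X[rng.integers(0, n, size=n_states)]
    # Quantile levels 0%, 10%, ..., 100% when n_quantiles == 11.
    levels = np.linspace(0.0, 1.0, n_quantiles)
    scores = np.zeros(d)
    for j in range(d):
        grid = np.quantile(X[:, j], levels)
        # One batch: each state repeated n_quantiles times, with column j
        # replaced by the grid values in order.
        batch = np.repeat(states, n_quantiles, axis=0)
        batch[:, j] = np.tile(grid, n_states)
        # labels[i, q] = prediction for state i with feature j at grid[q].
        labels = np.asarray(predict(batch)).reshape(n_states, n_quantiles)
        # Count label changes between consecutive grid values, per state.
        changes = (labels[:, 1:] != labels[:, :-1]).sum(axis=1)
        scores[j] = changes.mean()
    return scores


# Usage with a toy "model" whose label depends only on feature 0.
rng = np.random.default_rng(1)
X_demo = rng.random((500, 3))
toy_predict = lambda A: (A[:, 0] > 0.5).astype(int)
print(label_change_importance(toy_predict, X_demo, n_states=200))  # → [1. 0. 0.]
```

Each row of `labels` is the model's prediction as one feature sweeps its quantile grid with the others held fixed, i.e. an individual conditional expectation (ICE) curve evaluated at 11 points, so the score is the average number of label flips per curve. For continuous labels, one possible adjustment (an assumption, not part of the proposal) is to replace the `!=` count with the mean absolute difference between consecutive predictions.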
I'd love to hear your experiences in this regard. What do you think of the proposed method?