Different ways to calculate Feature Importance
Determining which features are most relevant to the desired prediction output, called feature importance, has been a fun topic for me for about a year. I have tried several different feature importance libraries, like scikit-learn's permutation_importance, eli5's PermutationImportance, and SHAP, so I thought it would be useful to collect all of these feature importance methods in one place.
Notice that the permutation methods do not all give the same ranking of important features, because the result depends on the random permutations of the feature values. In this respect the SHAP method appears more reliable, because feature importance is based on the model's predictions over different combinations of features.
I am using a dataset from Kaggle about the opening and closing price of coffee because I ❤️ coffee! Enjoy!
Libraries used across multiple cells
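Roughly, the shared imports look like this (a sketch based on the libraries used throughout the post; the exact list in the original notebook may differ):

```python
# Sketch of the shared imports; versions/modules may differ slightly.
import numpy as np
import pandas as pd

import eli5
import shap
from eli5.sklearn import PermutationImportance
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
```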
Subfunctions
Load Data
Prepare X and y matrix
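A minimal sketch of the loading and X/y preparation, assuming the Kaggle file is called coffee.csv and that the target is a binary "daily gain" label (next-day close higher than today's close); the real notebook may define both differently:

```python
# Sketch only: the file name and the target definition are assumptions.
df = pd.read_csv("coffee.csv")                 # Kaggle coffee price data
df.columns = [c.strip().lower() for c in df.columns]

feature_names = ["open", "high", "low", "close", "volume"]
X = df[feature_names].iloc[:-1]                                  # drop last row (no "next day")
y = (df["close"].shift(-1) > df["close"]).astype(int).iloc[:-1]  # 1 = next day closes higher

# keep the time ordering when splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
```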
Let’s test out some prediction models for this dataset:
XGBoost gives a better F1 score than RandomForest, so let's use the XGBoost model.
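A sketch of the comparison (default hyperparameters assumed):

```python
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
xgb = XGBClassifier(random_state=0).fit(X_train, y_train)

print("RandomForest F1:", f1_score(y_test, rf.predict(X_test)))
print("XGBoost F1:     ", f1_score(y_test, xgb.predict(X_test)))

model = xgb  # the better-scoring model, used for everything below
```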
Permutation importance: identify important features
Way 0: permutation importance by hand
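The idea: shuffle one column of the test set at a time and measure how much the F1 score drops. A rough sketch (not the post's exact subfunctions):

```python
rng = np.random.default_rng(0)
baseline = f1_score(y_test, model.predict(X_test))

importances = {}
for col in X_test.columns:
    X_perm = X_test.copy()
    X_perm[col] = rng.permutation(X_perm[col].values)   # break the column's link to y
    importances[col] = baseline - f1_score(y_test, model.predict(X_perm))

for col, drop in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{col}: {drop:.4f}")
```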
Way 1: scikit permutation_importance
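Something along these lines, using sklearn.inspection.permutation_importance (n_repeats averages several random shuffles):

```python
result = permutation_importance(model, X_test, y_test,
                                scoring="f1", n_repeats=10, random_state=0)

for i in result.importances_mean.argsort()[::-1]:
    print(f"{X_test.columns[i]}: "
          f"{result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")
```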
Way 2: scikit feature_importances_
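This one is just the fitted model's feature_importances_ attribute (gain-based for XGBoost, impurity-based for RandomForest), for example:

```python
for name, imp in sorted(zip(X_train.columns, model.feature_importances_),
                        key=lambda kv: -kv[1]):
    print(f"{name}: {imp:.4f}")
```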
Way 3: eli5 PermutationImportance
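A sketch with eli5's wrapper (it permutes the data passed to fit, here the test set):

```python
perm = PermutationImportance(model, scoring="f1", random_state=0).fit(X_test, y_test)
eli5.show_weights(perm, feature_names=X_test.columns.tolist())
```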
Way 4: SHAP (SHapley Additive exPlanations) by hand
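My by-hand version follows the Shapley formula directly: the "payoff" of a feature coalition is the F1 score of a model trained on just those columns, and each feature's importance is its weighted average marginal contribution over all coalitions. A simplified sketch (the post's subfunctions may differ, and this retrains 2^p models, so it only works for a handful of features):

```python
from itertools import combinations
from math import factorial

def coalition_score(features):
    """F1 score of an XGBoost model trained only on the given columns."""
    if len(features) == 0:
        return 0.0
    cols = list(features)
    m = XGBClassifier(random_state=0).fit(X_train[cols], y_train)
    return f1_score(y_test, m.predict(X_test[cols]))

def shapley_importance():
    cols = list(X_train.columns)
    p = len(cols)
    phi = {}
    for feat in cols:
        others = [c for c in cols if c != feat]
        total = 0.0
        for k in range(p):                       # all coalition sizes without feat
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(p - len(S) - 1) / factorial(p)
                total += w * (coalition_score(S + (feat,)) - coalition_score(S))
        phi[feat] = total
    return phi

print(sorted(shapley_importance().items(), key=lambda kv: -kv[1]))
```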
We can see that it ranked close, high, low, open, and volume from most to least important.
Way 5: using the SHAP (SHapley Additive exPlanations) library, hand-calculate feature importance
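Here the SHAP values come from the library, but the aggregation into a single importance per feature, mean(|SHAP value|) over the samples, is done by hand. A sketch:

```python
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)                       # an Explanation object

mean_abs_shap = np.abs(shap_values.values).mean(axis=0)
for name, val in sorted(zip(X_test.columns, mean_abs_shap), key=lambda kv: -kv[1]):
    print(f"{name}: {val:.4f}")
```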
We can see that the feature importance ordering from the by-hand function is not the same as from the Python SHAP library, but the first and last features agree. Even though the results are not exactly the same, it is nice to know how the SHAP algorithm works.
Way 6: SHAP (SHapley Additive exPlanations) Python library
Let’s look at the functions inside the shap_values and explainer objects, to better understand how to plot the SHAP values.
Set shap_values.feature_names for plotting
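Roughly like this (setting feature_names by hand is only needed if X_test was passed as a NumPy array; with a DataFrame the column names are picked up automatically):

```python
explainer = shap.Explainer(model)
shap_values = explainer(X_test)

# poke around the objects to see what they expose
print([a for a in dir(shap_values) if not a.startswith("_")])
print([a for a in dir(explainer) if not a.startswith("_")])

# attach the column names so the plots are labelled
shap_values.feature_names = list(X.columns)
```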
Summary plot OR beeswarm
Gives an ‘overview of which features are most important for a model’. It plots the SHAP values of every feature for every sample, so we also see how the value of each feature influences the prediction; in the by-hand SHAP calculation I only obtained one value per feature, because I summed over the samples.
The plot below sorts features by the sum of SHAP value magnitudes over all samples, and uses SHAP values to show the distribution of the impacts each feature has on the model output.
The color represents the feature value (red high, blue low), and the horizontal spread shows the impact on the model output: positive SHAP values push the prediction higher, negative values push it lower.
Again we see that close, open, low, high, and volume are, from most to least, the important features, based on the spread of their SHAP values. Low close prices push the prediction towards a daily gain, and vice versa, high close prices push it away.
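With the newer plotting API this is a one-liner (shap.summary_plot is the older equivalent):

```python
shap.plots.beeswarm(shap_values)
```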
Waterfall plot
Shows which features ‘push’ the model output from the base value (the average model output over the training dataset). Features pushing the prediction higher are shown in red, those pushing the prediction lower are in blue.
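For example, for the first test sample:

```python
shap.plots.waterfall(shap_values[0])
```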
Force plot
Shows one feature on one plot. If we take many force-plot explanations like the plots for each feature below, rotate them 90 degrees, and then stack them horizontally, we can see explanations for an entire dataset, like the time-series plot below.
Shows all features on one plot.
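A sketch of the two standard calls (a single sample, and the whole test set stacked); the exact plots in the post may have been configured differently:

```python
shap.initjs()                                  # enables the interactive JS plots in a notebook

# one sample
shap.plots.force(shap_values[0])

# the whole test set: many force plots rotated 90 degrees and stacked
shap.force_plot(shap_values.base_values[0], shap_values.values, X_test)
```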
Bar plot
Shows which features have the largest mean SHAP values.
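In the newer API:

```python
shap.plots.bar(shap_values)                    # bars are mean(|SHAP value|) per feature
```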
Bar plot cohorts
The following plot shows the important features within cohorts defined by the values of the most important feature, which is close in our case.
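Following the SHAP documentation, the data can be split automatically into cohorts on the most important feature:

```python
# 2 cohorts, chosen automatically by the library
shap.plots.bar(shap_values.cohorts(2).abs.mean(0))
```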
There are more functions in the SHAP library, like scatter, heatmap, and decision_plot, but I find the five plots above the most useful.
I hope this listing of feature importance methods is useful; it is always nice to have good information in one spot! I will keep updating this post if I find more methods!
Happy Practicing! 👋
References
- Excellent blog post for explaining how to code SHAP, with references to the original paper (Lundberg and Lee, 2017). https://towardsdatascience.com/shap-explained-the-way-i-wish-someone-explained-it-to-me-ab81cc69ef30
- https://github.com/slundberg/shap
- https://medium.com/dataman-in-ai/the-shap-with-more-elegant-charts-bc3e73fa1c0c
- https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/decision_plot.html