Feature Importance

EXPLAINABLE ARTIFICIAL INTELLIGENCE

Pedro Meira
Time to Work
4 min read · Oct 10, 2019


Machine Learning (ML) has seen widespread industry adoption only in the last couple of years. Hence, model interpretation as a concept is still mostly theoretical and subjective.

Knowing which features matter most to your machine learning model is really important when you build it. It brings several benefits:

  • Better understanding of the model: in some business cases it makes sense to sacrifice some accuracy for the sake of interpretability. For example, when a bank rejects a loan application, it must also have a reasoning behind the decision, one that can also be presented to the customer.
  • Variable selection: you can remove the variables that are not that significant.

1. Installing the packages
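
The post doesn't list the exact environment, but a reasonable set of packages for this walkthrough (an assumption on my part, not taken from the original gist) can be installed with:

pip install pandas numpy scikit-learn xgboost eli5 seaborn matplotlib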

2. Get the data
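
The original data-loading gist isn't reproduced here. The features referenced later (Education Num, Occupation, Capital Gain, income <= $50K) suggest the UCI Adult census income dataset, so here is a minimal loading sketch under that assumption (the file path and column names are mine, not the post's):

import pandas as pd

# Standard column names from the UCI Adult dataset description.
columns = ['Age', 'Workclass', 'fnlwgt', 'Education', 'Education Num',
           'Marital Status', 'Occupation', 'Relationship', 'Race', 'Sex',
           'Capital Gain', 'Capital Loss', 'Hours per week', 'Country', 'Income']
df = pd.read_csv('adult.data', names=columns, na_values='?', skipinitialspace=True)
print(df.shape)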

3. Clean, prepare and manipulate the data (feature engineering)
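
A minimal cleaning pass under the same assumption: drop rows with missing values, binarize the income target, and label-encode the categorical columns so the tree models can consume them.

from sklearn.preprocessing import LabelEncoder

df = df.dropna()
# Binary target: 1 when income > 50K, 0 otherwise.
df['Income'] = (df['Income'] == '>50K').astype(int)

# Label-encode the remaining categorical (object) columns.
for col in df.select_dtypes(include='object').columns:
    df[col] = LabelEncoder().fit_transform(df[col])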

4. Modeling
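
Modeling here starts with a plain train/test split on the prepared frame (the split ratio and seed below are illustrative, not from the post):

from sklearn.model_selection import train_test_split

X = df.drop(columns='Income')
y = df['Income']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)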

5. Classifier: Random Forest
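
A sketch of the first classifier; the hyperparameters below are placeholders, since the original gist is not reproduced in this post:

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_train, y_train)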

Random Forest Feature Importances
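
scikit-learn exposes the impurity-based importances directly on the fitted model; one way to inspect them:

import pandas as pd

rf_importances = pd.Series(rfc.feature_importances_, index=X_train.columns)
print(rf_importances.sort_values(ascending=False))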

Now we are going to see the heatmap of the feature importances:

Random Forest Feature Importance Heatmap
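
The original plot settings aren't shown, but a small helper like the one below (assuming seaborn) reproduces a single-column heatmap of the scores; the same helper is reused for XGBoost further down.

import seaborn as sns
import matplotlib.pyplot as plt

def plot_importance_heatmap(importances, title):
    # One-column heatmap of the importance scores, sorted descending.
    plt.figure(figsize=(4, 8))
    sns.heatmap(importances.sort_values(ascending=False).to_frame('importance'),
                annot=True, cmap='viridis')
    plt.title(title)
    plt.tight_layout()
    plt.show()

plot_importance_heatmap(rf_importances, 'Random Forest Feature Importances')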

Random Forest Model Performance Evaluation
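
The metrics code isn't reproduced in the post; a typical evaluation sketch:

from sklearn.metrics import accuracy_score, classification_report

rf_pred = rfc.predict(X_test)
print('Accuracy:', accuracy_score(y_test, rf_pred))
print(classification_report(y_test, rf_pred))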

6. Classifier: XGBoost
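
A sketch of the second classifier. The variable name xgc matches the ELI5 snippet later in the post; the hyperparameters are again placeholders:

from xgboost import XGBClassifier

xgc = XGBClassifier(n_estimators=100, max_depth=5, random_state=42)
xgc.fit(X_train, y_train)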

XGBoost Feature Importances
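
The scikit-learn wrapper of XGBoost exposes its importances in the same way:

import pandas as pd

xgb_importances = pd.Series(xgc.feature_importances_, index=X_train.columns)
print(xgb_importances.sort_values(ascending=False))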

Again, we are going to see the heatmap of the feature importances:

XGBoost Feature Importance Heatmap
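
Reusing the plotting helper sketched earlier:

plot_importance_heatmap(xgb_importances, 'XGBoost Feature Importances')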

XGBoost Model Performance Evaluation
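
And the same evaluation sketch applied to the XGBoost model:

from sklearn.metrics import accuracy_score, classification_report

xgb_pred = xgc.predict(X_test)
print('Accuracy:', accuracy_score(y_test, xgb_pred))
print(classification_report(y_test, xgb_pred))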

XGBoost Performance

2. Model Interpretation with ELI5

ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions in an easy-to-understand, intuitive way. It is perhaps the easiest of the three machine learning frameworks to get started with, since it involves minimal reading of documentation! However, it doesn't support true model-agnostic interpretations, and its model support is mostly limited to tree-based and parametric linear models.

Let’s look at some intuitive ways of model interpretation with ELI5 on our classification model.

Feature Importances with ELI5

For tree-based models, ELI5 does nothing special: it uses the out-of-the-box feature importance computation methods we discussed in the previous section.

By default, 'gain' is used, that is, the average gain of the feature when it is used in trees.

import eli5
eli5.show_weights(xgc.get_booster())

Weights ELI5

Explaining Model Prediction Decisions with ELI5

One of the best ways to explain model prediction decisions to either a technical or a more business-oriented individual is to examine individual data-point predictions.

ELI5 does this by showing the weight of each feature, depicting how influential it may have been in contributing to the final prediction decision across all of the trees.

The prediction can then be defined as the sum of the feature contributions plus the “bias” (i.e. the mean given by the topmost region that covers the entire training set).
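
The post shows the resulting explanation images rather than the call that produced them; a sketch of how such an explanation is typically generated with ELI5 (row 0 is an arbitrary example of mine, not necessarily the instance shown below):

from eli5 import show_prediction

# Decomposes the score for a single instance into per-feature contributions plus bias.
show_prediction(xgc.get_booster(), X_test.iloc[0],
                feature_names=list(X_test.columns),
                show_feature_values=True)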

Predicting when a person’s income <= $50K:

Here we can see that the most influential features are Education Num, Occupation and Sex.

Predicting when a person’s income > $50K:

Here we can see that the most influential features are Capital Gain, Education Num and Relationship.

It is definitely interesting to see how similar features play an influential role in explaining model prediction decisions for both classes!
