ML Model Interpretability: ELI5 & Permutation Importance

Mayur Sand · Published in Analytics Vidhya · May 9, 2020

Introduction

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding why certain predictions are made is essential for assessing trust, especially if one plans to take action based on a prediction. Such understanding also provides insights into the model itself, which can be used to turn an untrustworthy model or prediction into a trustworthy one. If users do not trust a model, they will never use it.

Currently, models are evaluated using accuracy metrics on an available validation dataset. However, real-world data is often significantly different, and the evaluation metric may not be indicative of the product’s goal. Inspecting individual predictions and their explanations is a worthwhile solution, in addition to such metrics.

Models are Opinions Embedded in Mathematics — Cathy O’Neil

Motivation

When I started working with different data science models, I often asked myself about the quality of their output in the real world (irrespective of accuracy metrics). Most of the time, as a data scientist you are handed test data with no idea of the bias built into it, yet you can still produce a model with high accuracy metrics. Machine learning models are used across industries where bias in the data can lead to high-impact decisions.

Conversation in the life of a data scientist

When you are using simple models (linear or logistic regression), one is able to explain the results for a sample dataset. But these models often do not suffice, and we end up using deep learning models that provide high performance but are black boxes to most data science practitioners. Machine learning models are now used to make many critical decisions: fraud detection, credit rating, self-driving, examining patients, and so on.

There are four major frameworks that can give us deep insights into model predictions:

  1. ELI5
  2. LIME
  3. Skater
  4. SHAP

ELI5 (Explain Like I’m 5) & Permutation Importance

ELI5 is a Python library which allows you to visualize and debug various machine learning models through a unified API. It supports all scikit-learn algorithms (anything that implements the .fit & .predict methods). It has built-in support for several ML frameworks and provides a way to explain white-box models (Linear Regression, Decision Trees) as well as black-box models (Keras, XGBoost, LightGBM). It works for both regression and classification models.

There are two main ways to look at a classification or a regression model:

  1. Global Interpretation: inspect model parameters and try to figure out how the model works globally.
  2. Local Interpretation: inspect an individual prediction of a model and try to figure out why the model makes the decision it makes.

For white-box models ELI5 supports both global and local interpretation; for black-box models it supports only global interpretation.

For (1), ELI5 provides the eli5.show_weights() function; for (2), it provides the eli5.show_prediction() function.

It also provides a formatters module to render the model explanation as HTML, JSON or a pandas DataFrame.
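As a rough sketch of how those formatters can be used (here model stands for any fitted estimator, all_features is the list of feature names used later in the demo, and the exact location of format_as_dataframe should be checked against your installed eli5 version):

import eli5
from eli5.formatters.as_dataframe import format_as_dataframe

# Build the raw Explanation object instead of rendering it directly in a notebook
explanation = eli5.explain_weights(model, feature_names=all_features)

print(eli5.format_as_text(explanation))          # plain-text report
html_report = eli5.format_as_html(explanation)   # HTML, what show_weights renders
weights_df = format_as_dataframe(explanation)    # pandas DataFrame of the weights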

Demo

We will be using the Bank Marketing Data Set (LINK). The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required in order to assess whether the product (a bank term deposit) would be subscribed (‘yes’) or not (‘no’).

In the notebook, I have explained how we can use ELI5 with Logistic Regression and Decision Trees, along with the concept of Permutation Importance.

Data Analysis

import pandas as pd

df = pd.read_csv("bank.csv")
df.head()

I have detailed the pre-processing steps required to run the different algorithms in the notebook.
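The notebook holds the exact steps. As an assumed sketch (the target column name "deposit" and the 80/20 split are my guesses; feature names such as contact__cellular suggest one-hot encoding with a "__" separator):

from sklearn.model_selection import train_test_split

# One-hot encode the categorical columns; prefix_sep="__" yields names like contact__cellular
X = pd.get_dummies(df.drop(columns=["deposit"]), prefix_sep="__")
y = (df["deposit"] == "yes").astype(int)   # target: did the client subscribe?
all_features = X.columns.tolist()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)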

Using ELI5 to get the importance of features globally and locally

1. Logistic Regression

After data processing, we can train our model using a grid search over a few parameters.

# Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

lr_model = LogisticRegression(class_weight="balanced", random_state=42)
gs = GridSearchCV(lr_model, {"C": [1., 1.3, 1.5]}, n_jobs=-1, cv=5, scoring="balanced_accuracy")
gs.fit(X_train, y_train)
lr_model = gs.best_estimator_  # keep the fitted best model for the ELI5 calls below
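The 0.70 score quoted below is, I assume, the balanced accuracy on held-out data; a quick check along those lines would be:

from sklearn.metrics import balanced_accuracy_score

print(balanced_accuracy_score(y_test, lr_model.predict(X_test)))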

We get a balanced_accuracy_score of 0.70. Not really impressive. Now we will use ELI5 to look inside the box and understand how the model works. Import eli5 and use show_weights to visualise the weights of your model (global interpretation).

import eli5

eli5.show_weights(lr_model, feature_names=all_features)
Description of weights assigned to features

This table gives us the weight associated with each feature (the same coefficients logistic regression gives out of the box). The value tells us how much impact a feature has on the predictions on average; the sign tells us in which direction. Here, a campaign contact in March significantly increases the probability that the prospect subscribes to the plan. Did the marketing team do something different in March? Or are prospects just more likely to subscribe in March? That is a question for the marketing team; depending on the answer, this finding may or may not be useful.

We can also use `eli5` to explain a specific prediction; let’s pick a row in the test data (local interpretation):

i = 4
X_test.iloc[[i]]
Detail of a customer
Selected customer subscribed

Our prospect subscribed to the term deposit after the campaign. Let’s see what our model would have predicted and how we could explain it to a domain expert.

eli5.show_prediction(lr_model, X_test.iloc[i], feature_names=all_features, show_feature_values=True)
Contribution of each feature towards the prediction.

For this prediction, it looks like the most important factors were that the prospect was contacted via cellular phone (contact__cellular==1) and did not have credit in default (default__no==1). This information can be shared with domain experts to understand why those features were important. Each contribution is the feature’s weight multiplied by the feature’s value.
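To make that arithmetic concrete, here is a small sketch (my own, not from the original notebook) that reproduces the contribution terms for a fitted scikit-learn LogisticRegression:

row = X_test.iloc[i]
contributions = lr_model.coef_[0] * row.values           # weight * feature value, per feature
score = contributions.sum() + lr_model.intercept_[0]     # the intercept appears as <BIAS> in ELI5

print(pd.Series(contributions, index=all_features).sort_values(ascending=False).head())
print(score, lr_model.decision_function(X_test.iloc[[i]])[0])   # the two numbers should match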

2. Decision Tree

from sklearn.tree import DecisionTreeClassifier

dt_model = DecisionTreeClassifier(class_weight="balanced")
dt_model.fit(X_train, y_train)
eli5.show_weights(dt_model, feature_names=all_features)

Compared to logistic regression, the interpretation is less valuable. The feature importance only gives me the magnitude of how important the features are relative to each other, not the direction; there are no values in red. It seems ‘pdays’ is an important feature, but I don’t know whether increasing or decreasing it pushes the model’s prediction up or down.

For tree-based models ELI5 typically does nothing special; it uses the out-of-the-box feature importance computed by the model itself. By default this is ‘gain’, i.e. the average gain of the feature when it is used in trees.
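For a plain scikit-learn decision tree this is the estimator’s feature_importances_ attribute, so a quick manual look (my sketch) is simply:

# These values should line up with what ELI5 displays for the decision tree
pd.Series(dt_model.feature_importances_, index=all_features).sort_values(ascending=False).head(10)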

ELI5 Permutation Importance

Permutation importance is a way to understand black-box models. It only works for global interpretation. As output it gives weight values similar to the feature importances that algorithms provide by default, showing the relative importance among the features.

Concept

For each feature:

  1. Shuffle the values of that feature in the provided dataset
  2. Generate predictions using the model on the modified dataset
  3. Compute the decrease in accuracy compared to before shuffling

Compare the impact on accuracy of shuffling each feature individually.

A feature is “important” if shuffling its values increases the model error, because in this case the model relied on the feature for the prediction. A feature is “unimportant” if shuffling its values leaves the model error unchanged, because in this case the model ignored the feature for the prediction.
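As a minimal from-scratch sketch of this procedure (assuming a fitted scikit-learn style classifier and the balanced accuracy score used throughout this article):

import numpy as np
from sklearn.metrics import balanced_accuracy_score

def permutation_importance(model, X, y, n_repeats=5, random_state=42):
    rng = np.random.RandomState(random_state)
    baseline = balanced_accuracy_score(y, model.predict(X))
    importances = {}
    for col in X.columns:
        drops = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffle a single column, keeping every other column fixed
            X_shuffled[col] = rng.permutation(X_shuffled[col].values)
            drops.append(baseline - balanced_accuracy_score(y, model.predict(X_shuffled)))
        importances[col] = np.mean(drops)   # average accuracy drop = importance
    return importances

# e.g. sorted(permutation_importance(dt_model, X_test, y_test).items(), key=lambda kv: -kv[1])[:5]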

from eli5.sklearn import PermutationImportance

perm = PermutationImportance(dt_model, scoring="balanced_accuracy")
perm.fit(X_test, y_test)
eli5.show_weights(perm, feature_names=all_features)

In the code above we create a new instance of PermutationImportance that takes the trained model to be interpreted and the scoring method. We then call fit on the PermutationImportance object and use eli5's show_weights. This will plot the new feature importances:

Feature importance

It shuffles each feature a number of times and outputs the average importance along with its standard deviation. It does not give the direction in which a feature impacts the model, only the magnitude of its importance. When you use your model on new data to predict whether someone will subscribe to your plan, the most important thing it needs to get the prediction right is whether you contacted the person by telephone. You’re not looking at what the model gave the most importance to while learning, but at how it will weigh features from now on based on what it has learnt.

Even though the models provide their own methods to calculate weights or feature importances, ELI5 provides a unified API to access this information, which makes comparing models easier.
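As a sketch of that unified API (eli5.explain_weights and eli5.format_as_text are documented eli5 entry points; the comparison loop itself is just my illustration):

for name, model in [("logistic regression", lr_model),
                    ("decision tree", dt_model),
                    ("permutation importance", perm)]:
    print(name)
    print(eli5.format_as_text(eli5.explain_weights(model, feature_names=all_features, top=5)))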

ELI5 provides an independent implementation of this algorithm for XGBoost and most scikit-learn tree ensembles, which is definitely a step towards model-agnostic interpretation, but it is not purely model-agnostic like LIME.

The code used in this article is available on my GitHub.

In future posts, I will cover the other model interpretation techniques.

If you have any questions on ELI5, let me know; I am happy to help. Follow me on Medium or LinkedIn if you want to receive updates on my blog posts!
