Opening the Black Box of Machine Learning Models: SHAP vs LIME for Model Explanation

Thomas ten Heuvel
Cmotions
14 min read · Mar 17, 2023


In recent years, the use of machine learning models within organizations has become more and more common. With applications such as customer churn prediction, fraud detection or predicting response to a marketing campaign, predictive models have proven to be a powerful tool within many business domains. Predictive models create value by allowing businesses to respond and adapt quickly to insights from data. The more accurate predictive models get, the better they can be used to optimize processes, and the more value they create.

Hence, many data scientists seek the model with the highest accuracy, which is often a complex, non-linear model such as XGBoost or Random Forest. While these models are very powerful, they also come with a significant disadvantage: due to their complexity, it is often difficult to find out why a certain prediction was made and how variables contributed to the decision. This is problematic because it is important for organizations to use data science methods in a responsible and ethical manner. Especially in sensitive cases involving personal data, organizations need to be able to demonstrate that their models and processes are justifiable and not discriminating. This includes having an open and transparent decision-making process and avoiding the so-called black box model. The second reason to avoid black box models is that explanations of predictions provide valuable insights. For example, in churn prediction it is not only valuable to have an accurate prediction of whether a customer is likely to leave, but also to know how the variables are related to churn. In short, opening the black box provides both transparency and actionable insights that organizations can benefit from.

Two well-known methods for explaining tree-based models such as Random Forest and XGBoost are SHAP and LIME. In this article, we will demonstrate how these methods work and how they can be used to open the black box of machine learning models. The following section will elaborate on the difference between global and local interpretability, followed by a short theoretical description of SHAP and LIME. We then train a Random Forest model, demonstrate how SHAP and LIME can be used for model explanation, and compare the two methods. The focus of this article will be on the practical implementation of the two methods, so I will refer to other articles for more theoretical background.

Global vs. Local Interpretability

Before we start, it is important to distinguish between global and local interpretability. Global interpretability refers to insights into how predictor variables generally contribute to the target variable, either positively or negatively. It allows us to get a sense of the importance or relevance of the predictors in our model, and how they relate to the target variable.

In contrast, local interpretability refers to insights into why an individual case received its prediction and how predictor variables contributed to that prediction. This is especially useful in models like Random Forest and XGBoost, because in these kinds of models the contribution of variables to the prediction can vary per individual case.

SHAP

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explaining machine learning models. It is based on Shapley values, which quantify the contribution that each feature brings to the outcome of the model. Every prediction starts with the expected value (or average prediction) as the base value, and the SHAP value of each feature tells us how that feature contributes to the actual predicted probability relative to the base value. Hence, the final prediction is the sum of the average prediction and all the SHAP values.

In the first place, SHAP is a method for local interpretation of the model, because SHAP values quantify the contribution of each feature to a single prediction case. However, SHAP can also be used for global interpretation, where the SHAP values of multiple prediction cases are combined or aggregated to get a sense of the more general contribution of the features to the outcome. If you want to know more about how SHAP values are actually computed, I recommend you to read this article.

LIME

Another popular method for explaining machine learning models is LIME, which builds upon four basic principles:

  • Local: The explanation must be locally faithful, i.e. it must correspond to how the model behaves in vicinity of the instance being predicted.
  • Interpretable: Provide a qualitative understanding between the input variables and the output. The best way is to use a linear model with few variables.
  • Model-Agnostic: The method works with any model (e.g. a deep learning model, a regression, XGBoost, etc.), and this also holds for future models.
  • Explanation: Explains predictions on the individual level.

LIME works by fitting a so-called local surrogate model that approximates the predictions of the underlying black box model. In contrast to SHAP, LIME can only be used for local interpretation, because the locally fitted surrogate model only applies in the proximity of the data point being explained.
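To make the idea of a local surrogate model more concrete, here is a minimal sketch of LIME's core recipe, assuming a fitted classifier with a predict_proba method: perturb the instance of interest, weight the perturbed samples by their proximity to that instance, and fit a simple weighted linear model on the black-box predictions. The function name, the Gaussian perturbation scheme and the kernel width are illustrative assumptions; the actual lime package is more sophisticated (it samples based on the training data statistics and can discretize features).

import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(black_box, instance, n_samples=1000, kernel_width=0.75):
    """Toy illustration of LIME's core idea, not the actual lime package.

    black_box: a fitted model exposing predict_proba
    instance:  1D numpy array, the data point we want to explain
    """
    rng = np.random.default_rng(0)
    # 1. Perturb the instance of interest by adding Gaussian noise around it.
    noise_scale = np.abs(instance) * 0.1 + 1e-6
    perturbed = instance + rng.normal(scale=noise_scale, size=(n_samples, instance.shape[0]))
    # 2. Weight the perturbed samples by their proximity to the original instance.
    distances = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))
    # 3. Fit an interpretable (linear) model on the black-box predictions.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, black_box.predict_proba(perturbed)[:, 1], sample_weight=weights)
    # The coefficients approximate the local contribution of each feature.
    return surrogate.coef_

The returned coefficients play the role of LIME's feature weights for this one prediction.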

In the following section, we will prepare our dataset and fit a model, followed by a demonstration of how SHAP and LIME can be used to explain the model. We will demonstrate the different functions and features of the SHAP and LIME packages and compare them.

Preparing some data

Imagine you are working for a telecom company and you are responsible for the collection of subscription payments. In this role, it is your job to minimize defaults by customers. You decide to use customer data to train a model that can be used to predict whether someone is likely to be a defaulter or not. The model can then be used to recognize defaults at an early stage so that measures can be taken to minimize payment arrears. We are working with a dataset that contains information on mobile phone subscriptions, including an indicator for whether the subscription has defaulted. After training a model that predicts defaults, we are going to use SHAP and LIME to explain the model and explore how our predictor variables relate to our target variable.

# First, we install the required packages:

!pip install shap
!pip install lime
!pip install Pillow==9.0.0 # The newest version results in plotting issues.
import pandas as pd

# (The loading and preparation of the dataset into the dataframe df is not shown here.)

# In 90% of the cases, subscr_default has the same value as ind_debt_collector.
print('Proportion of cases where subscr_default == ind_debt_collector:', (df.subscr_default == df.ind_debt_collector).mean())

# We therefore drop the indicator for debt collector.
df = df.drop('ind_debt_collector', axis=1)

We have loaded and prepared a dataset in the variable df that describes whether a subscription is defaulted or not. We can preview the data using df.head():

df.head()

Our dataset contains information on mobile phone subscriptions, and the column subscr_default indicates whether that subscription is defaulted. Therefore, subscr_default is our target variable that we are going to predict. Our predictor variables contain information on the subscriptions, such as the monthly amount of the subscription, the price of the mobile phone, and the average minutes of calls per week. We will first train a model to predict default, and then we will demonstrate how SHAP and LIME can be used to explain the predictions of the model. Please note that since we want to explain SHAP and LIME, we need a trained model, but the focus of this article is not on how to train a prediction model. If you want to learn more about how to train a predictive model in Python, I recommend reading our series 'No Data Scientist is the Same'. In this series, we go into detail on how different types of data scientists build predictive models.

Training the model

In the code below, we first separate our target variable from our predictor variables, followed by a train/test split in order to be able to validate our model performance. We then train a Random Forest model and assess its performance on the test dataset.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import RocCurveDisplay  # plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay is its replacement

# Specifying our target variable:
target = 'subscr_default'

# Separating our target variable from our features:
y = df[target]
X = df.drop(target, axis = 1)

# Splitting the dataset into a 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 17)

# Initialize and fit the model:
rf = RandomForestClassifier(max_depth=10).fit(X_train, y_train)

# Predict:
y_pred_rf = rf.predict(X_test)
y_pred_rf_train = rf.predict(X_train)

# Evaluate:
print("Accuracy train set:", accuracy_score(y_train, y_pred_rf_train))
print("Accuracy test set:", accuracy_score(y_test, y_pred_rf))

cr = classification_report(y_test, y_pred_rf)
print(cr)

RocCurveDisplay.from_estimator(rf, X_test, y_test)

Global model interpretation

Not bad, our model has a fairly decent predictive performance with an AUC of 0.87. Now it’s time for some interpretation of our model! We start off with interpreting our model on the global level, so we can get an idea of the general relations between variables in the model. Scikit-learn’s RandomForestClassifier has a built-in feature_importances_ attribute that gives a first idea of which features are important in the model. We can access the feature importances by using rf.feature_importances_, and we create a dataframe to store the feature importance values and their corresponding feature names:

df_fi = pd.DataFrame()
df_fi['feature'] = X.columns
df_fi['importance'] = rf.feature_importances_

df_fi.sort_values('importance', ascending=False).head(10)

SHAP for global interpretation

Great! We now know that the average minutes of calls per week is the most important feature in our model, followed by the monthly subscription price and number of visits to the online portal (MyCall). However, this only tells us that these features are important for predicting defaults, but not how these features are related to defaults. In other words, we don’t know whether a high number of avg_min_call_wk in general contributes positively or negatively to our prediction. This is what we can use SHAP for!

The SHAP package computes SHAP values by repeatedly feeding the prediction model inputs in which different features are left out, and quantifying the contribution that each feature brings to the prediction. Please note that SHAP values for a prediction only apply to that specific prediction, and other predictions are likely to have different SHAP values. Because we want to start off by interpreting our model globally, we want to compute SHAP values for more than one data point. In the code below, we take a subsample of 5000 data points. We initialize a SHAP explainer for our Random Forest model by using shap.TreeExplainer(rf), and we use the .shap_values() method to compute the SHAP values for our subsample of data points:

import shap
from shap import TreeExplainer

X_train_sample = X_train.sample(5000, random_state=17) # Take a sample because calculating SHAP values for the entire dataset would take way too much time.

explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_train_sample)
print(f"length of shap_values: {len(shap_values)} \n")
print(f"shape of elements within shap_values: {[i.shape for i in shap_values]} \n")

shap_values[0][1]  # SHAP values of the second sample, for class 0
shap_values[1][1]  # SHAP values of the second sample, for class 1

explainer.shap_values() returns a list of length 2, where each element is a matrix of SHAP values with size n_samples x n_features. The length of this list is 2, because the predict_proba method of our Random Forest model returns an array of length two, holding the predicted probabilities for each of the classes. As a result, we also get separate SHAP values for each of the classes: shap_values[0] holds the SHAP values that quantify the contribution of features towards predicting class 0 (no defaulter) and shap_values[1] holds the SHAP values that quantify the contribution of features towards predicting class 1 (defaulter). Note in the output above that the values for the two classes have the same magnitude but opposite signs. (Depending on your version of the shap package, shap_values may instead be returned as a single array with the class dimension as the last axis.)
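Before moving on, we can do a quick sanity check on the additive property described earlier: for any record, the expected value (base value) plus the sum of its SHAP values should reproduce the probability predicted by the model. A small sketch, using an arbitrary row from our subsample:

i = 0  # an arbitrary row in our subsample

reconstructed = explainer.expected_value[1] + shap_values[1][i].sum()
predicted = rf.predict_proba(X_train_sample.iloc[[i]])[0, 1]

# The two numbers should be (nearly) identical.
print('base value + sum of SHAP values:', reconstructed)
print('predicted probability:          ', predicted)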

In our case, we want to explain how variables contribute to predicting if someone is a defaulter, which is class 1. We therefore use shap_values[1] as the values to visualize. We can do so by using the shap.summary_plot() function:

from matplotlib import pyplot as plt

f = plt.figure()
shap.summary_plot(shap_values[1], X_train_sample, plot_type='dot', show=False, plot_size=[16,8])

The dots in the plot above visualize the SHAP values for the 5000 records in our subsample. The color of a dot represents the feature value for the specific data point, and the position on the x-axis displays the corresponding SHAP value. A high (positive) SHAP value means the feature pushes the predicted probability up, whereas a low (negative) SHAP value pushes the predicted probability down. If we look at the avg_min_call_wk feature of the model, we see that, generally, high values for the average minutes of calls per week correspond with high SHAP values and thus a higher probability of someone being a defaulter. Conversely, a high value for the number of visits to the online portal (MyCall) generally indicates a lower probability of someone being a defaulter. Similar to sklearn's feature_importances_ attribute, the higher a feature appears in the summary plot, the more important or relevant the feature is in the model. The advantage of the SHAP summary plot is that we not only can see which features are important in our model, but also how (positively or negatively) feature values relate to the prediction.
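If you prefer a single importance number per feature instead of the dot plot, a common convention is to average the absolute SHAP values over all samples. The sketch below builds such a ranking from the shap_values we computed above; it is essentially what shap.summary_plot(shap_values[1], X_train_sample, plot_type='bar') shows as a bar chart.

import numpy as np

# Mean absolute SHAP value per feature, for class 1 (defaulter):
mean_abs_shap = np.abs(shap_values[1]).mean(axis=0)

df_shap_fi = pd.DataFrame({
    'feature': X_train_sample.columns,
    'mean_abs_shap': mean_abs_shap
}).sort_values('mean_abs_shap', ascending=False)

print(df_shap_fi.head(10))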

In the summary plot above, color is used to represent the value of the features, and the position on the x-axis is used to represent the SHAP values. This allows us to get a general idea of the relation between feature values and SHAP values, but if we want to zoom in on a feature and get a more detailed insight, SHAP offers the dependence_plot to explain the relation between an individual feature and the target variable. For instance, the dependence_plot below shows that when the monthly subscription amount exceeds 30, the probability of the subscription being defaulted increases significantly.

# dependence_plot accepts the feature name directly; interaction_index=None disables interaction coloring.
shap.dependence_plot('monthly_subscr_amount', shap_values[1], X_train_sample, interaction_index=None)

Although we now have a general idea of how our features contribute to the model predictions, we still have no insight into the individual predictions. If we want to know how all features together result in the final prediction, we are locally interpreting our model. Luckily SHAP also offers methods for explaining individual predictions!

Explain individual predictions

Let’s say that, now that we have a general idea of the relation between our features and target variable, we decide to take our model into production and send all customers with a predicted probability above 0.8 an extra reminder to pay for their subscription. After sending the payment reminders, a customer, Jimmy, calls and asks why he has received a reminder while his friend Jack hasn’t. Under the GDPR, organizations are obligated to provide a customer’s data when requested. In this case, you are requested to provide the data on the basis of which the customer received the payment reminder.
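As a side note, selecting the customers who should receive a reminder could look like the sketch below; the 0.8 cutoff is the one mentioned above and the variable names are our own.

# Predicted probability of default (class 1) for every customer in the test set:
probs_default = rf.predict_proba(X_test)[:, 1]

# Customers above the 0.8 threshold receive an extra payment reminder:
reminder_customers = X_test[probs_default > 0.8]
print('Number of customers receiving a reminder:', len(reminder_customers))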

We can use the shap.force_plot() function to see how features contributed to the individual prediction for this person:

shap.initjs()

j = 49 # index of person with high probability (>0.8)

explainer = shap.TreeExplainer(rf)
shap.force_plot(explainer.expected_value[1],  # Note that we use the expected value of class [1] (defaulter) as the base value
                shap_values[1][j, :],         # So we also want the SHAP values for class [1]
                X_train_sample.iloc[j, :],
                matplotlib=True,
                text_rotation=20)
import numpy as np

expected_value = y_train.astype(np.float32).mean()

print('Proportion of samples with class 1 in training dataset: ' + str(expected_value))

The plot above explains why a probability of 0.81 was assigned to Jimmy. Note that the base value represents the expected value of our model, which is the proportion of samples with class 1 in our training dataset. In other words, this is the probability that a random person in our training dataset is a defaulter. Based on the information that we have about Jimmy, the probability increases towards 0.81. The features in red are the features that contributed most positively to the predicted probability, while the features in blue contributed negatively to the probability. We can see that the fact that Jimmy has been in contact with our company (ind_customercontact_lastmonth) has a strong positive effect on the predicted value. This aligns with what we would expect based on the summary plot that we saw earlier, where all the red dots (ind_customercontact_lastmonth = 1) have positive SHAP values. Furthermore, Jimmy calls on average 250 minutes per week, and avg_min_call_wk appears in red, which indicates that this is also one of the reasons that Jimmy's subscription was predicted to be defaulted.

Although the force plot reveals the positive or negative contribution of feature values to the prediction, we do not see how much probability is added to the base value for each feature value. A way to visualise this is by using the waterfall plot. For this plot, we first need to generate another type of explainer and SHAP values in a different format. As the run time shows, this method is quite expensive and therefore slow. However, it provides a nice breakdown of how the variables contributed to the final predicted probability. Note that we are still looking at the same customer, Jimmy.

# Fit an explainer that uses our training sample as background data:
explainer = shap.Explainer(rf, X_train_sample)
shap_values = explainer(X_train_sample, check_additivity=False)

# Waterfall plot for class 1 (defaulter) and customer j:
shap.plots.waterfall(shap_values[:, :, 1][j, :], max_display=20)

LIME

Another method for explaining model predictions is LIME. Unlike SHAP, LIME does not offer functionality for global interpretation. But we can use LIME to interpret individual predictions locally, just as we did with SHAP. We can instantiate a LIME explainer by calling lime_tabular.LimeTabularExplainer():

import numpy as np
import lime
from lime import lime_tabular

explainer = lime_tabular.LimeTabularExplainer(
    training_data=np.array(X_train_sample.astype('int')),
    feature_names=X_train_sample.columns,
    class_names=['0', '1'],
    mode='classification'
)

Now let’s use LIME to explain the same data point that we just explained with SHAP and compare the differences:

exp = explainer.explain_instance(
    data_row=X_train_sample.astype('int').iloc[j],
    predict_fn=rf.predict_proba
)

# exp.show_in_notebook(show_table=True) # Does not work in Databricks

exp_html = exp.as_html()
displayHTML(exp_html)  # Databricks alternative for rendering the HTML explanation

Note that features are vertically sorted based on their importance for this specific prediction. In agreement with the SHAP explainer, ind_customercontact_lastmonth and avg_min_call_wk are the features that have the strongest positive effect on the predicted probability. However, for the features that contributed negatively to the probability, we see some differences between the two methods. For example, the negative effect of ind_payment_arrears seems to be more important according to LIME than it does according to the SHAP waterfall plot. The slight differences between the two methods are caused by the fact that SHAP and LIME work in different ways: SHAP iteratively removes features from predictions to evaluate the effect of each feature on the prediction, whereas LIME fits a local surrogate model that only applies in the vicinity of the data point being explained. Therefore, it seems that in the vicinity of Jimmy's data point, ind_payment_arrears has a stronger negative effect on the prediction than SHAP would suggest. However, in general both methods suggest similar contributions of the features, so I would not worry too much about these differences.
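If you want to put the two explanations side by side programmatically rather than visually, LIME exposes its local feature weights via exp.as_list(), and the SHAP values for Jimmy are simply one row of the Explanation object we created for the waterfall plot. A quick comparison could look like this (a sketch reusing the objects created above):

# LIME's local feature weights for Jimmy:
print('LIME feature weights:')
for feature, weight in exp.as_list():
    print(f'  {feature}: {weight:.3f}')

# SHAP values for the same customer (class 1), taken from the Explanation
# object that we created for the waterfall plot:
print('SHAP values:')
shap_row = shap_values[j, :, 1].values
for feature, value in sorted(zip(X_train_sample.columns, shap_row), key=lambda t: abs(t[1]), reverse=True):
    print(f'  {feature}: {value:.3f}')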

SHAP versus LIME: the verdict

We have now seen how SHAP and LIME can be used to open the black box of ML-models. At first sight, SHAP seems more versatile: it offers methods for both local and global interpretation of models, and there are multiple options with regard to visualization. In contrast, LIME only offers a method for local interpretation of models, and is more restricted in terms of visualization. So why would we ever use LIME? Well, if we look at the running time for SHAP, computing the SHAP values for a subset of only 5000 samples already took 3 minutes. Furthermore, if we look at the computation time for the SHAP waterfall plot, we see that it takes a very long time for the algorithm to compute the values that indicate the contribution of each variable to the output. Conversely, a similar plot in LIME computes much faster. Therefore, it can be a good practice to choose SHAP over LIME when you want to explain your model on a global level, or when you only want to explain a few specific predictions. In situations where you want to explain large volumes of predictions and SHAP gets too slow, LIME can be a nice alternative!
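If you want to verify this runtime difference on your own data, a rough timing comparison for explaining a single record could look like the sketch below; actual timings will of course depend on the dataset size, the model and your hardware. Note that explainer at this point refers to the LIME explainer we created earlier.

import time

# Time the (background-data based) SHAP explainer we used for the waterfall plot, on a single record:
start = time.perf_counter()
waterfall_explainer = shap.Explainer(rf, X_train_sample)
_ = waterfall_explainer(X_train_sample.iloc[[j]], check_additivity=False)
print(f'SHAP (waterfall-style, single record): {time.perf_counter() - start:.2f} s')

# Time a LIME explanation for the same record:
start = time.perf_counter()
_ = explainer.explain_instance(
    data_row=X_train_sample.astype('int').iloc[j],
    predict_fn=rf.predict_proba
)
print(f'LIME (single record): {time.perf_counter() - start:.2f} s')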

Want to read more about the cool stuff we do at Cmotions and The Analytics Lab? Check out our blogs, projects and videos!
