SHAP Part 2: Kernel SHAP

Rakesh Sukumar · Published in Analytics Vidhya · Mar 30, 2020

Kernel SHAP is a model-agnostic method to approximate SHAP values using ideas from LIME and Shapley values. This is my second article on SHAP. Refer to my previous post here for a theoretical introduction to SHAP.

What is LIME?

Local Interpretable Model-agnostic Explanations (LIME) is a technique for explaining the predictions of a black box machine learning model by building a number of interpretable local surrogate models (like linear regression). Surrogate models are trained on the predictions of the underlying black box model. The recipe to train local surrogate models is:

  • Select the instance of interest xᵢ for which you want an explanation of its black box model prediction.
  • Generate a new dataset by perturbing the feature values of xᵢ. The surrogate model is not trained on the actual feature values of xᵢ; instead, a simplified binary version zᵢ (called the interpretable representation²) is constructed as follows: if x∈Rᴾ is the original representation of xᵢ, the simplified binary version is zᵢ∈{0, 1}ᴾ. For example, if xᵢ = (x₁, x₂, x₃), the corresponding interpretable representation is zᵢ = (z₁, z₂, z₃), where each of z₁, z₂ and z₃ takes the value 0 or 1.
  • Record the prediction of the black box model for each perturbed sample zᵢ. The predictions are obtained by mapping zᵢ back to the original representation Rᴾ as follows: a ‘1’ in zᵢ is mapped to the actual feature value, and a ‘0’ is mapped to an appropriate non-informative value that depends on the type of dataset. Refer to Understanding how LIME explains predictions⁴ & Understanding lime⁵ for the treatments of tabular, text and image datasets. The SHAP KernelExplainer() function (explained below) replaces a ‘0’ in the simplified representation zᵢ with a random sample value for the respective feature drawn from a given background dataset. Thus, the independent variables for the local surrogate model are a bunch of ones and zeroes, and the dependent variable is the prediction obtained. Note that this way of computing feature contributions makes additional assumptions of feature independence and model linearity, at least locally in the proximity of xᵢ.
  • Weight the new samples according to their proximity to the instance of interest (xᵢ).
  • Train an interpretable model (like linear regression, lasso, a decision tree, etc.) on this new dataset (a minimal sketch of the full recipe follows this list).
  • Explain the prediction of the black box model by interpreting the local model (also called the explanation model).
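The sketch below writes out the recipe above in a few lines of code. It is purely illustrative and is not the lime library's implementation: black_box_predict, x_i and background are placeholder names, and the proximity kernel is a simple exponential kernel on the fraction of features switched off.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def lime_explain(black_box_predict, x_i, background, n_samples=1000, kernel_width=0.75):
    """Explain black_box_predict(x_i) with a weighted linear surrogate model (illustrative sketch)."""
    p = len(x_i)
    # 1. Perturb: draw binary "interpretable" samples z in {0, 1}^p
    Z = np.random.randint(0, 2, size=(n_samples, p))
    # 2. Map back to the original space: 1 -> the actual value of x_i,
    #    0 -> the value from a randomly chosen background row
    rows = background[np.random.randint(len(background), size=n_samples)]
    X_pert = np.where(Z == 1, x_i, rows)
    y_pert = black_box_predict(X_pert)
    # 3. Weight perturbed samples by their proximity to x_i
    distance = 1.0 - Z.mean(axis=1)            # fraction of features switched off
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # 4. Fit the interpretable surrogate; its coefficients are the explanation
    surrogate = LinearRegression()
    surrogate.fit(Z, y_pert, sample_weight=weights)
    return surrogate.coef_
```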

LIME requires the user to choose the complexity of the local surrogate model and an appropriate kernel function to assign weights to the samples in the generated dataset. The figure below presents the intuition behind LIME.

Intuition behind LIME: the black box model’s decision function is represented by the blue/pink background. The bold red cross is the instance being explained. The grey dashed line represents the explanation model that was learnt².

Kernel SHAP: Linear LIME + Shapley Values

In the SHAP paper³ the authors show that, with a weighted linear regression model as the local surrogate model and an appropriate weighting kernel, the regression coefficients of the LIME surrogate model estimate the SHAP values. The Shapley kernel that recovers SHAP values is given by:

πₓ′(z′) = (M − 1) / [ C(M, |z′|) · |z′| · (M − |z′|) ]

where M is the number of features, |z′| is the number of non-zero features in the simplified input z′, and C(M, |z′|) is the binomial coefficient “M choose |z′|”.

We will go through an example on the iris dataset to understand how KernelExplainer works. We will use Google Colab to run our code. The code file is uploaded here: Kernel_SHAP.ipynb

We will use SHAP’s KernelExplainer to explain an SVM model trained on this dataset.
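The notebook’s code is not reproduced in this post, so the snippet below is a minimal sketch of the setup it describes: the iris dataset split into train and test sets, and an SVM classifier trained with probability=True so that predict_proba is available to the explainer. The split ratio and random seed are illustrative choices.

```python
import shap
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load iris into a DataFrame so that SHAP plots show the feature names
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target  # 3 classes: setosa, versicolor, virginica

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# probability=True so that predict_proba is available to KernelExplainer
svm = SVC(kernel="rbf", probability=True, random_state=0)
svm.fit(X_train, y_train)
```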

Arguments of the KernelExplainer() function (a usage sketch follows the list):

  • model: The model to be explained. The output of the model can be a vector of size n_samples or a matrix of size [n_samples x n_output] (for a classification model).
  • data: Background dataset to generate the perturbed dataset required for training surrogate models. We simulate “missing” (‘0’s in zᵢ) by replacing the feature with the values it takes in the background dataset. So if the background dataset is a simple sample of all zeros, then we would approximate a feature being missing by setting it to zero. For small problems this background dataset can be the whole training set, but for larger problems consider using a single reference value or using the kmeans function to summarize the dataset.
  • link: A function to connect feature contribution values to the model output. For a classification model, we generally explain the logit of the predicted probability as a sum of feature contributions. Hence, if the output of the “model” (the first argument) is a probability, we set link = “logit” to get the feature contributions in logit form.
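Continuing the sketch above, the explainer can be constructed with these arguments as follows: the training set summarized with shap.kmeans serves as the background dataset, predict_proba is the model function, and link=”logit” maps the probability output to the logit scale. The choice of 10 centroids is illustrative.

```python
# Summarize the background dataset with k-means (10 centroids is an illustrative choice)
background = shap.kmeans(X_train, 10)

# The model outputs probabilities, so we set link="logit"
explainer = shap.KernelExplainer(svm.predict_proba, background, link="logit")
```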

Next, we compute the SHAP values as below:
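Continuing the sketch (the nsamples and l1_reg values here are illustrative, not necessarily those used in the notebook):

```python
# SHAP values for every sample in the test set; for this 3-class problem the
# result is a list with one [n_samples, n_features] array per class
shap_values = explainer.shap_values(X_test, nsamples=100, l1_reg="auto")
```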

Arguments of explainer.shap_values() function:

  • X: Dataset on which to explain the model output.
  • nsamples: No. of samples to draw to build the surrogate model for explaining each prediction.
  • l1_reg: The l1 regularization to use for selecting the features that explain the model prediction. Possible values are: “num_features(<int>)”, which selects a fixed number of features to explain the model prediction; “aic”/“bic”, which use the AIC/BIC rules for regularization; a <float>, which sets the alpha parameter of sklearn.linear_model.Lasso; and “auto”, which uses AIC when less than 20% of the possible sample space is enumerated and no regularization otherwise.

For classification problems, explainer.shap_values() returns a list of length n_classes (the number of classes). For a binary classification model, n_classes = 2 (negative & positive class). Each element of this list is an array of size [n_samples, n_features] and holds the SHAP values for the respective class. For regression models, we get a single array of SHAP values of size [n_samples, n_features]. Here we have a 3-class classification problem, so we get a list of length 3.

Explaining a Single Prediction

Let’s explain the prediction for the first item in the test set.
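A sketch of the corresponding force plot, continuing the code above. The index into explainer.expected_value and shap_values selects the class being explained (0, 1 and 2 for Setosa, Versicolor and Virginica); the same call can be repeated for each class.

```python
shap.initjs()  # load the JS visualization code (needed in notebooks/Colab)

# Force plot for the first test sample, Setosa output (class 0)
shap.force_plot(explainer.expected_value[0],
                shap_values[0][0, :],
                X_test.iloc[0, :],
                link="logit")
```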

The link=”logit” argument converts the logit values to probabilities for display. Each plot displays the base probability value for the respective class over the training dataset. Blue indicates that a feature value decreased the probability, and red indicates that it increased the probability.

Explaining Predictions for More Than One Sample

If we take the above plot for any one class for each sample, rotate them 90 degrees and stack them side-by-side, we can explain the predictions for multiple samples in a single plot (note that the samples are ordered by similarity):
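A sketch of the stacked force plot: the same call as above, but passing the SHAP values and feature values of all test samples for one class.

```python
# Stacked force plot for all test samples, Setosa output (class 0)
shap.force_plot(explainer.expected_value[0],
                shap_values[0],
                X_test,
                link="logit")
```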

SHAP Summary Plots
shap.summary_plot() can plot the mean absolute SHAP values for each class if provided with a list of SHAP values (the output of explainer.shap_values() for a classification problem), as below:
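A sketch of that call, continuing the example (class_names is optional and is used here only to label the classes):

```python
# Passing the full list of per-class SHAP values produces a bar plot of the
# mean |SHAP value| per feature, split by class
shap.summary_plot(shap_values, X_test, class_names=iris.target_names)
```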

The above plot indicates that petal length (cm) had the greatest influence on the predictions for all 3 classes followed by petal width (cm).

If provided with a single set of SHAP values (shap values for a single class for a classification problem or shap values for a regression problem), shap.summary_plot() creates a density scatter plot of SHAP values for each feature to identify how much impact each feature has on the model output. Features are sorted by the sum of the SHAP value magnitudes across all samples.
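Continuing the example, a sketch for the Setosa output (class 0), produced by passing only that class’s SHAP values:

```python
# Density (beeswarm) summary plot of SHAP values for a single class
shap.summary_plot(shap_values[0], X_test)
```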

For the Setosa output, we see that low values of petal length (cm) (indicated by blue dots) increase the probability that the sample is classified as Setosa (high SHAP values).

SHAP Dependence Plots

SHAP dependence plots reveal interaction effects.
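A sketch of such a plot, continuing the example. Both the feature on the x-axis and the interaction feature used for colouring are chosen explicitly here; by default interaction_index=”auto” lets SHAP pick the strongest interaction.

```python
# Dependence plot for the Versicolor output (class 1): SHAP value of
# petal length, coloured by petal width to reveal the interaction
shap.dependence_plot("petal length (cm)",
                     shap_values[1],
                     X_test,
                     interaction_index="petal width (cm)")
```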

The Versicolor output depicts the interaction between petal length (cm) and petal width (cm).

Find the code file uploaded here: Kernel_SHAP.ipynb.

Link to other articles in this series:

SHAP Part 1: An Introduction to SHAP

SHAP Part 3: Tree SHAP

References:

  1. Interpretable Machine Learning — A Guide for Making Black Box Models Explainable.
  2. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv:1602.04938
  3. SHAP: A Unified Approach to Interpreting Model Predictions. arXiv:1705.07874
  4. Understanding how LIME explains predictions: https://towardsdatascience.com/understanding-how-lime-explains-predictions-d404e5d1829c
  5. Understanding lime (R package vignette): https://cran.r-project.org/web/packages/lime/vignettes/Understanding_lime.html
