SHAP Part 1: An Introduction to SHAP

Rakesh Sukumar · Published in Analytics Vidhya · Mar 30, 2020
Image Credit: Should AI explain itself

Why do we need Model Interpretability?

Before we get to the “why” part of the question, let’s understand what is meant by interpretability. While there is no precise mathematical definition of interpretability, heuristic definitions appear in the machine learning literature, for example: “Interpretability is the degree to which a human can understand the cause of a decision”³, or “Interpretability is the degree to which a human can consistently predict the model’s result”⁴. The higher the interpretability of a machine learning model, the easier it is for someone to comprehend why the model made a certain prediction.

Now that we have defined model interpretability, let’s examine why it is important for machine learning models.

  • Trust Building: Many organizations rely on machine learning models to make important decisions. A bank using a model to grant or deny a loan to an applicant, a hospital evaluating a patient’s risk of a medical condition, or a credit card company predicting whether a transaction is fraudulent are a few examples. Users of machine learning models trust predictions more when the model can also explain why it made them.
  • Human Curiosity & Learning: Human beings have a mental model of their environment that is continuously updated by finding explanations for events happening around them. A black-box machine learning model may leave its users puzzled, as it provides little explanation for the predictions it makes. An interpretable machine learning model, on the other hand, can facilitate learning and help its users develop a better understanding of, and intuition for, the prediction problem.
  • Detect Biases and Edge Cases: Consider the example of a machine learning model used by a bank to grant or deny loans. The model might pick up biases present in the training data and discriminate against certain minority groups. Interpretability can help in detecting such biases at the time of model development. It can also help in identifying edge cases where a model might fail. For example, if an ML model in a self-driving car is found to use the two wheels of a bicycle to detect a cyclist, it may prompt us to consider how the model would behave when it sees a bicycle with side bags, where the wheels may not be clearly visible.
  • Regulatory Requirements: Regulations such as the GDPR in Europe or the Equal Credit Opportunity Act in the US provide individuals with the right to an explanation of decisions made by a machine learning model that significantly affect them, particularly legally or financially.

What is SHAP?

SHAP (SHapley Additive exPlanation) is a game-theoretic approach to explaining the output of any machine learning model. The goal of SHAP is to explain the prediction for any instance xᵢ as a sum of contributions from its individual feature values. The individual feature values are treated as players in a cooperative game whose payout is the prediction. In this setting, Shapley values provide a means to fairly distribute the payout among the feature values. Note that “feature value” here refers to the numerical or categorical value of a feature for the instance xᵢ. We explain the concept with a simple example below.

What are Shapley Values?

Let’s assume that A, B, C & D are four players with different skillsets in a coalition game (i.e. all four are on the same team) with some payout. What is the fairest way to divide the payout among the players? One way to answer this question is to imagine that the players joined the team in a sequence (for example: A>B>C>D); we can then compute the marginal contribution of each player as the change in payout when that player joined. However, there could be interaction effects that affect this calculation. For example, if A and B have diverse skillsets, then the total payout for a group with just A and B is the sum of the following 3 components:

  • payout with A alone,
  • payout with B alone,
  • additional payout for having both A & B.

However, if we assume that B joined after A, then the above scheme attributes the entire “additional payout for having both A & B” to player B, which is unfair. The marginal contributions therefore depend on the sequence in which we assume the players to join the team. Shapley values overcome this shortcoming by computing the average marginal contribution of each player over all possible sequences. Thus, if there are n players, n! possible sequences are considered. Here, we assume that the payout can be calculated for any subset of players, as in the sketch below.
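
To make this concrete, here is a small Python sketch (the payout numbers and the A & B interaction bonus are made up purely for illustration) that computes exact Shapley values for the four-player game by averaging marginal contributions over all 4! = 24 join orders:

```python
from itertools import permutations

# Hypothetical payout function for the toy game: it maps a set of players to
# the team's payout. The +40 term is an interaction bonus that only appears
# when A and B are both on the team (the "diverse skillsets" case above).
def payout(players):
    players = frozenset(players)
    base = {"A": 50, "B": 30, "C": 20, "D": 10}
    total = sum(base[p] for p in players)
    if {"A", "B"} <= players:
        total += 40  # additional payout for having both A & B
    return total

def shapley_values(all_players, payout_fn):
    """Average each player's marginal contribution over all n! join orders."""
    values = {p: 0.0 for p in all_players}
    orders = list(permutations(all_players))
    for order in orders:
        coalition = []
        for p in order:
            before = payout_fn(coalition)
            coalition.append(p)
            values[p] += payout_fn(coalition) - before
    return {p: v / len(orders) for p, v in values.items()}

phi = shapley_values(["A", "B", "C", "D"], payout)
print(phi)                # {'A': 70.0, 'B': 50.0, 'C': 20.0, 'D': 10.0}
print(sum(phi.values()))  # 150.0, the payout of the full team (efficiency property)
```

Note how the 40-point interaction bonus is split equally between A and B: over all join orders, each of them is the one completing the A-B pair exactly half the time.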

In the context of a machine learning model, the individual feature values of the instance xᵢ are the players, and “the prediction yᵢ minus the average prediction over the whole training data” is the payout. It can be proved that Shapley values are the only attribution method that satisfies all of the following properties:

  1. Efficiency: The feature contributions must add up to the difference of prediction for xᵢ and the average prediction.
  2. Symmetry: The contributions of two feature values j and k should be the same if they contribute equally to all possible coalitions.
  3. Dummy: A feature j that does not change the predicted value — regardless of which coalition of feature values it is added to — should have a Shapley value of 0.
  4. Additivity: For a game whose payout is the sum of two games’ payouts P₁ and P₂, the Shapley value of feature value i is the sum of its Shapley values in the two games, ϕ₁ᵢ + ϕ₂ᵢ.

For a model with a prediction function f(x) and a set of features M, the Shapley value of the i-th feature value is:

ϕᵢ = Σ_{S ⊆ M∖{i}} [ |S|! (|M| − |S| − 1)! / |M|! ] · [ fₓ(S ∪ {i}) − fₓ(S) ]

Notation: |M| is the total number of features, S is any subset of features that does not include the i-th feature, |S| is the size of that subset, and fₓ(·) is the model’s prediction as a function of a set of feature values.

The above formula is a summation over all possible subsets S of feature values excluding the i-th feature value. Here, |S|! is the number of permutations of the feature values that appear before the i-th feature value, and (|M| − |S| − 1)! is the number of permutations of the feature values that appear after it. The difference term is the marginal contribution of adding the i-th feature value to S. Also note that the equation requires us to compute the model’s prediction for an arbitrary subset of features, which is generally not feasible for an ML model.
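
As a sanity check, the weighted-subset form of the formula can be evaluated directly on the toy game from earlier (reusing the illustrative payout function defined in the previous sketch); it reproduces the same values as the permutation average:

```python
from itertools import combinations
from math import factorial

def shapley_value_subset_form(i, all_players, payout_fn):
    """Shapley value of player i as a weighted sum over subsets S that exclude i."""
    others = [p for p in all_players if p != i]
    m = len(all_players)
    phi = 0.0
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            # weight = |S|! * (|M| - |S| - 1)! / |M|!
            weight = factorial(len(S)) * factorial(m - len(S) - 1) / factorial(m)
            # marginal contribution of adding player i to coalition S
            phi += weight * (payout_fn(list(S) + [i]) - payout_fn(list(S)))
    return phi

# Assumes payout() from the previous sketch is in scope.
print(shapley_value_subset_form("A", ["A", "B", "C", "D"], payout))  # 70.0, same as before
```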

SHAP from Shapley values

SHAP values are the solutions to the above equation under the assumption fₓ(S) = E[f(x) | xₛ], i.e. the prediction for any subset S of feature values is taken to be the expected value of the model’s prediction f(x) given the feature values xₛ.
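
Under a feature-independence assumption, this conditional expectation is typically approximated by averaging the model’s output over a background dataset, with the features outside S replaced by background values. A minimal sketch of that idea (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def expected_prediction(model_predict, x, S, background):
    """Approximate f_x(S) = E[f(x) | x_S] by fixing the features in S at x's
    values and marginalizing the remaining features over a background dataset
    (this treats the features as independent)."""
    X_synth = background.copy()           # shape: (n_background, n_features)
    X_synth[:, S] = x[S]                  # keep features in S fixed at the instance's values
    return model_predict(X_synth).mean()  # average out the features not in S

# Hypothetical usage with a fitted model `model` and training matrix `X_train`:
# f_S = expected_prediction(model.predict, X_train[0], [0, 2], X_train)
```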

The exact computation of SHAP values is computationally challenging. The SHAP paper² describes two model-agnostic approximation methods: one that was already known (Shapley sampling values) and a novel one based on LIME (Kernel SHAP). The paper also describes several model-type-specific approximation methods such as Linear SHAP, Tree SHAP and Deep SHAP. These methods simplify the computation of SHAP values by assuming feature independence and, in some cases, model linearity. We will explore some of these methods in detail in subsequent articles.
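
As a preview of the explainers covered in the next parts of this series, here is a rough sketch of how they are typically invoked through the authors’ open-source shap Python package (the XGBoost model and California-housing data are placeholders chosen only for illustration):

```python
import shap
import xgboost as xgb
from sklearn.datasets import fetch_california_housing

# Fit any tree-based model; XGBoost is used here purely as an example.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = xgb.XGBRegressor(n_estimators=100).fit(X, y)

# Tree SHAP: fast SHAP values specialized for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one row of feature contributions per instance

# Efficiency property: contributions plus the base value add up to the prediction,
# i.e. shap_values.sum(axis=1) + explainer.expected_value ≈ model.predict(X)

# Kernel SHAP: model-agnostic approximation; needs a background sample.
# kernel_explainer = shap.KernelExplainer(model.predict, shap.sample(X, 100))
# kernel_shap_values = kernel_explainer.shap_values(X.iloc[:10])
```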

Key contributions of the SHAP paper are:

  • Identification of a new class of additive feature importance measures that unifies six existing methods.
  • Theoretical results demonstrating that a unique solution with desirable properties exists for this class of methods.
  • New methods for computing feature importance values with improved computational performance and better consistency with human intuition.

Link to other articles in this series:

SHAP Part 2: Kernel SHAP

SHAP Part 3: Tree SHAP

References:

  1. Molnar, Christoph. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.
  2. Lundberg, Scott M., and Su-In Lee. “A Unified Approach to Interpreting Model Predictions.” Advances in Neural Information Processing Systems (2017). arXiv:1705.07874
  3. Miller, Tim. “Explanation in artificial intelligence: Insights from the social sciences.” arXiv preprint arXiv:1706.07269 (2017).
  4. Kim, Been, Rajiv Khanna, and Oluwasanmi O. Koyejo. “Examples are not enough, learn to criticize! Criticism for interpretability.” Advances in Neural Information Processing Systems (2016).
  5. Explainable AI for Science & Medicine, Microsoft Research.
  6. https://towardsdatascience.com/one-feature-attribution-method-to-supposedly-rule-them-all-shapley-values-f3e04534983d
