Explainable AI (XAI)

Unboxing the “Black Box” Models

Ann Sajee
10 min read · May 30, 2020

Model Interpretability techniques to explain “Black Box” Models


Series: Interpretable Machine Learning

This article is based on the implementation of the different model-agnostic methods explained by Christoph Molnar in his book “Interpretable Machine Learning”. If you would like to study them in depth, you can find the e-book at: https://christophm.github.io/interpretable-ml-book/index.html

Note: This article is Part 2 of the series: Interpretable Machine Learning

To understand why interpretability is important and why one would use a complex model, refer to:

Part 1 — Model Complexity, Accuracy and Interpretability: https://medium.com/@sajee.a/model-complexity-accuracy-and-interpretability-59888e69ab3d

Contents

  1. Introduction
  2. Need for ML Interpretability
  3. Why Model Agnostic methods?
  4. Model Agnostic interpretable methods
  5. Conclusion

Introduction

Today, machine learning has an ever-growing impact on our day-to-day lives, so we need to know how these models work internally in order to trust their predictions. One of the biggest challenges of using machine learning is the “black box” nature of many models, which stems from their lack of explanation for their predictions.

Source: Interpretable Machine Learning

Need for ML Interpretability

It’s amazing to see how, over the past few years, machine learning and AI have come to play a major role in the decisions humans make in their lives. From medical diagnostics to legal decisions to companies making business decisions based on ML models, it is a huge risk if we have no idea of “Why?” and “How?” we get a certain prediction. When we interpret a model, we can account for fairness, accountability and transparency in the model’s predictions, which helps build trust in them.

Interpretability is the degree to which a human can understand the cause of a decision. The higher the interpretability, the better it is to comprehend the decisions or predictions the model has made.

Christoph Molnar, in his book “Interpretable Machine Learning”, describes both interpretable models and non-interpretable models. For non-interpretable models, there are different model-agnostic methods that can be used to interpret their decisions.

To interpret models we basically need to know:

  • Feature Importance
  • Effect of a feature on a particular prediction
  • Effect of each feature over a large number of predictions

We will be discussing each of these in detail below.

Why Model Agnostic Methods?

When you think of a “black box”, you do not know what’s inside it, i.e. we cannot understand its inner workings. It is easier to work with model-agnostic explanations because the same method can be used for any type of model. The alternative to model-agnostic interpretation methods is to use only interpretable models.

Advantages of Model agnostic methods:

  • Model flexibility
  • Explanation flexibility
  • Representation flexibility

Model Agnostic Interpretable Methods:

Feature Importance:

  • Permutation Feature Importance
  • Feature Interaction

Causal Interpretation:

  • Partial Dependence Plots (PDP)
  • Individual Conditional Expectation (ICE)

Surrogate Models:

  • Global Surrogate
  • Local Surrogate — LIME

Explain Predictions:

  • Scoped Rules
  • SHAP values

In this article, we will be using these model-agnostic methods to interpret the results of the Gradient Boosting Regressor model we created in the last article.

Feature Importance:

When the business asks questions like “Why did our customers churn?” or “What leads to more customer retention?”, it is important to understand the features that affect these predictions.

Permutation Feature Importance:

Permutation feature importance measures the increase in the prediction error of the model after we permuted the feature’s values, which breaks the relationship between the feature and the true outcome.

So if a feature is “important”, shuffling its values increases the model error.

Python Implementation — ELI5

Permutation Feature Importance using ELI5
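A minimal sketch of how this can be reproduced with the eli5 library, assuming the fitted GBR_model from the previous article and a held-out validation split X_val, y_val (the names are illustrative):

import eli5
from eli5.sklearn import PermutationImportance

# Shuffle each feature in turn and measure the drop in the validation score.
perm = PermutationImportance(GBR_model, random_state=42).fit(X_val, y_val)

# Show the mean importance (and spread over shuffles) for each feature.
eli5.show_weights(perm, feature_names=X_val.columns.tolist())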

Higher weight indicates higher feature importance.

Features like hour, working day, temp and humidity are important, and hence shuffling their values drastically changes the model’s predictions.

Feature Interaction:

Feature interaction measures the variance of the partial dependence function, i.e. how much the effect of one feature on the prediction depends on the value of another feature. If that variance is high, the features interact with each other; if it is zero, they do not interact.

Let’s look at the correlation matrix to find features that are highly correlated, and then look at their feature interaction.

Correlation between important features

The feature pairs humidity & windspeed and temperature & humidity have negative correlation. Let’s look at their feature interaction now.

Theory: Friedman’s H-statistic:

With Friedman’s H-statistic we deal with two cases:

  • A two-way interaction measure that tells us whether and to what extent two features interact with each other (the formula is given below)
  • A total interaction measure that tells us whether and to what extent a feature interacts with all other features in the model
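For the two-way case, the H-statistic compares the joint partial dependence of two features j and k with the sum of their individual partial dependences, using mean-centered partial dependence functions (notation as in Molnar’s book):

$$H^2_{jk} = \frac{\sum_{i=1}^{n}\left[PD_{jk}\left(x_j^{(i)}, x_k^{(i)}\right) - PD_j\left(x_j^{(i)}\right) - PD_k\left(x_k^{(i)}\right)\right]^2}{\sum_{i=1}^{n} PD_{jk}^2\left(x_j^{(i)}, x_k^{(i)}\right)}$$

An H of 0 means the two features do not interact at all; an H of 1 means all of the variance of the joint partial dependence is explained by the interaction.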

Python Implementation — H-statistics

We can compute the H-statistic of the variables represented by the elements of array_or_frame and specified by indices_or_columns. The larger H is, the stronger the evidence for an interaction among the variables. H varies from 0 to 1.
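One way to compute such pairwise values is the sklearn_gbmi package, whose h_all_pairs() returns Friedman’s H for every pair of the listed features of a fitted scikit-learn gradient boosting model (an assumption; the parameter names array_or_frame and indices_or_columns mentioned above match its signature):

from sklearn_gbmi import h_all_pairs

# Pairwise H-statistics for the weather-related features of the fitted model.
h_all_pairs(GBR_model, X_train,
            indices_or_columns=['temp', 'atemp', 'humidity', 'windspeed'])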

{('temp', 'atemp'): 0.15373552315496558,
('temp', 'humidity'): 0.09849995273815548,
('temp', 'windspeed'): 0.5574920397015759,
('atemp', 'humidity'): nan,
('atemp', 'windspeed'): nan,
('humidity', 'windspeed'): 0.4392361526105014}

The interactions between temp & windspeed and humidity & windspeed are relatively high.

Causal Interpretations:

Causal interpretation of a “black box” model shows, for the features that contribute to the model’s predictions, how changes in their input values change the model’s behavior. To learn more about research on causal interpretation of black box models, you can read more here.

How to get to causality:

  1. Check for features that have a high impact on the model
  2. Measure the importance based on the contribution to accuracy
  3. Check for causality between the features and target

Friedman’s Partial Dependence Plots (PDP):

Partial Dependence Plots are one level of drill-down from feature importance. While feature importance shows which variables most affect predictions, partial dependence plots show how a feature affects predictions. Once we know which features are important, we need to know how changing a feature’s value affects the model’s prediction, i.e. the causal relationship between the feature and the prediction.

By keeping the other features fixed, we can examine the causal relationship between a feature’s input and the model’s prediction.

Python Implementation — PDPbox
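A minimal sketch with pdpbox (0.2.x-style API), assuming the fitted GBR_model and the training frame X_train; the same call with feature='humidity' gives the second plot below:

from pdpbox import pdp
import matplotlib.pyplot as plt

# Average model prediction as 'temp' is varied over a grid of values.
pdp_temp = pdp.pdp_isolate(model=GBR_model,
                           dataset=X_train,
                           model_features=X_train.columns.tolist(),
                           feature='temp')
fig, axes = pdp.pdp_plot(pdp_temp, 'temp')
plt.show()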

Most of the bike rides happened when the temperature was warm but not too hot.

PDP plot using PDPBox — Marks on the x-axis indicate the data distribution

Hotter temperatures mean more bike rides: as the temperature rises above 20 degrees Celsius the number of bike rides increases, and then drops off once it reaches around 30 degrees Celsius.

PDP plot using PDPBox — Marks on the x-axis indicate the data distribution

Bike rides increase when humidity exceeds 60%.

The average behavior of PDPs can be misleading in the presence of strong interactions or for highly nonlinear response functions. This is where ICE plots help us get better insight into the relationship.

Individual Conditional Expectations(ICE):

ICE plots disaggregate the PDP to reveal interactions and individual differences. An ICE plot visualizes the dependence of the prediction on a feature for each instance separately, resulting in one line per instance. If a feature interacts with another feature, an ICE plot captures it better than a PDP.

Python Implementation — PyCEbox
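A minimal sketch with pycebox, assuming the fitted GBR_model and the training frame X_train:

from pycebox.ice import ice, ice_plot
import matplotlib.pyplot as plt

# One prediction curve per instance as 'temp' is varied over a grid.
ice_temp = ice(X_train, 'temp', GBR_model.predict, num_grid_points=40)

# Plot a sample of the curves plus the PDP (their average).
ice_plot(ice_temp, frac_to_plot=0.1, plot_pdp=True, c='dimgray', linewidth=0.3)
plt.xlabel('temp')
plt.ylabel('predicted count')
plt.show()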

ICE plot using PyCEbox — the black line shows the PDP and the rest show the predictions for individual instances

Most bike riders prefer to ride when the temperature is above 20 degrees Celsius.

Surrogate Models:

A surrogate model is basically a simplified model that is trained to approximate the “black box” model, under the constraint that the surrogate model should be interpretable. Surrogate models can work either at the global level, interpreting the model as a whole, or at the local level, interpreting a single prediction.

Global Surrogate:

A global surrogate model is an interpretable model that is trained to approximate the predictions of a black box model. Fitting a surrogate model requires no information about the inner workings of the black box model; only the relation between input and predicted output is used. The choice of the base black box model type and of the surrogate model type is decoupled.

How does a surrogate model work?

  1. Select a dataset X: the training dataset, a dataset with the same distribution, or a subset of the data
  2. Get the predictions of the black box model
  3. Select an interpretable model type (linear model, decision tree, …)
  4. Train the interpretable model on the dataset X and the black box model’s predictions (see the minimal sketch below)
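A minimal, library-free sketch of that recipe, using a shallow decision tree from scikit-learn as the interpretable model (the Skater implementation used in this article follows next):

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

# Step 2: predictions of the black box model on the chosen dataset X_train.
black_box_preds = GBR_model.predict(X_train)

# Steps 3-4: fit an interpretable model to mimic those predictions.
surrogate = DecisionTreeRegressor(max_depth=4, random_state=0)
surrogate.fit(X_train, black_box_preds)

# Fidelity: how closely the surrogate reproduces the black box model.
print('Fidelity R^2:', r2_score(black_box_preds, surrogate.predict(X_train)))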

Python Implementation — Tree Surrogate with Skater

Skater uses a tree surrogate to explain a model’s learned decision policies. The base estimator (“oracle”) can be any form of supervised learning predictive model, in our case the Gradient Boosting Regressor model.

# Build a surrogate model
from skater.core.explanations import Interpretation
from skater.model import InMemoryModel

interpreter = Interpretation(training_data=X_train,
                             feature_names=X_train.columns)
model = InMemoryModel(GBR_model.predict, examples=X_train)
surrogate_explainer = interpreter.tree_surrogate(oracle=model, seed=5)

# Fit the explainer against the oracle (returns the fidelity score)
mae = surrogate_explainer.fit(X_train, Y_train, use_oracle=True,
                              prune='post', scorer_type='mae')
# Mean absolute error (fidelity to the oracle): about 22.945

The output of the implementation generates a fidelity score to quantify the tree-based surrogate model’s approximation of the oracle. Given that the MAE is approximately 23, the surrogate model is a close approximation of the Gradient Boosting Regressor model.

Local Surrogate (LIME)

Local Interpretable Model-agnostic explanations:

LIME provides local model interpretability, i.e. it focuses on training local surrogate models to explain individual predictions. LIME tests what happens to the predictions when you feed variations of your data into the machine learning model.

How does LIME work?

  • Select the instance whose prediction you want to explain
  • Perturb the dataset to get new data points
  • Weight the new samples according to their proximity to the instance
  • Train a weighted, interpretable model on the dataset with the variations
  • Explain the prediction by interpreting the local model

Python Library — lime

Explaining the prediction for a single instance:
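A minimal sketch with the lime library in regression mode, assuming the fitted GBR_model, the training frame X_train and a hypothetical test row index i:

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(X_train.values,
                                 feature_names=X_train.columns.tolist(),
                                 mode='regression')

# Fit a weighted, interpretable (linear) model around the chosen instance.
exp = explainer.explain_instance(X_test.values[i],
                                 GBR_model.predict,
                                 num_features=10)
exp.show_in_notebook(show_table=True)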

Intercept 263.5156364850176
Prediction_local [-4.57966725]
Right: 2.126336579439559
LIME interpretation for an instance — showing the features that contribute and their effect on the prediction

Explain Predictions:

Getting explanations for individual predictions helps significantly in understanding how the model works.

Scoped Rules:

Anchors was published by the same group that worked on LIME, after they found a limitation of LIME: it is not clear whether a given explanation applies in the region where an unseen instance is located.

To learn about Anchors in depth, refer to the paper: https://homes.cs.washington.edu/~marcotcr/aaai18.pdf

Anchors provide high-precision explanations for individual predictions of any black box classification model, by finding a decision rule that “anchors” the prediction sufficiently. It uses a perturbation-based strategy to generate local explanations for predictions. So, instead of building a surrogate model, it uses easy-to-understand IF-THEN rules, called anchors.

How are LIME and Scoped Rules different?

Source: Difference between LIME and Anchors

LIME solely learns a linear decision boundary that best approximates the model given a perturbation space, while the Anchors approach constructs explanations whose coverage is adapted to the model’s behavior and which clearly express their boundaries. The Anchor method creates perturbations and evaluates them for every instance that is being explained.

Python Implementation — Anchor
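A sketch using the anchor-exp package from the paper’s authors. Anchors explain classifiers, so this sketch bins the regressor’s output into two classes around the median prediction; the binning, class names and instance choice are illustrative assumptions, not the article’s exact code:

import numpy as np
from anchor import anchor_tabular

# Turn the regressor into a two-class predictor: low vs. high demand.
median_pred = np.median(GBR_model.predict(X_train.values))
predict_class = lambda x: (GBR_model.predict(x) > median_pred).astype(int)

explainer = anchor_tabular.AnchorTabularExplainer(
    ['low demand', 'high demand'],   # class names
    X_train.columns.tolist(),        # feature names
    X_train.values)                  # training data

exp = explainer.explain_instance(X_test.values[0], predict_class, threshold=0.95)
print('Anchor: %s' % ' AND '.join(exp.names()))
print('Precision: %.2f' % exp.precision())
print('Coverage: %.2f' % exp.coverage())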

Anchor: year <= 0.00 AND weekday_6 <= 0.00 AND weekday_0 <= 0.00 AND 0.34 < temp <= 0.50 AND 6.00 < hr <= 12.00 AND season_Fall <= 0.00 AND month <= 4.00 AND humidity > 0.78 AND weekday_3 <= 0.00 AND weather_4 <= 0.00 AND total_count > 282.00 AND weather_2 <= 0.00 AND weekday_4 <= 0.00 AND weather_3 <= 0.00 AND 0.00 < season_Summer <= 1.00 AND season_Winter <= 0.00 AND windspeed <= 0.10 AND 0.33 < atemp <= 0.48 AND weekday_5 > 0.00 AND 0.00 < workingday <= 1.00 AND holiday <= 1.00 AND weekday_2 <= 0.00 AND weekday_1 <= 0.00 AND 0.00 < weather_1 <= 1.00 AND season_Spring <= 0.00
Precision: 0.99
Coverage: 0.28

SHAP values:

SHAP stands for SHapley Additive exPlanations. Its KernelSHAP variant is a kernel-based estimation approach for Shapley values inspired by local surrogate models, and the SHAP explanation method computes Shapley values from coalitional game theory.

SHAP can be used for both Global and Local Interpretations.

Python Library — shap

Local Interpretations:

For local interpretability, the goal of SHAP is to explain the prediction of an instance x by computing the contribution of each feature to the prediction.
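A minimal sketch with the shap library, using TreeExplainer for the fitted GBR_model and a force plot for a hypothetical test row index i:

import shap
shap.initjs()

explainer = shap.TreeExplainer(GBR_model)
shap_values = explainer.shap_values(X_test)

# Features pushing this prediction above/below the average prediction.
shap.force_plot(explainer.expected_value, shap_values[i, :], X_test.iloc[i, :])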

Actual Prediction — 149 | Average Prediction — 164.67

Local Interpretability using SHAP

Local interpretability shows the contribution each feature makes towards an individual prediction. Year and working day have the most positive effect, while hr has a negative effect.

Global Interpretations:

To get an overview of which features are most important for a model we can plot the SHAP values of every feature for every sample.
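A sketch of the corresponding summary plot, again assuming the fitted GBR_model and the test frame X_test:

import shap

explainer = shap.TreeExplainer(GBR_model)
shap_values = explainer.shap_values(X_test)

# Each point is one feature value for one row, colored by the feature value.
shap.summary_plot(shap_values, X_test)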

Global Interpretability using SHAP

Hr has the highest effect on the prediction followed by working day, temp and humidity.

Conclusion:

After walking through the different model-agnostic methods to interpret our Gradient Boosting Regressor model, we arrive at the following explanations of its predictions:

  1. Hr, temperature, humidity and working day are the most important features for our regression problem.
  2. The temperature, humidity and windspeed features have high interaction among them.
  3. Bike rides increase with increasing temperature and humidity, and decrease with windspeed.
  4. Interpreting individual predictions shows which features contributed positively or negatively to each prediction.
  5. Building a surrogate model lets us approximate and interpret the black box model as a whole.

This is a great way to interpret any model’s internal workings, which helps us build that “trust” in the model.

Explore H2O.ai MLI Capability:

For most of the model-agnostic methods discussed above, H2O Driverless AI provides robust interpretability of machine learning models, explaining modeling results through its Machine Learning Interpretability (MLI) capability.

To learn more about the H2O.ai MLI capability:

https://www.h2o.ai/products-dai-mli/

About Me

Experienced Data Analyst with strong analytical skills and deep interest in Machine Learning. You can connect with me through Medium Blogs, LinkedIn or Tableau.

References:

[1] https://christophm.github.io/interpretable-ml-book/
