Demystifying AI: Unraveling the mysteries of black box models — Part 1

Josephlyr · Published in d*classified · Jul 17, 2023

Joseph Low, Data Scientist, Enterprise Digital Services (EDS) Programme Centre, discusses the importance of Explainable AI (XAI) and various interpretability methods for explaining a model’s predictions. This article is the first in a series on XAI.

This was developed as part of Tool-Ally (a Data Analytics Toolkit). Tool-Ally offers a set of customizable and reusable components that streamline and automate various aspects of the data analytics lifecycle, helping data scientists perform common data science tasks more efficiently and overcome challenges across the lifecycle. More details about Tool-Ally can be found here.

Source: Big Data for Banking

Introduction

Recent advancements in machine learning have led to the development of complex models with strong predictive performance. However, as these models become more complex, they also become less interpretable, as their inner workings become increasingly difficult to comprehend.

While complex models exhibit high levels of performance, the lack of transparency behind their predictions leaves stakeholders with a limited understanding of how specific decisions are made. Without a clear understanding of how the model makes decisions, stakeholders may lack the confidence to trust the model’s predictions. Such concerns about the black box nature of complex models pose a challenge to the adoption of machine learning models in critical decision-making domains.

Explainable AI (XAI) has thus emerged as a field of research dedicated to addressing this challenge. The key motivation behind XAI is to enhance the interpretability of black-box models without compromising their high predictive performance. By providing transparent and intuitive explanations of how a model arrives at its decisions, XAI builds trust and confidence in the model’s predictions, thereby encouraging the adoption of machine learning systems.

Taxonomy of Interpretability Methods

Before diving into the various XAI techniques, let us first understand the approaches to interpret machine learning models.

Interpretability in machine learning models can be achieved in different ways, and two broad categories are intrinsic and post-hoc interpretability.

  1. Intrinsic interpretability is achieved by constructing self-explanatory models that incorporate interpretability directly into their structures. Examples of intrinsically interpretable models are linear models and decision trees.
  2. Post-hoc interpretability refers to the application of interpretation methods after model training; it can be further categorized into model-specific or model-agnostic methods.
    - Model-specific methods derive explanations by inspecting the internal model structures and parameters. Examples of these include methods that examine the activation of each neuron in a neural network.
    - Model-agnostic methods derive explanations by examining the relationship between input-output pairs of the trained model. These methods do not require access to the internal structure of the model and are hence applicable to any machine learning model.

The interpretability techniques generate explanations that can be categorized as global or local.

  1. Global explanation illuminates the inner workings of the machine learning model by explaining the model’s overall behavior. It answers questions such as “what are the important features for the model’s predictions across a set of data points?”
  2. Local explanation details how a machine learning model arrived at a specific prediction for an instance. It uncovers the causal relations between a specific input and its corresponding model prediction. It answers questions such as “why did the model predict that the employee would attrite with a probability of 0.42?”

Collectively, global and local explanations help stakeholders build trust in the model and its predictions respectively.

Model-Agnostic Interpretability Methods

In this section, we will introduce the concepts of various XAI techniques, with a specific focus on model-agnostic interpretability methods, as such methods offer great flexibility and can be used to explain any type of machine learning model.

Global Methods

Let’s begin by examining the model-agnostic interpretability methods that provide global explanation. Global interpretation methods are useful when the objective is to understand the overall workings and mechanism of the model. These methods provide insights into the important features that contribute to the model’s predictions across a wide range of data points.

1. Partial Dependence Plot

The Partial Dependence Plot (PDP) provides a graphical representation of how the target variable’s prediction changes as the chosen feature(s) varies, while keeping all other features constant.

PDPs help to identify whether the relationship between a feature and the target variable is linear or nonlinear, and whether the feature interacts with another feature.

Partial Dependence Plot: univariate (left and center), bivariate (right)

To create a PDP, the selected feature is systematically varied over its range, and the model’s predictions are observed and averaged across all other features. The above plot shows the average effect of the chosen feature on the model’s predictions. A step-by-step explanation of how a PDP is constructed can be found in this excellent blog post.
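
As a concrete illustration, here is a minimal sketch of generating PDPs with scikit-learn’s PartialDependenceDisplay; the California housing dataset, the gradient boosting model, and the chosen features are purely illustrative placeholders.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Illustrative data and model; any fitted scikit-learn estimator works here
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Univariate PDPs for two features, plus a bivariate PDP for their interaction
PartialDependenceDisplay.from_estimator(
    model, X, features=["MedInc", "AveOccup", ("MedInc", "AveOccup")]
)
```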

While PDPs offer a visual and intuitive representation of how a chosen feature affects the model’s predictions, one must be aware of the drawbacks associated with the method:

  1. Variable Independence Assumption: when varying the values of the selected feature(s), PDPs assume that the feature(s) is independent of all other features. Such an assumption usually does not hold as features in a dataset are often correlated.
  2. High-Dimensional Data: PDPs become less effective in high-dimensional datasets where interpreting the plot for each feature becomes challenging due to visual clutter and the combinatorial explosion of potential interactions.
  3. Potential Over-Smoothing: PDPs rely on averaging predictions across different feature combinations. Such averaging can lead to over-smoothing, where important local relationships or variations in predictions are obscured.

2. Permutation Feature Importance

The Permutation Feature Importance (PFI) measures the decrease in model score when the values of a feature are randomly permuted. The motivation for randomly permuting the feature’s values is to break the relationship between the feature and the target, so that the resulting decrease in model score indicates how much the model depends on the feature.

A large decrease in model score indicates that the feature is important, as the model relied on the feature for the prediction. Conversely, a slight decrease or negligible change in model score indicates the feature is unimportant, as the model ignored the feature for the prediction.

Permutation Feature Importance algorithm
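
The sketch below illustrates this procedure with scikit-learn’s permutation_importance; the breast cancer dataset, the random forest model, and the accuracy scoring are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative data and model
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permute each feature 10 times on held-out data and measure the drop in accuracy
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0, scoring="accuracy"
)

# Print the five most important features with their mean score decrease
ranked = sorted(
    zip(X.columns, result.importances_mean, result.importances_std),
    key=lambda t: t[1],
    reverse=True,
)
for name, mean, std in ranked[:5]:
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```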

PFI possesses several benefits:

  1. Intuitive Interpretation: the concept of permuting feature values and measuring the impact on model score is easy to understand and provides a clear and intuitive measure of feature importance that can aid in model explanation.
  2. Accounts for Feature Interactions: by permuting the feature, the interaction effects with other features are also destroyed. PFI thus takes into account both the main feature effect and the interaction effects on model performance.
  3. Feature Importance Ranking: PFI can identify the most important features based on the feature’s importance score.

As with most techniques, PFI comes with its own drawbacks:

  1. Variable Independence Assumption: like PDP, PFI works best when features are uncorrelated. In the case of correlated features, PFI’s permutation can result in unrealistic data instances.
  2. Requires Labels: computing the reference score and the score on corrupted data requires access to true labels. Consequently, PFI is not applicable when the true labels are unavailable.

3. Global Surrogate

The Global Surrogate method constructs a surrogate model that is interpretable (e.g. linear model, decision trees) to approximate the predictions of the complex black box model.

Global Surrogate algorithm
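
A minimal sketch of the procedure is shown below, fitting a shallow decision tree to a random forest’s predictions; the dataset, the two models, and the tree depth are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative black-box model
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Train the surrogate on the black box's predictions, not the true labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how closely the surrogate mimics the black box
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"Surrogate fidelity: {fidelity:.3f}")

# The surrogate tree's rules serve as the global explanation
print(export_text(surrogate, feature_names=list(X.columns)))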

While the Global Surrogate method provides an understandable representation of how input features contribute to the model’s overall predictions, its main limitation lies in the fact that surrogate models are simplified representations of the black box models and may not capture the intricacies of the original model’s behavior.

Local Methods

Let’s now shift our focus to model-agnostic interpretability methods that provide local explanation. Local interpretation methods are useful when the objective is to understand how specific input features influence the model’s prediction. These methods provide a detailed understanding of the model’s decision-making process at an individual level.

1. Individual Conditional Expectation

Individual Conditional Expectation (ICE) provides a visual representation of how changes in the feature’s value (while keeping all other features constant) affect the model’s output for each instance in the dataset.

While both ICE and PDP describe visually the relationship between specific input feature(s) and the predictions of a machine learning model, they differ in the insights they provide:

- ICE focuses on the individual-level interpretation of feature effects. It generates separate curves for each instance, resulting in one curve per instance. ICE shows how changes in a specific feature affect the prediction for that instance.

- PDP focuses on the global interpretation of feature effects. It is the average of the curves of an ICE plot, so it comprises only a single overall curve. PDP shows the average effect of varying a specific feature across the entire dataset while holding other features constant.

Individual Conditional Expectation plot with Partial Dependence curve (orange) overlaid

A step-by-step explanation of how ICE is computed can be found in this excellent blog post.
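
A minimal sketch follows, reusing scikit-learn’s PartialDependenceDisplay with kind="both" to draw the ICE curves together with their PDP average; the dataset, model, feature, and sample size are illustrative.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Illustrative data and model
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# One curve per instance (ICE) plus their average (PDP), on a sample of 200 rows
PartialDependenceDisplay.from_estimator(
    model, X.sample(200, random_state=0), features=["MedInc"], kind="both"
)
```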

While ICE provides a detailed understanding of feature effects at the individual instance level, it suffers from the same drawbacks as PDP: the variable independence assumption and poor scalability to high-dimensional data.

2. LIME (Local Interpretable Model-Agnostic Explanations)

Intuition for LIME: The black-box model’s complex decision function f is represented by the blue/pink background, which cannot be approximated well by a linear model. The bold red cross is the instance being explained. LIME samples instances, gets predictions using f, and weighs them by the proximity to the instance being explained (represented here by size). The dashed line is the learned explanation that is locally (but not globally) faithful.

LIME explains the prediction of a black box model on an instance by approximating its behavior locally. Here, we provide a brief overview of how LIME works:

  1. Selecting an instance to explain: LIME takes as input an instance for which the explanation of its black box prediction is desired.
  2. Generating perturbed data: LIME creates a perturbed dataset by sampling numerical features from a normal distribution and categorical features from their training distribution. The perturbed dataset is then passed into the black-box model, and the corresponding predictions are used as labels.
  3. Weighting perturbed data: The perturbed samples are weighted based on their proximity to the instance of interest. Samples nearer to the instance of interest are assigned higher weights.
  4. Fitting an interpretable model: Using the perturbed dataset and their corresponding labels, LIME trains a weighted, interpretable model to approximate the black-box model locally around the instance of interest.
  5. Generating local explanation: The weights of the local interpretable model are used to explain the black box model’s prediction on the instance.

LIME’s explanation for a binary classification task (tabular data)
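
Putting the steps above together, here is a minimal sketch using the lime package’s LimeTabularExplainer; the dataset, classifier, and number of explained features are illustrative choices.

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data and black-box model
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# The explainer learns the training distribution used to perturb instances
explainer = LimeTabularExplainer(
    X_train,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain a single instance with the top 5 features of the local linear model
explanation = explainer.explain_instance(
    X_test[0], model.predict_proba, num_features=5
)
print(explanation.as_list())
```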

LIME is a highly popular technique for local explanation due to its following advantages:

  1. Versatility across Data Types: LIME is versatile and can be applied to a wide variety of data types, including tabular, text and images.
  2. Feature Importance Ranking: LIME’s explanation incorporates feature importance weights that indicate the relative contribution of each feature to the local prediction. These weights enable the identification of the most important features that influenced the prediction for the specific instance.

Despite its popularity, it is important to be aware of the limitations of LIME:

  1. Size of Neighborhood: The selection of the neighborhood size in LIME is a significant challenge. Setting too large a neighborhood results in the inclusion of irrelevant instances, whereas setting too small a neighborhood results in a lack of diversity, both of which can affect the quality of the explanation.

3. SHAP (SHapley Additive exPlanations)

Intuition for SHAP: SHAP assigns each feature an importance value for a particular prediction

At its core, SHAP is based on the concept of Shapley values from cooperative game theory. Shapley values quantify the contribution of each player in a cooperative game. In the context of machine learning, each feature of the model plays the role of a player, and the prediction represents the game’s payoff. By assigning a value to each feature, Shapley values allow us to understand the contributions of each feature towards the prediction.

The idea behind Shapley values is to consider all possible coalitions of features and measure their marginal contributions to the prediction, defined as the difference in prediction when the feature is included in a coalition compared to when it is excluded. Averaging the marginal contributions across all possible coalitions yields the Shapley value for each feature.

Formally, given an output function v, a set of features F, and some subset of features S, the Shapley value of feature j ∈ F is:
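
Written out with these definitions, this is the standard Shapley value formula:

\phi_j(v) = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left( v(S \cup \{j\}) - v(S) \right)

The combinatorial weight averages the marginal contribution v(S ∪ {j}) − v(S) over all orderings in which feature j could join the coalition.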

For a detailed explanation of Shapley values, check out this blog post.

So far, we have described how Shapley values aid in understanding the predictions of machine learning models. How, then, does SHAP come into the picture?

It turns out that while Shapley values offer a promising approach to model explainability, they come with a major drawback: an exact computation of the Shapley value, which requires searching through all 2^|F| possible coalitions of the features while retraining models, is computationally expensive.

SHAP overcomes the computational challenge associated with Shapley values by providing computationally efficient, theoretically robust approaches to calculating them. For a deep dive into SHAP’s various estimation approaches for Shapley values, head over to these blog posts on KernelSHAP, TreeSHAP, and PartitionSHAP.

SHAP’s explanation for a regression task (tabular data)
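
Here is a minimal sketch using the shap package; the dataset and model are illustrative, and shap.Explainer automatically selects an estimation approach (typically TreeSHAP for tree-based models).

```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative data and model
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Compute SHAP values for a sample of instances
explainer = shap.Explainer(model, X)
shap_values = explainer(X.iloc[:200])

shap.plots.waterfall(shap_values[0])  # local explanation of one prediction
shap.plots.beeswarm(shap_values)      # global view across the sample
```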

SHAP has gained recognition and widespread adoption as an explainability technique due to its notable advantages:

  1. Theoretical Foundation in Game Theory: Because SHAP computes Shapley values, it has a solid theoretical foundation in game theory. The theoretical basis of SHAP in game theory, along with the desirable properties of Shapley values (efficiency, symmetry, dummy, and linearity) strengthens its credibility as an explainability technique.
  2. Versatility and Granularity of Explanation: Like LIME, SHAP is applicable to tabular, text, and image data. Additionally, SHAP offers the ability to generate explanations at both global and local levels.
  3. Feature Importance Ranking: Like LIME, SHAP’s explanations offer feature importance rankings that quantify the relative contribution of each feature to the model’s predictions at both global and local levels.

The main disadvantage of SHAP is that while it improves upon the computational complexity of exact Shapley value computations, computing exact SHAP values remains computationally expensive for models with a large number of features.

Summary

Here’s a summary of the various model-agnostic interpretability techniques introduced, alongside their advantages and disadvantages.

PDP: visualizes how changes in feature values affect the model’s predictions

(+) Visual and intuitive representation of feature-prediction relationship

(-) Assumes features are independent

(-) Unable to scale to high-dimensional settings

(-) Averaging may lead to over-smoothing

Permutation Feature Importance: measures feature importance by shuffling feature values

(+) Accounts for feature interaction effects

(+) Provides feature importance ranking

(-) Assumes features are independent

(-) Not applicable when labels are unavailable

Global Surrogate: approximates black box model’s decision boundary using an interpretable model

(+) Understandable representation of how input features contribute to the model’s overall predictions

(-) Surrogate model may not capture the intricacies of the black box model’s behavior

Individual Conditional Expectation: visualizes how changes in feature values affect the model’s prediction for individual instances

(+) Visual and intuitive representation of feature-prediction relationship

(-) Assumes features are independent

(-) Unable to scale to high-dimensional settings

LIME (Local Interpretable Model-Agnostic Explanations): approximates black box model’s decision boundary locally

(+) Understandable representation of how input features contribute to the model’s prediction for the instance

(+) Versatile across variety of data types (tabular, text, images)

(+) Provides feature importance ranking

(-) Difficulty in selecting the size of neighborhood

SHAP (SHapley Additive exPlanations): measures feature contributions using Shapley values from game theory

(+) Understandable representation of how input features contribute to the model’s overall (and instances’) predictions

(+) Theoretical foundation in game theory

(+) Versatile across variety of data types (tabular, text, images) and granularity of explanation (global, local)

(+) Provides feature importance ranking

(-) Computationally expensive

Conclusion

In this article, we introduced the motivation behind XAI, described the taxonomy of interpretability methods, and provided a brief overview of the various model-agnostic interpretability techniques.

In the next article, we will provide a walk-through of how we developed Tool-Ally to improve the interpretability of existing XAI methods, specifically LIME and SHAP. Both methods offer the benefits of the other interpretability methods while being applicable across various data types, including tabular, image, and text data, and are thus widely used for explaining any type of black box model.

Stay tuned!
