Explainable AI: A Comprehensive Review of the Main Methods

Francesco Dallanoce
6 min read · Jan 4, 2022


In recent years, with the spread of AI models of various types (trees, neural networks, etc.), the role of explainable AI has become vitally important.

[Image by Author]

First of all, let’s define what explainable AI is:

Explainable artificial intelligence (XAI) is a set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms. Explainable AI is used to describe an AI model, its expected impact and potential biases. It helps characterize model accuracy, fairness, transparency and outcomes in AI-powered decision making. [IBM]

In this article, we will look together at the main methods used for explainable AI (SHAP, LIME, tree surrogates, etc.) and their characteristics. They are certainly not all the methods that exist to date, but I consider this list sufficiently exhaustive.

P.S. Articles dedicated to each of them will arrive shortly, with insights and implementations in Python, so stay up to date! 😜

Contents

Here is the list of explainability techniques that we will analyze together:

  • SHAP
  • LIME
  • Permutation Importance
  • Partial Dependence Plot
  • Morris Sensitivity Analysis
  • Accumulated Local Effects (ALE)
  • Anchors
  • Contrastive Explanation Method (CEM)
  • Counterfactual Instances
  • Integrated Gradients
  • Global Interpretation via Recursive Partitioning (GIRP)
  • Protodash
  • Scalable Bayesian Rule Lists
  • Tree Surrogates
  • Explainable Boosting Machine (EBM)

Quite a long list, right? Don’t be scared, it’ll be a walk in the park 😄

Before starting, however, we must introduce the concept of the level of explainability. Explainability techniques are mainly divided into two categories: global and local.

  • Global: they explain the model as a whole, describing its general operating rules.
  • Local: they explain, for each individual data point, how the model reasoned and which rules led to a certain output.

Explainability techniques

SHAP

SHAP (SHapley Additive exPlanations) is a framework that explains the output of any model using Shapley values, a game-theoretic approach often used for optimal credit allocation. While this can be used on any black-box model, SHAP can compute more efficiently on specific model classes (like tree ensembles).

This method belongs to the class of additive feature attribution methods. Feature attribution means that the change in the outcome being explained (e.g., a class probability in a classification problem) with respect to a baseline (e.g., the average prediction probability for that class in the training set) can be attributed, in different proportions, to the model’s input features.

SHAP can be used both globally and locally.
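
As a minimal sketch (assuming the shap package, a fitted tree-based model called model and a pandas DataFrame X, both hypothetical names; the exact shapes of the outputs depend on the model type), local and global usage look like this:

```python
import shap

# TreeExplainer exploits the tree structure for fast, exact Shapley values
explainer = shap.TreeExplainer(model)      # `model`: a fitted tree ensemble (assumed)
shap_values = explainer.shap_values(X)     # one attribution per feature, per row of X

# Local: explain a single prediction
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0])

# Global: aggregate the per-row attributions over the whole dataset
shap.summary_plot(shap_values, X)
```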

LIME

Local interpretable model-agnostic explanations (LIME) is a method that fits a surrogate glass-box model around the decision space of any black-box model’s prediction. LIME explicitly tries to model the local neighbourhood of any prediction. LIME works by perturbing any individual data point and generating synthetic data which gets evaluated by the black-box system and ultimately used as a training set for the glass-box model.

LIME has been designed to be applied locally.
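
A minimal sketch with the lime package, assuming hypothetical NumPy arrays X_train / X_test, a fitted classifier model, and lists feature_names / class_names:

```python
from lime.lime_tabular import LimeTabularExplainer

# The explainer learns feature statistics from the training data so it can perturb them
explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=class_names,
    mode="classification",
)

# Fit a local glass-box model around one prediction of the black box
exp = explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print(exp.as_list())    # weighted feature rules of the local surrogate
```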

Permutation Importance

The idea is the following: feature importance can be measured by looking at how much the score (accuracy, F1, R², or any other score we’re interested in) decreases when a feature is not available.

In principle, one could remove a feature from the dataset, re-train the estimator and check the score. Since re-training for every feature is expensive, permutation importance instead shuffles the values of a single feature (breaking its relationship with the target) and measures how much the score drops: the larger the drop, the more important the feature.

Of course, permutation importance can only be applied globally.
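
For instance, with scikit-learn’s implementation (model, X_val, y_val and feature_names are hypothetical names):

```python
from sklearn.inspection import permutation_importance

# Shuffle each feature on held-out data and measure how much the score drops
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)

for name, mean, std in zip(feature_names, result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```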

Partial Dependence Plot

The partial dependence plot (short PDP or PD plot) shows the marginal effect one or two features have on the predicted outcome of a machine learning model. A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic or more complex.

As a perturbation-based interpretability method, it is relatively quick. However, PDP assumes independence between the features and can be misleading when this assumption does not hold.

As for the previous one, it can only be applied globally.
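
A minimal sketch with recent scikit-learn versions (model and X are hypothetical names):

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# One-way PDPs for the first two features, plus their two-way interaction
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1, (0, 1)])
plt.show()
```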

Morris Sensitivity Analysis

It is a one-step-at-a-time (OAT) global sensitivity analysis where only one input has its level (discretized value) adjusted per run. Relative to other sensitivity analysis algorithms, the Morris method is fast (it requires fewer model executions) but comes at the cost of not being able to distinguish non-linearities from interactions. It is commonly used to screen which inputs are important enough for further analysis.

Again, it can only be applied globally.
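
A minimal sketch assuming the SALib package and a hypothetical black_box function with three inputs:

```python
import numpy as np
from SALib.sample.morris import sample
from SALib.analyze import morris

# Problem definition in SALib's format (names and bounds are made up)
problem = {
    "num_vars": 3,
    "names": ["x1", "x2", "x3"],
    "bounds": [[0.0, 1.0]] * 3,
}

X = sample(problem, N=100, num_levels=4)    # one-at-a-time trajectories
Y = np.array([black_box(x) for x in X])     # `black_box` is an assumed callable

Si = morris.analyze(problem, X, Y, num_levels=4)
print(Si["mu_star"])                        # mean absolute elementary effect per input
```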

Accumulated Local Effects (ALE)

Accumulated Local Effects (ALE) is a method for computing feature effects. The algorithm provides model-agnostic (black box) global explanations for classification and regression models on tabular data. ALE addresses some key shortcomings of Partial Dependence Plots (PDP).

Although counterintuitive, it can only be applied globally.
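
A minimal sketch assuming the alibi package, a fitted model and hypothetical X / feature_names:

```python
from alibi.explainers import ALE, plot_ale

# ALE only needs the prediction function and the data
ale = ALE(model.predict, feature_names=feature_names)
exp = ale.explain(X)     # accumulated local effects for every feature
plot_ale(exp)
```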

Anchors

The idea behind anchors is to explain the behaviour of complex models with high-precision rules called anchors. These anchors are locally sufficient conditions to ensure a certain prediction with a high degree of confidence.

From this, it follows that it can only be applied locally.
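
A minimal sketch with alibi’s AnchorTabular (model, X_train, X_test and feature_names are assumed names):

```python
from alibi.explainers import AnchorTabular

# The explainer needs the prediction function and the training data distribution
explainer = AnchorTabular(model.predict, feature_names=feature_names)
explainer.fit(X_train)

# Look for a rule that "anchors" this prediction with ~95% precision
explanation = explainer.explain(X_test[0], threshold=0.95)
print(explanation.anchor)      # list of feature conditions
print(explanation.precision)
```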

Contrastive Explanation Method (CEM)

CEM generates instance-based, local, black-box explanations for classification models in terms of Pertinent Positives (PP) and Pertinent Negatives (PN). It highlights not only what should be minimally and sufficiently present to justify the classification of an input example by a neural network (pertinent positives), but also what should be minimally and necessarily absent (pertinent negatives), in order to form a more complete and well-rounded explanation.

CEM is designed to be applied locally.
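
A heavily simplified sketch assuming alibi’s TensorFlow-based CEM and hypothetical model, X_train and X_test; in practice the constructor takes several more tuning parameters:

```python
from alibi.explainers import CEM

# Search for a pertinent negative: the minimal change that flips the prediction
shape = (1,) + X_train.shape[1:]    # shape of a single instance
cem = CEM(model.predict, mode="PN", shape=shape,
          feature_range=(X_train.min(axis=0), X_train.max(axis=0)))
cem.fit(X_train, no_info_type="median")    # background value for "absent" features
explanation = cem.explain(X_test[0:1])
print(explanation.PN)                      # the pertinent negative, if one was found
```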

Counterfactual Instances

Counterfactual explanations ‘interrogate’ a model to show how much individual feature values would have to change in order to flip the overall prediction. A counterfactual explanation of an outcome or a situation takes the form “If X had not occurred, Y would not have occurred”. In the context of a machine learning classifier, X would be an instance of interest and Y would be the label predicted by the model.

Counterfactual Instances is designed to be applied locally.
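
A minimal sketch assuming the DiCE library (dice_ml), a pandas training dataframe train_df with a binary “target” column, a fitted scikit-learn classifier model, and a one-row query_df; all names (including the column names) are hypothetical:

```python
import dice_ml

# Wrap the data and the model, then ask for counterfactuals
data = dice_ml.Data(dataframe=train_df,
                    continuous_features=["age", "income"],   # made-up column names
                    outcome_name="target")
wrapped = dice_ml.Model(model=model, backend="sklearn")
explainer = dice_ml.Dice(data, wrapped)

# Find 3 minimal changes to the query instance that flip the predicted class
cfs = explainer.generate_counterfactuals(query_df, total_CFs=3, desired_class="opposite")
cfs.visualize_as_dataframe(show_only_changes=True)
```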

Integrated Gradients

Integrated Gradients aims to attribute an importance value to each input feature of a machine learning model based on the gradients of the model output with respect to the input. It has many use cases including understanding feature importances, identifying data skew, and debugging model performance.

Integrated Gradients is designed to be applied locally.
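
A minimal sketch with Captum for a PyTorch model (net and X_test are hypothetical names):

```python
import torch
from captum.attr import IntegratedGradients

ig = IntegratedGradients(net)              # `net`: a trained PyTorch classifier (assumed)

inputs = torch.tensor(X_test[:1], dtype=torch.float32)
baseline = torch.zeros_like(inputs)        # all-zero baseline, a common but arbitrary choice

# Attribute the score of class 1 to each input feature by integrating gradients
# along the straight path from the baseline to the input
attributions, delta = ig.attribute(inputs, baselines=baseline,
                                   target=1, return_convergence_delta=True)
print(attributions)
```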

Global Interpretation via Recursive Partitioning (GIRP)

GIRP builds a compact binary tree that interprets ML models globally by representing the most important decision rules implicitly contained in the model, using a contribution matrix of input variables. To generate the interpretation tree, a unified process recursively partitions the input variable space by maximizing the difference in the average contribution of the split variable between the divided spaces.

As we can guess, GIRP can only be applied globally.
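
There is no widely used off-the-shelf implementation of GIRP, so here is a heavily simplified sketch of the core idea: given a feature matrix X and a contribution matrix C (one row per sample, one column per feature, e.g. SHAP values, both hypothetical inputs), recursively split the input space where the average contribution of the split feature differs most between the two sides.

```python
import numpy as np

def best_girp_split(X, C, min_samples=20):
    """Pick the (feature, threshold) whose split maximizes the difference in the
    average contribution of that feature between the two resulting regions."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.sum() < min_samples or (~left).sum() < min_samples:
                continue
            gap = abs(C[left, j].mean() - C[~left, j].mean())
            if best is None or gap > best[0]:
                best = (gap, j, t)
    return best

def girp_tree(X, C, depth=0, max_depth=2):
    """Recursively partition the input space into a small interpretation tree."""
    split = best_girp_split(X, C)
    if split is None or depth == max_depth:
        return {"n_samples": len(X), "avg_contribution": C.mean(axis=0)}
    _, j, t = split
    left = X[:, j] <= t
    return {"feature": j, "threshold": float(t),
            "left": girp_tree(X[left], C[left], depth + 1, max_depth),
            "right": girp_tree(X[~left], C[~left], depth + 1, max_depth)}
```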

Protodash

Protodash is an approach for finding “prototypes” in an existing machine learning model. A prototype can be thought of as a subset of the data that has a greater influence on the predictive power of the model. The point of a prototype is to say something like: if you removed these data points, the model wouldn’t work as well, so that one can understand what’s driving predictions.

Protodash can only be applied locally.
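
A minimal sketch assuming the AIX360 package and a 2-D NumPy feature matrix X (a hypothetical name):

```python
from aix360.algorithms.protodash import ProtodashExplainer

explainer = ProtodashExplainer()

# Select 5 prototypes from X that best summarise X itself:
# W are the prototype weights, S the row indices of the selected prototypes
(W, S, _) = explainer.explain(X, X, m=5)
print("prototype rows:", S)
print("prototype weights:", W)
```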

Scalable Bayesian Rule Lists

Scalable Bayesian Rule Lists learn from the data and create a decision rule list. The result has a logical structure that is a sequence of IF-THEN rules, identical to a decision list or one-sided decision tree.

Scalable Bayesian Rule Lists can be used both globally and locally.
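
A minimal sketch assuming the imodels package and hypothetical X_train / y_train / X_test for a binary classification problem:

```python
from imodels import BayesianRuleListClassifier

# Learn a Bayesian rule list (a sequence of IF-THEN-ELSE rules)
rule_list = BayesianRuleListClassifier()
rule_list.fit(X_train, y_train)

print(rule_list)                   # the learned rule sequence
preds = rule_list.predict(X_test)
```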

Tree Surrogates

Tree surrogates are interpretable models trained to approximate the predictions of a black-box model. We can draw conclusions about the black-box model by interpreting the surrogate model. The surrogate trees are easily human-interpretable and provide quantitative predictions of the black-box model’s behaviour.

Tree Surrogates can be used both globally and locally.
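
A minimal sketch using plain scikit-learn: train a shallow decision tree on the black-box model’s predictions (model, X_train and feature_names are hypothetical names):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# The surrogate is trained on the black box's outputs, not on the true labels
black_box_preds = model.predict(X_train)
surrogate = DecisionTreeClassifier(max_depth=3)
surrogate.fit(X_train, black_box_preds)

# Fidelity: how faithfully the surrogate reproduces the black box
print("fidelity:", surrogate.score(X_train, black_box_preds))
print(export_text(surrogate, feature_names=list(feature_names)))
```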

Explainable Boosting Machine (EBM)

EBM is an interpretable model developed at Microsoft Research. It uses modern machine learning techniques like bagging, gradient boosting, and automatic interaction detection to breathe new life into traditional GAMs (Generalized Additive Models).

Explainable Boosting Machine (EBM) is a tree-based, cyclic gradient boosting Generalized Additive Model with automatic interaction detection. EBMs are often as accurate as state-of-the-art blackbox models while remaining completely interpretable. Although EBMs are often slower to train than other modern algorithms, EBMs are extremely compact and fast at prediction time.

EBM can be used both globally and locally.
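
A minimal sketch with the interpret package (X_train, y_train, X_test and y_test are hypothetical names):

```python
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

show(ebm.explain_global())                         # per-feature shape functions and importances
show(ebm.explain_local(X_test[:5], y_test[:5]))    # explanations for individual predictions
```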

That’s it for this article. Thank you so much for making it this far.

See you soon,
Francesco
