Explainable Artificial Intelligence: Technical Perspective — Part 2

Sparsha Devapalli
6 min read · Aug 14, 2020


This article is part of a series. You can check out Part 1 here.

In the previous article, we explored the context of Explainable AI, its challenges, and ante-hoc techniques. In this article, we explore post-hoc techniques in more depth.

Post-hoc techniques:

With post-hoc techniques, the model is trained as usual and the explanation is gathered afterwards, at test time.

Post-hoc techniques can be organised along two dimensions:

Interpretability of Machine Learning Models

Scope: This captures whether the interpretation explains individual predictions (local) or the model's behaviour across all data points (global).

Scope of Model Interpretations

Model-specific or Model-agnostic: This captures whether the interpretation method applies to any model (model-agnostic) or is tailor-made for a class of algorithms, e.g., regression weights in a linear model. By definition, model-agnostic methods do not have access to model internals such as weights or structural information.

Intrinsic Feature Importance:

Certain models, such as linear regression, logistic regression and decision trees, are structured so that the relative importance of each feature is easy to compute, along with uncertainty estimates both at the model level and for individual predictions.

Intrinsic Feature importance using linear model coefficients

Using the coefficients of a fitted linear model, we can infer the importance of each predictor towards the target, here the 'mpg' of a car.
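
Below is a minimal sketch of this idea in scikit-learn. The tiny table and the column names (weight, horsepower, displacement) are made up for illustration and merely stand in for a real auto-mpg dataset.

```python
# Hedged sketch: feature importance from linear-model coefficients.
# The data below is fabricated purely to illustrate the mechanics.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "weight":       [3500, 2100, 2875, 4100, 1985, 3300],
    "horsepower":   [150,   70,  110,  190,   65,  130],
    "displacement": [318,   98,  225,  400,   90,  250],
    "mpg":          [15.0, 33.0, 21.0, 12.0, 36.0, 18.0],
})

features = ["weight", "horsepower", "displacement"]
X = StandardScaler().fit_transform(df[features])   # standardise so coefficients are comparable
model = LinearRegression().fit(X, df["mpg"])

# With standardised inputs, coefficient magnitude indicates relative importance
# and the sign gives the direction of the effect on mpg.
for name, coef in zip(features, model.coef_):
    print(f"{name:>12}: {coef:+.2f}")
```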

RuleFit (Tree Structures):

The RuleFit algorithm by Friedman and Popescu (2008) learns sparse linear models that include automatically detected interaction effects in the form of decision rules.

4 rules can be generated from a tree with 3 terminal nodes.

Each path through a tree can be transformed into a decision rule by combining the split decisions into a rule. The node predictions are discarded and only the splits are used in the decision rules.
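
As a hedged illustration of this rule-extraction step (only the first half of RuleFit; the full algorithm would also fit a sparse linear model on the generated rule features), the sketch below walks a fitted scikit-learn tree and turns every split path into a rule. The synthetic data is an assumption purely for illustration.

```python
# Sketch: turn the split paths of a fitted decision tree into decision rules.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
feature_names = [f"x{i}" for i in range(X.shape[1])]

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
t = tree.tree_

def extract_rules(node=0, conditions=()):
    """Every node below the root yields one rule: the conjunction of splits on its path."""
    rules = []
    if conditions:                                  # the path so far is itself a rule
        rules.append(" AND ".join(conditions))
    if t.children_left[node] != -1:                 # internal node: recurse into both branches
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        rules += extract_rules(t.children_left[node],  conditions + (f"{name} <= {thr:.2f}",))
        rules += extract_rules(t.children_right[node], conditions + (f"{name} > {thr:.2f}",))
    return rules

for rule in extract_rules():
    print(rule)
```

Applied to a tree with 3 terminal nodes, this walk produces exactly the 4 rules mentioned in the caption above.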

Global Surrogate Models:

A global surrogate model is an interpretable model trained to approximate the predictions of a black-box model. The goal is to approximate the underlying model's predictions as accurately as possible while remaining interpretable.

Global Surrogate Model

A step-by-step overview to understand the dynamics of a global surrogate model (a minimal code sketch follows the list):

  • Obtain the predictions from the black-box model.
  • Select an interpretable model (e.g., a linear model or a decision tree).
  • Train the interpretable model on the original dataset, using the black-box predictions as the target.
  • Measure how faithfully the surrogate reproduces the black-box predictions (fidelity).
  • Inspect the surrogate model to observe how the black-box model orients its decisions.
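
A minimal sketch of these steps, assuming gradient boosting as the black box and a shallow decision tree as the surrogate (both arbitrary choices for illustration):

```python
# Hedged sketch of a global surrogate on synthetic data.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=0)

# 1. Black-box model and its predictions
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)
y_bb = black_box.predict(X)

# 2-3. Interpretable surrogate trained on the same inputs,
#      with the black-box predictions (not the true labels) as the target
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_bb)

# 4. Fidelity: how well the surrogate reproduces the black box (R^2 against y_bb, not y)
print("surrogate fidelity (R^2):", round(r2_score(y_bb, surrogate.predict(X)), 3))

# 5. Inspecting the surrogate's splits then shows how the black box orients its decisions.
```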

Local Surrogate Models:

Local surrogate models are interpretable models used to describe individual predictions of black box machine learning models.

The recipe for training local surrogate models (a minimal code sketch follows below):

  • Select the instance of interest whose black-box prediction you want to explain.
  • Perturb the dataset and get the black box predictions for these new points.
  • Weight the new samples according to their proximity to the instance of interest.
  • Train a weighted, interpretable model on the dataset with the variations.
  • Explain the prediction by interpreting the local model.
LIME Algorithm
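
A hedged, from-scratch sketch of this recipe (not the LIME library itself). The Gaussian perturbation scheme, exponential kernel, kernel width and Ridge surrogate are all assumptions for illustration, and black_box_predict, x_interest and X_train are placeholders to be supplied by the caller.

```python
# Sketch of a local surrogate: perturb, weight by proximity, fit a weighted linear model.
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(black_box_predict, x_interest, X_train,
                    n_samples=5000, kernel_width=0.75):
    std = X_train.std(axis=0) + 1e-12                      # per-feature scale for perturbation

    # 1-2. Perturb around the instance of interest and query the black box
    noise = np.random.normal(size=(n_samples, X_train.shape[1]))
    X_pert = x_interest + noise * std
    y_bb = black_box_predict(X_pert)

    # 3. Proximity weights: exponential kernel on the scaled distance to x_interest
    dist = np.sqrt((((X_pert - x_interest) / std) ** 2).sum(axis=1))
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)

    # 4. Weighted interpretable (linear) model fitted to the black-box outputs
    local_model = Ridge(alpha=1.0).fit(X_pert, y_bb, sample_weight=weights)

    # 5. The local coefficients explain this single prediction
    return local_model.coef_
```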

Local surrogate models are very promising. But the method is still in the development phase and many problems need to be solved before it can be safely applied.

LIME:

Local interpretable model-agnostic explanations (LIME), a concrete implementation of local surrogate models developed by researchers at the University of Washington, aims to provide greater transparency into what is happening inside an algorithm.

Maintaining fidelity to the full model becomes harder as dimensionality increases. LIME instead solves the far more feasible problem of seeking a model that locally approximates the original one.

LIME Summary Plot

LIME incorporates interpretability both in the optimisation and in the notion of an interpretable representation, adapting to domain- and task-specific interpretability criteria.

The team also proposed SP-LIME, a method that selects representative and non-redundant predictions, offering users a global view of the model.
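
A short usage sketch with the open-source lime package (assuming pip install lime), using a random forest on the iris data purely as a stand-in black box:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
X, y = data.data, data.target

black_box = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)

# Explain one prediction: LIME fits the weighted local linear model described above
# and returns per-feature contributions for that instance.
exp = explainer.explain_instance(X[0], black_box.predict_proba, num_features=4)
print(exp.as_list())
```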

Partial Dependence Plots:

The partial dependence plot (short PDP or PD plot) displays the marginal effect of one or two features on the predicted outcome of the machine learning model.

A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic or more complex.

Here in this plot, the x-axis represents the value of features Age / Years on Hormonal contraceptives, and the y-axis represents the predicted cancer probability. The solid line in the shaded area shows how the average prediction varies as the value of Age / Years on Hormonal contraceptives varies.
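
A minimal PDP sketch with scikit-learn (assuming version 1.0 or later, which provides PartialDependenceDisplay); a synthetic regression problem stands in for the cervical-cancer data shown in the plot:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Average marginal effect of features 0 and 1 on the model's predictions
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1], kind="average")
plt.show()
```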

PDP is very intuitive and easy to implement, but because it only shows the average marginal effect, heterogeneous effects may be hidden. For example, one feature might have a positive relationship with the prediction for half of the data but a negative relationship for the other half; the PDP would then simply be a horizontal line, suggesting the feature has no effect.

To solve this problem, a new method was developed — ICE.

Individual Conditional Expectation:

Individual Conditional Expectation (ICE) is very similar to PDP, but instead of plotting an average, it displays one line per instance, showing how that instance's prediction changes as a feature value changes.

We observe how the prediction varies as the Age value of each individual instance changes. ICE plots can uncover heterogeneous relationships.
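
A short ICE sketch along the same lines (same assumptions as the PDP example above): kind="individual" draws one curve per instance, while kind="both" overlays the PDP average on top of the ICE curves.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# One ICE curve per instance for feature 0, with the PDP average overlaid
PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="both")
plt.show()
```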

Accumulated Local Effects:

Accumulated Local Effects (ALE) works in a similar way to the PDP approach but overcomes one of its shortcomings: the biased interpretation of correlated features. ALE plots are a faster and unbiased alternative to partial dependence plots (PDPs).

Since they are unbiased, they handle correlated features much better than partial dependence plots. One way to interpret ALE is to say:

“Let me show you how the model predictions change in a small “window” of the feature.”

Here’s a visual interpretation of what is going on in an ALE plot.

Calculation of ALE for one feature — x1

In the figure above, we divide x1 into windows. For all the points within each window, we compute the difference in prediction when each point's x1 value is replaced by the upper versus the lower bound of the window; these local effects are then averaged per window and accumulated across windows.
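
A rough, from-scratch sketch of this computation for a single numeric feature. The quantile-based windows and the simple centering step are simplifying assumptions, not a full ALE implementation:

```python
import numpy as np

def ale_1d(predict, X, feature_idx, n_bins=20):
    """Approximate ALE curve for one numeric feature of a fitted model."""
    x = X[:, feature_idx]
    # Window edges: quantiles, so each window holds roughly the same number of points
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    local_effects = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_window = (x >= lo) & (x <= hi)
        if not in_window.any():
            local_effects.append(0.0)
            continue
        X_lo, X_hi = X[in_window].copy(), X[in_window].copy()
        X_lo[:, feature_idx] = lo        # replace with the window's lower bound
        X_hi[:, feature_idx] = hi        # replace with the window's upper bound
        # Average prediction difference within the window = local effect
        local_effects.append(np.mean(predict(X_hi) - predict(X_lo)))
    ale = np.cumsum(local_effects)       # accumulate the local effects...
    ale -= ale.mean()                    # ...and center the curve around zero
    return edges[1:], ale
```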

Differences Between ICE, ALE and PDP:

In short: a PDP plots the average effect of a feature and can therefore hide heterogeneous effects; ICE plots one line per instance and reveals such effects; ALE accumulates prediction differences within small windows of the feature, which keeps it unbiased in the presence of correlated features and makes it faster to compute.

To learn more about tree-based methods and the techniques employed for deep learning models, kindly continue on to Part 3.

References:

Christoph Molnar, Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (book).
