LIME (Local Interpretable Model-Agnostic Explanations) for explaining machine learning models

Vivek l alex
6 min read · Apr 22, 2022

Interpreting a machine learning model with LIME.

Credits: Pixabay

Machine learning models are often black boxes, and it is hard for humans to understand the reasoning behind a black box model's prediction, since most predictions involve complex functions and patterns. In real-world scenarios we cannot make a decision based on a black box prediction alone, but we can rely on the model if it convinces us that it is predicting logically. Consider, for example, a doctor trying to decide whether to trust a machine learning model that predicts disease from symptoms. The doctor can check that the prediction is logical because the model predicts flu based on symptoms such as sneezing and headache. So explaining a prediction is a real plus when it comes to trusting a model.

Figure 1. Explaining individual predictions to a human decision-maker. Source: Marco Tulio Ribeiro. link

In “Why Should I Trust You?”: Explaining the Predictions of Any Classifier (ACM's Conference on Knowledge Discovery and Data Mining, KDD 2016), Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin proposed Local Interpretable Model-Agnostic Explanations (LIME), a technique to explain the predictions of any machine learning classifier, and evaluated its usefulness in various tasks related to trust.

Before diving into the intuition behind the algorithm, let's look at what makes a good explainer.

A good explainer needs a few basic characteristics.

1. Interpretability: A model is interpretable when humans can readily understand the reasoning behind its predictions and decisions. Providing an interpretable link between the input variables and the output prediction is what matters most for interpretability.

2. Local fidelity: The explainer should at least be faithful when explaining individual predictions, even if it cannot approximate the full model.

3. Model agnosticism: The explainer should be able to explain any model.

Local Interpretable Model-Agnostic Explanations (LIME) is an algorithm for interpreting the predictions of black box models. It interprets the individual predictions of a machine learning model.

According to the paper, the overall goal of LIME is to identify an interpretable model, over the interpretable representation, that is locally faithful to the classifier (faithful in the neighborhood of the prediction being explained rather than across the whole model).

Intuition behind LIME

Imagine that we have a black box model and need to explain its prediction for a single instance (one data point). LIME converts the instance into an interpretable form and builds a model that approximates the black box's predictions locally, i.e. around that instance rather than over the whole function. To do this, LIME creates perturbed data points from the instance (e.g., removing parts of the data, removing words, or hiding some pixels of an image) and selects or weights them by the radius of locality (the maximum distance we are willing to consider around the instance). These new data points are fed to the original black box model to obtain labels. Finally, LIME trains the explainable model on these points so that its predictions match the labels produced by the black box as closely as possible, and the explainable model is then presented visually as the explanation.
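Before going deeper, here is a minimal sketch of this end-to-end workflow using the authors' open-source lime package together with scikit-learn. The dataset, model, and parameter choices below are illustrative assumptions, not part of the original post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
X, y = data.data, data.target

# The "black box" model whose prediction we want to explain.
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# LIME uses the training data only to learn feature statistics for perturbation.
explainer = LimeTabularExplainer(
    X,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain a single instance: LIME perturbs it, queries the black box for
# predictions, and fits a weighted interpretable model around it.
explanation = explainer.explain_instance(X[10], black_box.predict_proba, num_features=5)
print(explanation.as_list())   # (feature condition, local weight) pairs
```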

Figure 2: Toy example to present intuition for LIME. The black-box model’s complex decision function f (unknown to LIME) is represented by the blue/pink background, which cannot be approximated well by a linear model. The bright bold red cross is the instance being explained. LIME samples instances, gets predictions using f, and weighs them by the proximity to the instance being explained (represented here by size). The dashed line is the learned explanation that is locally (but not globally) faithful. link

Going deeper:

Figure 3: Mathematically, local surrogate models with an interpretability constraint can be expressed as follows.
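For readers who cannot see the image, the expression shown there is the objective from the LIME paper: the explanation for an instance x is the interpretable model g that best trades off local faithfulness to f against its own complexity.

```latex
\xi(x) \;=\; \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) \;+\; \Omega(g)
```

Here f is the black box model, G is the class of interpretable models, π_x is the proximity measure that defines the locality around x, L measures how unfaithful g is to f within that locality, and Ω(g) penalizes the complexity of g; both terms are discussed below.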

LIME has two major parts: 1) generating a new dataset of perturbed samples, and 2) approximating the prediction with a surrogate model (an interpretable model that is trained to approximate the predictions of a black box model).

Part 1

As a first step, before generating perturbed samples and training on them, the instance to be explained has to be converted into a human-understandable representation. We can think of this new representation as a binary vector encoding the presence or absence of interpretable components. For text data, the interpretability vector marks the presence or absence of each word. For images, the image is first segmented into super-pixels (groups of pixels clustered by similarity), and the vector marks which super-pixels are kept. For tabular data, it marks the presence or absence of individual features.

Let x ∈ ℝ^d be the original representation of the instance being explained, and let x′ ∈ {0, 1}^{d′} denote the binary vector of its interpretable representation.
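As a toy illustration for the text case (the helper names here are made up for this post, not part of the lime API): the instance becomes a vocabulary plus an all-ones binary vector, and any binary vector over that vocabulary can be mapped back to a concrete text that the black box can score.

```python
import numpy as np

def to_interpretable(text):
    """x -> (vocabulary, x'): the interpretable representation is an
    all-ones binary vector marking every word as present."""
    words = text.split()
    return words, np.ones(len(words), dtype=int)

def to_original(words, z_prime):
    """Map a binary vector z' back to a text instance the black box can consume."""
    return " ".join(w for w, keep in zip(words, z_prime) if keep)

words, x_prime = to_interpretable("this movie was surprisingly good")
print(x_prime)                               # [1 1 1 1 1]
print(to_original(words, [1, 1, 0, 0, 1]))   # "this movie good"
```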

After representing the instance as x′, the next step is to create a perturbed dataset, which will be labeled by the black box model and used to train the surrogate model.

How the perturbed samples are generated depends on the type of data, which can be text, image, or tabular.

The dataset is created from perturbations of the original instance (e.g., removing words or hiding parts of the image). For text and images, perturbed samples are created by hiding or removing non-zero elements of x′ (turning single words or super-pixels on or off). For tabular data, LIME creates new samples by perturbing each feature individually, drawing from a normal distribution with the mean and standard deviation of that feature. Let z′ ∈ {0, 1}^{d′} be such a perturbed sample, which contains a fraction of the non-zero elements of x′. Each z′ is mapped back to a real instance z in the original feature space, and we weight (or select) the samples by their proximity to x using Πx(z) = exp(−D(x, z)² / σ²), an exponential kernel defined on some distance function D (e.g., cosine distance for text, L2 distance for images) with width σ. Increasing the kernel width σ gives weight to samples that are farther away, so the surrogate model has to represent a larger, more complex part of the black box function.
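A minimal sketch of this sampling-and-weighting step in the interpretable space. The kernel width σ is a free parameter; the 0.75·sqrt(d′) default used below is borrowed from the lime package's tabular explainer and is an assumption here, as is the use of L2 distance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Interpretable representation of the instance: d' = 5 components, all present.
x_prime = np.ones(5, dtype=int)

# Perturbed samples z': random on/off patterns over the components of x'.
Z_prime = rng.integers(0, 2, size=(1000, x_prime.size))

# Proximity weights pi_x(z) = exp(-D(x, z)^2 / sigma^2), with D the L2 distance.
sigma = 0.75 * np.sqrt(x_prime.size)
distances = np.linalg.norm(Z_prime - x_prime, axis=1)
weights = np.exp(-(distances ** 2) / sigma ** 2)

print(weights[:5])   # samples that switch off fewer components get higher weight
```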

Let f : ℝ^d → ℝ be the original black box model. The selected perturbed data points are fed into the black box model to obtain their labels f(z).

Part 2

We define an explanation as a model g ∈ G, where G is a class of potentially interpretable models, such as linear models or decision trees. Note that the domain of g is {0, 1}^{d′}, i.e. g acts over the absence/presence of the interpretable components. As noted before, not every g ∈ G is simple enough to be interpretable, so we let Ω(g) be a measure of the complexity (as opposed to interpretability) of the explanation g ∈ G. For example, for decision trees Ω(g) may be the depth of the tree, while for linear models Ω(g) may be the number of non-zero weights.
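To make Ω(g) concrete for the linear case, here is a small sketch on synthetic data: fitting a sparse linear explanation with the Lasso and counting its non-zero weights (the paper's K-LASSO procedure similarly caps the number of features used by the explanation at K). The data and penalty strength below are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic perturbed binary samples and black-box outputs (illustrative only).
Z_prime = rng.integers(0, 2, size=(200, 10)).astype(float)
true_w = np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0, 0, 0.5])
f_of_z = Z_prime @ true_w + 0.1 * rng.normal(size=200)

# A sparse linear explanation: the L1 penalty drives most weights to zero.
g = Lasso(alpha=0.05).fit(Z_prime, f_of_z)

# Omega(g) for a linear model: the number of non-zero weights it uses.
omega_g = int(np.count_nonzero(g.coef_))
print("Omega(g) =", omega_g, "-> features used:", np.flatnonzero(g.coef_))
```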

Figure 4: Explanation of the prediction of instance x = 1.6. The predictions of the black box model, which depends on a single feature, are shown as a thick line, and the distribution of the data is shown with rugs. Three local surrogate models with different kernel widths are computed. The resulting linear regression model depends on the kernel width. link

To make the surrogate model g a good approximation of the model f, we have to minimize the loss between the predictions of f and those of g. LIME is a local-fidelity algorithm: it cannot approximate the global function, i.e. the whole model f, but it can approximate a part of f around the instance. The surrogate g is trained on the perturbed data so as to reduce this loss.

Loss function of the LIME algorithm (taking the surrogate model to be a linear model).
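The expression in the figure is the locality-weighted square loss from the paper: the squared disagreement between the black box f and the surrogate g, summed over the sampled points z (in the original feature space) and their interpretable counterparts z′, weighted by the proximity kernel π_x.

```latex
\mathcal{L}(f, g, \pi_x) \;=\; \sum_{z,\, z' \in \mathcal{Z}} \pi_x(z)\, \bigl( f(z) - g(z') \bigr)^2
```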

Finally, we explain the prediction by interpreting the local model.
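Putting the two parts together, here is a from-scratch sketch of fitting the weighted linear surrogate and reading off its weights as the explanation. The toy black box, the Ridge regression surrogate, and the kernel width below are illustrative assumptions, not the exact lime implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A stand-in black box f: we only call it, we never look inside it.
def black_box(X):
    return 1.0 / (1.0 + np.exp(-(3 * X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2])))

x = np.array([1.0, 1.0, 1.0, 1.0])                        # instance to explain
Z = rng.integers(0, 2, size=(500, x.size)).astype(float)  # perturbed samples

# Proximity weights pi_x(z) from the exponential kernel.
sigma = 0.75 * np.sqrt(x.size)
weights = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2 / sigma ** 2)

# Labels come from the black box, not from any ground truth.
y = black_box(Z)

# Weighted linear surrogate g: its coefficients are the local explanation.
g = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
for name, w in zip(["feature_0", "feature_1", "feature_2", "feature_3"], g.coef_):
    print(f"{name}: {w:+.3f}")
```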

Figure 5. Explanation for a prediction in the 20 newsgroups data set. Source: Marco Tulio Ribeiro.

Links:

1. https://arxiv.org/pdf/1602.04938v1.pdf
2. https://www.oreilly.com/content/introduction-to-local-interpretable-model-agnostic-explanations-lime/
3. https://christophm.github.io/interpretable-ml-book/lime.html
4. https://github.com/marcotcr/lime
