LIME for interpreting machine learning models: maths explained with code

with Python code for interpreting a baseline neural network

Mehul Gupta
Data Science in your pocket

--

Continuing my ongoing Interpretable AI blog series, next comes a very interesting model-agnostic method for interpreting ML models: LIME, or Local Interpretable Model-agnostic Explanations. We will first understand how LIME works, followed by a sample code snippet to interpret a shallow Neural Network for binary classification.

Let’s get started !!

Assume that we wish to interpret a Neural Network trained as a binary classifier on tabular training data (4 features only)

  1. First of all, generate summary stats for the training dataset’s features. The stats to compute differ by feature type. For example:

Numerical : Mean & Standard Deviation per feature

Categorical : Frequency of each category (this can be further converted to a probability per category using the formula n/N, where n is the number of samples of category ‘X’ & N is the total number of samples)

2. Using the summary stats, generate a new, artificial dataset.

How?

We can generate multiple samples by drawing random values for each feature & combining them to form an artificially generated row. Numerical feature values can be generated by assuming the features are normally distributed, hence the mean & std calculated in the prior step can be used to draw random samples, while categorical feature values can be sampled using the probability distribution we calculated above from the frequencies.

3. Pick a random sample from the original dataset. Let’s call it X.

4. Assign weights to the samples in the artificial dataset using X as a reference: the farther a sample is from X, the lower its weight. We can use any distance metric for this. The weight assignment can also be done using a kernel width, where samples falling within a particular range of X get a higher weight and the rest a lower weight.

5. Fit a white-box model (a Decision Tree or Linear Regression) over the samples from the artificial dataset, weighted by the sample weights from the previous step. Note that the predictions made by the trained model (which we wish to interpret) on this artificial dataset are treated as the Ground Truth for this white-box model.

6. Interpret this white-box model. This interpretation is taken to hold for the black-box model as well, locally around X. A minimal from-scratch sketch of these steps follows below.

And we are done !!
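To make the steps concrete, below is a minimal from-scratch sketch for purely numerical features. The names (lime_sketch, black_box_predict, kernel_width) and the exponential kernel are illustrative assumptions rather than the exact lime library implementation; step 3 (picking the reference sample x) is left to the caller.

import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(X_train, black_box_predict, x, num_samples=5000, kernel_width=0.75):
    # step 1: summary stats of the (numerical) training features
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)

    # step 2: artificial dataset, each feature drawn from N(mean, std)
    Z = np.random.normal(mean, std, size=(num_samples, X_train.shape[1]))

    # step 4: weight each artificial sample by its distance from x
    # (exponential kernel: closer samples get weights near 1)
    d = np.linalg.norm((Z - x) / std, axis=1)
    weights = np.exp(-(d ** 2) / (kernel_width ** 2))

    # step 5: fit a white-box linear model, treating the black box's
    # predictions on the artificial data as ground truth
    y = black_box_predict(Z)
    white_box = Ridge(alpha=1.0)
    white_box.fit(Z, y, sample_weight=weights)

    # step 6: the coefficients are the local explanation around x
    return white_box.coef_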

Moving onto the next part of the post,

Python Implementation

We will be, as mentioned earlier, interpreting a Neural Network trained over a tabular dataset for binary classification.

%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import lime
import lime.lime_tabular
df = pd.read_csv('abc.csv')
test = pd.read_csv('test.csv')

Let’s take a quick look at df

target = df.pop('Target')

# getting samples for each class separately from the validation set,
# dropping labels as they are not required for the validation set
test_0 = test[test['Target']==0]
test_1 = test[test['Target']==1]
_ = test_0.pop('Target')
_ = test_1.pop('Target')
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(60, input_shape=(4,), activation='relu'))
    model.add(Dense(30, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = create_baseline()
model.fit(df, target, epochs=100)

I’ll skip explaining the above code, as it just trains a basic neural network. Let’s bring LIME into the picture.

  1. Create LIME object
exp = lime.lime_tabular.LimeTabularExplainer(
    df.to_numpy(),
    feature_names=df.columns,
    mode='classification',
    training_labels=target.to_numpy(),
)

Do note that the training dataset is passed as a NumPy array in the above initialization.

2. Create a function that returns probabilities for all classes. In our case, it should return a 2D array of shape (len(data), 2). Why? len(data) represents a flexible number of samples (rows), while 2 represents two values per sample (the probabilities for classes 0 & 1).

def return_prob(data):
    # probability of class 1 from the final sigmoid unit
    p1 = model.predict(data).flatten()
    # probability of class 0 is the complement
    p0 = 1 - p1
    # shape (len(data), 2): one row per sample, one column per class
    return np.column_stack([p0, p1])
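As a quick sanity check (reusing test_0 from above), the output should have shape (n, 2) and each row should sum to roughly 1:

probs = return_prob(test_0.to_numpy()[:5])
print(probs.shape)         # (5, 2)
print(probs.sum(axis=1))   # each row sums to ~1.0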

3. Using the LIME object initiated above, call the explain_instance function to explain the model’s output on a particular sample. We will run this for one sample each from classes 0 & 1

negative_lime = exp.explain_instance(test_0.to_numpy()[0], return_prob, top_labels=2)
positive_lime = exp.explain_instance(test_1.to_numpy()[0], return_prob, top_labels=2)

As you can see

  • One sample from each class is passed & their interpretations are stored in separate objects (negative_lime & positive_lime respectively)
  • we have also passed the return_prob() function we declared above
  • top_labels is a bit tricky and requires a brief explanation

Assume we have a multi-class classification problem with 25 possible classes. We might not wish to compute an explanation for every label, but only for the top 2 or 3 labels. To avoid clutter & save computation time, top_labels can be used. I have set it to 2 because I wish to calculate the interpretation for the sample with respect to both possible classes.
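For instance, we can check which labels ended up with an explanation. This is a small optional sketch; available_labels() is a method of the lime Explanation object, and the order of the printed labels may vary.

# with top_labels=2 both class explanations were computed, so both
# 0 and 1 can be passed to as_list() / as_pyplot_figure()
print(positive_lime.available_labels())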

4. The most important segment: visualizing the interpretation

def lime_exp_as_pyplot(exp, flag):
    exp_list = exp.as_list(label=1)
    fig, ax = plt.subplots(figsize=(8,5))

    vals = [x[1] for x in exp_list]
    names = [x[0] for x in exp_list]

    vals.reverse()
    names.reverse()

    colors = ['green' if x > 0 else 'red' for x in vals]
    pos = np.arange(len(exp_list)) + .5

    ax.barh(pos, vals, align='center', color=colors)

    plt.yticks(pos, names)
    plt.title(flag)
    return fig, ax

f, ax = lime_exp_as_pyplot(negative_lime, 'negative sample')
f, ax = lime_exp_as_pyplot(positive_lime, 'positive sample')

Let’s understand the above code snippet

  • exp.as_list() helps us output the interpretation as a list of (feature, weight) pairs. We have passed it the argument label=1. Why? This outputs how sample X is associated with class 1. If we had passed 0, the output would be with respect to class 0.

Do see the plots below for a better understanding of the output.

  • If a feature is associated positively with the label mentioned (1 in our case), it is plotted in green; otherwise it is plotted in red (negative impact), i.e. the feature works against the label.
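To see the raw numbers behind the bars, the pairs can also be printed directly (a small optional sketch):

# each entry is a (feature condition, weight) tuple
for feature, weight in positive_lime.as_list(label=1):
    print(feature, round(weight, 4))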

Let’s see the output for one positive & one negative sample

For label=0

And for label=1

A few insights that can be drawn from the above plots

  • The model looks to be in fine shape: from the interpretations, we can see the negative sample is inclined more towards class 0 and the positive sample towards class 1.
  • The x-axis represents each feature's weight (as discussed while explaining LIME). It looks like D, followed by A, has the highest influence on the model
  • B has no influence whatsoever.

LIME also has extensions for interpreting image & text models. Do check out the documentation

Before we wrap up, we must know a few limitations of LIME

  • If you remember, the artificial dataset was generated using the mean & std for numerical features, and hence assumes each feature follows a Gaussian distribution, which may not always be true
  • The quality of results heavily depends on the weights assigned to the samples (by either the kernel width or the distance from the random sample picked).

With this, it's a wrap.
