ML Model Interpretability — LIME

Mayur Sand
Published in Analytics Vidhya · May 10, 2020

Introduction

In my earlier article, I described why there is a growing need to understand machine learning models and covered some of the techniques for doing so. I also explained ELI5 in detail and showed how it can be used to get more insight into a model.

In this article, I will discuss another technique in detail: LIME.

LIME (Local Interpretable Model-Agnostic Explanations)

LIME was proposed in the paper “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier”, published in August 2016 by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Libraries are available for both Python and R.

Intuition

Local: Explains why a single data point was classified as a specific class

Model-agnostic: Treats the model as a black-box. Doesn’t need to know how it makes predictions

Deep learning models trained on big data rely on a large number of features to predict accurately. The resulting n-dimensional surface generated by such a model is very complex to understand and interpret.

LIME proposes to focus on interpreting predictions locally instead of providing a global model interpretation. We can zoom in on a single data point and find out in detail which features pushed the model towards its conclusion.

LIME is model agnostic

LIME does not care which ML model you are using (now or in the future). It treats the model as a black box and focuses on interpreting the local result.
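In practice, “model-agnostic” means LIME only needs a function that maps inputs to prediction probabilities. A minimal sketch (rf_model and lgbm_model are hypothetical stand-ins for any fitted classifiers):

def make_predict_fn(model):
    # LIME never inspects the model; it only calls this function
    return lambda X: model.predict_proba(X)

rf_predict_proba = make_predict_fn(rf_model)
lgbm_predict_proba = make_predict_fn(lgbm_model)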

How LIME Works

For a given observation, LIME proceeds as follows (a minimal sketch in code follows the list):

  1. Perturb the data: create a new dataset around the observation by sampling from the distribution learnt on the training data (in the perturbed dataset, numerical features are sampled based on their distribution and categorical values based on their frequency of occurrence)
  2. Calculate the distance between each perturbed sample and the original observation
  3. Use the black-box model to predict probabilities on the new points; these become our new y
  4. Pick the m features that best describe the complex model’s outcome on the perturbed data
  5. Fit a linear model on the perturbed data in those m dimensions, weighted by similarity to the original observation
  6. Use the weights of the linear model as the explanation of the decision
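To make the recipe concrete, here is a minimal sketch of these six steps for purely numerical tabular data. It is an illustration of the idea, not the library’s implementation: model (any fitted classifier with predict_proba), X_train (a numeric NumPy array) and x (the row to explain) are assumed, the perturbation is a simple normal fit, and steps 4 and 5 are collapsed into one weighted regression.

import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(model, X_train, x, num_samples=5000, m=5):
    # 1. Perturb: sample new points from a normal fit to the training data
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    Z = np.random.normal(mean, std, size=(num_samples, X_train.shape[1]))
    # 2. Distance: turn distance to x into a similarity weight (RBF kernel)
    d = np.linalg.norm((Z - x) / std, axis=1)
    similarity = np.exp(-(d ** 2) / 25.0)  # kernel width 25 is an arbitrary choice
    # 3. Predict: black-box probabilities on the new points become the new y
    y = model.predict_proba(Z)[:, 1]
    # 4 + 5. Fit a similarity-weighted linear model and keep the m largest weights
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=similarity)
    top_m = np.argsort(np.abs(surrogate.coef_))[-m:]
    # 6. The surrogate's weights are the explanation
    return {int(i): float(surrogate.coef_[i]) for i in top_m}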

How to use this library in Python

We will be using the Bank Marketing Data Set — LINK. All the data analysis steps are covered in the previous article. Using this dataset, we will build models for Logistic Regression, Decision Tree, Random Forest and LightGBM (refer to the code on GitHub).

After creating our models, we will start using LIME by instantiating a new explainer with LimeTabularExplainer. The arguments are:

  • training_data: your training data, with categorical values converted to LIME format
  • mode: "classification" or "regression"
  • feature_names: a list of names for your columns
  • categorical_names: a dictionary that maps categorical feature indices to their possible values
  • categorical_features: a list of the indices of the categorical features
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    convert_to_lime_format(X_train, categorical_names).values,  # helper from the GitHub code
    mode="classification",
    feature_names=X_train.columns,
    categorical_names=categorical_names,
    categorical_features=list(categorical_names.keys()),
    random_state=42)

We will pick one row and interpret the results for it:

i = 4
print(X_test.iloc[i])
Row selected for Local Interpretation

Now we will run the interpretation on this row for each of our 4 models:

# observation and lr_predict_proba are defined in the accompanying GitHub code
observation = convert_to_lime_format(X_test.iloc[[i]], categorical_names).values[0]
explanation = explainer.explain_instance(observation, lr_predict_proba, num_features=5)
explanation.show_in_notebook(show_table=True, show_all=False)
Interpretation of Factors from Logistic Regression

The central plot shows the contributions of the top features to the prediction, corresponding to the weights of the local linear model. For this customer, the predicted value of subscribed is true largely because he was contacted by phone in the month of June. The fact that he was approached more than 3 times in this campaign could have been a negative factor.
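The same (feature, weight) pairs shown in the plot can also be read programmatically; as_list() returns them for the explained class:

print(explanation.as_list())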

Under the hood, LIME is fitting a linear model on a local dataset. One can find the coefficients, the intercept and the R2 score of that linear model through the following attributes:

print(explanation.local_exp)
print(explanation.intercept)
# R2 score of the local linear fit
print(explanation.score)
Linear Model information

Notice that the weights depicted in the picture above appear as the values of local_exp. The R2 score of the local fit for the Logistic Regression model is quite bad, which means the linear approximation (and therefore the explanation) should be treated with caution.

We also tried the LightGBM model, and below are the results.

LightGBM Results

Drawbacks of LIME

  1. LIME depends on the random sampling of new points, so it can be unstable (as illustrated below)
  2. The fit of the local linear model can be inaccurate (check explanation.score before trusting an explanation)
  3. It is slow for image processing
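The first drawback is easy to see in practice: explaining the same row with two differently seeded explainers can surface different features or weights. A quick check, reusing the objects defined earlier:

for seed in (0, 42):
    e = LimeTabularExplainer(
        convert_to_lime_format(X_train, categorical_names).values,
        mode="classification",
        feature_names=X_train.columns,
        categorical_names=categorical_names,
        categorical_features=list(categorical_names.keys()),
        random_state=seed)
    print(seed, e.explain_instance(observation, lr_predict_proba, num_features=5).as_list())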

Conclusion

LIME is a good tool for understanding black-box models. It does not require a lot of effort to find out which features impacted a model’s decision, and it runs independently of the model being used today or in the future. You can find the code on my GitHub.

If you have any questions on LIME, let me know; I am happy to help. Follow me on Medium or LinkedIn if you want to receive updates on my blog posts!
