Exploring LIME (Local Interpretable Model-agnostic Explanations) — Part 1

Sze Zhong LIM · Published in Data And Beyond · 8 min read · Jan 6, 2024

Limes, in their small, vibrant form, mirror life in unexpected ways. Much like the sourness that transforms into a burst of refreshing flavor, life often presents challenges that, when overcome, reveal hidden joys and lessons.

Photo by Vino Li on Unsplash

Many who haven't heard of LIME (Local Interpretable Model-agnostic Explanations) have probably heard of SHAP (SHapley Additive exPlanations). Both LIME and SHAP are part of the Explainable AI (XAI) realm. Before we dive into LIME, let's learn more about XAI.

According to Google,

Explainable AI is a set of tools and frameworks to help you understand and interpret predictions made by your machine learning models, natively integrated with a number of Google’s products and services. With it, you can debug and improve model performance, and help others understand your models’ behavior.

According to Datacamp,

Explainable AI refers to a set of processes and methods that aim to provide a clear and human-understandable explanation for the decisions generated by AI and machine learning models.

To put it into context, XAI encompasses various methods and techniques aimed at enhancing the interpretability of AI models. LIME is one tool that aids in achieving this goal.

Why The Need For Interpretability?

To understand the significance of interpretability in AI, let's draw a comparison with traditional models like linear regression. Considered basic in statistics and machine learning, linear regression boasts simplicity and transparency. With this model, the relationships between input features and the output are apparent; the coefficients directly display the weights assigned to each feature. Consequently, the interpretability of such models is inherent: they provide clear insights into how changes in input variables influence the output.
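As a quick illustration (a minimal sketch of my own; the feature names and numbers are made up and not tied to any real credit model), the learned coefficients of a linear model can be read off directly:

```python
# Minimal sketch: interpreting a linear model from its coefficients.
# The data and feature names here are invented purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[30, 1], [45, 3], [25, 0], [50, 5]])  # [age, years_with_bank]
y = np.array([200, 450, 150, 600])                  # e.g. approved credit limit

model = LinearRegression().fit(X, y)

# Each coefficient is the change in the prediction for a one-unit change
# in that feature, holding the other features constant.
for name, coef in zip(["age", "years_with_bank"], model.coef_):
    print(f"{name}: {coef:.2f}")
```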

However, the landscape has changed drastically with more complex models such as deep neural networks, ensemble methods, and boosted trees. While these models deliver unparalleled accuracy in various domains, they come with a trade-off: a lack of interpretability. Their decision-making processes are obscured behind layers of intricate computations, rendering them inscrutable to human understanding.

Interpretability is needed mainly to foster trust and to detect errors and bias in AI. Let's take an example from the banking industry. Banks employ complex machine learning models that analyze millions, if not billions, of data points to assess the risk associated with each applicant. With the help of XAI techniques like LIME, the bank can understand how the model arrived at an individual's credit decision.

For example, suppose a loan application is rejected. LIME can highlight the key factors that influenced the model's decision. This offers additional interpretation to the bank's relationship manager, who can then use the information to foster a better relationship with the customer. It can also lead to further fine-tuning of the model if the bank realizes that the model's decisions are not aligned with expected behavior or show unnecessary bias. If the model is a 'checker' on the clients, XAI tools are the 'checker' that checks on the 'checker'.

Some materials for further reading:
1) Journal of Risk and Financial Management — Explainable AI for Credit Assessment in Banks
2) Deloitte — Explainable AI Unleashes the Power of Machine Learning in Banking
3) Fintech Time — How XAI Looks to Overcome AI’s Biggest Challenges

XAI also aids in regulatory compliance. In a 2016 story titled "Rejected for Credit? Newfangled scores may be to blame", a resident named Joseph, who had an excellent FICO credit score of 820, was reportedly still rejected for a Bank of America credit card. Almost none of the parties he contacted could explain why the model had rejected him.

You may read the article by Equifax, one of the top credit reporting agencies in the US, here.

The Fair Credit Reporting Act (FCRA) governs the collection, assembly, and use of consumer report information in the United States. Originally passed in 1970, it is enforced by the U.S. Federal Trade Commission (FTC) and the Consumer Financial Protection Bureau (CFPB). Of particular interest in the review is the assignment of "key factors." In many circumstances, a consumer reporting agency is required to provide them to the user of a provided score or to a consumer who requests a score. Under FCRA Section 609(f)(2)(B), key factors mean "all relevant elements or reasons adversely affecting the credit score for the particular individual, listed in the order of their importance based on their effect on the credit score." This information should be of value to a consumer, providing guidance on what areas he or she could work on to achieve a higher score, and with that, a better chance of approval or better credit pricing.

In Singapore, the Monetary Authority of Singapore (MAS) introduced the Veritas Initiative. You may read more about the initiative below.

It comes with an interesting toolkit.

What is LIME and How Does It Work?

LIME starts by selecting a specific instance for which you want to understand the model's prediction. It then generates a dataset of similar instances by perturbing, or tweaking, the features of the selected instance, and obtains the complex model's predictions for these perturbed samples.

This sampling is model-agnostic, which means that LIME does not require knowledge of the model's internal workings. It treats the model as a black box and, as such, will work with any model.

LIME will then fit an interpretable, simple 'surrogate' model, such as a linear model or decision tree, to the generated dataset of perturbed instances, weighting each sample by its proximity to the selected instance. The 'surrogate' model aims to approximate the behavior of the complex model around the selected instance.

The surrogate model’s coefficients or feature importances indicate the impact or contribution of each feature in influencing the prediction for that specific instance. This information helps interpret the importance and influence of different features on the model’s decision locally.
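To make the procedure concrete, below is a stripped-down conceptual sketch of the idea. It is not the lime package's actual implementation: the Gaussian perturbation, the exponential kernel, and the ridge surrogate are simplifying assumptions chosen for illustration.

```python
# Conceptual sketch of LIME's local surrogate fitting (illustration only,
# not the actual `lime` package implementation).
import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(black_box_predict, instance, num_samples=5000, kernel_width=0.75):
    """Return local linear coefficients explaining black_box_predict at `instance`."""
    instance = np.asarray(instance, dtype=float)

    # 1) Perturb the instance by adding noise around it.
    perturbed = instance + np.random.normal(0, 1, size=(num_samples, len(instance)))

    # 2) Query the black-box model for its predictions on the perturbed samples.
    targets = black_box_predict(perturbed)

    # 3) Weight each sample by its proximity to the original instance
    #    (an exponential kernel over Euclidean distance).
    distances = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # 4) Fit a simple, interpretable surrogate model on the weighted samples.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, targets, sample_weight=weights)

    # The surrogate's coefficients are the local explanation.
    return surrogate.coef_
```

In practice, the package also handles categorical features, discretization, and feature selection, which the sketch above ignores.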

LIME vs SHAP

In short, LIME is much faster than SHAP, since Shapley values take a long time to compute. You may read more about SHAP vs LIME below:

Shapley values consider all possible predictions for an instance using all possible combinations of inputs. Because of this exhaustive approach, SHAP can guarantee properties like consistency and local accuracy….

LIME is actually a subset of SHAP but lacks the same properties…

Simply put, LIME is fast, while Shapley values take a long time to compute.

You can also check out an actual comparison between SHAP and LIME here.

At first sight, SHAP seems more versatile: it offers methods for both local and global interpretation of models, and there are multiple options with regard to visualization. In contrast, LIME only offers a method for local interpretation of models, and is more restricted in terms of visualization. So why would we ever use LIME? Well, if we look at the running time for SHAP, computing the SHAP values for a subset of only 5000 samples already took 3 minutes. Furthermore, if we look at the computation time for the SHAP waterfall plot, we see that it takes a very long time for the algorithm to compute the values that indicate the contribution of each variable to the output.

There are also arguments for why LIME should not be the XAI tool of choice. In short, the article below takes issue with the kernel width and with how local the 'local' explainability should be.

I also found the article below very useful for going deeper into the differences between LIME, SHAP, and Explainable Boosting.

LIME Code

There are a few packages out there that can carry out LIME. The ones I explored are:
1) marcotcr’s LIME
2) mmlspark explainers module

Marcotcr’s LIME

The LIME package by marcotcr can be found below.

The package is on PyPI, so you can simply run pip install lime and it will work. In the repo, you will be able to see some of the examples marcotcr has given.
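As a rough usage sketch for tabular data (the classifier model, the X_train / X_test arrays, and the feature names below are placeholders of mine, not taken from the repo's examples):

```python
# Hedged sketch of using marcotcr's lime package on tabular data.
# `model` is assumed to be an already-trained scikit-learn style classifier,
# and `X_train` / `X_test` NumPy arrays with the listed feature names.
import lime
import lime.lime_tabular

explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train,
    feature_names=["age", "income", "credit_history_length"],
    class_names=["rejected", "approved"],
    mode="classification",
)

# Explain a single instance: LIME perturbs it, queries model.predict_proba
# on the perturbed samples, and fits a local surrogate.
explanation = explainer.explain_instance(
    data_row=X_test[0],
    predict_fn=model.predict_proba,
    num_features=3,
)

print(explanation.as_list())  # [(feature condition, weight), ...]
```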

You can also find his post about LIME below.

There are a few limitations to this code:
1) Linearity
2) Null Effect
3) Data Format

Linearity Issues
The local explanation model is a linear model, hence the explanation is subject to a linearity assumption. If the prediction function is heavily non-linear around the instance, the explanation model will naturally be less accurate.

Null Effect
The subsampling is done only for those features that are non-null. As such, there is a possibility that it could miss an important null effect of a feature. For example, the fact that feature X is null might itself be the most important signal; however, due to the way the sampling is done, that null effect is not captured by the explanation model.

Data Format
One limitation of this code is that it was designed to work primarily with NumPy arrays, which can limit its direct compatibility with other data formats like pyspark.sql.DataFrame. This limitation arises from the package's implementation, which expects inputs and outputs in a specific format.

In the era of big data, working with pyspark.sql.DataFrame or other non-NumPy formats might require additional pre-processing or conversion steps to make the data compatible with LIME’s input requirements.
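One possible workaround, sketched below under the assumption that you have a pyspark.sql.DataFrame named spark_df plus the explainer and model from the earlier sketch, is to collect only the rows you want to explain into NumPy first:

```python
# Sketch: bridging a pyspark.sql.DataFrame to the NumPy arrays that
# marcotcr's LIME expects. Column names are placeholders.
feature_cols = ["age", "income", "credit_history_length"]

# Collect only the rows you intend to explain; pulling a full big-data
# table to the driver would defeat the purpose of Spark.
rows_to_explain = (
    spark_df.select(*feature_cols)
            .limit(100)
            .toPandas()
            .to_numpy()
)

explanation = explainer.explain_instance(
    data_row=rows_to_explain[0],
    predict_fn=model.predict_proba,
    num_features=3,
)
```

Note that predict_fn must still be a Python function that accepts NumPy input, so a pure Spark ML model would also need to be wrapped or exported before it can be used this way.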

MmlSpark Explainer Module

mmlspark.explainers.TabularLIME is a package from Microsoft Machine Learning for Apache Spark. You may find the documentation for one of the versions here.

The GitHub repo is known as SynapseML, and you may find the code for TabularLIME below.
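Based on my reading of the SynapseML documentation, usage looks roughly like the sketch below. The module path, parameter names, and setters may differ between versions (the older mmlspark namespace in particular), so treat this as an assumption rather than a verified recipe.

```python
# Hedged sketch of SynapseML's TabularLIME; parameter names are taken from
# my reading of the docs and may not match every version exactly.
from synapse.ml.explainers import TabularLIME

lime_explainer = (
    TabularLIME()
    .setModel(fitted_spark_model)        # a fitted Spark ML model (assumed)
    .setInputCols(feature_cols)          # the raw feature columns (assumed)
    .setTargetCol("probability")         # column holding the model output
    .setTargetClasses([1])               # explain the positive class
    .setBackgroundData(background_df)    # background sample for perturbation
    .setOutputCol("lime_weights")
)

# transform() returns the input rows with an extra column of local weights.
explained_df = lime_explainer.transform(rows_to_explain_df)
explained_df.select("lime_weights").show(truncate=False)
```

The appeal here is that everything stays in Spark, so no conversion to NumPy is needed.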

In Part 2 here, I will discuss how I wrote a wrapper around marcotcr's LIME so that it can run on a different data format. You may also find other resources from my navigational index here.
