Introduction to model exploration with code examples for R and Python.
Welcome to the “BASIC XAI with DALEX” series.
In this post, we present LIME (Local Interpretable Model-agnostic Explanations), a model-agnostic method, which means we can use it for any type of model.
Previous parts of this series are available:
- BASIC XAI with DALEX — Part 1: Introduction.
- BASIC XAI with DALEX — Part 2: Permutation-based variable importance.
- BASIC XAI with DALEX — Part 3: Partial Dependence Profile
- BASIC XAI with DALEX — Part 4: Break Down method
- BASIC XAI with DALEX — Part 5: Shapley values
So, shall we start?
First — Idea of LIME method
The LIME method was originally proposed by Marco Tulio Ribeiro and colleagues in 2016. The key idea of this method is to approximate the black-box model locally with a simpler glass-box model that is easier to interpret. The figure shows the idea behind LIME. The violet and light gray areas correspond to decision regions for a binary classification model. The big black dot corresponds to the instance of interest x₊. The other dots indicate the newly generated data. The dashed line corresponds to a simple linear model fitted to this artificial data. It approximates the black-box model around the instance of interest. The simple linear model “explains” the local behavior of the black-box model.
The objective is to find a local model g that approximates a black-box model f around the point of interest x₊. We can write this as

$$\hat{g} = \arg\min_{g \in G} L(f, g, \Pi_{x_{+}}) + \Omega(g)$$
We are looking for a local model g from the class G of interpretable models for an instance x₊. The model should be simple, so we add a penalty for model complexity, measured as Ω(g). The glass-box model g should approximate the black-box model f well locally, where Πx₊ denotes the neighborhood of x₊ and L stands for some goodness-of-fit measure. The functions f and g may work on different data: the black-box function f(x): X → R works on the original feature space X, while the glass-box function g: X’ → R usually works on an interpretable feature space X’.
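To make the difference between X and X’ more concrete, here is a minimal Python sketch of one common choice for tabular data: each continuous feature is cut into bins, and the interpretable feature is 1 when a sample falls into the same bin as the instance of interest and 0 otherwise. The bin edges, the feature names, and the helpers `bin_index` and `to_interpretable` are assumptions made up for illustration; they are not part of the lime or dalex API.

```python
from bisect import bisect_right

# Hypothetical bin edges per feature (for example, quartiles of training data).
BIN_EDGES = {
    "surface": [40.0, 70.0, 100.0],
    "no_rooms": [2.0, 3.0, 4.0],
}


def bin_index(feature, value):
    """Return the index of the bin that `value` falls into for `feature`."""
    return bisect_right(BIN_EDGES[feature], value)


def to_interpretable(instance, sample):
    """Map a sample from the original space X to the binary space X'.

    A coordinate is 1 if the sample shares the instance's bin, else 0.
    """
    return {
        name: int(bin_index(name, sample[name]) == bin_index(name, instance[name]))
        for name in BIN_EDGES
    }


instance = {"surface": 85.0, "no_rooms": 3.0}   # instance of interest
neighbour = {"surface": 95.0, "no_rooms": 1.0}  # one artificial sample
print(to_interpretable(instance, neighbour))
```

In this representation the glass-box model answers questions such as “does it matter that the surface is in this range?”, which is often easier to read than a slope on the raw value.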
The algorithm can be used to find an interpretable surrogate model that keeps only the K most important interpretable features.
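The steps described above can be sketched in a few lines of plain Python. This is a simplified illustration under several assumptions, not the implementation used by the lime packages: the black box is a toy function, the artificial data is drawn from a Gaussian around x₊, the neighborhood is a Gaussian kernel, the glass-box model is a weighted linear regression fitted directly in the original feature space, and the K features are picked by the largest absolute coefficients. All names (`black_box`, `solve`, `lime_sketch`) and default values are hypothetical.

```python
import math
import random


def black_box(x):
    """Stand-in for an opaque model f: X -> R (a toy function, assumed here)."""
    return math.sin(x[0]) + x[1] ** 2 + 0.01 * x[2]


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def lime_sketch(f, x_star, k, n_samples=1000, scale=0.1, width=0.25, seed=0):
    """Return the K largest local linear coefficients of f around x_star."""
    rng = random.Random(seed)
    p = len(x_star)
    # 1. Generate artificial data around the instance of interest.
    samples = [[xj + rng.gauss(0.0, scale) for xj in x_star]
               for _ in range(n_samples)]
    # 2. Query the black box on the artificial data.
    preds = [f(s) for s in samples]
    # 3. Weight each sample by its proximity to x_star (Gaussian kernel).
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(s, x_star)) / width ** 2)
        for s in samples
    ]
    # 4. Fit a weighted linear model: intercept plus one slope per feature,
    #    with features centred at x_star (weighted normal equations).
    design = [[1.0] + [a - b for a, b in zip(s, x_star)] for s in samples]
    xtwx = [[sum(w * row[i] * row[j] for w, row in zip(weights, design))
             for j in range(p + 1)] for i in range(p + 1)]
    xtwy = [sum(w * row[i] * y for w, row, y in zip(weights, design, preds))
            for i in range(p + 1)]
    slopes = solve(xtwx, xtwy)[1:]
    # 5. Keep the K features with the largest absolute slopes.
    top_k = sorted(range(p), key=lambda j: abs(slopes[j]), reverse=True)[:k]
    return {j: slopes[j] for j in top_k}


explanation = lime_sketch(black_box, [0.0, 1.0, 0.5], k=2)
print(explanation)
```

For this toy function, the slopes of the local model should be close to the partial derivatives of f at x₊ (roughly 1 for the first feature and 2 for the second), while the third feature is dropped because its local effect is tiny. Ranking by absolute slope is only meaningful here because all features are perturbed on the same scale; real implementations standardize or discretize the features first.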
Second — let’s get a model in R and Python
Let’s write some code. We are still working with the DALEX apartments data. To compute LIME explanations, we use the lime package in R and the predict_surrogate() method in dalex for Python (install the lime package before use). The R and Python lime packages are implemented differently, so the results may not be the same.
Let’s see the LIME explanation from the R package. As we can see in the figure below, the district_Ochota variable has the highest positive influence. The next most influential variable is no.rooms. The other variables have only a small influence on the prediction. As we mentioned in the previous post, the Ochota district is close to the city centre, so it has a large impact.
Now let’s look at the LIME explanation from the Python package. The variable with the largest impact on the prediction is district_Srodmiescie, and its attribution is negative. The surface variable also has a negative influence on the predicted value. The other variables have no significant impact.
In the next part, we will talk about the Ceteris Paribus profile.
If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.
To see more R-related content, visit https://www.r-bloggers.com