BASIC XAI
BASIC XAI with DALEX — Part 2: Permutation-based variable importance
Introduction to model exploration with code examples for R and Python.
Welcome to the “BASIC XAI with DALEX” series.
In this post, we present permutation-based variable importance, a model-agnostic method that can be used with any type of model.
You can find the first part of this series here: BASIC XAI with DALEX — Part 1: Introduction.
So, shall we start?
First — Why do we need variable importance in a model?
When building a model, we often ask ourselves: which variables are the most important? What should we pay attention to? When modeling the price of a property, we would like to know what has the greatest impact on the price: is it the area, or maybe the year of construction? When modeling credit risk, we consider what influenced the fact that customers do not get a loan.
Among the methods for evaluating variable importance, we can distinguish several methods designed for specific groups of models (a short R sketch follows this list). For example:
- linear models
We can easily identify which attributes are important in additive models such as linear and logistic regression: they are simply the model coefficients.
- tree-based models
We can use a method based on calculating the Gini impurity decrease for each tree and then averaging it over all trees. We can then compare the relative importance of the averaged Gini impurity scores.
- random forest
We can use a method based on out-of-bag data: the prediction error on the out-of-bag samples is measured before and after permuting each variable.
- other models
Even though some models offer model-specific measures of variable importance, we cannot compare importances across different model structures. To overcome this problem we can use a model-agnostic method, that is, one that works independently of the structure of a model. An example of such a measure is permutation-based variable importance.
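Here is a minimal R sketch of the model-specific measures listed above, using the apartments data from DALEX and the ranger package (the choice of packages is ours; any linear-model or random-forest implementation would do):

```r
library("DALEX")   # provides the apartments data
library("ranger")

# linear model: the coefficients themselves measure variable importance
lm_model <- lm(m2.price ~ ., data = apartments)
coef(lm_model)

# tree ensembles: averaged impurity (Gini/variance) decrease per variable
rf_impurity <- ranger(m2.price ~ ., data = apartments,
                      importance = "impurity")
importance(rf_impurity)

# random forest: permutation importance computed on out-of-bag data
rf_oob <- ranger(m2.price ~ ., data = apartments,
                 importance = "permutation")
importance(rf_oob)
```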
Second — The idea of permutation-based variable importance
The idea is very simple: to assess how important variable V is, we compare the initial model with a model in which the effect of variable V has been removed. How do we remove the effect of variable V? In the LOCO (leave-one-covariate-out) approach we retrain the model without variable V. Permutation-based variable importance takes a different approach: the effect of a variable is removed through a random reshuffling of its values. Following the picture below, we take the original data (left part of the picture), then we permute it (a mixer, which mixes the values), and we get “new” data, on which we calculate the prediction.
If a variable is important in a model, then the model predictions should be less precise after its permutation. The permutation importance of variable i is the difference between the value of the loss function for the data with variable i permuted and its value for the original data:

VI_i = L_{perm(i)} - L_{org},

where L_{org} is the value of the loss function for the original data, while L_{perm(i)} is the value of the loss function after permutation of the i-th variable. Note that instead of a loss function we can use any function used for assessing performance, like AUC (which is not a loss function, but is a popular performance measure; in that case 1-AUC can serve as the loss).
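To make the procedure concrete, here is a from-scratch sketch in R; permutation_importance and rmse are our own hypothetical helpers, not part of DALEX:

```r
# a hypothetical helper implementing the idea above:
# average increase in loss after B random reshufflings of each variable
permutation_importance <- function(model, data, y, loss, B = 10) {
  loss_org <- loss(y, predict(model, data))
  sapply(names(data), function(v) {
    losses_perm <- replicate(B, {
      data_perm <- data
      data_perm[[v]] <- sample(data_perm[[v]])  # remove the effect of v
      loss(y, predict(model, data_perm))
    })
    mean(losses_perm) - loss_org  # VI_v = L_perm(v) - L_org
  })
}

rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# example: importance of the apartments predictors in a linear model
lm_model <- lm(m2.price ~ ., data = apartments)
permutation_importance(lm_model,
                       data = apartments[, -1],   # predictors only
                       y = apartments$m2.price,
                       loss = rmse)
```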
Third — let’s get a model in R and Python
Let’s write some code. We are still working on the apartments data from DALEX. To calculate permutation-based variable importance we use the model_parts() function. We could assess the importance of the variables using only one permutation, but it is recommended to repeat the permutation several times and average the results; by default, the model_parts() function uses B = 10 permutation rounds. Additionally, we should take the loss function into consideration: the default is 1-AUC for classification and RMSE for regression.
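A minimal sketch in R, assuming a ranger random forest wrapped in a DALEX explainer (following the conventions from Part 1 of this series):

```r
library("DALEX")
library("ranger")

# fit a random forest for the price per square meter
apartments_rf <- ranger(m2.price ~ ., data = apartments)

# wrap the model in an explainer with test data
explainer_rf <- explain(apartments_rf,
                        data = apartments_test[, -1],
                        y = apartments_test$m2.price,
                        label = "ranger")

# permutation-based variable importance:
# B = 10 permutation rounds and RMSE loss are the defaults for regression
vip_rf <- model_parts(explainer_rf,
                      loss_function = loss_root_mean_square,
                      B = 10)
```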
In the data frame of the model_parts object, we have the variables and the average change in loss after the permutations.
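Continuing the sketch above, the result can be inspected as an ordinary data frame (the column names follow the current DALEX convention, to the best of our knowledge):

```r
# each row is one permutation round for one variable;
# dropout_loss holds the loss after that permutation
head(as.data.frame(vip_rf))
```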
Below is a plot that summarizes permutation-based variable importance. It is worth noticing that the bars start at the RMSE value for the model on the original data (x-axis). The length of each bar corresponds to the RMSE loss after permutations. The boxplots show how the results for individual random permutations differ.
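The plot itself can be reproduced from the sketch above with the generic plot() function:

```r
# bars start at the original-data RMSE; boxplots summarize
# the spread across the B permutation rounds
plot(vip_rf)
```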
As we can see, the most important variable is district_Srodmiescie: the price of a property depends strongly on the city district. This should not come as a surprise, because Srodmiescie is a district in the center of Warsaw.
In the next part, we will talk about the Partial Dependence Profiles (PDP) method.
Many thanks to Przemyslaw Biecek and Hubert Baniecki for their support on this blog.
If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.
To see more R-related content, visit https://www.r-bloggers.com