BASIC XAI

BASIC XAI with DALEX — Part 4: Break Down method

Anna Kozak
Nov 29, 2020 · 4 min read

Introduction to model exploration with code examples for R and Python.


Welcome to the “BASIC XAI with DALEX” series.

In this post, we present the Break Down method, a model-agnostic method that can be used with any type of model.

Previous parts of this series are available:

So, shall we start?

First — What is Break Down?

The previous methods we discussed concerned the global exploration of models. We looked at the importance of variables and at Partial Dependence Profiles. Now, we move on to local explanations, i.e., those concerning a specific observation: a particular apartment, patient, or bank or telecommunications client. The Break Down method is one of the local explanations; for a selected observation, it indicates the contribution of each variable to the model's prediction.

Second — Idea of Break Down

The basic idea is to calculate the contribution of a variable to the prediction f(x) as the change in the expected model response after conditioning on that variable. We start from the mean model response and successively add variables to the conditioning. Of course, the order in which the variables are added also influences the contribution values. If the model is additive, the contributions are the same for every ordering. If the model is non-additive and has p variables, there are p! possible orderings, which makes the computation expensive.
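To see why the order matters, here is a toy sketch in Python (my own illustration, not DALEX code). For a non-additive model such as f(x1, x2) = x1 * x2, the two possible orderings split the same total difference into different per-variable contributions; the data, model, and observation below are made up for the example.

import pandas as pd

# toy non-additive model: f(x1, x2) = x1 * x2
def f(df):
    return df["x1"] * df["x2"]

data = pd.DataFrame({"x1": [0, 1, 2, 3], "x2": [0, 2, 4, 6]})
obs = {"x1": 3, "x2": 6}

def contributions(order):
    current = data.copy()
    previous = f(current).mean()           # mean model response (the baseline)
    result = {}
    for var in order:
        current[var] = obs[var]            # fix the variable at its observed value
        expected = f(current).mean()
        result[var] = expected - previous  # change in the mean prediction
        previous = expected
    return result

print(contributions(["x1", "x2"]))  # {'x1': 2.0, 'x2': 9.0}
print(contributions(["x2", "x1"]))  # {'x2': 2.0, 'x1': 9.0}

Both orderings attribute the same total change (the observation's prediction, 18, minus the baseline, 7), but they split it differently between x1 and x2.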

How are the contributions calculated in the Break Down method?

Let’s take one observation from the apartments dataset.

m2.price 5897
construction.year 1953
surface 25
floor 3
no.rooms 1

We start from the average value of the prediction, i.e. for each observation in the set we calculate the prediction and then average these values. In the next step we select all observations for which the construction.year variable takes the value 1953 and calculate the average prediction for them. The difference between this value and the average prediction for the whole dataset is the contribution of the variable construction.year. Then, from the apartments built in 1953, we select those that have a surface of 25 square meters and, similarly, calculate the average prediction for these observations. At this point we are already conditioning on two variables; for the remaining variables we proceed in the same way.
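Below is a rough Python sketch of those first two steps, following the description above (my own illustration; model and apartments_X are placeholder names for a fitted model with a predict method and the apartments features in a pandas DataFrame). In practice you do not compute this by hand; the predict_parts() function shown later does it for you.

# mean prediction over the whole dataset (the starting point)
baseline = model.predict(apartments_X).mean()

# contribution of construction.year: mean prediction over apartments built in 1953
mask_year = apartments_X["construction.year"] == 1953
mean_year = model.predict(apartments_X[mask_year]).mean()
contrib_year = mean_year - baseline

# contribution of surface: restrict further to apartments with a surface of 25 m2
mask_surface = mask_year & (apartments_X["surface"] == 25)
mean_surface = model.predict(apartments_X[mask_surface]).mean()
contrib_surface = mean_surface - mean_year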

The first row shows the distribution and the mean value (red dot) of the model's predictions for all data. The next rows show the distribution and the mean value of the predictions when fixing values of subsequent explanatory variables. The last row shows the prediction for the particular instance of interest.
Red dots indicate the mean predictions in the plot above.
The green and red bars indicate, respectively, positive and negative changes in the mean predictions (contributions attributed to explanatory variables).

A more formal introduction to the Break Down method can be found in Biecek, P. and Burzykowski, T., Explanatory Model Analysis.

Third — let’s get a model in R and Python

Let’s write some code. We are still working on the DALEX apartments data. To calculate the Break Down explanation we use the predict_parts() function with type = ‘break_down’. We need the explainer object and the observation for which we want to calculate the explanation.

Code to create a Break Down predict_parts object in Python and R
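For reference, a minimal Python version might look like this (a sketch using the dalex package; model, X, y, and new_observation are placeholder names for the fitted model, the training features, the target, and the explained observation).

import dalex as dx

# wrap the fitted model and the data in an explainer object
exp = dx.Explainer(model, X, y, label="apartments random forest")

# Break Down contributions for a single observation
bd = exp.predict_parts(new_observation, type="break_down")

bd.result   # table with the variable contributions
bd.plot()   # Break Down plot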

Let’s now look at the plot for this apartment. The biggest influence on the price of the apartment comes from the “Ochota” district, which is close to the city center. However, the price is negatively affected by the fact that the apartment is not in the “Srodmiescie” (city center) district. Moreover, the floor number of 7 and the area of 93 square meters also contribute negatively to the price.

Break Down plot for the selected apartment and the random forest model. From the top, the vertical line represents the average response of the model, and the green and red bars correspond to the contributions of the variables to the prediction; the red ones indicate negative values, i.e. they decrease the predicted value. The violet bar corresponds to the prediction value for this observation. The numerical values next to the bars quantify the impact. On the x-axis we have the model prediction value; on the y-axis, the variables and their values for the observation.

In the next part, we will talk about the Shapley values method.

Many thanks to Przemyslaw Biecek and Hubert Baniecki for their support on this blog.

If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.

In order to see more R related content visit https://www.r-bloggers.com

