BASIC XAI with DALEX — Part 3: Partial Dependence Profile

Anna Kozak
Nov 15, 2020


Introduction to model exploration with code examples for R and Python.


Welcome to the “BASIC XAI with DALEX” series.

In this post, we present the Partial Dependence Profile (PDP), a model-agnostic method, meaning that it can be used with any type of model.

The first part of this series is BASIC XAI with DALEX — Part 1: Introduction.

The second part is BASIC XAI with DALEX — Part 2: Permutation-based variable importance.

So, shall we start?

First — What does PDP deliver?

We continue to explore the model with global explanations. Earlier we discussed permutation-based variable importance; now we turn to the Partial Dependence Profile. The general idea behind PD profiles is to show how the expected model prediction behaves as a function of a selected explanatory variable. For a single model, we can construct one overall PD profile using all observations in the data set, or several profiles for subgroups of observations. Comparing subgroup-specific profiles can provide important insight into, for example, the stability of the model’s predictions.

  1. Profiles can be created for all observations in the data set, or separately for subgroups defined by another variable. For example, we can see how the effect of a specific variable differs by gender, race, or other factors.
  2. We can detect complicated relationships between a variable and the prediction. For example, when we compare PD profiles for two models, a simple model (linear regression) may show no dependence, while the profile for a black-box model (random forest) reveals one.

Second — The idea of the Partial Dependence Profile

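The value of a Partial Dependence profile for model f() and explanatory variable Xʲ at z is defined as follows:

g_{PD}^{j}(z) = E_{X^{-j}}\left[ f\left( X^{j|=z} \right) \right]

where X^{j|=z} denotes the vector X with its j-th element set to z.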

Thus, it is the expected value of the model predictions when variable Xʲ is fixed at z, taken over the (marginal) distribution of X-ʲ, i.e., over the joint distribution of all explanatory variables other than the j-th.

Generally, we do not know the true distribution of X-ʲ. We can estimate it, however, by the empirical distribution of n observations available in a training dataset.
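The empirical estimate described above replaces the expectation with an average over the n observations: for each value z, set variable j to z in every row, predict, and take the mean. Below is a minimal pure-Python sketch of that idea. The names partial_dependence, toy_model, and toy_data are made up for illustration and are not part of DALEX.

```python
def partial_dependence(predict, data, j, grid):
    """Estimate the PD profile of `predict` for the variable at index `j`.

    For each grid value z, every observation gets its j-th value replaced
    by z, and the model predictions are averaged over all observations.
    """
    profile = []
    for z in grid:
        total = 0.0
        for row in data:
            modified = list(row)   # copy so the original data is untouched
            modified[j] = z        # fix variable j at z
            total += predict(modified)
        profile.append(total / len(data))
    return profile


def toy_model(x):
    # Hypothetical model with an interaction between x[0] and x[1].
    return 2.0 * x[0] + x[0] * x[1]


toy_data = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
pd_values = partial_dependence(toy_model, toy_data, j=0, grid=[0.0, 1.0, 2.0])
print(pd_values)  # [0.0, 3.0, 6.0]
```

Because the mean of the second variable in toy_data is 1, the profile reduces to 3z, which is what the printed values show.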

Third — Let’s get a model in R and Python

Let’s write some code. We are still working with the apartments data from DALEX. To calculate and visualize PD profiles we use the model_profile() function with type = "partial". The generic plot function draws lines for numeric variables. To get profiles for categorical variables, we set variable_type = "categorical" in model_profile().

Code to create a model_profile object in Python and R
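A minimal Python sketch of this step might look as follows. It assumes the dalex and scikit-learn packages are installed, and that the Python version of the apartments data uses underscore column names (m2_price, construction_year, district) where the R version uses dots. The helper name build_pd_profile and the one-hot encoding pipeline are illustrative choices, not part of DALEX.

```python
def build_pd_profile(variables=("surface", "construction_year")):
    """Fit a random forest on the apartments data and return its PD profiles."""
    # Imports are local so that defining this helper does not require the packages.
    import dalex as dx
    from sklearn.compose import make_column_transformer
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder

    data = dx.datasets.load_apartments()
    X = data.drop(columns="m2_price")
    y = data["m2_price"]

    # Assumption: district is the only categorical column, so one-hot encode it
    # and pass the numeric columns through unchanged.
    preprocess = make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["district"]),
        remainder="passthrough",
    )
    model = make_pipeline(preprocess, RandomForestRegressor(random_state=0))
    model.fit(X, y)

    explainer = dx.Explainer(model, X, y, label="random forest")
    # type="partial" requests Partial Dependence profiles.
    return explainer.model_profile(type="partial", variables=list(variables))
```

Calling build_pd_profile().plot() should then draw the profiles. For a categorical variable such as district, one would pass variable_type="categorical" to model_profile().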

Now let’s see what the PD profiles look like for surface and construction.year. We can see that the larger the apartment, the lower the price per square meter. For the year of construction, the most expensive apartments were built before World War II or after 1990.


As we mentioned earlier, we can construct profiles against other variables (grouped partial dependence profiles).

For example, let’s see how the profiles for surface and construction.year differ by the number of rooms in the apartment. Each color corresponds to a number of rooms: ranger_1 to one room, ranger_2 to two, and so on. We can see that the more rooms an apartment has, the lower the price per square meter. The biggest difference is for apartments with one room.
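Conceptually, a grouped profile is a PD profile computed separately within each subgroup. In DALEX this is requested through the groups argument of model_profile(). The pure-Python sketch below only illustrates the idea: grouped_partial_dependence, toy_price, and the toy rows are hypothetical, and DALEX additionally samples observations before averaging.

```python
from collections import defaultdict


def grouped_partial_dependence(predict, data, j, grid, group_of):
    """Compute one PD profile per subgroup, where group_of(row) gives the group key."""
    groups = defaultdict(list)
    for row in data:
        groups[group_of(row)].append(row)

    profiles = {}
    for key, rows in groups.items():
        # Average predictions within the group, with variable j fixed at each z.
        profiles[key] = [
            sum(predict(row[:j] + [z] + row[j + 1:]) for row in rows) / len(rows)
            for z in grid
        ]
    return profiles


def toy_price(x):
    # Hypothetical price model: x[0] is the surface, x[1] the number of rooms.
    return 100.0 - x[0] - 10.0 * x[1]


toy_rows = [[20.0, 1], [30.0, 1], [50.0, 2], [70.0, 2]]
profiles = grouped_partial_dependence(
    toy_price, toy_rows, j=0, grid=[20.0, 60.0], group_of=lambda row: row[1]
)
print(profiles)  # {1: [70.0, 30.0], 2: [60.0, 20.0]}
```

In this toy model the two profiles are parallel because surface and number of rooms do not interact. Profiles that diverge between groups would point to an interaction.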


We also mentioned comparing PD profiles for different models (contrastive partial dependence profiles). Let’s build a linear regression model on the same data. The linear regression model does not capture the U-shaped relationship between construction.year and the price. On the other hand, the effect of surface on the apartment price seems to be underestimated by the random forest model. Hence, one could conclude that addressing these issues could improve either model, possibly with a gain in predictive performance.

plot(rf_mprofile, lm_mprofile)
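To make the idea of contrastive profiles concrete, here is a small pure-Python sketch that computes PD profiles for two models on the same data and compares them point by point. The functions flexible_model and flat_linear_model are hypothetical stand-ins, not the random forest and linear regression fitted in this post.

```python
def pd_profile(predict, data, j, grid):
    """Average prediction over the data with variable j fixed at each grid value."""
    return [
        sum(predict(row[:j] + [z] + row[j + 1:]) for row in data) / len(data)
        for z in grid
    ]


def flexible_model(x):
    # Stand-in for a model that captures a U-shaped effect of x[0].
    return (x[0] - 2.0) ** 2 + x[1]


def flat_linear_model(x):
    # Stand-in for a linear model that sees no effect of x[0].
    return 2.0 + x[1]


rows = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
grid = [0.0, 2.0, 4.0]

flexible_pd = pd_profile(flexible_model, rows, j=0, grid=grid)
linear_pd = pd_profile(flat_linear_model, rows, j=0, grid=grid)
gaps = [a - b for a, b in zip(flexible_pd, linear_pd)]

print(flexible_pd)  # [7.0, 3.0, 7.0]
print(linear_pd)    # [5.0, 5.0, 5.0]
print(gaps)         # [2.0, -2.0, 2.0]
```

The flat profile next to the U-shaped one mirrors the pattern described above for construction.year: where the profiles disagree, at least one of the models is likely missing part of the relationship.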

In the next part, we will talk about the Break Down method.

Many thanks to Przemyslaw Biecek and Jakub Wiśniewski for their support on this blog.

If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.
