How Is the Partial Dependence Plot Computed?

Chris Kuo / Dr. Dataman · Published in Dataman in AI · Apr 28, 2022


In linear regression, the relationship between the target and a feature can be read directly from the sign of its coefficient. How can we discover such relationships in a machine-learning model? One solution is the Partial Dependence Plot (PDP). It shows the marginal effect that one or two features have on the predicted outcome, and it reveals whether the relationship between the target and a feature is linear, monotonic, or more complex. It was introduced by J. H. Friedman and has been widely applied in machine-learning modeling.
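Before digging into the math, here is a minimal sketch of what producing a PDP looks like in practice, using scikit-learn's built-in support. The toy data and the choice of `GradientBoostingRegressor` are illustrative assumptions on my part, not from the original post:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Toy data: 3 features and 1 target (mirrors the 3-feature example below).
X, y = make_regression(n_samples=500, n_features=3, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Marginal effect of feature 0 on the prediction, plus the 2-feature
# interaction plot for features 0 and 1.
PartialDependenceDisplay.from_estimator(model, X, features=[0, (0, 1)])
plt.show()
```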

In the series of posts on the SHAP value, I've shown you how to plot the partial dependence plot. This post adds some detail on how it is calculated. I will start with the definition of the partial dependence function, then explain step by step how it is computed.

The definition of the partial dependence function is:

$$\hat{f}_S(x_S) = \frac{1}{n}\sum_{i=1}^{n} \hat{f}\left(x_S, x_C^{(i)}\right)$$

The machine learning model is f(Xs, Xc). Xs is the set of features to be plotted against the target to show the relationships, Xc is the set of the remaining features, and n is the number of samples in the dataset. In words: hold Xs fixed, let Xc take its observed values across all n samples, and average the model's predictions.
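The formula translates almost line for line into code. Below is a minimal sketch of the computation; the function name and arguments are my own for illustration:

```python
import numpy as np

def partial_dependence_manual(model, X, feature_idx, grid):
    """Average the model's predictions over the dataset while the
    feature of interest (Xs) is held fixed at each grid value."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()               # Xc columns keep their observed values
        X_mod[:, feature_idx] = value  # fix Xs at this grid value
        # (1/n) * sum over i of f(x_S, x_C_i)
        pd_values.append(model.predict(X_mod).mean())
    return np.array(pd_values)
```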

Let me explain the PDP step by step and illustrate it in Figure (1). Assume a machine-learning training dataset has three features (X1, X2, X3) and a target variable Y. See Step 0 in Figure (1). We are interested in the PDP to show…
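To make the steps concrete, the sketch below applies the manual function above to the toy three-feature model fitted earlier (X1 is column 0). The grid of 20 evenly spaced points is my own assumption; any reasonable grid over the observed range of X1 works:

```python
import matplotlib.pyplot as plt
import numpy as np

# Evaluate the PDP for X1 over its observed range.
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=20)
pdp_x1 = partial_dependence_manual(model, X, feature_idx=0, grid=grid)

plt.plot(grid, pdp_x1)
plt.xlabel("X1")
plt.ylabel("Average predicted Y")
plt.show()
```

Each plotted point is one pass of the replace-predict-average procedure: set X1 to the grid value in every row, predict, and average the n predictions.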
