GPU-Accelerated SHAP values with XGBoost 1.3 and RAPIDS

Rory Mitchell
RAPIDS AI
Jan 12, 2021

TL;DR: With the release of XGBoost 1.3 comes an exciting new feature for model interpretability: GPU-accelerated SHAP values. SHAP values are a technique for local explainability of model predictions. That is, they give you the ability to examine the impact of various features on model output in a principled way.

SHAP and GPUTreeShap Background

SHAP at its core describes the average impact of adding a feature to a model, but does so in a way that attempts to account for all possible subsets of the other features as well. While there are many possible methods of describing feature relevance, SHAP leverages ideas from game theory to guarantee useful mathematical properties such as efficiency and monotonicity. Efficiency states that the total attributions over all features add up to the model prediction; this gives SHAP values a natural interpretation, as they effectively break down a prediction into its component parts. Monotonicity states that if the model changes such that a feature increases in relevance, its attribution must not decrease. See [1] for in-depth reading on SHAP values as applied to decision tree ensembles.
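For reference, the classical Shapley value that these properties come from can be written as follows, where N is the set of features, S ranges over the subsets not containing feature i, and f_S(x) denotes the model's expected output when only the features in S are known (notation assumed here for illustration):

\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[ f_{S \cup \{i\}}(x) - f_S(x) \right]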

SHAP values have been available in XGBoost for several versions already, but 1.3 brings GPU acceleration, reducing computation time by up to 20x for SHAP values and 340x for SHAP interaction values. This is powered under the hood by RAPIDS GPUTreeShap, which offers portable CUDA C++ implementations of SHAP algorithms for decision tree models. GPUTreeShap uses a novel, massively parallel approach based on flattening and reordering millions of tree paths from an ensemble and mapping them to thousands of CUDA cores, where dynamic programming problems are solved using special hardware instructions. This algorithm is explained in detail in [3].

SHAP in Practice

Let’s take a look at some basic examples using XGBoost and SHAP values to identify key features and feature interactions in the California housing dataset. This is a famous dataset of house prices and attributes in California from the 1990 Census, available via scikit-learn.

XGBoost 1.3 with GPU acceleration can be installed via

pip install xgboost

Or by installing version 0.17 of RAPIDS, which includes XGBoost 1.3.0 along with other GPU-accelerated libraries for data science. (See the RAPIDS Getting Started page for conda and container-based instructions.)

The following snippet fetches the dataset, trains an XGBoost regression model with 500 trees (using GPU acceleration), and plots the distribution of predictions on the training set.
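Roughly, that step might look like the sketch below (hyperparameters beyond the 500 trees, and the plot styling, are illustrative assumptions):

import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Fetch the California housing data from scikit-learn
data = fetch_california_housing()
dtrain = xgb.DMatrix(data.data, label=data.target, feature_names=data.feature_names)

# Train a 500-tree regression model on the GPU
params = {"tree_method": "gpu_hist", "objective": "reg:squarederror"}
model = xgb.train(params, dtrain, num_boost_round=500)

# Plot the distribution of predictions on the training set
plt.hist(model.predict(dtrain), bins=50)
plt.xlabel("Predicted median house value")
plt.show()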

Now we make sure GPU-accelerated prediction is enabled and generate the SHAP values for the training set.
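A sketch of this step, continuing from the training snippet above (the timing print mirrors the output shown below):

import time

# Use the GPU predictor, then request SHAP values via pred_contribs
model.set_param({"predictor": "gpu_predictor"})

start = time.time()
shap_values = model.predict(dtrain, pred_contribs=True)
print("SHAP time", time.time() - start)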

SHAP time 0.16751956939697266

Computing all SHAP values takes only ~0.17s using a V100 GPU compared to 2.64s using 40 CPU cores on 2x Xeon E5-2698, a speedup of 15x even for this small dataset.

‘shap_values’ now contains a matrix where each row is a training instance from X and the columns contain the feature attributions (i.e. the amount that each feature contributed to the prediction). The last column of shap_values contains the ‘bias’, or the expected output of the model if no features were used. Each row always sums exactly to the model prediction; this is a unique advantage of SHAP values compared to other model explanation techniques.
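As a quick sanity check, this efficiency property can be verified directly (a sketch, assuming numpy is available):

import numpy as np

# Each row of shap_values (features + bias) should sum to the corresponding prediction
preds = model.predict(dtrain)
print(np.allclose(shap_values.sum(axis=1), preds, atol=1e-3))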

Model predictions can be inspected individually using this output, or we can aggregate the SHAP values to gain insight into global feature importance. Here we take the mean absolute contribution of each feature and plot the magnitudes.
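A sketch of this aggregation (the bar-chart styling is an assumption):

# Mean absolute SHAP value per feature, dropping the bias column
mean_abs_shap = np.abs(shap_values[:, :-1]).mean(axis=0)
order = np.argsort(mean_abs_shap)
plt.barh(np.array(data.feature_names)[order], mean_abs_shap[order])
plt.xlabel("Mean |SHAP value|")
plt.show()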

The above chart shows that latitude, longitude, and median income are the most relevant features to our XGBoost model. This chart shows first-order feature relevance; however, we can go deeper and search for second-order effects, or interactions between pairs of features. In the past, it was extremely expensive to search for second-order effects. Now, with GPUTreeShap, we can compute these interaction effects in a matter of seconds, even for large datasets with many features.
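A sketch of the interaction computation, again timing the GPU-accelerated predict call:

start = time.time()
shap_interactions = model.predict(dtrain, pred_interactions=True)
print("SHAP interactions time", time.time() - start)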

SHAP interactions time 1.3384981155395508

Computing all interactions takes only ~1.33s on our V100 where 40 CPU cores take ~40.12s, a speedup of 30x. This is a relatively small dataset with only 8x8 possible feature interactions. For larger datasets, as shown in our paper, GPUTreeShap can reduce feature interaction computations from days to a matter of minutes.

The output ‘shap_interactions’ contains a symmetric matrix of interaction terms for each row, where the element-wise sum evaluates to the model prediction. The diagonal terms represent the main effects for each feature, i.e. the impact of that feature excluding second-order interactions.
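This, too, can be checked directly; summing each per-row interaction matrix (including the bias terms) should recover the prediction:

# The full interaction matrix for each row sums to the model prediction
print(np.allclose(shap_interactions.sum(axis=(1, 2)), preds, atol=1e-3))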

As before we can aggregate interactions to examine the most significant effects over the training set. This time we plot only the top-k effects due to a large number of possible combinations.
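A sketch of this aggregation, ranking the off-diagonal feature pairs and keeping the strongest ones (the cutoff of 10 is an arbitrary choice):

# Mean absolute interaction strength per feature pair, dropping the bias row/column
mean_abs_inter = np.abs(shap_interactions[:, :-1, :-1]).mean(axis=0)
names = data.feature_names
pairs = [(names[i], names[j], mean_abs_inter[i, j])
         for i in range(len(names)) for j in range(i + 1, len(names))]
pairs.sort(key=lambda t: t[2], reverse=True)

top = pairs[:10]
plt.barh(["{} x {}".format(a, b) for a, b, _ in top][::-1], [v for _, _, v in top][::-1])
plt.xlabel("Mean |SHAP interaction value|")
plt.show()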

Conclusion

Here we see that the SHAP algorithm has identified Latitude-Longitude as the strongest interaction effect by a large margin. While this outcome is somewhat intuitive for the task of house price prediction, given our a priori knowledge of the relatedness of latitude and longitude, SHAP interactions are useful for algorithmically verifying that the model is behaving according to our human intuition. In cases where the meaning of features or the relationships between them are unclear, they can be used to search for interactions that are not immediately obvious.

Next Steps

XGBoost 1.3.0 is just the first release with GPUTreeShap, with many additional features planned for the future. Look out for future articles on GPU acceleration for SHAP, including our integration with the popular Python shap package, which extends this work to a broader range of tree models such as LightGBM, sklearn random forests, and CatBoost.

References

[1] Lundberg, Scott M., et al. “From Local Explanations to Global Understanding with Explainable AI for Trees.” Nature Machine Intelligence 2.1 (2020): 56–67.

[2] Chen, Tianqi, and Carlos Guestrin. “XGBoost: A Scalable Tree Boosting System.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

[3] Mitchell, Rory, Eibe Frank, and Geoffrey Holmes. “GPUTreeShap: Fast Parallel Tree Interpretability.” arXiv preprint arXiv:2010.13972 (2020).

Rory Mitchell is a senior software engineer at NVIDIA and an XGBoost maintainer.