Fooling Partial Dependence via Data Poisoning
TL;DR: We highlight that Partial Dependence can be maliciously altered, e.g. bent and shifted, with adversarial data perturbations.
This blog post is based on the preprint available at https://arxiv.org/abs/2105.12837, which is joint work with Wojtek Kretowicz and Przemyslaw Biecek. Code is available at https://github.com/MI2DataLab/fooling-partial-dependence.
Explainable machine learning holds many promises for developers and auditors working with black-box predictive models. Alarmingly, recent studies show that many explanations are not trustworthy and can be manipulated in an adversarial manner. Hence, it is necessary to evaluate post-hoc explanations as critically as we evaluate model performance.
In the paper, we present techniques for attacking Partial Dependence (plots, profiles, PDP), which is among the most popular methods of explaining any predictive model trained on tabular data. This is especially crucial in financial or medical applications, where auditability has become a must-have trait supporting decisions made by black-boxes. Partial Dependence, like many other explanations, is a function of both the model and the data, and each of these can be a single point of failure of an explanation. We poison the data to bend and shift explanations in the desired direction using genetic and gradient algorithms, while the model itself remains unchanged.
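For intuition, recall that the Partial Dependence of a feature is the model's prediction averaged over a reference dataset with that feature fixed at consecutive grid values. A minimal sketch, assuming a scikit-learn-like `model.predict` and a NumPy array `X` (an illustration, not the paper's implementation), could look like this:

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Average model prediction over the reference data X, with the explained
    feature fixed at each grid value in turn. The result depends on both the
    model and the data -- the attack perturbs only the data."""
    pd_values = []
    for z in grid:
        X_mod = X.copy()
        X_mod[:, feature] = z                          # fix the explained feature at z
        pd_values.append(model.predict(X_mod).mean())  # average prediction over all rows
    return np.array(pd_values)
```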
Figure 1 presents an example of fooling Partial Dependence of age in the SVM model prediction of a heart attack. We observe two contradictory explanations of the variable's dependence, while only 2 out of 12 variables have their distributions shifted. These results raise awareness of the importance of context in model explanations, e.g. the reference data distribution.
The primary motivation of this work is to highlight the potential of fooling Partial Dependence to conceal a model's suspected behaviour, which various stakeholders might otherwise raise as an issue during a model audit (Figure 2).
Moreover, one can use the introduced algorithms to evaluate explanations with respect to a given dataset and predictive task. Figure 3 presents two fooling strategies: a targeted attack changes the dataset so that the explanation comes as close as possible to a predefined desired function (Dombrowski et al., 2019; Heo et al., 2019); a robustness check changes the dataset so that the explanation moves as far as possible from the original one, which corresponds to a sanity check (Adebayo et al., 2018). Both allow assessing the explanations' limitations.
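To make the two strategies concrete, they can be phrased as losses over the poisoned dataset. The sketch below reuses the `partial_dependence` function from above and assumes a squared distance between explanation curves; the exact formulation used in the paper is given in the preprint.

```python
import numpy as np

def targeted_loss(model, X_poisoned, feature, grid, target_pd):
    # Targeted attack: bring the explanation computed on the poisoned data
    # as close as possible to the predefined desired function target_pd.
    pd = partial_dependence(model, X_poisoned, feature, grid)
    return np.mean((pd - target_pd) ** 2)

def robustness_loss(model, X_poisoned, feature, grid, original_pd):
    # Robustness check: push the explanation as far as possible from the
    # original one, i.e. minimize the negative distance.
    pd = partial_dependence(model, X_poisoned, feature, grid)
    return -np.mean((pd - original_pd) ** 2)
```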
In the paper, two optimization approaches are considered: a genetic-based algorithm, which makes no assumptions about the black-box model and is therefore model-agnostic (sketched below), and a gradient-based algorithm designed specifically for models with differentiable outputs, e.g. neural networks. Figure 4 presents another example of fooling Partial Dependence of sex in the SVM model prediction of a heart attack. One can consider both scenarios: manipulating the explanation to provide evidence of a suspected model behaviour, or to conceal an unfair one.
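Coming back to the genetic-based algorithm: it only queries the model's predictions through the loss, which is what makes it model-agnostic. A rough sketch of such a loop follows; population size, mutation scale, and the number of iterations are illustrative values, not the settings from the paper. The gradient-based variant instead differentiates a similar loss with respect to the data through the model, which is why it requires differentiable outputs.

```python
import numpy as np

def genetic_attack(model, X, feature, grid, loss_fn, loss_target,
                   n_iter=500, pop_size=20, sigma=0.05):
    """Evolve a population of perturbed copies of the dataset and keep those
    with the lowest explanation loss. The explained feature is left untouched,
    so only the distribution of the remaining variables gets poisoned."""
    population = [X.copy() for _ in range(pop_size)]
    for _ in range(n_iter):
        offspring = []
        for Xp in population:
            noise = sigma * np.random.randn(*Xp.shape)  # Gaussian mutation
            noise[:, feature] = 0.0                     # keep the explained variable fixed
            offspring.append(Xp + noise)
        # elitist selection: keep the pop_size best datasets overall
        candidates = population + offspring
        candidates.sort(key=lambda Xp: loss_fn(model, Xp, feature, grid, loss_target))
        population = candidates[:pop_size]
    return population[0]  # best poisoned dataset found

# Example usage (targeted attack on feature 0 with a desired curve target_pd):
# X_adv = genetic_attack(model, X, feature=0, grid=grid,
#                        loss_fn=targeted_loss, loss_target=target_pd)
```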
Overall, Partial Dependence is quite a complex function whose number of input variables corresponds to the data dimension N×P. This intuition entails the possibility of manipulating the explanation via data poisoning, which should be considered in critical applications where proper evaluation is required for a stable analysis.
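To make the dimensionality claim explicit, for a feature j and grid value z the Partial Dependence can be written as a function of the entire reference dataset (notation below is ours, not taken verbatim from the preprint):

```latex
PD_j(z; X) = \frac{1}{N} \sum_{i=1}^{N} f\!\left(x_i^{\,j|=z}\right),
\qquad X = (x_1, \dots, x_N)^\top \in \mathbb{R}^{N \times P}
```

Here x_i^{j|=z} denotes the i-th observation with its j-th coordinate replaced by z. Viewed this way, the explanation takes the N×P-dimensional dataset as an input, which is exactly the degree of freedom that data poisoning exploits.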
We welcome feedback on this line of work. Details of the algorithms and a discussion of the experimental results can be found in the preprint on arXiv.
If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.