Towards a comparable metric for AI model interpretability — Part 1

Published in DSAID GovTech · Jul 7, 2022

By: Howard Yang (GovTechSG), Jessica Foo (GovTechSG)
With inputs from: Dr. Matthias Reso (Meta)

This post is part 1 of a two-part series on explainable AI (XAI). Part 1 (this article) introduces XAI and its common methods, while part 2 (found here) will focus on our experiments with Captum in developing comparable metrics for explainability and fairness in the realm of computer vision.

AI adoption has increased drastically in recent years and is affecting ever larger segments of our society in increasingly significant ways. This growth is driven largely by the increased capabilities of AI models, but greater capability comes with greater complexity and less interpretability (see figure below). As such, AI models are often described as "black boxes".

Graph of interpretability against accuracy of common AI algorithms and architectures. Source: Captum

Convolutional neural networks (CNNs), for example, are used in deep learning with great success, yet they are neither intuitive nor easily interpretable. This lack of understanding becomes a liability when AI is used in high-stakes decision-making, such as facial recognition, loan approval, and social assistance fraud detection. Both solution providers and AI engineers are often unable to explain exactly how the AI arrived at a prediction. Anyone who wants to understand how such a model works therefore needs to pry it open to expose its inner workings and elucidate its decision-making process.

The field of Explainable AI (XAI) grew from this need. XAI includes algorithms and tools to produce easy-to-interpret post-hoc explanations that help practitioners, decision-makers, consumers, and other stakeholders to better understand how and why an AI model came to a certain prediction.

Two images. The left is an image of a husky with a snowy background that has been wrongly classified as a wolf, and the right is an image of the explanation generated that only shows the snowy background.
A post hoc explanation model reveals that the black box model is distinguishing huskies from wolves based on a spurious correlation (the presence of snow). Source: “Why Should I Trust You?” Explaining the Predictions of Any Classifier.

For example, a CNN may erroneously use the presence of snow to distinguish between a husky and a wolf in image (a) above. In this case, the background has a higher activation than the animal, so the model effectively recognises the white snowy background rather than the dog itself. With the explanation in image (b) above, developers can spot this error and rectify the model before sending it into production.

Applications of XAI

Applications of XAI for various stakeholders.

XAI is useful to a variety of stakeholders, from practitioners to decision-makers to auditors. Beyond providing explanations for development purposes, other important applications of XAI include regulatory compliance and adversarial attack protection.

The best understood and most widely used area of XAI is explaining individual predictions (e.g., why an applicant was rejected for a loan). Techniques such as LIME and SHAP (explained later in this post) can provide quantitative answers to this question. For data such as images, audio, and free-form text, similar results can be obtained using attribution maps, which visualise where a model's "attention" lies to show how it arrived at its prediction.

Attribution Map obtained using Integrated Gradients. Source: Meta, Captum

Such attribution maps can then be used for model debugging or model improvement, allowing practitioners to perform a sanity check on whether the model decision-making process is sensible. These maps can also provide actionable insights on which part of the model to fine-tune if the explanations are unacceptable.

XAI also encompasses fairness metrics (e.g., False Positive Rate (FPR), False Negative Rate (FNR), equalised odds, counterfactual fairness), which unveil undesirable biases that may have unexpectedly made their way into the model. Models are trained on historical data and hence can only be as good as the decisions of the past. Biased models, if deployed blindly, can systematically reinforce existing biases. XAI allows developers and decision-makers to quantify the level of bias in a model, detecting unfairness before it is deployed.
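
As a concrete illustration, group-level error rates can be computed directly from a model's predictions. The sketch below is a minimal Python example; the toy labels, predictions and sensitive attribute are assumptions, and equalised odds is summarised here simply as the gap in FPR and FNR between groups.

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Compute false positive and false negative rates for each group."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        fp = np.sum((yp == 1) & (yt == 0))
        fn = np.sum((yp == 0) & (yt == 1))
        tn = np.sum((yp == 0) & (yt == 0))
        tp = np.sum((yp == 1) & (yt == 1))
        rates[g] = {
            "FPR": fp / max(fp + tn, 1),
            "FNR": fn / max(fn + tp, 1),
        }
    return rates

# Toy data: labels, model predictions and a sensitive attribute (all assumed).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
group  = np.array(["A", "A", "A", "B", "B", "B", "B", "A"])

rates = group_rates(y_true, y_pred, group)
# Equalised odds asks for these gaps to be (close to) zero across groups.
fpr_gap = abs(rates["A"]["FPR"] - rates["B"]["FPR"])
fnr_gap = abs(rates["A"]["FNR"] - rates["B"]["FNR"])
print(rates, fpr_gap, fnr_gap)
```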

Besides developers, decision-makers can use model explainability and fairness metrics, alongside metrics such as accuracy, recall, precision and F1-score, to aid procurement decisions. Auditors may also use fairness metrics to check compliance and accountability with respect to sensitive attributes, ensuring that the decisions made by the model are as fair as possible.

We have now seen why XAI is important and how its results can be used. Next, let us explore some common XAI methods.

Overview of XAI Methods

There are two major groups of models used in XAI: white-box and black-box. White-box models are explainable by design. Examples include hand-crafted expert systems and rule-based learning systems like inductive logic programming and decision trees. Black-box models, on the other hand, require additional algorithms to extract the model’s inner logic. These algorithms can largely be categorised into perturbation- and gradient-based methods, with some algorithms straddling the two categories.

Overview of common XAI algorithms. Source: Captum

Perturbation-based algorithms

Perturbation-based algorithms apply small perturbations to individual instances and use the resulting model outputs to build interpretable local approximations, such as a linear model or an attribution map, which can then be used to explain individual predictions of black-box models. Explanations generated using perturbation-based techniques may not be robust: they can change drastically with very small changes to the input instance. They are also generally non-deterministic, as the perturbations are randomly generated (e.g., sampled from a standard normal distribution). Two popular perturbation-based algorithms are LIME and SHAP.
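
A simple, deterministic member of this family is occlusion, which slides a mask over the input and records how much the prediction changes. Below is a minimal sketch using Captum's Occlusion class; the toy CNN, input size, window size and target class are assumptions purely for illustration.

```python
import torch
import torch.nn as nn
from captum.attr import Occlusion

# Toy CNN standing in for a real image classifier (assumed for illustration).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()
x = torch.rand(1, 3, 64, 64)  # random stand-in for an input image

occlusion = Occlusion(model)
attr = occlusion.attribute(
    x,
    target=0,                          # class being explained
    sliding_window_shapes=(3, 8, 8),   # occlude 8x8 patches across all channels
    strides=(3, 8, 8),
    baselines=0,                       # replace occluded pixels with zeros
)
print(attr.shape)  # same shape as the input: 1 x 3 x 64 x 64
```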

LIME

Example of LIME used on an image of a tree frog. Source: Local Interpretable Model-Agnostic Explanations (LIME): An Introduction

Local Interpretable Model-agnostic Explanations¹ (LIME) is one of the most popular interpretability methods for black-box models. LIME trains an interpretable linear surrogate model on data points randomly sampled in the neighbourhood of a given input instance; interpreting this local surrogate then serves as a local interpretation of the original black-box model. Although LIME is powerful and straightforward, it has its drawbacks. In 2020, Damien Garreau and Ulrike von Luxburg published the first theoretical analysis of LIME², validating its significance and meaningfulness, but also proving that poor parameter choices can cause LIME to miss important features.
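
To make the procedure concrete, here is a minimal from-scratch sketch of the LIME idea for a tabular model (not the lime library itself); the stand-in black-box function, kernel width and sample count are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def black_box(X):
    """Stand-in for any opaque model's probability output (assumed)."""
    return 1 / (1 + np.exp(-(2 * X[:, 0] - 3 * X[:, 1])))

def lime_explain(x, n_samples=1000, kernel_width=0.75):
    # 1. Perturb: sample points in the neighbourhood of the instance x.
    Z = x + np.random.normal(scale=0.5, size=(n_samples, x.size))
    # 2. Query the black box at the perturbed points.
    y = black_box(Z)
    # 3. Weight samples by proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Fit an interpretable (linear) surrogate on the weighted samples.
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_  # local feature importances

x = np.array([1.0, 0.5])
print(lime_explain(x))
```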

SHAP

Example of SHAP used on tabular data. Source: SHAP library by GitHub user slundberg

SHapley Additive exPlanations³ (SHAP) is an algorithm that computes an importance value for each feature of an individual input instance based on cooperative game theory. The method considers every permutation of the input features, adding the features one by one to a given baseline and generating a new output after each addition. The change in output after each addition is that feature's marginal contribution, and these contributions are averaged over all permutations to obtain the attribution. SHAP maintains three desirable properties: local accuracy, missingness, and consistency. However, SHAP values are usually approximated because the exhaustive computation is prohibitively expensive, and this approximation weakens the theoretical guarantees since the SHAP values are no longer calculated exactly.
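
The permutation view described above can be written down directly for a toy model with a handful of features. This brute-force sketch is purely illustrative; the model, baseline and instance are assumptions, and real SHAP implementations approximate this average rather than enumerating every permutation.

```python
import itertools
import numpy as np

def model(x):
    """Toy model standing in for a black box (assumed)."""
    return 3 * x[0] + 2 * x[1] * x[2]

baseline = np.zeros(3)
instance = np.array([1.0, 2.0, 0.5])
n = len(instance)

attributions = np.zeros(n)
perms = list(itertools.permutations(range(n)))
for perm in perms:
    x = baseline.copy()
    prev = model(x)
    for i in perm:
        x[i] = instance[i]               # add feature i on top of what is present
        curr = model(x)
        attributions[i] += curr - prev   # marginal contribution of feature i
        prev = curr

attributions /= len(perms)               # average over all permutations
# Completeness check: attributions sum to f(instance) - f(baseline).
print(attributions, attributions.sum(), model(instance) - model(baseline))
```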

Gradient-based algorithms

Gradient-based algorithms use gradients computed at individual instances to explain the predictions of complex models. The results are often represented as attribution maps which can be overlaid on the original instances to explain the prediction. These explanations are typically more robust and deterministic than perturbation-based ones. However, gradient-based methods may not generate interpretations that are faithful to the underlying model, meaning they might not accurately reflect its decision-making process. Some of the most popular algorithms include Integrated Gradients, Guided Backpropagation and Deconvolution.
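
The simplest example is "vanilla" gradient saliency: take the gradient of the predicted class score with respect to the input pixels. A minimal sketch in PyTorch, with a toy CNN and a random input standing in for a real model and image:

```python
import torch
import torch.nn as nn

# Toy CNN and random input, assumed purely for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),   # 10 hypothetical classes
).eval()

x = torch.rand(1, 3, 64, 64, requires_grad=True)
score = model(x)[0].max()             # score of the top predicted class
score.backward()                      # d(score)/d(input) via backpropagation
saliency = x.grad.abs().max(dim=1)[0] # 1 x 64 x 64 attribution map
print(saliency.shape)
```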

Integrated Gradients

Integrated Gradients⁴ is a method that attributes the predictions made by a deep neural network to its inputs. It does this by integrating the gradients with respect to the inputs along the straight-line path from a given baseline (e.g., all zeros) to the input. This variation on the simpler Gradients method allows Integrated Gradients to satisfy the much-desired property of completeness (also known as Efficiency or Summation to Delta): the attributions sum to the target output minus the baseline output. It also satisfies the axioms of sensitivity and implementation invariance, which many other commonly used attribution methods violate.

Formally, it can be described with the following mathematical formula:

Integrated Gradients along the i-th dimension of input X. Alpha is a scaling coefficient. Source: Meta, Captum
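Captum implements this method as IntegratedGradients. The following is a minimal sketch, assuming a toy network, a random input and the all-zeros baseline mentioned above; the convergence delta returned by Captum indicates how closely the completeness property holds.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy network, input and target class, assumed purely for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3)).eval()
x = torch.rand(1, 4)
baseline = torch.zeros(1, 4)   # the "all zeros" baseline

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    x, baselines=baseline, target=0, return_convergence_delta=True
)
# Completeness: attributions sum (approximately) to F(x) - F(baseline).
print(attributions, delta)
```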

Guided Backpropagation and Deconvolution

Guided Backpropagation and Deconvolution are two very similar algorithms that also compute the gradient of the target output with respect to the input, but with the backpropagation of ReLU functions overridden so that only non-negative gradients are backpropagated. The two methods differ on a number of points, but let us first explore what Deconvolution is before explaining their differences.

Deconvolution⁵ utilises de-convolutional networks (also known as DeconvNets or DCNNs), which use the same components as regular CNNs, such as filtering and pooling, but in reverse: instead of mapping pixels to features, they map features to pixels. A DeconvNet is attached to each of the CNN's layers, providing a continuous path from the model prediction back to the image pixels. This allows us to calculate the attributions of the image pixels with respect to the model prediction.

Guided Backpropagation⁶, also known as guided saliency, is a variant of Deconvolution in which the max-pooling layers of convolutional neural networks for small images are replaced by convolutional layers with increased stride. Moreover, in Guided Backpropagation the ReLU function is applied to the input gradients, while in Deconvolution the ReLU function is applied to the output gradients, which are then backpropagated directly.

Both approaches were proposed in the context of a convolutional network and are generally used for convolutional networks, although they can be applied generically to other model architectures.
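
In Captum, both methods share the same interface, which makes it easy to compute their attributions side by side. A minimal sketch, with the small CNN, random input and target class assumed purely for illustration:

```python
import torch
import torch.nn as nn
from captum.attr import GuidedBackprop, Deconvolution

# Toy CNN and input standing in for a real model and image (assumed).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()
x = torch.rand(1, 3, 64, 64)

# Both attributions have the same shape as the input and can be
# overlaid on the image, but they treat ReLU gradients differently.
gbp_attr = GuidedBackprop(model).attribute(x, target=0)
deconv_attr = Deconvolution(model).attribute(x, target=0)
print(gbp_attr.shape, deconv_attr.shape)
```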

Conclusion

We have come to the end of our introduction to XAI and its common methods. We shared the rationale behind the growing interest in XAI, where it can be applied, and the most popular methods in use today. What we have discussed in this post is by no means exhaustive, as the field is evolving at a rapid pace.

Armed with the knowledge of these common methods, let us dive into XAI for computer vision in the next part of this two-part series. There, we will share the challenges in the computer vision space and our experiments in determining comparable metrics for model explainability and fairness.

Click here for part 2!

¹ Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “‘Why should I trust you?’ Explaining the predictions of any classifier.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

² Garreau, Damien, and Ulrike Luxburg. “Explaining the explainer: A first theoretical analysis of LIME.” International Conference on Artificial Intelligence and Statistics. PMLR, 2020.

³ Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” Advances in Neural Information Processing Systems. 2017, pp. 4765–4774.

⁴ Sundararajan, Mukund, Ankur Taly, and Qiqi Yan. “Axiomatic attribution for deep networks.” Proceedings of the 34th International Conference on Machine Learning. PMLR, 2017, pp. 3319–3328.

⁵ Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” European Conference on Computer Vision. Springer, 2014, pp. 818–833.

⁶ Springenberg, Jost Tobias, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. “Striving for Simplicity: The All Convolutional Net.” ICLR (Workshop Track). 2015.
