Explainability in Question Answering

UKP-SQuARE
5 min read · Oct 27, 2022

The ubiquitous adoption of machine learning models is prompting regulators to prepare policies for artificial intelligence, with a special emphasis on explainability, to ensure that these models are fair and trustworthy.

For example, if a question-answering (QA) system is used in the medical domain, a doctor would require not only a diagnosis from the model but also an explanation. A diagnosis alone is not useful, and it can even be dangerous if the model makes a mistake! However, neural networks are black boxes: they produce predictions but are not interpretable. Hence, many researchers have worked on methods to explain these models.

Jay Alammar created a cheat sheet explaining the different types of explainability methods for AI. Some methods, like SHAP, are model agnostic, i.e., they work with all types of models, not only neural networks. Others are model specific; for instance, attention-based methods only work with models that use attention, such as BERT. Lastly, others rely on examples to analyze how the model behaves. In this article, we explain how saliency methods can be used to explain QA systems.

Saliency Maps

Saliency Maps assign an attribution weight to the input tokens to assess their importance in the model prediction. In UKP-SQuARE, we use two families of attribution methods to construct saliency maps: i) Gradient-based methods and ii) Attention-based methods.

Gradient-based Methods

Gradient-based methods compute the gradients of the model prediction with respect to the embedding layer. The magnitude of the gradient corresponds to how much the prediction changes when the embedding is updated. Therefore, a large gradient indicates a large effect on the prediction, i.e., an important input token. The figure below illustrates this phenomenon. The gradients of “problem,” “getting,” and “license” are larger than those of “what,” “was,” “with,” and “marriage.” These highlighted words also match what a human would consider the important words for understanding the question.


Fig 1. UKP-SQuARE’s Saliency Map. Highlighted tokens are the most important for the prediction.
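
To make the mechanism more concrete, here is a minimal sketch of gradient-based saliency for an extractive QA model with Hugging Face Transformers. The checkpoint, the example inputs, and the use of the L2 norm to aggregate each token’s gradient into a single score are our own illustrative assumptions, not necessarily the exact choices made in UKP-SQuARE.

```python
# Minimal sketch: vanilla-gradient saliency for an extractive QA model.
# Checkpoint, inputs, and the L2-norm aggregation are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "deepset/roberta-base-squad2"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
model.eval()

question = "What was the problem with getting a license?"
context = "He struggled to get a driving license because of his poor eyesight."
inputs = tokenizer(question, context, return_tensors="pt")

# Look up the input embeddings and make them a leaf tensor so gradients are stored.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

outputs = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"])
# Differentiate the score of the most likely answer span w.r.t. the embeddings.
score = outputs.start_logits.max() + outputs.end_logits.max()
score.backward()

# One saliency weight per token: the norm of the gradient of its embedding.
saliency = embeddings.grad.norm(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, weight in zip(tokens, saliency.tolist()):
    print(f"{token:>15}  {weight:.4f}")
```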

Following this family of methods, UKP-SQuARE implements the following three:

Vanilla Gradient utilizes the plain gradients of the embedding layer of the model as importance weights of the inputs.

Integrated Gradients integrates the gradients along the straight-line path from the all-zeros vector to the input token embedding. The value of this integral is the attribution weight of the token for the prediction, since it represents the amount of information contributed with respect to the zero vector (i.e., no information).

SmoothGrad adds Gaussian noise to the input to create multiple versions and then averages their saliency scores. In this way, the method smooths the saliency scores and alleviates noise from local variations in the partial derivatives. A rough sketch of Integrated Gradients and SmoothGrad follows below.
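
The following self-contained sketch shows how Integrated Gradients and SmoothGrad can be built on top of a plain gradient computation. The toy `forward_fn`, the number of interpolation steps, and the noise level are placeholders chosen for illustration; in practice the forward function would return the QA model’s answer score, as in the previous sketch.

```python
# Minimal sketch of Integrated Gradients and SmoothGrad over a generic forward function.
# forward_fn, step count, and noise level are illustrative placeholders.
import torch

def forward_fn(embeds: torch.Tensor) -> torch.Tensor:
    return embeds.sum()  # stand-in for the model's answer score

def vanilla_gradients(embeds: torch.Tensor) -> torch.Tensor:
    embeds = embeds.detach().requires_grad_(True)
    forward_fn(embeds).backward()
    return embeds.grad

def integrated_gradients(embeds: torch.Tensor, steps: int = 50) -> torch.Tensor:
    # Approximate the path integral from the all-zeros baseline to the input
    # with a Riemann sum over `steps` interpolation points.
    baseline = torch.zeros_like(embeds)
    total = torch.zeros_like(embeds)
    for alpha in torch.linspace(0.0, 1.0, steps):
        total += vanilla_gradients(baseline + alpha * (embeds - baseline))
    return (embeds - baseline) * total / steps

def smoothgrad(embeds: torch.Tensor, samples: int = 25, sigma: float = 0.1) -> torch.Tensor:
    # Average the gradients over several noisy copies of the input.
    total = torch.zeros_like(embeds)
    for _ in range(samples):
        total += vanilla_gradients(embeds + sigma * torch.randn_like(embeds))
    return total / samples

embeds = torch.randn(1, 8, 16)                         # (batch, tokens, embedding dim)
print(integrated_gradients(embeds).norm(dim=-1))       # one attribution score per token
print(smoothgrad(embeds).norm(dim=-1))
```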

Attention-based Methods

Neural NLP models have broadly incorporated attention mechanisms, which are frequently credited with enhancing transparency and increasing performance (Vaswani et al., NeurIPS 2017). These methods compute a distribution over the input tokens that can be considered to reflect what the model believes to be important.

Following Jain et al. (2020), UKP-SQuARE builds a saliency map from the attention weights from the CLS token to the other tokens of the input, averaged across the attention heads. However, Serrano and Smith (2019) argue that attention weights are inconsistent and may not always correlate with the human notion of importance. Thus, they propose an alternative, Scaled Attention, also integrated into UKP-SQuARE, which multiplies the attention weights by their corresponding gradients to make the attributions more stable.
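
As an illustration, here is a minimal sketch of both variants: plain attention saliency averages the heads of one layer and reads the row from the classification token to all other tokens, while scaled attention first multiplies the attention weights by their gradients. The checkpoint and the choice of the last attention layer are assumptions made for this sketch and may differ from the exact aggregation used in UKP-SQuARE.

```python
# Minimal sketch: attention-based and scaled-attention saliency.
# Checkpoint and the choice of the last attention layer are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "deepset/roberta-base-squad2"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
model.eval()

question = "What was the problem with getting a license?"
context = "He struggled to get a driving license because of his poor eyesight."
inputs = tokenizer(question, context, return_tensors="pt")

outputs = model(**inputs, output_attentions=True)
attention = outputs.attentions[-1]  # last layer: (batch, heads, seq_len, seq_len)
attention.retain_grad()             # keep gradients of the attention weights
(outputs.start_logits.max() + outputs.end_logits.max()).backward()

# Plain attention: average over heads, keep the row from the [CLS]/<s> token (position 0).
plain = attention.mean(dim=1)[0, 0]
# Scaled attention: multiply attention weights by their gradients before averaging.
scaled = (attention * attention.grad).mean(dim=1)[0, 0]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, a, s in zip(tokens, plain.tolist(), scaled.tolist()):
    print(f"{token:>15}  attention={a:.4f}  scaled={s:.4f}")
```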


Saliency Maps in UKP-SQuARE

UKP-SQuARE is an online platform for Question Answering (QA) research that provides an ecosystem to analyze and compare QA models. It also provides a user-friendly interface for creating saliency maps. After running your model with your desired input, you will see a button in the bottom-right corner called “Explain this output,” as shown in the figure below. Clicking this button opens a modal box with the Saliency Map interface.

The “Explain this output” button opens a modal box with the Saliency map interface.
UKP-SQuARE’s Saliency Map interface.

Currently, UKP-SQuARE creates saliency maps using multiple attribution methods: Vanilla Gradient, Integrated Gradients, SmoothGrad, Attention, and Scaled Attention. It is compatible with extractive, multiple-choice, and boolean QA models.

Saliency Map for a Multiple-Choice Model.

Lastly, it also allows comparing the saliency maps of multiple models to analyze whether they truly understand the inputs.

The second model may not understand the input correctly since it identifies “because,” “order,” and “.” as important words.

Conclusions

UKP-SQuARE is an online platform for question-answering research that allows users to create saliency maps in the cloud without running a single line of code. It supports multiple types of QA systems, such as extractive, multiple-choice, and boolean QA, and it eases the analysis and comparison of QA models through these saliency maps.

UKP-SQuARE is available at:

You can see the code on GitHub:

For more information, please check our publications:

This is part of a series of posts about the new AACL 2022 publication: UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

Part 1: Explainability in Question Answering (this post)

Part 2: Adversarial Attacks in Question Answering
