XAI: Explainable Artificial Intelligence

Mohammad Javad Maheronnaghsh
6 min readNov 9, 2023

When you search for the phrase "XAI", you may think of the company founded by Elon Musk, but that is not what we are talking about here. We are talking about explainability in AI/ML/DL models.
From my reading of papers in this area, I found that many different keywords are used to refer to this concept.

Keywords

There are lots of such keywords, and the list below covers almost all of the ones I came across:

  • Explainable AI (XAI)
  • Feature Importance
  • Model Interpretability
  • Feature Attribution
  • Interpretation of Black-box Models
  • Model Explainability
  • Explainable AI Frameworks
  • Interpretable AI
  • Understandable AI
  • Trustworthy AI
  • Responsible AI
  • Black-Box vs. White-Box Methods
  • Reliable AI
  • Transparent AI
  • Meaningful AI
  • Knowledge limits AI
  • Accountable AI
  • Self-explaining AI
  • Self-attentive
  • Comprehensible AI
  • Human-readable AI
  • Promising AI
  • Mission-critical applications
  • Influence of inputs & important feature finding, PCA
  • Simulating the absence of a feature (see the sketch below)
  • Unveiling the reasoning process.

The above list is the result of my searches in the XAI papers.
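
Two items in the list above, feature importance and simulating the absence of a feature, are easy to illustrate in code. Below is a minimal sketch of permutation-style feature importance on tabular data: shuffle one column at a time and measure how much a trained model's accuracy drops. The toy dataset, the feature names, and the choice of model are all made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data: 500 samples, 4 features (feature names are hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # only the first two features matter
feature_names = ["age", "income", "noise_a", "noise_b"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

# "Simulate the absence" of each feature by shuffling it and
# measuring how much the test accuracy drops.
for i, name in enumerate(feature_names):
    X_perm = X_test.copy()
    X_perm[:, i] = rng.permutation(X_perm[:, i])
    print(f"{name}: importance ≈ {baseline - model.score(X_perm, y_test):.3f}")
```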

What is it?

Explainability tells us why a model produced a particular output: why is it forecasting this outcome and not another one?

Why Explainability?

Going back 10–20 years, almost everything was traditional AI: algorithmic methods (approximate or exact), RL, traditional computer vision, and non-automated or semi-automated processes. At that time, everything was explainable; for every decision we could point to a reason. But in the recent decade, deep learning methods have taken over: we give them something, and they give us something good back. Why does it work? Why does backpropagation work? Because the brain does something loosely similar, and we are happy that we can get good results by mimicking the procedure of the brain and its neurons (which is why we call them neural networks).

As you might expect, these deep learning methods are not as explainable as the traditional ones, but they are, of course, stronger.

Trade-off

Source: Survey of explainable artificial intelligence techniques for biomedical imaging with Deep Neural Networks

There is a trade-off in using the newest DL models:

  • They are stronger
  • They are less interpretable

So there is no worry about using them as assistants or as non-critical tools in many areas, but what about these two critical domains?

— Financial cases (for example, stock prediction): if we don't know why the model is predicting this way, we cannot rely on it.

— Medical Issues (Drug Discovery, etc.)

In these cases, we should be very careful about the reasons behind a decision: why does the model say that the patient has this illness, or why does it conclude that the patient has a tumor?

We have 3 choices:

  • Don’t use AI/ML/DL models until they become 100% explainable and interpretable.
  • Use them because they are outperforming humans.
  • Use them in a limited way (Use them as Assistants, not the person in charge).

The best answer so far is the third one: use them in critical areas as assistants and, in parallel, work on their interpretability.

Interpretability Methods

There are more than 200 papers on XAI methods and many surveys in this field of study, but I am going to briefly (and mainly through pictures) introduce some of these methods.

First, note that almost all of the interpretability methods below were developed for and demonstrated on vision models.

Overview of current methods

Source: On Interpretability of Artificial Neural Networks: A Survey
Source: Transparency of Deep Neural Networks for Medical Image Analysis: A Review of Interpretability Methods

1. Attribution/Attention Maps

They highlight the important parts of the image that led to the predicted tag/label/output.

Source: Interpretable Medical Imagery Diagnosis with Self-Attentive Transformers: A Review of Explainable AI for Health Care
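
The referenced review covers attention-based transformers; as a more generic illustration, here is a minimal Grad-CAM-style sketch of how gradient-based attribution maps are commonly produced for a CNN. The untrained torchvision ResNet, the choice of layer4, and the random input are placeholders, not the method of the paper above.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()   # untrained, just to show the mechanics
feats = {}
# Capture the activations of the last conv block during the forward pass.
model.layer4.register_forward_hook(lambda mod, inp, out: feats.update(a=out))

x = torch.randn(1, 3, 224, 224)         # stand-in for a real image
logits = model(x)
score = logits[0, logits.argmax()]      # explain the top-scoring class

# Gradient of the class score w.r.t. the captured feature maps.
grads = torch.autograd.grad(score, feats["a"])[0]
weights = grads.mean(dim=(2, 3), keepdim=True)            # per-channel importance
cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
# cam[0, 0] is a 224x224 map you can overlay on the input image.
```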

2. Explanatory Graph

Source: Visual Interpretability for Deep Learning: a Survey

As stated in reference [5]:

Explanatory graph [Zhang et al., 2018a]. An explanatory graph represents the knowledge hierarchy hidden in conv-layers of a CNN. Each filter in a pre-trained CNN may be activated by different object parts. [Zhang et al., 2018a] disentangles part patterns from each filter in an unsupervised manner, thereby clarifying the knowledge representation.

3. Heat Maps

Source: Visual Interpretability for Deep Learning: a Survey

As mentioned in the paper [5]:

A heat map visualizes the spatial distribution of the top-50% patterns in the L-th layer of the explanatory graph with the highest inference scores.

4. Saliency maps

Source: There and Back Again: Revisiting Backpropagation Saliency Methods

As reference [6] indicates:

Saliency methods seek to explain the predictions of a model by producing an importance map across each input sample. A popular class of such methods is based on backpropagating a signal and analyzing the resulting gradient.
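
As a concrete illustration of the quote above, here is a minimal sketch of the simplest backpropagation-based saliency method (vanilla gradients): backpropagate the class score to the input and take the absolute gradient per pixel. The untrained model and random input are placeholders.

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()                  # placeholder model
x = torch.randn(1, 3, 224, 224, requires_grad=True)    # placeholder image

logits = model(x)
logits[0, logits.argmax()].backward()                  # backpropagate the top class score

# Pixel-wise importance: max absolute gradient over the RGB channels.
saliency = x.grad.abs().amax(dim=1)[0]                 # shape (224, 224)
print(saliency.shape, float(saliency.max()))
```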

5. Perturbing Important Learned Features

Source: Explaining Neural Networks via Perturbing Important Learned Features

Reference [7]:

Given an input, 1) prune unimportant neurons (neurons whose removal minimally affects the output of the target neuron) and 2) subsequently find input perturbation that maximally changes the output in the pruned network. The resulting perturbation serves as an explanation of input features that contribute the most to the network’s output.

6. Divide and Conquer

Source: PAMI: partition input and aggregate outputs for model interpretation

Reference [8]:

The basic idea is to mask the majority of the input and use the corresponding model output as the relative contribution of the preserved input part to the original model prediction.
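
Below is a drastically simplified sketch of that masking idea, not the actual PAMI algorithm: keep one patch of the input at a time, zero out everything else, and record the model's score for the class being explained as that patch's contribution. The patch size, model, and input are placeholders.

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()   # placeholder model
x = torch.randn(1, 3, 224, 224)         # placeholder image
patch = 56                              # keep one 56x56 patch at a time

heatmap = torch.zeros(224 // patch, 224 // patch)
with torch.no_grad():
    target = model(x)[0].argmax()       # class we want to explain
    for i in range(0, 224, patch):
        for j in range(0, 224, patch):
            masked = torch.zeros_like(x)            # mask out the whole image...
            masked[:, :, i:i+patch, j:j+patch] = x[:, :, i:i+patch, j:j+patch]
            # ...and use the output on the preserved patch as its contribution.
            heatmap[i // patch, j // patch] = model(masked)[0, target]

print(heatmap)  # coarse 4x4 map of per-patch contributions
```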

7. Concept-based Methods

Source: HINT: Hierarchical Neuron Concept Explainer

Reference [9]:

To interpret deep networks, one main approach is to associate neurons with human-understandable concepts. However, existing methods often ignore the inherent relationships of different concepts (e.g., dog and cat both belong to animals), and thus lose the chance to explain neurons responsible for higher-level concepts (e.g., animal). In this paper, we study hierarchical concepts inspired by the hierarchical cognition process of human beings.

So What?

As far as I can see, there is still a lack of interpretability methods for NLP.

The evaluation methods and metrics have to be revised, too.

Discussion

I am open to talking about XAI and future work in this area. If you have any questions or feedback, don't hesitate to contact me at m.j.maheronnaghsh@gmail.com.

References

  1. On Interpretability of Artificial Neural Networks: A Survey
  2. Survey of explainable artificial intelligence techniques for biomedical imaging with Deep Neural Networks
  3. Transparency of Deep Neural Networks for Medical Image Analysis: A Review of Interpretability Methods
  4. Interpretable Medical Imagery Diagnosis with Self-Attentive Transformers: A Review of Explainable AI for Health Care
  5. Visual Interpretability for Deep Learning: a Survey
  6. There and Back Again: Revisiting Backpropagation Saliency Methods
  7. Explaining Neural Networks via Perturbing Important Learned Features
  8. PAMI: partition input and aggregate outputs for model interpretation
  9. HINT: Hierarchical Neuron Concept Explainer
