Introduction to Interpretable AI: Part 1

Understanding how a model works

Mehul Gupta
Data Science in your pocket

--

A typical Data Science project includes the following steps:

  • Data collection
  • Preprocessing
  • Analysis & Visualization
  • Training some model XYZ
  • Deployment to production

However, there is one more crucial step that is often missed in the pipeline: interpreting & explaining the model.

Wait a minute!!

What does this mean? Also, aren’t Interpretability & Explainability the same thing?

Interpretability in AI: It refers to understanding why the model has produced a specific output. For example, in the case of Linear Regression, one can check the weights assigned to the different features in the linear equation to understand why a specific output was produced.

Explainability in AI: Going a level deeper, it tries to understand the internal working of the model. So, in the case of Linear Regression, why a particular feature is assigned more weight than the others falls under Explainable AI.

These two definitions may sound very close but are quite different!

Now, if you look closely at the definitions, Interpretability can be taken as a building block of Explainability, where we run some sort of analysis to gain insights into the model. Why those insights hold is something covered under Explainability. Even in the above Linear Regression example, Interpretability is more interested in knowing which features were given more importance to reach the final solution. How or why those features are important is none of Interpretability’s business, but it is food for thought for Explainability.
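To make this concrete, here is a minimal sketch of interpreting a Linear Regression purely through its learned weights. The data & feature names (a, b, c) are made up for illustration and are not from any real project.

```python
# A minimal sketch of interpreting a linear regression through its weights.
# The feature names (a, b, c) and the data are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                    # three features: a, b, c
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)

# The learned weights directly tell us how much each feature contributes
# to the prediction: this is the "interpretation" of the model's output.
for name, weight in zip(["a", "b", "c"], model.coef_):
    print(f"weight of {name}: {weight:.3f}")
print(f"intercept: {model.intercept_:.3f}")
```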

This blog series focuses on the Interpretability of AI models & not on Explainability.

But why are Interpretability & Explainability necessary?

Multiple issues may already exist or may come up in the future even if you have a model performing well on the validation set:

  • Concept drift: Assume that you trained a model where the dependent variable (say y) is linearly dependent on 3 features (say a, b, c). As time passed, the relationship between the variables changed & now ‘y’ is non-linearly dependent on ‘a’. As you can see, neither the input nor the output changed, but the relationship between the target & the feature set did; this is called concept drift (see the first sketch after this list).
  • Biased decision making: Talk of the town, you must have heard of some AI models recognizing black men as primates.

Note: If not, do read this

Such problems occur due to bias either in the data or the model, where the model starts making biased predictions towards a particular group.

  • Data leak: When some information in the training, validation, or test set is available (information which won’t be available on deployment in prod) and leaks information about the target variable, it is called a data leak. A classic example is when, while splitting the data into train & test, a subset ends up present in both training & validation. Hence, the model performs exceptionally well on the validation data as it has already seen it while training (see the second sketch after this list).
  • Regulatory noncompliance: This is a term even I came across only while reading about Interpretable AI. Regulations such as the EU’s GDPR contain articles stating that an individual can challenge decisions made by an AI system that uses their personal data. This requires the developers & stakeholders to understand why the AI system behaves in a given manner.
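Here is the first sketch, a toy illustration of the concept drift bullet. The data is purely synthetic (the feature names follow the a, b, c example above): the target’s relationship to feature ‘a’ changes from linear to non-linear, so a model trained on the old data quietly degrades.

```python
# Toy illustration of concept drift: the features & target stay the same,
# but the target's relationship to feature `a` changes from linear to non-linear.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def make_data(n, drifted=False):
    a, b, c = rng.normal(size=(3, n))
    y = (a ** 2 if drifted else 2 * a) + b + c   # the relationship changes over time
    return np.column_stack([a, b, c]), y

X_old, y_old = make_data(500)                # data the model was trained on
X_new, y_new = make_data(500, drifted=True)  # data arriving after the drift

model = LinearRegression().fit(X_old, y_old)
print("R² before drift:", round(model.score(X_old, y_old), 3))  # high
print("R² after drift: ", round(model.score(X_new, y_new), 3))  # drops sharply
```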
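And here is the second sketch, a hypothetical take on the classic data-leak scenario from the list: duplicated rows end up in both the training & validation splits and inflate the validation score. The data and model choice are again only illustrative.

```python
# Hypothetical sketch of a data leak: the same rows appear in both the
# training & validation splits, so the validation score looks too good.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X[:, 0] + rng.normal(scale=0.5, size=300)

# Leaky split: the data is duplicated first, so identical rows can land
# on both sides of the split.
X_dup, y_dup = np.vstack([X, X]), np.concatenate([y, y])
X_tr, X_va, y_tr, y_va = train_test_split(X_dup, y_dup, random_state=0)
leaky = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
print("leaky validation R²:", round(leaky.score(X_va, y_va), 3))   # optimistic

# Clean split: every row appears in exactly one of the two sets.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
clean = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
print("clean validation R²:", round(clean.score(X_va, y_va), 3))   # more realistic
```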

If you look closely, all these problems can be resolved or traced by interpreting & explaining the AI system we develop, i.e. understanding the “why” & “how” behind the decisions made by the AI.

So, now that we have a very clear motive to read about Interpretable AI, let’s briefly go over the different types of Interpretability techniques.

Intrinsic: As the name suggests, these are straightforward models where interpreting the results is quite easy. For example, in the case of Linear Regression, to understand the result, just observe the weights of the different features. Such models are called White box models, while their opposite, Black box models, are hard to interpret.

Model specific & agnostic: Model-specific techniques, as the name suggests, can be used with a certain set of models & not all. On the other hand, model-agnostic techniques are universal & can work with any model.
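One well-known model-agnostic technique is permutation importance. The sketch below (with made-up data) uses scikit-learn’s permutation_importance, which only needs a fitted model & a scoring call, so any estimator can be swapped in without changing anything else.

```python
# Permutation importance is model-agnostic: it only needs a fitted model
# and a scoring call, so the estimator below can be swapped for any other.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 3))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=400)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# The larger the score drop when a feature is shuffled, the more the model relies on it.
for name, mean_drop in zip(["a", "b", "c"], result.importances_mean):
    print(f"{name}: mean score drop when shuffled = {mean_drop:.3f}")
```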

Local & Global: Local techniques explain the model’s output for a specific instance/example, while Global techniques cover every sample/instance, hence giving a global understanding of the model rather than an instance-specific one.
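To make the local vs global distinction concrete, here is a quick sketch using the earlier Linear Regression setup (again with made-up data): the coefficients give a global picture of the model, while weight times feature value explains one individual prediction.

```python
# Global vs local view of the same linear model (illustrative data).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)
model = LinearRegression().fit(X, y)

# Global: one set of weights describes the model's behaviour everywhere.
print("global weights:", np.round(model.coef_, 3))

# Local: for a single instance, each feature's contribution to that one
# prediction is weight * feature value.
x = X[0]
contributions = model.coef_ * x
print("local contributions for instance 0:", np.round(contributions, 3))
print("prediction:", round(model.intercept_ + contributions.sum(), 3))
```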

Keeping this short, I will next move on to a deeper discussion of the Interpretability of White & Black box models.

Before we wrap up, a big thanks to this great book for reference.
