Explaining Explainable AI

Towards trusting (X)AI

Saloni Rakholiya
Delta Force
6 min read · Aug 16, 2021


Be it personal assistants, automobiles, or social media, AI has taken over almost every aspect of our everyday lives, and efforts continue to be made to use it to solve fundamental everyday problems. When we ask, “Will AI take over the world?”, are we being dramatic, or is that a real possibility? It is widely predicted that AI will outperform humans at various tasks and displace a significant number of jobs in the near future. Add to that the fact that the $127 billion autonomous vehicle market is currently being driven by AI, and it is clear that AI really could take over large parts of the world around us. The question is how “ok” we are with this, and whether it can really go wrong. If you wake up one day and a video recommended to you on YouTube is not to your taste, that is fine, but imagine a self-driving car not going where you intended!? Crazy, isn’t it?

AI is being used in essential and risky domains, be it medicine, the military, or the automobile industry itself, to carry out tasks that can have significant consequences if the decisions made by AI go wrong. Basically, the question is how much we trust AI algorithms before letting them dominate our lives in a major way. AI and ML models can be broadly classified into white-box models and black-box models, a division based on how easily a human can understand their decisions. For black-box models, we can try to obtain human explanations after the fact (post hoc methods), whereas the alternative is to develop inherently interpretable models.

White-Box models

These are weaker, simpler models, such as linear regression or decision trees, that cannot always handle the full complexity of a dataset. They are easier to understand in terms of how they make decisions and are thus easily interpretable. For example, a decision tree is a supervised learning technique (for regression and classification) in which the internal nodes represent dataset features and the leaf nodes are the outputs that tell us what decision the algorithm has made.
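As a small illustration of that interpretability (a minimal sketch using scikit-learn and its built-in Iris dataset, not taken from the original article), a shallow decision tree can be dumped as a set of human-readable rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree on a small, well-known dataset
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# The fitted tree prints as plain if/else rules over the features,
# which is exactly what makes it a white-box model
print(export_text(clf, feature_names=iris.feature_names))
```

Reading the printed rules is enough to see why any individual prediction was made, something we cannot do for the models in the next section.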

Black-Box models

These are more complicated and “dangerous” models, which nevertheless show good performance and high accuracy. They include neural networks, gradient boosting models, random forests, etc., whose decisions are difficult to comprehend or work out by hand. These models come in two flavours: deep neural networks and proprietary algorithms. A deep neural network has multiple neurons in each layer that hold values computed from the previous layers, combined with a large number of parameters that are difficult to inspect manually. Proprietary algorithms, on the other hand, are deliberately kept hidden from users, although the AI experts who created them know how they work.

The Tradeoff between Accuracy and Explainability of models (source: kdnuggets)

Why Explainable AI?

So why did I term black-box models “dangerous”? Well, because these models are not understandable by humans per se, any errors or biases in them cannot be pointed out and corrected by us, whether we are users or, sometimes, even the AI experts who built them. The opacity of their decisions can let these algorithms go really wrong without us noticing it. That may be acceptable in a recommendation system or a social media feed, but the real problem arises when these models are used to make judicial decisions, medical decisions, or even to calculate credit scores. Mistakes there can cost us dearly, and thus we need transparency in these models. By transparency, we mean knowing not just the decision but “how” the model arrived at it.

If we want to take prescriptions from AI, we need to be able to trust it, which is why making AI explainable is important. We also have a right to know the how and why behind AI decisions that will affect us in one way or another. Explainable AI helps us confirm that an algorithm is working correctly, improve algorithms when they go wrong or suffer from bias and errors, justify their decisions, detect overfitting and underfitting, and prevent harmful consequences.

The interpretability and explainability of AI and ML models are often debated. Interpretability refers to knowing the cause-and-effect relationships in the system; it’s like knowing you are most likely to catch a cold if you eat ice cream in winter. Explainability, on the other hand, refers to the “how” of the outputs; it’s like knowing why and how we fall sick when we eat ice cream in winter. To clear this up: explainable AI needs models to be both interpretable and explainable.

Why explainable AI ? (Source)

Existing XAI Approaches:

Though explainable AI approaches can be classified by the scope of the explanation, the data type, and model agnosticism, we will look at a categorisation based on the type of explanation. Some of the traditional methods used to make AI more human-understandable are:

  • Visual explanation (visualisation of the data and of how the model works)
  • Exploratory data analysis (example: data clustering)
  • Model evaluation metrics (example: F1 score, precision, and recall; see the short sketch below)
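As a quick illustration of that last point (a minimal sketch using scikit-learn on made-up labels, not from the original article), these metrics take only a few lines to compute:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical ground-truth labels and model predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# These are global summary numbers: they tell us how well the model
# performs, but not *why* it makes the predictions it does
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
```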

The above-stated methods are very generic and have limited power to interpret an algorithm after deployment, where changes in the environment, noise, and data features can degrade performance. Modern XAI interpretation techniques and libraries have therefore been built to make AI more explainable. Some of them are listed below; they use techniques such as feature importance, dependence plots, and surrogate model creation:

LIME (Local Interpretable Model-Agnostic Explanations)

This technique can be applied to any model (it is model-agnostic) to study how the input data leads to a prediction. It tweaks a single data sample (changing its feature values) and studies how that affects the output. LIME then explains why the decision was made in terms of the sample’s features and how each feature contributed to the result.
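Here is a minimal sketch of what that looks like for tabular data, using a random forest on the Iris dataset purely for illustration (the model and data are my assumptions, not from the article):

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# A small black-box model to explain
iris = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(iris.data, iris.target)

# The explainer is given the training data so it knows realistic
# feature ranges to use when perturbing a sample
explainer = LimeTabularExplainer(
    iris.data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    mode="classification",
)

# Explain one prediction: LIME perturbs this sample, queries the model,
# and fits a simple local surrogate model around it
explanation = explainer.explain_instance(
    iris.data[0], model.predict_proba, num_features=4
)
print(explanation.as_list())  # (feature rule, contribution) pairs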

XAI using LIME (Source)

ELI5 (Explain Like I am 5)

A Python package that provides explanations and helps debug ML classifiers. It offers built-in functions that can be called to inspect model parameters (a global view) as well as individual predictions (a local view).
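A small sketch of typical ELI5 usage on a scikit-learn classifier (again on Iris for illustration; show_weights renders HTML in a notebook, so the text formatter is used here):

```python
import eli5
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

# Global view: which features carry the most weight in the model
print(eli5.format_as_text(
    eli5.explain_weights(clf, feature_names=iris.feature_names)
))

# Local view: why the model predicted what it did for one sample
print(eli5.format_as_text(
    eli5.explain_prediction(clf, iris.data[0], feature_names=iris.feature_names)
))
```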

SHAP (SHapley Additive exPlanations)

It is also a Python package, one that takes a game-theoretic approach to explaining a model’s output. We read the model through its SHAP values, which quantify how much each feature value pushes a prediction above or below a baseline (expected) value.
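A sketch of the typical workflow for a tree-based model (the regressor and dataset are chosen here for illustration only):

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# A small tree-ensemble regressor to explain
data = load_diabetes()
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# TreeExplainer computes SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data)

# Each SHAP value is one feature's contribution to pushing a prediction
# above or below the baseline (the model's average output)
shap.summary_plot(shap_values, data.data, feature_names=data.feature_names)
```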

Tensorflow’s What-If tool (WIT)

WIT is used to probe deployed black-box ML classification and regression models. It provides an interface for analysing model features, large datasets, and slices of those datasets in visual form, requiring no code once it is running.

Visualization of data by WIT- Tensorflow (Source)
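Getting the widget running in a notebook does take a little setup. The sketch below is my own assumption of a minimal configuration (the features, labels, and placeholder predict function are all hypothetical), based on WIT’s custom-prediction hook:

```python
# Runs inside a Jupyter/Colab notebook where the widget can render.
import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

# Wrap each data row as a tf.Example, the format WIT expects
def make_example(features, label):
    feats = {
        name: tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
        for name, value in features.items()
    }
    feats["label"] = tf.train.Feature(int64_list=tf.train.Int64List(value=[label]))
    return tf.train.Example(features=tf.train.Features(feature=feats))

examples = [
    make_example({"age": 25.0, "income": 40000.0}, 0),  # hypothetical rows
    make_example({"age": 52.0, "income": 90000.0}, 1),
]

# Any callable that maps examples to class scores can be plugged in,
# which is what lets WIT treat the model as a black box
def predict_fn(examples_to_score):
    return [[0.3, 0.7] for _ in examples_to_score]  # placeholder scores

config = WitConfigBuilder(examples).set_custom_predict_fn(predict_fn)
WitWidget(config, height=600)
```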

Conclusion

All I can say at the end is that artificial intelligence needs to be blended well with human intelligence to produce better and safer results, ensuring that when Sophia said, “Ok, I will destroy humans” in her interview at the SXSW technology conference, she doesn’t actually end up doing it. Efforts need to be made to make all the AI models around us more explainable, so that we are kept in the loop about how decisions are made, and we trust AI enough to accept it and let it affect our lives.
