Explainable Artificial Intelligence: Technical Perspective — Part 3

Sparsha Devapalli
7 min read · Aug 14, 2020


This article is part of a series. You can check out the earlier instalments here — Part 1, Part 2.

In the previous articles, we explored the context of Explainable AI, its challenges, and both ante-hoc and post-hoc techniques. In this article, we expand our horizon further with more post-hoc techniques and emerging methodologies implemented for deep learning frameworks.

SHAP:

Lundberg and Lee proposed SHAP (SHapley Additive exPlanations), a unified framework for generating post-hoc local explanations in the form of additive feature attribution.

SHAP reveals the relative contribution of input features through additive feature attributions. It borrows the reward-sharing scheme for cooperative players from game theory (Shapley values) and uses it to unify the outputs of several existing explanation methods.

SHAP Approach

The framework uses Shapley values to estimate the importance of each input feature for a given instance prediction. Each feature's marginal contribution is averaged over every possible ordering in which the "players" (features) could have been added to the game.
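
As a toy illustration of this averaging (a minimal sketch in plain Python, not the SHAP library itself; the value function v below is entirely made up), the snippet computes exact Shapley values for a three-feature "game" by enumerating every ordering of the players:

```python
from itertools import permutations

# Toy value function: v(S) is the payoff obtained when only the features
# in S are "present". The numbers here are made up for illustration.
def v(S):
    payoff = {(): 0.0, ('a',): 10.0, ('b',): 20.0, ('c',): 5.0,
              ('a', 'b'): 40.0, ('a', 'c'): 18.0, ('b', 'c'): 28.0,
              ('a', 'b', 'c'): 50.0}
    return payoff[tuple(sorted(S))]

players = ['a', 'b', 'c']
n_orderings = len(list(permutations(players)))
shapley = {p: 0.0 for p in players}

# Average each player's marginal contribution over every ordering
# in which the players could have joined the coalition.
for order in permutations(players):
    coalition = []
    for p in order:
        before = v(coalition)
        coalition.append(p)
        shapley[p] += (v(coalition) - before) / n_orderings

print(shapley)  # the contributions sum to v(all players) = 50.0
```

In practice this enumeration is intractable for real feature sets, which is why SHAP relies on approximations such as Kernel SHAP and model-specific variants like Tree SHAP.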

Skater:

Skater is a unified framework designed to demystify the learned structures of a black-box model, something that real-world use cases often demand.

Interpretable ML model using Skater

An interpretable ML system built with Skater enables humans to reason about generalisation error and make better, more confident predictions. Skater allows model interpretation both globally and locally by leveraging, and improving upon, a combination of existing techniques.

Summarizing Skater

For global explanations, Skater currently makes use of model-agnostic feature importance and partial dependence plots to comprehend the model.
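
Skater's own API is not shown here; as an illustrative stand-in, the sketch below produces the same two global views, permutation feature importance and partial dependence, using scikit-learn's model-agnostic tools on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance, partial_dependence

# Synthetic data and a black-box model to interrogate.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Global view 1: model-agnostic (permutation) feature importance.
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i in imp.importances_mean.argsort()[::-1]:
    print(f"feature {i}: importance {imp.importances_mean[i]:.3f}")

# Global view 2: partial dependence of the prediction on feature 0.
pd_result = partial_dependence(model, X, features=[0], kind="average")
print(pd_result["average"][0][:5])  # averaged predictions along the grid
```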

DeepLIFT:

DeepLIFT (Deep Learning Important FeaTures) is a method that compares the activation of each neuron to its ‘reference activation’ and assigns contribution scores according to the difference.

Because it gives separate consideration to positive and negative contributions, it can reveal dependencies that are missed by other approaches. Scores can be computed efficiently in a single backward pass.

In genomics, for example, DeepLIFT has been used to identify combinatorial grammars of DNA "words" that define tissue-specific control elements.

It works through a form of backpropagation: it takes the output and attempts to pull it apart by "reading" the various neurons that contributed to producing it.

Essentially, it is a way of digging back into the feature attribution happening inside the algorithm.
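
As a hedged sketch, assuming Captum's DeepLift implementation and using a toy PyTorch model defined purely for illustration, the contribution scores relative to an all-zeros reference input can be obtained as follows:

```python
import torch
import torch.nn as nn
from captum.attr import DeepLift

# Toy network standing in for a real model (illustrative only).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

x = torch.randn(1, 4)             # input to explain
baseline = torch.zeros(1, 4)      # the 'reference activation' input

# Contribution of each input feature to the score of class 1, measured
# against the reference, computed in a single backward pass.
attributions = DeepLift(model).attribute(x, baselines=baseline, target=1)
print(attributions)
```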

Rulex Explainable AI:

Rulex is a company that builds predictive models in the form of first-order conditional logic rules that can be immediately understood and used by anyone.

Rulex’s core machine learning algorithm, the Logic Learning Machine (LLM), works in an entirely different way from conventional AI. The product is designed so that it produces conditional logic rules that predict the best decision choice, in plain language that is immediately clear to process professionals.

The LLM has been employed in different fields, including orthopaedic patient classification, DNA microarray analysis and clinical decision support systems.

The three steps of Logic Learning Machine

The Rulex LLM's performance has been compared with that of other supervised methods, namely Decision Trees (DT), Artificial Neural Networks (ANN), Logistic Regression (LR) and K-Nearest Neighbours (KNN). These tests showed that the Rulex LLM's results are better than those of ANN, DT (which also produce rules) and KNN, and are comparable with those of LR.

Layer-wise Relevance Propagation:

This approach is based on the principles of redistribution and conservation. Given an image and the model's predicted class scores, the prediction is redistributed backwards to the input pixels, layer by layer, so that the total relevance is conserved.


By going backwards through a deep CNN in this way, we can determine the relevance of individual inputs and features, and infer which pixel-level details of the image most significantly informed the model's choice.
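
A minimal NumPy sketch of this redistribution for a single dense layer (the so-called epsilon rule, with made-up activations and weights; real implementations apply it layer by layer from the output back to the pixels) looks like this:

```python
import numpy as np

def lrp_dense_epsilon(a, W, R_out, eps=1e-6):
    """Redistribute the relevance R_out of a dense layer's outputs back to
    its inputs a, approximately conserving the total relevance."""
    z = a @ W + eps * np.sign(a @ W)   # forward contributions, stabilised
    s = R_out / z                      # relevance share per output unit
    return a * (W @ s)                 # relevance assigned to each input

# Made-up example: 3 inputs, 2 outputs.
a = np.array([1.0, 2.0, 0.5])          # input activations
W = np.array([[0.3, -0.2],
              [0.1,  0.4],
              [-0.5, 0.2]])            # weights (inputs x outputs)
R_out = np.array([0.7, 0.3])           # relevance arriving from the layer above

R_in = lrp_dense_epsilon(a, W, R_out)
print(R_in, R_in.sum())                # sums roughly to R_out.sum()
```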

BETA:

BETA (Black Box Explanations through Transparent Approximations) is a novel model-agnostic framework, closely connected to Interpretable Decision Sets (if-then rules).

BETA learns a compact two-level decision set in which each rule explains part of the model behaviour unambiguously.

It uses an objective function so that the learning process is optimised for high fidelity (high agreement between explanation and the model), low unambiguity (little overlap between decision rules in the explanation), and high interpretability (the explanation decision set is lightweight and small). These aspects are combined into one objective function (an NP-hard problem) to optimise.
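
As a simplified illustration (not the authors' optimiser; the rule, data and helper below are all made up), the fidelity and coverage of one candidate two-level rule can be scored against the black-box predictions like this; BETA then searches over many such rules to optimise the combined objective:

```python
import numpy as np

def rule_fidelity(X, blackbox_preds, subspace, condition, label):
    """Fraction of covered points where the rule agrees with the black box.

    `subspace` and `condition` are boolean functions over a row, `label` is
    the rule's predicted class. All names here are illustrative.
    """
    covered = np.array([subspace(x) and condition(x) for x in X])
    if covered.sum() == 0:
        return 0.0, 0.0
    agreement = (blackbox_preds[covered] == label).mean()
    coverage = covered.mean()
    return agreement, coverage

# Made-up data and rule: "if age < 30 and income > 50 then predict 1".
X = np.array([[25, 60], [40, 80], [22, 55], [35, 30]])   # [age, income]
blackbox_preds = np.array([1, 0, 1, 0])                   # black-box outputs
fidelity, coverage = rule_fidelity(
    X, blackbox_preds,
    subspace=lambda x: x[0] < 30,      # subspace (neighbourhood) descriptor
    condition=lambda x: x[1] > 50,     # decision logic rule
    label=1)
print(fidelity, coverage)              # full agreement on half of the data
```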

Integrated Gradients:

Integrated Gradients is a variation on computing the gradient of the prediction output w.r.t. features of the input. It requires no modification to the original network, is simple to implement, and is applicable to a variety of deep models (sparse and dense, text and vision).

Integrated Gradients, an Aumann-Shapley method from cooperative game theory, is the unique path-integral method satisfying Sensitivity, Insensitivity (to irrelevant features), Linearity preservation, Implementation Invariance, Completeness, and Symmetry.

Integrated Gradients on an image

Some outstanding attributes include:

  • Easy to implement — gradient calls on a batch, no instrumentation of the network, no new training.
  • Widely applicable — used by 20+ product teams and 3 ML frameworks at Google.
  • Backed by an axiomatic guarantee.
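
A minimal sketch of the underlying computation, a Riemann-sum approximation of the straight-line path integral written against a toy PyTorch model rather than any particular library, is shown below:

```python
import torch
import torch.nn as nn

# Toy model standing in for a real network (illustrative only).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

def integrated_gradients(model, x, baseline, target, steps=50):
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i, evaluated
    along the straight line from the baseline x' to the input x."""
    alphas = torch.linspace(0.0, 1.0, steps)
    grads = torch.zeros_like(x)
    for alpha in alphas:
        point = (baseline + alpha * (x - baseline)).clone().requires_grad_(True)
        score = model(point)[0, target]
        score.backward()
        grads += point.grad
    avg_grad = grads / steps
    # Completeness: attributions approximately sum to F(x) - F(baseline).
    return (x - baseline) * avg_grad

x = torch.randn(1, 4)
baseline = torch.zeros(1, 4)
attributions = integrated_gradients(model, x, baseline, target=1)
print(attributions, attributions.sum())
```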

Activation Atlases:

Google, in collaboration with OpenAI, came up with Activation Atlases — a novel technique aimed at visualising how the neurons in a neural network interact with each other and how the concepts they represent mature with the depth of the layers.


The approach was developed to look at the inner workings of convolutional vision networks and to derive a human-interpretable overview of the concepts within the hidden layers of a network.

What-if Tool:

Google’s TensorFlow team announced the What-If Tool, an interactive visual interface released under the PAIR (People + AI Research) initiative and designed to help visualise datasets and better understand the output of TensorFlow models. The tool can be accessed through TensorBoard or as an extension in a Jupyter or Colab notebook.

Once a model has been deployed, its performance can be viewed on a dataset in the What-If tool.


Additionally, one can slice the dataset by features and compare performance across those slices, identifying subsets of data on which the model performs best or worst, which can be very helpful for ML fairness investigations.
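
A rough notebook sketch is shown below; it assumes the witwidget package's WitConfigBuilder and WitWidget interface, plus a hypothetical predict_fn and toy features, and exact method names may vary across versions:

```python
# Inside a Jupyter/Colab notebook (assumes `pip install witwidget`).
import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

def make_example(age, income, label):
    """Pack one record into a tf.train.Example, the format WIT expects."""
    return tf.train.Example(features=tf.train.Features(feature={
        "age": tf.train.Feature(float_list=tf.train.FloatList(value=[age])),
        "income": tf.train.Feature(float_list=tf.train.FloatList(value=[income])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))

examples = [make_example(25.0, 60.0, 1), make_example(40.0, 30.0, 0)]  # toy data

def predict_fn(examples_batch):
    # Hypothetical wrapper around a deployed model: return one
    # [p_class0, p_class1] list per example.
    return [[0.3, 0.7] for _ in examples_batch]

config = WitConfigBuilder(examples).set_custom_predict_fn(predict_fn)
WitWidget(config, height=600)   # renders the interactive What-If UI
```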


AIX 360:

The AI Explainability 360 (AIX360) toolkit is an open-source Python package developed by IBM for the interpretability and explainability of machine learning models. It includes a comprehensive set of algorithms that cover different dimensions of explanations, along with proxy explainability metrics.

AI Explainability 360 Usage Diagram

The framework is developed with algorithms for case-based reasoning, directly interpretable rules, post hoc local explanations, post hoc global explanations, and more.

SOCRAT Approach:

The Structured-Output Causal Rationalizer (SOCRAT) interprets the predictions of any black-box structured-input, structured-output model around a specific input-output pair. The framework focuses on a general approach to sequence-to-sequence problems, adopting a variational auto-encoder to yield meaningful input perturbations.

To explain a complex prediction of a sequence-to-sequence model, e.g. a language translator, SOCRAT uses a bipartite graph between input and output tokens as the explanation family and proceeds in three steps:

  • First a perturbation model is used to obtain perturbed versions of the input sequence.
  • Next, associations between the input and the predicted sequence are inferred using a causal inference model (see the sketch after this list).
  • Finally, the obtained associations are partitioned and the most relevant sets are selected.
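
As a toy sketch of the association step (a drastic simplification that uses raw co-occurrence counts instead of the paper's causal model; all sentences below are made up):

```python
from collections import Counter
from itertools import product

# Made-up perturbations of a source sentence and the model's outputs.
perturbed_pairs = [
    ("the cat sat", "le chat s'est assis"),
    ("the dog sat", "le chien s'est assis"),
    ("a cat sat",   "un chat s'est assis"),
]

# Count how often each (input token, output token) pair co-occurs; in the
# real method these associations are estimated with a causal inference model.
counts = Counter()
for src, tgt in perturbed_pairs:
    for s_tok, t_tok in product(src.split(), tgt.split()):
        counts[(s_tok, t_tok)] += 1

# The strongest associations form the edges of the bipartite explanation graph.
for (s_tok, t_tok), c in counts.most_common(5):
    print(f"{s_tok} -> {t_tok}: {c}")
```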

Other Approaches:

Newer methods from academia for explaining deep learning models include:

  • Testing with Concept Activation Vectors (TCAVs)
  • Activation Maximization (AM)
  • Intelligible Additive Models

Conclusion:

Explanation is an omnipresent factor in human reasoning, and model interpretability continues to be an interesting challenge in machine learning. The biggest problem is that there isn't a quantitative way to measure whether one explanation is better than another.

The final goal of Explainable AI (XAI) is to have a toolkit library of ML and HCI (Human-Computer Interaction) modules for more understandable AI implementations. When a decision output falls into a gray area, XAI aims to give the analyst the reasons behind any red flags, upon which the analyst can act to reach a decision with human input.

References:

1) Rulex Analytics white paper, “Rulex’s Logic Learning Machines successfully meet biomedical challenges”.

2) Explaining nonlinear classification decisions with deep Taylor decomposition, ScienceDirect Paper

3) Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec, Interpretable & Explorable Approximations of Black Box Models — BETA

4) Mukund Sundararajan, Ankur Taly, Qiqi Yan, Proceedings of International Conference on Machine Learning (ICML), 2017 — Axiomatic Attribution for Deep Networks — Integrated Gradients

5) Avanti Shrikumar, Peyton Greenside, Anshul Kundaje, “Learning Important Features Through Propagating Activation Differences”, Proceedings of the 34th International Conference on Machine Learning, Volume 70, JMLR.org, 2017 — DeepLIFT

6) Chris Olah & Ludwig Schubert, Exploring Neural Networks with Activation Atlases — OpenAI

7) David Alvarez-Melis and Tommi S. Jaakkola, MIT — A causal framework for explaining the predictions of black-box sequence-to-sequence models

8) Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, Jeff Clune — Synthesizing the preferred inputs for neurons in neural networks via deep generator networks

9) Google Solutions — Google AI Blog

10) Using the ‘What-if Tool’ to investigate machine learning models — Kdnuggets

11) Explainable AI — KDnuggets

12) Introducing AI Explainability 360 — IBM Research Blog
