Bleeding Edge Series: Explainable AI is the key to Social Acceptance of Artificial Intelligence

Brett Drury
Deeper Insights
May 28, 2019 · 6 min read

Systems that rely upon artificial intelligence to make decisions are often black boxes that produce values which are interpreted to signify a certain meaning. There is often no explanation of how the system arrived at these values. This lack of explanation may not be an issue for problems such as predicting film revenue from reviews, but it will be an impediment to the adoption of such systems in highly regulated domains such as healthcare, finance, and the military.

Explanations of how decisions are arrived at may reduce accusations of bias, and therefore reduce the possibility of legal action against private and government organisations.

There may be specific business reasons why organisations may wish to have explanations from their machine learning models. However, the academic community has deduced a number of general cases where explanations may be needed for inferences from complex machine learning methods.

The main general cases are: Explain to justify, Explain to control, Explain to improve and Explain to discover.

Adadi and Berrada state that Explain to justify typically applies to decisions or inferences that need to comply with legislation or regulation, whereas Explain to control is about determining where the weaknesses of the model lie.

Explain to improve is a strategy where the explanations about the relationships between inputs and outputs can be used to improve the model, and finally, Explain to discover is a technique that uses explanations to discover new information.

This blog post will cover, at a high level, why explainability is important to the second wave of adoption of Artificial Intelligence, along with some of the major techniques for deriving explanations from black boxes.

Interpretability v Explainability

Interpretability and Explainability are often used as synonyms within the Artificial Intelligence (AI) and Machine Learning (ML) communities, but they describe slightly different phenomena. There are a number of definitions of interpretability, with the mainstream definitions being “Interpretability is the degree to which a human can understand the cause of a decision” and “Interpretability is the degree to which a human can consistently predict the model’s result”.

In short, a reasonably trained human will be able, within a certain degree of error, to predict the output of a model if a parameter or feature in the input is changed. If this is the case, then the human will have an idea of how the system comes to a decision or an output value. Simple models, such as the Naive Bayes classifier, lend themselves to interpretation by a human. A Naive Bayes classifier is based upon Bayes’ Rule and is simple enough to be set as a question on undergraduate exams, and therefore can be counted as an interpretable model.
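
As a brief illustration, the sketch below (a minimal example, assuming scikit-learn, with a made-up toy corpus) trains a Naive Bayes text classifier and prints its learned per-class word likelihoods; because those likelihoods are exposed directly, a reader can follow by hand how the posterior for a new sentence is assembled.

```python
# Minimal sketch (assumes scikit-learn): a Naive Bayes text classifier whose
# decision can be traced by hand from its learned per-class word likelihoods.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus purely for illustration (1 = space, 0 = sport).
texts = ["the rocket reached orbit",
         "the match ended in a draw",
         "nasa launched a new satellite",
         "the striker scored twice"]
labels = [1, 0, 1, 0]

vectoriser = CountVectorizer()
X = vectoriser.fit_transform(texts)
model = MultinomialNB().fit(X, labels)

# The log-likelihood of every word under each class is exposed directly,
# so the route from input words to predicted class can be followed by hand.
for word, idx in sorted(vectoriser.vocabulary_.items()):
    print(word, model.feature_log_prob_[:, idx])

print(model.predict(vectoriser.transform(["a satellite reached orbit"])))
```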

More complex models, such as Deep Neural Networks, are considered black boxes: the model produced by the network is so complex that the aforementioned reasonably trained human is unlikely to be able to predict the outcome of changing the input parameters.

Explainability, conversely, “is the extent where the feature values of an instance are related to its model prediction in such a way that humans understand”, or more simplistically, “Why is this happening?”. Explanations may be as simple as highlighting the important words in a sentence that determined its classification. An example of this phenomenon can be found here.

The linked example demonstrates the differences in the manner in which two individual classifiers came to a decision. The Convolutional Neural Network (CNN) and the Support Vector Machine (SVM) classify the text into the same categories, but the CNN uses fewer, more relevant words. The SVM uses words such as “is”, which most humans would not associate with the space category to which the first paragraph was assigned.

This example demonstrates that a model such as a CNN may not be interpretable, but may still offer explanations of how it comes to a classification. Explainability not only offers some confidence to decision makers when accepting outputs from AI-dependent systems, but also gives some indication of the robustness of the underlying AI. For example, Adversarial Attacks can fool image classification systems into interpreting a stop sign as a speed limit sign.

Explainable AI techniques can help make AI systems more robust against this type of attack.

Interpretation Methods

The most complete survey paper in this area is the research conducted by Adadi and Berrada, who state that there are two types of interpretability: global and local. Global interpretability “facilitates the understanding of the whole logic of a model and follows the entire reasoning leading to all the different possible outcomes”, whereas local interpretability “explains the reasons for a specific decision or single prediction”, meaning that interpretability is occurring locally.

Adadi and Berrada contend that global interpretability is required when decision makers need a clear picture of the whole reasoning process. They state that this level of interpretability would be required for population-level problems such as identifying “drug consumption trends”. Their paper lists a number of techniques, such as GIRP, through which global interpretability can be achieved.
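
GIRP itself is beyond the scope of this post, but the flavour of global interpretability can be conveyed with a simpler, generic technique: a global surrogate, in which a shallow decision tree is fitted to a black-box model’s predictions so that its rules approximate the model’s overall logic. The sketch below is a hedged illustration of that idea, assuming scikit-learn, and is not the method from the paper.

```python
# Hedged sketch of a *global surrogate* (a generic technique, not GIRP itself):
# a shallow decision tree is fitted to the predictions of a black-box model so
# that its rules approximate the model's overall decision logic.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_wine(return_X_y=True, as_frame=True)

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))  # mimic the black box, not the true labels

# Fidelity: how faithfully the surrogate reproduces the black box's behaviour,
# followed by the human-readable rules it extracted.
print("fidelity:", surrogate.score(X, black_box.predict(X)))
print(export_text(surrogate, feature_names=list(X.columns)))
```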

Local interpretability is required when an interpretation of a single inference is needed. Again, Adadi and Berrada provide a comprehensive list of techniques that can supply that single interpretation, the most well-known of which is Local Interpretable Model-agnostic Explanations (LIME), which is made available as a Python library by its authors.
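
A minimal sketch of LIME explaining a single text classification might look like the following (it assumes the lime package and scikit-learn are installed; the dataset and classifier are arbitrary choices for illustration):

```python
# Minimal sketch of LIME explaining a single prediction (assumes the `lime`
# package and scikit-learn; the dataset and classifier are arbitrary choices).
from lime.lime_text import LimeTextExplainer
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

categories = ["sci.space", "rec.sport.hockey"]
train = fetch_20newsgroups(subset="train", categories=categories)

# LIME only needs a function that maps raw texts to class probabilities,
# so the underlying model is treated as a black box.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(train.data, train.target)

explainer = LimeTextExplainer(class_names=train.target_names)
explanation = explainer.explain_instance(train.data[0],
                                         pipeline.predict_proba,
                                         num_features=6)

# The words that pushed this particular document towards its predicted class.
print(explanation.as_list())
```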

Explanation Methods

Machine learning techniques can have explainability as part of their design. Adadi and Berrada provide a list of learners that were designed to be explainable such as Bayesian Rule Lists and Sparse Linear Models.
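
A sparse linear model, for instance, is explainable by design: an L1 penalty drives most coefficients to zero, and the handful that survive read directly as the explanation. The snippet below is a minimal sketch of that idea, assuming scikit-learn’s Lasso on a standard demonstration dataset.

```python
# Sketch of an explainable-by-design learner (assumes scikit-learn): a sparse
# linear model fitted with an L1 penalty, so only a few coefficients survive
# and those coefficients *are* the explanation.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_scaled = StandardScaler().fit_transform(X)

model = Lasso(alpha=1.0).fit(X_scaled, y)

# The non-zero coefficients are the model's entire decision logic.
for name, coef in zip(X.columns, model.coef_):
    if coef != 0:
        print(f"{name}: {coef:+.2f}")
```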

Feature importance is one of many methods that can be used, and the academic literature describes a wide range of others; Guidotti et al. have produced a survey article of explanation methods for black-box systems. For brevity, this post will cover Sensitivity Analysis and Partial Dependence Plots.
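
Before moving on to those two, feature importance itself can be illustrated briefly with permutation importance (a common variant chosen here for illustration, not a method drawn from the survey): each feature is shuffled in turn, and the drop in the model’s held-out score indicates how much the model relies on that feature. A minimal sketch, assuming scikit-learn:

```python
# Sketch of one common feature-importance method, permutation importance:
# shuffle each feature in turn and measure how much the held-out score drops.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)

# Features whose shuffling hurts accuracy most are the ones the model leans on.
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(X.columns[idx], round(result.importances_mean[idx], 4))
```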

Sensitivity Analysis

Sensitivity analysis “is the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be divided and allocated to different sources of uncertainty in its input”, and it can be used to “increase understanding of the relationship between input and output variables in a system or model”.

This technique was used by Shu and Zhu, whose approach perturbed the input of the training data and measured the effect on the output of the system. This allowed the authors to estimate how sensitive the model’s outputs are to its input parameters, and from these results it is possible to estimate the relationship between inputs and outputs.
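
A rough, one-at-a-time sketch of the perturbation idea (not Shu and Zhu’s exact method) is shown below, assuming scikit-learn: nudge each input feature slightly and record how far the model’s predictions move in response.

```python
# Rough one-at-a-time sketch of the perturbation idea (not Shu and Zhu's
# exact method): nudge each input feature slightly and measure how far the
# model's predictions move in response.
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

baseline = model.predict(X)
for name in X.columns:
    perturbed = X.copy()
    perturbed[name] += 0.1 * X[name].std()  # small, feature-scaled nudge
    shift = np.abs(model.predict(perturbed) - baseline).mean()
    print(f"{name}: mean prediction shift {shift:.4f}")
```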

Partial Dependence Plots

This is a visualisation technique that “shows the marginal effect one or two features have on the predicted outcome of a machine learning model”, and its aims are arguably the same as those of sensitivity analysis, because both try to infer the relationship between inputs and outputs and, from that, a general model. Partial Dependence plots are available in major machine learning libraries such as scikit-learn. Examples are shown in the following diagram.

The plots estimate the dependency of the output variable, house price, on various features such as house age (HouseAge). The plot for median income, in particular, demonstrates the relationship between the input feature and the output variable.
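
A minimal sketch of how such plots can be produced with scikit-learn, using the California housing data that plots of this kind are commonly built from:

```python
# Minimal sketch of a partial dependence plot with scikit-learn, using the
# California housing data that plots of this kind are commonly built from.
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = HistGradientBoostingRegressor(random_state=0).fit(X, y)

# Marginal effect of median income and house age on the predicted house price.
PartialDependenceDisplay.from_estimator(model, X, features=["MedInc", "HouseAge"])
plt.show()
```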

The Future?

Interpretation and explanation, roughly speaking, try either directly or indirectly to achieve the same goal: to change machine learning from alchemy, where a magical black box produces inferences that the uninitiated should trust unreservedly, into a system that provides reasons for its decisions. This area is, without a doubt, the future of machine learning and artificial intelligence systems.

Science fiction’s dystopian view of AI is unlikely to come true because systems can be held to account; the sudden homicidal tendencies of fictional future systems such as HAL should be avoidable, since human operators would be informed of the system’s malicious intent.

In the real world, the advent of explainable AI is likely to speed the adoption of AI systems in mature and highly regulated industries, where decision-makers can be assured that decisions are made by a robust system that can offer explanations for its inferences, and where poor or weak inferences can be rejected. Explainable AI will also ease the acceptance of AI making moral decisions, such as those concerning end-of-life care and job termination.

Originally published at https://www.skimtechnologies.com on May 28, 2019.

Brett Drury is a Senior Data Scientist at Skim Technologies. He holds a PhD in Computer Science from UP, was a Post-Doctoral Scholar at USP, and was a Research Fellow at NUIG, where he worked on an H2020 EU project.