Dark AI and the Promise of Explainability (Part I)

Sheldon Fernandez
Published in DarwinAI
Mar 3, 2020


AI’s ‘black box’ may be the greatest business and societal risk of our time. Here’s how we bring it to light.


Perhaps the easiest way to appreciate the black box problem in AI is through a provocative thought experiment. Consider a major scientific advancement and remove the foundational knowledge that enabled it:

  • In Boston in 1876, Alexander Graham Bell queries his assistant, Thomas Watson, using an enigmatic contraption on his desk. The machine appears to transmit sound, but Bell isn’t quite sure how it works.
  • At Kitty Hawk in 1903, the Wright brothers chance upon a strange and unwieldy metallic structure. Its design and aerodynamic properties are a mystery, but through much trial and error, they’re flying…
  • In 1947 in Murray Hill, New Jersey, William Shockley, John Bardeen, and Walter Brattain discover a material with fascinating electrical properties. The physics of the device escape them, but the transistor, as they’ll later christen it, quickly becomes the foundation of modern computer systems.

In such cases, the euphoria and wonder of the moment would likely be offset by a nagging question: could we leverage something with only a precarious understanding of how it works, where the phenomenon is incomprehensible and the science is opaque? What are the implications when a step-changing technology reaches critical mass but its underpinnings are a mystery?

Such is the current state of Artificial Intelligence.

In the last few years, we’ve entered the era of deep learning, a powerful and intimidating set of technologies that is transforming organizations and industries alike. Automotive, aerospace, and healthcare are but some of the verticals touched by its capabilities, and lines of business have begun outsourcing important decisions to these mysterious but intelligent systems.

The ‘black box’ problem that plagues AI — our inability to peek inside exotic neural networks and understand how they work — represents one of the most urgent moral and business imperatives of our time. In these two posts, we’ll provide a brief history of ‘explainability’, examine leading approaches to the problem, and introduce our unique method of illuminating AI’s black box to enable the design of AI solutions you can trust.

In the spirit of the Wright Brothers, buckle up and enjoy the flight…

Those intimately familiar with present explainability techniques and their existing limitations may consider jumping to Part II of this post.

A Brief History of Explainability

The question of how deep neural networks, complex constructions loosely modeled on the human brain, make their decisions has plagued both researchers and enterprises. While neural networks behave in ways that reflect the data they’re trained on and the human labelers who annotate that data (incorporating selection and human biases along the way), it remains unclear how they reach particular conclusions.

The consequences are far from trivial.

Explainability is a Business Imperative

“Explainability”, the ability to illustrate how AI’s black box works, has important implications for the reliability, efficacy, and ethics of deep learning. If your organization leverages deep learning, these implications extend to your business as well, including the ability to design AI solutions in a rapid, robust, and ethical manner. Moreover, businesses subject to regulatory and compliance guidelines may be unable to apply deep learning solutions until the explainability nut is cracked.

Explainability, then, has the capacity to both unlock and amplify the potential of deep learning. By understanding how AI models work, we can design AI solutions to satisfy key performance indicators, correct errors, and mitigate bias. In the main, explainability solutions should:

  • Remove subjectivity, to minimize the role of human interpretation and intuition
  • Apply to industry and commercial contexts, rather than strictly academic settings
  • Be direct, global, stable, and verifiable, to maximize utility and the scope of the explanations provided

Unfortunately, the assessment of explainability methods within the nascent deep learning field has been limited to date, with most evaluations focusing on subjective visual interpretations.

Existing Approaches to Explainability

At present, there are four major mechanisms that attempt to understand the inner workings of a deep neural network (longer explanations are available in this excellent resource):

  • Perturbation: Creates perturbations of desired explanation drivers, analyzes their impact on a particular target, and assigns an importance score to the driver under examination (a rough code sketch of this idea follows the list)
  • Backwards Propagation: Generates importance scores in terms of input features by going backwards through the model, layer by layer, to estimate the contribution of neurons in each previous layer
  • Proxy: Attempts to imitate the performance of a complex model like a deep neural network with a simpler, more explainable model, such as a decision tree or set of decision rules
  • Activation Optimization: Searches for input patterns which maximize (or minimize) the response for a target inner component of the model being examined
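
To make the first mechanism concrete, here is a minimal sketch of a perturbation-style importance score. It assumes only a generic `predict_proba` function and a single input example; the names and the zero baseline are illustrative placeholders rather than any particular published method.

```python
# Minimal sketch of the perturbation idea: occlude each input feature in turn
# and measure how the model's confidence in its original prediction changes.
# `predict_proba` and `x` are placeholders: any classifier exposing class
# probabilities and a 1-D NumPy array of features would do.
import numpy as np

def perturbation_importance(predict_proba, x, baseline=0.0):
    """Score each feature by the drop in predicted probability when it is
    replaced with a baseline value (a crude, illustrative importance measure)."""
    probs = predict_proba(x[np.newaxis, :])[0]
    target = int(np.argmax(probs))           # class the model originally picks
    base_score = probs[target]

    scores = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        x_perturbed = x.copy()
        x_perturbed[i] = baseline            # "remove" feature i
        new_score = predict_proba(x_perturbed[np.newaxis, :])[0][target]
        scores[i] = base_score - new_score   # large drop => important feature
    return scores
```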

Within the public domain, four techniques from the research community leverage the mechanisms above: LIME, SHAP, Integrated Gradients, and Expected Gradients.

These are the primary techniques in use today, and while they’ve advanced the field of explainability considerably, it is important to understand their drawbacks.

The techniques above exhibit limitations owing to their academic origins. Primarily, they were designed, tested, and evaluated in controlled scenarios to explore potential solutions to the problem.

Numerous organizations have released toolboxes based on these methods, extending them and attempting to apply them in commercial contexts with sharply different use-cases. Before employing these platforms and introducing them into your business, it’s important to understand the challenges that arise when they are moved to enterprise environments, since in many cases they’re dated techniques that have been taken off the shelf, dusted off, and oversold by tech startups.

Limitation #1: Some are indirect approaches

Some of these methods, such as LIME, are proxy methods that take an indirect approach to explainability. Proxies are popular because they’re easier to understand and manipulate than the complicated models they represent. Specifically, they treat the model like a black box and probe it, studying input and output combinations in order to build a simpler, more interpretable proxy model from which to extrapolate plausible explanations.
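
As a rough illustration of the proxy idea (a global surrogate rather than LIME itself), the sketch below fits a shallow decision tree to a black-box model’s predictions and reports how faithfully it imitates them; `black_box_predict` and `X` are placeholder names for any prediction function and feature matrix.

```python
# Rough illustration of the proxy approach: train a shallow decision tree to
# imitate a black-box model's predictions, then inspect the tree instead of the
# original network. All names here are placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def fit_surrogate(black_box_predict, X, max_depth=3):
    """Fit an interpretable decision tree to the black box's outputs on X."""
    y_proxy = black_box_predict(X)                  # labels produced by the black box
    surrogate = DecisionTreeClassifier(max_depth=max_depth)
    surrogate.fit(X, y_proxy)
    fidelity = (surrogate.predict(X) == y_proxy).mean()  # how well the proxy imitates
    return surrogate, fidelity

# Example usage (assuming `model` is a trained classifier and `X` a feature matrix):
# surrogate, fidelity = fit_surrogate(model.predict, X)
# print(f"fidelity: {fidelity:.2f}")
# print(export_text(surrogate))
```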

Such approaches are valid when there’s no way to interrogate a system’s inner workings, but they fall short when the goal is to truly and objectively understand the internal mechanisms at play. Moreover, one cannot be sure that the decision-making process of the proxy model reflects that of its authentic and more complex counterpart.

Limitation #2: Some lack stability

Some of these approaches (e.g., LIME, SHAP) formulate their conclusions by examining only a local part of the network’s decision region and then build a linear, interpretable proxy model to approximate the behavior in that region.

Such local explanations are useful for describing how a model classifies a single instance but fall short of explaining global behavior — especially in complex, non-linear systems. Consequently, the explanations they produce can be unstable: querying for an explanation based on similar inputs often produces different, and sometimes contrasting, answers.
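
As a toy illustration of this instability, the sketch below builds a LIME-style local surrogate from random perturbations around a single instance; it assumes a binary classifier with a `predict_proba` method, and all names are placeholders. Re-running it with a different random seed can produce noticeably different coefficients for the very same input.

```python
# Toy LIME-style local surrogate: sample points near x, weight them by
# proximity, and fit a weighted linear model as the "explanation".
# Assumes a binary classifier; `predict_proba` and `x` are placeholders.
import numpy as np
from sklearn.linear_model import Ridge

def local_linear_explanation(predict_proba, x, n_samples=500, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Sample perturbations in a small neighbourhood of x.
    X_local = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y_local = predict_proba(X_local)[:, 1]             # probability of class 1
    # Weight samples by proximity to x (closer points matter more).
    weights = np.exp(-np.linalg.norm(X_local - x, axis=1) ** 2 / (2 * scale ** 2))
    surrogate = Ridge(alpha=1.0).fit(X_local, y_local, sample_weight=weights)
    return surrogate.coef_                              # local feature "importances"

# coef_a = local_linear_explanation(model.predict_proba, x, seed=0)
# coef_b = local_linear_explanation(model.predict_proba, x, seed=1)
# Comparing coef_a and coef_b for the same x shows how much the explanation drifts.
```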

Global explanations, in contrast, attempt to explain predictions for entire classes of inputs, which is much more conducive to the needs of commercial users.

Limitation #3: Some lack verifiability

Due to the subjective nature of these approaches, it can be challenging to directly act upon the explanations they provide. What’s more, it can be difficult to quantify how accurate these approaches are — a necessity in reducing the subjectivity that results from human interpretation.

Limitation #4: They are open to interpretation

The outputs of some methods are often not quantitative in nature.

For example, both Integrated Gradients and Expected Gradients produce a large number of heatmaps and leave it to the user to derive their own conclusions about the factors behind a decision, introducing considerable subjectivity into the resulting explanation.
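
For reference, Integrated Gradients attributes a prediction to input features by averaging gradients along a path from a baseline to the input (Sundararajan et al., 2017). The sketch below approximates that path integral with a Riemann sum, assuming a `grad_fn` callable (supplied in practice by a framework such as PyTorch or TensorFlow) that returns the gradient of the target class score with respect to the input; the per-feature attributions it returns are the values typically rendered as heatmaps.

```python
# Sketch of the Integrated Gradients computation, approximated with a Riemann
# sum. `grad_fn(x)` is assumed to return the gradient of the target class score
# with respect to the input x; the all-zero baseline is a common default.
import numpy as np

def integrated_gradients(grad_fn, x, baseline=None, steps=50):
    if baseline is None:
        baseline = np.zeros_like(x)
    # Interpolate along the straight path from the baseline to the input.
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    avg_grad = grads.mean(axis=0)             # Riemann approximation of the path integral
    return (x - baseline) * avg_grad          # attribution per input feature
```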

SHAP likewise produces heatmaps representing positive and negative factors, again demanding user interpretation since it’s difficult to identify the precise contribution of each factor to the resulting decision.

It is important to remember that neural networks are trained to minimize prediction errors on training data rather than aligning to human concepts and intuitions. Consequently, there’s no guarantee that a model’s decisions will be made in a manner that is conducive to human interpretation. Moreover, human experience varies from person to person, and this experience provides the context in which heatmaps and other explanation outputs are interpreted by individuals. Given this variability, two users might examine the same heatmap and come to vastly different conclusions about what it’s communicating.

Benchmarking Explainability by Quantifying Performance

To their credit, academics rarely make far-reaching claims about their techniques; the majority are fully aware of and forthcoming about the inherent limitations of their approaches and the ways in which the research might be improved. In fact, their papers often devote significant space to detailing the precise conditions under which their methods are applicable and ways they can be extended.

However, in the rush to gain commercial advantage, many vendors are co-opting these methods in light of internal and competitive pressures. The result is that academic techniques are being integrated into enterprise technology stacks despite known and often significant limitations.

The dangers of these risky extrapolations have yet to be fully appreciated, as there has been scarce quantitative analysis of the utility and limitations of existing explainability techniques.

Until now…

[Continue to Part II]
