The Utility of Interpretable AI

Ian Moura
Human-Machine Collaboration
9 min read · Dec 10, 2019

With growing awareness of issues related to technological opacity, it is unsurprising that terms like “interpretability” and “transparency” are becoming increasingly ubiquitous in conversations about artificial intelligence. As algorithms and models become more and more prevalent in decision-making, particularly in scenarios where responsibility for these decisions has previously belonged to human actors, there is understandable and justified concern about the degree to which humans can understand the process by which a conclusion is reached. However, a significant proportion of research remains focused on creating explanations for opaque models, rather than on developing models that are inherently interpretable. With the resurgence of interest in artificial neural networks over the past decade, and their application to an ever broader array of scenarios, the need to understand the limitations of models that are uninterpretable to their human designers and users is only becoming clearer.

Artificial neural networks are not a new concept. They have their origins in simple models of biological neurons built from electrical circuits in the 1950s, and consequently are more representative of the then-current conception of how neurons work than of actual processes occurring in the brain. What has changed is that advances in technological capability, along with the increasing availability of very large datasets (and ever cheaper capacity to store and manipulate them), have only recently made their widespread use feasible. The achievements of neural networks in the past five years alone include beating Lee Sedol, one of the world’s top-ranked Go players, in 4 out of 5 games, and substantial research has gone into developing new ways to train them and into identifying an expanding array of problems to which they can be applied. However, despite the excitement (and, arguably, hype) that they have generated, neural networks have significant shortcomings. Not least of these is the fact that they are fundamentally opaque, “black-box” models.

In a 2019 paper, Manuel Carabantes notes the multiple forms that technological opacity can take. In some cases, it arises from intentional concealment of algorithms, even when those algorithms are not actually black-boxes. For example, the COMPAS recidivism prediction system is opaque because it is proprietary; Cynthia Rudin has noted that documentation for the COMPAS model suggests that it would be an interpretable predictive model if it were not considered a trade secret. There are clear issues with allowing the concealment of such models as COMPAS — in particular, Rudin notes that the quality of individual predictions obtained from these models is not necessarily the responsibility of the companies which profit from them — but their opacity is an issue of secrecy, rather than a definitive aspect of their architecture.

A second type of opacity results from technological illiteracy. Specifically, given that writing and understanding code are specialized skills, and that understanding models typically requires some degree of statistical familiarity, some technology is opaque to some audiences due to a lack of knowledge in specialized fields. This form of opacity is perhaps the most easily remedied, though as Carabantes points out, people are not likely to acquire information unless the expected benefits outweigh the expected costs. Given the potential costs — in terms of money, time, effort, etc. — of acquiring expertise in fields like computer science, it seems unlikely that any model can be expected to be fully transparent to all of society. However, this type of opacity is not behind the black-box nature of neural networks.

The final type of opacity Carabantes describes — which is most relevant to artificial neural networks — occurs because of cognitive mismatch. In the case of neural networks, the cognitive mismatch is the result of functions which are too complex for human comprehension (for example, due to their tendency to be highly recursive). Cognitive mismatch also occurs in the case of systems which are so large that no single engineer involved in their development can have a detailed understanding of the entire program. Both of these causes of cognitive mismatch are well-documented, with concerns about their impacts having been raised for several decades.

In the case of opacity from cognitive mismatch, explainability or interpretability may resolve some of these issues. Though the terms are sometimes used interchangeably, they describe two different approaches to resolving opacity. Explainability reduces opacity by creating explanations of what is happening in a model; this may mean showing the input or output data, the model itself, or the algorithm, and generally involves post-hoc analysis or methods used to understand the predictions derived from a trained model. Conversely, interpretability reduces opacity by using models which are inherently transparent or understandable. In a 2017 paper, Finale Doshi-Velez and Been Kim define interpretability as “the ability to explain or to present in understandable terms to a human.” For example, interpretability can be achieved for a recommendation model by showing the rules which are used (assuming that the model is constructed in a way that a human can understand).
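
Though none of the papers above include code, a minimal sketch of what “showing the rules” can look like is a shallow decision tree whose learned rules are printed directly. The use of scikit-learn and a toy dataset here are my own illustrative choices, not something taken from the cited work:

```python
# A minimal sketch of an inherently interpretable model, assuming scikit-learn;
# the dataset and depth limit are illustrative choices. The point is that the
# model's decision rules can simply be printed and read.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Keep the tree shallow so the complete set of rules stays human-readable.
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Here the model *is* the explanation: these are the exact rules applied to
# every prediction, with no separate post-hoc approximation required.
print(export_text(model, feature_names=data.feature_names))
```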

While explainability and interpretability are often positioned as equally useful tools with which to combat opacity, explanations are not necessarily useful in the absence of interpretable models. Doshi-Velez and Kim argue that interpretability is not a requirement for machine learning systems, and suggest that explanation can be used as a substitute for interpretability. However, Rudin points out that explainable methods, by definition, cannot achieve perfect agreement with the original model — if they did have exact fidelity with the model, after all, the explanation and the original model would be identical, and there would be no need to construct a separate explanation. Consequently, low-fidelity explanations hinder trust in the explanation model (and in the original model which it purports to explain). But even an explanation which performs nearly identically to the black-box it accompanies may not actually correspond to what the black-box model does, since the explanation is simply an approximation of the performance of the original model.
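
To make the fidelity point concrete, the sketch below is an illustrative example rather than anything from Rudin’s paper: it trains an opaque model, fits a simple “global surrogate” explanation to mimic it, and then measures how often the explanation actually agrees with the model it claims to explain. The library and dataset choices are assumptions made for the sake of the example.

```python
# Sketch of the fidelity problem: a surrogate "explanation" is trained to mimic
# a black-box model, and we check how often the two actually agree on new data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The opaque model we want to "explain".
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# The surrogate is trained on the black box's *predictions*, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how often the explanation agrees with the model it claims to explain.
fidelity = (surrogate.predict(X_test) == black_box.predict(X_test)).mean()
print(f"Surrogate agrees with the black box on {fidelity:.1%} of test cases")
```

Any agreement rate below 100% means the explanation is describing a model that is not quite the one actually making decisions, which is exactly Rudin’s concern.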

In a 2019 paper, Patrick Hall goes a step further, and highlights the potential for explainable machine learning methods to be used to purposefully disguise misuse of black-box models. Explanation, Hall claims, is necessary, but not sufficient, to engender trust in a model. In her paper, Rudin demonstrates this with the saliency map below.

From Rudin (2019): Saliency maps for identifying this image as either a dog or a musical instrument; these maps fail to explain anything beyond where the network is looking.

While saliency or heat maps are frequently considered explanatory, they fail to provide any clarification regarding how the information deemed relevant is actually used within the model. Without knowing what a model is actually doing with the information it uses, a saliency map does not explain the outcome; it only shows where the model is looking within an image. There are now multiple examples of image classifiers that were intended to make meaningful predictions but instead relied on extraneous information in images, including certain copyright symbols (rather than the presence of a horse) and the type of x-ray equipment used (rather than meaningful diagnostic information in the x-ray image itself).
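
For readers unfamiliar with how such maps are produced, the sketch below shows one common, generic recipe (vanilla input gradients), which is not necessarily the specific method behind Rudin’s figure. The untrained toy network and the random tensor standing in for a photograph are placeholders, and PyTorch is an assumed dependency. The output is only a map of where the class score is most sensitive to the pixels, which is exactly why it says so little about how the model uses that region.

```python
# A generic gradient-based saliency map, sketched with a toy classifier.
import torch
from torch import nn

# Stand-in classifier and a random tensor in place of a real, preprocessed photo;
# any differentiable image classifier would work the same way.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()
image = torch.rand(1, 3, 224, 224, requires_grad=True)

score = model(image)[0].max()   # score of the highest-scoring class
score.backward()                # gradient of that score with respect to the pixels

# The "saliency map": per-pixel gradient magnitude, maximized over color channels.
# It shows where the score is sensitive, not how the model uses that region.
saliency = image.grad.abs().max(dim=1).values.squeeze()
print(saliency.shape)           # torch.Size([224, 224])
```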

On top of the issues that arise from offering explanations in lieu of interpretations, data preprocessing can further reduce transparency. Normalizing data or otherwise transforming it to obtain desirable properties for use with specific algorithms adds additional layers of opacity and furthers the distance between model outcomes and events in the actual world. The consequences of this can be especially impactful when models are used to make decisions about humans, or when attempts are made to predict human behavior — especially since data about humans tend to be noisy, and it is not always possible to accurately measure variables of interest.
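
As a small illustration of that distance, consider a standard scaling step in front of an otherwise interpretable linear model (a sketch using scikit-learn and an illustrative dataset, both assumed here for convenience): the fitted coefficients live in transformed units, and relating them back to real-world measurements already requires undoing the preprocessing, which is not always possible with more aggressive transformations.

```python
# Even routine preprocessing moves a model's internals away from real-world units.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)).fit(X, y)

scaler = pipe.named_steps["standardscaler"]
clf = pipe.named_steps["logisticregression"]

# The fitted coefficients are per standard deviation of the *scaled* features,
# not per unit of the original measurements; recovering the latter means
# explicitly inverting the transformation.
coef_original_units = clf.coef_[0] / scaler.scale_
print(coef_original_units[:3])
```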

When models are used in contexts where the quantities people actually care about cannot be measured directly, proxy metrics are often necessary. Many of the outcomes that are most important to predict are difficult, or even impossible, to measure. In a 2019 post on the fast.ai blog, Rachel Thomas emphasizes that metrics are only stand-ins for the things people really care about, that they tend to overemphasize short-term concerns, and that they often fail to capture critical contextual information about the problem a model aims to address.

The failure of a proxy metric to capture what a model’s developers were truly interested in was recently demonstrated by Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan in their paper examining a widely used healthcare algorithm. In their analysis, Obermeyer and his colleagues found that by using healthcare costs as a proxy for illness severity, the algorithm in question dramatically under-referred Black patients for additional care. Because healthcare expenditure in the United States differs markedly between racial groups, using an easily available metric (cost) in place of the variable of interest (health) created significant bias.
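
The mechanism is easy to see in a toy simulation. The sketch below uses entirely invented numbers (it is not the study’s data or model) to show how ranking patients by a cost proxy under-refers a group that, for structural reasons, spends less at the same level of need:

```python
# Toy, fully synthetic illustration of proxy-metric bias: two groups with
# identical health needs, but group B spends less per unit of need. Ranking
# patients by cost then under-refers group B. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
need_a = rng.gamma(shape=2.0, scale=1.0, size=n)   # "true" health need, group A
need_b = rng.gamma(shape=2.0, scale=1.0, size=n)   # identical distribution, group B

cost_a = 1.0 * need_a + rng.normal(0, 0.3, n)      # spending roughly tracks need
cost_b = 0.6 * need_b + rng.normal(0, 0.3, n)      # same need, lower spending

# Refer the top 20% by the proxy (cost), pooled across both groups.
threshold = np.quantile(np.concatenate([cost_a, cost_b]), 0.8)
print("Referral rate, group A:", (cost_a > threshold).mean())
print("Referral rate, group B:", (cost_b > threshold).mean())
# Despite identical need, group B is referred far less often.
```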

There are many cases where data are messy, noisy, and unstructured. With the availability of very large datasets, one argument in favor of the use of neural networks has been their ease of implementation, especially with the proliferation of libraries that make it easy to train and deploy a model and obtain accurate results. However, accuracy is of limited value when it comes from a process that is inherently opaque, particularly in high-stakes contexts. Rudin emphasizes that she has not yet found a high-stakes situation where a black-box model was actually necessary, and that the accuracy and performance gains touted by advocates of neural networks and other opaque models can be obtained with interpretable models, particularly by structuring data and ensuring representation of meaningful features, and by improving data processing for the next model iteration.
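
In that spirit, one reasonable first step before reaching for a black box on structured data is simply to benchmark an interpretable model against it. The sketch below is only an illustration (the dataset, models, and metric are all my own assumptions, and the gap will vary from problem to problem), not a general proof of Rudin’s claim:

```python
# How much accuracy does an interpretable model actually give up on this data?
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

interpretable = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
black_box = GradientBoostingClassifier(random_state=0)

for name, model in [("logistic regression", interpretable), ("gradient boosting", black_box)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f} mean CV accuracy")
```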

Given the risks they present — bias, unanticipated consequences, indecipherability — it seems reasonable to question whether black-box models are ever the appropriate tool. They may be marginally more accurate than interpretable models, but as discussed above, that additional accuracy is typically achievable through pre-processing and successive model creation. Neural networks may be useful for pattern recognition in situations where there is no known algorithm; similarly, they might serve as a data processing tool for use in other algorithms (for example, for data from physical sensors in the environment). In such situations, the need for explanation and trust is lower, though not entirely absent, and depending on how the data are used, issues related to opacity may still arise.

There have also been proposals that use of black-box models be limited to situations of trivial decision-making. However, here the issue is that it can be difficult to determine what decisions are really trivial, as a decision that looks relatively inconsequential can ultimately have far-reaching consequences. For example, the choice of which YouTube video to recommend to a viewer might not seem like a high-stakes scenario. However, given that the time someone spends watching YouTube is used as a proxy for enjoyment or satisfaction with content, and the power of conspiracy theories and content that engenders distrust of mainstream media to keep people watching, it is clear that this decision has profound real-world impacts.

Ultimately, questions about black-box models and interpretability lead back to discussion about what the future of human-AI interaction should look like. Ideally, we should aim for scenarios in which humans and machines can each do what they are best at. For humans, this includes making judgments, assessing trade-offs, and responding to novel information. For machines, this includes processing data and automating routine tasks.

When working in collaboration with one another, humans and machines can accomplish many complex tasks more effectively than either can alone. This includes both fun and creative pursuits — for example, cooperation between humans and computers has led to the creation of centaur chess, played by human-computer pairs — and tasks with the potential to dramatically improve human lives, such as medical diagnosis. However, this is only true if the models used are not just explainable, but fully interpretable. While AI has its strengths, the human ability to contextualize information, improvise, and demonstrate flexibility is necessary not just to reduce the risk of harm from artificial intelligence; it is key to unlocking AI’s full potential.

About the Human-Machine Collaboration Publication and the Berkeley AI Meetup

Preparing and equipping humans to work and live with machines is far easier when creating those machines involves thoughtful consideration of human abilities and human needs. Given our interest in these issues, Bob Stark and Ian Moura decided to create a discussion group for the purpose of research and problem-solving through the Berkeley AI meetup group. This Medium publication summarizes the background information that we cover in our meetings.

References and Recommended Reading

Carabantes, M. (2019). Black-box artificial intelligence: an epistemological and critical analysis. https://sci-hub.tw/10.1007/s00146-019-00888-w

Cummings, M.L. (2004). Automation Bias in Intelligent Time Critical Decision Support Systems. http://hal.pratt.duke.edu/sites/hal.pratt.duke.edu/files/u13/Automation%20Bias%20in%20Intelligent%20Time%20Critical%20Decision%20Support%20Systems.pdf

Doshi-Velez, F., & Kim, B. (2017). Towards a Rigorous Science of Interpretable Machine Learning. https://arxiv.org/pdf/1702.08608.pdf

Hall, P. (2019). Guidelines for Responsible and Human-Centered Use of Explainable Machine Learning. https://arxiv.org/pdf/1906.03533v1.pdf

Horvitz, E. (1999). Principles of Mixed-Initiative User Interfaces. http://courses.ischool.berkeley.edu/i296a-4/f99/papers/horvitz-chi99.pdf

Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. https://science.sciencemag.org/content/366/6464/447

Rudin, C. (2019). Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. https://arxiv.org/pdf/1811.10154.pdf

Thomas, R. (2019). The problem with metrics is a big problem for AI. https://www.fast.ai/2019/09/24/metrics/
