10 things I’ve learned about explainable AI
IQT Labs hosted our first “Join the Conversation” event on explainable AI in Washington, DC, on September 18. I kicked off the event with a talk summarizing 10 things I’ve learned about explainable AI. This post is based on that talk.
Over the past year, I’ve worked with others to survey currently available tools and technologies that support “AI explainability” and “ML interpretability.” Some of that work is referenced here and here. I’ve also interviewed 25 people whose work relates to this topic — technologists, startup founders, investors, policy folks, and one philosophy professor.
Here are a few of the things I’ve learned:
1. There’s no consensus on what “explainability” means.
In 2019, not only do people disagree about the best way to do explainable AI, they disagree about how to characterize the problem. For some reason, this topic seems rife with misunderstanding — a lot of people talking past one another, sometimes without realizing they are doing so.
The misunderstanding is exacerbated by a lack of consistent terminology. For example — just in the interviews I’ve done so far — a wide range of terms have been used as synonyms for “explainable.”
But for every person who uses two of those terms interchangeably, there is someone else who is adamant about how different they are.
2. AI is a black box and people want to look inside.
And different people want to look inside for different reasons.
There is a big difference between the types of explanations desired by people who produce machine learning models — data scientists, ML engineers, AI researchers, etc. — vs. the domain-experts, analysts, and decision-makers who consume the information those models produce.
Model producers want explanations that help them debug, tune, and tweak models. They want to build better models and to develop better intuition about how models work.
Model consumers, however, are much less interested in the inner workings of models. They want explanations that help them use data and results more effectively. They want more context around model output. They want to know how reliable that output is, how confident they can be about it, and how likely it is to change in the future. When a prediction is made, they want to know: Why was that the prediction? What were the contributing factors?
Today, there are far more explainability tools designed for model producers, and these tools don’t satisfy the needs of model consumers. Most consumers aren’t experts in machine learning; they’re experts in the data. So they need explanations that are accessible to a less technical audience.
3. A lot of very smart people think “explainable AI” is a red herring.
When you ask them why, the “red herring” folks typically cite one or more of the following three arguments:
(i) Neural networks are just too complex to explain; transparency isn’t feasible.
(ii) Many people who want “explanations” either don’t have the technical expertise to understand them, or don’t want to accept that machines operate differently from humans. In either case, expecting a clear, intuitive, widely accessible, and true explanation from a machine is unrealistic.
(iii) Cognitive science research tells us that even human-generated explanations do not necessarily capture the “true” causes of why we do things — often our explanations are actually post-hoc rationalizations. And if humans can’t be trusted to provide — or even to know — the “true” explanations for our own actions, why should we hold machines to a higher bar than we hold ourselves?
4. Even if the red herring folks are right … AI is still a black box and people still want to look inside.
This means that a lack of explainability is an obstacle to adoption. And we’re going to have to do something about it if we want these black box technologies to make their way out of the lab and into the real world.
But the “red herring” arguments can help us re-frame the problem of explainable AI in more tractable ways. For example, instead of asking “How can we explain everything a neural network is doing?” and then deciding that’s not possible, we should ask “What sort of explanation will help people feel comfortable using model output?”
5. We can choose from multiple explanation strategies.
(a) Create a model of your model.
Say I work at a bank. I’m using a deep learning model to predict which individuals are likely to default on a home loan, and I want to use those predictions to help me decide whose mortgage applications I’m going to approve. Because of the Fair Credit Reporting Act, if I deny someone a loan in the U.S., I have to give them a reason why. And just telling them that my model said so is a bit of a dodge, because the bank is still accountable for the decision, no matter what my model says.
One option is for me to use a simpler type of model — like a decision tree — to approximate my complex deep learning model. Then, I can inspect the logic of this simpler model. The benefit of this approach is that I can still use my highly-accurate-super-complex model. The downside, though, is that there’s a gap between how I’m actually making the decision and the logic I’m using to explain it. And this could potentially be misleading.
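A minimal sketch of this “model of your model” strategy, with everything invented for illustration: `black_box_denies` stands in for the complex deep learning model, and the surrogate is reduced to a single-threshold decision stump (rather than a full decision tree) to keep the example short.

```python
import random

# Hypothetical black-box model, invented for illustration: it denies a
# loan based on a nonlinear mix of credit score and debt-to-income ratio.
def black_box_denies(credit_score, debt_ratio):
    risk = 2.0 * debt_ratio + max(0, (650 - credit_score) / 100) ** 2
    return risk > 1.0

# Probe the black box on random inputs, then fit the simplest possible
# surrogate: a single threshold on credit score alone.
random.seed(0)
inputs = [(random.uniform(450, 850), random.uniform(0.0, 0.8))
          for _ in range(2000)]
labels = [black_box_denies(s, d) for s, d in inputs]

def stump_fidelity(threshold):
    """Fraction of probes where 'deny if score < threshold' agrees
    with the black box."""
    agree = sum((s < threshold) == y for (s, d), y in zip(inputs, labels))
    return agree / len(inputs)

# Pick the threshold that best mimics the black box.
best = max(range(450, 851, 5), key=stump_fidelity)
print(f"Surrogate rule: deny if credit score < {best}")
print(f"Fidelity to black box: {stump_fidelity(best):.0%}")
```

The fidelity figure printed at the end is exactly the gap described above: the surrogate agrees with the black box most of the time, but not always, and the cases where they disagree are where the explanation can mislead.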
(b) Design for simplicity.
Alternatively, I could just use the simpler model in the first place. But critics of this approach argue that while simple models are more interpretable, they are less accurate than complex models.
(c) Conduct an input-output analysis.
A third strategy is to treat my model as a black box and focus on understanding the relationship between various inputs and outputs. This is similar to a sensitivity analysis. For example, how much do I have to increase someone’s credit score, in order for the model to predict that that person won’t default?
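This strategy can be sketched with a toy example. `predicts_default` below is an invented stand-in for the black-box model; the point is that `score_needed` only ever calls it, never looks inside it.

```python
# Hypothetical black-box predictor, invented for illustration: it flags
# a default when a mix of low credit score and high debt-to-income
# ratio crosses a threshold. We treat it as opaque below.
def predicts_default(credit_score, debt_ratio):
    risk = 2.0 * debt_ratio + max(0, (650 - credit_score) / 100) ** 2
    return risk > 1.0

def score_needed(credit_score, debt_ratio, max_score=850):
    """Input-output analysis: holding debt_ratio fixed, how many points
    must this applicant's credit score rise before the model stops
    predicting default? Returns None if no score in range flips it."""
    if not predicts_default(credit_score, debt_ratio):
        return 0  # already predicted not to default
    for score in range(int(credit_score), max_score + 1):
        if not predicts_default(score, debt_ratio):
            return score - credit_score  # points of improvement needed
    return None

# A hypothetical applicant with a 560 score and 30% debt-to-income ratio:
print(predicts_default(560, 0.30))  # → True (model predicts default)
print(score_needed(560, 0.30))      # → 27 (points needed to flip it)
```

Notice that this kind of answer — “raise your score by 27 points and the prediction changes” — is exactly the sort of context model consumers say they want, and it requires no access to the model’s internals.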
6. In some cases, there is a tradeoff between interpretability and predictive accuracy.
In most of the conversations I’ve had about explainable AI, it’s taken for granted that complex models are more accurate. And when given a choice between interpretability and accuracy, a surprising number of people come down on the side of complexity. The Principle of Parsimony, however, reminds us of a compelling counter-argument: that we should use the simplest thing that’s useful.
7. Maximizing predictive accuracy isn’t always the only goal.
If I’m using my model purely to make predictions, then I certainly care a lot about predictive accuracy. But, as my colleague Tommy Jones pointed out when I interviewed him: “We often build predictive models not because we care about the predictions themselves, but because we care about what’s driving the predictions.”
If I’m using a model to help me understand some phenomenon, then I want a model that provides me with a useful level of abstraction.
In his story “Del rigor en la ciencia” (“On Exactitude in Science”), Borges warns us about what can happen when we try to maximize the fidelity of a representation at the expense of everything else. The story describes an empire where the science of cartography becomes so exact that people build a map at the scale of the empire itself.
You know what happens? Future generations eventually realize that the full-scale map is pretty cumbersome and not really that useful for navigating.
8. The explanations we need depend on the application. And more specifically, the risk of harm.
Imagine an email spam filter that uses deep learning. Here, the objective is clearly prediction — we want the filter to predict, as accurately as possible, whether an email is spam so that we can send it to the junk folder.
Someone deep in the bowels of some company undoubtedly cares about debugging this model, but for me as a consumer — as someone directly affected by the spam filter’s automated decision-making about my email — all I care about is less spam.
I’m willing to forgo explainability altogether because the cost of failure is so low. If the model makes an error and sends a piece of spam to my inbox…well, I just throw it away. In the slightly worse case — where the spam filter misclassifies a real email as spam — it’s still not that big of a deal. I just have to remember to check my junk mailbox every so often.
Contrast this with the mortgage example, where I feel very differently about my right to an explanation. If the bank refuses to give me a mortgage, I want to know why. Here, the stakes are a lot higher and so is the cost of an error.
9. Explanations should help us reason about errors and bad outcomes.
Let’s simplify the loan example to four possible outcomes:
One is I meet the bank’s criteria for a loan and they approve my mortgage. This is a good decision by the bank and a good outcome for me. In this case, I probably don’t ask for an explanation. I just take the money.
Two is I don’t meet the bank’s criteria and they deny my request for a loan. This may also be a good decision for the bank, even though it’s a bad outcome for me. Here, I probably do want an explanation, current financial regulations say the bank has to give me one, and understanding this explanation shouldn’t require a PhD in computer science.
Three is I don’t meet the bank’s criteria and they approve my mortgage anyway. Bad decision for the bank, but good outcome for me. I probably don’t care about an explanation, but someone at the bank should want one because the bank is taking on unnecessary risk. This is more of a debugging or model improvement use case; someone should want to understand how to prevent this sort of failure from happening again.
Four is the case we all worry about. I do meet the bank’s criteria, but they — wrongly — deny me a loan. I’m mad, I threaten to sue, and now everyone wants an explanation.
Ideally, we have explanations tailored for each of these situations. But at a minimum, explainable AI should help us tell the difference between them.
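The four cases above amount to a tiny decision table, sketched below. Both inputs are hypothetical booleans — a ground-truth “meets the criteria” flag and the bank’s decision — and no real bank records things this cleanly.

```python
# A sketch of the four loan outcomes, assuming (hypothetically) that we
# can compare the bank's decision against whether the applicant truly
# meets the criteria.
def classify_outcome(meets_criteria, approved):
    if meets_criteria and approved:
        return "good decision, good outcome (no explanation requested)"
    if not meets_criteria and not approved:
        return "good decision, bad outcome (regulated explanation owed)"
    if not meets_criteria and approved:
        return "bad decision, good outcome (bank should debug)"
    return "bad decision, bad outcome (everyone wants an explanation)"

# The case we all worry about: a qualified applicant, wrongly denied.
print(classify_outcome(meets_criteria=True, approved=False))
```

The hard part, of course, is that in practice nobody hands us the `meets_criteria` flag — telling these four cases apart is precisely what a good explanation should help with.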
But of course, this is actually really hard.
10. For most people, probabilistic information is not intuitive.
It’s hard for a lot of technical reasons, but it’s also hard because models only give us probabilities. And the last few decades of behavioral economics and cognitive science research have shown us that for most people, probabilistic information is not intuitive.
For most people, it’s actually really tricky to reason correctly about this type of information. But I think this is a worthy design goal for explainable AI:
How can we provide explanations that not only help people feel comfortable using emerging technologies, but that also help them reason about the output and make better decisions?
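One modest design direction along these lines, suggested by that same behavioral research: restate a model’s probability as a natural frequency, which many people find easier to reason about than a raw percentage. A sketch, with the function name and wording invented for illustration:

```python
# Restate a model probability as a natural frequency. The reference
# class and phrasing here are hypothetical placeholders; a real system
# would need a reference class that is honest about the model's data.
def as_natural_frequency(probability, reference_class="similar applicants",
                         per=100):
    count = round(probability * per)
    return (f"Out of {per} {reference_class}, about {count} "
            f"defaulted on their loans.")

print(as_natural_frequency(0.17))
# → "Out of 100 similar applicants, about 17 defaulted on their loans."
```

A statement like this doesn’t explain the model’s internals at all — but it gives a model consumer something they can actually reason with.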