Explainable vs Interpretable AI: An Intuitive Example

Adam Hare
Published in The Startup
Jul 13, 2020 · 6 min read

Both explainable and interpretable AI are emerging topics in computer science. However, the difference between the two is not always obvious, even to academics. In this post I aim to provide some intuition using a simple example.


Introduction

As techniques in both Artificial Intelligence (AI) and Machine Learning (ML) have become more complicated and more opaque, there has been a call for algorithms that humans can understand. After all, how can we identify bias or correct mistakes if we don’t understand how these techniques are reaching decisions?

Two main fields have arisen in response to this: Explainable AI and Interpretable AI.¹ Explainable AI models “summarize the reasons […] for [their] behavior […] or produce insights about the causes of their decisions,” whereas Interpretable AI refers to AI systems which “describe the internals of a system in a way which is understandable to humans” (Gilpin et al. 2018).

Ok, so those definitions are all well and good, but what do they mean? How can I classify one technique as explainable or interpretable? This is a question that academics have disagreed on (see Section 2.1.1 of my Master’s Thesis for more details), and the answer isn’t obvious to anyone just reading the definitions. I mean, summarizing reasons for behavior sounds a lot like describing the internals of the system, right?

After a lot of thought and a lot of reading on this issue, I think I’ve found an example of a real-life application that provides some good intuition.

Teacher’s Feedback

Imagine yourself in college. After a long and sleepless reading period, you’ve submitted two final essays: one for American Literature 101 and one for Intro to Classical Studies. When it’s finally time to see your grades, you anxiously log on to see… two B+’s. Over-achiever that you are, you think there must be some mistake and go to both of your professors to ask for feedback.

Your American Literature professor hands you back your paper with a few marks and a short paragraph at the end outlining the strengths and weaknesses of your argument. While he agrees with your choice of The Great Gatsby as The Great American Novel, he feels your argument did not sufficiently address the symbolic use of color present throughout the novel. Based on these two comments, he has assigned you a B+ overall. You’re a little confused — how did he weigh all of these things together to get a B+? Where exactly did you lose points? How did he decide which parts of the paper were important? How close were you to an A-?

A little dissatisfied, you go to your Classics professor and ask her for some feedback. Instead of comments, she hands you a detailed rubric. You see you lost a few points for grammar and a few for missing citations. It also notes that you only referenced three primary sources when the project requirements mentioned five or more. With these deductions, you got a 90%, which according to your university’s policy means you got a B+. Again though, you’re a little dissatisfied. Why is each missing citation worth 2 points but each grammar mistake only worth a half point? Why was the number of sources worth 10% of your grade instead of 20%?

As you may have guessed, each of these approaches is meant to illustrate one of these AI techniques. The first professor, who provided written feedback, is doing something analogous to Explainable AI. You got an explanation of your grade that gave you some details about what went into the decision-making process. You can see your strengths and shortcomings quite clearly. However, you don’t really know how these things mapped to your exact grade. If you were given feedback like this for a classmate’s paper and asked to assign a grade, you wouldn’t really know where to start. You have some intuition for how the decision was made but couldn’t recreate it yourself. Worse yet, the professor could be biased or dishonest in his explanation. Maybe he thought the paper was only really a B paper, but bumped it up because of your class participation and choice of topic. Maybe he just assigned your grade at random and wrote feedback to justify it. You can’t really know for sure what happened.² Explainable AI systems in general have these same advantages and disadvantages — it’s hard to know how a result was arrived at, but you mostly know why (if you trust the explanation).
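To make this concrete, here is a minimal Python sketch of the “explainable” setup: an opaque grading model we can only query, plus a post-hoc, LIME-style local explanation. The grading function, the feature names, and every number below are hypothetical and purely for illustration — this is a sketch of the idea, not any particular explainer’s implementation.

```python
# A black-box grader plus a post-hoc local explanation (all values hypothetical).
import numpy as np

rng = np.random.default_rng(0)
feature_names = ["thesis_strength", "use_of_evidence", "style", "originality"]

def black_box_grade(x):
    # Stand-in for an opaque model: we can query it, but not inspect its internals.
    return 100 / (1 + np.exp(-(x @ np.array([0.9, 1.4, 0.3, 0.6]) - 2.0)))

essay = np.array([0.8, 0.5, 0.9, 0.4])  # the essay's (made-up) features
print(f"Black-box grade: {black_box_grade(essay):.1f}")

# Post-hoc explanation: perturb the essay, query the black box, and fit a small
# linear surrogate locally. Its coefficients "summarize the reasons" for the
# grade without revealing how the model actually works (the fidelity question).
perturbed = essay + rng.normal(scale=0.1, size=(200, 4))
grades = black_box_grade(perturbed)
coef, *_ = np.linalg.lstsq(
    np.column_stack([perturbed, np.ones(len(perturbed))]), grades, rcond=None
)
for name, w in zip(feature_names, coef[:4]):
    print(f"{name:>16}: {w:+.2f}")  # local importance, not the true mechanism
```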

Ok, so this approach had some strengths and weaknesses. How about your Classics professor? Her approach is more akin to Interpretable AI. You saw from your results exactly how the grade was calculated. If you got someone else’s paper, you could follow this rubric and arrive at the exact same grade as the professor. If you noticed an error, you could easily approach the professor and get points back. There are a couple of problems though. Imagine if the rubric had 1000 points on it — it would be too time-consuming for you to scrutinize every one to understand how you got your grade. You also don’t really know where that rubric came from. Did your professor base it on another course or results from previous years? Did she write the rubric subtly so that students who wrote on one topic did worse than students who wrote on another?³ Why were certain things included in the rubric and not others? Why was the required number of sources 5 and not 3 or 7? These explanations are not provided by the rubric. Explanations are precisely what strictly Interpretable AI lacks. It’s very easy to see how the algorithm arrived at its conclusion, but not why each step of the decision process was designed the way it was.
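And here is the corresponding sketch of the “interpretable” setup: the rubric itself is the model. The point values (2 per missing citation, half a point per grammar error, sources worth 10%) and the 90%-to-B+ mapping come from the story above; the specific error counts and the rest of the grade scale are made up for illustration.

```python
# The rubric is the model: every deduction is explicit, so anyone can re-derive the grade.
def rubric_grade(missing_citations, grammar_errors, primary_sources, required_sources=5):
    score = 100.0
    score -= 2.0 * missing_citations   # citations: 2 points each
    score -= 0.5 * grammar_errors      # grammar: half a point each
    # Primary sources are worth 10 of the 100 points, scaled by how many of the
    # required sources were actually used.
    score -= 10.0 * max(0, required_sources - primary_sources) / required_sources
    return score

def letter(score):
    # The story's (unusual) university policy: 90% maps to a B+.
    cutoffs = [(95, "A"), (93, "A-"), (90, "B+"), (85, "B"), (0, "C or below")]
    return next(grade for cutoff, grade in cutoffs if score >= cutoff)

score = rubric_grade(missing_citations=2, grammar_errors=4, primary_sources=3)
print(score, letter(score))  # 90.0 B+ — traceable, but the weights themselves are unexplained
```

The sketch makes the trade-off visible: you can follow every step of the calculation, but nothing in the code tells you why a missing citation costs 2 points or why sources are worth 10%.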

Conclusion

Hopefully with this example in mind, it is easier to draw lines between the two categories. Explainable AI tells you why it made the decision it did, but not how it arrived at that decision.⁴ Interpretable AI tells you how it made the decision, but not why the criteria it used are sensible.⁵ We can of course imagine systems that are both Explainable and Interpretable.⁶ In this case, a professor could provide a rubric along with written feedback and an explanation for why each part of the rubric is important.

Overall, this distinction remains a bit fuzzy, and it’s easy for two people to classify the same technique differently. That’s why it’s important to establish clear definitions at the beginning of any argument: otherwise you’re liable to spend a lot of time trying to correct misunderstandings.

Thanks for reading! I hope this has been helpful in disambiguating the two topics. For more in-depth information, check out the two papers cited in the section below. You can also check out my Master’s thesis here, which goes into these topics in a bit more depth, along with a number of applications.

Footnotes & Citations

  1. We can just as easily refer to Explainable ML and Interpretable ML. The ideas are the same; I’ve chosen to go with AI in this post to avoid confusion.
  2. In the Explainable/Interpretable AI field this is known as “fidelity.” Basically, an explanation has high fidelity if it is very faithful to how the model actually made its decision. Explanations with low fidelity may have little to nothing to do with how the decision was actually made.
  3. This sort of bias seems like it might be obvious but could actually be hard to detect. For this example, the teacher could provide more reference material for one topic and weight the rubric to heavily penalize students who used fewer citations.
  4. LIME is an example of this.
  5. A simple rules list is an example: you can trace exactly how it reaches a decision, but the rules and thresholds can appear arbitrary, with no explanation of why they were chosen.
  6. A rules list with prototypes as mentioned here might be both explainable and interpretable. The rules list would be traceable and therefore interpretable, whereas the prototypes would serve as explanations for each rule.

Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., and Kagal, L. (2018). Explaining Explanations: An Overview of Interpretability of Machine Learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pages 80–89. IEEE.

Rudin, C. (2018). Please Stop Explaining Black Box Models for High Stakes Decisions. arXiv preprint arXiv:1811.10154.
