Confusion Matrices & Interpretable ML

andrea b
Published in high stakes design
8 min read · Nov 13, 2019


A conversation with neuroscientist-turned-data-scientist Nina Lopatina about understanding machine learning (ML) and why the brain might actually be more interpretable than some models.

(This conversation has been edited for clarity.)


ANDREA: How did you get started in machine learning?

NINA: I got into the field about ten years ago. I was working in Yael Niv’s research lab in Princeton before grad school. I did a bit of machine learning — mostly reinforcement learning — at that time. Then in grad school I got more into it. Once I had data, I went to a month-long workshop where I learned how to apply machine learning to my neural data and then I expanded that work on my own.

ANDREA: And what are you doing now?

NINA: I’m a data scientist in Lab 41.

ANDREA: The main thing I want to talk about today is explainable AI — what is this topic all about?

NINA: I started following this pretty closely at NeurIPS in December [2018]. My interpretation is that [the field] is fairly new and not well-defined. There’s a lot of discourse about what to call it — interpretability or explainability — but the overall goal is for machine learning not to be a black box.

ANDREA: What do you mean by “black box”?

NINA: I mean that you put an input into a model and then you get an output, [but] you don’t understand what happens in between. [With ML] there are so many layers, and you can’t process the volume of computation in an interpretable way, so there’s no easy way to look at [a model] and figure out what it’s doing.

I just remembered the first time that I thought about this — it was toward the end of my graduate program in neuroscience. I came across a paper by David Sussillo that merged neuroscience and interpretability by modeling part of the brain as an R.N.N. — a recurrent neural network — and then using the neural activity to understand the machine learning model. That was my first exposure to interpretability research, and it made a lot of sense in the context of neuroscience. A sizable body of research is aimed at understanding the black box of the brain, so it was intuitive to apply a similar approach to an RNN.

As much as we think of the brain as a black box, in some ways it’s actually a bit more interpretable than machine learning.

ANDREA: Did you just say the brain is more interpretable than machine learning? Could you explain why that is?

NINA: We don’t really understand how brains work. But at the same time, we have all these methods to probe the brain and record it in different ways. We can knock parts of the brain out to see what happens.

ANDREA: Are you saying we have more tools to probe the brain than to probe machine learning models?

NINA: I think so, yes. We have more tools and data and centuries of research to understand how the brain makes its computations, but it is still debatable whether these approaches enable us to truly understand such a complex system.

ANDREA: That’s fascinating. Well, let’s dig into some of the tools we do have for interpreting machine learning. What are some approaches you find useful?

NINA: One method is to make a confusion matrix, for example, with small images of what got classified as what. Usually, you have your true label on one axis and the algorithm’s label on the other axis. One very useful new tool, the CIFAR confusion matrix, shows you the images instead of just the number of correct or incorrect predictions by class.

ANDREA: Let me make sure I understand how this works. You have a bunch of pictures of cats and dogs. Your model identifies most of the cat pictures as cats, but it also gets some wrong — sometimes it misidentifies a dog picture as a cat and other times, it says a cat picture is of a dog.

These are different kinds of errors, but with this tool — the confusion matrix — you can see the difference. Is that right?

NINA: Yes. So, sometimes you might have the wrong label. Your cat might be labeled “dog” because the label that went along with it [in the training data] was dog, by accident. Or you might see that all of your black cats get labeled as dogs, whereas the other ones get labeled correctly.

This approach gives you some insight into why your model makes certain decisions. But it requires time on the human side to make that inference. I might scan the image and realize “oh, there must be some glitch in the [model] code.” Or, I might see some pattern [in the results], and then do some feature engineering to make that less prominent.
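As a rough illustration of the idea Nina describes, the sketch below builds a confusion matrix that keeps track of which examples landed in each cell, so the misclassified ones can be pulled up and inspected by eye. This is not the CIFAR tool she mentions; the function name `confusion_cells` and the toy labels are assumptions made up for this example.

```python
from collections import defaultdict


def confusion_cells(true_labels, predicted_labels):
    """Group example indices by (true label, predicted label)."""
    cells = defaultdict(list)
    for index, (true, predicted) in enumerate(zip(true_labels, predicted_labels)):
        cells[(true, predicted)].append(index)
    return dict(cells)


# Hypothetical toy data: six images with their true labels and model predictions.
true_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
predicted_labels = ["cat", "dog", "cat", "dog", "cat", "dog"]

cells = confusion_cells(true_labels, predicted_labels)
labels = sorted(set(true_labels) | set(predicted_labels))

# Rows are true labels, columns are the model's labels.
for true in labels:
    counts = [len(cells.get((true, predicted), [])) for predicted in labels]
    print(true, counts)

# Off-diagonal cells hold the examples worth looking at:
# here, the indices of cat images that the model called dogs.
cats_called_dogs = cells.get(("cat", "dog"), [])
print(cats_called_dogs)
```

With real data, those indices would be used to load the images themselves (or the text, for a text classifier), which is where patterns like mislabeled training examples or "all the black cats" become visible.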

ANDREA: That’s interesting — it’s a way that visually inspecting your results can help you see flaws in your model. Image classification is such a nice case for visual inspection because you’re dealing with images. Could you use this approach for other types of data?

NINA: You can use a confusion matrix for any type of data that gets classified. For example, with text, you would [show] all the words that were misclassified.

Some other tools that are gaining more prominence in NLP are LIME and this Seq-to-Seq viz tool — [both are based on an approach] of just changing one word and seeing what changes. Another interpretability approach is feature ranking — once a model is trained, you can look at how much each feature contributes [to the output].
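One common way to do the kind of feature ranking Nina mentions is permutation importance: shuffle one feature's values across the dataset and measure how much the trained model's accuracy drops. The interview does not say which ranking method she has in mind, so treat this as one possible sketch. The model `toy_model`, the data, and the helper names are all hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(model(row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Average accuracy drop when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for feature in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            # Rebuild each row with only this one feature replaced.
            shuffled = [
                row[:feature] + [value] + row[feature + 1:]
                for row, value in zip(rows, column)
            ]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical 'trained' model: it only ever looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 3.0], [0.2, 1.0], [0.9, 2.0], [0.8, 4.0]]
labels = [0, 0, 1, 1]

importances = permutation_importance(toy_model, rows, labels)
# Feature indices ordered from most to least important.
ranking = sorted(range(len(importances)), key=lambda f: importances[f], reverse=True)
print(ranking, importances)
```

Because the toy model ignores feature 1, shuffling that column never changes a prediction and its importance comes out as zero, while feature 0 ranks first.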

ANDREA: In your work, what would you do with the information you get from these tools?

NINA: As a machine learning practitioner, one of my goals is improving model performance. So, I would basically use any [interpretability] tool to make sure that my model is working correctly.

ANDREA: I’ve heard you distinguish between “explainability” and “interpretability.” Could you explain how you think about the difference?

NINA: With explainability, you have a black box and you try to explain what it did; with interpretability you are actually doing something within the black box that lets you understand how it functions. Interpretability requires some sort of manipulation of the actual model to test if your explanation is valid or not. But with explainability you don’t [have to] do that. For example, a lot of popular methods involve just changing something in your input and seeing how that changes your output.
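The "change something in your input and see how the output changes" approach can be sketched in a few lines. The example below removes one word at a time from a sentence and records how a classifier's score moves. It is not LIME itself, which fits a local surrogate model over many perturbed samples; it is only the simplest version of the perturb-and-compare idea. The scoring function `toy_sentiment_score` and its word weights are made-up stand-ins for a real trained model.

```python
def toy_sentiment_score(words):
    """Hypothetical stand-in for a trained text classifier's score."""
    weights = {"great": 2.0, "good": 1.0, "boring": -1.5, "not": -0.5}
    return sum(weights.get(word, 0.0) for word in words)


def word_influence(score_fn, words):
    """Score change attributable to each word, found by removing it."""
    baseline = score_fn(words)
    influence = []
    for i, word in enumerate(words):
        perturbed = words[:i] + words[i + 1:]
        influence.append((word, baseline - score_fn(perturbed)))
    return influence


sentence = "the plot was boring but the acting was great".split()
influence = word_influence(toy_sentiment_score, sentence)

for word, delta in influence:
    print(f"{word:>8} {delta:+.1f}")
```

Note that the model is treated purely as a black box here: only inputs and outputs are touched, which is what places this on the "explainability" side of the distinction Nina draws.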

For more about “explainability” vs. “interpretability,” see this recent post.

ANDREA: To me, interpretability sounds like debugging. Is that a fair way to think about it?

NINA: Ah, no, it’s very different. Debugging is about finding flaws in code. In this case, the code works well, but something about the model is suboptimal.

ANDREA: Can you give me an analogy? If interpretability isn’t like debugging a computer program, what is it like?

NINA: It’s more like adding an extra component [to your model] that tells you about what the model did. The best [analogy] I can think of is an indicator light in your car — [and the] machine that you plug in to tell you more about the readout.

ANDREA: Do you see interpretability, primarily, as something that benefits machine learning researchers and tool builders?

NINA: Currently, yes. [Most of] the people making these tools are machine learning people, so they’re kind of tailoring the tools for themselves.

But also, it’s not clear that end users actually benefit from, or want, interpretability. I’m thinking of a presentation I saw at [ML Conf] last fall. The author, Forough Poursabzi-Sangdeh, ran a study where she compared different conditions, with more or less complex models. She had scenarios where a machine learning model would predict the same output in two conditions, but in one [case] it would tell the user something about the basis for that prediction. And then sometimes it would make incorrect predictions.

The users who saw an explanation for the output were worse at correcting the model.

ANDREA: Wait — they were worse? That’s the opposite of what I would expect.

NINA: Yeah. They were given a chance to correct the machine learning algorithm. And participants were less able to correct inaccurate predictions of a clear model than those of a black box model. Also, there was no difference in [reported] trust between the black box and transparent [models].

Basically, the hypothesis was that interpretability would help but the results were the opposite of what was expected, which is also interesting.

ANDREA: I suppose this whole field is working off of an unproven assumption — that interpretability and explainability are helpful for people using machine learning systems.

NINA: That needs to be studied better, and the results may not be what we expect. But for people practicing machine learning, we know interpretability tools are helpful.

ANDREA: You’ve mentioned previously that it would be difficult to provide a sufficient explanation to someone who didn’t have a baseline understanding of machine learning. How much do you think someone needs to know in order to make use of the information you get from today’s interpretability tools?

NINA: People need to understand the outline of a system. For example, with object classification, how [something] goes from input to output with a very short amount of technical explanation of the different layers in between. I think a one-hour intro to machine learning would get someone eighty percent of the way there.


This post is part of a series of interviews that IQT Labs is conducting with technologists and thought leaders about Explainable AI. The original interview with Nina took place on March 18, 2019; this Q&A contains excerpts that were edited for clarity and approved by Nina.

Image credits:
Brain vector drawing from
Illustrations by Andrea




Andrea is a designer, technologist & recovering architect who is interested in how we interact with machines.