The Only Way to make Deep Learning Interpretable is to Have it Explain Itself

Published in

Intuition Machine

Dec 7, 2016

One of the great biases that Machine Learning practitioners and statisticians share is the belief that our models and explanations of the world should be parsimonious. We’ve all bought into Occam’s Razor:

Among competing hypotheses, the one with the fewest assumptions should be selected.

However, does that mean our machine learning models need to be sparse? Does it mean that true understanding can only come from closed-form analytic solutions? Do our theories have to be elegant and simple?

In a recent Facebook post commenting on a thesis about “Deep Learning and Uncertainty”, Yann LeCun points to a 1987 paper by his colleagues at Bell Labs titled “Large Automatic Learning, Rule Extraction, and Generalization”. The paper highlights the problem:

When a network is given more resources than the minimum needed to solve a given task, the symmetric, low-order, local solutions that humans seem to prefer are not the ones that the network chooses from the vast number of solutions available; indeed, the generalized delta method and similar learning procedures do not usually hold the “human” solutions stable against perturbations.
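The effect described in that quote can be reproduced in miniature. What follows is a minimal sketch, assuming a toy problem of my own rather than the paper’s experiment: a linear model receives four redundant copies of one input and must learn the hypothetical rule y = 2x. The sparse, “human” answer puts all the weight on one copy. Plain gradient descent started from zero fits the data just as well but spreads the weight evenly across every copy, because least-squares gradient descent from a zero start converges to the minimum-norm solution.

```python
# Hypothetical toy problem: the true rule is y = 2 * x, but the model receives
# four redundant copies of the same input, i.e. more resources than it needs.
XS = [1.0, 2.0, -1.0, 0.5]
YS = [2.0 * x for x in XS]
N_COPIES = 4


def predict(w, x):
    # Every weight multiplies the same duplicated feature x.
    return sum(wi * x for wi in w)


def mse(w):
    # Mean squared error over the toy data set.
    return sum((predict(w, x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def train(w, lr=0.05, steps=200):
    """Plain full-batch gradient descent on mean squared error."""
    w = list(w)
    for _ in range(steps):
        grads = [0.0] * len(w)
        for x, y in zip(XS, YS):
            err = predict(w, x) - y
            for i in range(len(w)):
                # d(MSE)/dw_i = mean of 2 * err * x (each copy sees the same x)
                grads[i] += 2.0 * err * x / len(XS)
        w = [wi - lr * g for wi, g in zip(w, grads)]
    return w


human = [2.0, 0.0, 0.0, 0.0]        # sparse, "human" solution
learned = train([0.0] * N_COPIES)  # solution gradient descent finds from zero
print("human  ", human, "mse", mse(human))
print("learned", [round(wi, 4) for wi in learned], "mse", round(mse(learned), 12))
```

Both weight vectors have zero training error, so nothing in the data prefers the sparse one; the optimizer’s dynamics, not human taste, decide which solution you get.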

One probable reason Deep Learning requires an inordinate number of iterations and so much training data is that we seek Occam’s Razor, that sparse solution. What if, however, the solution to unsupervised learning (aka Predictive Learning) lies in embracing randomness?

Let’s table the proof for a later time and assume its validity for argument’s sake: randomness is the natural equilibrium state (is it not obvious?). This implies that the model parameters will be completely random and that interpretability will be hopeless. Unless, of course, we can ask the machine to explain itself!

I was about to end this post there, but a few examples may help explore the idea more thoroughly.
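A first, deliberately tiny example. This is a minimal sketch, assuming a toy linear model that receives four redundant copies of one input and must learn the hypothetical rule y = 2x. Gradient descent is started from random weights under several seeds. Each run ends with a different, random-looking weight vector, yet every run computes the same function. Staring at the weights tells you little; asking the model for its outputs tells you everything. Note that this only illustrates the weaker point that random-looking parameters need not hurt the function; it does not prove the equilibrium claim.

```python
import random

# Hypothetical toy data for the rule y = 2 * x.
XS = [1.0, 2.0, -1.0, 0.5]
YS = [2.0 * x for x in XS]


def predict(w, x):
    # Four redundant copies of the same feature: the output depends only on sum(w).
    return sum(w) * x


def train(seed, n_weights=4, lr=0.05, steps=200):
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n_weights)]  # random initialization
    for _ in range(steps):
        # d(MSE)/dw_i is identical for every i, so descent only shifts all
        # weights by the same amount; the random differences between them survive.
        g = sum(2.0 * (predict(w, x) - y) * x for x, y in zip(XS, YS)) / len(XS)
        w = [wi - lr * g for wi in w]
    return w


for seed in (0, 1, 2):
    w = train(seed)
    print("seed", seed,
          "weights", [round(wi, 3) for wi in w],
          "outputs", [round(predict(w, x), 6) for x in XS])
```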

Stephen Merity (MetaMind) has a detailed examination of Google’s Neural Machine Translation (GNMT) system that is worth a read. What is interesting about GNMT is that Google headlines it as “Zero-Shot Translation”.

Zero-shot here refers to the system’s ability to translate, for example, from Japanese to English even though it was never trained on that particular language pair! To quote Google:

This means the network must be encoding something about the semantics of the sentence rather than simply memorizing phrase-to-phrase translations. We interpret this as a sign of existence of an interlingua in the network.

Will we be able to decipher this new “interlingua” or “esperanto” that the machine has created? Do we have a priori ideas about what this interlingua should look like, and could we apply some kind of regularization to make it more interpretable to humans? Would insisting on interpretability lead to a less capable translator? Are Vulcan Mind-Melds necessary?

It seems we should simply leave the representation as it is and use the machine to translate it into English. In fact, that is already what it does. We don’t need some new method to interpret the representation; the capability is already baked in.
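To make the “leave the representation alone” argument concrete, here is a toy sketch. It does not model GNMT. The shared space is simulated, as an assumption standing in for training: each concept gets a random vector, and every language’s word for that concept is placed near it. The vectors mean nothing to a human, yet decoding into any language is just a nearest-neighbour lookup, including Japanese to Korean, a direction for which the code never consults a translation pair. The vocabulary, dimension, and noise level are all hypothetical.

```python
import math
import random

# Hypothetical toy vocabulary (romanized): word -> concept id, per language.
VOCAB = {
    "en": {"water": 0, "fire": 1, "tree": 2},
    "ja": {"mizu": 0, "hi": 1, "ki": 2},
    "ko": {"mul": 0, "bul": 1, "namu": 2},
}
DIM = 8


def build_embeddings(seed=0, noise=0.05):
    """Simulate an already-learned shared space: each concept gets a random
    vector, and every word for that concept lands near it (plus small noise)."""
    rng = random.Random(seed)
    concepts = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(3)]
    emb = {}
    for lang, words in VOCAB.items():
        for word, cid in words.items():
            emb[(lang, word)] = [c + rng.gauss(0.0, noise) for c in concepts[cid]]
    return emb


def distance(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def translate(emb, word, src, tgt):
    """Encode the source word into the shared space, then decode it by picking
    the nearest word in the target language's vocabulary."""
    v = emb[(src, word)]
    return min(VOCAB[tgt], key=lambda w: distance(v, emb[(tgt, w)]))


emb = build_embeddings()
print("opaque representation:", [round(x, 2) for x in emb[("ja", "mizu")][:4]])
print("ja -> en:", translate(emb, "mizu", "ja", "en"))
print("ja -> ko:", translate(emb, "mizu", "ja", "ko"))
```

The interpretable output comes from the decoder, not from inspecting the vectors, which is the point the paragraph makes about GNMT.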

Letting the machine explain itself is, in fact, what researchers at MIT did in their work on “Making computers explain themselves”:

They’ve trained their network to learn how to explain itself.
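The MIT work trains the network itself to produce a justification alongside its prediction. The sketch here swaps in something much simpler and should not be read as their method: a perceptron over bag-of-words features whose `predict_and_explain` function (a hypothetical name) returns, with each label, the words that pushed the score hardest toward that label. The reviews and labels are hypothetical toy data, chosen only to show the shape of a model that answers “what” and “why” together.

```python
from collections import defaultdict

# Hypothetical toy reviews: label 1 = positive, 0 = negative.
TRAIN = [
    ("great food and friendly staff", 1),
    ("terrible service and cold food", 0),
    ("friendly and great value", 1),
    ("cold room terrible smell", 0),
]


def train_perceptron(data, epochs=10):
    """Classic perceptron: on each mistake, nudge the weights of the words present."""
    w = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            words = text.split()
            score = bias + sum(w[t] for t in words)
            pred = 1 if score > 0 else 0
            if pred != label:
                step = 1.0 if label == 1 else -1.0
                for t in words:
                    w[t] += step
                bias += step
    return w, bias


def predict_and_explain(w, bias, text, k=2):
    """Return the predicted label plus the k words that most support it."""
    words = text.split()
    score = bias + sum(w.get(t, 0.0) for t in words)
    label = 1 if score > 0 else 0
    # Contribution of each distinct word; sorted alphabetically first so that
    # ties are broken deterministically (Python's sort is stable).
    contrib = {t: w.get(t, 0.0) for t in sorted(set(words))}
    sign = 1.0 if label == 1 else -1.0
    rationale = sorted(contrib, key=lambda t: sign * contrib[t], reverse=True)[:k]
    return label, rationale


weights, bias = train_perceptron(TRAIN)
print(predict_and_explain(weights, bias, "friendly staff but cold coffee"))
print(predict_and_explain(weights, bias, "terrible cold food"))
```

A linear model makes this kind of self-report trivial; the interesting part of the MIT approach is getting a deep network to do the same without giving up its capacity.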

Update: Here are some slides from DARPA’s XAI (Explainable Artificial Intelligence) project exploring explainability.


If you were able to grok this article, then feel free to join the conversation at this LinkedIn group: https://www.linkedin.com/groups/8584076


Written by Carlos E. Perez