Deep Neural Inspection with DeepBase

By Thibault Sellam and Eugene Wu

Deep learning is increasingly employed in applications that can affect the lives and safety of millions of people. Yet there is increasing societal weariness and wariness about when and why it works. Despite the currently excellent software and hardware infrastructure for nearly every part of the neural network (NN) development lifecycle — creating models, training them, evaluating their accuracy, and deploying them — we lack infrastructure for understanding how they make predictions.

For instance, it can take a young researcher days or weeks to simply extract activations from a newly trained model and plot them as a heat map. Performing more complex analyses such as identifying patterns in those activations can take even longer. This makes it harder to debug models, understand their failure cases, and catch harmful biases.


In short, there is an opportunity to develop a declarative system that lets developers easily inspect and understand neural network models at scale. We call the class of analyses that inspect the behaviors of hidden units in NN models Deep Neural Inspection.

Further, we have implemented this vision in a declarative system called DeepBase, and used it to inspect numerous neural network models, such as object recognition models, language translation models, and in this blog post, a model trained using reinforcement learning to play PacMan.

We recently presented “Deep Neural Inspection with DeepBase” at the 2018 NeurIPS Systems for ML Workshop. This was the most recent evolution of our earlier 2018 SysML vision to bring software engineering principles to NN development, and we are continuing to make Deep Neural Inspection simple, fast, and scalable.

This blog post outlines what Deep Neural Inspection is, and walks through how to use our declarative API to inspect a PacMan agent. Follow-up blog posts will dive into the system design described in our recently accepted SIGMOD 2019 paper.

Part 1: What is Deep Neural Inspection?

There is currently a wave of research to understand how models internally behave. To do so, researchers extract activation “behaviors” for individual or groups of hidden units in a NN model, and develop measures to map these “behaviors” to higher order concepts such as “detected chair” [23] or “learned parsing rules” [19]. For instance, the input record may be a sentence or image, and a behavior is simply some value (e.g., raw activation, or the gradient) that can be derived from a hidden unit for each symbol in the record (e.g., pixel, character, word). These behaviors are typically visualized as below:

(Left) An RNN unit’s activation intensity overlaid on top of code and sentences. (Right) Hidden unit behaviors for the same image computed using SmoothGrad, Guided BackProp, and Guided GradCAM, respectively.

The above images are rendered for a single hidden unit on an individual sentence or image. Real datasets, however, contain thousands or millions of records. It’s not realistic for humans to visually inspect the visualization for every hidden unit and every test record.
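To make the notion of a per-symbol "behavior" concrete, here is a minimal sketch in NumPy. It is not DeepBase code: the tiny recurrent model, its random untrained weights, and the name unit_behaviors are all assumptions made for illustration. It records one activation per hidden unit for each character of an input string, which is the kind of matrix the heat maps above are drawn from.

```python
import numpy as np

def unit_behaviors(text, hidden_size=4, seed=0):
    """Return a (len(text), hidden_size) array: one activation per unit per character."""
    rng = np.random.default_rng(seed)
    vocab = sorted(set(text))
    index = {ch: i for i, ch in enumerate(vocab)}
    # Random, untrained weights: stand-ins for a real model's parameters.
    w_in = rng.normal(scale=0.5, size=(len(vocab), hidden_size))
    w_rec = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
    h = np.zeros(hidden_size)
    rows = []
    for ch in text:
        # Simple tanh recurrence: new state depends on the character and the old state.
        h = np.tanh(w_in[index[ch]] + h @ w_rec)
        rows.append(h)
    return np.stack(rows)

behaviors = unit_behaviors("if (x) { y(); }")
unit_2 = behaviors[:, 2]  # behavior of hidden unit 2: one value per character
```

With a real model, the same matrix would typically be filled from recorded activations or gradients rather than from a toy recurrence.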

What if we want to know whether units in a facial recognition model recognize eyes or ears as part of making a prediction? Or whether a language translation model detects syntactical structure? For those cases, it is necessary to automate the analysis. Intuitively, if a hidden unit only activates for pixels in the eye, or noun phrases, then there is evidence for the hypothesis that the hidden unit has learned to detect those specific features.

This intuition has been explored in dozens of existing NN research projects, across different domains, for different models, and to answer different types of hypotheses. For instance:

  • [Zhou et al. ICLR15] measures whether specialized hidden units (e.g., the “dog neuron”) exist in CNNs. To do so, they annotate pixels containing a dog with 1, and measure the annotations’ Jaccard similarity with highly activated pixels (called the IoU score).
  • [Belinkov et al. ACL17] verifies whether layers in neural translation models learn part-of-speech tags as a byproduct of translation. To do so, they use the activations within each layer to predict properties of the input word, e.g., whether it is within a noun phrase.
  • [Shi et al. EMNLP16] checks whether character-level language models learn grammar by predicting the occurrence of grammar rules from the activations of the model.
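The IoU score from the first example can be sketched in a few lines. This is a simplified illustration, not the code from any of the cited papers: the function name iou_score and the choice to call a pixel "highly activated" when it is at or above the 0.9 quantile of the unit's activations are assumptions.

```python
import numpy as np

def iou_score(activation, annotation, quantile=0.9):
    """Jaccard similarity between highly activated pixels and annotated pixels."""
    # A pixel counts as "highly activated" if it reaches the chosen quantile.
    active = activation >= np.quantile(activation, quantile)
    annotated = annotation.astype(bool)
    intersection = np.logical_and(active, annotated).sum()
    union = np.logical_or(active, annotated).sum()
    return float(intersection / union)

# A 6x6 image whose annotated region (e.g., "dog") is a 2x2 patch.
annotation = np.zeros((6, 6), dtype=int)
annotation[1:3, 1:3] = 1

# A unit that fires exactly on the annotated patch...
activation = np.full((6, 6), 0.1)
activation[1:3, 1:3] = 0.9

# ...and one that fires on a different patch of the image.
shifted = np.roll(activation, 3, axis=1)

print(iou_score(activation, annotation), iou_score(shifted, annotation))
```

A unit that fires only on the annotated patch scores high, while a unit that fires elsewhere scores low.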

Each of these analyses computes a statistical “affinity” measure between the behaviors of hidden units in a trained NN model and annotations of the input data (e.g., annotating noun words, or eye pixels). This pattern is so frequent that we previously termed this type of analysis Deep Neural Inspection (DNI).

Current Deep Neural Inspection practices are akin to data management before relational databases.

It can currently take weeks or months to perform a single DNI analysis, because it requires an expert to manually write prodigious amounts of code. This code needs to extract hidden unit behaviors (activations, gradients, or other types of behaviors), generate the annotations that encode the higher-level features, and compute affinity measures between every combination of hidden unit and feature. The massive sizes of modern NN models and datasets also pose tremendous data management challenges. The result is one-off DNI analyses that hard-code a specific model or use case, and re-implement data management, extraction, and optimization decisions.
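The last step, scoring every combination of hidden unit and feature, can be sketched as a pair of nested loops. This is only an illustration on synthetic data: the function name affinity_matrix is hypothetical, and absolute Pearson correlation is used as a stand-in affinity measure rather than any specific measure from the papers above.

```python
import numpy as np

def affinity_matrix(behaviors, annotations):
    """Score every (unit, hypothesis) pair.

    behaviors:   (n_symbols, n_units) array, one column per hidden unit
    annotations: dict mapping hypothesis name -> (n_symbols,) array
    Returns a dict {(unit, name): |Pearson correlation|}.
    """
    scores = {}
    for name, labels in annotations.items():
        for unit in range(behaviors.shape[1]):
            r = np.corrcoef(behaviors[:, unit], labels)[0, 1]
            scores[(unit, name)] = abs(float(r))
    return scores

# Synthetic data: 200 words, 3 hidden units, 2 binary annotations.
rng = np.random.default_rng(0)
annotations = {
    "noun": rng.integers(0, 2, size=200),
    "verb": rng.integers(0, 2, size=200),
}
behaviors = rng.normal(size=(200, 3))
# Make unit 0 track the "noun" annotation, plus a little noise.
behaviors[:, 0] = annotations["noun"] + 0.1 * rng.normal(size=200)

scores = affinity_matrix(behaviors, annotations)
best = max(scores, key=scores.get)  # the (unit, hypothesis) pair with the highest score
```

Even in this toy form, the work grows with units × hypotheses × records, which is where the data management burden comes from.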

For instance, we conducted a brief survey of existing DNI analysis implementations. Of the many DNI papers, we only found 7 public implementations (including two versions of the amazing NetDissect). Although an imperfect measure of complexity, every analysis below requires hundreds or thousands of lines of code!

To reiterate, the current state of DNI is akin to the pre-1970s, when developers hand-wrote each database query as a one-off imperative program. This restricts the range of developers who can inspect their models, and limits the complexity and scale at which model inspection can be performed.

Part 2: Inspecting a PacMan Agent w/ DeepBase

A random maze from

Let’s see how DeepBase can help inspect a NN that plays PacMan. Rather than writing hundreds of lines of code, each analysis merely requires writing or editing a handful of lines of code.

The PacMan agent model is a CNN trained via reinforcement learning. At every step, the model takes a 15x19 pixel map as input, and returns PacMan’s next move (←↑→↓). The map labels pixels as roads with (blue) and without (black) edible dots, the PacMan, the ghost, and a power pill that temporarily lets PacMan eat ghosts.

2.1 Does PacMan See Ghosts?

Suppose that we have developed a reasonable agent that appears to work well, but it’s not clear what it is doing. It would inspire confidence if we knew whether units in the model actually use the ghost’s location to make decisions. We decide that a unit does so if it activates highly for the pixels surrounding the ghost and for nothing else.

DeepBase lets NN developers directly specify the salient parts of a DNI analysis, and abstracts away the details related to data management, execution, storage, and optimization.

The crux is to specify a hypothesis function f_ghost that takes the pixel map as input and returns a binary matrix of the same size, with a 1 at the ghost’s location. We then state that all units in layers 1 and 2 should be analyzed, and that the affinity score should be computed as a Jaccard similarity (called IoU in NetDissect) between a unit’s activation matrix and the binary matrix. Note that if we had a facial recognition model, we could similarly check whether units learn to focus on eyes or ears by annotating the corresponding pixels in a hypothesis function.
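A hypothesis function along these lines might look like the following sketch. The integer label GHOST and the assumption that the pixel map is a 2D array of integer labels are ours, since the post does not spell out the map encoding or the exact interface DeepBase expects.

```python
import numpy as np

GHOST = 3  # hypothetical label for ghost pixels; the real encoding may differ

def f_ghost(pixel_map):
    """Return a binary matrix, same shape as the map, with 1 at the ghost's location."""
    return (pixel_map == GHOST).astype(np.uint8)

# Example: an empty 15x19 map with a ghost at row 4, column 7.
pixel_map = np.zeros((15, 19), dtype=int)
pixel_map[4, 7] = GHOST
mask = f_ghost(pixel_map)
```

The function knows nothing about the model; it only annotates the input, which is what lets the same hypothesis be reused across models and layers.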

import pickle
import torch

# Load the agent's model
models = [torch.load("pacman")]
# Load a test dataset (pickle files must be opened in binary mode)
with open("dataset.pkl", "rb") as f:
    data = pickle.load(f)
# Implement a hypothesis function f_ghost.
hyp = [f_ghost]
# Specify which hidden units and layers to inspect.
# We care about all units ('*') in layers 1 and 2
units = [(1, '*'), (2, '*')]
# How to quantify affinity between hypotheses and hidden units.
# Intuitively, if a hidden unit activates highly around
# the ghost's location, then the score should be high,
# and lower otherwise. LocalIOU is such a score.
scores = [metrics.LocalIOU()]
deepbase.inspect(models, hyp, scores, data, units)

That’s it!

DeepBase outputs a score for every hidden unit in layers 1 and 2. Plotting these below, we see that hidden unit 14 in layer 1 has a high score. When visualizing the unit’s activations (also extracted by DeepBase), we can confirm that, indeed, it focuses on the ghost:

2.2 When Does Learning Happen During Training?

We can now deepen the analysis and ask further questions. For instance, we may wonder when this “Unit 14” detector emerged during the training process. We do so by only changing 2 lines of code! The first loads snapshots of the agent’s model that are saved at different training epochs. The second simply specifies that we only care about unit 14 in layer 1.

# Load the models saved after each training epoch
# (assumes pacman_training_checkpoints is a list of checkpoint paths)
models = [torch.load(path) for path in pacman_training_checkpoints]
# Focus on hidden unit 14 in layer 1
units = [(1, 14)]
deepbase.inspect(models, hyp, scores, data, units)

DeepBase outputs unit 14’s IoU score throughout the training process. When plotting the results, we can see that its IoU converges around epoch 50.

2.3 Does PacMan Anticipate?

As a final example, what if we asked whether any hidden units activate along the trajectory that the ghost is about to follow?

This would suggest that the agent is encoding state to anticipate the ghost’s actions. To test it, all we do is add hypothesis functions f_ghost_N that simulate the ghost’s future positions after N∈[1, 8] steps. Everything else stays exactly the same!

# Hypotheses for the ghost location in 1, 2, 4, or 8 steps
hyp = [f_ghost_1, f_ghost_2, f_ghost_4, f_ghost_8]
deepbase.inspect(models, hyp, scores, data, units)
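One way to build such hypothesis functions is with a small factory, sketched below. Everything here is an assumption made for illustration: the GHOST label, the factory name make_f_ghost, and especially the toy ghost policy step_right, which stands in for whatever simulator predicts the ghost's real movement.

```python
import numpy as np

GHOST = 3  # hypothetical label for ghost pixels

def step_right(position, shape):
    """Toy ghost policy: move one column to the right, wrapping at the edge."""
    row, col = position
    return row, (col + 1) % shape[1]

def make_f_ghost(n_steps, step=step_right):
    """Build a hypothesis that marks where the ghost will be after n_steps."""
    def f_ghost_n(pixel_map):
        mask = np.zeros(pixel_map.shape, dtype=np.uint8)
        # Advance every ghost pixel n_steps times under the given policy.
        for position in zip(*np.nonzero(pixel_map == GHOST)):
            for _ in range(n_steps):
                position = step(position, pixel_map.shape)
            mask[position] = 1
        return mask
    return f_ghost_n

f_ghost_1, f_ghost_2, f_ghost_4, f_ghost_8 = (make_f_ghost(n) for n in (1, 2, 4, 8))

# Example: a ghost near the right edge of a 15x19 map.
pixel_map = np.zeros((15, 19), dtype=int)
pixel_map[4, 17] = GHOST
future = f_ghost_4(pixel_map)
```

Swapping in a different step function changes the simulated trajectory without touching the rest of the analysis.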

In fact, we could ask many other questions: is this detector unique? Would the results be consistent if we changed the model’s architecture? Are they consistent as the model keeps playing? DeepBase’s declarative API lets users easily mix and match hypotheses, scoring functions, and models.

Part 3: What’s Next?

DeepBase is the first declarative system to support Deep Neural Inspection.

Making DNI analysis easy to express is only one side of the coin. A major challenge is how to make DNI fast and scalable. Our next blog post will present DeepBase’s internals and our efforts to scale DNI.

This project is thanks to a group of great contributors, including Thibault Sellam, Yiru Chen, Yiliang Shi, Boyuan Chen, Ian Huang, and Carl Vondrick at Columbia University; Kevin Lin at AI2 (and soon at Cal); and our past intern Michelle Yang who is now at 2Sigma.


  • [Zhou et al. ICLR15] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene CNNs. arXiv preprint arXiv:1412.6856, 2014.
  • [Zeiler et al. ECCV14] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In European conference on computer vision, pages 818–833. Springer, 2014.
  • [Raghu et al. NIPS17] M. Raghu, J. Gilmer, J. Yosinski, and J. Sohl-Dickstein. Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. In Advances in Neural Information Processing Systems, pages 6076–6085, 2017.
  • [Radford et al. arXiv17] A. Radford, R. Jozefowicz, and I. Sutskever. Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444, 2017.
  • [Netdissect Lite]
  • [Kim et al. ICLR18] B. Kim, J. Gilmer, F. Viegas, U. Erlingsson, and M. Wattenberg. Tcav: Relative concept importance testing with linear concept activation vectors. arXiv preprint arXiv:1711.11279, 2017.
  • [Belinkov et al. ACL17] Y. Belinkov, N. Durrani, F. Dalvi, H. Sajjad, and J. Glass. What do neural machine translation models learn about morphology? ACL 2017, 2017.
  • [Bau et al. CVPR17] D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba. Network dissection: Quantifying interpretability of deep visual representations. CVPR 2017.
  • [Shi et al. EMNLP16] X. Shi, I. Padhi, and K. Knight. Does string-based neural mt learn source syntax? In EMNLP, 2016.