Summary: Network Dissection: Quantifying Interpretability of Deep Visual Representations (CVPR 2017)

Dheeru Dua
UCI NLP
Jan 15, 2019

Authors: David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

The contributions of this paper are two-fold: first, the construction of the Broden (Broadly and Densely Labeled) dataset, and second, a mechanism to quantify the interpretability of a model. The method applies only to convolutional filters.

  1. The dataset is created by combining densely labeled objects, object parts, scenes, textures, and materials from a variety of existing datasets.
  2. The interpretability score of a unit is computed with Intersection-over-Union (IoU). First, a top-quantile threshold T_k is determined for each unit k such that P(a_k > T_k) = 0.005 over the distribution of that unit's activations a_k. The low-resolution activation map A_k is then scaled up to S_k so it can be compared with the annotated target image, which contains a pixel-wise label mask L_c for each concept c. A binary segmentation is obtained by thresholding, M_k = (S_k ≥ T_k), and the score for unit k on concept c is IoU(k,c) = |M_k ∩ L_c| / |M_k ∪ L_c|, accumulated over the dataset (a sketch of the computation follows the list).
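
Here is a minimal NumPy sketch of that scoring for a single (unit, concept) pair. The function name, array layouts, and the use of bilinear upsampling are assumptions for illustration, not the authors' released code.

```python
import numpy as np
from scipy.ndimage import zoom

def unit_iou(activation_maps, label_mask, quantile=0.005):
    """Score one unit k against one concept c over a set of images.

    activation_maps: (N, h, w) low-resolution activations A_k of unit k.
    label_mask:      (N, H, W) binary pixel annotations L_c for concept c.
    """
    # Top-quantile threshold T_k over the unit's activation distribution: P(a_k > T_k) = 0.005.
    T_k = np.quantile(activation_maps, 1.0 - quantile)

    # Scale each low-resolution map A_k up to the annotation resolution S_k (bilinear here).
    N, h, w = activation_maps.shape
    _, H, W = label_mask.shape
    S_k = np.stack([zoom(a, (H / h, W / w), order=1) for a in activation_maps])

    # Binary segmentation M_k = (S_k >= T_k), then IoU accumulated over the dataset.
    M_k = S_k >= T_k
    L_c = label_mask.astype(bool)
    intersection = np.logical_and(M_k, L_c).sum()
    union = np.logical_or(M_k, L_c).sum()
    return intersection / union if union > 0 else 0.0
```

A unit is then labeled with the concept that maximizes IoU(k,c), and it counts as a unique detector when that best score exceeds a small threshold (0.04 in the paper).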

In order to evaluate whether it is meaningful to assign an interpretable concept to an individual unit, the paper considers two hypotheses:

Hypothesis 1. Interpretable units emerge because interpretable concepts appear in most directions in representation space. If the representation localizes related concepts in an axis-independent way, projecting to any direction could reveal an interpretable concept, and interpretations of single units in the natural basis may not be a meaningful way to understand a representation.

Hypothesis 2. Interpretable alignments are unusual, and interpretable units emerge because learning converges to a special basis that aligns explanatory factors with individual units. In this model, the natural basis represents a meaningful decomposition learned by the network.

Consider the 5th convolutional layer of AlexNet, f(x), which has 256 units. A rotation Q is drawn uniformly from SO(256) by applying Gram-Schmidt to a normally distributed A = QR with positive-diagonal right-triangular R. The authors found that if they dissect the rotated representation Qf(x), the number of unique detectors (units whose best-concept IoU exceeds the threshold) falls by roughly 80%. This is inconsistent with Hypothesis 1.
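
A small sketch of how such a rotation can be drawn and applied, assuming NumPy and a stand-in activation tensor; it mirrors the QR construction described above rather than the authors' exact code.

```python
import numpy as np

def random_rotation(d=256, seed=0):
    """Draw Q approximately uniformly from SO(d) via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((d, d))
    Q, R = np.linalg.qr(A)              # Gram-Schmidt-style orthogonalization
    Q = Q * np.sign(np.diag(R))         # enforce positive-diagonal R (Haar measure on O(d))
    if np.linalg.det(Q) < 0:            # flip one column so det(Q) = +1, i.e. Q is in SO(d)
        Q[:, 0] = -Q[:, 0]
    return Q

Q = random_rotation(256)
f_x = np.random.default_rng(1).standard_normal((256, 6, 6))  # stand-in conv5 activations
rotated = np.einsum('ij,jhw->ihw', Q, f_x)  # Q f(x), applied at every spatial location
```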

They also perturb the basis more gently by computing fractional powers Q^α, 0 < α < 1 (via a Schur decomposition), gradually rotating the space from the identity I to Q. They found that interpretability, measured by the number of unique detectors, decreases gradually as α increases.
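
A sketch of this gradual rotation, using SciPy's Schur-based fractional matrix power routine; the array names and the choice of α values are illustrative.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

rng = np.random.default_rng(0)
Q, R = np.linalg.qr(rng.standard_normal((256, 256)))
Q = Q * np.sign(np.diag(R))                 # random orthogonal basis, as above
f_x = rng.standard_normal((256, 6, 6))      # stand-in conv5 activations

for alpha in (0.25, 0.5, 0.75, 1.0):
    Q_alpha = fractional_matrix_power(Q, alpha)
    Q_alpha = np.real(Q_alpha)              # drop any tiny imaginary residue
    partially_rotated = np.einsum('ij,jhw->ihw', Q_alpha, f_x)
    # ...re-run the IoU scoring on partially_rotated to count unique detectors at this alpha
```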

Despite the drop in interpretability, the discriminative power of the network remains unchanged under rotation. Consider the network above the representation as g(f(x)); it can be rotated as g'(r) = g(Qᵀr). Passing the rotated representation r = Qf(x) through this rotated network gives g'(Qf(x)) = g(QᵀQf(x)) = g(f(x)), the same output as before, so the discriminative power is unchanged.
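
A tiny numeric check of this identity, with a random linear map standing in for the layers above f(x):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256
Q, R = np.linalg.qr(rng.standard_normal((d, d)))
Q = Q * np.sign(np.diag(R))                 # random rotation of the representation

W = rng.standard_normal((10, d))            # stand-in for the layers above f(x)
g = lambda r: W @ r                         # original head g(r)
g_rot = lambda r: W @ (Q.T @ r)             # rotated head g'(r) = g(Qᵀ r)

f_x = rng.standard_normal(d)                # a single feature vector f(x)
print(np.allclose(g_rot(Q @ f_x), g(f_x)))  # True: outputs identical, accuracy unchanged
```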

Another observation, from the human experiments, is that the human agreement rate on unit interpretations was higher for features from the later layers of the networks than from the earlier layers. This makes sense: the lower layers pick up low-level aspects such as color and texture, while the later layers capture more abstract concepts such as objects.

Conclusion: Overall, the approach is interesting, but it is limited to convolutional networks.
