Deep Classifiers Ignore Almost Everything They See (and how we may be able to fix it)

Jörn Jacobsen, Jens Behrmann, Rich Zemel, and Matthias Bethge — March 25, 2019

Excessive Invariance: All images shown cause a competitive ImageNet-trained network to output the exact same probabilities over all 1000 classes (logits shown above each image).
  • Excessive invariance gives an alternative explanation for the adversarial example phenomenon
  • We identify the commonly-used cross-entropy objective as a major reason for the striking invariance we observed
  • There may be a way to control and overcome this problem …

Exploring Invariances of Learned Classifiers

Investigating what a classifier does not look at, i.e. what it is invariant to, requires access to everything the classifier throws away across its layers. This is hard to do in general and has been the subject of extensive study (e.g. [2]). Fortunately, recent advances in invertible deep nets have led to networks that do not build up any invariance until the final layer [3,4]. Because everything but the final layer is a lossless one-to-one mapping, the projection from the invertible representation to the class scores is the only place where invariance is created. What remains is to simplify this final layer so we can manipulate and investigate the pre-image of particular class scores.
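To see why such networks discard nothing, here is a minimal additive coupling layer in the spirit of NICE/i-RevNet. This is a simplified sketch for illustration, not the architecture used in the paper:

```python
import numpy as np

def coupling_forward(x, t):
    """Additive coupling layer: split the input in half and shift one
    half by a function t of the other. Because the shift can be
    subtracted back out, the layer is exactly invertible and no
    information is lost."""
    x1, x2 = np.split(x, 2)
    return np.concatenate([x1, x2 + t(x1)])

def coupling_inverse(z, t):
    z1, z2 = np.split(z, 2)
    return np.concatenate([z1, z2 - t(z1)])

# t can be an arbitrary, even non-invertible, function of x1:
t = lambda v: np.tanh(3.0 * v)
x = np.random.default_rng(0).normal(size=8)
x_rec = coupling_inverse(coupling_forward(x, t), t)  # recovers x exactly
```

Stacking many such layers gives a deep network whose every intermediate representation still contains the full input.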

We split the output of the invertible network into two subspaces:
  • The logit subspace Zs: the class scores the classifier reads out.
  • The nuisance subspace Zn: the remaining dimensions the classifier does not see.
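In code, the split is just a partition of the latent vector. The layout below (logits first, number of classes and dimensions) is an assumption for this toy illustration:

```python
import numpy as np

NUM_CLASSES = 10  # assumed number of classes for this toy example

def split_latent(z, num_classes=NUM_CLASSES):
    """Partition the invertible network's output z into the logit
    subspace z_s (first num_classes dims, by convention here) and
    the nuisance subspace z_n (all remaining dims)."""
    return z[:num_classes], z[num_classes:]

z = np.arange(16.0)                  # toy 16-dimensional latent
z_s, z_n = split_latent(z)
merged = np.concatenate([z_s, z_n])  # nothing was lost in the split
```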
Analytically Analyzing Logit Pre-images: Compute the hidden representation of one image (left) and keep only the logits Zs, discarding Zn. Compute the hidden representation of an arbitrary image from another class (right) and keep only the nuisances Zn, discarding Zs. Concatenate the resulting Zs and Zn, invert the network, and look at the result!
Top row: images from which the logit vectors Zs are taken. Bottom row: images from which the nuisance vectors Zn are taken. Middle row: resulting inverted images, with logit configurations identical to the top-row images. We have analytically computed adversarial examples!
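The recipe in the caption can be sketched end to end with a toy invertible map standing in for the trained network. An orthogonal linear map is used here purely so the inverse is exact; the paper uses a deep invertible (i-RevNet-style) classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 16, 10                        # toy input dimension and number of classes

# Stand-in for a trained invertible network: an orthogonal map,
# so f_inv is an exact inverse of f.
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))
f = lambda x: Q @ x                  # image -> latent (Zs = first C dims)
f_inv = lambda z: Q.T @ z            # latent -> image, lossless

def metameric_sample(x_content, x_nuisance):
    """Keep the logits Zs of x_content and the nuisances Zn of
    x_nuisance, concatenate, and invert: the result has identical
    logits to x_content, whatever it looks like."""
    z_s = f(x_content)[:C]
    z_n = f(x_nuisance)[C:]
    return f_inv(np.concatenate([z_s, z_n]))

x_a, x_b = rng.normal(size=D), rng.normal(size=D)
x_star = metameric_sample(x_a, x_b)
```

By construction, `f(x_star)` agrees with `f(x_a)` on the logit dimensions and with `f(x_b)` on the nuisance dimensions, so the classifier cannot distinguish `x_star` from `x_a`.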

We have stumbled upon an analytic adversarial attack.

The figure above shows that, despite our hopes of learning about the decision space of the classifier, Zn dominates the image completely. The classifier, as represented by the information encoded in the logits, appears almost completely invariant to changes of the input. We can swap class content arbitrarily without changing the predicted probabilities over all 1000 ImageNet classes.

How is this Related to Adversarial Examples?

It is well known from adversarial example research that tiny perturbations of an input can completely change the output of a deep network. This shows how deep neural networks, despite their impressive performance, exhibit striking failures on slightly modified inputs. As such, adversarial examples are a powerful tool for analyzing the generalization of learned models under distribution shift.

The classical viewpoint (short orange arrow): perturbation-based adversarial examples x* apply changes to an input x such that x* stays in the same ground-truth class as x while crossing the decision boundary (dashed line) of the model. Our alternative viewpoint (long pink arrow): invariance-based adversarial examples x* apply changes to an input x that change the ground-truth class of x* without crossing the learned decision boundary.
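The two viewpoints can be stated as simple predicates over a model and an oracle (a ground-truth labeler). The 1-D model and oracle below are hypothetical, chosen only to make the definitions executable:

```python
def is_perturbation_adversarial(x, x_star, model, oracle):
    """Classical viewpoint: the ground-truth class is unchanged,
    but the model's decision flips."""
    return oracle(x_star) == oracle(x) and model(x_star) != model(x)

def is_invariance_adversarial(x, x_star, model, oracle):
    """Alternative viewpoint: the ground-truth class changes,
    but the model's decision stays the same."""
    return oracle(x_star) != oracle(x) and model(x_star) == model(x)

# Toy 1-D example: the model's boundary is miscalibrated vs. the oracle's.
oracle = lambda x: int(x > 0.0)      # true class boundary at 0
model = lambda x: int(x > 0.5)       # learned boundary at 0.5

# Small step across the model boundary, same true class:
perturb = is_perturbation_adversarial(0.6, 0.4, model, oracle)   # True
# Large step across the true boundary, same model decision:
invar = is_invariance_adversarial(0.4, -0.4, model, oracle)      # True
```

The gap between the two boundaries is exactly the region where both failure modes live.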

Why are Deep Classifiers so Invariant?

To understand why deep classifiers exhibit the excessive invariance we have observed above, we need to investigate the loss function used to train them.

Left: Cross-entropy trained networks are easily attacked with our analytic invariance-based attack. Right: Independence cross-entropy trained model. Our attack is no longer successful; it can only change the style of the digit, not its semantic content.
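A heavily simplified sketch of the two terms in an independence cross-entropy objective is given below. This is an illustration under our own simplifying assumptions, not the paper's exact formulation: there, a separate nuisance head is trained adversarially against the encoder in a minimax game, while here we only show the two cross-entropy components:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label])

def independence_ce(z_s, z_n, label, nuisance_head, lam=1.0):
    """Sketch of the objective's two components: the usual
    cross-entropy on the logit subspace z_s, minus a term rewarding
    the encoder when a nuisance head cannot predict the label from
    z_n (i.e. pushing label information out of Zn and into Zs)."""
    ce_semantic = cross_entropy(z_s, label)                  # fit label from Zs
    ce_nuisance = cross_entropy(nuisance_head @ z_n, label)  # label info left in Zn
    return ce_semantic - lam * ce_nuisance

rng = np.random.default_rng(0)
loss = independence_ce(rng.normal(size=10), rng.normal(size=6),
                       label=3, nuisance_head=rng.normal(size=(10, 6)))
```

Intuitively, a classifier trained this way can no longer afford to be invariant to semantic content: any label information hiding in Zn is penalized, which is what blocks the analytic attack above.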

Main Reference:

Jörn-Henrik Jacobsen, Jens Behrmann, Richard Zemel, and Matthias Bethge, "Excessive Invariance Causes Adversarial Vulnerability," ICLR 2019.
