1. What are Adversarial Examples and why they exist — ML Shorts 🤖🤏

Short explainer on adversarial examples.

Anubhav Tiwari
ExpNotes
4 min read · Apr 8, 2021


What is it?

In simple words:

An adversarial example is an input to a machine learning model that has been altered in such a way that a human still perceives it as the original example, but the model misclassifies it or makes a false prediction.

Fig: Commonly quoted example of an adversarial example from https://arxiv.org/abs/1412.6572. The original example is an image of a panda; when some noise is added to it, the model classifies it as a gibbon, while the image still looks like a panda to a human.

Why Do They Occur?

A common machine learning pipeline looks like this:

Fig: A simple machine learning pipeline.

Here the model's task is to learn, or in other words, to tune its parameters according to the task at hand and the examples provided to it.
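For concreteness, a minimal sketch of such a parameter-tuning loop in PyTorch might look like the following; the toy classifier, learning rate, and input size are purely illustrative assumptions, not part of the original pipeline figure.

```python
import torch
import torch.nn as nn

# A minimal, hypothetical pipeline: data -> model -> loss -> parameter update.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    """One parameter-tuning step on a batch (x, y)."""
    optimizer.zero_grad()
    logits = model(x)          # forward pass: compute class scores
    loss = loss_fn(logits, y)  # how wrong the predictions are
    loss.backward()            # gradients of the loss w.r.t. the parameters
    optimizer.step()           # tune the parameters
    return loss.item()
```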

During training, a machine learning model performs the subtask of learning high-level feature representations of the examples, which helps it achieve better performance on the task at hand.

The model learns an n-dimensional representation space in which similar examples cluster in certain regions. This creates high-density regions of the probability distribution over examples. These regions are called class manifolds, or simply manifolds.

Experiments have shown that most examples lie very close to the separation boundaries between the manifolds of different classes.

Suppose the model's activations for a certain example are supposed to lie in a region of the n-dimensional representation space where they cluster with other examples of its class, say class 1. If added noise pushes those activations towards another region, one that contains examples of another class, say class 2, then the model will predict class 2, because the input's representation now lies in the cluster of class 2 examples.

Why does the adversarial example remain perceptually similar to the original image? Because the regions of different classes lie very close to the separating hyperplane. Most classes are concentrated very close to each other, so a slight movement in one direction can cause a false prediction.
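To make this concrete, here is a toy two-class linear model in NumPy (the weights, input, and ε value are hypothetical): the clean input sits just on the class-1 side of the boundary, and a small perturbation, bounded in every coordinate, pushes it across.

```python
import numpy as np

# Toy linear classifier: predict class 1 if w.x + b > 0, else class 2.
w = np.array([1.0, -1.0])
b = 0.0

def predict(x):
    return 1 if w @ x + b > 0 else 2

x = np.array([0.1, 0.0])      # clean example, close to the boundary
eps = 0.2                     # small perturbation budget
eta = -eps * np.sign(w)       # push against the class-1 direction
x_adv = x + eta

print(predict(x))             # 1 -> correct class
print(predict(x_adv))         # 2 -> a small nudge crosses the boundary
print(np.max(np.abs(eta)))    # 0.2 -> the change is tiny in every coordinate
```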

Fig: Taken from https://arxiv.org/abs/1812.00740, showing an example of class manifolds and adversarial examples.

Generating an Adversarial Example:

Start with an example Xi that the model classifies correctly: Y(Xi) = Ci,

where Y is the model and Ci is the correct prediction, with Ci ∈ {C1, C2, C3, …} for classification (or Ci ∈ ℝ for regression).

Now add a noise η to get Xi′ = Xi + η, where ||η||∞ ≤ ε (the infinity norm of η) for some threshold ε.

Xi′ is an adversarial example if Y(Xi′) = Cj such that Cj ≠ Ci.
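Putting these pieces together, here is a minimal sketch of the fast gradient sign method (FGSM) from the Goodfellow et al. paper linked above. It assumes a differentiable PyTorch classifier (the name model is a hypothetical placeholder) that outputs logits, and it chooses η = ε · sign(∇x loss), which satisfies ||η||∞ ≤ ε by construction.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps):
    """Return Xi' = Xi + eta with ||eta||_inf <= eps (FGSM sketch)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss w.r.t. the correct class Ci
    loss.backward()
    eta = eps * x.grad.sign()             # noise in the direction that increases the loss
    x_adv = (x + eta).detach()
    return x_adv.clamp(0.0, 1.0)          # keep the image in a valid pixel range
```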

An example of how the noise can affect the activations:

Fig: Taken from the slides of Lecture 16: Adversarial Examples and Adversarial Training, CS231n at Stanford, by Dr. Ian Goodfellow. The figure shows how the logits (the unnormalized class scores before applying softmax) change as ε is moved from −30 to +30 along a fixed direction.

In the image above, the frog class has the highest logits at both extremes, so it can be said that the direction we are moving in is the direction of the frog class.

The car (automobile) class also shows an increase in the negative direction, but not as much as the frog class.

The car logits are highest near ε = 0, which shows that the model classifies the unperturbed input correctly: this region of the input space belongs to the car class. But as soon as we move in the negative or positive direction, other classes (mainly the frog class) become more likely, so the regions on either side along that direction likely belong to the frog class.

This is because most models are linear or piecewise linear, so the mapping from input to output is highly linear: a small but consistent push along the right direction keeps increasing a wrong class's logit, which makes misclassification easy.
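The lecture figure can be roughly reproduced with a sweep like the sketch below, assuming a trained PyTorch classifier (model), an input x, and a fixed direction tensor to move along; all three names are hypothetical placeholders.

```python
import torch

@torch.no_grad()
def logit_sweep(model, x, direction, eps_values):
    """Record the logits as x is moved along a fixed direction by each eps."""
    rows = []
    for eps in eps_values:
        logits = model(x + eps * direction)   # x perturbed by eps in `direction`
        rows.append(logits.squeeze(0))
    return torch.stack(rows)                  # shape: (num_eps, num_classes)

# Usage (assuming model, x, and direction exist):
# eps_values = torch.linspace(-30, 30, steps=61)
# logits = logit_sweep(model, x, direction, eps_values)
# Plotting each column against eps_values gives curves like those in the
# lecture slide: mostly straight lines, because the network is (piecewise)
# linear in its input.
```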

Some great insights about the properties of adversarial examples can be found in this amazing paper: https://arxiv.org/abs/1711.02846

More details about different types of adversarial example generation and defense strategies can be found in the amazing survey papers from 2018 and 2019 here:

  1. Survey Paper 2018 : Adversarial Examples: Attacks and Defenses for Deep Learning
  2. Survey Paper 2019 : Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Key Takeaways

Adversarial examples are still a major unsolved problem and pose a threat on the path to AGI.
