A Review of DeepFool: a simple and accurate method to fool deep neural networks

By Eric Watson and Adrian Morgan

May 2, 2022

Project Presentation

Project Repository

https://github.com/ewatson2/EEL6812_DeepFool_Project

Introduction

Deep neural networks (DNNs) have achieved state-of-the-art performance on image classification tasks. However, these state-of-the-art DNNs are vulnerable to adversarial attacks: they are susceptible to small perturbations that are crafted in a meaningful way, rather than being random noise. In the paper DeepFool: a simple and accurate method to fool deep neural networks, Moosavi-Dezfooli et al. [1] introduce an efficient method, called DeepFool, to compute the minimal perturbation necessary to cause a classifier to misclassify an image.

The authors of DeepFool also introduce a measure of a classifier's robustness to adversarial perturbations, which can be used to compare results with other adversarial attack methods. They also perform an experiment to improve the robustness of DNNs by training on adversarial images.

A few examples are shown below in Figure 1. In the top row, an original image is classified as a whale. In the middle row, the original image of the whale has been perturbed, causing the classifier to label it as a turtle. The perturbation is very subtle and is imperceptible to the human eye. The last row depicts a similarly perturbed image, but this time perturbed with the Fast Gradient Sign Method (FGSM) proposed by Goodfellow et al. [2], which is the primary comparison method in the paper. FGSM was also able to make the classifier misclassify the original image as a turtle, but the amount of perturbation required is higher, which can be seen visually.

Figure 1: Example of Adversarial Perturbations (Deepfool vs. FGSM)

DeepFool for Binary Classifiers

Figure 2: Adversarial Example for Binary Classifier

Before the authors of DeepFool explain their algorithm for multi-class classifiers, they start with a simple binary classifier, as shown in Figure 2. From the figure, it can be seen that in order to obtain the minimal perturbation that causes the classifier 𝑓 to misclassify the input image x, the perturbation r*(x) must be the orthogonal projection of x onto the hyperplane that forms the decision boundary of the binary classifier.

Figure 3: DeepFool Equation to Calculate the Perturbation for a Binary Classifier

The DeepFool algorithm computes the perturbation r*(x) by dividing the classifier output 𝑓(x_0) by the L2-norm of the gradient ω of the classifier output, giving the scalar magnitude of the perturbation. This scalar is then multiplied by the unit vector of ω (again using the L2-norm), and the sign is inverted so that the output of the classifier 𝑓 is pushed toward the decision boundary, as shown in Figure 3. This simple closed form comes from treating the model as linear, but since models are non-linear in nature, misclassification is not guaranteed in a single step. To alleviate this issue, the algorithm works iteratively, adding each new perturbation to the accumulated perturbation until the label changes or the maximum number of iterations is reached, as shown in Figure 4. Additionally, the iterate can converge to a point that lies almost exactly on the decision boundary, so a parameter called the overshoot η is used as a scalar (1 + η) on the perturbation r*(x), pushing the image past the boundary so that the class label actually changes.

Figure 4: Pseudocode of DeepFool Algorithm for Binary Classifier
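To make the binary case concrete, below is a minimal PyTorch sketch of the iteration just described. It assumes a hypothetical `model` that returns a single scalar output f(x) for one image and applies the overshoot inside the loop; it is an illustration of the idea, not the authors' or the project's implementation.

```python
import torch

def deepfool_binary(x0, model, max_iter=50, overshoot=0.02):
    # Sketch of the binary DeepFool iteration (illustrative, not the authors' code).
    # `model` is assumed to return a single scalar f(x); sign(f) is the predicted class.
    x = x0.clone().detach()
    r_total = torch.zeros_like(x0)
    orig_sign = torch.sign(model(x0)).item()

    for _ in range(max_iter):
        x.requires_grad_(True)
        f = model(x)                              # current output f(x_i)
        if torch.sign(f).item() != orig_sign:
            break                                 # label flipped: stop
        w = torch.autograd.grad(f, x)[0]          # w = gradient of f at x_i
        # minimal step onto the linearized boundary: r_i = -f(x_i) * w / ||w||^2
        r_i = -f.item() * w / (w.norm() ** 2 + 1e-8)
        r_total = r_total + r_i.detach()
        # overshoot (1 + eta) pushes the iterate slightly past the boundary
        x = (x0 + (1 + overshoot) * r_total).detach()

    return (1 + overshoot) * r_total, x
```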

DeepFool for Multi-Class Classifiers

Figure 5: Adversarial Example for Multi-Class Classifier

Extending the binary DeepFool algorithm, a multi-class classifier can be treated as a collection of binary classifiers, as shown in Figure 5. The decision boundary in Figure 5 can be viewed as a polyhedron formed by the intersection of multiple hyperplanes, and the minimal perturbation corresponds to the hyperplane closest to x_0. Since there are multiple classes, the classifier output and its gradient must be computed for each candidate class label. To find the minimal perturbation, the difference is taken between the values computed for each candidate label and the values computed for the originally predicted label. This is shown in Figure 6, in which the index of the closest hyperplane, i.e. the minimum perturbation, is returned as l(x_0).

Figure 6: DeepFool Equation to Find the Index for the Minimum Perturbation for a Multi-Class Classifier

After the index l(x_0) is determined, the minimum perturbation r*(x_0) is computed similarly to the binary case shown in Figure 3. The only differences from the binary case are that the formula uses the differences of the classifier outputs and of their gradients, and that the absolute value of the output difference is taken, as shown in Figure 7.

Figure 7: DeepFool Equation to Calculate the Minimum Perturbation for a Multi-Class Classifier

After the perturbation is computed, it is added to the accumulated perturbation r*(x_0), which is then multiplied by the overshoot scalar (1 + η) and added to the original image to form the next iterate x_i. This process is repeated until the maximum number of iterations is reached or the prediction k_i for the image x_i differs from the original prediction k_0 for the original image x_0. The pseudocode for the multi-class DeepFool algorithm is shown in Figure 8.

Figure 8: Pseudocode of DeepFool Algorithm for Multi-Class Classifier
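The multi-class iteration can be sketched in the same style. The snippet below is again only an illustration, under the assumption that `model` returns a 1-D tensor of class logits for a single image; practical implementations often restrict the inner loop to a few top-scoring candidate classes for speed.

```python
import torch

def deepfool_multiclass(x0, model, num_classes=10, max_iter=50, overshoot=0.02):
    # Sketch of multi-class DeepFool (illustrative, not the authors' code).
    # `model` is assumed to return a 1-D tensor of class logits for a single image.
    x = x0.clone().detach()
    r_total = torch.zeros_like(x0)
    k0 = model(x0).argmax().item()                      # original prediction k_0

    for _ in range(max_iter):
        x.requires_grad_(True)
        logits = model(x)
        if logits.argmax().item() != k0:
            break                                       # label changed: stop

        grad_k0 = torch.autograd.grad(logits[k0], x, retain_graph=True)[0]
        best_dist, best_r = None, None
        for k in range(num_classes):
            if k == k0:
                continue
            grad_k = torch.autograd.grad(logits[k], x, retain_graph=True)[0]
            w_k = grad_k - grad_k0                      # w'_k: gradient difference
            f_k = (logits[k] - logits[k0]).item()       # f'_k: output difference
            dist = abs(f_k) / (w_k.norm() + 1e-8)       # distance to hyperplane k
            if best_dist is None or dist < best_dist:   # keep the closest hyperplane l(x_0)
                best_dist = dist
                best_r = dist * w_k / (w_k.norm() + 1e-8)   # r_i = |f'_k| / ||w'_k||^2 * w'_k

        r_total = r_total + best_r.detach()
        x = (x0 + (1 + overshoot) * r_total).detach()   # apply overshoot (1 + eta)

    return (1 + overshoot) * r_total, x
```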

Fast Gradient Sign Method (FGSM)

The Fast Gradient Sign Method (FGSM) by Goodfellow et al. [2] is used as the comparison method in the DeepFool paper. The algorithm takes as input the classifier 𝑓, a batch of images x, and the corresponding batch of labels y. The batch x is forwarded through the classifier 𝑓 to obtain the predictions 𝑓(x), which are used with the labels y in the cross-entropy loss function. Backpropagating the loss yields the gradients ω with respect to the input. The sign of the gradients is then multiplied by the scalar ε, which controls the magnitude of the perturbation, and the result is added to the batch of images x. Because it uses only a single iteration and can compute perturbations for a whole batch of images at once, FGSM is faster than DeepFool. However, the algorithm is not designed to find minimal perturbations, so it produces larger perturbations and also requires manual tuning of the parameter ε to obtain the desired results.
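As a rough comparison, the single-step FGSM update can be sketched as follows; `model`, the batch `x`, the labels `y`, and the value of ε are all placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.1):
    # Sketch of FGSM for a batch of images (epsilon is an illustrative value).
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # cross-entropy between predictions f(x) and labels y
    loss.backward()                       # backpropagate to get gradients w.r.t. the input batch
    return (x + epsilon * x.grad.sign()).detach()   # single step: x + eps * sign(gradient)
```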

Datasets

The datasets used in the DeepFool paper are MNIST, CIFAR-10, and ILSVRC2012. The training and validation sets of MNIST and CIFAR-10 were used for adversarial training and evaluation, while the validation set of ILSVRC2012 was used only for adversarial evaluation. The specifications of the datasets are the following:

MNIST Dataset

  • Has 28x28 grayscale images of hand-written digits
  • Has 60,000 training images and 10,000 validation images
  • 10,000 images were split for validation from training, and the original validation images were used for testing
  • Dataset was normalized along with random horizontal flipping of images for training/validation

CIFAR-10 Dataset

  • Has 32x32 RGB images of airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks
  • Has 50,000 training images and 10,000 validation images
  • 10,000 images were split for validation from training, and the original validation images were used for testing
  • Dataset was normalized along with random horizontal flipping and cropping of images for training/validation

ILSVRC2012 Dataset (Validation)

  • Has 1000 classes and 50,000 validation images
  • Images were resized to 256 and cropped to 224 for the GoogLeNet model
  • Dataset was normalized (see the preprocessing sketch below)
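A rough torchvision version of the preprocessing just listed might look like the snippet below; the crop padding and the normalization statistics are the commonly used dataset values and are assumptions, not values taken from the project.

```python
from torchvision import transforms

# Illustrative preprocessing based on the dataset lists above.
cifar10_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),           # random cropping for training
    transforms.RandomHorizontalFlip(),              # random horizontal flipping
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])

ilsvrc2012_val = transforms.Compose([
    transforms.Resize(256),                         # resize to 256
    transforms.CenterCrop(224),                     # crop to 224 for GoogLeNet
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406),
                         (0.229, 0.224, 0.225)),
])
```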

Models

Replication was attempted for the models used in the DeepFool paper, with two different models being used for each dataset. The models used are the following:

LeNet-5 Model for MNIST

  • Uses two convolutional layers with kernel size of 5 and stride of 1
  • Uses two max pooling layers with kernel size of 2 and stride of 2
  • Uses two linear hidden layers with sizes of 120 and 84
  • Uses ReLU for activation functions (a hedged sketch of this architecture is shown below)
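A PyTorch sketch of this LeNet-5 variant is shown below; the convolutional channel widths (6 and 16) follow the classic LeNet-5 and are an assumption, since the list above does not specify them.

```python
import torch.nn as nn

# Sketch of the LeNet-5 variant for MNIST described above; channel widths (6, 16)
# are assumed from the classic LeNet-5, not stated in the project.
lenet5_mnist = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, stride=1), nn.ReLU(),    # 28x28 -> 24x24
    nn.MaxPool2d(kernel_size=2, stride=2),                  # 24x24 -> 12x12
    nn.Conv2d(6, 16, kernel_size=5, stride=1), nn.ReLU(),   # 12x12 -> 8x8
    nn.MaxPool2d(kernel_size=2, stride=2),                  # 8x8 -> 4x4
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.ReLU(),                  # hidden layer of 120
    nn.Linear(120, 84), nn.ReLU(),                          # hidden layer of 84
    nn.Linear(84, 10),                                      # 10 digit classes
)
```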

FC-500–150 Model for MNIST

  • Uses two linear hidden layers with sizes of 500 and 150
  • Uses ReLU for activation functions

Network-In-Network Model [4] for CIFAR-10

  • Uses the PyTorch implementation from the GitHub repository in [5]
  • This model does not use linear layers and replaces them with convolutional layers followed by global average pooling
  • Using a mini-network instead of linear layers helps reduce overfitting and improves accuracy

LeNet-5 Model for CIFAR-10

  • Uses three convolutional layers with kernel size of 5, stride of 1, and padding of 2
  • Uses three max pooling layers with kernel size of 2 and stride of 2
  • Uses three linear hidden layers with sizes of 400, 120, and 84
  • Uses ReLU for activation functions

GoogLeNet for ILSVRC2012

  • State-of-the-art model that ranked highly in the object detection and classification categories of the ILSVRC2014 challenge
  • Uses pre-trained weights that were trained on ImageNet (see the loading sketch below)
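For reference, a pre-trained GoogLeNet can be loaded through torchvision roughly as below; the project may obtain its weights differently.

```python
from torchvision import models

# Hedged sketch: load GoogLeNet with ImageNet pre-trained weights via torchvision.
googlenet = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
googlenet.eval()   # evaluation mode for generating and evaluating adversarial images
```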

CaffeNet for ILSVRC2012

  • Due to time constraints, this model was not implemented for the project

DeepFool Experiment Setup

The authors devised a way to evaluate the robustness of a classifier to adversarial attacks. The average robustness ρ of a classifier 𝑓 is computed by taking the norm of the minimal perturbation r(x) relative to the norm of each image x, summing these ratios over the test set D, and dividing by the number of images in D.

Figure 9: Average Robustness Equation

This measure is used to compare DeepFool with other adversarial attack methods across multiple classifier models and datasets, as shown in Table 1 below.
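In code, this measure can be computed with a small helper like the one below, where `perturbations` and `images` are assumed to be matching lists of tensors for the test set.

```python
import torch

def average_robustness(perturbations, images):
    # Average robustness: mean over the test set of ||r(x)||_2 / ||x||_2.
    ratios = [r.norm() / x.norm() for r, x in zip(perturbations, images)]
    return torch.stack(ratios).mean().item()
```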

DeepFool Experiment Results

The results show that DeepFool creates adversarial images with the lowest perturbation, but requires more computation time than FGSM. FGSM creates adversarial images quickly, but with larger perturbations than DeepFool. DeepFool requires more computation time because of its iterative search for the minimal perturbation, whereas FGSM computes the perturbation in a single step, trading speed for no guarantee of misclassification. Also shown is the method described in the paper Intriguing Properties of Neural Networks [3], which is likewise able to find small perturbations that rival DeepFool's, but requires extensive computation time.

Table 1: Adversarial robustness comparison

The models were then trained on the adversarial images to improve their robustness. Table 2 below shows the test accuracy after adversarial training. The models were fine-tuned for five epochs at half the original learning rate. The table compares DeepFool and FGSM: test accuracy improved for all models fine-tuned on DeepFool adversarial images, while it decreased for those fine-tuned on FGSM images. The decrease for FGSM is most likely due to it producing overly large perturbations.

Table 2: Adversarial training test accuracy
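One possible shape of such a fine-tuning loop is sketched below; `attack_fn` is a placeholder for a DeepFool or FGSM wrapper, the adversarial images are generated on the fly per batch (one of several possible setups, not necessarily the paper's), and the momentum value is taken from the project setup described later.

```python
import torch
import torch.nn as nn

def adversarial_finetune(model, train_loader, attack_fn, lr, epochs=5):
    # Sketch of fine-tuning on adversarial images (illustrative, not the authors' code).
    # `attack_fn(model, x, y)` is assumed to return a perturbed batch; `lr` would be
    # half of the original training learning rate, and five epochs follow the text above.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            x_adv = attack_fn(model, x, y)          # generate adversarial images for the batch
            optimizer.zero_grad()
            loss = criterion(model(x_adv), y)       # train on the perturbed images
            loss.backward()
            optimizer.step()
    return model
```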

The LeNet model for MNIST, the FC-500–150 model for MNIST, the NIN model for CIFAR-10, and the LeNet model for CIFAR-10 were trained on the adversarial images computed by DeepFool and FGSM. The graph in Figure 10 below shows the result of this adversarial training.

Figure 10: DeepFool Adversarial Training Results

From the previous results, it can be seen that training on DeepFool adversarial images improves robustness much more than training on FGSM images. This is most likely because FGSM produces much larger perturbations than DeepFool, which may lead to images that are unlikely to appear in the dataset. To test this hypothesis, the minimal perturbations produced by DeepFool are multiplied by a constant scalar α before training, to evaluate how the robustness responds. Figure 11 below shows the robustness dropping dramatically as α is increased. This result supports the claim that training on overly perturbed images decreases the overall robustness of the models.

Figure 11: DeepFool Adversarial Robustness Results

Project Setup

The authors' experiment was re-implemented to test the same models used in the paper, excluding CaffeNet due to time constraints. The models were trained and evaluated on an Nvidia RTX 2070 Super graphics card, with a batch size of 128 for training and 64 for evaluation. The MNIST and CIFAR-10 models were trained for 50 epochs, except the NIN model, which was trained for 100 epochs. The MNIST and CIFAR-10 models were optimized with stochastic gradient descent (SGD) using a learning rate of 0.004 and a momentum of 0.9.
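The training configuration just described corresponds roughly to the following PyTorch setup; the network shown is a stand-in based on the FC-500–150 description above, and the data loaders are placeholders.

```python
import torch
import torch.nn as nn

# Stand-in model following the FC-500-150 description (two hidden layers, ReLU).
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(28 * 28, 500), nn.ReLU(),
                      nn.Linear(500, 150), nn.ReLU(),
                      nn.Linear(150, 10))

# Optimizer settings from the text: SGD with lr=0.004 and momentum=0.9.
optimizer = torch.optim.SGD(model.parameters(), lr=0.004, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Batch sizes from the text: 128 for training, 64 for evaluation, e.g.
# train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
# val_loader   = torch.utils.data.DataLoader(val_set, batch_size=64)
```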

Project Results

Adversarial images generated by DeepFool and FGSM for the FC-500–150 MNIST model are shown below in Figure 12. An original image labeled “7” is perturbed by both methods so that the classifier labels it a “5”. Similarly, an image labeled “3” is misclassified as an “8”. As expected, the perturbations generated by DeepFool are smaller than those generated by FGSM.

Figure 12: FC-500–150 for MNIST Misclassification Examples of a “7” and “3”

Adversarial images generated by DeepFool and FGSM for the LeNet CIFAR-10 model are shown below in Figure 13. An original image labeled “horse” is perturbed by both methods so that the classifier labels it a “dog”. Similarly, an image labeled “dog” is misclassified as a “cat” by FGSM and as a “frog” by DeepFool. As expected, the perturbations generated by DeepFool are smaller than those generated by FGSM.

Figure 13: LeNet for CIFAR-10 Misclassification Examples of a “horse” and “dog”

Adversarial images generated by DeepFool and FGSM for GoogLeNet on ILSVRC2012 are shown below in Figure 14. An original image labeled “ballplayer” is perturbed by both methods so that the classifier labels it a “baseball”. Similarly, an image labeled “four-poster” is misclassified as a “quilt” by FGSM and as a “komondor” by DeepFool. As expected, the perturbations generated by DeepFool are smaller than those generated by FGSM.

Figure 14: GoogLeNet for ILSVRC2012 Misclassification Examples of a “ballplayer” and “four-poster”

Our results for adversarial robustness are shown below in Table 3, side by side with the authors' results for comparison. They are similar to the paper's in that DeepFool produces adversarial images with smaller perturbations than FGSM. The time required to create each adversarial image is much lower than in the paper's results, which is due to using a higher-performance GPU; the hardware used for the DeepFool paper was an Nvidia GTX 750 Ti graphics card.

Table 3: Adversarial Robustness Re-implementation Results

Our results for test accuracy after training on adversarial images are shown below in Table 4. DeepFool did not improve adversarial robustness as expected. This is most likely because the adversarial images were generated per batch using the updated weights, instead of generating all adversarial images at the start of each epoch using the model's previous weights. Adversarial training with FGSM did reduce the test error, especially for the MNIST dataset. However, for a more complex dataset such as CIFAR-10, it reduced the adversarial test error while increasing the test error on non-perturbed images, which can be considered a downside of adversarial training.

Table 4: Adversarial Training Re-implementation Results

Conclusion

DeepFool finds the minimal perturbations necessary to create adversarial images that fool image classifiers. It works by adding the minimum perturbation needed to change an image's classification label. The results show that models can be trained on adversarial images to improve robustness. A caveat is that DeepFool requires knowledge of the model and its parameters in order to attack it, making it a white-box attack. Under normal conditions a real adversary would not have access to the model, but the method can still be used to evaluate the robustness of models. Additionally, image compression algorithms, such as JPEG, can potentially remove the minimal perturbations, resulting in a failure to fool the model.

References

  1. S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, DeepFool: a simple and accurate method to fool deep neural networks. arXiv, 2015. doi: 10.48550/ARXIV.1511.04599.
  2. I. Goodfellow, J. Shlens, and C. Szegedy, Explaining and Harnessing Adversarial Examples. arXiv, 2014. doi: 10.48550/ARXIV.1412.6572.
  3. C. Szegedy et al., Intriguing Properties of Neural Networks. arXiv, 2013. doi: 10.48550/ARXIV.1312.6199.
  4. M. Lin, Q. Chen, and S. Yan, Network In Network. arXiv, 2013. doi: 10.48550/ARXIV.1312.4400.
  5. https://github.com/jiecaoyu/pytorch-nin-cifar10
