Is it a bird or a plane? — One Pixel Attack for Fooling Deep Neural Networks

Sanjay Gandham
Machine Intelligence and Deep Learning
9 min read · May 1, 2022

Note: You can find our video presentation on YouTube.

Modifying a pixel in the image changes the classification prediction of a Neural Network

Neural networks have been used to perform various machine learning tasks, among which image classification is one where convolutional neural networks (CNNs), a class of neural networks, have found great success. With CNNs such as AlexNet and VGG16 achieving human-competitive results, deep neural networks are the current de-facto standard for image classification. With accuracy being the main criterion, however, the robustness of such networks was largely ignored. While at first glance this design choice may seem inconsequential, the findings of a group of researchers described in the paper introduce a cause for concern.

One-Pixel Attack

Building upon previous works, where a small amount of noise that is supposedly imperceptible to the human eye causes neural networks to misclassify images, the one-pixel attack aims to achieve the same while operating under an extreme constraint: modify only a single pixel of the image. This is a semi-black-box attack, as it does not need any information about the model other than the final probability labels of each class. The problem can be mathematically defined as an optimization problem.

maximize over e(x): fₜ(x + e(x))  subject to  ‖e(x)‖ ≤ L

Optimization problem to cause misclassification

Here, fₜ(x) is the probability of an image x = (x₁, …, xₙ) being classified as class t, and e(x) = (e₁, …, eₙ) is the additive perturbation applied to each of the n pixels of the image. The constraint is that the overall perturbation amount is limited to L. However, the authors took a different approach by modifying the constraint to limit the number of pixels that can be changed.

maximize over e(x): fₜ(x + e(x))  subject to  ‖e(x)‖₀ ≤ d

Modified optimization problem with a constraint on the number of pixels that can be perturbed

The modified constraint is that the L0 norm of the perturbation vector e — i.e. the number of modified pixels — is limited to a small positive integer d. Solving this optimization problem allows the authors to achieve misclassification by modifying at most d pixels. While gradient descent is the most common method of solving optimization problems, this constraint makes the problem non-differentiable, and in the semi-black-box setting no gradient information is available anyway, so gradient descent is not feasible. Additionally, the authors performed a targeted attack on the CIFAR-10 dataset, which tries to trick the neural network into misclassifying an image as a specific class, and an untargeted attack on the ImageNet dataset, which tricks the network into misclassifying an image as any other class.
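To make the objective concrete, the sketch below (not the authors' code) assumes a hypothetical `predict_probs(image)` wrapper that returns the classifier's probability vector — the only model output the attack uses — and shows the quantity fₜ(x + e(x)) that the targeted attack tries to maximize.

```python
import numpy as np

def apply_perturbation(image, perturbation):
    """Form x + e(x): add the (mostly zero) perturbation and clip to valid pixel values."""
    perturbed = image.astype(np.float32) + perturbation
    return np.clip(perturbed, 0, 255).astype(np.uint8)

def targeted_objective(image, perturbation, target_class, predict_probs):
    """f_t(x + e(x)): the probability label the targeted attack tries to maximize.

    predict_probs is a hypothetical black-box wrapper around the classifier
    that returns a vector of class probabilities.
    """
    probs = predict_probs(apply_perturbation(image, perturbation))
    return probs[target_class]
```

For the non-targeted ImageNet attack the same wrapper is queried, but the quantity of interest is the true class's probability, which the attack tries to minimize.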

Differential Evolution algorithm

Visualization of DE optimizing a 2D Ackley Function

Differential Evolution (DE) is a population-based evolutionary algorithm that can solve complex multi-modal optimization problems. It works by initializing a population of candidate solutions; at every iteration, the candidates are mutated and compared against their corresponding parent solutions, and the winners survive into the next iteration. In this attack, each candidate represents a perturbation that modifies one pixel and is a tuple holding five elements: the x-y coordinates and the RGB values of the perturbation. (A minimal sketch of one such DE loop is given after the figure below.)

List of candidate solutions where d is the number of candidates
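Before the attack-specific steps, here is a minimal, self-contained sketch of a DE loop — mutation followed by greedy parent-vs-child selection — applied to the 2D Ackley function from the visualization above. The function and parameter names are illustrative, not taken from the paper's code.

```python
import numpy as np

def ackley(p):
    """2D Ackley function -- the multi-modal test surface shown in the visualization."""
    x, y = p
    return (-20.0 * np.exp(-0.2 * np.sqrt(0.5 * (x**2 + y**2)))
            - np.exp(0.5 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)))
            + np.e + 20.0)

def differential_evolution(objective, low, high, pop_size=40, F=0.5, iters=200, seed=0):
    """DE loop: initialize a population, mutate, and keep whichever of parent/child is fitter."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(low, high, size=(pop_size, len(low)))
    fitness = np.array([objective(p) for p in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # Pick three distinct population members (other than i) and mutate.
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            child = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), low, high)
            child_fitness = objective(child)
            if child_fitness < fitness[i]:   # greedy selection (here we minimize)
                pop[i], fitness[i] = child, child_fitness
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]

best_point, best_value = differential_evolution(ackley, low=np.array([-5.0, -5.0]),
                                                high=np.array([5.0, 5.0]))
```

The one-pixel attack uses the same skeleton, but the objective becomes a probability label queried from the network rather than a test function.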

The authors' approach to solving the optimization problem includes the following steps:

  1. Randomly initialize the x-y coordinates of each candidate from a uniform distribution over the image dimensions of the dataset, such that xᵢ, yᵢ ~ U(1, 32) for Kaggle CIFAR-10 and xᵢ, yᵢ ~ U(1, 227) for ImageNet.
  2. Randomly initialize the RGB values from a normal distribution with mean 128 and standard deviation 127, such that rᵢ, gᵢ, bᵢ ~ N(μ = 128, σ = 127).
  3. To evaluate the effectiveness of a perturbation, use a fitness function: the probability label of the target class for the targeted attack on Kaggle CIFAR-10, and the probability label of the true class for the non-targeted attack on ImageNet.
  4. For each iteration g of the algorithm, evolve the candidate solutions using the standard DE mutation formula xᵢ(g+1) = xᵣ₁(g) + F·(xᵣ₂(g) − xᵣ₃(g)), where r₁, r₂ and r₃ are mutually distinct random indices into the population and the scale parameter F is set to 0.5.

5. The evolved candidates are compared to their respective parent candidates, and whichever does better on the fitness function (higher target-class probability for the targeted attack, lower true-class probability for the non-targeted attack) is retained while the other is eliminated.

6. This process continues until either 100 iterations of the algorithm are performed or one of the early-stopping criteria is met (see the sketch after this list):
probability label of target class > 90% (targeted attack), or
probability label of true class < 5% (non-targeted attack)
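Putting the six steps together, a minimal sketch of the attack loop might look as follows. This is not the authors' implementation: `predict_probs(image)` is a hypothetical black-box wrapper returning the classifier's probability labels, and a real implementation would batch the population's model queries rather than evaluating them one at a time.

```python
import numpy as np

def one_pixel_attack(image, predict_probs, target_class=None, true_class=None,
                     pop_size=400, F=0.5, max_iter=100, seed=0):
    """DE-based one-pixel attack sketch. Pass target_class for a targeted attack
    (maximize its probability) or true_class for a non-targeted one (minimize its
    probability); image is an H x W x 3 uint8 array."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]

    # Steps 1-2: candidates are (x, y, r, g, b); coordinates uniform over the image,
    # colors drawn from N(128, 127) and clipped to the valid pixel range.
    coords = rng.uniform(0, [w, h], size=(pop_size, 2))
    colors = np.clip(rng.normal(128, 127, size=(pop_size, 3)), 0, 255)
    population = np.hstack([coords, colors])

    def fitness(candidate):
        # Step 3: apply the candidate pixel change and read the probability label.
        x, y, r, g, b = candidate
        adv = image.copy()
        adv[int(y) % h, int(x) % w] = np.clip([r, g, b], 0, 255).astype(np.uint8)
        probs = predict_probs(adv)
        # Defined so that a higher value is always "more adversarial".
        return probs[target_class] if target_class is not None else -probs[true_class]

    scores = np.array([fitness(c) for c in population])

    for _ in range(max_iter):
        for i in range(pop_size):
            # Step 4: DE mutation with F = 0.5 and three distinct random parents.
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            child = population[r1] + F * (population[r2] - population[r3])
            child_score = fitness(child)
            # Step 5: greedy selection against the parent candidate.
            if child_score > scores[i]:
                population[i], scores[i] = child, child_score
        # Step 6: early stopping once the attack already succeeds with margin.
        best = scores.max()
        if target_class is not None and best > 0.9:
            break
        if true_class is not None and -best < 0.05:
            break

    return population[int(np.argmax(scores))]   # best (x, y, r, g, b) perturbation found
```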

Datasets

  • Kaggle CIFAR-10: Kaggle’s CIFAR-10 dataset is a collection of 300,000 32x32 images spanning 10 classes, taken from the original CIFAR-10 with modifications such as duplication, rotation, clipping and blurring. These modifications were made using an undisclosed algorithm with the purpose of introducing noise and variation into the images.
  • ImageNet: This dataset spans 1,000 object classes and contains over 1,400,000 images, used here at a resolution of 227x227. It was chosen to evaluate the attack’s efficacy on higher-resolution images (a short loading sketch follows this list).
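The Kaggle variant is not packaged in common libraries, but for experimentation the standard CIFAR-10 release is an easy stand-in; the snippet below is a sketch assuming torchvision is available, not part of the paper's setup.

```python
import torchvision
import torchvision.transforms as T

# Standard CIFAR-10 as a stand-in for the Kaggle variant used in the paper
# (the Kaggle release adds duplicated/rotated/clipped/blurred images).
cifar10 = torchvision.datasets.CIFAR10(root="./data", train=False,
                                       download=True, transform=T.ToTensor())
image, label = cifar10[0]   # image: 3x32x32 tensor in [0, 1]; label: int in 0..9
```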

Neural Network Models

Model architectures of the neural networks used in the paper
  • VGG16: A very deep convolutional neural network submitted to ILSVRC-2014, achieving 92.7% top-5 test accuracy on ImageNet at the time of submission.
  • Network in Network: The NiN model avoids fully connected layers and instead uses multiple NiN blocks, with the number of output channels equal to the number of label classes, followed by a global average pooling layer. The main advantage of NiN’s design is that it significantly reduces the number of required model parameters, at the expense of longer training time.
  • All Convolutional Network: A convolutional network with fewer layers whose strength lies in its simplicity.
Illustration of AlexNet’s architecture. Image credits to Krizhevsky et al., the original authors of the AlexNet paper.
  • AlexNet: AlexNet was one of the first networks to use the ReLU activation and could be trained on a multi-GPU setup, significantly reducing training time. In the 2012 ImageNet competition, AlexNet achieved a top-5 error rate of 15.3%, winning by a large margin over the second-place top-5 error rate of 26.2%. (A short example of querying such a pretrained model for its probability labels is shown after this list.)
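In the semi-black-box setting, only the models' probability labels need to be exposed. The sketch below uses an ImageNet-pretrained torchvision AlexNet (requires torchvision ≥ 0.13) as a stand-in — the paper used the BVLC Caffe AlexNet and CIFAR-10-trained AllConv/NiN/VGG16 models — and implements a tensor-based version of the hypothetical probability-label wrapper assumed in the earlier sketches.

```python
import torch
import torchvision.models as models

# ImageNet-pretrained stand-in for the paper's BVLC AlexNet (torchvision >= 0.13).
alexnet = models.alexnet(weights="IMAGENET1K_V1").eval()

def predict_probs(image_tensor):
    """Return only the softmax probability labels -- all the attack is allowed to see."""
    with torch.no_grad():
        logits = alexnet(image_tensor.unsqueeze(0))   # expects a normalized 3xHxW tensor
    return torch.softmax(logits, dim=1).squeeze(0)
```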

Using such a diverse selection of models with varying characteristics to evaluate the efficacy of the generated adversarial images in misclassification helped the authors to properly analyze the attack and understand the pitfalls of neural network classifiers.

Evaluation Metrics

The metrics of highest significance used in this paper are:

  1. Success Rate: For the non-targeted attack, it is the percentage of adversarial images that were successfully classified by the network as an arbitrary (i.e. any incorrect) class; for the targeted attack, it is the probability of perturbing an image into a specific target class (see the sketch after this list).
  2. Probability Labels (Confidence): The sum of the probability values of the target class over all successfully perturbed images, divided by the number of successful perturbations.
  3. Number of Target Classes: The number of images that can be successfully perturbed into a certain number (i.e. from 0 to 9) of target classes.
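The first two metrics amount to simple bookkeeping over per-image attack outcomes. The sketch below assumes a hypothetical result record per image and is not the authors' evaluation code.

```python
import numpy as np

def summarize_attack(results):
    """Success rate and confidence from per-image outcomes.

    results: list of dicts like {"success": bool, "adv_prob": float}, where
    "adv_prob" is the probability label of the (target or misclassified) class
    for that adversarial image.
    """
    successes = [r for r in results if r["success"]]
    success_rate = len(successes) / len(results)
    confidence = float(np.mean([r["adv_prob"] for r in successes])) if successes else 0.0
    return success_rate, confidence
```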

Findings and Discussion

Classes in black and blue are the labels predicted before and after the targeted attack on Kaggle CIFAR-10 and non-targeted attack on ImageNet respectively. The pixel that was modified is highlighted in the images

As seen from the figure, the one-pixel attack can drastically change the prediction label and its corresponding confidence on both datasets. Even a high-resolution picture of a baby in a bassinet, clearly recognizable to the human eye, was misclassified as a paper towel by the BVLC AlexNet model when a single pixel of the image was perturbed.

Table indicating the original accuracy, the success rates of the targeted and non-targeted attacks, and the confidence of the attack across the different trained models.

From the table, the success rate of the targeted attack on Kaggle CIFAR-10 indicates that around 15–25% of the adversarial images (images obtained after the one-pixel attack) were misclassified into a specific class. Moreover, a large portion of the adversarial images generated by the non-targeted attack were able to cause the networks to misclassify. Additionally, the confidence metrics of the attacks on AllConv, NiN and VGG16 are high, indicating that the attack fooled the networks to a great extent. The main takeaway is that networks trained using the conventional training paradigm are not robust enough to withstand a perturbation of even a single pixel. Moreover, the success rates of both the targeted and non-targeted attacks on the NiN model are higher than on the other models, implying that some networks are more vulnerable to adversarial images than others.

However, it can be argued that the high success rate of the attack is due to the low 32x32 resolution of the Kaggle CIFAR-10 dataset. To address this, the authors used higher-resolution 227x227 images from the ImageNet dataset and found that about 16.04% of the adversarial images generated by the non-targeted one-pixel attack caused misclassification in the BVLC AlexNet model. While this success rate may seem low at first glance, considering that the search space for ImageNet is roughly 50 times larger than that of Kaggle CIFAR-10, the attack can be considered effective against higher-resolution images as well. The lower confidence for BVLC AlexNet can be attributed to ImageNet having 1,000 classes compared to the 10 classes of Kaggle CIFAR-10.

Success rate and confidence scores of targeted and non-targeted attack on AllConv Network trained with the Kaggle CIFAR-10 dataset

The 1-pixel attack was compared with 3-pixel and 5-pixel attacks using the AllConv network trained on the Kaggle CIFAR-10 dataset. As expected, increasing the number of perturbed pixels significantly increased the success rate for both the targeted and non-targeted attacks, implying that more adversarial images caused misclassification when more pixels were modified.
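Extending the attack from 1 to d pixels only changes the candidate encoding: each candidate grows to 5·d numbers. A small illustrative sketch (not the authors' code) of the initialization for a d-pixel attack:

```python
import numpy as np

def init_population(d, pop_size, img_size, rng):
    """Each candidate encodes d perturbations, i.e. 5*d numbers: (x, y, r, g, b)
    per perturbed pixel, so a 3-pixel attack uses 15-dimensional candidates
    and a 5-pixel attack 25-dimensional ones."""
    coords = rng.uniform(0, img_size, size=(pop_size, d, 2))
    colors = np.clip(rng.normal(128, 127, size=(pop_size, d, 3)), 0, 255)
    return np.concatenate([coords, colors], axis=2).reshape(pop_size, 5 * d)

population = init_population(d=3, pop_size=400, img_size=32,
                             rng=np.random.default_rng(0))
```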

Heatmap showing the number of times the attack was successful for each original-target class pair in the 1-pixel attack. Red and blue indicate the original and target class respectively. Numbers 0 to 9 indicate airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck respectively. The example on the right shows the original class, the misclassified class and the confidence of classification.

The heatmaps of the attack on all three models (AllConv, NiN, VGG16) are approximately symmetric across the diagonal. This indicates that, for each pair of classes, a similar number of adversarial images are created in each direction between them. For example, for the 5-3 original-target class pair, the heatmaps show that the attack was successful multiple times, implying that an image of a dog can be misclassified as a cat with a relatively high success rate. The 3-5 original-target class pair has a similar number of successful attacks, indicating that an image of a cat can likewise be misclassified as a dog with a relatively high success rate. This also indicates that the dataset contains certain original-target class pairs that are more vulnerable to the targeted attack. However, there are outliers to this pattern: for instance, the models misclassify a ship as an airplane with a high success rate, but an airplane is not misclassified as a ship nearly as often.
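The heatmap itself is just a count of successful attacks per original-target pair; a minimal sketch (assuming a hypothetical attack log) of how such a matrix could be built:

```python
import numpy as np

def build_success_heatmap(attack_log, num_classes=10):
    """Count successful targeted attacks for every original-target class pair.

    attack_log: iterable of (original_class, target_class, success) tuples.
    Near-symmetry of the returned matrix across its diagonal is the pattern
    discussed above.
    """
    heatmap = np.zeros((num_classes, num_classes), dtype=int)
    for original, target, success in attack_log:
        if success:
            heatmap[original, target] += 1
    return heatmap
```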

Comparing Differential Evolution (DE) 1-pixel attack with Random 1-pixel attack

To evaluate DE’s efficacy in solving the optimization problem, the authors compared the success rate and confidence of the DE 1-pixel attack with those of a random 1-pixel attack, where a random pixel is selected and modified to a random value, and found that the success rate for DE is significantly higher. This shows that, for all the networks, DE found better solutions to the optimization problem than random search, demonstrating its ability to solve non-differentiable optimization problems.
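For reference, the random baseline needs no optimization machinery at all; a rough sketch (reusing the same hypothetical `predict_probs` wrapper, with an arbitrary trial budget rather than whatever budget the authors used) looks like this:

```python
import numpy as np

def random_one_pixel_attack(image, predict_probs, true_class, trials=100, seed=0):
    """Baseline: repeatedly pick a random pixel and a random color, no search."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    for _ in range(trials):
        adv = image.copy()
        x, y = int(rng.integers(0, w)), int(rng.integers(0, h))
        adv[y, x] = rng.integers(0, 256, size=3)
        if int(predict_probs(adv).argmax()) != true_class:
            return adv       # misclassification achieved purely by chance
    return None              # random guessing failed on this image
```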

Conclusion

Although it has been known that neural-network-based classifiers can be fooled by adversarial images in which noise has been added, this paper is the first to show that a neural network can be fooled by changing just a single pixel of an image to a specific value. Moreover, the paper shows that computationally inexpensive evolutionary algorithms can be used effectively in the context of neural networks and deep learning to solve such optimization problems.

Written and Presented by:

  • Sanjay Gandham
  • Nicholas DeSantis
