Review of “Robust Physical-World Attacks on Machine Learning Models”

Eric Maggard
Aug 28, 2017 · 7 min read

I read the research paper [1], which purports to present an attack algorithm that stays robust under varying physical conditions. Below is the abstract from their paper.

Abstract — Deep neural network-based classifiers are known to be vulnerable to adversarial examples that can fool them into misclassifying their input through the addition of small-magnitude perturbations. However, recent studies have demonstrated that such adversarial examples are not very effective in the physical world — they either completely fail to cause misclassification or only work in restricted cases where a relatively complex image is perturbed and printed on paper. In this paper we propose a new attack algorithm — Robust Physical Perturbations (RP2) — that generates perturbations by taking images under different conditions into account. Our algorithm can create spatially constrained perturbations that mimic vandalism or art to reduce the likelihood of detection by a casual observer. We show that adversarial examples generated by RP2 achieve high success rates under various conditions for real road sign recognition by using an evaluation methodology that captures physical world conditions. We physically realized and evaluated two attacks, one that causes a Stop sign to be misclassified as a Speed Limit sign in 100% of the testing conditions, and one that causes a Right Turn sign to be misclassified as either a Stop or Added Lane sign in 100% of the testing conditions.

The authors claim their algorithm achieved the following:

  • Stop sign misclassified as a Speed Limit sign in 100% of the test cases
  • Right Turn sign misclassified as either a Stop or an Added Lane sign in 100% of the test cases

Test Images

Figure 1 shows the images I cropped out of the article. I tried to get both a straight-on view and a perspective view of each sign. These images were taken directly from the article and have not been altered in any way.

Figure 1: Sample images from the article on Physical-World Attacks on DNN models

One thing that struck me at first was that the poster-printing attack image was very dark and had less contrast than the other images. The poster-printing images are the two on the far right. When I split the images into separate R, G, and B color channels, the reason for the lack of contrast became apparent. Figure 2 shows the separate RGB channels for one of the poster-printed images.

Figure 2: RGB channels of one of the poster-printing (overlay) attack images

The thing that stands out is that much of the variability and loss of contrast is in the red channel. It makes sense, then, that if the DNN model uses color, it would lose a lot of information about the sign along with the degraded red channel. It would be interesting to see how models that convert images to grayscale handle this image. The paper showed that this perturbation caused the top-ranked prediction to be wrong 100% of the time. However, in 8 of the 15 views evaluated, the second-ranked prediction was the stop sign. So even though the correct classification was in the top two, the authors counted those cases as misclassifications (i.e., successful attacks), which I disagree with.
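For anyone who wants to repeat this kind of check, splitting a cropped sign image into its color channels takes only a few lines. The sketch below uses NumPy, Pillow, and Matplotlib; the file name is a placeholder rather than an asset from the article, and the per-channel standard deviation in each title is just a quick proxy for contrast.

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Load a cropped sign image (placeholder path, not an asset from the article)
img = np.asarray(Image.open("poster_attack_crop.png").convert("RGB"))

# Show each channel as grayscale so per-channel contrast is easy to compare
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
axes[0].imshow(img)
axes[0].set_title("RGB")
for i, name in enumerate(["Red", "Green", "Blue"]):
    channel = img[:, :, i]
    axes[i + 1].imshow(channel, cmap="gray", vmin=0, vmax=255)
    axes[i + 1].set_title(f"{name} (std={channel.std():.1f})")
for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()
```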

Attack Results

Table 1 shows the classifier results for the stop signs across the various viewing distances, angles, and attacks.

Table 1: Summary of Targeted Physical Perturbation Experiments

One interesting thing to note is the number of times the stop sign appears in the top two results. This leads me to think that the attack is not as robust as the authors claim. In particular, the camouflage/graffiti attack left the Stop sign in the top two in all but one case, so that attack was lacking in robustness.

Initial Test Results

I took two cropped images from each of the three attack modes, ran them through my preprocessing and normalization steps, and then through my trained DNN model. The model was trained on European sign datasets, so none of the US signs or attack images appeared in its training data. Figure 3 shows the results of this initial classification test.

Figure 3: Results of the first evaluation of the DNN

As you can see from the top-three guesses, both Love/Hate attack images were correctly classified with very high confidence. The poster-printing attack images were both misclassified, and the stop sign did not appear in the top three guesses; the low confidence percentages for the first two guesses also show that the model was not certain about them. The camouflage art attack produced a high-confidence prediction for one image and a 55% confidence for the other; even though both were misclassified, the stop sign was in the top three guesses.
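I haven't reproduced my exact pipeline here, but the preprocessing, normalization, and top-three readout behind these tests look roughly like the sketch below. It assumes a Keras model saved to disk and 32x32 RGB inputs scaled to roughly [-1, 1]; the file names and the preprocess helper are placeholders, not my actual code.

```python
import numpy as np
from PIL import Image
from tensorflow.keras.models import load_model

def preprocess(path, size=(32, 32)):
    """Resize a cropped sign image and scale pixels to roughly [-1, 1]."""
    img = Image.open(path).convert("RGB").resize(size)
    x = np.asarray(img, dtype=np.float32)
    return (x - 128.0) / 128.0

# Hypothetical saved model and test crops; the names are placeholders
model = load_model("traffic_sign_model.h5")
batch = np.stack([preprocess(p) for p in ["love_hate_1.png", "poster_1.png"]])

probs = model.predict(batch)                      # shape: (n_images, n_classes)
top3 = np.argsort(probs, axis=1)[:, ::-1][:, :3]  # three highest-probability classes
for ranks, p in zip(top3, probs):
    print([(int(c), float(p[c])) for c in ranks])  # class IDs with confidences
```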

New Training

I then wanted to see how robust the attack would be if one or two of those images were included in the training set. I cropped two new images of the camouflage art attack from the paper to use as the attack images for this experiment. I used the German Traffic Sign Recognition Benchmark (GTSRB) images and training sets to retrain my DNN model, and I expanded the training set by rotating, perspective-warping, and lightening/darkening randomly selected images. For every 10th randomly selected image, I instead took one of the two attack images, applied the same perturbations used on the other random images, and added the result to the training set. Figure 4 shows the classes and the number of images in each class before and after this expansion. The stop sign class (ID 14) had 690 images before and 1,401 images after expanding the dataset. I estimate around 36 attack images were added to the training set.

Figure 4: Training dataset class counts before and after the expansion by image manipulation
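A minimal sketch of the expansion step described above is shown below, assuming OpenCV for the geometric and brightness jitter. The jitter ranges, the expand helper, and the attack_images list (the two camouflage crops, labeled as the stop sign class, ID 14) are illustrative placeholders rather than the exact values I used.

```python
import random
import numpy as np
import cv2

def jitter(img):
    """Apply a small random rotation, perspective warp, and brightness change."""
    h, w = img.shape[:2]
    # Rotation about the image center (range is an illustrative choice)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-15, 15), 1.0)
    out = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REFLECT)
    # Mild perspective warp by nudging the four corners
    d = 0.1 * min(h, w)
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = src + np.float32([[random.uniform(-d, d), random.uniform(-d, d)] for _ in range(4)])
    P = cv2.getPerspectiveTransform(src, dst)
    out = cv2.warpPerspective(out, P, (w, h), borderMode=cv2.BORDER_REFLECT)
    # Lighten or darken
    out = np.clip(out.astype(np.float32) * random.uniform(0.6, 1.4), 0, 255)
    return out.astype(np.uint8)

def expand(images, labels, attack_images, attack_label, n_new):
    """Add n_new jittered images; every 10th draw uses an attack crop instead."""
    new_x, new_y = [], []
    for i in range(n_new):
        if i % 10 == 0:
            src, lab = random.choice(attack_images), attack_label  # e.g. class 14
        else:
            k = random.randrange(len(images))
            src, lab = images[k], labels[k]
        new_x.append(jitter(src))
        new_y.append(lab)
    return np.array(new_x), np.array(new_y)
```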

The network being trained was a modified LeNet architecture, illustrated in Figure 5.

Figure 5: Modified LeNet model architecture
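For readers who want to experiment, a modified LeNet in Keras looks roughly like the sketch below: the classic LeNet-5 layer stack adapted to 32x32 RGB inputs and the 43 GTSRB classes, with dropout added. The filter counts and layer sizes shown are illustrative rather than an exact copy of my network.

```python
from tensorflow.keras import layers, models

def build_modified_lenet(num_classes=43):
    """LeNet-style convnet for 32x32 RGB sign crops (layer sizes are assumptions)."""
    return models.Sequential([
        layers.Conv2D(6, 5, activation="relu", input_shape=(32, 32, 3)),
        layers.MaxPooling2D(2),
        layers.Conv2D(16, 5, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(120, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(84, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
```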

To train the model, I used a batch size of 128 for 15 epochs with the Adam optimizer and a learning rate of 0.0015. Training for the 15 epochs resulted in a final training accuracy of 99.7%, a validation accuracy of 95.6%, and a test set accuracy of 94.1%.
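Wiring up those hyperparameters in Keras looks roughly like this, reusing the hypothetical build_modified_lenet sketch above; X_train, y_train, and the other arrays stand in for the expanded GTSRB data.

```python
from tensorflow.keras.optimizers import Adam

model = build_modified_lenet(num_classes=43)
model.compile(optimizer=Adam(learning_rate=0.0015),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X_train/y_train etc. are the expanded GTSRB arrays (placeholders here)
model.fit(X_train, y_train,
          batch_size=128,
          epochs=15,
          validation_data=(X_valid, y_valid))

test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")
```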

Test Results with Attack Images

After the new model was trained, I ran the previous test images through it. Figure 6 shows the classifier's output now that attack images were included in the training data.

Figure 6: New Model classification results

As you can see from the figure, with the attack images included in the training data, the classifier predicted the correct class for 5 of the 6 images. For the one image it misclassified, the confidence in the top guess was low, and the third guess was the correct class with a confidence of about 9%.

Conclusions

I think the paper is misleading in that the authors only looked at the classifier's first guess and called it wrong, even when the second guess was correct. They also only evaluated their own trained classifier and didn't approach Waymo, Uber, or others to test their classifiers and the robustness of the method against them. Their classifier's 91% accuracy on the test set is low; the model should have been improved until it reached at least 95%. Some of the best classifiers on the GTSRB test set score in the upper-98% to lower-99% range [2]. One other issue I have with their training set is that it was limited to fewer than 500 images for the largest classes, while other classes had as few as 92.

The other major issue I see, and one my retraining results illustrate, is that when stop signs are misclassified because of tampering, future results can be improved by including those images in the training set. Once that is done and the model can generalize to the tampered signs, the mode of attack has to change. This leads me to conclude that their attack algorithm is not as robust as they claim.

This and other attacks on the vision systems of autonomous cars might have some effectiveness in real-world operation. However, it will be a long time, if ever, before autonomous cars rely solely on vision. Other sensors such as sonar and LIDAR help autonomous cars navigate and sense the world around them. There is also SLAM, which locates the car on a map, so the car already knows about speed limits, crossings, and stop signs and is not completely dependent on vision.

References

[1] Robust Physical-World Attacks on Machine Learning Models, https://arxiv.org/pdf/1707.08945.pdf

[2] Results from the GTSRB dataset: http://benchmark.ini.rub.de/?section=gtsrb&subsection=results
