Paper Summary: Robust Physical-World Attacks on Deep Learning Models

Mike Plotz
4 min read · Nov 29, 2018


Part of the series A Month of Machine Learning Paper Summaries. Originally posted here on 2018/11/28, with better formatting.

Robust Physical-World Attacks on Deep Learning Models (2017) Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, Dawn Song

This is the second paper I’m summarizing on physical adversarial attacks. Why? Well, there’s something sort of cyberpunk about it. It’s yet another example of how the world of atoms and the world of bits are starting to blur together. Compared to Athalye 2017 this paper is perhaps less technically elegant, but the test case is more incisive and the approach has interesting details.

This paper is about hiding in plain sight. Rather than attempt to make imperceptible changes, this approach merely makes innocuous changes that “hide in the human psyche,” so this is part social engineering and part technical. The choice of road signs as an attack vector is a good one: signs are visually simple, so it’s hard to hide perturbations; they’re embedded in a noisy, complex environment; and there are real-world safety implications, especially as (semi-)autonomous vehicles come into wider use.

The authors call their approach Robust Physical Perturbations (RP2), and the idea is similar to Athalye 2017. The main practical difference is that, rather than relying purely on simulation, the authors use a combination of physical and synthetic transformations, which they say picks up subtle effects not captured in simulation. (This sounds plausible, but I’d like to see some ablation studies or side-by-side comparisons.)
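To make that mix concrete, here is a minimal PyTorch/torchvision sketch of how real photos of the sign might be layered with synthetic brightness and spatial jitter. The transform parameters, the 32×32 input size, and the physical_captures list are my own placeholders, not values from the paper.

```python
# A minimal sketch (not the authors' code) of mixing real captures with
# synthetic augmentation. physical_captures is a hypothetical list of PIL
# photos of the actual sign taken at different angles/distances/lighting;
# the jitter parameters and 32x32 input size are assumptions, not paper values.
import torch
from torchvision import transforms

synthetic_jitter = transforms.Compose([
    transforms.Resize((32, 32)),                  # classifier input size (assumed)
    transforms.ColorJitter(brightness=0.3),       # synthetic lighting changes
    transforms.RandomAffine(degrees=15,
                            translate=(0.1, 0.1),
                            scale=(0.8, 1.2)),    # synthetic spatial changes
    transforms.ToTensor(),
])

def sample_transformed_inputs(physical_captures, n_augments=10):
    """Build the sampled set of transformed inputs (the X^V in the objective)."""
    xs = [synthetic_jitter(img)
          for img in physical_captures
          for _ in range(n_augments)]
    return torch.stack(xs)
```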

There are other differences too. Like Athalye 2017, they model a space of (physical and digital) transformations, but this involves actually taking images of the real physical target object from several angles, distances, and lighting conditions. These inputs are augmented with synthetic changes to brightness and additional spatial transformations. Since there’s no 3D renderer involved, they mask out the object to avoid considering physically impossible changes to the background. In what is perhaps the most interesting difference, they further identify which parts of the target object most impact the output class, and use this as a guide to manually update the mask to a small subregion. (The identification step is done with L1 regularization, which encourages sparse perturbations.) Another important difference is the way they handle fabrication error, which they do by including a term for Sharif 2016’s Non-Printability Score (NPS) in the loss function. Finally, the masked perturbation is transformed with an alignment function before being added to the input (which has already been transformed). Presumably the camera positions are carefully measured so that they can approximate the associated alignment function — I would have liked to see the details of how this was done.
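Since the NPS term does a lot of work here, here is a hedged sketch of Sharif 2016's Non-Printability Score as I understand it: each perturbation pixel is scored by the product of its distances to a palette of printable colors, and those per-pixel scores are summed, so a pixel close to some printable color costs almost nothing. The tensor shapes and the printable_colors palette are assumptions for illustration.

```python
# A hedged sketch of Sharif et al.'s Non-Printability Score: each perturbation
# pixel is scored by the product of its distances to a set of printable colors,
# and the per-pixel scores are summed. Shapes and palette are assumptions.
import torch

def non_printability_score(perturbation, printable_colors):
    """
    perturbation:     (3, H, W) RGB perturbation in [0, 1]
    printable_colors: (K, 3) palette of colors the printer can actually
                      reproduce (in practice obtained by printing a color
                      chart and photographing it; assumed given here)
    """
    px = perturbation.permute(1, 2, 0).reshape(-1, 1, 3)   # (H*W, 1, 3)
    palette = printable_colors.reshape(1, -1, 3)           # (1, K, 3)
    dists = torch.norm(px - palette, dim=-1)               # distance to each printable color
    return dists.prod(dim=-1).sum()                        # product over palette, sum over pixels
```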

The final optimization problem is

$$\underset{\delta}{\arg\min}\;\; \lambda\,\lVert M_x \cdot \delta \rVert_p \;+\; \mathrm{NPS}(M_x \cdot \delta) \;+\; \mathbb{E}_{x_i \sim X^V}\, J\big(f_\theta(x_i + T_i(M_x \cdot \delta)),\, y^*\big)$$

where M_x is the perturbation mask matrix, X^V is the distribution of transformed inputs (so each x_i has already been transformed), T_i is the alignment function, and y* is the target class.
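Putting the pieces together, a sketch of this objective in PyTorch might look like the following. The function names, the batch-mean approximation of the expectation, and the way the alignment functions are passed in are my own choices, and non_printability_score is the helper sketched earlier.

```python
# A hedged PyTorch sketch of the objective above, not the authors' code.
import torch
import torch.nn.functional as F

def rp2_loss(delta, mask, xs, alignments, model, target_class,
             printable_colors, lam=1e-3):
    """
    delta:       (3, H, W) learnable perturbation
    mask:        (3, H, W) binary mask M_x selecting the sticker region
    xs:          (N, 3, H, W) already-transformed inputs x_i (samples from X^V)
    alignments:  list of N functions T_i mapping the masked perturbation
                 into each image's frame (e.g. from measured camera poses)
    """
    masked = mask * delta
    # approximate the expectation over transformations with a batch mean
    logits = torch.stack([
        model((x + T(masked)).clamp(0, 1).unsqueeze(0)).squeeze(0)
        for x, T in zip(xs, alignments)
    ])
    targets = torch.full((len(xs),), target_class, dtype=torch.long)
    adv_term = F.cross_entropy(logits, targets)   # J(f_theta(x_i + T_i(M_x*delta)), y*)
    reg_term = lam * masked.abs().sum()           # sparsity-encouraging L1 penalty
    nps_term = non_printability_score(masked, printable_colors)
    return adv_term + reg_term + nps_term
```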

They tested against LISA-CNN, GTSRB-CNN (which had 99.4% accuracy on the stop-sign-only dataset), and also Inception-v3. There were two attack types: poster-printing, in which the entire sign was covered with a print-out, and sticker attacks, with graffiti-like and “camouflage art” masks. The results vary, with success rates ranging from 65% to 100% depending on the experimental setup. So it basically works, but personally I’d want more consistent results before calling a method robust (“Towards Robust Physical-World Attacks…”?). One qualitative result was that targets visually closer to the original class were easier for the attack to hit, which is consistent with Athalye 2017’s results.

Overall I’m less impressed with this paper than with Athalye 2017, but the authors do bring some interesting methods to bear, and I like the choice of street signs as a target. A follow-up I’d be interested to see is an investigation into improving the graffiti-like appearance of these attacks, perhaps using stencils and spray paint. A natural question in that case is whether people (including law enforcement) would notice these attacks or could be trained to recognize them.

Athalye et al 2017 “Synthesizing Robust Adversarial Examples” https://arxiv.org/abs/1707.07397

Carlini and Wagner 2017 “Towards Evaluating the Robustness of Neural Networks” https://arxiv.org/abs/1608.04644

Sharif et al 2016 “Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition” https://dl.acm.org/citation.cfm?id=2978392
