FalseSignals: Robust Physical Adversarial Attack on Faster R-CNN Object Detector

Abhishek Kardak
9 min read · Mar 21, 2020


Subhayu Chakravarty | Prajval Gupta | Abhishek Kardak | Ritika Munjal | Vrinda Sharma | Aditya Shetty

With its release, the research community will be able to reproduce our results and have access to the same software platform that [Facebook AI researchers] use every day

— Ross Girshick (talking about Faster R-CNN)

Research Scientist, Facebook AI Research
One of the creators of Faster R-CNN

Introduction

An adversarial attack consists of subtly modifying an image so that the changes are almost undetectable to the human eye. The altered image, called an adversarial image, is misclassified when submitted to a classifier, while the original is classified correctly. The strong performance of deep neural networks (DNNs) in computer vision does not make them safe from this vulnerability: by adding nearly imperceptible adversarial perturbations, the accuracy of a DNN image classifier can be driven down to almost zero. The existence of adversarial examples not only reveals intriguing theoretical properties of DNNs but also raises serious practical concerns about deploying them in security- and safety-critical systems. Autonomous vehicles, for example, cannot be fully trusted until their robustness against adversarial attacks has been established. The critical need to understand these vulnerabilities has attracted massive interest among machine learning, computer vision, and security researchers and enthusiasts.

Even though many adversarial attack algorithms have been proposed, attacking a real-world computer vision system is challenging. Most existing attack algorithms focus only on image classification, yet in many real-world scenarios the system must identify more than one object in an image. Object detection, which identifies and localizes multiple objects in an image, is a more suitable model for many vision-based tasks. Because an attack on an object detector must deceive the classification results in multiple bounding boxes at different scales, it is harder than attacking an image classifier.

Another layer of complexity comes from the fact that the DNN is usually only one component in the computer vision pipeline. For many applications, attackers cannot directly manipulate the digital input to the system; they can only alter objects in the physical world. To succeed, a physical adversarial attack must therefore be robust enough to survive real-world distortions caused by varying viewing distances and angles, lighting conditions, and camera limitations.

Background

In this project, we deploy a robust targeted attack that can deceive a state-of-the-art Faster R-CNN object detector. We use the Expectation over Transformation technique, extending it from image classification to the object detection setting to make the attack more robust. We generate adversarially perturbed traffic signals that are consistently misdetected by Faster R-CNN as the target objects in real drive-by tests.

An illustration of machine learning adversarial examples. Studies have shown that by adding an imperceptibly small, but carefully designed perturbation, an attack can successfully lead the machine learning model to make a wrong prediction. Such attacks have been used in computer vision (upper graphs) and speech recognition (lower graphs) Source: An Overview of Vulnerabilities of Voice Controlled Systems, Yuan Gong, Christian Poellabauer, Conference: 1st International Workshop on Security and Privacy for the Internet-of-Things (March 2018)

R-CNN & Faster R-CNN

R-CNN (Girshick et al., 2014) is short for “Region-based Convolutional Neural Networks”. The main idea consists of two steps: first, using selective search, it identifies a manageable number of bounding-box object region candidates (“regions of interest”, or RoIs); then it extracts CNN features from each region independently for classification.

Faster R-CNN (Ren et al., 2016) streamlines this pipeline into a single, unified model composed of a region proposal network (RPN) and Fast R-CNN with shared convolutional feature layers.
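To make the two-stage structure concrete, here is a conceptual sketch in Python; `backbone`, `rpn`, and `roi_head` are placeholder callables for illustration, not the actual TensorFlow API used in this project.

```python
def faster_rcnn_forward(image, backbone, rpn, roi_head):
    """Conceptual two-stage Faster R-CNN flow (placeholder callables, not a real API)."""
    features = backbone(image)               # shared convolutional feature maps
    proposals = rpn(features)                # stage 1: candidate regions of interest
    boxes, class_scores = roi_head(features, proposals)  # stage 2: classify and refine each RoI
    return boxes, class_scores
```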

Adversarial attack

Adversarial machine learning is the study of deceiving machine learning models by manipulating their input. Such manipulations can be used maliciously to exploit the weak points of learning algorithms and endanger the security of the machine learning system.

In a targeted attack, the attacker tries to trick the classifier into labeling the input image as a specific target class instead of its actual class.
In a non-targeted attack, the attacker only needs to make the model classify the adversarial image incorrectly, as any class other than the true one.
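To make the distinction concrete, here is a minimal single-step (FGSM-style) sketch in TensorFlow 2. The `model`, the integer labels, and `eps` are placeholders, and the attack used later in this post is iterative rather than single-step; this only illustrates how the gradient direction differs between the two settings.

```python
import tensorflow as tf

def fgsm_step(model, x, y_true, y_target=None, eps=2 / 255):
    """One FGSM-style step: untargeted ascends the true-class loss,
    targeted descends the target-class loss. Labels are integer class ids
    with shape [batch]; x is a float image batch in [0, 1]."""
    x = tf.convert_to_tensor(x)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    with tf.GradientTape() as tape:
        tape.watch(x)
        logits = model(x)
        if y_target is None:                  # untargeted: push away from the true class
            loss, direction = loss_fn(y_true, logits), +1.0
        else:                                 # targeted: pull toward the target class
            loss, direction = loss_fn(y_target, logits), -1.0
    grad = tape.gradient(loss, x)
    x_adv = x + direction * eps * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)
```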

Threat Model

Current methods for creating adversarial examples typically produce imperceptible manipulations that can deceive a given machine learning model. In this project, we instead create perturbations that are perceptible but limited, so that they would not easily deceive a human observer. We study these perturbations in the context of object detection of traffic lights. We chose traffic lights because of the object detector's possible uses in security- and safety-critical scenarios such as autonomous vehicles: an attack on traffic light recognition could, for instance, cause an autonomous car to misinterpret a signal and cause an accident.

We assume the attacker has white-box access to the machine learning model, i.e., access to its structure and weights, and can therefore compute both outputs and gradients. This also means the attacker does not have to craft the perturbation in real time; by studying the model offline, the attacker can mount attacks such as the Carlini-Wagner attack.

Attack Method

The goal of this project is to carry out a robust targeted attack that can fool a Faster R-CNN object detector. We do this by creating adversarially perturbed images of the target object that can mislead the detector. The target we chose is the pedestrian walk sign on a traffic signal, which looks like the one below:

Given a trained machine learning model M and a benign instance x ∈ X that is correctly classified by M, the goal of an untargeted adversarial attack is to find another instance x′ ∈ X such that M(x′) ≠ M(x) and d(x, x′) ≤ ε, for some distance metric d(·, ·) and perturbation budget ε > 0.

For a targeted attack, we additionally require that M(x′) = y′ for a chosen target class y′ ≠ M(x). In other words, the goal of the attack is to create an image x′ that looks like an object x of class y but is classified as the target class y′.

To carry out the attack, we used an iterative change-of-variable attack together with the Expectation over Transformation technique. Let's quickly see what these mean.

Change of Variable Attack:

Let LF(x, y) = L(F(x), y) be the loss function that measures the distance between the model output F(x) and the target label y. Given an input image x and a target class y′, the change-of-variable attack solves the following optimization problem:

minimize over x′:   LF(tanh(x′), y′)  +  c · ‖tanh(x′) − x‖²

The tanh function ensures that each pixel of the adversarial image tanh(x′) stays within [-1, 1], and the constant c controls the similarity (squared l2 distance) between the modified object and the original image x.
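A minimal sketch of this optimization in TensorFlow 2, assuming `classify` maps an image in [-1, 1] to class logits; the original project used the TensorFlow Object Detection API, so this is illustrative rather than the exact implementation.

```python
import tensorflow as tf

def change_of_variable_attack(classify, x, y_target, c=0.002, steps=500, lr=0.01):
    """Iterative change-of-variable attack (sketch). `x` is assumed to be scaled to [-1, 1]."""
    w = tf.Variable(tf.atanh(x * 0.999999))          # unconstrained variable; tanh(w) starts near x
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            x_adv = tf.tanh(w)                       # adversarial image, always within [-1, 1]
            target_loss = ce(y_target, classify(x_adv))
            similarity = tf.reduce_sum(tf.square(x_adv - x))
            loss = target_loss + c * similarity      # c trades off attack strength vs. similarity
        grads = tape.gradient(loss, [w])
        opt.apply_gradients(zip(grads, [w]))
    return tf.tanh(w)
```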

Expectation over Transformation:

The idea here is to add random distortions in each iteration of the optimization so that the resulting perturbation becomes more robust. The transformations can include translation, rotation, scaling, etc. We write Mt(Xb, Xo) for the operation that transforms the object image Xo and overlays it onto a background image Xb. In practice, this overlay is performed with a mask covering either the entire image or only the part that needs to be attacked. After adding these random distortions, the optimization problem from above becomes:

minimize over x′:   E (over Xb ∼ X, t ∼ T) [ LF(Mt(Xb, tanh(x′)), y′) ]  +  c · ‖tanh(x′) − x‖²

where X is the training set of background images and T is the distribution of transformations. The optimization problem is solved using gradient descent and backpropagation.
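A hedged sketch of this expectation-over-transformation loop, again in TensorFlow 2. The `classify` function, the binary `mask`, the list of `backgrounds`, and the simple brightness jitter standing in for the transformation distribution T are all simplifying assumptions; the real attack also uses geometric transformations such as rotation, scaling, and translation.

```python
import numpy as np
import tensorflow as tf

def eot_attack(classify, x_obj, mask, backgrounds, y_target, c=0.002, steps=1000, lr=0.01):
    """Expectation over Transformation (sketch). Images are assumed to be in [-1, 1]."""
    w = tf.Variable(tf.atanh(x_obj * 0.999999))
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    for _ in range(steps):
        x_b = backgrounds[np.random.randint(len(backgrounds))]   # sample Xb ~ X
        with tf.GradientTape() as tape:
            x_adv = tf.tanh(w)
            # Mt(Xb, Xo): overlay the perturbed object onto the background through the mask.
            scene = x_b * (1.0 - mask) + x_adv * mask
            scene = tf.image.random_brightness(scene, 0.1)       # crude stand-in for t ~ T
            loss = ce(y_target, classify(scene)) + c * tf.reduce_sum(tf.square(x_adv - x_obj))
        grads = tape.gradient(loss, [w])
        opt.apply_gradients(zip(grads, [w]))
    return tf.tanh(w)
```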

Before we go any further in the attack, let’s quickly look at how Faster R-CNN object detection works.

Faster R-CNN adopts a two-stage approach:

  1. The convolutional feature maps are passed to the Region Proposal Network (RPN), which proposes several regions of interest that may contain objects.
  2. The output of the RPN is then passed to a classifier which then classifies the image within each region proposal.

The attack targets the classification in every region proposal: the targeted classification loss is computed for each proposal produced by the RPN and averaged, and this averaged loss takes the place of LF in the optimization above. A sketch of that per-proposal loss is shown below.
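In the sketch, `proposal_logits` is a hypothetical tensor holding the second-stage classifier's logits for every RPN proposal; in practice this tensor has to be extracted from the Faster R-CNN graph.

```python
import tensorflow as tf

def proposal_attack_loss(proposal_logits, y_target):
    """Average the targeted classification loss over all region proposals.
    `proposal_logits` is assumed to have shape [num_proposals, num_classes]."""
    ce = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
    targets = tf.fill(tf.shape(proposal_logits)[:1], y_target)   # same target class for every proposal
    per_proposal = ce(targets, proposal_logits)                  # one loss value per proposal
    return tf.reduce_mean(per_proposal)                          # expectation over proposals
```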

Original Image:


Classification of Original Image:


The object of interest, the traffic light, is correctly classified by the Faster R-CNN detector. To attack it, we introduced perturbations into the image to fool the model, after which it was classified as a person.

Perturbed Image


Image classified as a person by the same Faster R-CNN model:

Results

We evaluated our attack by fooling a pre-trained Faster R-CNN network with an Inception V2 backbone, trained on the MS-COCO dataset. We used the pre-trained model available through the TensorFlow Object Detection API. MS-COCO covers 80 general object classes, including animals, groceries, signboards, and persons.
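For reference, here is roughly how such a frozen Faster R-CNN Inception V2 graph from the Object Detection API model zoo can be loaded and queried. The file path is an assumption, but the tensor names (`image_tensor`, `detection_boxes`, `detection_scores`, `detection_classes`) are the standard ones exported by the API.

```python
import numpy as np
import tensorflow as tf

# Path to the frozen graph exported by the TensorFlow Object Detection API
# (assumed location; the COCO-trained checkpoint is of the form faster_rcnn_inception_v2_coco_*).
GRAPH_PATH = "faster_rcnn_inception_v2_coco/frozen_inference_graph.pb"

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile(GRAPH_PATH, "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.compat.v1.import_graph_def(graph_def, name="")

with tf.compat.v1.Session(graph=graph) as sess:
    frame = np.zeros((1, 600, 600, 3), dtype=np.uint8)   # stand-in for a real camera frame
    boxes, scores, classes = sess.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": frame})
    # MS-COCO class ids are 1-indexed: 1 = person, 10 = traffic light.
    print(classes[0][:5], scores[0][:5])
```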

Our adversarial attack has various potential use cases, but we chose to attack the pedestrian walk sign and fool the model into misclassifying it as a person. This could become a real threat as self-driving cars evolve: a car could be made to stop at a traffic signal even when the walk sign is on. The attack can also be adapted to different object shapes and regions by varying the mask designs used during the masking process, as sketched below.
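A small NumPy sketch of such a mask; the rectangle here is just an example, and other shapes (circles, octagons) can be drawn into the array in the same way.

```python
import numpy as np

def rectangular_mask(height, width, top, left, box_h, box_w):
    """Binary mask: 1 inside the region we are allowed to perturb, 0 elsewhere."""
    mask = np.zeros((height, width, 1), dtype=np.float32)
    mask[top:top + box_h, left:left + box_w, :] = 1.0
    return mask

# Example: restrict the perturbation to a 200x200 patch of a 600x600 image.
mask = rectangular_mask(600, 600, top=200, left=200, box_h=200, box_w=200)
```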

Digitally Perturbed Traffic Walk Sign

We generated the perturbed (adversarial) traffic walk signs by optimizing the objective function described above. The two hyperparameters are the patch loss weight “c” and the regularization coefficient. The hyperparameter c determines the perturbation strength: a smaller value of c creates a more conspicuous perturbation, but one that is more robust to real-world distortions. Conversely, a higher value of c creates a low-confidence perturbation, which in turn tends to degrade into an untargeted attack.

We used the l2 norm for the regularization, which makes it difficult to choose a value for c: a more robust perturbation leads to a more conspicuous image that is harder for humans to recognize. Also, the l2 distance is sensitive to color changes on lighter objects, so we only perturb the background of the walk sign.

We used c = 0.005 for a low-confidence perturbation and c = 0.002 for a more robust one. As expected, the more robust perturbation produces a more conspicuous image, so an appropriate value of c must be chosen for the use case at hand. In our experiments, the adversarial attack successfully fooled the Faster R-CNN model, and the same approach can be applied to various other use cases.

Potential Threats

As fascinating as it sounds, Artificial Intelligence is highly vulnerable to such attacks! There are serious implications for fields that rely heavily on AI, from self-driving cars to medicine to the military. We gathered a few examples to tell you how these attacks might affect the real world.

Recently, Tesla’s Autopilot was misled by just three stickers placed on the road, causing the car to steer in the wrong direction.

It is also troubling when a benign mole is misclassified as malignant, leading to an unnecessary excision and the bill that comes with it.

The most worrisome implications are in the military, where targets may be selected autonomously. An enemy could place adversarial images over, say, a hospital, and the system might end up attacking the hospital instead of an enemy base. Or drones could end up following the wrong cars.
