Neural Networks Easily Fooled

Dries Cronje
Published in Deep Learning Cafe
Feb 16, 2018

You are on your way to work in a self-driving vehicle, laptop out, typing away, one eye on the road.

The next moment the self-driving vehicle speeds up to 100km/h, hurtling through an intersection.

Pure anguish.

What happened?

The vehicle was fooled into thinking a STOP sign was a 100km/h sign and sped up.

How is that possible?

Fooling a computer vision system into thinking a cat is guacamole

Researchers have previously shown that it is possible to fool a computer vision system into thinking that an image of a cat is an image of guacamole.

Computer Vision system fooled into thinking it’s a photo of guacamole. (Photo credit MIT CSAIL)

Researchers managed to manipulate the image in such a way that the computer vision system thought it was guacamole.

The task of fooling a neural network entails finding the smallest change to the input needed to flip the network’s classification to the class you want it to be.

This kind of attack, where the attacker has no access to the model’s internals and can only query it, is known as a black-box attack.

For image classification, this means changing pixel values in the smallest possible way to make the classifier believe the image is something else. These changes are usually made without visibly altering the image, with just enough manipulation to fool the computer vision system.
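To make that concrete, here is a minimal sketch of one well-known way to compute such a perturbation, the fast gradient sign method (FGSM), in PyTorch. This illustrates the general idea rather than the specific technique used in the cat-to-guacamole attack; the `model`, `image`, and `target_label` arguments are placeholders.

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(model, image, target_label, epsilon=0.01):
    """Nudge each pixel by at most epsilon towards the target class.

    `model` is any image classifier, `image` a (1, C, H, W) tensor in [0, 1],
    and `target_label` the class we want the network to predict (all assumed).
    """
    image = image.clone().detach().requires_grad_(True)
    # Loss with respect to the class we want the network to output.
    loss = F.cross_entropy(model(image), target_label)
    loss.backward()
    # Step against the gradient to lower that loss, i.e. towards the target.
    adversarial = image - epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```

Because epsilon is tiny, the perturbed image looks identical to the original to a human, yet the classifier’s output can flip.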

Do not think you are safe because your network architecture is not publicly available. Researchers have shown that it is possible to build a good approximation of a network by training a substitute network against the “secret” network’s outputs on a set of samples until the substitute behaves in a similar way.

The substitute network is used to generate adversarial images capable of fooling the “secret” network.
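A rough sketch of that substitute-network idea, assuming a black-box `query_secret_model` function and a locally chosen `substitute` architecture (both hypothetical names), might look like this:

```python
import torch
import torch.nn.functional as F

def train_substitute(substitute, probe_images, query_secret_model,
                     epochs=5, lr=1e-3):
    """Fit a local model to mimic the black-box model's predictions."""
    # Label the probe images with whatever the secret model predicts.
    with torch.no_grad():
        labels = query_secret_model(probe_images).argmax(dim=1)

    optimizer = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = F.cross_entropy(substitute(probe_images), labels)
        loss.backward()
        optimizer.step()

    # Adversarial images crafted against the substitute often transfer
    # to the secret model.
    return substitute
```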

Pulling off a black-box attack takes a huge amount of effort and processing power, and it would be hard to execute in real-world scenarios, so nobody paid much attention.

However, along comes a new kind of attack: an adversarial patch in the form of a sticker that you can print out with little effort and place next to the object you want the computer vision system to misclassify.

Adversarial Patch

A paper (Adversarial Patch) recently published by Google has raised the bar. The paper shows how it is possible to make the system classify any image you show it as a toaster.

Whereas with the black-box attack the attacker needs to tamper with the pixel values of every single image they want the network to misclassify, the adversarial patch attack works simply by placing the patch somewhere in the image frame.

The sticker is designed to fool the underlying classification network into thinking that whatever you show the computer vision system is a toaster, no matter what the image actually contains; you just need to place the sticker next to the object.
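The reason this is so much cheaper to deploy than a per-image attack is that applying the patch is just an overlay; there is no optimisation at attack time. A sketch, where `patch` stands for a pre-optimised patch tensor (an assumption, not the published patch itself):

```python
import torch

def apply_patch(frame, patch, top=10, left=10):
    """Paste a pre-computed adversarial patch into a fixed corner of the frame."""
    patched = frame.clone()
    h, w = patch.shape[-2:]
    patched[..., top:top + h, left:left + w] = patch
    return patched
```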

The bad news is that it works really well.

How do we defend ourselves?

The attacks described above take advantage of the inherent nature of neural networks: they fit tightly to the dataset they were trained on while generalising to small variations, the very property that makes them so effective for computer vision, and carefully crafted inputs exploit exactly that sensitivity.

It is impossible to anticipate all the techniques the ‘bad guys’ will come up with to exploit the vulnerabilities of neural networks. The only way to counter these attacks is to build robust systems resilient enough to withstand all known onslaughts, coupled with a set of checks and balances to respond quickly to new attacks.

One method that has shown potential is to generate adversarial images and add them to your training data, in the hope that this will make your system more resilient.
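A minimal sketch of what such adversarial training could look like in PyTorch, with the adversarial copies generated on the fly using the same gradient-sign trick as above (the 50/50 mixing ratio and epsilon are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, images, labels, epsilon=0.01):
    """One training step on a mix of clean and adversarial examples."""
    # Craft adversarial copies of the batch with a single gradient-sign step.
    images_adv = images.clone().detach().requires_grad_(True)
    F.cross_entropy(model(images_adv), labels).backward()
    images_adv = (images_adv + epsilon * images_adv.grad.sign()).clamp(0, 1).detach()

    # Train on both the clean and the perturbed batch.
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(images), labels) \
         + 0.5 * F.cross_entropy(model(images_adv), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```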

Another option is to use input transformations. The paper Countering Adversarial Images using Input Transformations investigates the use of input transformations as a defence. Experiments on ImageNet demonstrate that these techniques are an effective defence against both gray-box and black-box attacks.
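Two of the transformations studied in that paper, bit-depth reduction and JPEG compression, are simple enough to sketch as a preprocessing step before classification (the parameter values below are illustrative, not the paper’s tuned settings):

```python
import io
import numpy as np
from PIL import Image

def reduce_bit_depth(image, bits=3):
    """Quantise a float image in [0, 1] down to 2**bits levels per channel."""
    levels = 2 ** bits
    return np.round(image * (levels - 1)) / (levels - 1)

def jpeg_compress(image, quality=75):
    """Round-trip the image through JPEG to wash out small perturbations."""
    pil = Image.fromarray((image * 255).astype(np.uint8))
    buffer = io.BytesIO()
    pil.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return np.asarray(Image.open(buffer)).astype(np.float32) / 255.0
```

The intuition is that adversarial perturbations are finely tuned, so a lossy transformation can destroy them while leaving the image perfectly recognisable.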

Ultimately your biggest defence will be to stay one step ahead.

UPDATE:

6/5/2019

Objects invisible to YOLO v2

Hide from YOLO v2

Whereas researchers had previously used an adversarial patch to make neural networks mistake one object for another, they have now created a patch that makes an object invisible to YOLO v2.

The patch can be printed out, making it practical for real-world attacks.

Thanks for reading. Don’t forget to hit that ‘clap’ button 👏👏

Let me know in the comments below what you think of the threat deep learning poses in the hands of the bad guys. You can also follow me on Twitter or LinkedIn for more content.
