Adversarial Machines

Fooling A.Is (and turning everyone into manga)

samim
6 min read · Dec 7, 2015

Adversarial A.Is are a common sci-fi theme: Robot vs. Robot. In recent years, real adversarial examples have emerged. This experiment explores how to generate images that fool A.Is (and turn everyone into manga).

Convolutional Neural Networks

At the heart of many modern computer vision systems are Convolutional Neural Networks. On some vision tasks, CNNs have surpassed human performance. Industries such as Web-Services, Research, Transport, Medical, Manufacturing, Defence and Intelligence rely on them every day.

Convolutional Nets are commonly used to classify images. The network is shown an image of a pipe and classifies it as “pipe”. Generalist networks are able to classify 1000+ classes of objects with amazing precision and speed.
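To give a sense of how little code such a classification takes, here is a minimal sketch. It is an illustration only, assuming PyTorch/torchvision and a stock ResNet-50 rather than any specific model from this article; the image path is a placeholder.

```python
# Illustrative sketch (not the exact setup used here): classify an image
# with a pretrained CNN using PyTorch/torchvision. "pipe.jpg" is a placeholder.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("pipe.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    logits = model(img)                      # shape: (1, 1000) for ImageNet
    class_id = logits.argmax(dim=1).item()   # index into the 1000 ImageNet classes
print(class_id)
```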

Fooling Neural Networks

A series of published research papers has produced evidence that Convolutional Neural Networks can be fooled. Images can be manipulated so that image-recognition networks are likely to misclassify them. These manipulations look like noise and are almost invisible to humans.
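One widely cited way to construct such a perturbation is the fast gradient sign method from “Explaining and Harnessing Adversarial Examples” (listed in the papers below). The following is a minimal sketch of that idea, reusing the illustrative `model`, `img` and `class_id` from the classification sketch above; the epsilon value is an arbitrary small choice.

```python
# Minimal sketch of the fast gradient sign method (FGSM).
# Assumes `model`, `img` and `class_id` from the classification sketch above.
import torch
import torch.nn.functional as F

epsilon = 0.007  # perturbation size; small enough to be near-invisible

img_adv = img.clone().requires_grad_(True)
loss = F.cross_entropy(model(img_adv), torch.tensor([class_id]))
loss.backward()

# Step in the direction that increases the loss for the current label.
perturbed = img_adv + epsilon * img_adv.grad.sign()

with torch.no_grad():
    new_class = model(perturbed).argmax(dim=1).item()
# With a suitable epsilon the predicted class often flips, although the
# perturbation looks like faint noise to a human.
```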

Image by Christian Szegedy (Google) et al. Note: “noise” is an evocative shorthand; “imperceptible changes” is more fitting.

This problem has stirred controversy in the Machine Learning community, with some hailing it as a “deep flaw” of deep neural networks and others promoting a more cautious interpretation. Researchers are actively exploring the reasons for adversarial examples. Ian Goodfellow gives a great overview in his recent talk ‘Do Statistical Models Understand the World?’ (video).

Research Papers:
- Deep Neural Networks are Easily Fooled (code / video)
- Exploring the Space of Adversarial Images
- Adversarial Manipulation of Deep Representations
- Explaining and Harnessing Adversarial Examples
- Intriguing Properties of Neural Networks
- Breaking Linear Classifiers on ImageNet
- Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
- The Limitations of Deep Learning in Adversarial Settings

Experiment: Generating Adversarial Images

This experiment started with an exploration of the recently published paper Exploring the Space of Adversarial Images by Pedro Tabacof and Eduardo Valle of the University of Campinas in Brazil. The paper investigates adversarial examples and suggests that most current CNN classifiers are vulnerable.

Adversarial Noise examples. (Image by Pedro Tabacof, Eduardo Valle)

Alongside the paper, they released open-source code that enables anyone to generate adversarial images easily.
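Their repository is the reference implementation. Purely to illustrate the underlying idea (and not their actual API), an adversarial image can also be found by directly optimizing a small perturbation toward a chosen target class, as in this sketch. It reuses the illustrative `model` and `img` from above; the target class index and loss weights are arbitrary.

```python
# Illustrative sketch of optimization-based adversarial image generation
# (the general idea behind such libraries, not Tabacof & Valle's actual API).
# Assumes `model` and `img` from the earlier sketches.
import torch
import torch.nn.functional as F

target = torch.tensor([281])          # arbitrary target class for demonstration
delta = torch.zeros_like(img, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=0.01)

for _ in range(100):
    optimizer.zero_grad()
    out = model(img + delta)
    # Push the prediction toward the target while penalizing large noise.
    loss = F.cross_entropy(out, target) + 0.05 * delta.norm()
    loss.backward()
    optimizer.step()

adversarial = (img + delta).detach()
```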

The experiment's aim was to find a way to demo this library. All initially explored scenarios were rejected, as their outcomes were highly uncertain. Here is a sample of the rejected ideas:

Selection of rejected ideas. WARNING: COMEDY INTENDED
Rejected Cartoon / Don Hertzfeldt

Experiment: Generating Adversarial Mangas

Generative Manga

Recently, a new pre-trained CNN model was released: Illustration2Vec: A Semantic Vector Representation of Illustrations. Masaki Saito (Tohoku University) and Yusuke Matsui (University of Tokyo) trained the model on a large collection of manga images.

Selection of training data (presumably copyrighted content) / Tag Prediction

The Illustration2Vec model can predict copyright tags; one could say it has memories of copyrighted content. A fascinating way to explore convolutional networks is deepdream. This experiment dreams with the Illustration2Vec model and turns everyone into manga.
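Deepdream's core loop is gradient ascent on the activations of a chosen layer, so whatever features the network has learned get amplified in the input image. Below is a stripped-down sketch of that loop; it uses a generic pretrained VGG16 as a stand-in for the released Illustration2Vec Caffe model, and the layer index and step size are arbitrary choices.

```python
# Stripped-down sketch of the deepdream idea: gradient ascent that amplifies
# whatever a chosen layer responds to. A generic pretrained network stands in
# for the Illustration2Vec model used in the experiment.
import torch
from torchvision import models

dream_net = models.vgg16(pretrained=True).features.eval()
layer_index = 20  # which layer's activations to "dream" on (arbitrary choice)

def dream(image, steps=20, lr=0.05):
    image = image.clone().requires_grad_(True)
    for _ in range(steps):
        activations = image
        for i, layer in enumerate(dream_net):
            activations = layer(activations)
            if i == layer_index:
                break
        loss = activations.norm()      # amplify this layer's response
        loss.backward()
        with torch.no_grad():
            image += lr * image.grad / (image.grad.abs().mean() + 1e-8)
            image.grad.zero_()
    return image.detach()

# `img` is the preprocessed input from the classification sketch above.
dreamed = dream(img)
```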

Questions raised: Are the generated images copyright-infringing? Can the copyright-detection bots of large manga sites (or Disney) be fooled easily?

A selection of generated manga

Final Thoughts

Adversarial examples are a fascinating area of ongoing research. They highlight limitations of current systems and raise a number of interesting questions. While industries are racing to include visual intelligence systems in mission-critical infrastructure, looking at edge cases and exploring solutions is a productive path. As the Belgian surrealist painter René Magritte wrote in “The Treachery of Images”: Ceci n’est pas une pipe (this is not a pipe).

Interview

The following is an interview with Nicolas Papernot, a machine learning researcher who recently published two papers on adversarial examples:

Nicolas Papernot

Q: What is your core research interest in the “adversarial examples” space?

I do research in a lab focused on security. Our end goal is to identify vulnerabilities in deep neural networks, to better understand their attack surface and defend them. Our first paper explores attacks, while the second one explores defenses. Our algorithms were designed to reduce the number of input features that we perturb, so that they can be applied to various datasets (spam, authentication, etc.).

Q: In your recent publication you introduce “a defensive mechanism to reduce the effectiveness of adversarial samples on DNNs”. Could you explain your approach in simple terms?

Our paper proposes to provide additional information about the training samples. This information takes the form of class probabilities and gives us insight into the various classes. To extract these probabilities, we do a first training of the network. Then we do a second training, with the same architecture, that includes these probabilities. This gives more robustness and smoothness.

The consequences are: 1. The derivatives become very small (in the amplitude of the Jacobian components). 2. The average minimum perturbation (in number of input dimensions) needed to leave the source class and reach the target class increases by 500–800% in our tests.
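As a rough sketch of the mechanism described in this answer (not the paper's actual implementation), the two-pass training could look something like this; the temperature value, the function names and the commented training loop are illustrative assumptions.

```python
# Minimal sketch of the defensive-distillation idea: train once, extract
# softened class probabilities, then retrain the same architecture on them.
# The networks, data loader and optimizer are assumed to exist and are not shown.
import torch
import torch.nn.functional as F

T = 20.0  # distillation temperature (one typical value; several are evaluated)

def soft_labels(teacher, x):
    # Softened probabilities from the first (already trained) network.
    with torch.no_grad():
        return F.softmax(teacher(x) / T, dim=1)

def distillation_loss(student_logits, soft_targets):
    # Cross-entropy between the second network's softened prediction
    # and the first network's softened probabilities.
    log_p = F.log_softmax(student_logits / T, dim=1)
    return -(soft_targets * log_p).sum(dim=1).mean()

# Second training pass (schematic):
# for x, _ in train_loader:
#     loss = distillation_loss(student(x), soft_labels(teacher, x))
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```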

Q: Do you think we will “solve” adversarial problems in the near future or are the problems deeper?

It is a tough problem because it is closely linked to how we train our networks. Distillation as a defense is a good first step. New defenses will probably involve additional tricks at training.

Q: Part of your research was “sponsored by the Army Research Laboratory”. How serious do you think the implications/risks of adversarial examples are for society at the current stage?

Note that my opinions are mine and not those of the Army Research Laboratory (but I acknowledge their generous support). The implications of adversarial examples are very serious for any company in the industry. If someone potentially has an incentive to benefit from a misbehavior (e.g. a misclassification), then there is a risk. The consequences can be bad: cars, spam, authentication, malware, network intrusion and fraud detection come to mind.

Generated Fingerprint with DeepTexture

Get in touch here: twitter.com/samim | http://samim.io

Sign up for the Newsletter for more experiments like this!

