Reading list for the NIPS 2018 Adversarial Vision Challenge

Wieland Brendel
3 min read · Jul 25, 2018


The Adversarial Vision Challenge at NIPS 2018 has recently started and will run until November 1st. A key ingredient of the challenge is the ability of attacks to query the decision of the model on self-defined inputs. This setting mimics real-world threat scenarios (e.g. NSFW filters) and also renders nuisance defences such as gradient masking ineffective.
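To make the setting concrete: the model under attack behaves roughly like the label-only oracle sketched below. This is a hypothetical wrapper for illustration only, not the actual challenge interface — the attacker gets class labels, never gradients or scores.

```python
import numpy as np

class DecisionOracle:
    """Hypothetical wrapper (not the challenge API) illustrating the
    threat model: attackers only see the final predicted label,
    never gradients or confidence scores."""

    def __init__(self, model):
        self._model = model  # any classifier exposing predict(inputs) -> scores

    def __call__(self, inputs: np.ndarray) -> np.ndarray:
        scores = self._model.predict(inputs)   # shape (batch, num_classes)
        return np.argmax(scores, axis=-1)      # labels only, shape (batch,)
```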

If you plan to enter the competition you should know about transfer-based attacks/defences, decision-based attacks and some defence strategies. The following list of references is a good start to dive into these topics:

[1312.6199] Intriguing properties of neural networks
This is the first paper that studied adversarial examples within the realm of deep neural networks and showed that they transfer between models. It was also the first paper to propose adversarial training.

[1802.00420] Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
This paper nicely shows that basically all proposed defences (except one) do not really make the models more robust but rather “confuse” attacks by masking their gradients or confidence scores. This is a very important point: just because some adversarial attacks fail does not mean that a model is actually robust.

[1611.02770] Delving into Transferable Adversarial Examples and Black-box Attacks
Improves the success rate and reduces the perturbation size of transfer-based attacks by transferring adversarial examples that fool a whole ensemble of models.
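As a rough illustration of the ensemble idea (the paper itself uses optimization-based targeted attacks; this is just a single FGSM-style step over an assumed list of differentiable PyTorch substitute models):

```python
import torch
import torch.nn.functional as F

def ensemble_fgsm(models, x, y, eps=0.03):
    """Single-step sketch of an ensemble transfer attack: a perturbation
    that fools several substitute models at once tends to transfer better
    to the unseen target model."""
    x_adv = x.clone().detach().requires_grad_(True)
    # average the cross-entropy loss over the whole ensemble
    loss = sum(F.cross_entropy(m(x_adv), y) for m in models) / len(models)
    loss.backward()
    with torch.no_grad():
        # one signed-gradient step; iterative variants usually transfer better
        x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```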

[1712.04248] Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
This paper introduces the first and so far only decision-based attack that is able to craft minimal adversarial examples on ImageNet just by querying the final decision of the model. It also introduces the taxonomy used here (gradient-based, score-based, transfer-based, decision-based attacks).
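The core of the Boundary Attack is a simple rejection-sampling random walk along the decision boundary. Below is a heavily simplified single-step sketch (not the tuned algorithm from the paper), assuming a label-only oracle like the one sketched above:

```python
import numpy as np

def boundary_attack_step(oracle, original, current_adv, orig_label,
                         step=0.01, source_step=0.01):
    """One step of a Boundary-Attack-style random walk: perturb orthogonally
    near the decision boundary, move slightly towards the original image,
    and keep the candidate only if it is still misclassified."""
    direction = original - current_adv
    # random perturbation, rescaled relative to the current distance
    noise = np.random.normal(size=original.shape)
    noise *= step * np.linalg.norm(direction) / (np.linalg.norm(noise) + 1e-12)
    candidate = np.clip(current_adv + noise, 0, 1)
    # small step towards the original to shrink the perturbation
    candidate = np.clip(candidate + source_step * (original - candidate), 0, 1)
    # label-only feedback: accept only if the model is still fooled
    if oracle(candidate[None])[0] != orig_label:
        return candidate
    return current_adv
```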

[1602.02697] Practical Black-Box Attacks against Machine Learning
A decision-based attack based on reverse-engineering the source model: a substitute model is trained on the labels returned by the black box and then attacked with gradient-based methods. It is limited to data sets with small intra-class variability like MNIST or street sign datasets.

[1706.06083] Towards Deep Learning Models Resistant to Adversarial Attacks
This is currently considered the only effective defence that has resisted many months of scrutiny. It is based on a variant of adversarial training (adversarial examples are generated iteratively from randomly perturbed starting points) and is mostly limited to MNIST.
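The attack used inside this flavour of adversarial training is projected gradient descent (PGD) with a random start. A minimal PyTorch sketch follows; the hyperparameters are the commonly quoted MNIST values and are placeholders only:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """L-infinity PGD with a random start, roughly as in Madry et al."""
    # random start inside the eps-ball around the clean input
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # signed-gradient step, then project back into the eps-ball
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Adversarial training then simply replaces each clean batch:
# for x, y in loader:
#     loss = F.cross_entropy(model(pgd_attack(model, x, y)), y)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```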

[1803.06373] Adversarial Logit Pairing
Another promising variant of adversarial training, based on matching the logits of clean and adversarial examples. One of our baselines is trained using this method.
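One common way to write the pairing loss down is a cross-entropy term on adversarial examples plus an L2 penalty pulling clean and adversarial logits together. In the sketch below the adversarial examples are assumed to come from a PGD attack like the one above, and the pairing weight is a placeholder rather than the paper's exact setting:

```python
import torch.nn.functional as F

def adversarial_logit_pairing_loss(model, x_clean, x_adv, y, weight=0.5):
    """Sketch of an adversarial-logit-pairing-style training loss."""
    logits_clean = model(x_clean)
    logits_adv = model(x_adv)
    ce = F.cross_entropy(logits_adv, y)          # adversarial training term
    pairing = F.mse_loss(logits_adv, logits_clean)  # pull logits together
    return ce + weight * pairing
```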

[1805.09190] Robust Perception through Analysis by Synthesis
This promising defence uses a generative model for classification and is the first for which adversarial examples on MNIST start to make sense to humans. So far the method is limited to MNIST and other data sets with small intra-class variability.

[1707.04131] Foolbox: A Python toolbox to benchmark the robustness of machine learning models
The interface of the challenge is largely inspired by and based on Foolbox. In contrast to other libraries like CleverHans, Foolbox is framework-agnostic and implements model wrappers for different DL frameworks such as TensorFlow, Keras, MXNet, PyTorch and others. Furthermore, Foolbox implements a large range of adversarial attacks that are each tuned to minimize the adversarial perturbation.
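As an illustration of how such a wrapper is used, here is a small sketch following the Foolbox 1.x-era API (signatures may differ in the version you install, so check the documentation):

```python
import numpy as np
import foolbox
import torchvision.models as models

# Wrap a PyTorch ImageNet model (Foolbox 1.x-era API).
model = models.resnet18(pretrained=True).eval()
mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
fmodel = foolbox.models.PyTorchModel(model, bounds=(0, 1), num_classes=1000,
                                     preprocessing=(mean, std))

# Stand-in image; in practice load a real (3, H, W) array in [0, 1].
image = np.random.rand(3, 224, 224).astype(np.float32)
label = int(np.argmax(fmodel.predictions(image)))  # original predicted class

# Run the decision-based Boundary Attack, which only needs the final label.
attack = foolbox.attacks.BoundaryAttack(fmodel)
adversarial = attack(image, label)
```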

[1610.00768] Technical Report on the CleverHans v2.1.0 Adversarial Examples Library
Another popular library, based on TensorFlow, that implements many adversarial attack algorithms for evaluating the robustness of machine learning models.


Wieland Brendel

Machine Learning Researcher at the University of Tübingen & Co-Founder of layer7.ai