Adversarial Machine Learning: A Closer Look

Adversarial examples were the only thing that drove me towards machine learning, so after studying them for the last 4 months (not intensively) I decided to write a blog post about them. Though the field of adversarial ML is quite wide, I'll be focusing on neural networks here (not in a very technical way). First, let's understand what adversarial machine learning is:

“Adversarial Machine Learning is a novel research area that lies at the intersection of machine learning and computer security.”


The first set of papers on adversarial ML came from PRALab @ University of Cagliari. Their papers are really amazing, with topics ranging from poisoning datasets to adversarial clustering and adversarial feature selection (read them; they also have a lot of open-source tools related to adversarial machine learning).

But the paper that really paved the way for research on adversarial examples in neural networks was "Intriguing properties of neural networks" by Christian Szegedy et al. In it, they craft perturbations by optimizing the input to maximize the network's error; the paper also discusses the transferability of adversarial examples.

How do adversarial attacks work?

As said in the first paper, the input is optimized to maximize the error. Normally, when training a CNN, we try to decrease the cost by changing our parameters, moving in the direction opposite to the gradient. Adversarial examples do just the opposite: they try to maximize the cost function, and hence the error rate, but they do it by changing the input image rather than the parameters (I'm describing FGSM here).
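To make this concrete, here's a minimal sketch of FGSM in PyTorch (an assumed framework; the model, epsilon value, and [0, 1] pixel range are illustrative choices, not from the original paper):

```python
import torch
import torch.nn as nn

def fgsm_attack(model, x, y, eps=0.03):
    """Perturb input x so the loss on the true label y increases."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()  # gradient w.r.t. the *input*, not the weights
    # Step in the direction of the sign of the input gradient,
    # which locally maximizes the loss.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixels in a valid range
```

The key contrast with training: during training we follow the negative gradient with respect to the parameters; here we follow the positive gradient with respect to the pixels.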

Goals of adversarial examples:

  • Confidence reduction: lower the model's confidence in the correct class.
  • Targeted misclassification: push the model towards one specific wrong class.
  • Non-targeted misclassification: produce any misclassification; the wrong class doesn't matter (even just adding random noise can sometimes achieve this).

Types of Attacks:

Just like any kind of computer attack we have two kinds of adversarial attacks:

  • Whitebox attack: the attacker has knowledge of the target's model or network architecture.
  • Blackbox attack: the attacker has no knowledge of the target architecture.

Main algorithms to craft adversarial examples:

  • FGSM: Fast Gradient Sign Method
  • JSMA: Jacobian-based Saliency Map Attack
  • DeepFool

I won’t be describing the algorithms here as there are already a lot of resources to describe them.

What if nobody knows my network architecture?

As described in the first paper, adversarial examples are transferable, meaning that an adversarial example crafted on one CNN architecture will most likely cause a misclassification on another architecture. (More information in the resources section.)

That’s okay, but why are people suddenly interested in adversarial ML?

Because neural nets now outperform humans on many tasks, especially in computer vision. So making a neural net classify incorrectly is no longer just a design fault; it can be really harmful for some applications.

But why should I care? Aren’t these scenarios just lab-based?

Well, as a developer, data scientist or ML researcher you should really care about adversarial examples, as they pose real-life physical threats to ML, DL and even RL systems. For example, there have been a lot of papers on physical adversarial examples that fool self-driving cars, and there will probably be more in the future on robust examples in NLP and RL too. Plus, there are a lot of malware detectors and IDS (intrusion detection systems) based on neural networks, so having these applications vulnerable to adversarial examples totally defeats their purpose.

Robust Physical-World Attacks on Deep Learning Models:

Tools to study and research more:


  1. My talk at PyconUK:
  2. Notes on adversarial examples: I’m planning to start writing summary of research papers on adversarial examples in this repo so you can follow for more info on adversarial examples:
  3. Ian Goodfellow talk on adversarial examples(heavily influencing this post):
  4. Explaining and Harnessing Adversarial Examples:
  5. Transferable adversarial examples:
  6. Black box attacks:
  7. People to follow if you want to know more about the research related to adversarial examples: Ian Goodfellow, Nicholas Papernot, Nicholas Carlini