Adversarial Machine Learning (Part 1): A Gentle Introduction

Riya Dholakia
Jan 11 · 3 min read

A brief intro on protecting ML algorithms from adversaries.


This series has been broken down into 3 parts:

  1. Introduction to the field of adversarial machine learning
  2. Decision-time attacks and ways to defend against them
  3. Poisoning attacks and ways to defend against them

As machine learning models are used in more and more applications, the risk they face from adversaries grows in parallel. There is therefore a pressing need for security strategies that protect these models from harm. This motivates the study of adversarial machine learning, where several mechanisms have been proposed to understand the nature of such attacks and to safeguard against them.

So, what is adversarial machine learning?

Adversarial machine learning is a research field that lies at the intersection of machine learning and computer security. It aims to enable the safe adoption of machine learning techniques in adversarial settings, such as spam filtering, malware detection, and biometric recognition. Attackers exploit vulnerabilities in machine learning algorithms and models to cause harm, and the field aims to understand these attacks and find ways to make models robust to them.


As new technologies and resources arrive, the need for security grows with them, and the fast-moving field of machine learning has become an attractive target. Attackers keep finding new ways to attack machine learning models, either to render them useless or to serve their own ends, such as evading detection.

Failing to protect machine learning algorithms could have disastrous effects in real-time applications. Take the case of a self-driving car [1]. Fig. 1.1 shows what the car is supposed to do when it sees the signal. Fig. 1.2 shows what happens when the adversary adds noise to the image, which could prove fatal.

Other applications include:

  • Anomaly detection in hospital samples or reports
  • Malware detection
  • Anti-spam software

Classification of Adversarial Attacks

Adversarial attacks can be classified in three different ways:

  1. Attack timing

This refers to when the attack takes place. Accordingly, there are two types: decision-time attacks and poisoning attacks.

Decision-time attacks take place after the model has been trained: the attacker manipulates the inputs the model sees when it makes predictions.

Poisoning attacks take place before the model has been trained: the attacker tampers with the training data.
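
To make the timing distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and is not taken from [1]: the one-dimensional "spam score" feature, the toy nearest-centroid classifier, and the helper names train_centroids and predict. The same spam sample gets misclassified in two ways, once by altering the input after training and once by injecting mislabeled points before training.

```python
def train_centroids(points, labels):
    """Toy 'model': the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


# Hypothetical clean training set: low scores are ham, high scores are spam.
train_x = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]

clean_model = train_centroids(train_x, train_y)
spam_sample = 7.0
clean_prediction = predict(clean_model, spam_sample)

# Decision-time attack: the trained model is untouched,
# the attacker only changes the input it is shown.
evaded_sample = spam_sample - 3.0
evasion_prediction = predict(clean_model, evaded_sample)

# Poisoning attack: the attacker slips spam-like points labeled "ham"
# into the training data, so the learned centroids shift.
poison_x = train_x + [8.0, 9.0, 10.0, 11.0]
poison_y = train_y + ["ham"] * 4
poisoned_model = train_centroids(poison_x, poison_y)
poisoned_prediction = predict(poisoned_model, spam_sample)

print(clean_prediction, evasion_prediction, poisoned_prediction)
```

In both attacks the sample ends up labeled as ham, but the decision-time attack leaves the model as it was, while the poisoning attack changes the model itself.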

  2. Information

This refers to how much information the attacker has. Attacks are classified into white-box attacks and black-box attacks.

In white-box attacks, the attacker has complete knowledge of the system, including the training data, the algorithm used and the trained model.

In black-box attacks, the attacker has only limited knowledge of the system, for example being able to query the model and observe its outputs.
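
The difference in attacker knowledge can be sketched as follows. This is an illustrative assumption rather than a method from [1]: the linear model, its WEIGHTS and BIAS, and the functions white_box_attack and black_box_attack are all hypothetical. The white-box attacker reads the weights and computes a label-flipping perturbation directly, while the black-box attacker can only call the model and has to search by trial and error.

```python
# Hypothetical linear classifier: class 1 if w.x + b > 0, else class 0.
WEIGHTS = [2.0, -1.0]
BIAS = -1.0


def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    return 1 if score(x) > 0 else 0


def white_box_attack(x, margin=1e-3):
    """Uses WEIGHTS directly: move along the weight vector just far
    enough to land slightly on the other side of the boundary."""
    s = score(x)
    sign = 1.0 if s > 0 else -1.0
    norm_sq = sum(w * w for w in WEIGHTS)
    step = (s + sign * margin) / norm_sq
    return [xi - step * w for xi, w in zip(x, WEIGHTS)]


def black_box_attack(x, query, step=0.1, max_steps=100):
    """Uses only query(x) -> label: shift one feature at a time by
    growing amounts until the returned label changes."""
    original = query(x)
    queries = 1
    for k in range(1, max_steps + 1):
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                candidate = list(x)
                candidate[i] += direction * k * step
                queries += 1
                if query(candidate) != original:
                    return candidate, queries
    return None, queries


x = [2.0, 1.0]
wb_example = white_box_attack(x)
bb_example, bb_queries = black_box_attack(x, predict)

print("original label:", predict(x))
print("white-box result:", wb_example, "label:", predict(wb_example))
print("black-box result:", bb_example, "after", bb_queries, "queries")
```

Both attackers flip the label here, but the black-box attacker needs many queries to find what the white-box attacker computes in one step.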

  3. Goal

Depending on the goal of the attacker, attacks can be classified into targeted attacks and reliability attacks.

In a targeted attack, the attacker’s goal is to cause errors on specific instances so that they are misclassified.

A Reliability attack, in contrast, aims to degrade the model by maximising prediction error.
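
The two goals can be contrasted with a small sketch. It is a simplified illustration under assumed conditions, not a procedure from [1]: the threshold classifier, the data, the perturbation budget and the helper shift_toward_boundary are all hypothetical. The targeted attacker perturbs one chosen instance, while the reliability attacker perturbs every instance to push the overall error rate as high as the budget allows.

```python
THRESHOLD = 5.0  # hypothetical decision boundary of a 1-D classifier


def predict(x):
    return 1 if x > THRESHOLD else 0


def error_rate(xs, ys):
    wrong = sum(1 for x, y in zip(xs, ys) if predict(x) != y)
    return wrong / len(xs)


def shift_toward_boundary(x, budget):
    """Move x toward the threshold by exactly `budget`.
    Points closer than `budget` to the boundary end up on the other side."""
    return x - budget if x > THRESHOLD else x + budget


xs = [1.0, 3.0, 4.5, 5.5, 7.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]
budget = 1.0

# Targeted attack: only one specific instance (index 3) is perturbed.
target_index = 3
targeted_xs = list(xs)
targeted_xs[target_index] = shift_toward_boundary(xs[target_index], budget)

# Reliability attack: every instance is perturbed to maximise total error.
reliability_xs = [shift_toward_boundary(x, budget) for x in xs]

print("clean error:      ", error_rate(xs, ys))
print("targeted error:   ", error_rate(targeted_xs, ys))
print("reliability error:", error_rate(reliability_xs, ys))
```

The targeted attack succeeds on its chosen instance while barely moving the overall error, whereas the reliability attack does not care which instances fail as long as the error rate goes up.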

Hopefully this gave you a brief introduction to what adversarial machine learning is, the motivation behind it, where it’s used and how it’s classified.

Stay tuned for Part 2!


[1] Yevgeniy Vorobeychik - Adversarial Machine Learning Book, March 2017.

Analytics Vidhya
