Understanding Adversarial Machine Learning

Ravi Chamarthy
IBM Data Science in Practice
5 min read · Oct 28, 2019

“Tell me and I forget. Teach me and I remember. Involve me and I learn.”
Benjamin Franklin

In the journey to AI, there are four important steps (rungs) as put forward by The AI Ladder from IBM:

  1. Collect: Make data simple and accessible.
    Collect data of every type regardless of where it lives, enabling flexibility in the face of ever-changing data sources.
  2. Organize: Create a business-ready analytics foundation.
    Organize all the data into a trusted, business-ready foundation with built-in governance, protection, and compliance.
  3. Analyze: Build and scale AI with trust and explainability.
    Analyze the data in smarter ways and benefit from AI models that empower you to gain new insights.
  4. Infuse: Operationalize AI throughout the business.
    Operationalize AI throughout the business — across multiple departments and within various processes — drawing on predictions, automation, and optimization.

This post focuses specifically on the third step of the AI Ladder, Analyze, where we create a machine learning model. Uses of AI in daily life are now numerous: deciding whether to approve a loan, classifying an email as spam or not, predicting the wait time at a traffic signal based on traffic flow, predicting house prices, flagging a transaction as fraudulent or non-fraudulent, among many others.

Let’s take a step back and ask: what is a machine learning model, and what does its lifecycle look like?

A machine learning model is a mathematical representation of a real-world problem. Its lifecycle involves training the model with a training data set, testing it with a test data set, and, if the observed predictions deviate only slightly from the expected predictions, deploying the model for runtime use.
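As a minimal sketch of that train/test/deploy lifecycle, here is what it might look like with scikit-learn. The data set, accuracy threshold, and file name are made up purely for illustration.

```python
# A minimal sketch of the train / test / deploy lifecycle described above.
# The data set, the 0.9 accuracy threshold, and the file name are illustrative.
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42).fit(x_train, y_train)   # train
accuracy = accuracy_score(y_test, model.predict(x_test))                # test

if accuracy > 0.9:                          # deviation from expectations is acceptable
    joblib.dump(model, "model.joblib")      # "deploy": persist the model for runtime use
```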

Adversarial Attacks

Like any other application, a machine learning model is vulnerable to what are known as adversarial attacks. The model can be fooled by slightly modifying an original sample so that it “believes”, with high confidence, that the modified sample belongs to an incorrect class.

For example (from Ian Goodfellow’s paper at https://arxiv.org/pdf/1412.6572.pdf), a panda image to which a small amount of adversarial noise is added is supplied to the ML model, which then incorrectly predicts it as a gibbon.

Adversarial Noise Example
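The panda example above is produced with the fast gradient sign method (FGSM) from Goodfellow’s paper. The following is a minimal sketch of the idea, assuming a differentiable PyTorch image classifier; the function name and epsilon value are illustrative, not part of any particular library.

```python
# A minimal FGSM sketch: nudge each pixel by epsilon in the direction that
# increases the model's loss, producing an adversarial copy of the image.
import torch
import torch.nn.functional as F

def fgsm_example(model, image, true_label, epsilon=0.007):
    """image: batched tensor, e.g. shape [1, 3, H, W] with pixel values in [0, 1].
    true_label: tensor of class indices, e.g. tensor([panda_class])."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Step in the sign of the gradient (the direction that increases the loss),
    # then clamp so the result is still a valid image.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```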

Another example is spam filtering: a mail from “itsme@example.com” with the subject “Get cheap loans now!!!” would correctly be classified as spam. But if the subject also contains some good words (as distinguished from bad words like “cheap” and “loans”), or for that matter the recipient’s name, “John Langford”, then the spam-filtering algorithm can be tricked into classifying the mail as ham.

How Adversarial ML Works

Simply put, adversarial ML is a way to mislead a machine learning model with malicious input so that the model makes incorrect predictions. Some (but not all) areas in which adversarial ML is relevant include fraud detection, spam detection, intrusion detection, and malware detection.

Building a machine learning model typically comprises the following steps:

  • Gather data and build the training data set along with ground-truth labels
  • Build the model, which includes:
    * Identifying an algorithm to use for the model
    * Selecting the features for the model
    * Training the model
  • And finally we have the trained classifier (sorry, the model).

Attacks on a machine learning model can happen at any of the above steps, and they fall into two broad classes: data poisoning attacks and evasion attacks.

Machine Learning Model Development Flow

In data poisoning attacks, the adversary adds carefully crafted samples to the training data, so that the classifier is trained on adversarial data. An example is model skewing, where the attacker pollutes the training data in such a way that the boundary between what the classifier categorizes as good data and what it categorizes as bad data shifts. Say, for a group of data points, the original model prediction is fraudulent; after training on the adversarial examples, the prediction changes from fraudulent to non-fraudulent. This disrupts the learning process and manipulates the final trained model.
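To make the model-skewing idea concrete, here is a toy sketch in which an attacker injects fraud-like points that are labelled non-fraudulent. The synthetic data, the number of poison points, and the probabilities printed are all made up for illustration; real poisoning attacks are far more carefully optimized.

```python
# A toy data-poisoning (model-skewing) sketch: injecting mislabelled points
# near the "fraud" region pushes the decision boundary and lowers the
# predicted fraud probability for genuinely suspicious points.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_clean = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y_clean = np.array([0] * 100 + [1] * 100)        # 0 = non-fraudulent, 1 = fraudulent

# Adversary's poison: points that look fraudulent but carry the "non-fraud" label
x_poison = rng.normal(3, 0.5, (40, 2))
y_poison = np.zeros(40, dtype=int)

clean_model = LogisticRegression(max_iter=1000).fit(x_clean, y_clean)
poisoned_model = LogisticRegression(max_iter=1000).fit(
    np.vstack([x_clean, x_poison]), np.concatenate([y_clean, y_poison]))

suspect = np.array([[3.0, 3.0]])                 # a clearly fraud-like point
print("P(fraud) clean model:   ", clean_model.predict_proba(suspect)[0, 1])
print("P(fraud) poisoned model:", poisoned_model.predict_proba(suspect)[0, 1])
```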

In evasion attacks, the attack happens against the trained model itself. Such an attack occurs when a data point that was originally and correctly classified into class A is manipulated and fed back into the model, which now produces an output that is not class A. For example, an email that is originally and correctly classified as spam is modified by adding some good words and sent back to the model. This time the model incorrectly classifies it as a ham email, not spam.
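The spam example can be illustrated with a toy bag-of-words filter. The training emails below are invented and the filter is deliberately simplistic; the point is only to show how padding a spam message with “good” words can change its classification.

```python
# A toy evasion sketch on a bag-of-words spam filter (illustrative data only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["get cheap loans now", "cheap loans approved today",
          "meeting notes attached", "lunch with the project team"]
labels = ["spam", "spam", "ham", "ham"]

spam_filter = make_pipeline(CountVectorizer(), MultinomialNB()).fit(emails, labels)

original = "get cheap loans now"
evasive = original + " meeting notes project team team lunch"  # padded with "good" words

# The padded copy may now be (incorrectly) classified as ham.
print(spam_filter.predict([original, evasive]))
```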

Based on who the adversary is and the kind of knowledge the adversary possesses about the ML system, attacks can be further categorised as black-box attacks and white-box attacks. In a black-box attack, the adversary does not know the internals of the ML system, such as which algorithm is used, the feature set, the training data, and so on, but does know the final classification output of the model, say whether an email is classified as spam or not. In a white-box attack, the adversary knows all about the model and its internals.

Defenses Against Adversarial ML

Some defenses against adversarial machine learning include injecting adversarial examples into the model’s training data (adversarial training) and defensive distillation.

In the case of adversarial examples, the model is trained with pre-defined adversarial examples that are labelled with the correct, unfavorable outcomes. Defensive distillation is an adversarial training technique in which one model is trained to predict the outputs of another model that was trained earlier. For example, say the first model predicts whether a biometric scan matches the fingerprint on record, and it is trained hard to attain maximum accuracy. The issue is that fingerprint matching does not happen on every pixel, as that would be too time-consuming; if the match exceeds a threshold, say 95%, the person is granted access. If the adversary learns about this, they can send a fake fingerprint image with just a handful of the right pixels that satisfy the ML system, which generates a false-positive match. The solution is to build another, distilled model and train it on these uncertain samples to act as an additional filter.
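Below is a minimal sketch of the distillation step, assuming simple PyTorch classifiers. The model shapes, the temperature value, and the random stand-in batch are all illustrative assumptions, not part of any particular system.

```python
# A minimal defensive-distillation sketch: the distilled (student) model is
# trained to match the softened output distribution of the original (teacher).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # assumed already trained
student = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # the distilled model
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 20.0                                    # softmax temperature used for "soft" labels

x = torch.randn(128, 20)                    # a stand-in batch of (uncertain) samples
with torch.no_grad():
    soft_targets = F.softmax(teacher(x) / T, dim=1)   # teacher's softened outputs

optimizer.zero_grad()
loss = F.kl_div(F.log_softmax(student(x) / T, dim=1), soft_targets, reduction="batchmean")
loss.backward()                             # student learns to mimic the teacher
optimizer.step()
```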

IBM Adversarial Robustness 360 Toolbox

Adversarial Robustness 360 Toolbox (ART) from IBM is a Python library that supports developers and researchers in defending machine learning models (Deep Neural Networks, Gradient Boosted Decision Trees, Support Vector Machines, Random Forests, Logistic Regression, Gaussian Processes, Decision Trees, scikit-learn Pipelines, etc.) against adversarial threats and helps make AI systems more secure and trustworthy. I will write more on ART in upcoming posts.
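As a rough preview, the sketch below follows the patterns in ART’s published examples: wrap a trained scikit-learn model and generate adversarial samples with the Fast Gradient Method. Module paths and argument names can differ between ART versions, so treat this as illustrative and check the documentation of the version you install.

```python
# A rough sketch of crafting and evaluating adversarial examples with ART.
# API details (module paths, "estimator" vs older "classifier" argument) may
# vary by ART version; this follows the library's recent example style.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

x, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(x, y)

# Wrap the trained scikit-learn model so ART can query it for loss gradients
classifier = SklearnClassifier(model=model, clip_values=(x.min(), x.max()))

# Fast Gradient Method: perturb each sample by eps in the gradient's sign direction
attack = FastGradientMethod(estimator=classifier, eps=0.5)
x_adv = attack.generate(x=x)

print("accuracy on clean data:      ", (model.predict(x) == y).mean())
print("accuracy on adversarial data:", (model.predict(x_adv) == y).mean())
```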

Conclusion

This post gives an introduction to adversarial attacks on machine learning models, and to the advances and innovations that IBM is making in this area with the IBM Adversarial Robustness 360 Toolbox.

Thank You!

References:

Explaining and Harnessing Adversarial Examples
https://arxiv.org/pdf/1412.6572.pdf

Adversarial Robustness Toolbox
https://github.com/IBM/adversarial-robustness-toolbox

IBM AI Ladder
https://www.ibm.com/blogs/think/2018/02/ibm-ai-ladder/
