Machine Learning Security Risks

Rohit Gupta
Rohit's Perspectives
4 min read · May 29, 2019

Rise of AI

Over the past few years, artificial intelligence has become an integral part of our daily lives. From its use in recommendation engines in beloved services like Netflix to its use as a spam filter in our email clients, AI technology has been uniquely transformative and widely adopted. The same holds true at the enterprise level: we've previously looked at how AI is transforming cybersecurity, and more generally the outputs of these AI systems are strongly shaping business decision making. Unfortunately, adversarial parties have taken notice of these trends as well, and are becoming strongly incentivized to compromise the availability or integrity of such systems. In fact, cybersecurity researchers and industry experts have identified three types of attacks that can compromise machine learning algorithms and systems: evasion (adversarial input) attacks, data poisoning, and model stealing.

Source: Adversarial Attacks Against Medical Deep Learning Systems

Evasion Attacks (Adversarial Inputs)

An evasion attack involves adversaries repeatedly probing a classifier with crafted inputs, often called adversarial inputs, until they find one that slips past detection. A classic example is crafting malicious emails to evade spam filters: a few years ago, an adversary realized that Gmail displays only the last attachment when the same multipart attachment appears multiple times in an email, and weaponized this by adding an invisible first attachment containing many reputable domains to evade detection. To get an idea of what this looks like for image classifiers, consider the well-known panda demonstration: starting with an image of a panda, an attacker adds a small perturbation that has been calculated to make the object recognition algorithm identify the image as a gibbon with high confidence.
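The panda-to-gibbon demonstration is based on the fast gradient sign method (FGSM). Below is a minimal sketch of that technique, assuming a pretrained PyTorch classifier and an image tensor scaled to [0, 1]; the model, preprocessing, and epsilon value are illustrative assumptions, not a specific attack recipe.

```python
# Minimal FGSM sketch: assumes a pretrained PyTorch classifier `model`
# and an input `image` tensor scaled to [0, 1] with a known true label.
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, true_label, epsilon=0.007):
    """Craft an adversarial example by stepping along the sign of the
    loss gradient with respect to the input pixels."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image.unsqueeze(0))                    # add batch dimension
    loss = F.cross_entropy(logits, torch.tensor([true_label]))
    loss.backward()
    # Nudge every pixel by +/- epsilon in the direction that increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()                 # keep pixels in valid range
```

The perturbation is too small for a human to notice, but because it is aligned with the gradient of the model's loss, it is enough to push the prediction from "panda" to "gibbon" with high confidence.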

Data Poisoning Attacks

Distinct from evasion attacks, data poisoning attacks involve adversaries feeding polluted training data to a classifier, blurring the boundary between what is classified as good and bad in the adversaries' favor. The most common type of data poisoning attack is model skewing, which results in the classifier categorizing bad inputs as good ones. This happens frequently in practice: some of the most advanced spammer groups try to throw the Gmail filter off-track by reporting massive numbers of spam emails as not spam. Between the end of November 2017 and early 2018, there were at least four large-scale malicious attempts to skew the classifier.
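To make the skewing idea concrete, here is a toy sketch (purely illustrative data and model, not Gmail's actual pipeline) in which an attacker flips a fraction of "spam" labels to "not spam" before training, shifting a simple classifier's decision boundary in their favor.

```python
# Toy label-flipping demonstration of model skewing (illustrative data only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two synthetic features per message, e.g. link count and suspicious-word score.
ham = rng.normal(loc=0.2, scale=0.1, size=(500, 2))
spam = rng.normal(loc=0.8, scale=0.1, size=(500, 2))
X = np.vstack([ham, spam])
y = np.array([0] * 500 + [1] * 500)            # 1 = spam

clean_clf = LogisticRegression().fit(X, y)

# The attacker "reports" 30% of spam as not-spam, poisoning the training labels.
y_poisoned = y.copy()
spam_idx = np.where(y == 1)[0]
flipped = rng.choice(spam_idx, size=int(0.3 * len(spam_idx)), replace=False)
y_poisoned[flipped] = 0

poisoned_clf = LogisticRegression().fit(X, y_poisoned)

borderline = np.array([[0.55, 0.55]])          # an ambiguous message
print(clean_clf.predict_proba(borderline))     # leans toward spam
print(poisoned_clf.predict_proba(borderline))  # skewed toward not-spam
```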

While this may seem innocuous, data poisoning attacks are particularly worrisome because they have the potential to turn the cybersecurity world upside down. For instance, Cylance, a cybersecurity startup, uses artificial intelligence to classify whether a sample contains malware without ever having seen that sample before; various organizations, including financial services institutions and healthcare providers, rely on Cylance's technology to neutralize malware that could expose sensitive consumer or patient information. If adversaries are able to skew or evade Cylance's AI-powered malware classifier, hundreds of enterprises could be at risk. This problem isn't specific to Cylance: any cybersecurity startup that uses AI to flag anomalous behavior or recognize zero-day exploits is vulnerable, and so are its customers.

Model Stealing Techniques

Of the three types of AI attacks, model stealing is the most concerning but also the least likely. Model stealing techniques are used to recover a model itself, or information about the data used to train it, through interaction with the deployed system. These attacks are a major concern because AI models represent valuable intellectual property and are often trained on sensitive data such as financial trades, medical records, or user transactions.

With respect to model reconstruction, the idea is that an adversary can recreate a model by querying its public prediction API and using the responses to refine a copy of their own until it mimics the original. With regard to membership and attribute leakage, recent research has demonstrated attacks that can estimate whether a respondent in a lifestyle survey admitted to cheating on their significant other, and that can recover recognizable images of people's faces given only their name and access to a facial recognition model.
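A rough sketch of the model-reconstruction idea follows. The victim API here is a hypothetical stand-in (a hidden rule simulated locally); in a real attack, the attacker would be issuing remote prediction requests and training a surrogate on the returned labels.

```python
# Sketch of model extraction via prediction queries (victim API is hypothetical).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def query_victim_api(samples):
    """Hypothetical stand-in for a remote prediction endpoint.
    In a real attack this would be an HTTP call returning labels or scores."""
    return (samples[:, 0] + samples[:, 1] > 1.0).astype(int)  # hidden decision rule

# 1. The attacker generates probe inputs covering the feature space.
rng = np.random.default_rng(1)
probes = rng.uniform(0, 1, size=(2000, 2))

# 2. The victim's answers become free training labels.
stolen_labels = query_victim_api(probes)

# 3. A surrogate model is fit to mimic the victim's behavior.
surrogate = DecisionTreeClassifier(max_depth=5).fit(probes, stolen_labels)

test = rng.uniform(0, 1, size=(500, 2))
agreement = (surrogate.predict(test) == query_victim_api(test)).mean()
print(f"Surrogate agrees with the victim on {agreement:.0%} of fresh inputs")
```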

Possible Solutions

Today, there is no comprehensive solution available on the market, and many of the proposed mitigations are specific to certain model architectures and do not generalize well. For example, one way to defend against attackers who try to tamper with input data is to only provide predictions for authenticated and certified data, which could be accomplished by attaching a tag that certifies the originality of the data in question, as sketched below.
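One way to make the "certified originality" idea concrete is to have a trusted data producer attach an authentication tag (for example an HMAC) to each payload, and have the prediction service refuse to score anything whose tag does not verify. The key handling, payload format, and serving function below are assumptions for illustration, not a prescribed design.

```python
# Sketch of tagging inputs so the model only scores authenticated data.
# The shared key, payload format, and serving function are illustrative.
import hashlib
import hmac
import json

SHARED_KEY = b"replace-with-a-managed-secret"

def tag_payload(payload: dict) -> str:
    """Trusted producer signs the canonically serialized payload."""
    message = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()

def verify_payload(payload: dict, tag: str) -> bool:
    """Prediction service recomputes the tag and compares in constant time."""
    return hmac.compare_digest(tag_payload(payload), tag)

def serve_prediction(model, payload: dict, tag: str):
    """Refuse to score inputs that were not tagged by a trusted source."""
    if not verify_payload(payload, tag):
        raise PermissionError("untrusted input: refusing to score")
    return model.predict([list(payload.values())])
```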

However, two companies willing to bet on defending against adversarial AI attacks are Calypso and Neurocat. Calypso, a stealth-mode startup funded by Lightspeed, provides model testing and certification tools that give insight into model performance and security, although significant customization is required to tailor the tools to each company's models and data processes. Neurocat, a startup based in Berlin, helps companies identify vulnerabilities in their models and patch them.

While security for machine learning models is a nascent field, it’s certainly a space to keep an eye on.
