Security threats against deep-learning-based systems

In the last decade, the effectiveness of AI technology has been validated in several business applications: self-driving cars, machine vision, intrusion detection systems, identity authentication, and more. The learning capability of these algorithms is fascinating, and there is no doubt that AI is progressively becoming an industry standard and a must-have for every business.

— In the public discourse, AI is perceived as “something data, something neural network, and something black unbeatable magic”. We tend to ignore the security issues of intelligence-based systems because it’s just too hard to believe that a high-performance detection system can be fooled with a few simplistic tricks.

Overview of security attacks against ML-based systems:

Before digging deeper into the different attack scenarios and compromising techniques against intelligent systems, I would like to outline some basic concepts to make it easier to discuss security issues related to ML-based systems.

It’s obvious that every attacker has a specific strategy in mind: (1) how to damage an intelligent system, (2) which phase of the learning pipeline to attack, and (3) what type of damage to inflict.

Conventionally, an attack against an ML-based system is structured as an adversarial model and described along four dimensions: goal, knowledge, capability, and attacking strategy.

The adversarial goal refers to the expected damage of the attack. For instance, an attacking goal against an intrusion detection system (e.g., malware detection) would be to cause a high false-positive rate (false alarms), which makes the decision model appear erratic and untrustworthy.

The adversarial knowledge describes how much the attacker effectively knows about the data (training/testing), the features, the learning algorithm, and its parameters.

— Note that there is a difference between how much the attacker knows about the whole setup and how much of it they can actually control.

The capability dimension is crucial and consists of three main practical aspects:

  • The percentage of training/testing data controlled by the attacker.
  • The extent of the learning settings known by the attacker, such as the features and the hyper-parameters.
  • The type of potential damage of the security threat: (1) Is it inductive? The attacker can alter the data distribution, induce a change in the learning settings during the next retraining round, and degrade the classifier’s performance. (2) Is it exploratory? The attacker only tries to uncover and gather sensitive information about the training data and the learning model.

Finally, the attacking strategy is the plan that puts the three elements above into action.

Based on what the attacker knows, what they can control, and the kind of damage they intend to do, the attack strategy specifies the set of decisions and operations to execute in order to achieve the final goal. For example, the attacker can define functions/behaviors to change the label categories and/or modify the features in the training data.
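To make this concrete, here is a minimal sketch of the simplest training-data manipulation, label flipping, where an attacker who controls part of the training set corrupts a fraction of the labels. The `flip_labels` helper and the 10% fraction are illustrative choices, not taken from any particular attack toolkit:

```python
import numpy as np

# Hypothetical sketch: an attacker who controls part of the training set
# flips a fraction of the labels to degrade the learned decision rule.
rng = np.random.default_rng(0)

def flip_labels(y, fraction, rng):
    """Flip `fraction` of binary (0/1) labels chosen at random."""
    y_poisoned = y.copy()
    n_flip = int(len(y) * fraction)
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # 0 <-> 1
    return y_poisoned

y = rng.integers(0, 2, size=100)
y_bad = flip_labels(y, fraction=0.1, rng=rng)
print((y != y_bad).sum())  # → 10 labels corrupted
```

Even this crude manipulation can measurably hurt a retrained classifier when the flipped fraction is large enough.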

Hopefully, you now have the big picture of the potential vulnerabilities of ML-based systems. Let’s focus on the threats to one of the main phases of the learning pipeline: the training/retraining phase.

During the training phase

— Initially, the first training round can usually be protected with high confidentiality.

Roughly speaking, at this point it’s hard for attackers to harm the performance of the ML models either by altering the training data or by changing the training parameters, simply because the system hasn’t been deployed yet.

Instead, we should be very concerned about the systems that require periodic retraining rounds of existing ML models. These are more vulnerable to adversarial attacks.

For instance, in systems designed for adaptive facial recognition, malware detection, or spam detection, it’s essential to perform seasonal retraining rounds because the data distribution can change over time. In this case, attacks like data poisoning work very well and are increasingly seen in practice.

Poisoning attacks:

This part is very cool. You’ll see how AI can be used to beat another AI.

[Image: schematic of the data poisoning attack]

First, a poisoning attack consists of injecting adversarial samples as perturbations into the training data in order to mislead the learning algorithm. The adversarial samples can be generated by another DNN model: they are close enough to the normal distribution to hide, yet malicious enough to fool the learner. As a result, the decision rules of the system become distorted. The DeepFool algorithm is a great example of an adversarial AI used to fool another one.

DeepFool algorithm

Is There Any Way to Fool the Computer?

Deep learning algorithms have shown impressive performance in classifying images and detecting anomalies. However, with small adversarial perturbations, their performance collapses drastically.

In 2016, Alhussein Fawzi and his colleagues from École Polytechnique Fédérale de Lausanne proposed an algorithm, called DeepFool, that can fool deep learning classifiers with a simple trick: adding minor pixel perturbations.

What does that mean?

For a system like adaptive facial recognition, the attacker can exploit the periodic updates and inject malicious samples into the training data used for retraining the decision model. The impact of the poisoning attack is that it gradually shifts the distribution of what the training data considers normal behavior. Once that is done, the attacker will be able to pass the identity authentication barriers.
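That drift can be sketched in a few lines: suppose the attacker injects a small batch of samples at each retraining round, nudged toward a target region, so the estimated “normal” cluster creeps toward it. The `poison_round` helper, the 2-D data, and all the numbers below are hypothetical, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# "Normal" behavior cluster the recognition model was trained on.
clean = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))

# Point the attacker wants the model to eventually accept as normal.
target = np.array([4.0, 0.0])

def poison_round(data, target, n_inject, step, rng):
    """Inject samples centered part-way between the current mean and
    the target, so each retraining round shifts the distribution."""
    centre = data.mean(axis=0) + step * (target - data.mean(axis=0))
    injected = rng.normal(loc=centre, scale=1.0, size=(n_inject, 2))
    return np.vstack([data, injected])

data = clean
for _ in range(5):  # five retraining rounds, each slightly poisoned
    data = poison_round(data, target, n_inject=100, step=0.5, rng=rng)

print(clean.mean(axis=0), data.mean(axis=0))  # mean drifts toward target
```

Because each injected batch sits near the current distribution, simple outlier filters may not flag it, yet over several rounds the accepted region moves substantially.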

So what?

It’s true that deep learning algorithms seem harder to entrap with a random injection of abnormal samples: a typical deep learning architecture is effective at discriminating different patterns in the training data. However, as DeepFool shows, that robustness has limits.

Deep learning relies on a minimal set of features to classify images. The DeepFool algorithm manages to generate minimally perturbed images that the DNN classifier tends to misclassify.
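For an affine binary classifier, the DeepFool step even has a closed form: the minimal L2 perturbation is the orthogonal projection of the input onto the decision boundary, plus a tiny overshoot to actually cross it. A toy sketch (the weights and input below are made up for illustration):

```python
import numpy as np

# Toy linear classifier f(x) = w·x + b; sign(f) is the predicted class.
w = np.array([2.0, -1.0])
b = 0.5

def deepfool_linear(x, w, b, overshoot=1e-4):
    """Minimal L2 perturbation pushing x across the boundary:
    the closed-form DeepFool step for an affine binary classifier."""
    f = w @ x + b
    r = -(f / (w @ w)) * w            # projection onto the hyperplane
    return x + (1 + overshoot) * r    # tiny overshoot to flip the sign

x = np.array([3.0, 1.0])              # classified positive: f(x) = 5.5
x_adv = deepfool_linear(x, w, b)
print(np.sign(w @ x + b), np.sign(w @ x_adv + b))  # → 1.0 -1.0
print(np.linalg.norm(x_adv - x))      # perturbation is small
```

For deep networks, DeepFool linearizes the classifier around the current point and iterates this step until the label flips, which is why the resulting perturbations are so small.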

[Image: example of a minimally perturbed image that the classifier misclassifies. Source: DeepFool algorithm]

There are other adversarial-ML attacks similar to DeepFool, such as the fast gradient sign method (FGSM) and the Jacobian-based saliency map attack. I don’t know enough about these algorithms to have an opinion on them; please refer to the references I have attached for more details.
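That said, the core FGSM update is widely documented and easy to sketch: perturb each input dimension by a small epsilon in the direction of the sign of the loss gradient. A toy illustration on a linear logistic model, with all weights and inputs invented for the example:

```python
import numpy as np

# Toy logistic model: p(class 1 | x) = sigmoid(w·x + b).
w = np.array([1.0, -2.0, 0.5])
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM step: x_adv = x + eps * sign(d loss / d x),
    using the cross-entropy gradient (p - y) * w for this model."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

x = np.array([2.0, 0.5, 1.0])
y = 1.0                               # true label
p_before = sigmoid(w @ x + b)
x_adv = fgsm(x, y, w, b, eps=0.5)
p_after = sigmoid(w @ x_adv + b)
print(p_before, p_after)              # confidence in the true class drops
```

The same one-step update applies to deep networks, with the gradient computed by backpropagation through the loss.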


You don’t need to work in the cyber-security domain to be interested in the security issues of ML-based systems. Awareness of the potential vulnerabilities of these systems and the threats to their robustness will spare you a lot of trouble, not only during the deployment of your model but also during the training/retraining phase.

Keep in touch:

In case we haven’t met before, I’m Soufiane CHAMI, data scientist.
Thank you for reading. I hope I helped you ponder today. :)
