Machine learning in an adversarial environment

Grey Wizard
Jan 29, 2018

Machine learning algorithms are increasingly used in cybersecurity applications, in particular in systems for detecting and preventing network attacks.

These systems typically work by monitoring network traffic and detecting suspicious user activity. Machine learning makes it possible to detect such activity without manually writing expert rules that decide which behaviour constitutes an attack. By training on regular network traffic, an algorithm can automatically build a predictive model capable of flagging outliers that represent potential attacks.
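To make this concrete, here is a minimal sketch of such an outlier detector, using scikit-learn's IsolationForest on synthetic traffic features. The feature set and values are illustrative assumptions, not Grey Wizard's actual pipeline:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical features per connection: requests/minute, bytes sent, distinct URL paths.
normal_traffic = np.column_stack([
    rng.normal(12, 3, 1_000),       # requests per minute
    rng.normal(3_000, 500, 1_000),  # bytes sent
    rng.normal(4, 1, 1_000),        # distinct paths requested
])

# Train only on regular traffic; the model learns what "normal" looks like.
detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal_traffic)

# predict() returns -1 for outliers (potential attacks) and 1 for inliers.
new_traffic = np.array([
    [13, 3_100, 4],       # looks like ordinary browsing
    [900, 250_000, 160],  # request flood, far outside normal behaviour
])
print(detector.predict(new_traffic))  # expected: [ 1 -1]
```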

Using machine learning for this problem requires continuously updating the trained models. This is necessary because of concept drift, the phenomenon of gradual change in streaming data: today's outliers may become typical network traffic within a week. An example is a sudden increase in a web service's popularity, or some other change in the characteristics of its traffic.
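A common way to handle drift (shown below as a sketch, not necessarily the approach used in production) is to retrain the detector periodically on a sliding window of recent traffic, so that old behaviour gradually stops shaping the model. The window size, retraining interval and warm-up length are assumed values:

```python
from collections import deque
from sklearn.ensemble import IsolationForest

class DriftAwareDetector:
    """Retrain an anomaly detector on a sliding window of recent traffic,
    so that yesterday's outliers can become today's normal behaviour."""

    def __init__(self, window_size=10_000, retrain_every=1_000, warmup=100):
        self.window = deque(maxlen=window_size)  # only recent samples are kept
        self.retrain_every = retrain_every
        self.warmup = warmup
        self.seen_since_retrain = 0
        self.model = None

    def observe(self, sample):
        """Score one feature vector and store it for future retraining."""
        self.window.append(sample)
        self.seen_since_retrain += 1
        stale = self.model is None or self.seen_since_retrain >= self.retrain_every
        if stale and len(self.window) >= self.warmup:
            self.model = IsolationForest(contamination=0.01, random_state=0)
            self.model.fit(list(self.window))
            self.seen_since_retrain = 0
        if self.model is None:
            return 1  # not enough history yet; treat as normal
        return self.model.predict([sample])[0]  # -1 = outlier, 1 = normal
```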

Updating predictive models without due diligence can be exploited by attackers trying to undermine the stability of the protected system. An attacker may, for example, attempt to contaminate the training data and cause the predictive model to make mistakes. This is an example of a scenario in which the theoretical assumptions of a machine learning algorithm are violated by an opponent trying to destabilise it.
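As a toy illustration of such poisoning (a deliberately simplistic detector with made-up numbers, not a real attack on any particular system), the sketch below shows how injecting high-rate samples into the retraining data can open a loophole for a flood attack:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_threshold(training_data, k=3.0):
    """A deliberately simple detector: flag anything more than k standard
    deviations above the mean request rate seen during training."""
    return training_data.mean() + k * training_data.std()

def is_attack(sample, threshold):
    return sample > threshold

# Legitimate traffic: modest request rates (hypothetical single feature).
clean = rng.normal(loc=10, scale=2, size=500)

# The attacker slips high-rate samples into the data used for retraining,
# inflating both the mean and the standard deviation of "normal" traffic.
poison = rng.normal(loc=400, scale=20, size=100)
poisoned = np.concatenate([clean, poison])

probe = 350.0  # a request rate typical of a flood attack

print(is_attack(probe, fit_threshold(clean)))     # True  -> detected
print(is_attack(probe, fit_threshold(poisoned)))  # False -> poisoning opened a loophole
```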

Adversarial machine learning is a young scientific field at the intersection of machine learning and cybersecurity. It deals with the secure use of machine learning algorithms in adversarial environments, i.e. environments where an opponent attempts to interfere with the functioning of an algorithm. Research in this field focuses on the following aspects: identifying potential security threats, simulating attacks and evaluating their impact on the targeted algorithm, and taking remedial measures to improve algorithm security.

Below is basic information on the types of attacks on machine learning algorithms, along with examples of good practices that may help protect against them.

The classification of attacks on machine learning algorithms

Attacks on machine learning algorithms are classified according to three factors: attack impact, type of security violation and attack specificity.

Attack impact

Causative attack — the attacker introduces changes into the training set in order to degrade the algorithm's performance.

Exploratory attack — the attacker does not change the training set but probes the predictive model in order to obtain information about the training data or the model itself.

Type of security violation

Integrity violation — causes the predictive system to fail to detect anomalies (an increase in the number of false negative predictions).

Availability violation — causes so many prediction errors, both false negatives and false positives, that the system becomes unusable.

Attack specificity

Targeted attack — affects a specific type of incident (e.g. the opponent attempts to create a loophole for one particular kind of attack).

General attack — the attacker aims to cause a general system failure (e.g. no response to any type of attack).

Attack prevention

When building a predictive model for an adversarial environment, it is crucial to analyse an attacker's ability to influence the data, i.e. to define how the attacker could modify the data in order to mislead the algorithm. These capabilities often depend on the specific application domain.

An attacker's ability to change the data may be limited both in the number of training examples it can influence and in the extent to which it can modify any single example. It is also important to analyse what information the attacker can access, e.g. whether the feedback returned by the algorithm reveals the type of model or the feature space it uses. In some domains the attacker may also know the distribution of the data in the whole training set, because that information is public.

The two most popular defences against causative attacks are filtering harmful data out of the training set and using robust algorithms that remain insensitive to a certain fraction of harmful data in the training set. Against exploratory attacks, one defensive technique is to feed the attacker false information, for example by adding an element of randomness to the predictive model near the decision boundary, which increases the effort the attacker must spend to reconstruct it.
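The sketch below illustrates both ideas in simplified form, under assumed parameters: a sanitisation step that drops training samples far from the median before fitting, and a prediction wrapper that randomises responses close to the decision boundary so that probing yields noisy feedback:

```python
import numpy as np

def sanitize(training_data, max_dev=5.0):
    """Filter suspected poison from the training set: drop samples lying more
    than max_dev median-absolute-deviations from the per-feature median.
    (A simple illustration; max_dev is an assumed tuning parameter.)"""
    median = np.median(training_data, axis=0)
    mad = np.median(np.abs(training_data - median), axis=0) + 1e-9
    keep = np.all(np.abs(training_data - median) / mad <= max_dev, axis=1)
    return training_data[keep]

def randomized_verdict(score, threshold, band=0.05, rng=None):
    """Blur the decision boundary: flip a coin for scores inside a narrow band
    around the threshold, so an attacker probing the model gets noisy feedback."""
    rng = rng or np.random.default_rng()
    if abs(score - threshold) < band:
        return bool(rng.random() < 0.5)  # randomise near the boundary
    return bool(score > threshold)       # deterministic far from the boundary
```

Randomisation of this kind trades a small amount of accuracy near the boundary for a harder reverse-engineering problem for the attacker.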

Originally published at greywizard.com.
