Hacking Deep Learning Models: you don’t even need a PhD or a GPU

Hadjer Moussaoui
Published in Axionable · 7 min read · Feb 1, 2019

Introduction

Deep neural networks (DNNs) are high-performing learning models with several successful use cases, notably in safety-critical environments such as self-driving cars, cancer diagnosis, surveillance and access control, voice-activated digital assistants, credit scoring, and the detection of fraudulent financial transactions and malicious binaries.

However, research from companies (Google, Facebook, Samsung Research America), universities (New York, Montreal, Kyushu, Imperial College London, DIEE, Ann Arbor, Stony Brook, Carnegie Mellon, Tübingen, Zhejiang, Alibaba-Zhejiang, UIUC, Lehigh, among others) and research centers (Integrative Neuroscience, Computational Neuroscience, the International Max Planck Research School for Intelligent Systems and the Institute for Theoretical Physics, etc.) shows that these models are easily fooled. We distinguish different types of attacks that can successfully fool DNNs (see the section Characterization and classification of adversarial algorithm attacks for more details):

Fig. 1 Types of adversarial attacks

Adversarial algorithm attacks

Adversarial algorithm attacks are the first and most studied type. They are mostly applied in the computer vision domain. The next section looks at them in more detail.

DNNs could be fooled by a one-pixel perturbation!

At the top of the figure are the correct labels of the input images. Applying a one-pixel perturbation attack [Su et al., 2017] causes the DNNs to misclassify the inputs into the targeted classes shown at the bottom.

Fig. 2 [Su et al., 2017] One-pixel attacks that fooled the networks AllConv, NiN and VGG.
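
To make the idea concrete, here is a minimal sketch of how such an attack can be mounted with nothing more than a CPU: the five values describing a single pixel (position and RGB color) are searched with differential evolution, the kind of black-box optimization used in [Su et al., 2017]. The `model.predict_proba` interface and the 32×32 input size are assumptions for illustration, not the authors’ code.

```python
# Minimal sketch of a one-pixel attack in the spirit of [Su et al., 2017].
# `model` is assumed to expose predict_proba(batch) -> class probabilities.
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(model, image, true_label, img_size=32):
    """Search a single (x, y, r, g, b) perturbation that lowers the probability
    of the true class on a 32x32 RGB image with values in [0, 1]."""

    def apply_pixel(params, img):
        x, y, r, g, b = params
        perturbed = img.copy()
        perturbed[int(x), int(y)] = [r, g, b]
        return perturbed

    def objective(params):
        perturbed = apply_pixel(params, image)
        probs = model.predict_proba(perturbed[None, ...])[0]
        return probs[true_label]  # minimize confidence in the true class

    bounds = [(0, img_size - 1), (0, img_size - 1), (0, 1), (0, 1), (0, 1)]
    result = differential_evolution(objective, bounds, maxiter=75,
                                    popsize=10, seed=0, tol=1e-5)
    return apply_pixel(result.x, image), result.fun
```

Because the search only queries the model’s output probabilities, no gradients (and hence no GPU-scale computation) are required, which is exactly what makes this attack so accessible.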

Audio adversarial examples

We give the following example from [Carlini & Wagner, 2018], where the authors construct targeted audio adversarial examples against automatic speech recognition and apply them to Mozilla’s DeepSpeech. The audio waveform in the top left of the image below transcribes as “it was the best of times, it was the worst of times”. Applying a small perturbation produces a waveform that is over 99.9% similar, but transcribes as the chosen phrase “it is a truth universally acknowledged that a single”, with a 100% success rate.

Fig. 3 [Carlini & Wagner, 2018] Illustration of Carlini & Wagner targeted attack on automatic speech recognition
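
Conceptually, the attack boils down to gradient descent on the waveform itself: minimize a CTC loss toward the target transcription under a small L-infinity budget. The PyTorch sketch below shows that general recipe; the `asr_model` interface and its output shape are assumptions for illustration, not Mozilla’s DeepSpeech nor the exact loss formulation of [Carlini & Wagner, 2018].

```python
# Sketch of a targeted audio adversarial example via gradient descent on delta.
import torch
import torch.nn as nn

def targeted_audio_attack(asr_model, waveform, target_tokens, eps=0.05,
                          steps=1000, lr=1e-3):
    """waveform: (1, T) float tensor in [-1, 1]; target_tokens: (1, L) int tensor."""
    delta = torch.zeros_like(waveform, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    ctc_loss = nn.CTCLoss(blank=0)

    for _ in range(steps):
        adv = torch.clamp(waveform + delta, -1.0, 1.0)
        # Assumed model output: (time_steps, batch, n_classes) log-probabilities.
        log_probs = asr_model(adv)
        input_lengths = torch.full((1,), log_probs.size(0), dtype=torch.long)
        target_lengths = torch.tensor([target_tokens.size(1)], dtype=torch.long)
        loss = ctc_loss(log_probs, target_tokens, input_lengths, target_lengths)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Keep the perturbation imperceptibly small (L-infinity constraint).
        with torch.no_grad():
            delta.clamp_(-eps, eps)

    return torch.clamp(waveform + delta, -1.0, 1.0).detach()
```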

Training data poisoning attacks

Learning algorithms that rely on collected data are exposed to the threat of data poisoning, i.e., a coordinated attack in which a fraction of the training data is controlled by the attacker and manipulated to subvert the learning process. Application examples include spam filtering, malware detection and handwritten digit recognition [Muñoz-González et al., 2017].
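
The simplest instance of this threat model is label flipping, where the attacker corrupts the labels of the fraction of data under their control. The sketch below, using scikit-learn’s digits dataset, only illustrates how test accuracy degrades as the poisoned fraction grows; the gradient-based poisoning attack of [Muñoz-González et al., 2017] is considerably more sophisticated.

```python
# Minimal label-flipping poisoning demo on handwritten digit recognition.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def poison_labels(y, fraction, rng):
    """Flip the labels of a random fraction of the training points."""
    y = y.copy()
    n_poison = int(fraction * len(y))
    idx = rng.choice(len(y), n_poison, replace=False)
    y[idx] = rng.randint(0, 10, n_poison)  # random (mostly wrong) digits
    return y

rng = np.random.RandomState(0)
for fraction in (0.0, 0.1, 0.3):
    clf = LogisticRegression(max_iter=2000)
    clf.fit(X_train, poison_labels(y_train, fraction, rng))
    print(f"poisoned fraction {fraction:.0%}: "
          f"test accuracy {clf.score(X_test, y_test):.3f}")
```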

Physical-world attacks

Adversarial examples have limited effectiveness in the physical world due to changing physical conditions. Here are some examples of successful physical attacks:

Physical printed attack

Consider a driverless car system that uses DNNs to identify traffic signs. The image on the left shows real graffiti on a STOP sign, while the image on the right shows crafted perturbations that fooled the DNN into classifying the STOP sign as a Speed Limit 45 sign; the car would then not stop, subverting its safety.

Fig. 4 [Evtimov et al., 2017] Targeted physical perturbation experiment results on LISA-CNN using a poster-printed Stop sign (subtle attacks) and a real Stop sign (camouflage art attacks). In blue, the targeted-attack success rate.
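
What makes such attacks survive the physical world is that the perturbation is optimized over a distribution of simulated viewing conditions (distance, angle, lighting), so it keeps fooling the classifier after being printed. The PyTorch sketch below captures that idea in a simplified form; the sticker mask, the crude random-view simulation and the model interface are assumptions for illustration, not the exact RP2 setup of [Evtimov et al., 2017].

```python
# Sketch: optimize a masked perturbation over random simulated viewing conditions.
import torch
import torch.nn.functional as F

def random_view(img):
    """Crudely simulate changing physical conditions: random brightness and scale."""
    brightness = 0.7 + 0.6 * torch.rand(1)
    scale = 0.7 + 0.3 * torch.rand(1).item()
    small = max(8, int(img.shape[-1] * scale))
    view = F.interpolate(img * brightness, size=small, mode="bilinear",
                         align_corners=False)
    return F.interpolate(view, size=img.shape[-1], mode="bilinear",
                         align_corners=False).clamp(0.0, 1.0)

def physical_perturbation(model, sign_image, target_class, mask,
                          steps=500, lr=0.01, n_views=8):
    """sign_image: (1, 3, H, W) in [0, 1]; mask: same shape, 1 where stickers may go."""
    delta = torch.zeros_like(sign_image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    target = torch.full((n_views,), target_class, dtype=torch.long)

    for _ in range(steps):
        adv = torch.clamp(sign_image + mask * delta, 0.0, 1.0)
        # Expectation over a small batch of simulated viewing conditions.
        views = torch.cat([random_view(adv) for _ in range(n_views)], dim=0)
        loss = F.cross_entropy(model(views), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return torch.clamp(sign_image + mask * delta, 0.0, 1.0).detach()
```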

Accessorize to a crime

This is a physically realizable and inconspicuous attack on facial biometric systems, which are widely used in surveillance and access control. It is carried out by printing a pair of eyeglass frames that allow the attacker wearing them to evade recognition or to impersonate another individual, as shown in the example below:

Fig. 5 [YouTube, Sharif et al., 2016] Example of 100% successful impersonation attack.

iPhone X Face ID fooled by a $200 twin mask

Security experts from the Vietnam-based firm Bkav have successfully beaten iPhone X Face ID with their 3D twin mask, which costs less than $200 and can unlock the phone the same way identical twins do.

Fig. 6 [Source, YouTube] Twin Mask

Consequences: why is it essential to study this phenomenon?

Clearly, these attacks can make DNNs operate incorrectly and create incentives for adversaries to fool network models, which can seriously undermine the security of the systems supported by those models, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, and biometric authentication systems can be manipulated to allow improper access.

Consequently, users such as automobile manufacturers, banks and insurance companies, radiology services, and users of facial and voice ID need to trust the correct operation and robustness of DNNs against eventual attacks. For this reason, a massive body of scientific papers has been (and still is) devoted to crafting adversarial attacks, developing defensive mechanisms to reduce their effectiveness on DNNs, and defining utility metrics to measure the vulnerability of DNNs.

Adversarial Algorithm Attacks on DNNs

What, when and how?

This phenomenon, so-called “adversarial instability”, was first introduced and studied in 2013 in the image classification domain [Szegedy et al., 2013]. The authors found that applying a very small, human-imperceptible perturbation to the inputs at test time makes it possible to create “adversarial examples” that fool networks and result in arbitrary incorrect outputs (or specific ones, as shown later).

Fig. 7 [Szegedy et al., 2013] Adversarial example generated for AlexNet: The bus is predicted to be an ostrich!

Why is it possible?

These manipulations are actually possible owing to the imperfect generalization learned by DNNs from finite training sets [Bengio, 2009] and the underlying linearity of most components used to build DNNs [Kurakin et al., 2016].
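
That linearity is precisely what the fast gradient sign method (FGSM) used in [Kurakin et al., 2016] exploits: a single step of size eps in the direction of the sign of the loss gradient is often enough to flip the prediction. Here is a minimal PyTorch sketch, assuming `model` is any differentiable classifier over images in [0, 1]:

```python
# Minimal FGSM sketch: one signed-gradient step per image.
import torch

def fgsm(model, images, labels, eps=0.03):
    images = images.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    # Move each pixel by +/- eps in the direction that increases the loss.
    adv = images + eps * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()
```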

How to get started with protecting your DNN?

Attack/Defense Platforms

Several attack/defense platforms have been proposed to provide reference implementations of attacks and defenses for the most popular deep learning frameworks, so that developers can use them to build robust models.

In Tab. 1 below, we list five open source platforms. For each one, we give the number of attacks and defenses implemented, the DL frameworks supported and useful links to the code and documentation. The lists of attacks and defenses in each platform are given below in Tab. 2 and Tab. 3.

Tab. 1 Attack/defense open source platforms DeepSec, ART, AdvBox, Foolbox and Cleverhans.
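
As a taste of what these platforms look like in practice, here is a hedged sketch of crafting adversarial examples with ART (the Adversarial Robustness Toolbox). The module paths and wrapper arguments follow recent ART releases and may differ in older versions; `model`, `loss_fn`, `x_test` and `y_test` are placeholders for your own model and data.

```python
# Sketch: wrap a PyTorch classifier with ART and run FGSM on it.
import numpy as np
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# model: a trained torch.nn.Module; loss_fn: e.g. torch.nn.CrossEntropyLoss()
classifier = PyTorchClassifier(
    model=model,
    loss=loss_fn,
    input_shape=(3, 32, 32),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

attack = FastGradientMethod(estimator=classifier, eps=0.03)
x_adv = attack.generate(x=x_test)  # x_test: numpy array of clean images

clean_acc = np.mean(np.argmax(classifier.predict(x_test), axis=1) == y_test)
adv_acc = np.mean(np.argmax(classifier.predict(x_adv), axis=1) == y_test)
print(f"clean accuracy: {clean_acc:.3f}, accuracy under attack: {adv_acc:.3f}")
```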

Attacks

In the table below, we list the different attacks implemented in each of the platforms DeepSec, ART, AdvBox, Foolbox and Cleverhans. Note that some attacks are implemented in several platforms. The star “*” next to some attacks in the ART, AdvBox, Foolbox and Cleverhans lists marks attacks that are not covered in DeepSec.

Tab. 2 Lists of attacks implemented in the platforms DeepSec, ART, AdvBox, Foolbox and Cleverhans.

Defenses

In the table below, we list the different defenses/detectors implemented in each of the platforms DeepSec, ART and Cleverhans.

Tab. 3 Lists of defenses/detectors implemented in the platforms DeepSec, ART and Cleverhans.
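
One of the most common defenses in these lists is adversarial training: the training batches are augmented with adversarial examples generated on the fly, so the network learns to classify them correctly. Below is a minimal PyTorch sketch of one training step (using a single FGSM step to craft the adversarial half of the batch); the model, optimizer and data are assumptions for illustration, not any platform’s reference implementation.

```python
# Sketch of one adversarial-training step (clean + FGSM batch).
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, images, labels, eps=0.03):
    # Craft adversarial versions of the clean batch (one FGSM step).
    images_adv = images.clone().detach().requires_grad_(True)
    F.cross_entropy(model(images_adv), labels).backward()
    images_adv = (images_adv + eps * images_adv.grad.sign()).clamp(0, 1).detach()

    # Train on a mix of clean and adversarial examples.
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(images), labels) \
         + 0.5 * F.cross_entropy(model(images_adv), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```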

Characterization and classification of adversarial algorithm attacks

Adversarial algorithms are characterized by the targeted architecture, the attack strategy, the size of the perturbation, the misclassification success rate and the computational cost.

In general, existing attacks can be classified along several different dimensions [Yuan et al., 2017, Ling et al., 2019]. In the following table, we give a classification of adversarial attacks according to [Kurakin et al., 2018].

Tab. 4 Classification of Adversarial attacks

Evaluate the robustness of DNNs

In general, the evaluation of the robustness of DNNs could be done by introducing robustness metrics or by constructing attacks and introducing defensive procedures.

The former approach is substantially more difficult to implement in practice, and all attempts have required approximations [Bastani et al., 2016], [Huang et al., 2017]. Defensive procedures, on the other hand, should have low impact on the architecture, maintain the accuracy and speed of the network, and remain robust against transferability. In the table below, we present the existing defense methods and their key drawbacks according to [Kurakin et al., 2018]; the authors note that no method of defending against adversarial examples is yet completely satisfactory.

Tab. 5 Existing defense methods and their key drawbacks.
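
In practice, the most common empirical robustness measure is simply the accuracy that survives attacks of increasing strength. Here is a hedged sketch using the Foolbox API (assuming a Foolbox 3.x style interface); `model`, `images` and `labels` are assumptions for illustration.

```python
# Sketch: empirical robust accuracy of a PyTorch model under an L-infinity PGD
# attack, for several perturbation budgets.
import foolbox as fb

# model: a trained torch.nn.Module; images, labels: a batch of test tensors.
fmodel = fb.PyTorchModel(model.eval(), bounds=(0.0, 1.0))
attack = fb.attacks.LinfPGD()
epsilons = [0.0, 1/255, 2/255, 4/255, 8/255]

# `success` has one row per epsilon and one column per image.
_, _, success = attack(fmodel, images, labels, epsilons=epsilons)
robust_accuracy = 1.0 - success.float().mean(dim=-1)

for eps, acc in zip(epsilons, robust_accuracy):
    print(f"eps = {eps:.4f}: robust accuracy {acc.item():.3f}")
```

Plotting this curve before and after applying a defense gives a quick, if incomplete, picture of how much robustness the defense actually buys.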

Conclusion and call for collaboration

The vulnerability of deep learning models was first introduced and studied in 2013. Alongside the exponential growth of DL models, which nowadays outperform humans in many areas, this vulnerability phenomenon is becoming more and more critical and gives rise to legal and ethical debates when it comes to safety-critical domains such as self-driving cars or cancer diagnosis.

There exists a vast literature by scientists and developers that tries to cover all possible adversarial attacks and to develop defensive mechanisms to reduce their effectiveness. Many open source platforms are available for this purpose. However, this is not enough, and more effort is needed to ensure the robustness of our AI world. Thus, Axionable is committed to innovating on attack and defensive techniques. If you have an idea or want to collaborate, please email us at datascience@axionable.com.

The tables are available in PDF format with accessible hyperlinks. If you are interested, do not hesitate to contact us.
