Tackling AI Challenges in Safety-Critical Scenarios — A Review on Robustness of Deep Learning Models and the Release of Perceptron Robustness Benchmark Tools

Baidu Security X-Lab
Jun 17, 2019

0x01: Current Research on Deep Learning Robustness

Deep neural networks have achieved impressive results on many important visual tasks, such as image classification, where their accuracy can exceed that of humans. However, compared to the human visual system, deep learning models perform surprisingly poorly on certain inputs that differ from normal ones only by small perturbations. The existence of such inputs, called “adversarial examples”, poses threats and uncertainties for deep learning applications in many security scenarios such as autonomous driving, face recognition, and malware detection.

As shown in Figure 1, we found that even adversarial inputs generated under a black-box setting can effectively deceive the deep-learning-based APIs of the Google Cloud Vision platform, including image classification, object detection, image censorship, and optical character recognition (OCR). The cloud-based black-box models can be bypassed with just a few queries. This research was presented at Black Hat Asia 2019. The emergence of adversarial examples, with bypass rates approaching 100%, creates a serious challenge for the use of deep learning models in safety-critical scenarios. The ability of a deep learning model to resist adversarial perturbations and still give correct predictions is commonly referred to as robustness.

Figure 1: Adversarial examples generated under a black-box setting deceive four APIs on Google Cloud Vision (Jia et al. [1])

We also found that the functional safety of many AI systems, such as self-driving cars, remains largely dependent on the robustness of the deep learning model, even without adversarial attacks. As shown in Figure 2, an object identified by the object detection model YOLOv3 under normal conditions can no longer be detected under slightly different illumination, creating potential safety hazards. In the physical world, unpredictable environmental changes such as illumination have, in addition to adversarial examples, become real threats to deep learning models, as shown in research we presented at Black Hat Europe 2018. More details about deep learning robustness in safety-critical scenarios and its use in real-world settings can be found in our article (in Chinese).

Figure 2: In the original picture (left), the object is detected and marked. After increasing the brightness by 13% (right), the object can no longer be detected.

The diversity of security threats to AI models makes it necessary to evaluate the robustness of models on the production line; robustness is becoming a model criterion as important as accuracy. However, there is no industry-standard robustness benchmark yet. Most existing adversarial machine learning libraries, such as Cleverhans [2] and IBM ART [3], focus on techniques for generating adversarial examples and cannot evaluate robustness or estimate its upper/lower bounds. Current robustness benchmarks are mostly based on theoretical measurements of L_p-norm perturbations. The Robust Vision Benchmark from the University of Tübingen in Germany provides a public automated platform that aggregates known adversarial attack algorithms and scores most open-source pretrained deep learning models, as well as models trained and uploaded by users.
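For concreteness, the quantity that most of these benchmarks try to estimate is the size of the smallest perturbation, measured in some L_p norm, that changes the model's prediction. A standard way of writing this down (the notation is ours, not tied to any particular tool) is:

```latex
% Minimal adversarial perturbation of classifier f at input x:
% the smallest L_p-norm perturbation that changes the prediction.
\[
  \rho_p(f, x) \;=\; \min_{\delta} \; \|\delta\|_p
  \quad \text{subject to} \quad f(x + \delta) \neq f(x)
\]
```

Attacks that find adversarial examples provide upper bounds on this quantity; formal verification provides lower bounds.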

Using only the currently available open-source adversarial tools, one cannot answer many important questions about deploying deep learning models, such as:

  • Is Model A more robust than Model B?
  • I retrained my model with adversarial examples; did its robustness really improve? By how much?
  • Can my model's robustness be guaranteed? That is, can it be shown that, as long as perturbations are smaller than a certain size, the model's predictions remain stable?

0x02: Perceptron Robustness Benchmark Features

To help model developers answer these questions, Baidu Security X-Lab has released an open-source model robustness benchmarking tool: Perceptron Robustness Benchmark (https://github.com/advboxes/perceptron-benchmark). It not only provides standardized measurement of a model against up to 15 security and safety metrics, but also provides a verifiable robustness lower bound for some models. Compared to other adversarial libraries, the Perceptron Benchmark has the following features:

  • Multi-platform support: The same code can be used for benchmarking under a variety of popular deep learning frameworks without modification. Supported frameworks include TensorFlow, Keras, PyTorch, and PaddlePaddle.
  • Multi-task support: Whereas existing adversarial machine learning libraries support only basic image classification tasks, Perceptron supports a variety of visual tasks, including object detection models for autonomous driving and other safety-critical visual models.
  • Cloud black-box model support: Perceptron also supports robustness measurement of black-box cloud model APIs in MLaaS scenarios. We provide testing interfaces for multiple commercial AI platforms, including Google Cloud Vision, Baidu AIP, and Amazon Rekognition.
  • Standardized metrics: The robustness metrics given by the Perceptron Robustness Benchmark can be used for multi-task, multi-model comparisons, and can be used as an estimate of the model's robustness upper bound.
  • Verifiable robustness: The Perceptron Robustness Benchmark uses a formal verification method based on symbolic interval analysis to calculate a reliable robustness lower bound, i.e., when the perturbation is smaller than the lower bound, the model is guaranteed to give consistent results (a simplified sketch of the interval idea follows this list).
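To give an intuition for how such a verified lower bound can be computed, the sketch below propagates an L_inf input interval through a tiny fully connected ReLU network using naive interval arithmetic and checks whether the predicted class can possibly change. This is a deliberately simplified relative of the symbolic interval analysis of Wang et al. [5] that Perceptron builds on, not the actual implementation; the network weights and the perturbation size are made up for illustration.

```python
# Minimal sketch of interval bound propagation through a tiny ReLU network.
# A simplified illustration of the idea behind symbolic interval analysis
# (Wang et al. [5]); NOT the Perceptron implementation.
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate the box [lo, hi] through the affine map x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def interval_relu(lo, hi):
    """ReLU is monotone, so it maps interval endpoints to interval endpoints."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def verified_robust(x, eps, layers, true_class):
    """True if every input within L_inf distance eps of x provably keeps the
    logit of true_class above all other logits (a sound, conservative check)."""
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:            # ReLU on hidden layers only
            lo, hi = interval_relu(lo, hi)
    worst_other = max(hi[c] for c in range(len(lo)) if c != true_class)
    return bool(lo[true_class] > worst_other)

# Toy 2-layer network with made-up weights, purely for illustration.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),
          (rng.normal(size=(3, 8)), np.zeros(3))]
x = rng.normal(size=4)
print(verified_robust(x, eps=0.01, layers=layers, true_class=0))
```

Binary-searching over the perturbation size with such a check yields a verified robustness lower bound; tighter analyses such as symbolic intervals shrink the over-approximation and push that bound higher.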

0x03: Using the Perceptron Robustness Benchmark

Example 1: Robustness evaluation under pixel-level perturbations

We provide both command-line and API interfaces for the Perceptron Robustness Benchmark. For example, a single command-line invocation measures the robustness of a ResNet-50 model under the Keras framework against C&W [4] pixel-wise perturbations.

The results include not only the adversarial examples found using the C&W method, but also the upper bound of the robustness under pixel-wise perturbations (2.10e-07).

Figure 3: Pixel-wise perturbation robustness results

The output is:

Figure 4: Perceptron reports the minimum perturbation found by experiments that makes the model predict incorrectly
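To make the meaning of this empirical upper bound concrete, the sketch below binary-searches for the smallest perturbation budget at which a simple one-step attack flips a Keras ResNet-50 prediction and reports the mean squared pixel perturbation of the smallest adversarial example found. FGSM is used here only as a stand-in for the much stronger C&W attack that Perceptron actually runs, and the image path is a placeholder; this is an illustration, not Perceptron's implementation.

```python
# Illustrative only: estimate an empirical robustness upper bound for a Keras
# ResNet-50 by binary-searching the smallest FGSM budget that flips the label.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image as keras_image

model = ResNet50(weights="imagenet")

# "example.png" is a placeholder path; any image that can be resized to 224x224 works.
img = keras_image.load_img("example.png", target_size=(224, 224))
x = tf.constant(preprocess_input(np.expand_dims(keras_image.img_to_array(img), 0)))
label = tf.argmax(model(x), axis=1)

def fgsm(x, label, eps):
    """One-step L_inf attack with budget eps (in preprocessed pixel units)."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(label, model(x))
    return x + eps * tf.sign(tape.gradient(loss, x))

# Binary-search the smallest budget at which the attack flips the prediction.
lo_eps, hi_eps, best = 0.0, 8.0, None
for _ in range(12):
    mid = (lo_eps + hi_eps) / 2.0
    x_adv = fgsm(x, label, mid)
    if int(tf.argmax(model(x_adv), axis=1)) != int(label):
        hi_eps, best = mid, x_adv          # success: try a smaller budget
    else:
        lo_eps = mid                       # failure: need a larger budget

if best is None:
    print("no adversarial example found within the searched budget")
else:
    # Mean squared per-pixel perturbation (pixels rescaled to [0, 1]),
    # playing the same role as the value reported in Figure 4.
    mse = float(tf.reduce_mean(tf.square((best - x) / 255.0)))
    print(f"empirical robustness upper bound (MSE): {mse:.2e}")
```

A stronger attack such as C&W would typically find a smaller perturbation and therefore a tighter upper bound.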

The reported robustness upper bound can be interpreted as the smallest perturbation found experimentally that makes the model predict incorrectly. Perceptron uses symbolic interval analysis (Wang et al. [5]) to find a lower bound on robustness: a perturbation size below which formal verification guarantees that the model's prediction cannot change. The relationship between these two values and the model's true robustness boundary is shown in Figure 5. Perceptron approximates the true boundary from both directions by minimizing the upper bound and maximizing the lower bound.

Figure 5: The empirical upper bound and the verified lower bound approach the model's true robustness boundary from both sides
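Put another way, writing ρ_p(f, x) for the true minimal adversarial perturbation defined earlier, the two reported values bracket it (notation ours):

```latex
% The verified lower bound and the smallest experimentally found adversarial
% perturbation sandwich the true robustness boundary.
\[
  \epsilon_{\mathrm{verified}}
  \;\le\; \rho_p(f, x) \;\le\;
  \epsilon_{\mathrm{attack}}
\]
```

Improving the attack lowers ε_attack, while a tighter verification method raises ε_verified, narrowing the gap around the true boundary.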

Example 2: Robustness verification under illumination changes

You can measure the robustness of a ResNet-18 model under the PyTorch framework against different illumination levels with a similar command-line invocation. By adding the “-verify” parameter to the command line, Perceptron produces a verifiable robustness boundary.

Figure 6

As shown in the figure, illumination change is simulated by multiplying each pixel by a brightness coefficient. Perceptron gives a verified range for this coefficient (0.085 to 2.064), and the model's predictions are guaranteed to be stable as long as the brightness coefficient stays within this range.

Figure 7: Perceptron shows incorrectly predicted examples under the minimum illumination change
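The sketch below shows this perturbation model and an empirical (not formally verified) way to probe such a stability range: each pixel is multiplied by a brightness coefficient, the result is clipped to the valid range, and the coefficient is swept up and down until the prediction changes. Perceptron's “-verify” mode replaces the sweep with formal verification; the model-loading code and image path here are illustrative assumptions.

```python
# Illustrative sketch: empirically probe the range of brightness coefficients
# over which a PyTorch ResNet-18 keeps the same prediction. Perceptron's
# "-verify" mode computes a formally verified range instead of this sweep.
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "example.png" is a placeholder path for the image under test.
img = np.asarray(Image.open("example.png").convert("RGB"), dtype=np.float32) / 255.0

def predict(pixels01):
    """Classify an HxWx3 image with pixel values in [0, 1]."""
    pil = Image.fromarray((np.clip(pixels01, 0.0, 1.0) * 255).astype(np.uint8))
    with torch.no_grad():
        return int(model(preprocess(pil).unsqueeze(0)).argmax(dim=1))

reference = predict(img)

def stable_limit(coeffs):
    """Return the last brightness coefficient (in sweep order) that keeps
    the prediction equal to the reference prediction."""
    last = 1.0
    for c in coeffs:
        if predict(img * c) != reference:   # brightness change flipped the label
            break
        last = c
    return last

upper = stable_limit(np.arange(1.0, 3.0, 0.01))    # brighten until the label flips
lower = stable_limit(np.arange(1.0, 0.0, -0.01))   # darken until the label flips
print(f"empirically stable brightness range: {lower:.3f} to {upper:.3f}")
```

An empirical sweep like this can only show where the prediction does change; the verified range reported by Perceptron additionally guarantees that no coefficient inside the range can change it.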

By releasing the AdvBox Perceptron Robustness Benchmark, we hope to standardize robustness measurement for deep learning models and promote robustness as a key model evaluation metric as important as accuracy.

0x04: Conclusion

A growing body of research shows that the AI ecosystem faces a serious challenge from adversarial attacks, which cause false predictions or classifications in the increasingly popular vision, audio, and natural language models. We have summarized the latest research on the robustness of deep learning models, covering the serious risks that representative scenarios face under adversarial attacks and ordinary environmental perturbations, existing robustness measurement methods, and possible improvements. In addition, we have introduced Baidu Security X-Lab's research results at the forefront of AI security, including attacks on and detection of AI models in the physical world and against cloud black boxes, as well as model robustness measurement and its practice.

We hope this review of robustness demonstrates its importance and its challenges, and provides a stepping stone for researchers. As security practitioners, we believe that model robustness is as important as accuracy in safety-critical scenarios. We call on the industry to adopt robustness as a new metric alongside accuracy when evaluating models, and at the same time to standardize the robustness measurement methodology.

[1] Enhancing Cross-task Transferability of Adversarial Examples with Dispersion Reduction. https://arxiv.org/abs/1905.03333

[2] Cleverhans: http://www.cleverhans.io/

[3] IBM Art: https://github.com/IBM/adversarial-robustness-toolbox

[4] Towards Evaluating the Robustness of Neural Networks. https://arxiv.org/abs/1608.04644

[5] Efficient Formal Safety Analysis of Neural Networks. https://arxiv.org/abs/1809.08098
