Are neural networks robust against adversarial examples?

An attacker can use adversarial examples to cause self-driving cars to take unwanted actions.

Zhixiong Yue
3 min read · Sep 5, 2019

With adversarial examples, a typical neural network can be easily fooled into producing an unwanted classification result with high confidence. This could be a disaster if we use neural networks in real-world applications such as self-driving cars. A team of scientists from the University of California, Berkeley developed three attack algorithms to evaluate the robustness of image classification neural networks.

Robust physical-world attacks on deep learning models

What are adversarial examples?

A well-trained neural network can be extremely sensitive to inputs with negligible changes. These inputs are referred to as adversarial examples. Adversarial examples are designed to be close to the original samples: the change is minor and imperceptible to humans. However, this slight perturbation degrades the performance of the neural network.

Every source digit can be classified as any target.

The attack algorithms they developed are all targeted attacks. That is, the adversarial examples are crafted to make the neural network output a chosen target class. Targeted attacks are much more potent than untargeted attacks, since the attacker can manipulate the classifier's decision rather than merely cause a misclassification.
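To make the distinction concrete, here is a small NumPy sketch of the margin-style quantities involved. The function names and the example logits are made up for illustration, and the notation only loosely follows the paper's objective over pre-softmax outputs.

```python
import numpy as np

def targeted_margin(logits, target, kappa=0.0):
    # Targeted attack: drive this value down to -kappa, i.e. make the target
    # logit exceed every other logit by at least kappa.
    others = np.delete(logits, target)
    return max(others.max() - logits[target], -kappa)

def untargeted_margin(logits, true_label, kappa=0.0):
    # Untargeted attack: any class other than the true one will do, so it is
    # enough for the true-class logit to fall below some other logit.
    others = np.delete(logits, true_label)
    return max(logits[true_label] - others.max(), -kappa)

# Hypothetical pre-softmax outputs of a 4-class model; class 2 currently wins.
logits = np.array([1.0, 0.5, 3.2, 0.1])
print(targeted_margin(logits, target=0))        # positive: target class 0 not reached yet
print(untargeted_margin(logits, true_label=2))  # positive: model still predicts the true class
```

In both cases the attack succeeds once the margin drops below zero; the targeted version is stricter because it pins down exactly which wrong class the model must output.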

They conducted the adversarial attacks in a white-box manner. That is, the adversary has access to the architecture and all parameters of the neural network. Since adversarial examples often transfer to models the attacker can only query as a black box, it is also possible to train a substitute model with white-box access and construct the adversarial examples against that substitute.

To construct the adversarial examples, they use optimization methods whose objective functions are based on the loss used to train the neural network. A distance metric is included in the objective so that the constructed adversarial examples stay visually close to the original examples. With an appropriate solver, they can quickly find adversarial examples that are classified as the target label.
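As a minimal sketch of what such an optimization can look like, here is a PyTorch version in the spirit of the paper's L2 targeted attack. The function and variable names are assumptions for illustration, the trade-off constant `c` is fixed here rather than searched over as in the paper, and `model` is assumed to return pre-softmax logits for inputs scaled to [0, 1].

```python
import torch

def cw_l2_targeted(model, x, target, c=1.0, kappa=0.0, steps=1000, lr=0.01):
    """Sketch of a Carlini-Wagner-style targeted L2 attack (illustrative only)."""
    # Change of variables: x_adv = 0.5 * (tanh(w) + 1) stays in [0, 1] by construction.
    w = torch.atanh((2 * x - 1).clamp(-0.999999, 0.999999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)

    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)

        # Margin objective: push the target logit above every other logit by kappa.
        target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
        other_logit = logits.scatter(1, target.unsqueeze(1), float("-inf")).max(dim=1).values
        margin = torch.clamp(other_logit - target_logit, min=-kappa)

        # Total objective: squared L2 distortion plus c times the margin term.
        l2 = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        loss = (l2 + c * margin).sum()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return (0.5 * (torch.tanh(w) + 1)).detach()
```

The tanh change of variables keeps the adversarial image inside the valid pixel range without an explicit box constraint, which is one of the tricks the paper uses to make plain gradient-based solvers work well.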

Human eyes can barely tell the difference between the original and the adversarial example.

As a result, their attacks easily break defensively distilled networks, which previously appeared to be effective at reducing the success rate of adversarial attacks. Moreover, the difference between the original and the constructed adversarial example is so small that even a human cannot tell them apart. The existence of adversarial attacks is a security concern that limits the real-world applications of deep learning. Therefore, evaluating the robustness of a neural network is important before it is deployed in more and more real-world systems.

Reference:

Carlini, N. and Wagner, D., 2017, May. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP) (pp. 39–57). IEEE.
