
Gradient-based Adversarial Attacks: An Introduction

10 min read · Apr 8, 2020


Neural networks have lately been providing state-of-the-art performance on most machine learning problems, even producing superhuman performance on several tasks. However, alluring as this performance might seem, these networks have been observed to be far from robust when dealing with slightly different distributions. Consider a neural network trained on an input distribution X with a corresponding label set L. Given an input example x with label l, it can be shown that an adversarial example x' can be obtained from x by adding a very small perturbation to the original input such that x' is classified differently from x. The fact that such a small perturbation can cause existing neural networks to falter prevents these models from being deployed in security-critical areas. In this blog, we will summarise the basic gradient-based attack mechanisms that have been developed for performing white-box adversarial attacks (attacks in which the model architecture and weights are available to the attacker) on neural networks.

Example of misclassification due to Adversarial Attacks

Gradient-based Adversarial Attacks

Gradient-based adversarial attacks exploit a very simple idea originating from back-propagation (the algorithm used to train deep neural networks). We will go through back-propagation only informally here, since the main focus of this blog is on summarising gradient-based techniques for crafting adversarial examples.

Gradient flow during back-propagation

For training deep neural networks, the back-propagation algorithm is widely used. The first step involves calculating the error (using a specific error function) between the desired output and the network output for a particular input. Then, keeping the input constant, the calculated error is used to compute the gradient corresponding to each parameter (also known as a weight) of the network. These gradients are used to update the weights at each step, scaled by a specific learning rate.

Gradient-based attacks use this concept to develop a perturbation vector for the input image by making a slight modification to the back-propagation procedure. Contrary to common practice, while back-propagating through the network, they consider the model parameters (the weights) to be constant and the input to be the variable. Hence, a gradient can be obtained for each element of the input (for example, each pixel of an image). These gradients have been utilised in various ways to obtain a perturbation vector such that the resulting adversarial example has a greater tendency to be misclassified, provided it satisfies the constraint of being very similar to the original input. Since the input gradients are used to obtain these perturbation vectors, these methods are known as gradient-based attacks.
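For intuition, here is a minimal sketch of this step in PyTorch (the framework, the toy model and all names are illustrative assumptions, not taken from any of the papers discussed): back-propagation runs exactly as in training, but the gradients are read off the input rather than the weights.

import torch
import torch.nn as nn

# A toy classifier, purely for illustration
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28, requires_grad=True)  # the input is now the "variable"
y = torch.tensor([3])                             # its (assumed) correct label

loss = loss_fn(model(x), y)
loss.backward()                 # back-propagate exactly as in training ...
input_gradients = x.grad        # ... but read the gradients off the input:
                                # one value per pixel, same shape as x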

Some of these gradient-based adversarial attack techniques are explained below.

A prerequisite for understanding the mathematics behind these methods is a basic knowledge of calculus and the concept of norms of vectors and matrices.

Fast Gradient Sign Method (FGSM)

In the paper that introduced FGSM, the authors argue that:

“ .. neural networks are too linear to resist linear adversarial perturbation. LSTMs (Hochreiter & Schmidhuber, 1997), ReLUs (Jarrett et al., 2009; Glorot et al., 2011), and maxout networks (Goodfellow et al., 2013c) are all intentionally designed to behave in very linear ways, so that they are easier to optimize. More nonlinear models such as sigmoid networks are carefully tuned to spend most of their time in the non-saturating, more linear regime for the same reason. This linear behaviour suggests that cheap, analytical perturbations of a linear model should also damage neural networks.”

Therefore, the authors focus on producing an adversarial example x' = x + η such that x' is classified incorrectly by the neural network. For x' and x to produce different outputs, η should be larger than the precision of the features; for example, if each pixel of an image is represented by 8 bits, any information below 1/255 of the dynamic range is discarded. To achieve this, the authors compute η as

η = ε · sign(∇x J(θ, x, y))

Here, J is the cost function used to train the neural network, θ represents the parameters of the model, x is the input to the model and y is the target associated with x (for tasks that have targets). ε controls the size of the perturbation, and the sign of every element of the perturbation vector (or matrix/tensor) is decided by the sign of the input gradient at that element. This solution is motivated by linearising the cost function and solving for the perturbation that maximises the cost subject to an L∞ constraint. This linear perturbation technique reliably causes a wide variety of models to misclassify their input. It does not require an iterative procedure to compute adversarial examples, and is therefore much faster than the other methods considered here.
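As a concrete illustration, here is a minimal FGSM sketch in PyTorch (the helper name, the [0, 1] pixel range and the framework choice are my own assumptions, not taken from the paper):

import torch

def fgsm_attack(model, loss_fn, x, y, epsilon):
    """Perturb x by ε in the direction of the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)   # treat the input as the variable
    loss = loss_fn(model(x_adv), y)
    loss.backward()                                   # gradients w.r.t. the input
    perturbation = epsilon * x_adv.grad.sign()        # η = ε · sign(∇x J(θ, x, y))
    return (x_adv + perturbation).detach().clamp(0.0, 1.0)  # stay in the valid pixel range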

Basic Iterative Method (BIM)

In their paper, the authors suggest a very simple improvement to FGSM: apply the FGSM step multiple times with a small step size, and clip the pixel values of the intermediate result after each step to ensure it stays in an ε-neighbourhood of the original image. This method is also referred to as Iterative FGSM or I-FGSM. Mathematically, the attacking scheme can be written as

x'_0 = x
x'_{N+1} = Clip_{x,ε}{ x'_N + α · sign(∇x J(θ, x'_N, y)) }

For the experiments, the value of α used is 1 (on the 0–255 pixel scale), i.e., each pixel value changes by only 1 at each step. The number of iterations was chosen heuristically: enough for the adversarial example to reach the edge of the ε max-norm ball, yet small enough to keep the computational cost of the experiments manageable.
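The iterative version can be sketched as below (again an illustrative PyTorch sketch; the defaults assume images scaled to [0, 1], so α = 1/255 corresponds to changing a pixel by 1 on the 0–255 scale):

import torch

def bim_attack(model, loss_fn, x, y, epsilon, alpha=1 / 255, num_iters=10):
    """Iterative FGSM: small signed-gradient steps, clipped back into the ε-ball."""
    x_adv = x.clone().detach()
    for _ in range(num_iters):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        x_adv = (x_adv + alpha * x_adv.grad.sign()).detach()
        # clip into the ε-neighbourhood of the original image, then into the valid pixel range
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0.0, 1.0)
    return x_adv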

Both methods described so far focus simply on increasing the cost of the correct class, without specifying which of the incorrect classes the model should select. A technique for targeting a specific class is also presented in the paper. Since this blog focuses on a basic overview of gradient-based adversarial attacks, we skip the explicit details; kindly refer to the paper for more.

Boosting FGSM with Momentum

The momentum method is a technique for accelerating gradient descent algorithms by accumulating a velocity vector in the gradient direction of the loss function across iterations. The memorisation of previous gradients helps to barrel through narrow valleys, small humps and poor local minima or maxima, and it also stabilises the updates of stochastic gradient descent. The authors of this paper apply the idea of momentum to the generation of adversarial examples and obtain substantial benefits.

FGSM generates an adversarial example by applying the sign of the gradient to a real example only once, under the assumption that the decision boundary is linear around the data point. In practice, however, the linear assumption may not hold when the distortion is large, which makes the adversarial example generated by FGSM underfit the model and limits its attack ability. As an improvement, iterative FGSM greedily moves the adversarial example in the direction of the sign of the gradient at each iteration. In this case, though, the adversarial example can easily drop into poor local maxima and overfit the model, making it unlikely to transfer across models. The property of producing attacks that transfer to other models, whose parameters are not accessible to the attacker, is known as the transferability of an attack.

Thus, in this paper, the authors integrate momentum into iterative FGSM in order to stabilise the update directions and escape poor local maxima. With momentum, the accumulated gradient at each step is updated by adding the current (L1-normalised) gradient to the decayed velocity vector:

g_{t+1} = μ · g_t + ∇x J(x'_t, y) / ||∇x J(x'_t, y)||_1

where μ is the decay factor of the momentum term.

After the gradient accumulation, the adversarial sample at the (t+1)-th step is computed as

x'_{t+1} = x'_t + α · sign(g_{t+1})
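Putting the two updates together, a momentum-based iteration might look like the following sketch (PyTorch, with illustrative names; the per-example L1 normalisation assumes image batches in NCHW layout):

import torch

def mifgsm_attack(model, loss_fn, x, y, epsilon, alpha, num_iters=10, mu=1.0):
    """Momentum iterative FGSM: accumulate a velocity vector over normalised gradients."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                           # accumulated velocity g_t
    for _ in range(num_iters):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        grad = x_adv.grad
        # per-example L1 norm of the gradient (assumes a 4-D NCHW batch)
        l1 = grad.abs().flatten(1).sum(dim=1).view(-1, 1, 1, 1) + 1e-12
        g = mu * g + grad / l1                        # g_{t+1} = μ·g_t + ∇xJ / ||∇xJ||_1
        x_adv = (x_adv + alpha * g.sign()).detach()   # x'_{t+1} = x'_t + α·sign(g_{t+1})
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0.0, 1.0)
    return x_adv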

The momentum-based method helps retain the transferability of adversarial examples as the number of iterations increases (by preventing the example from dropping into poor local maxima), while remaining as strong an adversary in the white-box setting as iterative FGSM. It thus alleviates the trade-off between attack ability and transferability, yielding strong black-box attacks.

Projected Gradient Descent (PGD)

The Projected Gradient Descent (PGD) attack is essentially the same as the BIM (or I-FGSM) attack. The only difference is that PGD initialises the example at a random point inside the ball of interest (defined by the L∞ norm) and performs random restarts, while BIM starts from the original point.

In general, projected gradient descent is a well-known method in the optimisation literature. In that sense, it is better to use "PGD" to refer to this (quite general) method rather than the narrower "I-FGSM" terminology. Strictly speaking, the version of PGD discussed here is the non-Euclidean L∞-PGD, which uses the L∞ norm as its distance function.
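In code, the only change relative to the iterative sketch above is the random initialisation inside the ε-ball (and, optionally, repeating the whole attack from several random starts); this is an illustrative sketch, not a reference implementation:

import torch

def pgd_attack(model, loss_fn, x, y, epsilon, alpha, num_iters=40):
    """L∞ PGD: random start inside the ε-ball, then projected signed-gradient steps."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0).detach()
    for _ in range(num_iters):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        x_adv = (x_adv + alpha * x_adv.grad.sign()).detach()
        # projection step: back into the ε-ball around x, then into the valid pixel range
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0.0, 1.0)
    return x_adv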

Carlini-Wagner Attack with the L2 Norm

In this approach, the authors propose to generate adversarial samples by solving the following optimisation problem:

minimize D(x, x + δ)
such that C(x + δ) = t
x + δ ∈ [0, 1]^n

where x is fixed and the goal is to find a δ that minimises D(x, x + δ). In other words, we want a small change δ such that, when added to the image x, the image is misclassified (to a targeted class t) by the model while still remaining a valid image. Here, D is a distance metric (L0, L2 or L∞ in the paper) and C is the model under attack.

One problem here is that it is very difficult for existing algorithms to handle the constraint C(x + δ) = t directly, owing to its highly non-linear nature. The authors therefore express the problem in a form better suited to optimisation: they define an objective function f such that C(x + δ) = t if and only if f(x + δ) ≤ 0, and propose a list of possible choices of f.

In these candidate functions (listed in full in the paper), s is the correct classification, (e)+ is shorthand for max(e, 0), softplus(x) = log(1 + exp(x)), and lossF,s(x) is the cross-entropy loss for x. Some of the formulas have been adjusted by adding a constant, but only so that the function respects the original problem definition; this does not affect the final result, since it merely shifts the quantity being minimised.

Now, instead of directly formulating the problem as

minimize D(x, x + δ)
such that f(x + δ) ≤ 0
x + δ ∈ [0, 1]^n

the following alternate formulation is used:

minimize D(x, x + δ) + c · f(x + δ)
such that x + δ ∈ [0, 1]^n

where c > 0 is a suitably chosen constant. Taking the distance metric D to be an Lp norm, the optimisation problem reduces to finding the δ that solves

minimize ||δ||_p + c · f(x + δ)
such that x + δ ∈ [0, 1]^n

Now, in order to satisfy the box constraint in the optimisation problem, a change of variables is applied and we optimise over w instead, setting

δ_i = ½ (tanh(w_i) + 1) − x_i

With this change of variables, since −1 ≤ tanh(w) ≤ 1, the constraint 0 ≤ x + δ ≤ 1 is automatically satisfied. The problem therefore becomes an unconstrained minimisation, and according to the experiments conducted by the authors, the most effective optimisation algorithm for solving it is Adam.

For more details about searching for the optimal value of c and convergence proofs, kindly refer to the paper.
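For intuition, here is a stripped-down PyTorch sketch of the L2 variant. It uses the change of variables above together with a margin-style choice of f on the logits (one of the objectives discussed in the paper); the fixed constant c, the confidence parameter κ and all names are illustrative assumptions, and the paper's binary search over c is omitted:

import torch

def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    """Targeted Carlini-Wagner style L2 attack via a change of variables and Adam."""
    # optimise over w so that x + δ = ½(tanh(w) + 1) always stays in [0, 1]
    w = torch.atanh((2 * x - 1).clamp(-0.999999, 0.999999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        # margin-style objective: push the target logit above every other logit
        target_logit = logits.gather(1, target.view(-1, 1)).squeeze(1)
        other_logit = logits.scatter(1, target.view(-1, 1), float('-inf')).max(dim=1).values
        f = torch.clamp(other_logit - target_logit, min=-kappa)
        # minimise ||δ||² + c · f(x + δ)
        l2 = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        loss = (l2 + c * f).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()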

Adversarial Training

Standard supervised training does not mandate that the chosen model will be resistant to adversarial examples. This property must be encoded into the training procedure somehow, and that is where adversarial training comes in.

It can be shown that training on a mixture of adversarial and clean examples regularises a neural network to some extent. In general, data augmentation uses techniques such as translation and rotation to produce input samples similar to what might be expected in the test set. Training on adversarial samples is a little different in that it uses inputs that are unlikely to occur naturally but that expose flaws in the way the model conceptualises its decision function. Hence, a wide range of literature in adversarial machine learning uses adversarial training to make models robust against such examples. And since adversarial attacks have been observed to be transferable, adversarial training on samples generated from a single model also provides some robustness to other models performing the same task.
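As an illustration, a single training step that mixes clean and adversarial examples could look like the sketch below (assuming the fgsm_attack helper sketched earlier; the 50/50 mixing ratio and the ε value are illustrative choices, not prescribed by any particular paper):

import torch

def adversarial_training_step(model, loss_fn, optimizer, x, y, epsilon=0.03):
    """One optimisation step on a mixture of clean and FGSM-perturbed examples."""
    model.train()
    x_adv = fgsm_attack(model, loss_fn, x, y, epsilon)   # craft adversarial versions of the batch
    optimizer.zero_grad()                                # discard gradients from the attack itself
    # equal weighting of the clean and adversarial objectives (the ratio is a design choice)
    loss = 0.5 * loss_fn(model(x), y) + 0.5 * loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()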

Conclusion

This blog has briefly touched upon some of the gradient-based techniques for generating adversarial samples. Other techniques one should definitely familiarise oneself with are DeepFool and the research paper "Learn to Pay Attention". I hope this blog serves as a worthy introduction to the field of adversarial attacks.

Thank you for your time in reading this blog. Kindly contact me if you have any queries about the topic.

Written by Siddhant Haldar