# Improving Back-Propagation by Adding an Adversarial Gradient with Interactive Code [ Manual Back Prop with TF ]

Today I will be covering another great paper from Arild Nøkland. In this paper he uses the ‘Fast Gradient Sign Method’ (from work by Ian Goodfellow) to modify the input data to the network, which in turn acts as a regularizer.

As usual, we are going to train our network using different methods of back propagation. Below is the list of all the methods that I am going to use.

Case a) Manual Back Prop with AMSGrad (MNIST dataset)
Case b) Manual Back Prop with AMSGrad (CIFAR10 dataset)
Case c) Dilated Back Prop with AMSGrad (CIFAR10 dataset)

Benchmark for Comparison

Left Image → Paper results for MNIST Data Set
Right Image → Paper results for CIFAR10 Data Set
Red Box → Best results that we will try to outperform

The two tables above show the best results reported in the paper for each data set. For the CIFAR10 data set we can preprocess the images with either ZCA whitening or standard deviation normalization; we are only going to perform the latter.

Left Image → Histogram of Training Images before standardization
Right Image → Histogram of Testing Images before standardization

Left Image → Histogram of Training Images after standardization
Right Image → Histogram of Testing Images after standardization
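To make the standardization step concrete, here is a minimal NumPy sketch. It assumes per-channel mean and standard deviation computed on the training images and reused for the test images; the post does not say which statistics were actually used, so treat that choice (and the helper name `standardize`) as an assumption. Random arrays stand in for the CIFAR10 images.

```python
import numpy as np

def standardize(train, test, eps=1e-8):
    """Standardize images of shape (N, H, W, C) per channel.

    Assumption: statistics come from the training set only and are
    reused for the test set, so both share the same scaling.
    """
    mean = train.mean(axis=(0, 1, 2), keepdims=True)
    std = train.std(axis=(0, 1, 2), keepdims=True)
    return (train - mean) / (std + eps), (test - mean) / (std + eps)

# Random stand-ins for CIFAR10-shaped images (not the real data).
rng = np.random.default_rng(0)
train_images = rng.uniform(0, 255, size=(8, 32, 32, 3))
test_images = rng.uniform(0, 255, size=(4, 32, 32, 3))

train_std, test_std = standardize(train_images, test_images)
```

After this step each channel of the training set has roughly zero mean and unit standard deviation, which is the kind of shift the before/after histograms show.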

** NOTE ** Before reading on, if anyone wants to know more about the Fast Gradient Sign Method, please read this blog post; it does an amazing job explaining it.

The equation above shows the new input to our model; we can observe that we are adding something (u) to our original input (x). The line below it explains what the symbol u stands for. I know this equation might look intimidating, but trust me, it is not that hard. Let’s take a look at the implementation.

Red Box → Original Input X
Blue Box → Magnitude of the perturbation
Green Box → Sign of the gradient with respect to the input X

Let’s first take a look at what the sgn() function is doing. In TensorFlow we have a similar function called tf.sign(). It returns the element-wise sign of the input: 1 if the value is greater than 0, -1 if it is less than 0, and 0 if it is exactly 0. But just to make sure, let’s see an example.

Red Box → Input Array [1,-1,1,4] Return Sign [1,-1,1,1]
Blue Box → Input Array [5,-6,-1,8] Return Sign [1,-1,-1,1]
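The same two arrays can be checked without TensorFlow. The sketch below uses NumPy’s `np.sign` as a stand-in for `tf.sign()` (that substitution is mine, not the post’s); both work element-wise.

```python
import numpy as np

a = np.array([1, -1, 1, 4])
b = np.array([5, -6, -1, 8])

sign_a = np.sign(a)  # values: [1, -1, 1, 1]
sign_b = np.sign(b)  # values: [1, -1, -1, 1]

# Zero is the one case that is neither +1 nor -1: its sign is 0.
sign_zero = np.sign(np.array([0.0]))  # values: [0.0]
```

The zero case matters here because an input pixel whose gradient is exactly 0 receives no perturbation at all.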

As seen above, tf.sign() returns an array (or matrix) with the same dimensionality as its input, containing only the signs. Next we multiply it by the magnitude of the perturbation, so every non-zero entry becomes plus or minus that magnitude (smaller than 1 in absolute value when we set the magnitude below 1). Finally we add the result to the original input X. Since the gradient is taken with respect to the input, it has the same shape as X, so the dimensionality works out. Now let’s see how this can be done with manual back propagation in TensorFlow.
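As a small illustration of that arithmetic, here is a NumPy sketch with a made-up gradient. The function name `adversarial_input`, the toy values, and the magnitude 0.1 are all assumptions chosen for the example, not values from the paper or the post.

```python
import numpy as np

def adversarial_input(x, grad_x, epsilon):
    """Return x shifted by epsilon in the direction of sign(grad_x).

    grad_x must have the same shape as x, because it is the gradient
    of the cost with respect to the input itself.
    """
    return x + epsilon * np.sign(grad_x)

# Toy input and a made-up gradient of the same shape.
x = np.array([[0.5, -0.2, 0.0],
              [1.0, 0.3, -0.7]])
grad_x = np.array([[0.03, -1.2, 0.4],
                   [-0.5, 0.0, 2.0]])

x_adv = adversarial_input(x, grad_x, epsilon=0.1)
# Each entry moves by exactly 0.1, except where the gradient is 0.
```

Note that only the sign of the gradient is used: an entry with gradient 0.03 moves just as far as one with gradient 2.0.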

Left Image → First Feed Forward Operation and Gradient Calculation
Right Image → Creating new Input (with adversarial gradient) and real back propagation.

And that’s it! Finally, the author of the paper has summarized, step by step, what we need to do.
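Putting the two passes together, one training step could be sketched as below. This is not the post’s TensorFlow code: it is a NumPy toy that assumes a single softmax layer instead of the convolutional network, plain gradient descent instead of AMSGrad, and weight updates computed only from the perturbed input. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward_backward(x, y_onehot, W, b):
    """Cross-entropy loss plus gradients w.r.t. W, b and the input x."""
    probs = softmax(x @ W + b)
    loss = -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
    dlogits = (probs - y_onehot) / x.shape[0]
    grad_W = x.T @ dlogits
    grad_b = dlogits.sum(axis=0)
    grad_x = dlogits @ W.T  # same shape as x
    return loss, grad_W, grad_b, grad_x

def adversarial_train_step(x, y_onehot, W, b, epsilon, lr):
    # 1) First pass: we only need the gradient w.r.t. the input.
    clean_loss, _, _, grad_x = forward_backward(x, y_onehot, W, b)
    # 2) Build the new input from the sign of that gradient.
    x_adv = x + epsilon * np.sign(grad_x)
    # 3) Second pass on the new input: the "real" back propagation.
    adv_loss, grad_W, grad_b, _ = forward_backward(x_adv, y_onehot, W, b)
    # 4) Update weights (plain gradient descent in this sketch).
    return W - lr * grad_W, b - lr * grad_b, clean_loss, adv_loss

# Random toy data: 16 samples, 5 features, 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 5))
y = np.eye(3)[rng.integers(0, 3, size=16)]
W = rng.normal(scale=0.1, size=(5, 3))
b = np.zeros(3)

W_new, b_new, clean_loss, adv_loss = adversarial_train_step(
    x, y, W, b, epsilon=0.08, lr=0.1)
```

For this linear toy model the loss on the perturbed batch is never lower than on the clean batch, which is the point: the network is always trained on a slightly harder version of each example.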

Network Architecture

Red Rectangle → Input Image (32*32*3) or (32*32*1)
Black Rectangle → Convolution with CELU() with / without mean pooling
Orange Rectangle → Softmax for classification

I am just going to use the same architecture that I implemented in this post (Continuously Differentiable Exponential Linear Units). In short, it is an all-convolutional network with 7 layers. Except for the first layer, we apply a mean pooling layer to reduce the dimensionality.
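For readers who have not seen those two building blocks, here is a minimal NumPy sketch of the CELU activation and a 2×2 mean pooling. The 2×2 window, the default alpha of 1.0, and the helper names are assumptions for illustration; the post does not state the pooling size or alpha, and the convolutions themselves are left out.

```python
import numpy as np

def celu(x, alpha=1.0):
    """CELU: x for x >= 0, alpha * (exp(x / alpha) - 1) for x < 0."""
    neg = np.minimum(x, 0.0)  # clamp first so exp never overflows
    return np.maximum(x, 0.0) + alpha * (np.exp(neg / alpha) - 1.0)

def mean_pool_2x2(x):
    """Average non-overlapping 2x2 windows of a (N, H, W, C) array.

    Assumes H and W are even.
    """
    n, h, w, c = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))

activations = celu(np.array([-1.0, 0.0, 2.0]))
feature_map = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
pooled = mean_pool_2x2(feature_map)  # shape (1, 2, 2, 1)
```

Mean pooling halves the height and width while keeping the number of channels, which is how the dimensionality shrinks from layer to layer.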

Case a) Results: Manual Back Prop with AMSGrad (MNIST dataset)

Left Image → Train Accuracy / Cost Over Time
Right Image → Test Accuracy / Cost Over Time

This method worked really well on the MNIST data set. For starters, the accuracy on test images easily surpassed 99 percent, and as seen below we were able to achieve 99.4 percent accuracy on the test images.

White Line → Best Accuracy for Test Images

With a 0.59 percent error, we can observe that the score is competitive enough to rank among other well-known published methods. Also, we were able to surpass the original score presented by the author.

Case b) Results: Manual Back Prop with AMSGrad (CIFAR10 dataset)

Left Image → Train Accuracy / Cost Over Time
Right Image → Test Accuracy / Cost Over Time

For the CIFAR10 data set we can observe that the test accuracy stagnated around 72 percent, indicating that the model is suffering from over-fitting. We were not able to surpass the benchmark we set earlier.

The final accuracy of the model was 72 percent on test images and 96 percent on training images.

Case c) Results: Dilated Back Prop with AMSGrad (CIFAR10 dataset)

Left Image → Train Accuracy / Cost Over Time
Right Image → Test Accuracy / Cost Over Time

With dilated back propagation it was a similar story. However, there was one interesting fact I found.

As seen above, the accuracy on test images is 71 percent (although we can observe that it was 72 percent right before the 100th epoch), while the accuracy on training images was 90 percent. Compared to the standard back propagation method we have similar accuracy on test images, but we still have more room to grow (improve accuracy) on the training images.

Interactive Code

For Google Colab, you need a Google account to view the code. Also, you can’t run read-only scripts in Google Colab, so make a copy in your own playground. Finally, I will never ask for permission to access your files on Google Drive, just FYI. Happy coding! Also, for transparency, I uploaded all of the training logs.

Final Words

Recently I have been covering very interesting papers/blogs related to regularization, and I really want to know more about this subject.

If any errors are found, please email me at jae.duk.seo@gmail.com. If you wish to see the list of all of my writing, please view my website here.

Reference

1. Nøkland, A. (2015). Improving Back-Propagation by Adding an Adversarial Gradient. Arxiv.org. Retrieved 9 May 2018, from https://arxiv.org/abs/1510.04189