Writing a deep learning repo #2
Picking up from where we left off, this is the second post in the series where I go through writing good (okay-ish quality) TensorFlow code for some quite common deep learning tasks.
For this post I have decided to go with Generative Adversarial Networks (GANs). I will walk through the entire design process, allowing for intermediate tinkering if someone wants to follow along and reproduce the code.
The first step is to set up the initial model for the Generative Adversarial Network, i.e. the high-level structure of the graph.
The init file sets up the learning rate and the embedding size, which is the dimensionality of the initial random vector the generator acts on. It also sets up the batch size and the sizes of the intermediate dimensions; the number of image channels is stored as dim_channel. Apart from that, we set up some auxiliary normalization functions, since normalization allows for faster training and a higher learning rate.
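As an illustration, the hyperparameters above can be collected in a small config object. The names and default values here are hypothetical placeholders, not necessarily the repo's actual ones:

```python
class GANConfig:
    """Hypothetical hyperparameter container; names and defaults are illustrative."""
    def __init__(self):
        self.learning_rate = 2e-4    # assumed value; a common choice for GAN training
        self.embedding_size = 100    # size of the random latent vector fed to the generator
        self.batch_size = 64
        self.dim_channel = 1         # number of image channels (1 for grayscale)
        self.image_size = 32         # spatial size of the generated images
```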
In the build-model step we set up the basic structure: we take a random initial value and use the generator to create an image from it. Following that, we run the discriminator over the generated image, and then the same discriminator over the real image. All good.
All we need to do now is set up the losses. For the discriminator, the loss comes from the expectation (mean) of the log-probability of an image being classified as real given it is real, added to the expectation of it being classified as fake given it is fake; in practice we minimize the negative of this sum.
Following that, the loss for the generator comes from the expectation (mean) of the log-probability that an image is seen as real given it is fake (i.e. created by the generator); again, we minimize the negative.
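The two losses can be sketched in NumPy as a framework-agnostic illustration (the repo itself computes these with TensorFlow ops):

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """d_real: D's predicted probability that real images are real.
    d_fake: D's predicted probability that generated images are real.
    eps avoids log(0)."""
    # discriminator wants d_real -> 1 and d_fake -> 0
    d_loss = -(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))
    # generator wants d_fake -> 1 (its fakes classified as real)
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss
```

When the discriminator is confident and correct, its loss is near zero while the generator's loss is large, which is exactly the adversarial tension described above.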
All said and done, if we had something simpler to mimic (like a plain distribution), this would already work. So the easy part (was it really?) is done.
Now we will go on to set up the generator network, which will actually create the images we want.
The idea is simple: we invert what we do for a classifier, starting from a low-dimensional latent (random) sample and using it to reconstruct an image through de-convolution (transposed convolution) layers.
Our choice of non-linear functions interspersed between the linear layers must be something that only scales or gates the signal, but does not reshape it; in other words, linear-like (adaptive or static) non-linearities. The idea is to use ReLU (rectified linear) units for this task, or Leaky ReLU units, since those additionally allow a small gradient to backpropagate for negative inputs.
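A minimal sketch of the two activations, in NumPy for illustration:

```python
import numpy as np

def relu(x):
    # passes positive values through unchanged, zeroes out negatives
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.2):
    # like ReLU, but negatives are scaled by alpha so a small gradient survives
    return np.where(x > 0, x, alpha * x)
```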
Therefore a network of 2 dense layers and 3 de-convolution layers works for images of sizes up to 32 x 32 x channels.
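To see why 3 de-convolution layers reach 32 x 32: with stride-2 transposed convolutions and 'same' padding, each layer doubles the spatial size. A quick sanity check (the starting size of 4 is an assumption about how the dense layers are reshaped, not taken from the repo):

```python
def deconv_out_size(in_size, stride=2):
    # 'same'-padded transposed convolution: output size = input size * stride
    return in_size * stride

size = 4                       # hypothetical spatial size after the dense layers
for _ in range(3):             # three stride-2 de-convolution layers
    size = deconv_out_size(size)
print(size)                    # -> 32
```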
We use the fast, GPU-friendly tf.layers module for the layers themselves. Every layer is normalized in order to avoid exploding activations as the image dimensions grow. We use the sigmoid function at the output to map activations into a valid pixel range; tanh could be used as an alternative. With the sigmoid, large negative pre-activations saturate near 0 and turn pixels black, which makes it easy for the network to switch pixels off, while the smooth gradient around 0 lets backpropagation nudge pixel intensities in the right direction without overshooting. This makes the output simpler to control when optimizing for image quality.
Moreover, the generator network is well optimized to work with even larger images. Along with every layer, we concatenate the class information of the image we want to enforce; this conditioning allows the generator to create images of the right class.
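The class conditioning can be sketched as broadcasting a one-hot label over the spatial grid and concatenating it along the channel axis. A NumPy illustration (all shapes here are hypothetical, not the repo's):

```python
import numpy as np

batch, h, w, c, n_classes = 2, 8, 8, 16, 10
features = np.zeros((batch, h, w, c))          # intermediate feature maps
labels = np.eye(n_classes)[[3, 7]]             # one-hot class labels, shape (2, 10)
# tile each label over every spatial position, then join along the channel axis
label_maps = np.tile(labels[:, None, None, :], (1, h, w, 1))
conditioned = np.concatenate([features, label_maps], axis=-1)
print(conditioned.shape)                       # -> (2, 8, 8, 26)
```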
The next task, obviously, is to create the discriminator for this network. The idea is quite simple: we take roughly the inverse of the generator, but add another layer which reduces the final embedding to a single value, over which a softmax (equivalently, a sigmoid) models the probability that the image is real, producing the bit we want. This is slightly different from the original GAN paper, but it works well in practice.
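For a single output value, a two-way softmax over the logit paired with a fixed zero is mathematically identical to a sigmoid, which is why the two descriptions are interchangeable. A quick check:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_way_softmax_real_prob(z):
    # softmax over the logit pair [z, 0]; returns the probability of the "real" class
    e = np.exp(np.array([z, 0.0]))
    return e[0] / e.sum()
```

So reducing the embedding to one value and applying a sigmoid gives the same probability as a two-class softmax head.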
In implementation it is just the generator network in reverse order.
We again use the tf.layers module for implementation, alongside normalization layers to stabilize training. With this done, we can run the model, train it, and get good images.
Now for the easy part: training. We train over multiple epochs (around 600). The scheme is quite simple: we alternately train the discriminator and the generator. But not in a strict 1:1 ratio. We train the discriminator for a higher number of iterations than the generator, because the generator adapts faster to larger changes when the discriminator is better trained. A badly performing discriminator will never be able to train the generator well, since the generator's training signal is a function of how well the discriminator performs.
Also, initially the discriminator network is poorly calibrated and gives essentially random output. Therefore, at the start we run the discriminator for an even higher number of iterations, just like what was done in some of the later Wasserstein GAN papers (to be discussed in later blog posts).
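The alternating schedule can be sketched as follows; the ratio of 5 discriminator steps per generator step is a hypothetical choice for illustration, not necessarily the one used in the repo:

```python
def run_schedule(total_iters, d_steps_per_g=5):
    """Count how often each network is updated under the alternating scheme."""
    d_updates = g_updates = 0
    for i in range(total_iters):
        d_updates += 1                     # the discriminator trains every iteration
        if (i + 1) % d_steps_per_g == 0:
            g_updates += 1                 # the generator trains once per d_steps_per_g iterations
    return d_updates, g_updates

print(run_schedule(100))                   # -> (100, 20)
```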
We also print progress every few iterations and save images to monitor the results.
Results
This GAN model learns most of what it can by around the 50th epoch and only drifts into mode collapse later in training, even though training runs for more than 600 epochs.
References
- Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved Techniques for Training GANs. arXiv:1606.03498.
- The code is available at https://github.com/prannayk/thedeeplearning
Next up
In the coming weeks I will write blog posts on the WGAN and EBGAN variants, and another post on Variational Autoencoder GANs.
