The purpose of this post is to give a basic tutorial on Generative Adversarial Networks (GANs), proposed by Ian Goodfellow and his co-authors. The first part gives a brief introduction to the background of GANs and to a practical variant called Deep Convolutional Generative Adversarial Networks (DCGANs). The second part is a Python implementation of a DCGAN on a small letter database: notMNIST. The original code for this tutorial can be found on GitHub.

Part 1: Background and methodologies
A generative model is a type of model that can create new data similar to the data we feed into it. These models have gained a lot of interest in recent years. In 2014, Ian Goodfellow published a paper introducing the world to a stunning generative model: Generative Adversarial Networks, or GANs for short. The innovation behind this model is that two deep learning networks, a generative network and a discriminative network, fight with each other. Once the two networks reach an equilibrium point (a Nash equilibrium), we have a mature generative model.

The most famous analogy for GANs, used by many blogs, is the game between a money counterfeiter and a cop.

• Generative network: the counterfeiter wants to fool the cop, so that the cop can’t tell the difference between counterfeit money and real money
• Discriminative network: the cop wants to detect counterfeit money as well as possible

In his NIPS talk, Ian Goodfellow presents a mathematical way to describe this as a minimax game defined as:
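For reference, the zero-sum form of the game in Goodfellow's NIPS 2016 tutorial can be written as follows. The symbols come from that tutorial rather than from this post: D(x) is the discriminator's estimated probability that x is real, and G(z) is the generator's output for the noise vector z.

```latex
J^{(D)} = -\tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
          -\tfrac{1}{2}\,\mathbb{E}_{z}\big[\log\big(1 - D(G(z))\big)\big],
\qquad
J^{(G)} = -J^{(D)}
```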

J(D) is the loss function of the discriminator and J(G) is the loss function of the generator. Since this is a zero-sum game, the two losses should sum to zero. The equilibrium is a saddle point of the discriminator loss. The generator minimizes the log-probability of the discriminator being correct.
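To make the zero-sum relationship concrete, here is a minimal sketch that evaluates both losses on a handful of discriminator outputs. It assumes the cross-entropy form of J(D) from Goodfellow's tutorial (half the negative mean log-probability on real samples plus half on generated ones); the function names and the example probabilities are made up for illustration.

```python
import math

def discriminator_loss(d_real, d_fake):
    """J(D): cross-entropy of the discriminator on real and generated samples.

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -0.5 * real_term - 0.5 * fake_term

def generator_loss(d_real, d_fake):
    """J(G) in the zero-sum game is the negative of J(D)."""
    return -discriminator_loss(d_real, d_fake)

# Hypothetical discriminator outputs, made up for illustration.
d_real = [0.9, 0.8, 0.95]   # fairly confident that real samples are real
d_fake = [0.1, 0.3, 0.2]    # fairly confident that fake samples are fake

j_d = discriminator_loss(d_real, d_fake)
j_g = generator_loss(d_real, d_fake)
print(f"J(D) = {j_d:.4f}, J(G) = {j_g:.4f}, sum = {j_d + j_g:.4f}")
```

When the discriminator can do no better than guessing, so that every output is 0.5, J(D) equals log 2 (about 0.693).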

The data flow of the whole network is shown above. The generator produces fake data from random noise input. The real data comes from the training dataset, and we feed it into the discriminator together with the fake data. The discriminator outputs real/fake predictions, and because we also know the correct labels, we can calculate the discriminator's loss.
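The data flow described here can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_generator` is a hypothetical stand-in for the real generator network, the "real" data are placeholder numbers, and the label convention (1 = real, 0 = fake) is assumed.

```python
import random

NOISE_DIM = 100  # size of the random input vector used later in this post

def toy_generator(noise):
    """Stand-in for the generator network: maps a noise vector to one 'sample'.

    Hypothetical placeholder, not the real model: it just averages the noise.
    """
    return sum(noise) / len(noise)

def make_discriminator_batch(real_samples, rng):
    """Return (samples, labels) mixing real data with an equal number of fakes."""
    noise_batch = [[rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
                   for _ in real_samples]
    fake_samples = [toy_generator(z) for z in noise_batch]
    samples = list(real_samples) + fake_samples
    labels = [1] * len(real_samples) + [0] * len(fake_samples)  # 1 = real, 0 = fake
    return samples, labels

rng = random.Random(0)
real = [0.7, -0.2, 0.4, 0.9]          # placeholder "real" data points
samples, labels = make_discriminator_batch(real, rng)
print(len(samples), labels)
```

The discriminator is then trained on `samples` against `labels`, which is where its loss value comes from.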

Because two deep learning networks are coupled together, training them with backpropagation is difficult to stabilize. Deep Convolutional GAN (DCGAN) is one of the models that demonstrated how to build a practical GAN that is able to learn by itself how to synthesize new images.

Part 2: Application on letter generation
In this tutorial we will implement a DCGAN in Keras on a TensorFlow backend. The dataset we use is the notMNIST database. Because of limited computing power, we choose the small set, which contains 18,000 images in 10 classes: the letters A-J taken from different fonts. We first use the scikit-image (skimage) package to load each image as a NumPy matrix and convert its dimensions to `(28,28,1)`, which is the same as the MNIST data dimensions.
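The dimension conversion can be illustrated without any dependencies. In the sketch below, plain nested lists stand in for the NumPy matrix, and the image is a synthetic gradient rather than a real notMNIST file. Rescaling the pixels to [-1, 1] is an assumption on our part, chosen to match the range of a tanh output; the helper names are hypothetical.

```python
def shape_of(nested):
    """Return the dimensions of a regular nested list, e.g. (28, 28, 1)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

def to_model_input(image):
    """Turn a 2-D grayscale image (values 0..255) into an (H, W, 1) array in [-1, 1]."""
    return [[[pixel / 127.5 - 1.0] for pixel in row] for row in image]

# Placeholder 28x28 image: a horizontal gradient instead of a real notMNIST file.
image = [[(col * 255) // 27 for col in range(28)] for _ in range(28)]
x = to_model_input(image)
print(shape_of(image), "->", shape_of(x))
```

With NumPy arrays, the same step is usually a single reshape plus an arithmetic rescale.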

Let’s print some images to have a look:

We can see that the letters range from printed type to curly, handwritten styles. Let’s see if we can build a DCGAN to recreate these images.

First, we build the CNN for our generator:

The generator synthesizes fake images and is basically an inverse convolutional network. It takes 100 random inputs and eventually maps them to a [28,28,1] pixel array to match the notMNIST data shape. We begin by generating a dense 7x7 set of values, then run them through a handful of convolution layers of varying sizes and numbers of channels, and ultimately train using an Adam optimizer on a binary cross-entropy loss. We use a tanh activation on the output layer to map pixels to values between -1 and 1, and the LeakyReLU advanced activation in the hidden layers to help accelerate training.
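To see how 100 random inputs can end up as a [28,28,1] image, it helps to trace the tensor shapes through the generator. The sketch below is plain shape arithmetic, not the Keras model itself. Only the noise size, the 7x7 starting grid, and the output shape come from the text; the channel counts (128, 64, 32) and the use of two 2x upsampling steps are assumptions.

```python
def dense_then_reshape(noise_dim, height, width, channels):
    """A dense layer maps the noise vector to height*width*channels values,
    which are then reshaped into a small feature map.

    Returns the dense layer's parameter count (weights + biases) and the shape.
    """
    units = height * width * channels
    params = noise_dim * units + units
    return params, (height, width, channels)

def upsample(shape, factor=2):
    """Upsampling repeats rows and columns, multiplying height and width."""
    h, w, c = shape
    return (h * factor, w * factor, c)

def conv_same(shape, filters):
    """A stride-1 convolution with 'same' padding keeps height and width
    and sets the channel count to the number of filters."""
    h, w, _ = shape
    return (h, w, filters)

NOISE_DIM = 100                                            # from the text
params, shape = dense_then_reshape(NOISE_DIM, 7, 7, 128)   # 128 channels: assumed
trace = [("dense + reshape", shape)]
for filters in (64, 32):                                   # assumed channel counts
    shape = conv_same(upsample(shape), filters)
    trace.append((f"upsample + conv({filters})", shape))
shape = conv_same(shape, 1)          # final conv with tanh: one grayscale channel
trace.append(("conv(1) + tanh", shape))

for name, s in trace:
    print(f"{name:22s} {s}")
print("dense layer parameters:", params)
```

Two doublings take the 7x7 grid to 14x14 and then 28x28, which is why 7 is a convenient starting size for 28x28 images.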

Second is our discriminator:

We build a discriminator network that takes in [28,28,1] image arrays and decides whether they are real or fake using several convolutional layers, a dense layer, lots of dropout, and a one-element sigmoid output layer with the encoding [0, 0.5) = fake and (0.5, 1] = real. This is a relatively simple network. To make the adversarial network work well, we prefer the discriminator to have fewer parameters than the generator, which means our generator is a little more complex than our discriminator.
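The output encoding can be sketched as follows. The logits are made-up values, the function names are hypothetical, and treating an output of exactly 0.5 as fake is an arbitrary choice, since the encoding leaves that case open.

```python
import math

def sigmoid(logit):
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def decode(probability, threshold=0.5):
    """Interpret the single sigmoid output: above the threshold means 'real'.

    An output of exactly 0.5 is treated as 'fake' here (arbitrary choice).
    """
    return "real" if probability > threshold else "fake"

def binary_cross_entropy(label, probability):
    """Loss for one sample; label is 1 for real and 0 for fake."""
    return -(label * math.log(probability)
             + (1 - label) * math.log(1.0 - probability))

for logit in (-2.0, 0.3, 3.0):      # made-up pre-activation scores
    p = sigmoid(logit)
    print(f"logit={logit:+.1f}  p={p:.3f}  -> {decode(p)}  "
          f"loss if truly real: {binary_cross_entropy(1, p):.3f}")
```

The binary cross-entropy shrinks as the sigmoid output moves toward the correct label, which is the signal the discriminator is trained on.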

Now, we put both parts together to make our adversarial network:

The adversarial network is just the generator and the discriminator stacked together. The generator part tries to fool the discriminator and learns from its feedback at the same time.
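Stacking amounts to function composition: noise goes into the generator, and the generator's output goes straight into the discriminator. The toy sketch below uses scalar stand-ins for both networks (an affine generator and a logistic discriminator, both hypothetical) to show what the generator is optimized against.

```python
import math

def make_generator(scale, shift):
    """Toy generator: an affine map of scalar noise (placeholder for the CNN)."""
    return lambda z: scale * z + shift

def make_discriminator(weight, bias):
    """Toy discriminator: logistic regression on a scalar sample."""
    return lambda x: 1.0 / (1.0 + math.exp(-(weight * x + bias)))

def stack(generator, discriminator):
    """The adversarial network: noise goes in, the discriminator's verdict comes out."""
    return lambda z: discriminator(generator(z))

generator = make_generator(scale=0.5, shift=-1.0)
discriminator = make_discriminator(weight=2.0, bias=0.0)
adversarial = stack(generator, discriminator)

z = 0.4
p_real = adversarial(z)                 # D(G(z))
# When training the generator, the target label is 1 ("real"):
# the loss is small only if the discriminator is fooled.
generator_loss = -math.log(p_real)
print(f"D(G(z)) = {p_real:.3f}, generator loss = {generator_loss:.3f}")
```

Training the stacked model against "real" labels with binary cross-entropy gives this -log D(G(z)) loss, a commonly used variant rather than the strict zero-sum J(G).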

Training process:

The hardest part of working with GANs is the training process. Several experiments pre-train the discriminator to accelerate training. In our implementation, however, we found that if the discriminator is much stronger than the generator, the latter stops learning. So instead of using a larger learning rate for the discriminator, in each epoch we first train the discriminator, then freeze its parameters and train the adversarial network.
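The alternating schedule can be sketched on a toy one-dimensional problem with hand-written gradients. Everything here is an assumption made for illustration: the generator is x = a*z + b, the discriminator is a logistic regression on x, the "real" data are drawn from a Gaussian with mean 3, and the learning rate and batch size are arbitrary. It is not the Keras training code from this post.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_discriminator(d, g, real_batch, noise_batch, lr):
    """One SGD pass over real (label 1) and generated (label 0) samples."""
    fakes = [g["a"] * z + g["b"] for z in noise_batch]
    for x, y in [(x, 1.0) for x in real_batch] + [(x, 0.0) for x in fakes]:
        err = sigmoid(d["w"] * x + d["c"]) - y      # d(BCE)/d(logit)
        d["w"] -= lr * err * x
        d["c"] -= lr * err

def train_generator(d, g, noise_batch, lr):
    """One SGD pass on the stacked model with target 'real'.

    The discriminator parameters in d are only read, never updated (frozen).
    """
    for z in noise_batch:
        x = g["a"] * z + g["b"]
        err = sigmoid(d["w"] * x + d["c"]) - 1.0    # target label is 1
        g["a"] -= lr * err * d["w"] * z             # chain rule through D into G
        g["b"] -= lr * err * d["w"]

rng = random.Random(0)
d = {"w": 0.1, "c": 0.0}        # discriminator parameters
g = {"a": 1.0, "b": 0.0}        # generator parameters
for epoch in range(50):
    real = [rng.gauss(3.0, 0.5) for _ in range(16)]     # toy "real" data
    noise = [rng.gauss(0.0, 1.0) for _ in range(16)]
    train_discriminator(d, g, real, noise, lr=0.05)     # step 1: update D only
    train_generator(d, g, noise, lr=0.05)               # step 2: D frozen, update G
print(f"generator shift b = {g['b']:.2f} (real data mean is 3.0)")
```

The key point is the second step: the gradient flows through the discriminator to reach the generator, but only the generator's parameters change.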

Here are some intermediate results, followed by the results after 1750 epochs:

We can see that in the first 250 epochs the generator can only create noise. As the loss value decreases, our generator starts learning from the discriminator's feedback, and at 1750 epochs we can finally see some outlines of the letters “B”, “E” and “F”. With limited computing power, we cannot go on to larger epoch counts (each epoch takes 20 seconds). Below are more sample results generated by our network.

Another observation from the final results is that some images are just white paper with some shadow. We double-checked the original dataset and found that some letters are written in white on a dark background. If the generator learns to generate these kinds of images, it will produce the cases we are concerned about.

References

Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015).

Goodfellow, Ian, et al. “Generative adversarial nets.” Advances in neural information processing systems. 2014.