GANs for Diverse and Limited Data — DeLiGAN
Introduction
Generative Adversarial Networks (GANs) have been around since Ian Goodfellow introduced them in a 2014 paper. Since then, GANs have undergone many modifications to handle a multitude of tasks, ranging from generating realistic images of a given class to text-to-image translation. As Yann LeCun put it:
> "Generative Adversarial Networks is the most interesting idea in the last 10 years in machine learning."
The amount of data available for a particular class is often very limited, so generating realistic images from a small and diverse dataset with GANs is ineffective. Image categories also tend to have complex underlying distributions, mainly due to two factors:
- Level of detail: colour photos, for instance, carry far more detail than binary handwritten digits
- Intra-class diversity: images of very different cars, for instance, all fall under a single class category
Hence GANs need to be deep to learn such complex distributions, which in turn demands a large amount of training data. Researchers at the Indian Institute of Science, Bangalore have come up with a new GAN variant, DeLiGAN, which is well suited to small yet diverse datasets. In addition, DeLiGAN generates diverse images across different modalities: MNIST, CIFAR-10, and freehand sketches.
Overview of Vanilla-GAN
A typical GAN consists of two components, a Generator and a Discriminator, both usually neural networks. The Generator is trained to transform a random vector (typically noise from a normal or uniform distribution) into a realistic image. The Discriminator's job is to distinguish real dataset images from fake images produced by the Generator, while the Generator's job is to fool the Discriminator into classifying the generated images as real.

Now diving into a bit of maths!
The discriminator output takes values between 0 and 1, where 0 means a fake image and 1 a real image. The generator wants to maximise the score D(G(z)), where z is random noise sampled from the normal distribution N(0, 1). Hence, from the generator's point of view, the following optimisation problem needs to be solved during training:
\min_G \; \mathbb{E}_{z \sim p(z)} \big[ \log\big(1 - D(G(z))\big) \big]
Here p(z) is the distribution from which z is sampled. The minimum is achieved when D(G(z)) tends to 1. Similarly, from the discriminator's point of view, the optimisation problem becomes:
\max_D \; \mathbb{E}_{x \sim p_{data}(x)} \big[ \log D(x) \big] + \mathbb{E}_{z \sim p(z)} \big[ \log\big(1 - D(G(z))\big) \big]
Here we hold the generator fixed and train only the discriminator. The maximum is achieved when D(G(z)) = 0 and D(x) tends to 1, where x is a sample from the original dataset. This leads to a cat-and-mouse game in which each component tries to outdo the other. Eventually the generator produces images so realistic that the discriminator output D(G(z)) settles at 0.5, meaning it can no longer tell real from fake.
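The two objectives above can be sketched numerically. This is a minimal NumPy illustration (not the paper's code) that evaluates both objectives given hypothetical discriminator scores:

```python
import numpy as np

# d_real: discriminator scores D(x) on real samples,
# d_fake: scores D(G(z)) on generated samples, all in (0, 1).

def discriminator_objective(d_real, d_fake):
    """max_D  E[log D(x)] + E[log(1 - D(G(z)))]"""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def generator_objective(d_fake):
    """min_G  E[log(1 - D(G(z)))] -- lower is better for G."""
    return np.mean(np.log(1.0 - d_fake))

# A confident discriminator scores high on this objective; one that is
# fooled (all scores near 0.5) scores much lower.
confident = discriminator_objective(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
fooled = discriminator_objective(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

Note how the generator's objective improves (decreases) as D(G(z)) approaches 1, exactly as described above.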
So what’s new in DeLiGAN?
In a GAN, the Generator tries to model the underlying complex distribution with a random noise vector as input. This works well when a large amount of data is available. With a small yet diverse dataset, however, the underlying distribution is difficult to model even with a deeper network. DeLiGAN's solution is to increase the modelling power of the prior distribution: instead of sampling the Generator input from a single normal distribution, it samples from a mixture of N Gaussian components with means μ_i and standard deviations σ_i. The latent-space distribution (from which samples are drawn for the Generator) then becomes:
p(z) = \sum_{i=1}^{N} \frac{1}{N} \, \mathcal{N}\big(z \mid \mu_i, \sigma_i^2\big)
where we have N Gaussians to sample from, each with equal weight 1/N.
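As a sketch, the mixture density can be evaluated in a few lines of NumPy (a hypothetical 1-D example, with illustrative means and spreads):

```python
import numpy as np

def gaussian_pdf(z, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at z.
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(z, mus, sigmas):
    """p(z) = (1/N) * sum_i N(z | mu_i, sigma_i^2), uniform weights."""
    return sum(gaussian_pdf(z, m, s) for m, s in zip(mus, sigmas)) / len(mus)

# Two well-separated components produce a bimodal latent density.
grid = np.linspace(-10.0, 10.0, 20001)
density = mixture_pdf(grid, [-2.0, 2.0], [0.5, 0.5])
```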
Using the reparametrization trick, a sample z drawn from this mixture can be written as
z = \mu_i + \sigma_i \epsilon, \qquad \epsilon \sim \mathcal{N}(0, 1)
where ϵ is a sample from the standard normal distribution and i indexes the chosen mixture component.
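This sampling step can be sketched as follows (a NumPy illustration with hypothetical μ and σ values; in DeLiGAN these are learnable parameters, not fixed constants):

```python
import numpy as np

def sample_latent(mus, sigmas, rng):
    k = rng.integers(len(mus))            # pick a component, weight 1/N each
    eps = rng.standard_normal(mus.shape[1])
    return mus[k] + sigmas[k] * eps       # reparametrized: differentiable in mu_k, sigma_k

rng = np.random.default_rng(0)
mus = np.array([[-2.0, 0.0], [2.0, 0.0]])   # N=2 components in a 2-D latent space
sigmas = np.array([0.2, 0.2])
samples = np.array([sample_latent(mus, sigmas, rng) for _ in range(2000)])
```

Because z is an affine function of ϵ, gradients flow back into μ_k and σ_k during training.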
In the standard case, the GAN had to maximise the probability that generated samples belong to the data distribution (the left-hand side of the equation below):
\mathbb{E}_{z \sim p(z)} \big[ p_{data}\big(G(z)\big) \big] = \int p_{data}\big(G(z)\big) \, p(z) \, dz
In DeLiGAN, using the last three equations, this maximisation objective can be rewritten as:
\mathbb{E}_{z \sim p(z)} \big[ p_{data}\big(G(z)\big) \big] = \frac{1}{N} \sum_{i=1}^{N} \int p_{data}\big(G(\mu_i + \sigma_i \epsilon)\big) \, \mathcal{N}(\epsilon \mid 0, 1) \, d\epsilon
Here both μ_i and σ_i are learnable parameters. An important issue arises: the objective has local maxima at the μ_i, so the generator tends to shrink σ_i in order to draw more samples from the high-probability regions. This can cause σ_i to collapse to 0. Hence we add an L2 regularizer to the objective function to get:
\max_{G, \mu, \sigma} \; \frac{1}{N} \sum_{i=1}^{N} \int p_{data}\big(G(\mu_i + \sigma_i \epsilon)\big) \, \mathcal{N}(\epsilon \mid 0, 1) \, d\epsilon \;-\; \lambda \sum_{i=1}^{N} \frac{(1 - \sigma_i)^2}{N}
The regularizer keeps each σ_i from collapsing to 0 by pulling it towards 1; any drift away from 1 incurs a penalty. Here λ is a hyperparameter. The rest is the same as the earlier model. The following image captures the difference between the vanilla-GAN and DeLiGAN architectures:
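The penalty term is cheap to compute. A minimal sketch (illustrative σ values, not from the paper):

```python
import numpy as np

def sigma_penalty(sigmas, lam):
    # lambda * sum_i (1 - sigma_i)^2 / N: pulls every sigma_i towards 1,
    # so collapsing towards 0 is heavily penalised.
    return lam * np.mean((1.0 - np.asarray(sigmas)) ** 2)

collapsed = sigma_penalty([0.01, 0.02], lam=1.0)  # sigmas near 0: large penalty
healthy = sigma_penalty([0.95, 1.05], lam=1.0)    # sigmas near 1: tiny penalty
```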

Experiments
1. MNIST
To mimic the low-training-data setting, only 500 of the 60,000 MNIST handwritten digit images are used. The generator network has a fully connected layer followed by 3 deconvolution layers, while the discriminator network has 3 convolution layers followed by a mini-batch discrimination layer.

The DeLiGAN images are visibly crisper. In addition, as highlighted by the green boxes, some of the images generated by the vanilla GAN are almost identical, whereas DeLiGAN generates more diverse samples, as seen in the right image. Another big difference is that some of the GAN-generated images are deformed and don't resemble any digit. This is because the vanilla GAN diverges frequently during training due to the low data availability, resulting in deformed digits. DeLiGAN, on the other hand, trains stably and generates samples with more diversity.
Similar results were obtained on the CIFAR-10 dataset. On CIFAR-10, the vanilla GAN outperforms DeLiGAN on some categories such as dogs and cats, mainly because images in these categories are largely similar to one another, and such images are better represented in the data.
2. Toy data
The data distribution is bimodal, as seen clearly in the following image:

The comparison is made against the vanilla GAN and four variants: GAN++ (a fully connected layer of N neurons inserted between the input z and the generator), Ensemble-GAN (an ensemble of N generators, one of which, G(i), is chosen at random for each training step), Nx-GAN (the number of parameters in the generator network increased N times), and MoE-GAN.
The results clearly show that the vanilla GAN can't model the void between the two modes, whereas DeLiGAN can. All the other GANs can model the two modes, but they fail to properly capture the local structure within the Gaussians. DeLiGAN's generations are the most convincing, modelling both the inter-mode and intra-mode structure aptly.
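For readers who want to reproduce this setting, a bimodal toy distribution of this kind is easy to synthesize. A hypothetical recreation (the exact mode locations and spreads in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
modes = rng.choice([-1.0, 1.0], size=n)           # pick one of two modes uniformly
toy_data = modes + 0.1 * rng.standard_normal(n)   # small Gaussian spread per mode
```

A vanilla GAN trained on such data tends to smear samples across the empty region between the modes, which is exactly the failure the experiment probes.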
3. Freehand Sketches
The TU-Berlin dataset contains 20,000 hand-drawn sketches evenly distributed among 250 object categories, i.e. about 80 images per category, a genuinely limited-data scenario. The inception score is calculated on images from 5 dissimilar categories, namely wineglass, candle, apple, canoe, and cup, and compared with the vanilla GAN.

Again, DeLiGAN clearly outperforms the vanilla GAN in all categories. As in the CIFAR-10 case, if images from similar categories were taken, the vanilla GAN might outperform DeLiGAN, since it models similar images better.
Conclusion
We conclude that DeLiGAN has the following advantages over vanilla GAN:
- By using a Gaussian mixture model for the latent space, DeLiGAN can model complex distributions even when the dataset is small and diverse.
- Thanks to this reparametrization of the latent space, DeLiGAN efficiently produces more diverse and visibly crisper samples than the vanilla GAN.
- DeLiGAN training is more stable than that of other GAN variants.
References
[1] Gurumurthy, Swaminathan; Sarvadevabhatla, Ravi Kiran; Venkatesh Babu, R. DeLiGAN: Generative Adversarial Networks for Diverse and Limited Data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 166–174.