Improving GAN training process — GAN of the week

Alexander Osipenko
Published in Cindicator
Aug 29, 2018 · 2 min read

GAN of the Week is a series of notes about Generative Models, including GANs and Autoencoders. Every week I’ll review a new model to help you keep up with these rapidly developing types of Neural Networks.

This week's GAN of the Week is the Wasserstein GAN.

Wasserstein GAN (WGAN) is an alternative to vanilla GAN training. It avoids many of the problems of the traditional GAN, such as mode collapse.

Why is traditional GAN not great?

As we discussed in Introduction to the GAN of the week, a GAN consists of two neural networks: a Generator and a Discriminator. During training, the Generator creates data similar to the initial dataset, while the Discriminator learns to distinguish real data from fake, generated data. In a perfect world, both become more and more accurate as training progresses and eventually reach a Nash equilibrium.
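To make the setup concrete, here is a minimal sketch of one vanilla GAN training step. PyTorch is assumed, and the architectures, sizes, and optimizer settings are illustrative placeholders rather than anything from the post.

```python
# Minimal sketch of one vanilla GAN training step (PyTorch assumed).
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # illustrative sizes
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    batch_size = real_batch.size(0)
    # Discriminator: push real samples towards label 1, generated samples towards 0.
    fake = generator(torch.randn(batch_size, latent_dim)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(batch_size, 1)) + \
             bce(discriminator(fake), torch.zeros(batch_size, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the Discriminator output 1 for generated samples.
    g_loss = bce(discriminator(generator(torch.randn(batch_size, latent_dim))),
                 torch.ones(batch_size, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```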

But in practice, unfortunately, this ideal scenario is not always reached.

Along with that, there is the mode collapse problem: the Generator collapses and produces essentially the same output on every iteration, and because the Generator is not being trained properly, the Discriminator also stops providing useful results.

Example of mode collapse from the original paper

Why is WGAN better?

Vanilla GAN uses the Jensen–Shannon divergence (JS) between the generated distribution and the target data distribution. This metric fails to provide a meaningful signal when the two distributions are disjoint, as the short numeric check below illustrates. In contrast, WGAN uses the Wasserstein distance as the measure of distance between two probability distributions.
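As a quick illustration of that failure mode (not from the post), the snippet below compares the two metrics for two disjoint 1-D point masses, one at 0 and one at theta: the JS divergence stays constant no matter how far apart they are, while the Wasserstein distance tracks the gap. NumPy and SciPy are assumed.

```python
# Compare JS divergence and Wasserstein distance for two disjoint
# 1-D point masses located at 0 and at theta (illustrative example).
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

support = np.linspace(-5, 5, 1001)  # common discretized support

def point_mass(center):
    """Probability vector with all mass on the grid point nearest `center`."""
    p = np.zeros_like(support)
    p[np.argmin(np.abs(support - center))] = 1.0
    return p

for theta in (0.5, 1.0, 2.0, 4.0):
    p, q = point_mass(0.0), point_mass(theta)
    js = jensenshannon(p, q, base=2) ** 2             # JS divergence, in bits
    w = wasserstein_distance(support, support, p, q)  # earth mover's distance
    print(f"theta={theta:>4}: JS={js:.3f} (constant), Wasserstein={w:.3f} (tracks theta)")
```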

When two probability distributions are fully disjoint, the JS value stays constant no matter how far apart they are, so it gives no useful gradient; the Wasserstein metric, on the other hand, varies smoothly with the distance between the distributions, which allows a stable learning process with gradient descent. In WGAN, the Wasserstein distance is also used directly as the loss function.

Loss over iterations from the original paper
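For concreteness, here is a rough sketch of the critic and generator updates that use this loss, assuming PyTorch. The tiny MLP architectures are placeholders; the hyperparameters (RMSProp with lr=5e-5, weight clipping to [-0.01, 0.01], 5 critic steps per generator step) follow the defaults of the original paper's algorithm.

```python
# Rough WGAN training sketch (PyTorch assumed; architectures are placeholders).
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # illustrative sizes
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
critic = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))  # no sigmoid

clip_value, n_critic = 0.01, 5
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

def wgan_step(real_batches):
    """One WGAN step; `real_batches` is a list of at least n_critic real-data tensors."""
    # Critic: maximise E[critic(real)] - E[critic(fake)] by minimising its negative,
    # running several critic updates per generator update.
    for real in real_batches[:n_critic]:
        fake = generator(torch.randn(real.size(0), latent_dim)).detach()
        c_loss = critic(fake).mean() - critic(real).mean()
        opt_c.zero_grad(); c_loss.backward(); opt_c.step()
        # Crude Lipschitz constraint from the original paper: clip critic weights.
        for p in critic.parameters():
            p.data.clamp_(-clip_value, clip_value)

    # Generator: minimise -E[critic(fake)], i.e. push critic scores of fakes up.
    g_loss = -critic(generator(torch.randn(real_batches[0].size(0), latent_dim))).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return c_loss.item(), g_loss.item()
```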

Results:

The authors ran WGAN on the LSUN Bedrooms dataset. The results can be found in the authors' GitHub repository.

Examples of the results from the original paper. Left: WGAN algorithm. Right: standard GAN.

Have you tried training a WGAN?
