GAN — CGAN & InfoGAN (using labels to improve GAN)

Jonathan Hui
Jun 3, 2018 · 5 min read

In discriminative models, like classification, we handcraft features to make models perform better. This practice is not needed if the model has enough capacity and knows how to learn those features itself. In GANs, however, training is non-trivial, so the model can take extra help from labels to perform better.

In CGAN (Conditional GAN), labels act as an extension to the latent space z to generate and discriminate images better. The top figure below shows the regular GAN, and the bottom one adds labels to the generator and the discriminator to train both networks better.

[Figure: the regular GAN (top) vs. CGAN with labels added to the generator and the discriminator (bottom)]

The whole mechanism is still not fully understood. The labels may give the GAN a significant head start on what to look for. Another possibility is that our visual system is biased and more sensitive to these labels, so the generated images are perceived to be better. As part of a GAN series, this article studies how to improve GAN performance with labels.

CGAN (Conditional GAN)

In GAN, there is no control over the modes of the data being generated. The conditional GAN changes that by adding the label y as an additional input to the generator, in the hope that images of the corresponding class are generated. We also add the label to the discriminator input so it can distinguish real images better.

[Figure: the CGAN generator and discriminator, each taking the label y as an additional input]

In MNIST, we sample the label y from a uniform distribution to generate a number from 0 to 9. We encode this value as a one-hot vector; for example, the value 3 is encoded as (0, 0, 0, 1, 0, 0, 0, 0, 0, 0). We feed this vector and the noise z to the generator to create an image that resembles “3”. For the discriminator, we append the corresponding label, again as a one-hot vector, to its input.
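As a rough sketch (the array names are mine, and I assume the label is simply concatenated to z and to the flattened image, as is common for MNIST), the conditional inputs can be built like this:

import numpy as np

batch_size, z_dim, n_classes = 64, 100, 10

# Sample the label y uniformly from 0..9 and one-hot encode it.
y = np.random.randint(0, n_classes, size=batch_size)        # e.g. 3
y_onehot = np.eye(n_classes, dtype=np.float32)[y]           # (0, 0, 0, 1, 0, ...)

# Sample the noise z and append the label: this is the generator input.
z = np.random.uniform(-1.0, 1.0, size=(batch_size, z_dim)).astype(np.float32)
g_input = np.concatenate([z, y_onehot], axis=1)

# For the discriminator, append the same one-hot label to the flattened image
# (a random stand-in here for a real or generated MNIST batch).
x = np.random.rand(batch_size, 28 * 28).astype(np.float32)
d_input = np.concatenate([x, y_onehot], axis=1)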

The cost function for CGAN is the same as that of the regular GAN:

min_G max_D V(D, G) = E_{x∼p_data}[log D(x|y)] + E_{z∼p_z}[log(1 − D(G(z|y)|y))]

D(x|y) and G(z|y) indicate that we are discriminating and generating an image given a label y. (They are the same as D(x, y) and G(z, y) in other diagrams.) Here is the data flow for CGAN.

[Figure: CGAN data flow. The label y is fed to both the generator and the discriminator]
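In code, the conditioning does not change the losses themselves; only what the networks see changes. A minimal TensorFlow 1.x sketch (the placeholders stand in for the logits a real conditioned discriminator would produce):

import tensorflow as tf

# Pre-sigmoid outputs of the label-conditioned discriminator.
d_logits_real = tf.placeholder(tf.float32, [None, 1])   # D(x | y)
d_logits_fake = tf.placeholder(tf.float32, [None, 1])   # D(G(z | y) | y)

# Standard GAN losses: the discriminator labels real as 1 and fake as 0,
# while the generator tries to make the fake logits look real.
d_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_real, labels=tf.ones_like(d_logits_real)) +
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_fake, labels=tf.zeros_like(d_logits_fake)))
g_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_fake, labels=tf.ones_like(d_logits_fake)))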

In CGAN, we can expand the mechanism to include other labels that the training dataset may provide. For example, if the stroke size of the digits is known, we can sample it from a normal distribution and feed it to the generator as well (see the sketch after the figure below).

[Figure: CGAN with an additional stroke-size input to the generator]
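Continuing the earlier NumPy sketch (z and y_onehot as built above; stroke is a hypothetical name for the extra attribute), the attribute is simply appended to the generator input:

import numpy as np

# Hypothetical continuous attribute (e.g. stroke size), one value per image.
stroke = np.random.normal(0.0, 1.0, size=(64, 1)).astype(np.float32)

# Extend the generator input: noise z, one-hot label y, plus the attribute.
g_input = np.concatenate([z, y_onehot, stroke], axis=1)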

InfoGAN

The labels y in CGAN are provided in the dataset. Alternatively, we can let the discriminator extract these latent features itself. In the example below, we sample a single feature c from a uniform distribution and convert it to a one-hot vector. The generator then uses this vector and z to generate an image.

[Figure: the InfoGAN generator takes the noise z and the sampled latent code c]
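A minimal sketch of that sampling (no dataset labels are involved; the names are illustrative):

import numpy as np

batch_size, z_dim, n_codes = 64, 100, 10

# InfoGAN: the code c is sampled, not read from dataset labels.
c = np.random.randint(0, n_codes, size=batch_size)
c_onehot = np.eye(n_codes, dtype=np.float32)[c]

z = np.random.uniform(-1.0, 1.0, size=(batch_size, z_dim)).astype(np.float32)
g_input = np.concatenate([z, c_onehot], axis=1)   # the generator sees (z, c)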

When we feed images into the discriminator, it outputs D(x) plus an additional output: a probability distribution Q(c|x) (the distribution over c given the image x). For example, given a generated image resembling the digit “3”, Q may be estimated as (0.1, 0, 0, 0.8, …), meaning a 0.1 chance that the image is a “0” and a 0.8 chance that it is a “3”.

[Figure: the InfoGAN discriminator outputs D(x) and the distribution Q(c|x)]
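One common way to implement this is a shared network body with two heads, one for D(x) and one for Q(c|x). A TensorFlow 1.x sketch (layer sizes and names are illustrative, not taken from the article):

import tensorflow as tf

def discriminator_with_q(x, n_codes=10):
    # Shared trunk over the (flattened) image.
    h = tf.layers.dense(tf.layers.flatten(x), 256, activation=tf.nn.leaky_relu)
    d_logit = tf.layers.dense(h, 1)           # real/fake score -> D(x)
    q_logits = tf.layers.dense(h, n_codes)    # logits for the code c
    q_c_given_x = tf.nn.softmax(q_logits)     # e.g. (0.1, 0, 0, 0.8, ...)
    return d_logit, q_c_given_x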

We subtract an extra term, the mutual information I(c; x) between the code c and the generated image x, from the regular GAN cost function to form our new cost function.

min_G max_D V_I(D, G) = V(D, G) − λ I(c; G(z, c))

I (mutual information) measures how much we learn about c from observing x. I(c; x) equals 0 if the image x and the estimated code c are completely independent. Conversely, if the discriminator can predict c correctly from the image, I will be high and will reduce the InfoGAN cost.

Without proof, the mutual information I can be estimated using entropy: we use Q(c|x) and P(c) to establish a lower bound for I.

I(c; G(z, c)) ≥ E_{c∼P(c), x∼G(z,c)}[log Q(c|x)] + H(c)

where H stands for entropy. As Q improves, the bound becomes tight and approaches I. The concept of mutual information may take time to settle in, but it is fairly simple to code.

import tensorflow as tf

# If the image is a "3":
#   p_c = P(c) = [0, 0, 0, 1.0, 0, 0, 0, 0, 0, 0]
# Cross entropy - ∑ P(c) * log Q(c|x), the negative of the bound's first term.
cross_H_p_q = tf.reduce_mean(
    -tf.reduce_sum(p_c * tf.log(Q_c_given_x + 1e-8), axis=1))
# The entropy of c: H(c) = - ∑ P(c) * log P(c)
H_c = tf.reduce_mean(-tf.reduce_sum(p_c * tf.log(p_c + 1e-8), axis=1))
# Maximizing the lower bound of I amounts to minimizing this loss:
mi_loss = cross_H_p_q - H_c
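In a typical implementation, this mutual-information loss is scaled by a weight λ (as in the InfoGAN paper) and added to both the generator's and Q's objectives, so the generator is pushed to produce images from which c can be recovered.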

Further readings

If you want to learn more about GANs:

Or if you want to know why it is so hard to train GANs:

Credits & Reference

Conditional GAN paper

InfoGAN paper
