Intro to Generative Adversarial Networks
In general, generative networks are unsupervised learning techniques that seek to learn the distribution of some data (e.g. words in a corpus or pixels in images of cats).
Briefly, GANs consist of two networks with opposing objectives, seeking equilibrium in a sort of game being played between them.
- The “Generator” transforms some input that is sampled from what is called the “latent space” into the “output space” which contains what we desire to generate.
- The “Discriminator” is simply a classifier that receives both real images and outputs from the Generator, and is trained to determine whether the input it observes is synthetically generated or real.
The idea is that when both networks are performing optimally, the Generator creates images that are distributed within their respective output space in the same way that real inputs to the Discriminator are.
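To make the two roles concrete, here is a minimal sketch of the pieces described above, using single-layer NumPy "networks" and hypothetical dimensions (a 100-dimensional latent space and a 784-dimensional output space, e.g. flattened 28x28 images). Real Generators and Discriminators would be deep networks; this only illustrates the shapes and the flow latent space → Generator → output space → Discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 100-d latent space and a 784-d output space (28x28 images).
LATENT_DIM, OUTPUT_DIM = 100, 784

# One-layer "Generator": maps a latent vector z into the output space.
W_g = rng.normal(scale=0.01, size=(LATENT_DIM, OUTPUT_DIM))

def generator(z):
    # tanh keeps outputs in [-1, 1], like pixel values normalized to that range
    return np.tanh(z @ W_g)

# One-layer "Discriminator": maps an output-space vector to a probability.
W_d = rng.normal(scale=0.01, size=(OUTPUT_DIM, 1))

def discriminator(x):
    # sigmoid output: estimated probability that the input is real
    return 1.0 / (1.0 + np.exp(-(x @ W_d)))

z = rng.normal(size=(1, LATENT_DIM))   # sample a point from the latent space
fake = generator(z)                    # a generated "image" in the output space
p_real = discriminator(fake)[0, 0]     # Discriminator's verdict on it
```

The same `discriminator` would also be fed real images; training pushes its output toward 1 on those and toward 0 on `generator` outputs.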
Let’s now define an optimization function for the GAN. The Discriminator tries to maximize its probability of classifying an image correctly as real or fake. The Generator, on the other hand, tries to minimize the Discriminator’s chances of classifying its outputs correctly.
Mathematically, we define our objective function as

min_G max_D V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]

where x is sampled from the real data distribution and z from the latent space. We’ll call E[log D(x)] Term 1 and E[log(1 - D(G(z)))] Term 2.
D(x) represents the probability that the input image is real. The Discriminator therefore wants D(x), and hence log(D(x)), to be large for real images, so it seeks to maximize Term 1.
The Generator has to maximize the chances of the Discriminator being fooled by the generated images, which means it should maximize D(G(z)). Equivalently, it should minimize (1 - D(G(z))), and hence minimize log(1 - D(G(z))), i.e. Term 2.
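The two terms above can be computed directly from the Discriminator's outputs. The sketch below uses hypothetical probabilities for a small batch of real and generated images to show which term each network cares about; in practice these numbers would come from a trained Discriminator.

```python
import numpy as np

# Hypothetical Discriminator outputs on a batch of real and generated images.
d_real = np.array([0.90, 0.80, 0.95])  # D(x): Discriminator wants these near 1
d_fake = np.array([0.10, 0.20, 0.05])  # D(G(z)): Discriminator wants these near 0

# The Discriminator maximizes both terms of V(D, G).
term1 = np.mean(np.log(d_real))        # E[log D(x)]
term2 = np.mean(np.log(1.0 - d_fake))  # E[log(1 - D(G(z)))]
value = term1 + term2

# The Generator only influences Term 2, and tries to minimize it
# (i.e. push D(G(z)) toward 1).
gen_objective = term2
```

Note that both terms are always at most 0 (log of a probability), so "maximizing" them means pushing them toward 0.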
Some popular adversarial network architectures are:
a. Conditional GANs, that learn the distribution of output images given paired inputs for applications such as image-to-image translation.
b. Deep Convolutional GANs, which use convolutional layers in both networks to generate realistic images.
c. Self-Attention GANs, which help the model focus on specific regions of the image and address the problem of modeling long-range dependencies across large areas.
Read my next blog for a detailed explanation about these architectures and how I trained these models on the CIFAR-10 dataset.