Generative Adversarial Networks: What They Are and What They Can Create

Justin Tennenbaum
Sep 24, 2019


Most of the statistics and machine learning algorithms I have encountered work toward predicting one thing: Y. They take in a set of features, X, and apply various techniques to predict an outcome. These are called discriminative algorithms, and their goal can be symbolized by the function P(Y|X). In classification, where we try to predict a class given X, discriminative algorithms can be thought of as mapping features to labels: they model the boundaries between classes so they can classify correctly.
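As a concrete example, here is a minimal sketch of a discriminative model in Python: logistic regression estimates P(Y|X) directly from labeled features. The toy data is invented purely for illustration.

```python
# A minimal sketch of a discriminative model: logistic regression
# estimates P(Y|X) directly from labeled features (toy data).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.1, 1.2], [0.8, 0.4], [1.5, 0.2], [0.2, 1.8]])  # features
y = np.array([0, 1, 1, 0])                                       # labels

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[0.9, 0.5]]))  # estimated P(Y|X) for a new point
```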

Generative models are the opposite of discriminative models: they try to model the distribution of the features X for each distinct class Y. Given a class Y, they can create features X that represent that class; instead of modeling P(Y|X), their goal is to model P(X|Y). This opens up an entirely new area of data science and gives machines the potential to create and expand rather than just reduce everything to a single prediction. First introduced in 2014 by Ian Goodfellow, Generative Adversarial Networks (GANs) are a unique type of model that pits two neural networks against one another to craft lifelike creations. The first network, called "the generator," generates new inputs such as images or music, while the second, called "the discriminator," examines those inputs and determines whether they are real or created by the generator.
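To make the contrast concrete, here is a minimal sketch of the generative view: fit a simple per-class Gaussian model of P(X|Y), then sample brand-new features for a chosen class. The Gaussian assumption and toy data are my own simplifications; real generative models learn far richer distributions.

```python
# A minimal sketch of P(X|Y): model each class's features with a Gaussian,
# then sample new features "created" for that class (toy data).
import numpy as np

X = np.array([[0.1, 1.2], [0.8, 0.4], [1.5, 0.2], [0.2, 1.8]])
y = np.array([0, 1, 1, 0])

# per-class mean and standard deviation of the features
params = {c: (X[y == c].mean(axis=0), X[y == c].std(axis=0))
          for c in np.unique(y)}

mu, sigma = params[1]
print(np.random.normal(mu, sigma))  # new feature vector sampled for Y=1
```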

[Figure. Credit: O’Reilly]

The simplest way to think of this model's dynamic is in terms of a thief (the generator) and an inspector (the discriminator). Suppose a thief steals a painting from a museum and replaces it with a duplicate. The thief succeeds only if that duplicate is good enough to pass for the original. The inspector's job is to review the painting and determine whether it is the original or a copy. The two networks compete against each other in just this way, hence the name adversarial network. They play a zero-sum game: the thief wins when the inspector is fooled, and the inspector wins when he detects the fake.

To make sure that one network does not become dominant, we alternate training: each model's weights are frozen in turn, and only one of the two networks is updated at a time. This also gives the network currently training a stable opponent whose gradient signal it can exploit, speeding up improvement (see the sketch below). As we alternate, each model improves until the generator can produce almost lifelike images, videos, music, and more.
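Here is a minimal sketch of that alternating schedule in PyTorch. The architectures, shapes, and hyperparameters are invented for illustration; the point is how each update touches only one network's weights.

```python
import torch
import torch.nn as nn

latent_dim = 100
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),          # e.g. a flattened 28x28 image
)
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),         # probability the input is real
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):                 # real_images: (batch, 784)
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)
    z = torch.randn(batch, latent_dim)
    fake_images = generator(z)

    # 1) Update the discriminator only; detach() stops gradients from
    #    flowing back into the generator, so its weights stay frozen.
    opt_d.zero_grad()
    d_loss = (bce(discriminator(real_images), real_labels)
              + bce(discriminator(fake_images.detach()), fake_labels))
    d_loss.backward()
    opt_d.step()

    # 2) Update the generator only; opt_g holds no discriminator
    #    parameters, so the discriminator is effectively frozen here.
    opt_g.zero_grad()
    g_loss = bce(discriminator(fake_images), real_labels)  # try to fool it
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```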

Despite Generative Adversarial Networks being the most widely used models for realistic image generation, they are not perfect. These networks are complicated and can be extremely tough to train, since it is difficult for the two networks to find a Nash equilibrium. The computations happen in very high dimensions and are computationally expensive, which can mean long training times: hours on a GPU, and over a day on a CPU. Other problems include:

· Mode collapse: the generator produces only a limited variety of samples

· Diminished gradient: the generator's gradient becomes almost nonexistent if the discriminator grows too dominant

· Non-convergence: the loss functions can oscillate or diverge and never reach an equilibrium

· Sensitivity to hyperparameters: small changes can destabilize training

· Lack of diversity: generated images may not cover the full spread of the data's true distribution

Google's DeepMind believes it has found a better generative model that tackles most of these problems. The team has been exploring variational autoencoders (VAEs), which use an encoder to reduce data to a lower-dimensional representation and a decoder that returns the data to its original form. In 2017 DeepMind published a paper on the Vector Quantised Variational AutoEncoder (VQ-VAE), which reduces images to discrete rather than continuous latent variables, then uses a decoder to reproduce the images from those variables.
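Structurally, the encoder/decoder pair looks like this minimal PyTorch sketch; the layer sizes are illustrative and not DeepMind's architecture.

```python
import torch
import torch.nn as nn

# The encoder compresses a flattened 28x28 input down to a 16-dim code;
# the decoder maps that code back to the original dimensionality.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 16))
decoder = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.randn(1, 784)
reconstruction = decoder(encoder(x))
# Training minimizes the reconstruction error, e.g. ||x - reconstruction||^2.
```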

The VQ-VAE breaks a 256×256 image into two separate latent maps: a 64×64 map that captures local information such as texture, and a 32×32 map that captures global information such as shape and geometry. The decoder then takes these maps and recreates an image from the two layers. Depending on the size of the image, the algorithm may use more than two layers if needed to record enough information to reconstruct the image.
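The "vector quantised" part means each position in those latent maps is snapped to its nearest entry in a learned codebook. Here is a minimal sketch of that lookup; the codebook size and embedding dimension are chosen arbitrarily, and the straight-through gradient trick and codebook losses from the paper are omitted.

```python
# A minimal sketch of the vector-quantization step at the heart of a VQ-VAE
# (hypothetical sizes; training losses omitted).
import torch

codebook = torch.randn(512, 64)       # 512 discrete codes, 64-dim embeddings

def quantize(z_e):
    # z_e: encoder output of shape (batch, height, width, 64)
    flat = z_e.reshape(-1, 64)
    dists = torch.cdist(flat, codebook)          # distance to every code
    indices = dists.argmin(dim=1)                # nearest code per position
    z_q = codebook[indices].reshape(z_e.shape)   # discrete latents for decoder
    return z_q, indices.reshape(z_e.shape[:-1])

# e.g. the 32x32 "global" latent map for a 256x256 image:
z_e = torch.randn(1, 32, 32, 64)
z_q, codes = quantize(z_e)
print(z_q.shape, codes.shape)  # [1, 32, 32, 64] and [1, 32, 32]
```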

[Figure: VQ-VAE samples (left) and BigGAN deep samples (right) trained on ImageNet. Credit: SyncedReview]

Generative networks are an amazing subset of data science that is only beginning to be explored. These networks continue to improve, and new uses for them are constantly being discovered. Some in the data science community even believe that Generative Adversarial Networks might be a key to building efficient automated machine learning (AutoML) systems.
