Generative Adversarial Networks: Which Neural Network Comes On Top?
In a world filled with technology and artificial intelligence, it is becoming increasingly hard to distinguish between what is real and what is fake. Look at the two pictures below. Can you tell which one is a real-life photograph and which one was created by artificial intelligence?
The crazy thing is that both of these images are actually fake, created by NVIDIA’s hyperrealistic face generator, which uses an algorithmic architecture called a generative adversarial network (GAN).
Researching more into GANs and their applications in today’s society, I found that they can be used everywhere, from text-to-image generation to even predicting the next frame in a video! This article provides a brief overview of the inner workings of GANs and how they are being used in industry today. I know you are probably as excited as I am to learn more about this, so let’s get started!
The Basic Architecture of GANs
GANs fall under the broader category of generative models, which use an unsupervised learning approach to learn patterns from the training data. These models work with input variables alone, without the output labels that classical predictive models require. Some examples of generative models include the Naive Bayes classifier, Latent Dirichlet Allocation, and the aforementioned GANs. Let me know if you want an article on these algorithms (Spoiler alert: They’re just as cool as GANs!).
GANs, like other generative models, are able to generate new examples that are similar to, and in some cases indistinguishable from, the training set that was provided. So how do GANs actually form these new examples out of thin air?
As you may have guessed from the title, GANs use two neural networks that compete with one another to generate new variations of the data. Other machine learning models can be used as the two sub-models, but neural networks are usually the best fit. These two sub-models are called the generator model and the discriminator model.
The Generator and Discriminator Model
The generator and discriminator model have opposing goals, basing their performance on different sets of metrics.
The generator model has one sole purpose: to produce fake data from a set of random inputs. It aims to produce the most realistic fake images possible from the random noise it is given. In other words, its goal is to maximize the discriminator model’s loss, or classification error.
The discriminator model, on the other hand, decides through binary classification whether the data given to it comes from the real sample or from the fake sample produced by the generator network. The discriminative network is trained to take the true data and the generated data and classify them accordingly. Therefore, you can conclude that the discriminator’s main purpose is to (you guessed it!) decrease the classification error.
To recap, the generator model aims to maximize the classification error by producing extremely realistic images so that the discriminator is unable to tell which image is fake and which image is real. On the other hand, the discriminator model aims to classify correctly whether an image is fake or not, trying to minimize the classification error produced by the model.
Now, you can see how the generator and discriminator are pitted against one another. Both are seeking to best the other: the generator by producing fake images the discriminator can’t detect, and the discriminator by telling the generator’s images apart from the real ones. One tries to minimize the classification error while the other seeks to maximize it.
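This tug-of-war can be written down as two loss functions built on the same classification error. Here is a minimal NumPy sketch; the discriminator scores (`d_on_real`, `d_on_fake`) are made-up numbers standing in for a real network’s outputs:

```python
import numpy as np

def bce(preds, labels):
    # Binary cross-entropy: the classification error both players care about.
    eps = 1e-12
    return -np.mean(labels * np.log(preds + eps)
                    + (1 - labels) * np.log(1 - preds + eps))

# Hypothetical discriminator scores on a batch of real and fake images.
d_on_real = np.array([0.9, 0.8, 0.95])   # D(real): should be near 1
d_on_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)): D wants these near 0

# The discriminator minimizes this combined classification error...
d_loss = bce(d_on_real, np.ones(3)) + bce(d_on_fake, np.zeros(3))

# ...while the generator wants D(G(z)) labeled as real, i.e. it is
# penalized exactly when the discriminator sees through its fakes.
g_loss = bce(d_on_fake, np.ones(3))
```

With these numbers the discriminator is winning: its own error `d_loss` is small, while the generator’s `g_loss` is large, so the generator has strong pressure to improve.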
Now that you have the basics down, let’s go through the workflow a GAN follows to produce those hyperrealistic images. First, random noise is generated and fed to the generator as the seed for its fake images. Then, two main training phases make up one iteration of the GAN workflow.
First, the discriminator is trained with the generator frozen. In this phase, the discriminator receives both real data, supplied by the user, and fake data, produced by the generator from the random noise mentioned above. The discriminator’s goal is to learn to distinguish the real pictures from the fake ones.
Next, the generator is trained with the discriminator frozen. The generator receives the discriminator’s feedback and uses it to produce more realistic images that better fool the discriminator. In effect, the generator learns to map samples of random noise to realistic images.
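The two alternating phases can be sketched end to end on a toy problem. This is an illustrative assumption, not a full GAN: the “images” are single numbers drawn from N(4, 1), the generator is the linear map `mu + sigma * z`, and the discriminator is a one-dimensional logistic classifier, with gradients written out by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

mu, sigma = 0.0, 1.0      # generator parameters: G(z) = mu + sigma * z
w, c = 0.0, 0.0           # discriminator parameters: D(x) = sigmoid(w*x + c)
lr, batch = 0.05, 64

for step in range(2000):
    # Phase 1: train the discriminator with the generator frozen.
    real = rng.normal(4.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = mu + sigma * z
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    # Hand-derived gradients of the binary cross-entropy classification error.
    grad_w = np.mean(-(1 - d_real) * real) + np.mean(d_fake * fake)
    grad_c = np.mean(-(1 - d_real)) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Phase 2: train the generator with the discriminator frozen,
    # using the non-saturating loss -log D(G(z)).
    z = rng.normal(0.0, 1.0, batch)
    fake = mu + sigma * z
    d_fake = sigmoid(w * fake + c)
    grad_out = -(1 - d_fake) * w      # dLoss/dG(z), passed back to mu and sigma
    mu -= lr * np.mean(grad_out)
    sigma -= lr * np.mean(grad_out * z)

print(round(mu, 2))  # the generator's mean drifts toward the real mean of 4
```

Note how each phase freezes the other player’s parameters: phase 1 only updates `w` and `c`, phase 2 only updates `mu` and `sigma`, exactly mirroring the two steps described above.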
A key thing to know is that images are simply probability distributions over an N-dimensional vector space; to say that something looks like an image is really to say that it has a very specific probability distribution. The generator network in a GAN takes points from a simple random distribution as input and transforms them into points from the target distribution, close enough to fool the discriminator.
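A hand-written analogue makes this concrete: pushing simple noise through the right transform lands it on a completely different target distribution, which is exactly the kind of mapping the generator learns (here the transform is the inverse CDF of an exponential distribution, chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

u = rng.random(100_000)        # input: flat uniform noise on [0, 1)
samples = -np.log(1.0 - u)     # output: Exponential(1) via its inverse CDF

# The transformed points now follow the exponential distribution (mean 1),
# even though the input had no such structure. A GAN's generator learns a
# far more complicated transform of this kind, with "looks like a face"
# as the target distribution.
```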
The cycle continues as the discriminator uses the generator’s updated outputs to make better classifications, and vice versa. This back-and-forth process eventually produces generated images that the discriminator cannot tell apart from real ones.
In the end, remember that the discriminator’s accuracy is not what matters. Our main goal is to maximize the generator’s performance, since we want to produce images that are indistinguishable from real-life ones.
The Challenges of GANs
GANs face multiple problems in their current implementations; however, these are quickly being addressed by newer, more advanced GANs from large tech companies like NVIDIA. These challenges are often the reason other generative models are chosen over GANs. Here are the two main challenges that companies currently face.
A central problem is the stability between the generator and the discriminator; this matters because an imbalance can defeat the whole purpose of the GAN, which is to create new images. If the discriminator is too powerful, it will simply classify all generated images as fake and give the generator nothing to learn from; if it is too lenient, the generator can fool it with garbage, and the GAN never improves. Finding this balance is difficult because it’s hard to predict what the generator and discriminator will do as they train and update their gradients without human intervention.
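One concrete symptom of an overpowered discriminator is vanishing generator gradients. A small numerical sketch (the scores are made-up numbers for illustration) shows why the original GAN paper’s non-saturating generator loss is the standard workaround:

```python
import numpy as np

# When the discriminator is too strong, D(G(z)) sits near 0 for every fake.
d_fake = np.array([1e-4, 1e-3, 1e-2])

# Original minimax generator loss log(1 - D): its gradient w.r.t. D stays
# flat near D = 0, so a confident critic teaches the generator almost nothing.
saturating = np.abs(-1.0 / (1.0 - d_fake))

# Non-saturating alternative -log D: its gradient blows up near D = 0,
# keeping the learning signal alive even when the discriminator dominates.
non_saturating = np.abs(-1.0 / d_fake)
```

The saturating gradients stay around 1 regardless of how badly the generator is losing, while the non-saturating ones grow as the discriminator gets more confident, pushing the generator to catch up.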
Another problem with GANs is that they struggle to determine the positioning of objects and to understand the perspective of an image. For example, a GAN may generate a dog with six eyes rather than two because it doesn’t understand how many times an object (like an eye) should occur at a particular location. GANs struggle to form a holistic, global view of an image, which is another reason other generative models are often used instead.
The Applications Of GANs
Although GANs face a wide number of challenges, they have a vast array of applications and future possibilities in the technology field.
GANs can do almost anything involving image generation, from producing images from natural-language descriptions to surveillance and security applications such as recovering footage distorted by rain. GANs also fill a variety of niche use cases, like morphing audio from one speaker to another, enhancing the resolution of an image, and performing image-to-image translation.
In my opinion, the most important application of GANs lies in their ability to create data for training classifiers when only limited data is available. Data collection is often one of the most difficult parts of training any machine learning model, and GANs can address that problem by creating images out of thin air that closely resemble the training data. GANs will be instrumental in improving classifier accuracy and supplying data to large models. Comment below what you think the coolest application of GANs is!
Since you now know the main architecture behind GANs, I suggest you try coding one up! A great place to start is generating images from the MNIST dataset with a GAN. You can find the steps to complete it here or check out my GitHub repository containing the full, detailed implementation below.
- GANs take an unsupervised learning approach, pitting two neural networks with opposing purposes (called the generator and discriminator models) against each other.
- The generator model’s main purpose is to maximize the classification error produced by the discriminator model by making extremely realistic “fake” images to present to the discriminator.
- The discriminator model’s main purpose is to minimize the classification error between which images are real and which images are fake.
- The gradients between the generator and the discriminator model are constantly updated based on the performance of the opposing model.
- GANs often face two challenges: balancing the stability between the generator and discriminator models, and determining the position of certain objects to develop a holistic perspective of the image.
- GANs have a vast array of applications, from creating images to train classifiers with limited amounts of data to text-to-image generation.
Hi! I am a 16 year old currently interested in the fields of machine learning and biotechnology. If you are interested in seeing more of my content and what I publish, consider subscribing to my newsletter! Check out my newsletter here! Also, check out my LinkedIn and Github pages. If you’re interested in talking about autonomous vehicles or just technology in general, sign up for a chat using my Calendly.