Step by Step Implementation of Conditional Generative Adversarial Networks
Generative Adversarial Networks (GANs) have had a lot of success since Ian Goodfellow and his colleagues introduced them in 2014. For somebody starting out in Machine Learning, the intricate mathematics and complex-looking architecture of GANs can seem daunting. So, let’s demystify GANs and C-GANs and implement a simple application with PyTorch. This article is self-contained and is aimed at beginner to intermediate Machine Learning enthusiasts.
Outline:
- A brief introduction to GANs
- Implementation of GAN in PyTorch
- From GAN to Conditional GAN
- Implementation of C-GAN
- Follow-up exercises
Resources:
- Here is the link to my GitHub repo for the code of this tutorial.
A brief introduction to GANs
Let’s start our journey by briefly describing GANs. GANs are generative models: they are primarily used to generate new content. A GAN consists of two components: a Generator and a Discriminator.
The Generator is trained to generate samples similar to those in the training set. It takes random noise as input, passes it through its network, and produces an output with the same dimensions as the training samples.
The objective of the Discriminator is to distinguish the generated samples (generated by the Generator model) from the real ones.
As training progresses, both networks become better at their respective tasks: the Generator learns to produce outputs that are close to real samples, and the Discriminator learns to tell them apart from real ones. Each network tries to outdo the other, and every improvement in the Discriminator pushes the Generator to produce samples that are even more similar to the training set in order to fool it.
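This tug-of-war is usually written as the minimax objective from the original 2014 GAN paper by Goodfellow et al., where D(x) is the Discriminator’s estimated probability that x is real and G(z) is the Generator’s output for a noise vector z:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

In practice, the Generator is often trained to maximize log D(G(z)) instead of minimizing log(1 − D(G(z))), because the latter gives weak gradients early in training when the Discriminator confidently rejects generated samples. Computing binary cross-entropy between D(G(z)) and the “real” label, as the training loop in this article does, corresponds to this variant.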
Resources for a detailed review of GANs:
Understanding Generative Adversarial Networks (GANs)
A Brief Introduction To GANs
Now, let’s look at a few interesting applications of GANs. Keep in mind that every image in the following examples was generated by a neural network.
Generation of human face images
Painting Generation
Generation of cartoon pictures
Generating colored photographs from sketches
Photos to emoticons
Generating a child’s face using the parents’ pictures
For more interesting applications you can read the following articles:
1. https://medium.com/@jonathan_hui/gan-some-cool-applications-of-gans-4c9ecca35900
2. https://machinelearningmastery.com/impressive-applications-of-generative-adversarial-networks/
Implementation of GAN in PyTorch
Let’s jump to the implementation. I will use the MNIST dataset for this example. MNIST consists of 28 × 28 grayscale images of handwritten digits. Our GAN will be trained on this dataset and will eventually be able to generate similar digit images.
A typical machine learning setup consists of the following steps:
1. Define the Model
2. Define the Loss function
3. Define the optimizer
4. Train the model
— Forward pass
— Compute Loss
— Call the optimizer and update the weights
I would recommend using Google Colab with a GPU runtime for faster execution.
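To make these four steps concrete before we apply them to a GAN, here is a minimal sketch on a toy problem. The data (y = 3x + 1 plus a little noise), the single linear layer, the MSE loss, and the SGD optimizer are illustrative assumptions chosen only to keep the example small; the GAN later in this article uses different models, BCE loss, and Adam.

```python
import torch
import torch.nn as nn
from torch import optim

torch.manual_seed(0)

# Toy data (assumption for illustration): y = 3x + 1 plus a little noise
x = torch.randn(64, 1)
y = 3 * x + 1 + 0.01 * torch.randn(64, 1)

model = nn.Linear(1, 1)                            # 1. Define the model
loss_fn = nn.MSELoss()                             # 2. Define the loss function
optimizer = optim.SGD(model.parameters(), lr=0.1)  # 3. Define the optimizer

losses = []
for step in range(100):                            # 4. Train the model
    optimizer.zero_grad()           # clear gradients left over from the previous step
    prediction = model(x)           # forward pass
    loss = loss_fn(prediction, y)   # compute loss
    loss.backward()                 # backpropagate to compute gradients
    optimizer.step()                # update the weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.5f}")
```

The same zero_grad → forward → loss → backward → step rhythm appears twice per training step in the GAN loop: once for the Discriminator and once for the Generator.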
First, we import the modules we need:

```python
import torch
from torchvision import transforms, datasets
```

The second import statement is used to load the MNIST dataset and to define the transformations applied to it.

```python
import torch.nn as nn
from torch import optim
```

The torch.nn module is used to build our models, and the optim module to define the optimizers. An optimizer updates the parameters of a model.
You can follow along even if you don’t understand any of the above jargon. I’ll briefly talk about these terms as we use them in the code.
Let’s select the device for computation. It’s important to use a GPU for faster computations.
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

If you have set the Colab runtime type to GPU, the device variable will be “cuda”. You can verify this by printing it.
Loading the dataset — PyTorch provides a simple way of loading popular datasets like MNIST.
```python
training_parameters = {
    "n_epochs": 100,
    "batch_size": 100,
}

data_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.5,), (0.5,))
                   ])),
    batch_size=training_parameters["batch_size"], shuffle=True)
```
The transform argument lists the transformations applied to each image. We use two that are very common when working with image datasets: conversion to tensors and normalization. ToTensor() scales pixel values from [0, 255] to [0, 1], and Normalize((0.5,), (0.5,)) then maps them to [-1, 1], the same range as the tanh output of our Generator.
The batch size is also passed to the DataLoader; we use 100. The right value depends on your GPU memory. A Colab GPU can handle a batch size of 100, but if you run into GPU memory errors, reduce it accordingly.
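A quick way to check the claim about the value range: Normalize computes (x − mean) / std for each channel, so with mean 0.5 and std 0.5 the endpoints of [0, 1] land on −1 and 1. The helper below is a hypothetical stand-in that applies the same formula to single numbers, not torchvision’s own code.

```python
def normalize(pixel, mean=0.5, std=0.5):
    # Same formula transforms.Normalize applies per channel: (x - mean) / std
    return (pixel - mean) / std


# ToTensor() has already scaled 8-bit pixels from [0, 255] to [0.0, 1.0]
for value in (0.0, 0.5, 1.0):
    print(value, "->", normalize(value))
```

Black pixels (0.0) become −1.0, mid-gray (0.5) becomes 0.0, and white (1.0) becomes 1.0.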
Let’s look at an image from this dataset. I use the matplotlib library to display it. The code snippet below displays the first image of the first batch. Every image has a shape of 28 × 28.
```python
%matplotlib inline
from matplotlib import pyplot as plt

for x, _ in data_loader:
    plt.imshow(x.numpy()[0][0], cmap='gray')
    break
```
Generator Model
Since each image in our training set has dimensions 28 × 28, i.e. 784 values, our Generator outputs a vector of 784 values. We then reshape this vector into a 28 × 28 matrix.
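The flatten-and-reshape round trip is a single view() call in each direction. In this sketch a random tensor stands in for a batch from data_loader (an assumption; it only needs MNIST’s shape of batch × channel × height × width).

```python
import torch

# Stand-in for one MNIST batch: (batch, channel, height, width)
images = torch.rand(100, 1, 28, 28)

flat = images.view(100, 784)        # 784-value vectors, the layout the models work with
restored = flat.view(100, 28, 28)   # back to 28 x 28 matrices for plotting

print(flat.shape, restored.shape)
```

Because view() only reinterprets the same memory, no pixel values change; restored holds exactly the single channel of the original images.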
Let’s follow the steps below to create our Generator model.
Step 1: Create a class that inherits from torch.nn.Module class
```python
class GeneratorModel(nn.Module):
```
Step 2: Define two methods in this class — __init__() and forward()
The __init__() method declares all the components the model will use. We will use three hidden layers followed by an output layer. You can experiment by adding more layers or removing some. A hidden layer consists of a linear layer followed by an activation function.
A linear layer is defined by two values: its input dimension and its output dimension. To illustrate, consider inputs of dimension d, with m such inputs in a batch, so the effective input has size m × d. Passing it through a linear layer of dimension (d, k), i.e. a d × k weight matrix, produces an output of dimension m × k.
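Here is a small check of those shapes with nn.Linear. The values of m, d and k are illustrative; d = 784 and k = 256 happen to match a layer in this article’s Discriminator. One detail worth knowing: PyTorch stores the weight transposed, as a k × d matrix, and computes x @ Wᵀ + b.

```python
import torch
import torch.nn as nn

m, d, k = 4, 784, 256    # batch size, input dim, output dim (illustrative values)
layer = nn.Linear(d, k)

x = torch.randn(m, d)    # m inputs of dimension d
out = layer(x)           # shape (m, k)

# nn.Linear stores its weight as (k, d) and computes x @ W^T + b
manual = x @ layer.weight.T + layer.bias

print(out.shape, layer.weight.shape)
```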
We will use Leaky ReLU as our activation function which is a variant of ReLU activation function. Let’s review ReLU and Leaky ReLU for completeness.
The left graph shows the ReLU activation function and the right one shows Leaky ReLU. ReLU sets negative values to 0, while Leaky ReLU multiplies them by a small constant ‘a’, shrinking their magnitude instead of discarding them. We will use the tanh activation function for the last layer (the output layer). These are standard choices for this kind of network.
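The two activation functions are easy to write out in plain Python. This sketch uses a = 0.2, the slope passed to nn.LeakyReLU(0.2) throughout this article; the function names are illustrative.

```python
def relu(x):
    # Negative inputs are clamped to 0
    return max(0.0, x)


def leaky_relu(x, a=0.2):
    # Negative inputs are scaled by the small slope a instead of being zeroed
    return x if x > 0 else a * x


for value in (-2.0, -0.5, 0.0, 1.5):
    print(value, relu(value), leaky_relu(value))
```

For −2.0, ReLU returns 0.0 while Leaky ReLU returns −0.4, so a small gradient still flows for negative inputs. This is one reason Leaky ReLU is a common choice in GANs.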
This is what a hidden layer looks like:

```python
nn.Sequential(nn.Linear(input_dim, 256), nn.LeakyReLU(0.2))
```
As previously mentioned, we will define three such hidden layers and an output layer with tanh activation function.
Next, let’s define the second method of our model class, forward(). This method takes the input (random noise in our case), passes it through the layers in sequence, and returns the output.
We will use random noise of dimension 100 in this example. Below is the full code for our Generator model class:
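Here is a sketch consistent with the conditional GeneratorModel shown later in this article: three hidden layers of 256, 512 and 1024 units with Leaky ReLU (slope 0.2), followed by a 784-unit tanh output layer. Relative to that version, it assumes the input is only the 100-dimensional noise, with no label embedding and no labels argument. Device handling is left to the caller.

```python
import torch
import torch.nn as nn


class GeneratorModel(nn.Module):
    def __init__(self):
        super(GeneratorModel, self).__init__()
        input_dim = 100    # noise only, no label embedding
        output_dim = 784   # one flattened 28 x 28 image

        self.hidden_layer1 = nn.Sequential(nn.Linear(input_dim, 256), nn.LeakyReLU(0.2))
        self.hidden_layer2 = nn.Sequential(nn.Linear(256, 512), nn.LeakyReLU(0.2))
        self.hidden_layer3 = nn.Sequential(nn.Linear(512, 1024), nn.LeakyReLU(0.2))
        # Output layer: tanh keeps values in [-1, 1], matching the normalized images
        self.hidden_layer4 = nn.Sequential(nn.Linear(1024, output_dim), nn.Tanh())

    def forward(self, x):
        # Pass the noise through the layers in sequence
        output = self.hidden_layer1(x)
        output = self.hidden_layer2(output)
        output = self.hidden_layer3(output)
        return self.hidden_layer4(output)
```

Calling GeneratorModel()(torch.randn(16, 100)) returns a 16 × 784 tensor, one flattened image per noise vector.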
Now, let’s define our Discriminator model.
In terms of architecture, the Discriminator is very similar to the Generator except for the output layer and the use of dropout. The Generator is expected to produce an image (hence its output dimension is 784), whereas the Discriminator has to distinguish generated images from real ones. So its output dimension is 1: the probability that the input is real.
In the last layer we use a sigmoid activation function instead of tanh, so the output lies between 0 and 1. Explaining dropout is beyond the scope of this tutorial.
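A matching sketch of the unconditional Discriminator follows. The layer sizes (1024, 512, 256), Leaky ReLU slope and dropout rate of 0.3 are taken from the conditional DiscriminatorModel later in the article. As before, the assumption is that the input is just the flattened 784-value image, with no label embedding.

```python
import torch
import torch.nn as nn


class DiscriminatorModel(nn.Module):
    def __init__(self):
        super(DiscriminatorModel, self).__init__()
        input_dim = 784   # one flattened 28 x 28 image
        output_dim = 1    # probability that the input is real

        self.hidden_layer1 = nn.Sequential(nn.Linear(input_dim, 1024), nn.LeakyReLU(0.2), nn.Dropout(0.3))
        self.hidden_layer2 = nn.Sequential(nn.Linear(1024, 512), nn.LeakyReLU(0.2), nn.Dropout(0.3))
        self.hidden_layer3 = nn.Sequential(nn.Linear(512, 256), nn.LeakyReLU(0.2), nn.Dropout(0.3))
        # Sigmoid squashes the score into (0, 1) so it can be read as a probability
        self.hidden_layer4 = nn.Sequential(nn.Linear(256, output_dim), nn.Sigmoid())

    def forward(self, x):
        output = self.hidden_layer1(x)
        output = self.hidden_layer2(output)
        output = self.hidden_layer3(output)
        return self.hidden_layer4(output)
```

Note that the layer widths shrink (1024 → 512 → 256 → 1), mirroring the Generator’s widening layers.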
Now we can initialize these models and move them to our device. The models must be moved to the GPU (if available) so that all computations run there.

```python
discriminator = DiscriminatorModel()
generator = GeneratorModel()
discriminator.to(device)
generator.to(device)
```

This concludes the modeling part. Following the steps listed earlier, it’s time to define the loss function and the optimizers. Since we have two classes (real and fake), we will use binary cross-entropy loss, and we will use the Adam optimizer for both the Generator and the Discriminator.

```python
loss = nn.BCELoss()
discriminator_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002)
generator_optimizer = optim.Adam(generator.parameters(), lr=0.0002)
```
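To see what nn.BCELoss computes: with a target of 1 the loss for a prediction p is −log(p), and with a target of 0 it is −log(1 − p), averaged over the batch. The probabilities below are made-up Discriminator outputs used only for illustration.

```python
import math

import torch
import torch.nn as nn

loss = nn.BCELoss()

# Made-up Discriminator outputs (probability of "real") for two real images
predictions = torch.tensor([0.9, 0.6])
value = loss(predictions, torch.ones(2)).item()        # real label = 1
expected = -(math.log(0.9) + math.log(0.6)) / 2        # mean of -log(p)

# A made-up output for one generated image, scored against the fake label 0
fake_value = loss(torch.tensor([0.2]), torch.zeros(1)).item()   # -log(1 - 0.2)

print(round(value, 4), round(expected, 4), round(fake_value, 4))
```

Confident correct answers (0.9 for a real image, 0.2 for a fake one) cost little, while the less confident 0.6 costs more. This asymmetry drives both networks during training.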
Now the climax — The Training Loop
In a single training step, we update the parameters of both the Generator and the Discriminator.
Here is the outline of a single training step:
- Update the Discriminator
— Clear the gradients by calling optimizer.zero_grad()
— Run a forward pass through the Discriminator with the real data as input and obtain the output.
— Compute the loss using the Discriminator’s output and true_labels (the label is 1 for real data).
— Run a forward pass through the Discriminator, this time with the generated data as input, and obtain the output.
— Compute the loss using the Discriminator’s output and fake_labels (the label is 0 for generated data).
— Average the two losses.
— Call backward() on the averaged loss to backpropagate, then optimizer.step() to update the Discriminator’s weights.
- Update the Generator
— Clear the gradients by calling optimizer.zero_grad()
— Run a forward pass through the Discriminator with the generated data as input and obtain the output.
— Recall that the Generator’s objective is to fool the Discriminator into labeling generated data as real. So we compute the loss using the Discriminator’s output and true_labels (this time the label is 1 for the generated data).
— Call backward() on this loss to backpropagate, then optimizer.step() to update the Generator’s weights.
Let’s see it in action. The comments will further explain the code.
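Here is a minimal sketch of one unconditional training step that follows the outline above. It makes several assumptions. The function name gan_training_step and its parameters are hypothetical. Device handling is omitted, so everything runs on the CPU. Unlike the conditional loop later in this article, the generated batch is reused for the Generator update rather than regenerated. The usage lines substitute tiny stand-in networks and random “real” data so the step can be run on its own.

```python
import torch
import torch.nn as nn
from torch import optim


def gan_training_step(generator, discriminator, g_optimizer, d_optimizer,
                      real_data, noise_dim=100):
    """One Discriminator update followed by one Generator update (unconditional GAN)."""
    loss = nn.BCELoss()
    batch_size = real_data.size(0)
    true_labels = torch.ones(batch_size)
    fake_labels = torch.zeros(batch_size)

    # --- Update the Discriminator ---
    d_optimizer.zero_grad()
    noise = torch.randn(batch_size, noise_dim)
    generated_data = generator(noise)
    d_loss_real = loss(discriminator(real_data).view(batch_size), true_labels)
    # detach() stops this loss from backpropagating into the Generator
    d_loss_fake = loss(discriminator(generated_data.detach()).view(batch_size), fake_labels)
    d_loss = (d_loss_real + d_loss_fake) / 2
    d_loss.backward()
    d_optimizer.step()

    # --- Update the Generator ---
    g_optimizer.zero_grad()
    # The Generator is rewarded when the Discriminator labels its output as real
    g_loss = loss(discriminator(generated_data).view(batch_size), true_labels)
    g_loss.backward()
    g_optimizer.step()

    return d_loss.item(), g_loss.item()


# Usage with tiny stand-in networks and random "real" data (hypothetical smoke test)
torch.manual_seed(0)
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())
g_opt = optim.Adam(G.parameters(), lr=0.0002)
d_opt = optim.Adam(D.parameters(), lr=0.0002)
real = torch.rand(32, 784) * 2 - 1   # values in [-1, 1], like normalized MNIST
d_loss, g_loss = gan_training_step(G, D, g_opt, d_opt, real)
print(d_loss, g_loss)
```

With the article’s models, you would call this once per batch, flattening each MNIST batch to batch_size × 784 and moving tensors to device first.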
From GANs to Conditional GANs
The simple GAN we implemented above has a serious limitation: it generates images unconditionally, i.e. we have no control over which digit it produces. Conditional GANs were introduced to overcome this. A C-GAN has the same architecture as a regular GAN, but the model also takes some metadata as input along with the random noise and conditions its output on it.
We will pass the digit value as metadata and constrain the above GAN model to generate an image of the input digit value.
A few modifications need to be done to achieve the above objective:
- The Generator will take random noise of dimension 100 and the digit value as input. We will use an embedding layer of size (10, 10), which maps each of the 10 digits to a learned 10-dimensional vector.
- We will concatenate the 10-dimensional embedding with the noise to get a 110-dimensional input (instead of 100 as in the unconditional Generator), which is fed to the first hidden layer. The rest of the network works as before.
- The Discriminator needs the same two changes: it embeds the digit label and concatenates it with the 784-value image, giving a 794-dimensional input.
- In the training loop:
— Pass the labels along with random noise to the Generator
— Pass the labels along with the data to the Discriminator.
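The label conditioning boils down to an embedding lookup and a concatenation. In this sketch the batch size, label values and random tensors are illustrative, and one embedding layer is shared for brevity; in the article’s code, the Generator and the Discriminator each have their own.

```python
import torch
import torch.nn as nn

batch_size, noise_dim = 4, 100
label_embedding = nn.Embedding(10, 10)   # one learned 10-dim vector per digit 0-9

noise = torch.randn(batch_size, noise_dim)
labels = torch.tensor([5, 0, 7, 5])      # the digit each sample should depict

c = label_embedding(labels)                      # shape (4, 10)
generator_input = torch.cat([noise, c], 1)       # shape (4, 110)

images = torch.rand(batch_size, 784)             # flattened images for the Discriminator
discriminator_input = torch.cat([images, c], 1)  # shape (4, 794)

print(c.shape, generator_input.shape, discriminator_input.shape)
```

Both samples labeled 5 receive the same embedding vector. This is how the networks learn to associate a region of their input with a particular digit.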
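Let’s see how these modifications can be incorporated into the code.
Generator and Discriminator Models:
Note: compared with the unconditional models, the changes are the label embedding, the larger input_dim, and the extra labels argument in forward().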
```python
class GeneratorModel(nn.Module):
    def __init__(self):
        super(GeneratorModel, self).__init__()
        input_dim = 100 + 10
        output_dim = 784
        self.label_embedding = nn.Embedding(10, 10)

        self.hidden_layer1 = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.LeakyReLU(0.2)
        )
        self.hidden_layer2 = nn.Sequential(
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2)
        )
        self.hidden_layer3 = nn.Sequential(
            nn.Linear(512, 1024),
            nn.LeakyReLU(0.2)
        )
        self.hidden_layer4 = nn.Sequential(
            nn.Linear(1024, output_dim),
            nn.Tanh()
        )

    def forward(self, x, labels):
        c = self.label_embedding(labels)
        x = torch.cat([x, c], 1)
        output = self.hidden_layer1(x)
        output = self.hidden_layer2(output)
        output = self.hidden_layer3(output)
        output = self.hidden_layer4(output)
        return output.to(device)


class DiscriminatorModel(nn.Module):
    def __init__(self):
        super(DiscriminatorModel, self).__init__()
        input_dim = 784 + 10
        output_dim = 1
        self.label_embedding = nn.Embedding(10, 10)

        self.hidden_layer1 = nn.Sequential(
            nn.Linear(input_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3)
        )
        self.hidden_layer2 = nn.Sequential(
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3)
        )
        self.hidden_layer3 = nn.Sequential(
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3)
        )
        self.hidden_layer4 = nn.Sequential(
            nn.Linear(256, output_dim),
            nn.Sigmoid()
        )

    def forward(self, x, labels):
        c = self.label_embedding(labels)
        x = torch.cat([x, c], 1)
        output = self.hidden_layer1(x)
        output = self.hidden_layer2(output)
        output = self.hidden_layer3(output)
        output = self.hidden_layer4(output)
        return output.to(device)


discriminator = DiscriminatorModel()
generator = GeneratorModel()
discriminator.to(device)
generator.to(device)

# Re-create the optimizers so they update the new models' parameters
discriminator_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002)
generator_optimizer = optim.Adam(generator.parameters(), lr=0.0002)
```
Training Loop:

```python
n_epochs = training_parameters["n_epochs"]

for epoch_idx in range(n_epochs):
    G_loss = []
    D_loss = []
    for batch_idx, data_input in enumerate(data_loader):
        # Use the actual batch size in case the last batch is smaller
        batch_size = data_input[0].size(0)

        noise = torch.randn(batch_size, 100).to(device)
        fake_labels = torch.randint(0, 10, (batch_size,)).to(device)
        generated_data = generator(noise, fake_labels)  # batch_size X 784

        # Discriminator
        true_data = data_input[0].view(batch_size, 784).to(device)  # batch_size X 784
        digit_labels = data_input[1].to(device)  # batch_size
        true_labels = torch.ones(batch_size).to(device)

        discriminator_optimizer.zero_grad()

        discriminator_output_for_true_data = discriminator(true_data, digit_labels).view(batch_size)
        true_discriminator_loss = loss(discriminator_output_for_true_data, true_labels)

        discriminator_output_for_generated_data = discriminator(generated_data.detach(), fake_labels).view(batch_size)
        generator_discriminator_loss = loss(
            discriminator_output_for_generated_data, torch.zeros(batch_size).to(device)
        )
        discriminator_loss = (
            true_discriminator_loss + generator_discriminator_loss
        ) / 2

        discriminator_loss.backward()
        discriminator_optimizer.step()

        D_loss.append(discriminator_loss.item())

        # Generator
        generator_optimizer.zero_grad()
        # It's a choice to generate the data again
        generated_data = generator(noise, fake_labels)  # batch_size X 784
        discriminator_output_on_generated_data = discriminator(generated_data, fake_labels).view(batch_size)
        generator_loss = loss(discriminator_output_on_generated_data, true_labels)
        generator_loss.backward()
        generator_optimizer.step()

        G_loss.append(generator_loss.item())

        if ((batch_idx + 1) % 500 == 0 and (epoch_idx + 1) % 10 == 0):
            print("Training Steps Completed: ", batch_idx)

            with torch.no_grad():
                noise = torch.randn(batch_size, 100).to(device)
                fake_labels = torch.randint(0, 10, (batch_size,)).to(device)
                generated_data = generator(noise, fake_labels).cpu().view(batch_size, 28, 28)
                for x in generated_data:
                    print(fake_labels[0].item())
                    plt.imshow(x.detach().numpy(), interpolation='nearest', cmap='gray')
                    plt.show()
                    break

    print('[%d/%d]: loss_d: %.3f, loss_g: %.3f' % (
        (epoch_idx), n_epochs, torch.mean(torch.FloatTensor(D_loss)), torch.mean(torch.FloatTensor(G_loss))))
```
Generated image when the input is 5
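After training, producing an image of a chosen digit only requires fixing the label passed to the Generator. Below is a minimal sketch of such a helper. The function generate_digit and the TinyConditionalGenerator stand-in (used so the snippet runs on its own) are hypothetical. With the article’s trained model you would call generate_digit(generator, 5, device=device) and pass one of the returned images to plt.imshow.

```python
import torch
import torch.nn as nn


def generate_digit(generator, digit, n_samples=1, noise_dim=100, device="cpu"):
    """Sample n_samples 28 x 28 images of `digit` from a conditional generator."""
    with torch.no_grad():   # no gradients needed for sampling
        noise = torch.randn(n_samples, noise_dim, device=device)
        labels = torch.full((n_samples,), digit, dtype=torch.long, device=device)
        images = generator(noise, labels)
    return images.view(n_samples, 28, 28).cpu()


class TinyConditionalGenerator(nn.Module):
    # Hypothetical stand-in with the same call signature as the C-GAN GeneratorModel
    def __init__(self):
        super().__init__()
        self.label_embedding = nn.Embedding(10, 10)
        self.net = nn.Sequential(nn.Linear(110, 784), nn.Tanh())

    def forward(self, x, labels):
        return self.net(torch.cat([x, self.label_embedding(labels)], 1))


fives = generate_digit(TinyConditionalGenerator(), digit=5, n_samples=3)
print(fives.shape)
```

Each call draws fresh noise, so repeated calls with the same digit give different handwriting styles of that digit.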
Follow-up exercises
- Once you understand the material in this article, try the GAN and C-GAN architectures on the Fashion-MNIST dataset.
- Try generating even numbers (in binary). Refer to the link for more details.