Implementing a Deep Convolutional Generative Adversarial Network (DCGAN) on the CelebA dataset

Noha Nekamiche · AIGuys · Nov 9, 2021

In this article, we will learn how to implement a DCGAN on the CelebA dataset using the PyTorch framework. First, we will cover some theoretical concepts behind DCGAN, and then we will jump into the implementation.

This article is organized into the following sections:

  • What are GANs?
  • What’s the Intuition behind DCGAN?
  • DCGAN Architecture.
  • DCGAN’s Implementation.

What are GANs?

Generative Adversarial Networks (GANs for short) are composed of two different deep neural networks, a generator and a discriminator, where:

  • The generator learns how to create fake data from random input and tries to make this data as similar as possible to the real (training) data; its objective is to produce data that is indistinguishable from the real data.
  • The discriminator learns how to distinguish between fake and real data; it works like a classifier whose objective is to tell fake data apart from real data.

In the GAN architecture, the discriminator takes samples of real and generated data and tries to classify them as well as possible, while the generator is trained to fool the discriminator as much as possible. You can think of the generator as an art forger who tries to forge paintings, and the discriminator as an art detective.

Because the generator starts from random input, it clearly will not produce good results at the beginning. That's why we compute the loss of both G and D (using binary cross-entropy) and backpropagate through both networks, in order to train the generator to generate data that is indistinguishable from the real data. Here's an image that explains how GANs work, followed by a minimal code sketch.

GAN’s Architecture
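To make this concrete, here is a minimal sketch of one adversarial update in PyTorch, assuming G and D are any generator/discriminator modules with D ending in a sigmoid (the names gan_step, opt_D, and opt_G are placeholders of mine, not from the article):

import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_step(G, D, real, opt_D, opt_G, nz=100):
    b = real.size(0)
    ones, zeros = torch.ones(b), torch.zeros(b)
    # Discriminator update: push D(real) toward 1 and D(G(z)) toward 0
    fake = G(torch.randn(b, nz, 1, 1))
    loss_D = bce(D(real).view(-1), ones) + bce(D(fake.detach()).view(-1), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    # Generator update: push D(G(z)) toward 1 to fool the discriminator
    loss_G = bce(D(fake).view(-1), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()

This is exactly the structure of the full training loop we will build later in the article.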

For more details about GANs, you can check my first article: GANs (generative adversarial networks)

What’s the Intuition behind DCGAN? 💭

Implementations of simple GAN models show that using fully connected layers reduces the quality of the generated images. This is what led Radford et al. to introduce DCGAN, which generates higher-quality images, in the paper Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks.

So DCGANs are better than simple GANs because they:

  • Use strided convolutional layers in the discriminator to downsample the images.
  • Use fractionally-strided convolutional layers in the generator to upsample the images.

Before getting into the details, we need to learn how strided convolutional layers and fractionally-strided convolutional layers work.

1. Strided convolutional layers:

Strided convolution is a deep learning technique that lets you reduce the size of your data. To understand how it works, consider the following example:

We have a 2-dimensional image (a matrix), so the 2D convolution operation applies: we slide a filter (of size 3×3) over the image and multiply it element-wise with each patch. Convolving a 5×5 image with a 3×3 filter at stride 1 yields a final image of size 3×3 (shown on the right).

Note how the filter, or kernel, strides with a step size of one, sliding pixel by pixel over every column of each row.

Figure source: https://learnopencv.com/deep-convolutional-gan-in-pytorch-and-tensorflow/

In DCGAN, the authors used a stride of 2, which means the filter moves 2 pixels per step as it slides over the image. They also eliminated max pooling for downsampling, replacing it with strided convolution (stride 2) to downsample the images in the discriminator.
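As a quick sanity check (my own snippet, not from the original post), a 4×4 convolution with stride 2 and padding 1, the configuration used in the DCGAN discriminator, halves the spatial resolution:

import torch
import torch.nn as nn

# Stride-2 convolution: 64x64 input -> 32x32 output
down = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 3, 64, 64)
print(down(x).shape)  # torch.Size([1, 64, 32, 32])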

2. Fractionally-strided convolutional layers:

Fractionally-strided convolution (also called transposed convolution) is a deep learning technique that lets you increase the size of the data. As shown in the figure below, a 2×2 input matrix is upsampled to a 5×5 matrix.

Figure source: https://learnopencv.com/deep-convolutional-gan-in-pytorch-and-tensorflow/

In DCGAN, the authors used a series of four fractionally-strided convolutions in the generator to upsample the 100-dimensional input (noise vector) into a 64×64-pixel image.
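In PyTorch, fractionally-strided convolution is implemented as nn.ConvTranspose2d. Here is a small shape check (my own snippet) matching the generator's first two stages as we will build them below:

import torch
import torch.nn as nn

# Project the 100-dim (1x1) noise vector to a 4x4 feature map
proj = nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0)
z = torch.randn(1, 100, 1, 1)
h = proj(z)
print(h.shape)  # torch.Size([1, 512, 4, 4])

# Kernel 4, stride 2, padding 1 doubles the spatial size at each stage
up = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1)
print(up(h).shape)  # torch.Size([1, 256, 8, 8])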

DCGAN Architecture:

To sum up, in DCGAN we will use the following techniques:

  • Replace pooling layers with strided convolutions (discriminator) and fractionally-strided convolutions (generator).
  • Use batch normalization in both the generator and discriminator.
  • Remove fully connected hidden layers.
  • Use ReLU activation in the generator for all layers except the output layer, which uses Tanh.
  • Use LeakyReLU activation in the discriminator for all layers.

DCGAN's Implementation:

For the implementation, if you want to run it on your own computer, you will first need to install PyTorch:

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

And to check that your installation is correct, use this command:

conda list torch

Or you can skip the installation entirely and use Google Colab: https://colab.research.google.com/

For training data, we will use a famous dataset of celebrity faces (CelebA) that you can find on Kaggle: https://www.kaggle.com/jessicali9530/celeba-dataset

Now we can start coding 😇

First, we need to import the following libraries:

from __future__ import print_function
#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
# Set random seed for reproducibility
# Seed = 999  # or you can fix it to a specific value
Seed = random.randint(1, 10000)  # use if you want new results
print("Random Seed: ", Seed)
random.seed(Seed)
torch.manual_seed(Seed)

Next, we will define the inputs of our model, also known as hyperparameters:

dataroot: the path to the root directory where you put your dataset.

workers: the number of worker threads used to load the data with the DataLoader.

batch_size: the batch size for training; the DCGAN paper uses 128, and so will we.

image_size: the size of the images used in training. In this implementation we use 64×64; if you want another value, you will have to change the structures of the generator and the discriminator.

nc: the number of color channels in the input images. Because we're using color images of people's faces, it will be 3 (RGB).

nz: the length of the latent vector (noise vector).

ngf: the depth of the feature maps carried through the generator.

ndf: the depth of the feature maps propagated through the discriminator.

num_epochs: the number of training epochs to run.

lr: the learning rate; as mentioned in the DCGAN paper, it should be 0.0002.

beta1: a hyperparameter for the Adam optimizers; as also mentioned in the DCGAN paper, it should be 0.5.

ngpu: the number of GPUs available. If it is 0, the code runs on CPU; if it is greater than 0, it runs on that number of GPUs.

# Root directory of where you set your dataset
dataroot = "celeba_gan"
# Number of workers for dataloader
workers = 2
# Batch size during training
batch_size = 128
# Spatial size of training images. We will resize all images to this
# size using a transform.
image_size = 64
# Number of channels in the training images. For color images this is 3 (RGB)
nc = 3
# Size of the z latent vector (i.e. the size of the noise vector)
nz = 100
# Size of feature maps in generator
ngf = 64
# Size of feature maps in discriminator
ndf = 64
# Number of training epochs
num_epochs = 5
# Learning rate for optimizers
lr = 0.0002
# Beta1 hyperparam for Adam optimizers
beta1 = 0.5
# Number of GPUs available. Use 0 for CPU mode, 1 (or more) for GPU mode
ngpu = 1

Here we create a function to load the data:

def load_data(dataroot, image_size, batch_size, workers, ngpu):
    # Create the dataset: resize, center-crop, and normalize images to [-1, 1]
    dataset = dset.ImageFolder(root=dataroot,
                               transform=transforms.Compose([
                                   transforms.Resize(image_size),
                                   transforms.CenterCrop(image_size),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                               ]))
    # Create the dataloader
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                             shuffle=True, num_workers=workers)
    return dataloader

After that, we will choose on which device to run it.

dataloader=load_data(dataroot,image_size,batch_size,workers,ngpu)
# Decide which device we want to run on
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

And then we will plot some training images from the CelebA dataset.

real_batch = next(iter(dataloader))
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(),(1,2,0)))

Weight Initialization:

In the DCGAN paper, the authors specify that all model weights should be randomly initialized from a normal distribution with mean=0, stdev=0.02.

So the weights_init function takes an initialized model as input and reinitializes all convolutional, convolutional-transpose, and batch-normalization layers.

We apply this function immediately after initialization.

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        # Conv and ConvTranspose layers: weights ~ N(0, 0.02)
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        # BatchNorm layers: scale ~ N(1, 0.02), bias = 0
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

Create the generator network:

As we now know, the generator takes a random vector z as input. Because our data are images, we need to convert z to data space, which ultimately means creating an RGB image with the same size as the training images (i.e. 3x64x64).

So, to sum up, our data will be reshaped as follows (these are the feature-map depths from the paper's figure; with ngf=64, the code below uses half as many feature maps at each intermediate stage, i.e. 512 → 256 → 128 → 64):

100×1×1 → 1024×4×4 → 512×8×8 → 256×16×16 → 128×32×32 → 3×64×64

Here's an image from the DCGAN paper that illustrates this.

Generator Architecture [from DCGAN paper]
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)

Now we can instantiate the generator and save it in the netG variable. Then we choose the device to run the code on (multi-GPU, for example). Finally, we apply the weights_init function to netG, randomly initializing all weights with mean=0 and stdev=0.02 as explained before, and print the model's structure.

# Create the generator
netG = Generator(ngpu).to(device)
# Handle multi-gpu if desired
if (device.type == 'cuda') and (ngpu > 1):
    netG = nn.DataParallel(netG, list(range(ngpu)))
# Apply the weights_init function
netG.apply(weights_init)
# Print the model
print(netG)

and we will get the printed structure of the Generator network as output.
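As an optional sanity check (my addition, not part of the original walkthrough), we can push a dummy noise batch through netG and confirm it produces 3×64×64 images:

# One latent vector in, one 3 x 64 x 64 image out
z = torch.randn(1, nz, 1, 1, device=device)
print(netG(z).shape)  # torch.Size([1, 3, 64, 64])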

Create the discriminator network:

The discriminator D is a binary classification network that takes an image as input and outputs a scalar probability. It takes a 3×64×64 input image, processes it through a series of Conv2d, BatchNorm2d, and LeakyReLU layers, and passes the final output through a sigmoid function to get the probability.

The DCGAN paper mentions that using strided convolution instead of pooling to downsample is good practice, because it lets the model learn its own pooling function. Using batch norm and LeakyReLU also helps the model get better gradient flow, which is critical for the learning process of both G and D.

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)

We will do the same thing with the discriminator (create netD, choose the device to run the code on, and apply weights_init with the same values on netD) and print the model's structure.

# Create the Discriminator
netD = Discriminator(ngpu).to(device)
# Handle multi-gpu if desired
if (device.type == 'cuda') and (ngpu > 1):
    netD = nn.DataParallel(netD, list(range(ngpu)))
# Apply the weights_init function
netD.apply(weights_init)
# Print the model
print(netD)

and we will get the printed structure of the Discriminator network as output.
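Again, as an optional sanity check (my addition), we can confirm that netD maps a 3×64×64 image to a single probability:

# One 3 x 64 x 64 image in, one scalar probability out
x = torch.randn(1, nc, image_size, image_size, device=device)
print(netD(x).shape)  # torch.Size([1, 1, 1, 1])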

After creating the generator and discriminator networks, we will initialize the Binary Cross-Entropy (BCE) loss function.

# Initialize BCELoss function
criterion = nn.BCELoss()
# Create batch of latent vectors that we will use to visualize
# the progression of the generator
fixed_noise = torch.randn(64, nz, 1, 1, device=device)
# Establish convention for real and fake labels during training
real_label = 1.
fake_label = 0.
# Setup Adam optimizers for both G and D
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))

Now let’s train our model 💪

After creating and initializing all the parts of the DCGAN model, we can start the training. To do so, we will follow Algorithm 1 from Goodfellow's paper, along with some of the best practices shown in ganhacks: constructing different mini-batches for real and fake images, and adjusting G's objective to maximize log(D(G(z))).

So the training is split into 2 parts:

  • Part 1 to update the Discriminator.
  • Part 2 to update the Generator.

Part 1 — Train the Discriminator :

The goal of training the discriminator is to maximize the probability of correctly classifying real and fake data. In practice, we want to maximize log(D(x)) + log(1−D(G(z))), and because we separate the mini-batch of real inputs from the fake ones (as suggested by ganhacks), we calculate this value in two steps:

  • First, we construct a batch of real samples from the training dataset, forward pass it through D, calculate the loss, log(D(x)), then calculate the gradients in a backward pass.
  • Second, we construct a batch of fake samples with the current generator, forward pass this batch through D, calculate the loss, log(1−D(G(z))), and accumulate the gradients with a backward pass.

Now, with the gradients accumulated from both the all-real and all-fake batches, we compute the error of D as the sum over the fake and real batches, and finally we update D using the discriminator's optimizer.

Part 2 — Train the Generator :

To train the generator, we want to minimize log(1−D(G(z))) in order to make the generated fake data better. In practice, however, we maximize log(D(G(z))) instead, because the original formulation suffers from vanishing gradients early in training, as shown by Goodfellow et al. in the GAN paper. Concretely, we classify the generator's output from Part 1 with the discriminator (which we just updated), compute G's loss using real labels, compute G's gradients in a backward pass, and finally update G's parameters with an optimizer step.

It may seem contradictory to use real labels as the generator's training labels for the loss function, but this allows us to use the log(x) part of BCELoss (instead of the log(1−x) part), which is exactly what we want.
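To see this numerically (a small check of my own), BCELoss with target 1 reduces to −log(x), while target 0 gives −log(1−x):

import torch
import torch.nn as nn

bce = nn.BCELoss()
out = torch.tensor([0.3])  # a hypothetical discriminator output
print(bce(out, torch.ones(1)))   # -log(0.3)   ~ 1.2040 (the form G minimizes)
print(bce(out, torch.zeros(1)))  # -log(1-0.3) ~ 0.3567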

Finally, we will collect some training statistics to evaluate the model and track the generator's progress, saving the following values to plot after training finishes:

  • img_list: saves the generated image grids.
  • G_losses: the generator's loss, based on log(D(G(z))), at each iteration.
  • D_losses: the discriminator's loss on the real and fake batches, based on log(D(x)) + log(1−D(G(z))), at each iteration.
  • D(x): the average output (across the batch) of the discriminator on the all-real batch. It should start close to 1 and theoretically converge to 0.5 as G gets better.
  • D(G(z)): the average discriminator output on the all-fake batch. It should start near 0 and converge to 0.5 as G gets better.
# Training Loop

# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Classify all-fake batch with D
        output = netD(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch, accumulated (summed) with previous gradients
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Compute error of D as sum over the fake and the real batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()

        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs - 1) and (i == len(dataloader) - 1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))
        iters += 1

    # Save a checkpoint of the generator every 5 epochs
    if (epoch + 1) % 5 == 0:
        torch.save(netG, 'Generator_epoch_{}.pth'.format(epoch + 1))
        print('Model saved at epoch', epoch + 1)
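Because torch.save was given the whole module, the checkpoint can later be reloaded in one line (the Generator class must be defined in the loading script; the filename below assumes the epoch-5 checkpoint was written):

# Reload the pickled generator and sample some new images
netG_loaded = torch.load('Generator_epoch_5.pth')
netG_loaded.eval()
with torch.no_grad():
    samples = netG_loaded(torch.randn(16, nz, 1, 1, device=device))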

Now let's look at how the generator's and discriminator's losses behave during training by visualizing them in a plot.

plt.figure(figsize=(10,5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses,label="G")
plt.plot(D_losses,label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()
plt.show()

And we will get a plot of the two loss curves.

Analyzing the graph, we see that in the first iterations (roughly 0–1000) the generator's loss is poor and oscillates heavily. Over the next 3000 iterations it improves, and in the last iterations (around 8000) it converges toward a minimum, still with some oscillation. The discriminator's loss stabilizes at a value near zero, but it also oscillates, because the generator gets much better in the later iterations.

We can visualize the training progression of G with an animation. Press the play button to start it.

#%%capture
fig = plt.figure(figsize=(8,8))
plt.axis("off")
ims = [[plt.imshow(np.transpose(i,(1,2,0)), animated=True)] for i in img_list]
ani = animation.ArtistAnimation(fig, ims, interval=1000, repeat_delay=1000, blit=True)
HTML(ani.to_jshtml())

Real Images vs. Fake Images

Finally, let’s visualize real images and fake images side by side.

# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))
# Plot the real images
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(),(1,2,0)))
# Plot the fake images from the last epoch
plt.subplot(1,2,2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1],(1,2,0)))
plt.show()

And because we all know that ML models take a long time to train, it's a good idea to save a copy of your results using the following line of code:

torch.save(img_list,'img_generated_15_1583.pth')

and then you can reuse it later; all you need to do is reload it like this:

img_l=torch.load('img_generated_15_1583.pth')

and then visualize it in a plot as we did the first time.
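For example, a minimal sketch that re-displays the last saved grid from the reloaded list:

# Display the last saved image grid from the reloaded list
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Fake Images (reloaded)")
plt.imshow(np.transpose(img_l[-1], (1, 2, 0)))
plt.show()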

Alright, we have talked a lot about DCGAN and how to implement it. Here is the link to the full implementation: Code_Repo

Conclusion

To conclude, the results I obtained were produced on a dataset of 1583 samples trained for 5 epochs (7915 iterations in total). You can improve these results by increasing the number of epochs or the size of the dataset, or even by changing the structure of the generator and/or the discriminator, especially when the problem is more complex; then test them and see if you get better results.

