DCGAN — Implementing Deep Convolutional Generative Adversarial Network in TensorFlow — Idiot Developer

Nikhil Tomar
Published in Analytics Vidhya
6 min read · Jul 24, 2020
Source: Photo by Mike Petrucci on Unsplash

In this tutorial, we are going to implement a Deep Convolutional Generative Adversarial Network (DCGAN) on the Anime Faces dataset. The code is written in TensorFlow 2.2 and Python 3.8.

According to Yann LeCun, the director of Facebook AI, GAN is the “most interesting idea in the last 10 years of machine learning.”

Overview:

  1. What are GANs
  2. How is DCGAN different from Vanilla GAN
  3. Anime Faces Dataset
  4. Implementation
  5. Summary

What is GAN

A Generative Adversarial Network (GAN) is a machine learning approach to generative modelling, designed by Ian Goodfellow and his colleagues in 2014. A GAN consists of two models, a generator and a discriminator, which are trained simultaneously in an adversarial process. The generator learns to generate images that look like real images, while the discriminator learns to differentiate between the real and the fake images.

We train both these models until the discriminator cannot differentiate between the real and the fake images.
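
The adversarial process described above can be written as the minimax objective from the original GAN paper (Goodfellow et al., 2014). The notation is not used elsewhere in this tutorial: $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the image the generator produces from noise $z$, and $p_{\text{data}}$ and $p_z$ are the data and noise distributions.

```latex
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize $V$ by scoring real images high and generated images low, while the generator tries to minimize it by making $D(G(z))$ large.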

Learn more: GAN — What is Generative Adversarial Network?

How is DCGAN different from Vanilla GAN?

The vanilla GAN developed by Ian Goodfellow is built from dense (fully-connected) layers. The DCGAN, or Deep Convolutional Generative Adversarial Network, is an extension of the GAN that is built using convolution and transpose convolution layers. The DCGAN was first described in the paper Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks by Radford et al.

Anime Faces Dataset

The dataset consists of 21551 anime faces scraped from www.getchu.com, which are then cropped using an anime face detection algorithm. All images are resized to 64 × 64 pixels for the sake of convenience.

Some images from the anime face dataset

Download: Anime Faces

Implementation

The following code demonstrates the implementation of Deep Convolutional Generative Adversarial Network (DCGAN) in TensorFlow on the Anime Faces dataset.

Import TensorFlow and Other Libraries

import os
import numpy as np
import cv2
from glob import glob
from matplotlib import pyplot
from sklearn.utils import shuffle
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

Now, we define the dimensions of the anime images.

IMG_H = 64 
IMG_W = 64
IMG_C = 3

Following the DCGAN paper, the weights (kernels) are randomly initialized from a normal distribution with mean 0.0 and standard deviation 0.02. In this implementation, the initializer is passed to both the convolutional and the transpose convolutional layers.

w_init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.02)
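
To get a feel for how small these initial weights are, here is a minimal pure-Python sketch that draws samples from the same distribution. It is only a stand-in for illustration: sample_weights is a hypothetical helper, and it uses Python's random module rather than the TensorFlow initializer.

```python
import random
import statistics

def sample_weights(n, mean=0.0, stddev=0.02, seed=0):
    """Draw n values from a normal distribution N(mean, stddev^2).

    Hypothetical helper: a stand-in for what a RandomNormal
    initializer produces, not the TensorFlow implementation.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, stddev) for _ in range(n)]

weights = sample_weights(10_000)
print(f"mean = {statistics.fmean(weights):+.5f}")
print(f"std  = {statistics.pstdev(weights):.5f}")
print(f"largest magnitude = {max(abs(w) for w in weights):.3f}")
```

Almost all sampled values stay within a few hundredths of zero, so every layer starts out close to a zero mapping.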

Loading and Preparing Dataset

The load_image function takes an image path and returns a tensor with values between -1 and 1. It does the following tasks:

  • First, we read the raw file contents from the image path.
  • Next, we decode the JPEG data into a uint8 tensor with three colour channels.
  • Next, the image is centre-cropped or zero-padded to IMG_H × IMG_W if its size differs (the dataset images are already 64 × 64).
  • The data type of the image is changed to float32.
  • Finally, we normalize the pixel values to the range of -1 to 1.

def load_image(image_path):
    img = tf.io.read_file(image_path)
    ## Force IMG_C channels so that every image has the same shape
    img = tf.io.decode_jpeg(img, channels=IMG_C)
    img = tf.image.resize_with_crop_or_pad(img, IMG_H, IMG_W)
    img = tf.cast(img, tf.float32)
    ## Map [0, 255] to [-1, 1]
    img = (img - 127.5) / 127.5
    return img
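
The normalization step is plain arithmetic, so it can be checked without TensorFlow. The sketch below uses two hypothetical helpers, normalize and denormalize, which mirror the (img - 127.5) / 127.5 step in load_image and the (examples + 1) / 2.0 step used later in save_plot.

```python
def normalize(pixel):
    """Map a pixel value in [0, 255] to [-1, 1], as load_image does."""
    return (pixel - 127.5) / 127.5

def denormalize(value):
    """Map a value in [-1, 1] to [0, 1], as save_plot does before plotting."""
    return (value + 1) / 2.0

for pixel in (0, 64, 127.5, 255):
    scaled = normalize(pixel)
    print(pixel, "->", round(scaled, 4), "->", round(denormalize(scaled), 4))
```

The [-1, 1] range matches the output range of the tanh activation at the end of the generator, so real and generated images live on the same scale.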

The tf_dataset function is used to set the TensorFlow dataset pipeline for the training.

def tf_dataset(images_path, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(images_path)
    dataset = dataset.shuffle(buffer_size=10240)
    dataset = dataset.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

    return dataset
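
As a quick sanity check on the pipeline, the number of batches per epoch follows directly from the dataset size and the batch size. The sketch below assumes all 21551 images are found in the data folder and that the last, smaller batch is kept, which is the default behaviour of batch; batches_per_epoch is a hypothetical helper.

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Return (number of batches, size of the last batch).

    Assumes the final partial batch is kept rather than dropped.
    """
    num_batches = math.ceil(num_images / batch_size)
    last_batch = num_images - (num_batches - 1) * batch_size
    return num_batches, last_batch

num_batches, last_batch = batches_per_epoch(21551, 128)
print(f"{num_batches} batches per epoch, last batch has {last_batch} images")
```

Because the last batch is smaller than the others, train_step reads the batch size from the incoming tensor instead of hard-coding it. Note also that the shuffle buffer (10240) is smaller than the dataset, so the shuffling is approximate rather than uniform over all images.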

Transpose Convolution Blocks

The transpose convolution is used to build the generator model. It is used to increase the dimensions (height and width) of the incoming feature maps.

def deconv_block(inputs, num_filters, kernel_size, strides, bn=True):
    x = Conv2DTranspose(
        filters=num_filters,
        kernel_size=kernel_size,
        kernel_initializer=w_init,
        padding="same",
        strides=strides,
        use_bias=False
    )(inputs)

    if bn:
        x = BatchNormalization()(x)
        x = LeakyReLU(alpha=0.2)(x)
    return x
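
To see how the transpose convolution increases the height and width, the output size can be computed by hand. The sketch below follows the rule Keras uses when no output padding is specified; conv_transpose_output_size is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
def conv_transpose_output_size(size, kernel_size, stride, padding="same"):
    """Output length of a transposed convolution along one dimension.

    Mirrors the Keras rule without output_padding:
    "same" gives size * stride, "valid" adds max(kernel_size - stride, 0).
    """
    if padding == "same":
        return size * stride
    if padding == "valid":
        return size * stride + max(kernel_size - stride, 0)
    raise ValueError(f"unsupported padding: {padding}")

# Four blocks with strides=2 and padding="same", starting from a 4 x 4 map.
size = 4
for _ in range(4):
    new_size = conv_transpose_output_size(size, kernel_size=5, stride=2)
    print(f"{size} x {size} -> {new_size} x {new_size}")
    size = new_size
```

With padding="same" the kernel size does not affect the output size, so each block with strides=2 simply doubles the height and width.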

The strided convolution is used to build the discriminator model. With strides=2, it halves the height and width of the incoming feature maps.

def conv_block(inputs, num_filters, kernel_size, padding="same", strides=2, activation=True):
    x = Conv2D(
        filters=num_filters,
        kernel_size=kernel_size,
        kernel_initializer=w_init,
        padding=padding,
        strides=strides,
    )(inputs)

    if activation:
        x = LeakyReLU(alpha=0.2)(x)
        x = Dropout(0.3)(x)
    return x

Generator

The generator takes random noise from the latent space and maps it to the data space. As we are using RGB images, the data space here is the space of 64 × 64 RGB images.

The generator starts with a dense (fully-connected) layer. It is followed by a series of transpose convolutions, each with batch normalization and a leaky ReLU activation function.

Finally, we use a convolution layer with three filters and a tanh activation function to generate the RGB image.

def build_generator(latent_dim):
    f = [2**i for i in range(5)][::-1]
    filters = 32
    output_strides = 16
    h_output = IMG_H // output_strides
    w_output = IMG_W // output_strides

    noise = Input(shape=(latent_dim,), name="generator_noise_input")

    ## Project the noise and reshape it into a small feature map
    x = Dense(f[0] * filters * h_output * w_output, use_bias=False)(noise)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Reshape((h_output, w_output, 16 * filters))(x)

    ## Each block doubles the height and width
    for i in range(1, 5):
        x = deconv_block(x,
            num_filters=f[i] * filters,
            kernel_size=5,
            strides=2,
            bn=True
        )

    x = conv_block(x,
        num_filters=3,
        kernel_size=5,
        strides=1,
        activation=False
    )
    fake_output = Activation("tanh")(x)

    return Model(noise, fake_output, name="generator")
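
The shapes flowing through the generator can be traced with plain arithmetic. The sketch below assumes the same constants as build_generator (32 base filters, an output stride of 16, 64 × 64 images) and that each deconv_block with strides=2 doubles the height and width; generator_shapes is a hypothetical helper and builds no model.

```python
IMG_H, IMG_W = 64, 64  # same values as in the tutorial

def generator_shapes(filters=32, output_strides=16):
    """List (layer name, output shape) pairs, excluding the batch dimension."""
    multipliers = [2 ** i for i in range(5)][::-1]  # [16, 8, 4, 2, 1]
    h, w = IMG_H // output_strides, IMG_W // output_strides
    shapes = [
        ("dense", (multipliers[0] * filters * h * w,)),
        ("reshape", (h, w, multipliers[0] * filters)),
    ]
    for m in multipliers[1:]:
        h, w = h * 2, w * 2  # strides=2 with padding="same"
        shapes.append(("deconv_block", (h, w, m * filters)))
    shapes.append(("conv_block + tanh", (h, w, 3)))
    return shapes

for name, shape in generator_shapes():
    print(f"{name:20s} {shape}")
```

The 4 × 4 × 512 feature map is upsampled four times to 64 × 64 while the number of channels shrinks, and the last convolution reduces it to three RGB channels.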

Discriminator

The discriminator is a simple binary classification network that takes both the real and the fake images and outputs a score indicating whether the given image is real or fake.

For this purpose, a series of strided convolutions is used, each followed by a leaky ReLU and dropout with a rate of 0.3. Finally, we flatten the feature maps and use a fully-connected layer with 1 unit. This layer outputs a raw logit rather than a probability: no sigmoid layer is added to the model, because the binary cross-entropy loss is created with from_logits=True and applies the sigmoid internally.

def build_discriminator():
    f = [2**i for i in range(4)]
    image_input = Input(shape=(IMG_H, IMG_W, IMG_C))
    x = image_input
    filters = 64

    for i in range(0, 4):
        x = conv_block(x, num_filters=f[i] * filters, kernel_size=5, strides=2)

    x = Flatten()(x)
    x = Dense(1)(x)

    return Model(image_input, x, name="discriminator")
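
The same kind of trace works for the discriminator. The sketch below assumes 64 base filters and that a convolution with padding="same" and strides=2 produces an output of size ceil(input / 2); discriminator_shapes is a hypothetical helper and builds no model.

```python
import math

def discriminator_shapes(img_h=64, img_w=64, img_c=3, filters=64):
    """List (layer name, output shape) pairs, excluding the batch dimension."""
    shapes = [("input", (img_h, img_w, img_c))]
    h, w, channels = img_h, img_w, img_c
    for i in range(4):
        # padding="same" with strides=2 gives ceil(size / 2)
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        channels = (2 ** i) * filters
        shapes.append(("conv_block", (h, w, channels)))
    shapes.append(("flatten", (h * w * channels,)))
    shapes.append(("dense", (1,)))
    return shapes

for name, shape in discriminator_shapes():
    print(f"{name:12s} {shape}")
```

The discriminator mirrors the generator: the image shrinks from 64 × 64 to 4 × 4 while the channels grow from 64 to 512, and the flattened 8192 features are reduced to a single logit.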

Complete DCGAN Model

The GAN class denotes the complete DCGAN model with the training step defined in it. Its constructor takes the discriminator model, the generator model and the latent dimension; its compile method takes the two optimizers and the loss function. The loss function used here is binary cross-entropy.

The train_step function is used for training the DCGAN model. The training starts with the discriminator, which is updated twice per step: each time, it is first trained on the fake images generated by the generator (label 0) and then on the real images from the anime faces dataset (label 1). Next, the generator is trained once: it generates a fresh batch of fake images and is updated so that the discriminator classifies them as real (label 1).

class GAN(Model):
    def __init__(self, discriminator, generator, latent_dim):
        super(GAN, self).__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super(GAN, self).compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn

    def train_step(self, real_images):
        batch_size = tf.shape(real_images)[0]

        for _ in range(2):
            ## Train the discriminator on fake images (label 0)
            random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))
            generated_images = self.generator(random_latent_vectors)
            generated_labels = tf.zeros((batch_size, 1))

            with tf.GradientTape() as ftape:
                predictions = self.discriminator(generated_images)
                d1_loss = self.loss_fn(generated_labels, predictions)
            grads = ftape.gradient(d1_loss, self.discriminator.trainable_weights)
            self.d_optimizer.apply_gradients(zip(grads, self.discriminator.trainable_weights))

            ## Train the discriminator on real images (label 1)
            labels = tf.ones((batch_size, 1))

            with tf.GradientTape() as rtape:
                predictions = self.discriminator(real_images)
                d2_loss = self.loss_fn(labels, predictions)
            grads = rtape.gradient(d2_loss, self.discriminator.trainable_weights)
            self.d_optimizer.apply_gradients(zip(grads, self.discriminator.trainable_weights))

        ## Train the generator: it wants the discriminator to output "real"
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))
        misleading_labels = tf.ones((batch_size, 1))

        with tf.GradientTape() as gtape:
            predictions = self.discriminator(self.generator(random_latent_vectors))
            g_loss = self.loss_fn(misleading_labels, predictions)
        grads = gtape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(zip(grads, self.generator.trainable_weights))

        return {"d1_loss": d1_loss, "d2_loss": d2_loss, "g_loss": g_loss}
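
The three losses in train_step all come from the same binary cross-entropy function, applied to raw logits with label smoothing. The following pure-Python sketch shows the per-sample arithmetic, assuming the standard numerically stable formula for cross-entropy on logits and the usual binary smoothing rule that pulls labels towards 0.5. The helpers smooth_label and bce_from_logit are hypothetical and are not the Keras implementation.

```python
import math

def smooth_label(label, label_smoothing):
    """Pull a 0/1 label towards 0.5 by the smoothing amount."""
    return label * (1.0 - label_smoothing) + 0.5 * label_smoothing

def bce_from_logit(label, logit, label_smoothing=0.0):
    """Binary cross-entropy for one sample, computed from a raw logit.

    Uses the stable form max(x, 0) - x * z + log(1 + exp(-|x|)),
    where x is the logit and z is the (smoothed) target.
    """
    target = smooth_label(label, label_smoothing)
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# With label_smoothing=0.1, targets become 0.95 (real) and 0.05 (fake).
print("smoothed real label:", round(smooth_label(1.0, 0.1), 4))
print("smoothed fake label:", round(smooth_label(0.0, 0.1), 4))

# Discriminator loss on a fake image it scores as "real" (logit = +3) ...
print("d1 loss, fooled:    ", round(bce_from_logit(0.0, 3.0, 0.1), 4))
# ... versus one it correctly scores as "fake" (logit = -3).
print("d1 loss, correct:   ", round(bce_from_logit(0.0, -3.0, 0.1), 4))
# Generator loss for the same two cases uses the misleading label 1.
print("g loss, fooled D:   ", round(bce_from_logit(1.0, 3.0, 0.1), 4))
print("g loss, D correct:  ", round(bce_from_logit(1.0, -3.0, 0.1), 4))
```

The same logit gives a high discriminator loss and a low generator loss, or the other way round, which is exactly the adversarial relationship between the two models.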

Saving image

def save_plot(examples, epoch, n):
    ## Rescale the generator output from [-1, 1] to [0, 1] for plotting
    examples = (examples + 1) / 2.0
    for i in range(n * n):
        pyplot.subplot(n, n, i+1)
        pyplot.axis("off")
        pyplot.imshow(examples[i])
    filename = f"samples/generated_plot_epoch-{epoch+1}.png"
    pyplot.savefig(filename)
    pyplot.close()

Finally, running the code

if __name__ == "__main__":
    ## Hyperparameters
    batch_size = 128
    latent_dim = 128
    num_epochs = 60
    images_path = glob("data/*")

    ## Make sure the output folders exist before anything is saved to them
    os.makedirs("samples", exist_ok=True)
    os.makedirs("saved_model", exist_ok=True)

    d_model = build_discriminator()
    g_model = build_generator(latent_dim)

    # d_model.load_weights("saved_model/d_model.h5")
    # g_model.load_weights("saved_model/g_model.h5")

    d_model.summary()
    g_model.summary()

    gan = GAN(d_model, g_model, latent_dim)

    bce_loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True, label_smoothing=0.1)
    d_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
    g_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
    gan.compile(d_optimizer, g_optimizer, bce_loss_fn)

    images_dataset = tf_dataset(images_path, batch_size)

    for epoch in range(num_epochs):
        gan.fit(images_dataset, epochs=1)
        g_model.save("saved_model/g_model.h5")
        d_model.save("saved_model/d_model.h5")

        ## Save a 5 x 5 grid of generated samples after every epoch
        n_samples = 25
        noise = np.random.normal(size=(n_samples, latent_dim))
        examples = g_model.predict(noise)
        save_plot(examples, epoch, int(np.sqrt(n_samples)))

Now, we are going to see the changes in the images generated at the first and the last epoch.

Images generated at epoch 1
Images generated at epoch 60

So, you can see that after training the DCGAN for 60 epochs, the generator starts generating images that look real.

Summary

In this tutorial, we learned how to implement a Deep Convolutional Generative Adversarial Network (DCGAN) on the anime faces dataset. I hope that after this tutorial, you will start building your own DCGANs.

Still have some questions? Comment below and I will do my best to answer. For more updates, follow me.

Originally published at https://idiotdeveloper.com on July 24, 2020.
