Celebrity Face Generation using GANs (Tensorflow Implementation)

Short Introduction of GANs

Shubham Sharma
Published in Coinmonks
8 min read · Aug 4, 2018

Generative adversarial networks (GANs) are one of the hottest topics in deep learning. GANs are a class of machine learning algorithms used in unsupervised learning, implemented as a system of two neural networks:

  1. Generator
  2. Discriminator

Both networks compete with each other in a zero-sum game framework. GANs are a set of models that learn to create synthetic data similar to the input data they are given.

The discriminator has the task of determining whether a given image looks natural (i.e., is an image from the dataset) or looks like it has been artificially created. It is basically a binary classifier that takes the form of a normal convolutional neural network (CNN). The task of the generator is to create natural-looking images that are similar to the original data distribution, images that look natural enough to fool the discriminator network.

First, random noise is given to the generator, which uses it to create fake images. These fake images are then sent to the discriminator along with the original images.

The generator is trying to fool the discriminator while the discriminator is trying not to get fooled by the generator. As the models train through alternating optimization, both networks improve until they reach a point where the “fake images are indistinguishable from the dataset images”.

Mathematical Equation of Generative Adversarial Network:
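
Assuming this refers to the standard minimax objective from the original GAN formulation, which matches the two parts discussed in this section, it can be written as follows. The symbols p_data (the distribution of the real images) and p_z (the noise distribution) are the usual notation and are an assumption here, not something defined in this post.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```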

We can consider this equation as composed of two parts: the first part deals with data sampled from the original data distribution, and the second part deals with data generated from samples of the noise distribution.

First Part-

The discriminator always wants to maximize the probability of classifying an image correctly as real or fake. Here, the images are sampled from the original data distribution, which is the real data itself. D(x) is the probability that the image x is real, so the discriminator wants to maximize D(x) and therefore log(D(x)). The first part has to be maximized.

Second Part-

‘z’ is a random noise sample and G(z) is the image generated from that noise sample. The explanation for this term is quite similar. The generator always wants to maximize the probability that the discriminator is fooled by the generated images. This means the generator wants to maximize D(G(z)), so it should minimize 1 - D(G(z)) and hence log(1 - D(G(z))). The discriminator, in contrast, wants D(G(z)) to be small, so it tries to maximize this term.
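
To make the two parts concrete, here is a minimal numeric sketch in plain Python. The function name `value_terms` and the probabilities used are made up for illustration; they are not taken from the project code.

```python
import math

def value_terms(d_real, d_fake):
    """Return the two terms of the objective for a single pair of samples.

    d_real: D(x), the probability the discriminator assigns to a real image.
    d_fake: D(G(z)), the probability it assigns to a generated image.
    """
    first = math.log(d_real)         # log D(x)
    second = math.log(1.0 - d_fake)  # log(1 - D(G(z)))
    return first, second

# Discriminator doing well: confident on the real image, not fooled by the fake.
good_d = sum(value_terms(d_real=0.9, d_fake=0.1))

# Generator has improved: the discriminator now rates the fake as likely real.
fooled_d = sum(value_terms(d_real=0.9, d_fake=0.8))

print(good_d, fooled_d)
```

The second value is lower than the first: the discriminator tries to push the sum up, while the generator pushes it down by raising D(G(z)).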

Celebrity Image Generation using GANs

Celebrity Image Dataset:

The CelebA dataset is a collection of over 200,000 celebrity face images with annotations. Since in this post I am only going to generate faces, I am not taking the annotations into consideration.

1).Getting the Data:-

import helper
data_dir = './data'  # assumed download location, not given in the post
helper.download_extract('celeba', data_dir)

I have created the helper.py file, through which you can download the CelebA images. Running this code snippet downloads the CelebA dataset (the source code link is given below).

2).Preprocessing the images:-

Since I am working only on faces, I have resized the images down to 28*28 in order to get good results. I have cropped away the portion of each image that does not include the face.

# Snippet of the helper Python file that preprocesses the given image dataset.
import numpy as np
from PIL import Image

def get_image(image_path, width, height, mode):
    """
    Read image from image_path
    :param image_path: Path of image
    :param width: Width of image
    :param height: Height of image
    :param mode: Mode of image
    :return: Image data
    """
    image = Image.open(image_path)
    if image.size != (width, height):
        # Crop a centered 108x108 face region, then resize to the target size
        face_width = face_height = 108
        j = (image.size[0] - face_width) // 2
        i = (image.size[1] - face_height) // 2
        image = image.crop([j, i, j + face_width, i + face_height])
        image = image.resize([width, height], Image.BILINEAR)
    return np.array(image.convert(mode))
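
As a quick check of the crop arithmetic in get_image, here is a small sketch that computes the crop box for an aligned CelebA image, which is 178 pixels wide and 218 pixels tall. The function name `center_crop_box` is made up for illustration and is not part of helper.py.

```python
def center_crop_box(image_width, image_height, face_size=108):
    """Return the (left, upper, right, lower) box of a centered square crop."""
    j = (image_width - face_size) // 2   # left edge
    i = (image_height - face_size) // 2  # upper edge
    return (j, i, j + face_size, i + face_size)

# Aligned CelebA images are 178 x 218 pixels.
box = center_crop_box(178, 218)
print(box)  # (35, 55, 143, 163)
```

The 108x108 region in the middle of the image is kept and then resized down to 28x28.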

Generative adversarial networks are very hard to train (you can check out this link to learn why training GANs is so hard).

To get good results in a reasonable amount of time, you should have a good GPU (4 GB or more). By running this code snippet you can find out whether TensorFlow can see a GPU.

from distutils.version import LooseVersion
import warnings
import tensorflow as tf

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.0'), 'Please use TensorFlow version 1.0 or newer. You are using {}'.format(tf.__version__)
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

3).Model Inputs and Network Architecture-

I take the image width, image height, number of image channels and noise dimension as model inputs. The noise input is then fed to the generator for generating the fake images.

Architecture of Generator :-

def generator(z, out_channel_dim, is_train=True, alpha=0.2, keep_prob=0.5):
    # Reuse the existing variables when the generator is only used for sampling
    with tf.variable_scope('generator', reuse=(not is_train)):
        # First fully connected layer, 4x4x1024
        fc = tf.layers.dense(z, 4*4*1024, use_bias=False)
        fc = tf.reshape(fc, (-1, 4, 4, 1024))
        bn0 = tf.layers.batch_normalization(fc, training=is_train)
        lrelu0 = tf.maximum(alpha * bn0, bn0)
        # Note: tf.layers.dropout takes a drop rate; at 0.5 it equals the keep probability
        drop0 = tf.layers.dropout(lrelu0, keep_prob, training=is_train)

        # Deconvolution, 7x7x512
        conv1 = tf.layers.conv2d_transpose(drop0, 512, 4, 1, 'valid', use_bias=False)
        bn1 = tf.layers.batch_normalization(conv1, training=is_train)
        lrelu1 = tf.maximum(alpha * bn1, bn1)
        drop1 = tf.layers.dropout(lrelu1, keep_prob, training=is_train)

        # Deconvolution, 14x14x256
        conv2 = tf.layers.conv2d_transpose(drop1, 256, 5, 2, 'same', use_bias=False)
        bn2 = tf.layers.batch_normalization(conv2, training=is_train)
        lrelu2 = tf.maximum(alpha * bn2, bn2)
        drop2 = tf.layers.dropout(lrelu2, keep_prob, training=is_train)

        # Output layer, 28x28xn
        logits = tf.layers.conv2d_transpose(drop2, out_channel_dim, 5, 2, 'same')

        # tanh squashes the output into the range [-1, 1]
        out = tf.tanh(logits)

        return out

tests.test_generator(generator, tf)

The generator starts with a dense (fully connected) layer followed by transposed convolution layers. Every layer except the output layer is followed by batch normalization, leaky ReLU and dropout. The generator takes a random noise vector z, projects it through the dense layer, reshapes the result into a 4D tensor and passes it through a series of upsampling layers. Each upsampling layer is a transposed convolution, often called a deconvolution.
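
The leaky ReLU in the code is written as `tf.maximum(alpha * x, x)`. A scalar version in plain Python shows what that expression does. This is a sketch for illustration only; the function name `leaky_relu` is not part of the project code.

```python
def leaky_relu(x, alpha=0.2):
    """Scalar version of tf.maximum(alpha * x, x) for 0 < alpha < 1."""
    return max(alpha * x, x)

# Negative inputs are scaled by alpha, non-negative inputs pass through unchanged.
outputs = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
print(outputs)  # [-0.4, 0.0, 3.0]
```

Unlike a plain ReLU, negative inputs still produce a small non-zero output, so some gradient can flow through units that would otherwise be switched off.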

The transposed convolutions reduce the depth from 1024 all the way down to 3, which represents an RGB color image. The final layer outputs a 28x28x3 tensor through the hyperbolic tangent (tanh) function.
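
The spatial sizes in the generator's comments (4, 7, 14, 28) can be checked with a few lines of arithmetic. This is a sketch that assumes the usual output-size rules for transposed convolutions with 'valid' and 'same' padding; the helper name `conv_transpose_output_size` is made up for illustration.

```python
def conv_transpose_output_size(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution along one dimension."""
    if padding == 'same':
        return size * stride
    if padding == 'valid':
        return size * stride + max(kernel - stride, 0)
    raise ValueError('padding must be "same" or "valid"')

# (kernel, stride, padding) of the three transposed convolutions in the generator
layers = [(4, 1, 'valid'), (5, 2, 'same'), (5, 2, 'same')]

size = 4  # the dense layer output is reshaped to 4x4x1024
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_transpose_output_size(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [4, 7, 14, 28]
```

The first layer uses 'valid' padding with a 4x4 kernel to go from 4 to 7, which is what lets two stride-2 layers land exactly on 28.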

Architecture of Discriminator:-

def discriminator(images, reuse=False, alpha=0.2, keep_prob=0.5):
    # Note: tf.layers.dropout and tf.layers.batch_normalization default to
    # training=False, so as written dropout is inactive here and batch
    # normalization uses its moving statistics.
    with tf.variable_scope('discriminator', reuse=reuse):
        # Input layer is 28x28xn
        # Convolutional layer, 14x14x64
        conv1 = tf.layers.conv2d(images, 64, 5, 2, padding='same', kernel_initializer=tf.contrib.layers.xavier_initializer())
        lrelu1 = tf.maximum(alpha * conv1, conv1)
        drop1 = tf.layers.dropout(lrelu1, keep_prob)

        # Strided convolutional layer, 7x7x128
        conv2 = tf.layers.conv2d(drop1, 128, 5, 2, 'same', use_bias=False)
        bn2 = tf.layers.batch_normalization(conv2)
        lrelu2 = tf.maximum(alpha * bn2, bn2)
        drop2 = tf.layers.dropout(lrelu2, keep_prob)

        # Strided convolutional layer, 4x4x256
        conv3 = tf.layers.conv2d(drop2, 256, 5, 2, 'same', use_bias=False)
        bn3 = tf.layers.batch_normalization(conv3)
        lrelu3 = tf.maximum(alpha * bn3, bn3)
        drop3 = tf.layers.dropout(lrelu3, keep_prob)

        # Fully connected layer producing a single logit per image
        flat = tf.reshape(drop3, (-1, 4*4*256))
        logits = tf.layers.dense(flat, 1)
        out = tf.sigmoid(logits)

        return out, logits

tests.test_discriminator(discriminator, tf)

The job of the discriminator is to identify which images are real and which are fake. The discriminator is also a 4-layer CNN (three convolutional layers followed by a fully connected layer) with leaky ReLU activations and batch normalization (except after the first convolutional layer). It receives an image of size 28*28*3, either real or generated, and performs convolutions on it. Finally, the discriminator outputs the probability that the image is real using the logistic sigmoid function.
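
The sizes in the discriminator's comments (28, 14, 7, 4) and the 4*4*256 reshape can be checked the same way. This sketch assumes the usual rule for 'same' padding, where the output size is the input size divided by the stride, rounded up; the helper name `conv_same_output_size` is made up for illustration.

```python
import math

def conv_same_output_size(size, stride):
    """Spatial output size of a strided convolution with 'same' padding."""
    return math.ceil(size / stride)

size = 28  # input images are 28x28
sizes = [size]
for filters in (64, 128, 256):  # the three stride-2 convolutions
    size = conv_same_output_size(size, stride=2)
    sizes.append(size)

# Number of values per image after flattening the last feature map
flat_units = size * size * 256

print(sizes, flat_units)  # [28, 14, 7, 4] 4096
```

The step from 7 to 4 is where the rounding matters, and it is why the reshape uses 4*4*256 rather than a size derived from 3.5.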

When the discriminator sees differences between real and generated images, it sends a gradient signal to the generator. This signal flows backward from the discriminator into the generator.

4).Generator Loss and Discriminator Loss:-

The discriminator receives images from both the training set and the generator, so when calculating the discriminator’s loss we have to add the loss due to real images and the loss due to fake images. Both networks are trained simultaneously, so we need two optimizers, one for the generator and one for the discriminator. We want the discriminator to output probabilities close to 1 if the images are real and close to 0 if the images are fake.

def model_loss(input_real, input_z, out_channel_dim, alpha=0.2, smooth_factor=0.1):
    # Discriminator loss on real images, with one-sided label smoothing
    d_model_real, d_logits_real = discriminator(input_real, alpha=alpha)

    d_loss_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_real,
                                                labels=tf.ones_like(d_model_real) * (1 - smooth_factor)))

    # Discriminator loss on generated images (target label 0)
    input_fake = generator(input_z, out_channel_dim, alpha=alpha)
    d_model_fake, d_logits_fake = discriminator(input_fake, reuse=True, alpha=alpha)

    d_loss_fake = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_fake, labels=tf.zeros_like(d_model_fake)))

    # Generator loss: the generator wants its images to be labeled as real (1)
    g_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_fake, labels=tf.ones_like(d_model_fake)))

    return d_loss_real + d_loss_fake, g_loss

tests.test_model_loss(model_loss)
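
To see what these losses look like numerically, here is a plain-Python sketch of sigmoid cross-entropy on single logits. It uses the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)). The logit values are made up for illustration, and the function name `sigmoid_cross_entropy` is not part of the project code.

```python
import math

def sigmoid_cross_entropy(logit, label):
    """Sigmoid cross-entropy for one logit x and one label z, in a stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

smooth_factor = 0.1
real_logit = 2.0    # hypothetical discriminator logit for a real image
fake_logit = -1.5   # hypothetical discriminator logit for a generated image

# Discriminator: real images get the smoothed label 0.9, fake images get 0.
d_loss_real = sigmoid_cross_entropy(real_logit, 1.0 - smooth_factor)
d_loss_fake = sigmoid_cross_entropy(fake_logit, 0.0)
d_loss = d_loss_real + d_loss_fake

# Generator: the same fake logit, but scored against the label 1.
g_loss = sigmoid_cross_entropy(fake_logit, 1.0)

print(d_loss, g_loss)
```

With these values the discriminator correctly rejects the fake image, so its loss on the fake is small while the generator's loss on the same logit is large. That asymmetry is the adversarial part of the setup.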

Training and Results:-

While training is in progress, the generator produces a set of images, and after every epoch they get better and better, until the discriminator can no longer tell whether an image is real or fake. The results produced are as follows.

Images Generated:-

After the first and second epochs
After the third and fourth epochs
After the fifth and sixth epochs

And so on; new faces are continuously generated.

I have also obtained a pre-trained network from here. If you want to run the GAN using this pre-trained network, use the Python file that I am providing here.

Running it generates a set of 10 fake images. Some of them are:

Shubham Sharma
Coinmonks

Software Engineer-2 @ Dell R&D Center, Bengaluru (Indian Institute of Information Technology & Management, Gwalior)