Cognitive Computing Final Project: Image Generation with Neural Networks

bds mis
Mar 16, 2019

Team members: Ting-Yi Hung, Yuejia Feng, Junyan Shao, Shuai Ye

Background

In 2014, the research paper introducing Generative Adversarial Networks (GANs) was published and was regarded as a breakthrough in generative modeling. Today, GANs are among the most popular architectures in the field of image generation. Our team came across GANs while researching image generative models, and we decided to use DCGAN (Deep Convolutional GAN) in our project. We used TensorFlow to train and test our model.

Data Source

In this image generation project, we trained the model on two datasets separately. One dataset consists of 4,000 pictures of dogs; the other has 3,700 pictures of chickens. Both datasets were downloaded from Kaggle.

Data Preparation

1) Remove Outliers

For data preparation, we first removed the outliers: corrupted image files and images in which it is impossible to tell whether the subject is a dog or a chicken. Because both datasets contain images of varying resolutions, we also needed to remove images whose sizes were too small to be resized. A sketch of this filtering step is shown below.
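
Here is a minimal sketch of the automated part of this filtering (dropping corrupted and undersized files); the directory path and minimum-size threshold are illustrative assumptions, and the ambiguous dog/chicken images were screened by hand:

```python
import os
from PIL import Image

DATA_DIR = "data/dogs"   # illustrative path
MIN_SIZE = 64            # illustrative minimum side length in pixels

for fname in os.listdir(DATA_DIR):
    path = os.path.join(DATA_DIR, fname)
    try:
        with Image.open(path) as img:
            img.verify()               # raises an exception on corrupted files
        with Image.open(path) as img:  # reopen: verify() invalidates the file
            width, height = img.size
        if min(width, height) < MIN_SIZE:
            os.remove(path)            # too small to resize cleanly
    except Exception:
        os.remove(path)                # corrupted or unreadable image
```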

2) Resize the images

After removing the outliers, we resized the images to 128×128 as the input to our neural network.
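
A sketch of this resizing step (the paths here are again illustrative):

```python
import os
from PIL import Image

SRC_DIR = "data/dogs"        # illustrative source directory
DST_DIR = "data/dogs_128"    # illustrative output directory
os.makedirs(DST_DIR, exist_ok=True)

for fname in os.listdir(SRC_DIR):
    with Image.open(os.path.join(SRC_DIR, fname)) as img:
        # Force 3-channel RGB and downsample to the network input size.
        img.convert("RGB").resize((128, 128), Image.LANCZOS) \
           .save(os.path.join(DST_DIR, fname))
```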

Model Introduction

We used DCGAN (Deep Convolutional GAN) as our architecture to produce dog and chicken images separately. In a GAN (Generative Adversarial Network), we model a generator that samples a latent vector z ~ p(z) and generates an image from it. The image is then fed into a discriminator, which is trained to distinguish samples produced by the generator from true samples in the training data. That is, we create two models (the generator and the discriminator) and train them against each other so that the distribution of generated samples matches the true data distribution; in particular, we minimize the difference between the two distributions. DCGAN applies CNNs (Convolutional Neural Networks) inside the GAN to improve learning, which is where the name comes from.
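
Formally, this adversarial training is the minimax game from the original 2014 GAN paper: the discriminator D maximizes the value function below while the generator G minimizes it.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```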

The general process of the models:

The generator took a random noise vector z as input and produced a fake image. At the same time, the discriminator took both the training data (real images) and the fake images and learned to tell them apart: it output a score close to 0 for a fake image and close to 1 for a real image. A sketch of one training step follows.
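
Below is a minimal sketch of one such training step using the TensorFlow 2 Keras API (our project used TensorFlow, but the exact API calls, latent size, and optimizer settings here are illustrative; the generator and discriminator models themselves are sketched in the next sections):

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
gen_opt = tf.keras.optimizers.Adam(5e-5)    # 0.00005 worked best for us
disc_opt = tf.keras.optimizers.Adam(5e-5)
NOISE_DIM = 100                             # illustrative latent size

@tf.function
def train_step(generator, discriminator, real_images):
    noise = tf.random.normal([tf.shape(real_images)[0], NOISE_DIM])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # Discriminator: label real images 1 and fake images 0.
        d_loss = (cross_entropy(tf.ones_like(real_logits), real_logits)
                  + cross_entropy(tf.zeros_like(fake_logits), fake_logits))
        # Generator: try to get its fakes scored as real (label 1).
        g_loss = cross_entropy(tf.ones_like(fake_logits), fake_logits)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    gen_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    disc_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
```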

Image resource: LINE Stickers — Shiba Inu (Shiba-Dog) stickers

Discriminator model:

  • Input: A real or fake image
  • Output: A score
  • Explanation: In this model, we doubled the number of filters at every strided convolution layer. We applied batch normalization at each layer except the input layer to lessen covariate shift, and we used Leaky ReLU as the activation function to help avoid the vanishing gradient effect. A sketch of this architecture follows the figure below.
Discriminator
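
A minimal Keras sketch of such a discriminator; the layer count, filter counts, and 5×5 kernels are illustrative assumptions rather than our exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_discriminator():
    # Filter counts double at each strided convolution: 64 -> 128 -> 256 -> 512.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),
        layers.Conv2D(64, 5, strides=2, padding="same"),  # no batch norm on the input layer
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 5, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2D(256, 5, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2D(512, 5, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1),  # single score (logit); higher means "more real"
    ])
```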

Generator model:

  • Input: A random noise vector z
  • Output: A fake image
  • Explanation: We halved the number of filters and doubled the spatial size of the feature map at every transposed convolution layer. We kept the other settings the same as in the discriminator, such as batch normalization and Leaky ReLU in every transposed convolution layer except the input layer. Another difference from the discriminator is that we applied tanh as the output activation function at the end of the model. A sketch follows the figure below.
Generator
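
A minimal Keras sketch of such a generator, mirroring the discriminator above; again, the exact filter counts and kernel sizes are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_generator(noise_dim=100):
    # Filter counts halve (256 -> 128 -> 64) while the feature map doubles
    # in size each layer: 8x8 -> 16x16 -> 32x32 -> 64x64 -> 128x128.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(noise_dim,)),
        layers.Dense(8 * 8 * 512),
        layers.Reshape((8, 8, 512)),
        layers.Conv2DTranspose(256, 5, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(128, 5, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        # Output: 3-channel 128x128 image squashed into [-1, 1] by tanh.
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"),
    ])
```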

Results of Model Training From Scratch

Model Training for chickens:

Model Training for dogs:

Conclusion

We trained the neural network for 100 epochs and tried different learning rates. The best learning rate for this model was 0.00005, which performed significantly better than 0.05. Compared to the cat-image training below, which was run by the original author with a much more powerful GPU, our model produces blurrier images in the end. We did not use any pre-trained model, so this result is reasonable given that we lacked the computing power to train on a larger dataset for more epochs.

Cat images trained by the original author, Thomas Simonini

Future Work

We should look for more powerful computing resources: with more training images (>10,000) and more training epochs, we believe the results would improve. We should also try larger input images to obtain better resolution, and we could try face centering so that we can trace how the generated faces change during the modeling process.

Github link:

https://github.com/jyshao1/CognitiveComputingProject

Dataset resources:

https://www.kaggle.com/alessiocorrado99/animals10

https://www.kaggle.com/tongpython/cat-and-dog

References:

[1] https://medium.freecodecamp.org/how-ai-can-learn-to-generate-pictures-of-cats-ba692cb6eae4

[2] https://ajolicoeur.wordpress.com/cats/

[3] https://medium.com/@keisukeumezawa/dcgan-generate-the-images-with-deep-convolutinal-gan-55edf947c34b
