Deep Learning Class Project Journal — Day 3

Patrick Mesana
Apr 30, 2017

--

Early on, while I was working on deep autoencoders, I had an idea: could we train the autoencoder on complete images first, then train it on corrupted images? I tried it, and it didn’t work. The network was not stabilizing for the weights associated with the pixels in the middle of the image; most of the time I was getting black centres. I gave up and moved on to other inconclusive strategies until I realized that what I was searching for was a…

GAN

More specifically, a Deep Convolutional Generative Adversarial Network (DCGAN). It builds on the GAN, invented at the Université de Montréal by Ian Goodfellow, which makes two tasks compete against each other. The first task is generating fake images, and the second is distinguishing fake images from real ones. They compete because training is divided into two phases: the first consists of generating images to deceive the discriminator, and the second consists of training the discriminator. This is repeated many, many times.
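To make the two competing objectives concrete, here is a toy sketch (not the project's code) of the usual GAN losses: with a discriminator score D(x) in (0, 1), the discriminator wants real images scored near 1 and fakes near 0, while the generator wins when its fakes score near 1.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    # Binary cross-entropy: real images should score near 1, fakes near 0.
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-8):
    # The generator is rewarded when the discriminator scores its fakes near 1.
    return -np.mean(np.log(d_fake + eps))

# A confident discriminator: its own loss is small, the generator's is large.
d_real = np.array([0.90, 0.95])
d_fake = np.array([0.10, 0.05])
print(discriminator_loss(d_real, d_fake))  # small
print(generator_loss(d_fake))              # large
```

Training alternates between minimizing the first loss (discriminator phase) and the second (generator phase).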

With this brilliant model, I can build a generative task similar to my autoencoder, and a classification task trained on both repaired (fake) images and complete (real) images.

Implementation

Easier said than done. I decided to look for existing implementations and found this one. The code was broken and I spent time fixing it, but it gave me a head start.

I started by generating images from noise, just to get a grasp of GANs. Then I moved on to my original idea and used my pre-trained convolutional autoencoder as the generator. For every iteration, I train the discriminator on 5 batches (300 examples each) and the generator on 1 batch.
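The 5-to-1 schedule above boils down to a simple alternating loop. A minimal sketch, where `train_discriminator_batch` and `train_generator_batch` are hypothetical placeholders for the real Keras training calls:

```python
import itertools

def train_discriminator_batch(batch):
    pass  # placeholder, e.g. discriminator.train_on_batch(...)

def train_generator_batch(batch):
    pass  # placeholder, e.g. the stacked generator+discriminator train step

def training_loop(batches, iterations, d_steps=5):
    """Alternate: d_steps discriminator batches, then 1 generator batch."""
    it = itertools.cycle(batches)
    d_count = g_count = 0
    for _ in range(iterations):
        # Phase 1: train the discriminator on d_steps batches.
        for _ in range(d_steps):
            train_discriminator_batch(next(it))
            d_count += 1
        # Phase 2: train the generator on one batch.
        train_generator_batch(next(it))
        g_count += 1
    return d_count, g_count
```

With `d_steps=5`, three iterations hit the discriminator 15 times and the generator 3 times, matching the ratio described above.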

My model

Generator: a convolutional autoencoder with corrupted images as input. Learning rate of 1e-4 (very important!)

Discriminator:

  • Layers: 5 Conv layers, 4 MaxPooling, 2 Dense
  • Activations: LeakyReLU, Softmax as output
  • Regularizer: Dropout (0.25)
  • Loss: Categorical crossentropy
  • Optimizer: Adam
  • Learning rate: 1e-3
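The discriminator spec above could be sketched in Keras roughly like this. The filter counts, kernel sizes, LeakyReLU slope, and input shape are my assumptions (the original values were not recorded); only the layer counts, dropout, loss, and learning rate come from the list.

```python
from tensorflow.keras import Input, Model, layers, optimizers

def build_discriminator(input_shape=(64, 64, 3)):
    inp = Input(shape=input_shape)
    x = inp
    # 4 Conv + MaxPooling stages (filter counts are assumptions).
    for filters in (32, 64, 128, 256):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.MaxPooling2D()(x)
    # 5th Conv layer, no pooling after it.
    x = layers.Conv2D(256, 3, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Dense(128)(x)
    x = layers.LeakyReLU(0.2)(x)
    out = layers.Dense(2, activation="softmax")(x)  # [real, fake]
    model = Model(inp, out)
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy")
    return model
```

The softmax over two classes with categorical crossentropy is equivalent to a sigmoid real/fake output, just with one-hot labels.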

Results

I’ll save you the suspense: it was really hard to train this type of model. GANs converge slowly, the balance between generator loss and discriminator loss is tricky, the gradients oscillate…

Fortunately, I found a couple of good articles, especially this one, which guided me in reasonable directions. After many failures I managed to get a first version that seems to converge.

GAN v1 training
GAN v1 predicted images

Reading more about GANs, I found inpainting projects (this one in particular) close to our task. I realized I was not training the network properly. My intuition was to pretrain the generative autoencoder and then add an adversarial loss to improve my blurry results. The problem with this logic is well explained in this paper: the corrupted data is not drawn from the distribution of the dataset, so convergence in my case does not mean improvement.

Project Conclusion

I did not manage to implement a GAN in Keras that gives me qualitatively better results. I tried many things: no pretraining of the discriminator (or the generator), adding noise, adding an L2 generative loss to the fake-discriminator loss, generating only the middle part and merging it with the cropped image…
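That last idea, generating only the middle patch and pasting it back into the cropped image, comes down to a simple array merge. A minimal sketch in NumPy (the 64×64 image and 32×32 patch sizes are assumptions for illustration):

```python
import numpy as np

def merge_center(corrupted, generated_center):
    """Paste a generated patch into the centre of a corrupted image."""
    h, w = corrupted.shape[:2]
    ch, cw = generated_center.shape[:2]
    top, left = (h - ch) // 2, (w - cw) // 2
    out = corrupted.copy()
    out[top:top + ch, left:left + cw] = generated_center
    return out

image = np.zeros((64, 64, 3))  # corrupted image with a black centre
patch = np.ones((32, 32, 3))   # stand-in for the generator's output
filled = merge_center(image, patch)
```

Keeping the untouched border pixels from the original image means only the generated centre has to be plausible, which is exactly why the idea is tempting.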

The main reason I think it didn’t work was my understanding of the theory behind it, but I have learned a lot since. GANs are not for beginners, DON’T GUESS ANYTHING! I also feel I pushed Keras to its limits: you can’t alternate training the discriminator and the generator without recompiling, the CPU is used more than it should be, and that’s a big hit on performance. On top of that, BatchNorm in Keras/Theano is extremely slow. It’s still a great library, especially for building complicated graphs, but I don’t recommend it for GANs (at least for the moment).

The class project is now over. Since then, I have found an article very close to what I was trying to achieve, and they managed to make it work on CelebA, which contains only faces and much less image variance.
