Deep Learning: Zero to One — “Image Generation”

Mar 7, 2017

With this newsletter, you can (if you want) replicate what I did: Conditional-PixelCNN-decoder.

https://itunes.apple.com/us/podcast/deep-learning-zero-to-one/id1212062401?mt=2

In this newsletter, I talk through generating an image of handwritten digits using a model trained on the MNIST handwritten digit dataset. The authors trained for 70 hours on 32 GPUs. I used unconditioned image generation to create an image in 6 hours on my MacBook Pro CPU. I used the TensorFlow implementation of Conditional Image Generation with PixelCNN Decoders (https://arxiv.org/abs/1606.05328) by a student named Anant Gupta, and learned that reasonable-looking digits can be generated with significantly fewer training steps, as soon as the training loss approaches the loss reached by the DeepMind authors.

The purpose of this code is to generate images of handwritten digits. Understanding image generation, and handwriting in particular, is a step toward verifying that documents that appear handwritten were in fact written by hand and not generated by a computer. The paper providing the impetus for this TensorFlow implementation (published within the past 8 months) is from DeepMind.

I replicated the unconditioned image generation, trained on handwritten digits, on a CPU.

Here we go:

First, I ran the unconditioned generation script for 12 hours on my MacBook CPU.

I started it at midnight and stopped it at noon, and it only made it through 8 epochs. (The authors did a full 70 epochs.) At that rate, the full run would take ~4 days on my CPU.
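The ~4 day estimate is simple extrapolation from the run so far. Here is a minimal sketch of the arithmetic, using only the numbers from this run and assuming every epoch takes about the same time (the variable names are mine, not from the script):

```python
# Figures from the run described above.
hours_elapsed = 12    # midnight to noon
epochs_done = 8       # epochs completed in that time
target_epochs = 70    # the full run the authors did

# Assumption: each epoch takes roughly the same wall-clock time.
hours_per_epoch = hours_elapsed / epochs_done
total_hours = hours_per_epoch * target_epochs
total_days = total_hours / 24

print(f"{hours_per_epoch:.2f} h/epoch, {total_hours:.0f} h total, {total_days:.1f} days")
```

That works out to 1.5 hours per epoch and 105 hours in total, a bit over 4 days.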

But I noticed that my cost after only 8 epochs (0.107695) is very close to the authors’ 0.104610, which they achieved after 70 epochs. This suggests I can use far fewer epochs and still get reasonably comparable results. Of course, when I stopped the script, no image was generated. So, I tried the epochs=0 flag. (I want to get an image I can look at, as quickly as possible, to test that I am making progress.)
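To put a number on "very close", here is a quick sketch comparing the two cost values quoted above. The variable names are mine, and the comparison assumes both costs were computed the same way:

```python
my_cost = 0.107695       # my run, after 8 epochs on the CPU
authors_cost = 0.104610  # the authors' value, after 70 epochs

absolute_gap = my_cost - authors_cost
relative_gap = absolute_gap / authors_cost

print(f"absolute gap: {absolute_gap:.6f}")
print(f"relative gap: {relative_gap:.1%}")
```

The gap is about 0.003, or roughly 3% of the authors' cost, after a bit more than a tenth of the epochs.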

With 0 epochs, I’m not expecting a nice image, but I thought it would be generated immediately. Instead, 30 minutes later it is still sitting at this command-line status, ‘Generating Sample Images’:

After 45 minutes, the run with epochs=0 does generate the image. It looks like salt-and-pepper noise. OK, this is expected. If this were a generative adversarial network (trying to generate a picture of a cat, or something), I would expect its first image to look like this too.
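One plausible reason sampling is slow even with no training: PixelCNN is autoregressive, so each pixel is sampled conditioned on the pixels already generated, which means one model evaluation per pixel. The sketch below is not the repository's code. It swaps in a hypothetical stand-in, `untrained_pixel_probability`, for the network and assumes binary pixels for simplicity, just to show the shape of the sampling loop and why an untrained model yields noise:

```python
import random

HEIGHT, WIDTH = 28, 28  # MNIST image size


def untrained_pixel_probability(image, row, col):
    """Hypothetical stand-in for the network. An untrained model knows
    nothing, so here every pixel is 'on' with probability 0.5."""
    return 0.5


def sample_image(pixel_probability, rng):
    """Sample pixels one at a time, top-left to bottom-right."""
    image = [[0] * WIDTH for _ in range(HEIGHT)]
    model_calls = 0
    for row in range(HEIGHT):
        for col in range(WIDTH):
            # A real model would look at the pixels filled in so far.
            p_on = pixel_probability(image, row, col)
            model_calls += 1
            image[row][col] = 1 if rng.random() < p_on else 0
    return image, model_calls


rng = random.Random(0)
image, model_calls = sample_image(untrained_pixel_probability, rng)

print(model_calls)  # one call per pixel: 28 * 28 = 784
for row in image[:4]:
    print("".join("#" if px else "." for px in row))
```

With a real network, each of those 784 calls is a full forward pass, and that cost is multiplied by the number of sample images, which would add up quickly on a CPU.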

I now set epochs=5, so it should take a little less than the 12 hours it took to get to epoch 8, but this time it will generate an image. I let it be.

The computer did get unplugged from the power supply, and so it turned off. However, to my surprise, Terminal (bash) and this script tolerated that: after 30 seconds, the script resumed its computations. Pressing esc (by mistake) produced the escape character that you can see on the last line of the image below.

It’s working. It made it to Epoch 1.

Now, it has gotten through 4 epochs in 6 hours, and it’s generating the image.

And here is what I get (on the right): this is unconditional image generation of MNIST handwritten digits after 5 epochs on a MacBook Pro. The generated digits are imperfect. They are clearly handwritten digits, which is the intended result, but they will not pass the Turing test. The authors’ digits (on the left), produced using 32 GPUs for 70 epochs, are quite good.