The unprecedented effectiveness of “Progressive Growing of GANs”

Figure 1. Training time lapse of a progressively growing GAN trained on the LFW dataset


This blog post is a succinct report of my experience working with the techniques for training Generative Adversarial Networks (GANs) described in the ICLR 2018 paper “Progressive Growing of GANs for Improved Quality, Stability, and Variation”. My main goal in writing it is to highlight the paper’s underrated and rarely mentioned contributions alongside progressive growing itself.

I have created an open-source Python package for the ProGAN architecture as an extension of PyTorch’s Module abstraction. The package is available online, and the repository for the project is on my GitHub.

The code for trained examples that use this package is in another repo of mine, from which the training time lapse in Figure 1 was created.

Progressive Growing:

This is the flagship contribution of the research. The idea is quite simple, but careful thought reveals how intricate its behaviour is. The architectures of the GAN’s generator G and discriminator D are mirror images of each other, so they can be trained layer by layer in sync. Apart from the layerwise training, the notable feature is that each new layer is faded in to the ongoing training rather than switched on abruptly. The resolution of the real images used for training is increased in step with that fade-in, so the images shown to the discriminator move from the old resolution to the new one at the same rate as the new layer takes effect. I like to describe this process as “Not letting the Learning of the Generator loosen out”. If you can visualise the process, it is easy to see why I say that the entire training remains tightly bound to the task at hand.
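The fade-in described above can be sketched as a linear blend controlled by a weight alpha that goes from 0 to 1 during a transition. This is a minimal NumPy sketch, not the code from my package: the function names (upsample_nearest, downsample_avg, fade_in, fade_real_images) are made up for illustration, and I am assuming nearest-neighbour upsampling and average-pool downsampling by a factor of 2.

```python
import numpy as np


def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of an (N, C, H, W) batch."""
    return x.repeat(factor, axis=2).repeat(factor, axis=3)


def downsample_avg(x, factor=2):
    """Average-pool downsampling of an (N, C, H, W) batch."""
    n, c, h, w = x.shape
    blocks = x.reshape(n, c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(3, 5))


def fade_in(alpha, old_path, new_path):
    """Blend the old (upsampled) output with the new layer's output.

    alpha = 0 uses only the old path, alpha = 1 only the new layer.
    """
    return (1.0 - alpha) * old_path + alpha * new_path


def fade_real_images(alpha, real):
    """Blend real images between the old and the new resolution."""
    low_res_look = upsample_nearest(downsample_avg(real))
    return fade_in(alpha, low_res_look, real)


# Hypothetical generator outputs at 8x8 (old) and 16x16 (new) resolution.
rng = np.random.default_rng(0)
old_rgb = rng.standard_normal((4, 3, 8, 8))
new_rgb = rng.standard_normal((4, 3, 16, 16))
blended = fade_in(0.25, upsample_nearest(old_rgb), new_rgb)
```

With alpha increased a little at every training step, the new layer takes over gradually while the real images fed to the discriminator sharpen at the same rate.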

Minibatch stddev layer:

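A parameter-free alternative to the minibatch discrimination layer, and it works quite effectively. The idea is so simple that it can be stated in one sentence: ‘append the std-dev of the minibatch to the input as an additional constant feature map in the discriminator’. More precisely, the standard deviation of each feature at each spatial location is computed over the minibatch, and these values are averaged into a single number that fills the extra feature map. This gives the discriminator a direct signal about the variation within a batch, which pushes the generator to produce samples whose variation is similar to that of the real samples.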
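The one-sentence description above translates almost directly into code. Here is a small NumPy sketch of it; the function name minibatch_stddev and the eps term added for numerical stability are my own choices for illustration, not something taken from the package.

```python
import numpy as np


def minibatch_stddev(x, eps=1e-8):
    """Append one constant feature map holding the averaged minibatch std-dev.

    x has shape (N, C, H, W); the result has shape (N, C + 1, H, W).
    """
    # Std-dev of every feature at every spatial location, over the batch axis.
    std = np.sqrt(x.var(axis=0) + eps)            # shape (C, H, W)
    # Average over all features and locations to get a single number.
    mean_std = std.mean()
    # Replicate that number into one extra feature map for every sample.
    n, _, h, w = x.shape
    extra = np.full((n, 1, h, w), mean_std)
    return np.concatenate([x, extra], axis=1)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 4, 4))
out = minibatch_stddev(features)
```

A batch of near-identical samples produces a value close to zero in the extra map, which is an easy cue for the discriminator to pick up on.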

Equalized Learning rate:

This technique is one of my favourites. It is not easy to convey, so I would urge readers to read the paper for it, but I’ll still try to explain it in brief. All the trainable weights of the network are initialised from the standard normal distribution N(0, 1) and are then scaled layerwise by the constant from He’s initializer at runtime, on every forward pass during training. This can be viewed as adjusting the scale of the learning rate per layer: every weight ends up with the same dynamic range, so the effective learning speed is the same for all weights instead of depending on the scale each one was initialised with.
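A tiny example may make this easier to picture. The sketch below is a toy fully connected layer in NumPy, not the layer from my package: the class name EqualizedLinear is made up, and I am assuming the usual form of the He constant, sqrt(2 / fan_in).

```python
import numpy as np


class EqualizedLinear:
    """Toy fully connected layer with runtime weight scaling."""

    def __init__(self, in_features, out_features, rng):
        # Plain N(0, 1) initialisation, no per-layer scaling baked in.
        self.weight = rng.standard_normal((out_features, in_features))
        self.bias = np.zeros(out_features)
        # Assumed form of the He constant: sqrt(2 / fan_in).
        self.scale = np.sqrt(2.0 / in_features)

    def __call__(self, x):
        # The stored weights stay at unit scale; the scaling is applied
        # on every forward pass instead of once at initialisation.
        return x @ (self.weight * self.scale).T + self.bias


rng = np.random.default_rng(0)
layer = EqualizedLinear(in_features=8, out_features=4, rng=rng)
y = layer(np.ones((2, 8)))
```

The point of the design is that the optimizer only ever sees parameters of the same unit scale, while the forward pass still behaves as if the layer had been He-initialised.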

Pixelwise Feature Vector Normalization in Generator:

This is a constraint applied to the layers in the generator to keep the magnitudes of its signals from escalating out of control as a result of the competition between G and D. It normalizes the feature vector in each pixel to unit length after each convolutional layer in the generator.
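In code this layer is a one-liner. The NumPy sketch below follows the formula from the paper, which divides each pixel's feature vector by the root mean square of its values across channels (with a small epsilon of 1e-8); the function name pixel_norm is just my label for it here.

```python
import numpy as np


def pixel_norm(x, eps=1e-8):
    """Normalize the feature vector at every pixel of an (N, C, H, W) batch."""
    # Mean of squares over the channel axis, computed separately per pixel.
    mean_sq = np.mean(x ** 2, axis=1, keepdims=True)
    return x / np.sqrt(mean_sq + eps)


rng = np.random.default_rng(0)
activations = 50.0 * rng.standard_normal((2, 16, 4, 4))
normalized = pixel_norm(activations)
```

However large the incoming activations are, every pixel's feature vector leaves the layer at the same scale, and there are no trainable parameters involved.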

My thoughts:

I believe that the other contributions of the paper described above are as important as the progressive growing technique. I say this from experience: the first version that I coded did not use any of the other techniques, and as a result I was unable to obtain the expected results. In my opinion, this is a phenomenal piece of research and will definitely take us towards the ultimate goal of “Stably training GANs”. In a way, it also makes me wonder about the nature of our brain: perhaps our brain trains itself progressively, one step at a time? Who knows!

Meanwhile, I have come across two more papers, “Self Attention GANs” and “Relativistic GANs”, which I plan to add to the package.

ProGAN has definitely opened new doors for us and we need to leverage these contributions for our work thoughtfully. I encourage you all to peruse the package I created and use it for your tasks as well. Please feel free to open any issues or contribute to the code for everyone’s benefit.