Understanding the StyleGAN and StyleGAN2 Architecture
This article introduces the StyleGAN and StyleGAN2 architectures and should give you enough context to start working with StyleGAN. You may come across metrics or operations you are not familiar with; for a deeper understanding of StyleGAN and StyleGAN2, read the papers linked in the resources section.
Let's start with StyleGAN and then move on to StyleGAN2.
StyleGAN
The major changes were made in the generator part of the "Progressive Growing of GANs" architecture. Below you can see both the traditional generator and the new style-based generator (the StyleGAN network).
In the traditional network, the latent vector is fed directly into the first block right after normalization. In the StyleGAN network, the normalized latent vector first passes through a mapping network (a stack of 8 fully connected layers). Its outputs are then transformed (A stands for an affine transformation, a combination of a linear transformation and a translation) and passed to the blocks, where they drive adaptive instance normalization (AdaIN); the noise B is added to the convolution outputs before each AdaIN operation.
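The data flow through the mapping network can be sketched as follows. This is a minimal NumPy illustration, not the real implementation: the actual network has 8 learned fully connected layers with leaky ReLU activations, whereas the weights here are random placeholders.

```python
import numpy as np

def mapping_network(z, weights, biases):
    """Map a normalized latent z to an intermediate latent w through a
    stack of fully connected layers with leaky ReLU between them.
    In StyleGAN this stack has 8 learned layers; here the weights are
    random placeholders that only show the data flow."""
    w = z / np.sqrt(np.mean(z ** 2) + 1e-8)  # pixel-norm style normalization
    for W, b in zip(weights, biases):
        w = W @ w + b
        w = np.where(w > 0, w, 0.2 * w)      # leaky ReLU
    return w

rng = np.random.default_rng(0)
dim, layers = 512, 8
weights = [rng.standard_normal((dim, dim)) * 0.01 for _ in range(layers)]
biases = [np.zeros(dim) for _ in range(layers)]

z = rng.standard_normal(dim)
w = mapping_network(z, weights, biases)
print(w.shape)  # (512,)
```

The intermediate latent w is what the affine transforms A consume; disentangling z into w is what lets individual styles be controlled separately.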
Above, you can see the AdaIN formula, where x comes from the convolutional path and y comes from the mapping-network side. As the equation shows, after x is normalized, y(s, i) scales the result and y(b, i) shifts it as a bias. Below you can see StyleGAN in a simplified form.
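The AdaIN formula translates almost directly into code. Here is a small NumPy sketch of it, assuming feature maps of shape (channels, H, W) and per-channel style scale/bias vectors; the values are random placeholders.

```python
import numpy as np

def adain(x, y_s, y_b, eps=1e-8):
    """Adaptive instance normalization: each feature map x_i is normalized
    to zero mean / unit variance, then scaled by y_s and shifted by y_b
    (both produced by the affine transform A of the intermediate latent).
    x has shape (channels, H, W)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return y_s[:, None, None] * (x - mean) / (std + eps) + y_b[:, None, None]

rng = np.random.default_rng(1)
x = rng.standard_normal((512, 4, 4))   # feature maps from the conv path
y_s = rng.standard_normal(512) + 1.0   # per-channel style scale
y_b = rng.standard_normal(512)         # per-channel style bias
out = adain(x, y_s, y_b)
print(out.shape)  # (512, 4, 4)
```

Because the statistics are computed per feature map (per "instance"), each style only affects one block before the next AdaIN overrides it.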
In the official paper, results are reported on the CelebA-HQ and FFHQ (Flickr-Faces-HQ) datasets, with FID (Fréchet inception distance) scores computed using 50K randomly drawn images from the training set. Below you can see the results.
They started from baseline configuration A (Progressive GAN) and saw improvements after adding bilinear up/downsampling and longer training (config B). They then added the mapping network and AdaIN operations (config C), and in config D they removed the traditional input to the synthesis network, replacing it with a learned 4x4x512 constant tensor.
The FID improvements over the traditional generator (B) are clear, and enabling mixing regularization (also called style mixing) gives more control over the style and over high-level aspects such as pose, hairstyle, and eyeglasses.
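Mixing regularization can be sketched in a few lines: two latents are mapped separately, and a random crossover point decides which synthesis layers receive which style. The layer count and the generator itself are placeholders here, not the real network.

```python
import numpy as np

# Style mixing sketch: layers before the crossover use w1 (coarse styles
# such as pose and face shape), layers after it use w2 (finer styles such
# as hairstyle and color scheme).
rng = np.random.default_rng(2)
num_layers = 18                     # e.g. 2 layers per resolution, 4x4 up to 1024x1024
w1 = rng.standard_normal(512)       # mapped latent for image 1 (placeholder)
w2 = rng.standard_normal(512)       # mapped latent for image 2 (placeholder)

crossover = rng.integers(1, num_layers)           # random switch point
styles = [w1 if i < crossover else w2 for i in range(num_layers)]
print(crossover, len(styles))
```

During training this prevents the network from assuming adjacent styles are correlated; at inference time the same trick transfers coarse attributes from one image and fine attributes from another.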
So, this is a simple introduction to the StyleGAN architecture and now let’s see what improvements have been made in StyleGAN 2 and understand its architecture.
StyleGAN 2
In the image below, you can see droplet-like artifacts and blurry patches in the generated images, which appear starting at the 64x64 resolution. These artifacts were the main motivation for redesigning the generator; the redesign also improved overall image quality.
So let's see, step by step, which changes to the network architecture improve the generated images. Below you can see the architectural revisions.
Part A is the original StyleGAN architecture and Part B shows a detailed view of it. In Part C, AdaIN (adaptive instance normalization) is split into its two components: modulation (scaling by the style factors) and normalization. Below you can see the modulation (left) and normalization (right) equations.
Also in Part C, the addition of noise and bias is moved outside the style block. Finally, in Part D the convolution weights themselves are scaled by the style, and the normalization is replaced with a "demodulation" operation; the combined operation is called "weight demodulation". See the formula below.
You can see that this equation combines the two modulation and normalization equations above (epsilon is a small constant that prevents numerical issues such as division by zero). The results can be seen in the outputs below: replacing normalization with demodulation removes the droplet-like artifacts.
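Weight demodulation is simple to express in code. The NumPy sketch below follows the two steps just described: scale the convolution weights per input channel by the style (modulation), then rescale each output filter to unit norm (demodulation). The shapes and values are placeholders.

```python
import numpy as np

def weight_demodulation(w, s, eps=1e-8):
    """StyleGAN2 weight demodulation: the style s scales the conv weights
    per input channel (modulation), then each output filter is rescaled to
    unit L2 norm (demodulation), which replaces explicit instance
    normalization of the activations.
    w: conv weights, shape (out_ch, in_ch, k, k); s: style, shape (in_ch,)."""
    w_mod = w * s[None, :, None, None]                    # w'_ijk = s_i * w_ijk
    sigma = np.sqrt((w_mod ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return w_mod / sigma                                  # w''_ijk = w'_ijk / sigma_j

rng = np.random.default_rng(3)
w = rng.standard_normal((64, 32, 3, 3))
s = rng.standard_normal(32) + 1.0
w_dd = weight_demodulation(w, s)
# After demodulation each output filter has (approximately) unit L2 norm:
print(np.allclose((w_dd ** 2).sum(axis=(1, 2, 3)), 1.0))  # True
```

Because the scaling is baked into the weights rather than applied to the activations, the per-image feature statistics are never destroyed, which is why the droplet artifacts disappear.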
Now that we have seen the improvements in the generated images themselves, let's look at the improvements as measured by metrics such as FID and perceptual path length (PPL, introduced in the StyleGAN paper; the lower the PPL, the smoother the latent space and the better the generated images).
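The idea behind PPL can be sketched as follows: interpolate between two latents, generate images at two nearby points on that path, and measure the perceptual distance between them scaled by the step size. In the paper the distance is a VGG-based perceptual metric; here both the generator and the distance are toy stand-ins just to show the mechanics.

```python
import numpy as np

def slerp(z1, z2, t):
    """Spherical interpolation between two latent vectors."""
    cos_omega = np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * z1 + np.sin(t * omega) * z2) / np.sin(omega)

def ppl_sample(generate, distance, z1, z2, eps=1e-4, rng=np.random.default_rng()):
    """One PPL sample: perceptual distance between images generated at two
    nearby points on the interpolation path, scaled by 1/eps^2. `generate`
    and `distance` are placeholders for the generator and the VGG-based
    perceptual distance used in the paper; PPL is the average over many samples."""
    t = rng.uniform(0, 1)
    img_a = generate(slerp(z1, z2, t))
    img_b = generate(slerp(z1, z2, t + eps))
    return distance(img_a, img_b) / eps ** 2

# Toy stand-ins, not real models:
rng = np.random.default_rng(4)
generate = lambda z: np.tanh(z[:12]).reshape(3, 2, 2)   # placeholder "generator"
distance = lambda a, b: float(((a - b) ** 2).mean())    # placeholder "perceptual" distance
z1, z2 = rng.standard_normal(512), rng.standard_normal(512)
print(ppl_sample(generate, distance, z1, z2, rng=rng) >= 0)  # True
```

A small average distance per step means nearby latents produce perceptually similar images, i.e. the latent space is smooth.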
In the table above you can see the improvements across configurations as the different methods are applied. Path length regularization keeps the PPL score low, indicating clearer and smoother generated images, while lazy regularization reduces the training cost by evaluating the regularization terms only every few minibatches.
Progressive growing of the network generates high-quality images, but it also causes characteristic phase artifacts: details such as a person's eyes and teeth appear stuck in place while the rest of the face moves, as shown in the official StyleGAN2 video (linked in the resources section below).
To solve this issue, they tried alternative connection patterns (skip connections, residual networks, etc.) in the generator and discriminator and found that skip connections work best for the generator while residual networks give better results for the discriminator.
The table above shows the results for each combination of connection types. In the main results table, configurations E and F also report results with the skip-connection generator and the residual discriminator.
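The difference between the two block types can be sketched in a few lines. In the skip-connection (to-RGB) generator, an RGB image is produced at every resolution and summed into an upsampled running total; in the residual discriminator, each block adds its input back to its output. The conv/upsample operations below are stand-in functions, not the real layers.

```python
import numpy as np

def upsample(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) array (stand-in)."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def skip_generator(rgb_per_resolution):
    """Sum per-resolution RGB outputs, upsampling the running total
    before each addition (the 'skip' generator design)."""
    out = rgb_per_resolution[0]
    for rgb in rgb_per_resolution[1:]:
        out = upsample(out) + rgb
    return out

def residual_block(x, f):
    """Residual form used in the StyleGAN2 discriminator (the 1/sqrt(2)
    scaling keeps the activation variance roughly constant)."""
    return (f(x) + x) / np.sqrt(2)

# Placeholder RGB outputs at 4x4, 8x8 and 16x16:
rgbs = [np.ones((3, 4 * 2 ** i, 4 * 2 ** i)) for i in range(3)]
img = skip_generator(rgbs)
print(img.shape)  # (3, 16, 16)
```

Because every resolution contributes to the final image from the start of training, the network can shift its focus from low to high frequencies gradually, without the explicit phase changes of progressive growing.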
In the figure below, the highlighted part in (b) is the generator and in (c) the discriminator, both without progressive growing.
Almost all the images are taken from the official StyleGAN and StyleGAN2 papers, whose links are given in the resources section below.
Resources:
- Synthesizing High-Resolution Images with StyleGAN 2 — YouTube
- A Style-Based Generator Architecture for Generative Adversarial Networks — YouTube
- Analyzing and Improving the Image Quality of StyleGAN (StyleGAN 2)
- A Style-Based Generator Architecture for Generative Adversarial Networks (StyleGAN)
- NVlabs/stylegan2: StyleGAN2 — Official TensorFlow Implementation (github.com)
- NVlabs/stylegan: StyleGAN — Official TensorFlow Implementation (github.com)