StyleGAN | A Style-Based Generator Architecture for Generative Adversarial Networks

Florian Debrauwer
To cut a long paper short
3 min read · Jul 20, 2022

Problem. StyleGAN is about understanding (and controlling) the image synthesis process in the generator of convolutional GANs. More specifically, the paper aims to improve distribution quality and interpolation properties, and to reduce latent space entanglement.

Approach. Inspired by the style transfer literature, the authors propose several changes to the generator’s architecture. Instead of feeding the latent code (Z) directly to the generator, a feed-forward mapping network projects and disentangles it into an intermediate latent space (W). Learned affine transformations of W then control the adaptive instance normalization (AdaIN) operation after each convolution; through these affine transforms, W is encouraged to specialize in different styles. Note that the generator’s input itself is a learned constant tensor. Finally, per-pixel Gaussian noise is added to each feature map to ease the generation of stochastic details. A sketch of how these pieces fit together is shown below.
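To make the wiring concrete, here is a minimal PyTorch sketch of one style-based block (not the authors’ implementation; layer sizes, names, and initialization are illustrative). It shows the mapping network Z → W, the learned affine transform that produces the per-channel AdaIN parameters, per-pixel noise injection, and the learned constant input.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Projects latent z into the intermediate latent space W (8 FC layers in the paper)."""
    def __init__(self, z_dim=512, w_dim=512, n_layers=8):
        super().__init__()
        layers = []
        for i in range(n_layers):
            layers += [nn.Linear(z_dim if i == 0 else w_dim, w_dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)  # w

class AdaIN(nn.Module):
    """Adaptive instance norm: normalize each feature map, then scale/shift it with a style."""
    def __init__(self, w_dim, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        self.affine = nn.Linear(w_dim, channels * 2)  # learned affine transform of w -> style

    def forward(self, x, w):
        style = self.affine(w).unsqueeze(-1).unsqueeze(-1)   # (B, 2C, 1, 1)
        scale, bias = style.chunk(2, dim=1)
        return (1 + scale) * self.norm(x) + bias

class StyleBlock(nn.Module):
    """Conv -> per-pixel noise -> AdaIN, driven by w instead of a per-image latent input."""
    def __init__(self, w_dim, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.noise_weight = nn.Parameter(torch.zeros(1, out_ch, 1, 1))  # learned noise scaling
        self.adain = AdaIN(w_dim, out_ch)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x, w):
        x = self.conv(x)
        x = x + self.noise_weight * torch.randn_like(x[:, :1])  # single-channel Gaussian noise, broadcast
        return self.act(self.adain(x, w))

# The synthesis network starts from a learned constant tensor rather than the latent code.
const_input = nn.Parameter(torch.ones(1, 512, 4, 4))
```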

Take away. The intermediate latent space W does not have to support sampling according to any fixed distribution (while Z does), since its sampling density is induced by the learned mapping. During training, the generator has an incentive to linearize the factors of variation in W, because it is easier to generate realistic images from a disentangled representation of those factors. Once the mapping network and AdaIN were added, the authors observed no benefit from also feeding the latent code as input to the generator, so they simplified the architecture by using a learned constant tensor as input. An important consequence of these additions is that they enable both high- and low-level control over the styles, since the effect of each style is localized (specific to one convolution). The mapping network and affine transformations can be seen as a way to draw each style from a learned distribution; the generator then synthesizes novel images from the collection of styles provided at each layer. To further encourage the styles to localize, the authors use style mixing as a regularization technique: two different Z vectors are used during training, with the styles switching from one to the other at a random layer, which prevents adjacent styles from becoming correlated (see the sketch below). Finally, with a constant input the network would otherwise have to produce spatially stochastic details from earlier activations, which consumes network capacity and is not always successful; adding per-pixel noise after each convolution sidesteps this. Interestingly, the noise does not affect the overall style, which suggests that the style is reliably encoded by spatially invariant statistics (mean, variance, …) while the noise only perturbs spatially varying details.
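As a rough illustration of style mixing, here is a hypothetical sketch of how two latent codes can be mixed during generation. The names `mapping`, `blocks`, and `const_input` refer to the sketch above; upsampling and the RGB output layer are omitted for brevity.

```python
import torch

def generate_with_style_mixing(mapping, blocks, const_input, z_dim=512, mix_prob=0.9):
    """Mixing regularization: styles before the crossover layer come from w1, after it from w2."""
    z1, z2 = torch.randn(1, z_dim), torch.randn(1, z_dim)
    w1, w2 = mapping(z1), mapping(z2)

    # Pick a random crossover layer; with probability (1 - mix_prob) no mixing happens at all.
    if torch.rand(1).item() < mix_prob:
        crossover = torch.randint(0, len(blocks), (1,)).item()
    else:
        crossover = len(blocks)

    x = const_input
    for i, block in enumerate(blocks):
        w = w1 if i < crossover else w2  # coarse layers get w1, finer layers get w2
        x = block(x, w)
    return x
```

An early crossover transfers coarse attributes (pose, face shape) from the second code, while a late crossover only swaps fine details such as color scheme and microstructure.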

Conclusion. The style-based generator is clearly superior to the traditional GAN generator architecture for image generation: the proposed network improves FID across the tested configurations, by almost 20% over the ProgressiveGAN baseline.

Sources. Check out the [code] and [paper] for more details. All figures come from the paper.
