Why I Stopped Using GAN — ECCV 2020

By Computer Vision Zurich · Published in The Startup · 6 min read · Aug 17, 2020

GAN — vs — Normalizing Flow

The benefits of Normalizing Flow. In this article, we show how we outperformed GANs with Normalizing Flow, using super-resolution as the example application. We describe SRFlow, a super-resolution method that outperforms state-of-the-art GAN approaches, and explain it in detail in our ECCV 2020 paper.

Intuition for Conditional Normalizing Flow. We train a Normalizing Flow model to transform an image into a Gaussian latent space. During inference, we sample a random Gaussian vector and map it back to generate an image. This works because the mapping is bijective, so every Gaussian vector corresponds to an image. SRFlow extends this method with an image as conditional input.
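As a toy illustration of this bijectivity (not the SRFlow architecture), the conditional mapping can be sketched as an elementwise affine map whose parameters depend on the conditioning input; `forward`, `inverse`, and the conditioning functions are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical conditional bijection: scale and shift are computed from
# the low-resolution input `lr` (any deterministic function would do).
def forward(hr, lr):
    scale = 1.0 + 0.1 * np.tanh(lr)
    shift = 0.5 * lr
    return (hr - shift) / scale        # HR image -> Gaussian latent

def inverse(z, lr):
    scale = 1.0 + 0.1 * np.tanh(lr)
    shift = 0.5 * lr
    return z * scale + shift           # latent -> HR image

rng = np.random.default_rng(0)
lr = rng.standard_normal(16)           # stand-in for a low-resolution input
hr = rng.standard_normal(16)           # stand-in for a high-resolution image

z = forward(hr, lr)
hr_rec = inverse(z, lr)                # bijectivity: exact reconstruction
print(np.allclose(hr, hr_rec))         # True

# Sampling: any Gaussian vector maps to a valid output for this condition.
sample = inverse(rng.standard_normal(16), lr)
```

Because the map is invertible for every condition, sampling different Gaussian vectors `z` for the same `lr` produces different outputs, which is exactly what makes the method stochastic.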

ECCV 2020 Spotlight [Paper]

Code Released [GitHub]

30-second introduction [Paper]

The tweet below contains:

  • A 50-second summary video.
  • A link to read this article for free.

[Share on Twitter]

Advantages of Conditional Normalizing Flow over GAN approaches

  • Sampling: SRFlow outputs many different images for a single input.
  • Stable Training: SRFlow has far fewer hyperparameters than GAN approaches, and we did not encounter training stability issues.
  • Convergence: While GAN training often fails to converge, conditional Normalizing Flows converge monotonically and stably.
  • Higher Consistency: Downsampling the super-resolved image recovers the input almost exactly.

Comparison of GAN-based ProgFSR and Normalizing Flow-based SRFlow.

A stochastic alternative to Conditional GANs

Designing conditional GANs that produce highly stochastic output, and thereby capture the full entropy of the conditional distributions they model, is an important question left open by the present work.

— Isola, Zhu, Zhou, Efros [CVPR 2017, pix2pix]

Random walk in the latent space of SRFlow. As opposed to most GAN approaches, the super-resolutions are consistent with the input. (8x)

Since [the GAN generator] G is conditioned on the input frames X, there is variability in the input of the generator even in the absence of noise, so noise is not a necessity anymore. We trained the network with and without adding noise and did not observe any difference.

— Mathieu, Couprie, LeCun [ICLR 2016]

Conditional GANs ignore the random input. GANs were originally designed to generate diverse outputs: changing the input vector changes the output to another realistic image. However, for image-to-image tasks, the groups of Efros and LeCun discovered that the generator largely ignores the random vector, as cited above. Therefore, most GAN-based image-to-image mappings are effectively deterministic.

SRFlow shows a large variety of 16x super-resolutions.

Strategies for stochastic conditional GANs. To make a conditional GAN for super-resolution stochastic, Bahat and Michaeli added a control signal to ESRGAN and discarded the reconstruction loss. Similarly, Menon et al. consider stochastic super-resolution in the GAN setting and address the ill-posed nature of the problem.

How to generate a stochastic output using Normalizing Flow. During training, Normalizing Flow learns to transform high-resolution images into a Gaussian distribution. While the discriminator loss of GANs often causes mode collapse, we observed that this is not the case for image-conditional Normalizing Flow. Therefore, SRFlow is trained with a single loss and intrinsically samples stochastic outputs.
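Sampling then amounts to drawing different Gaussian vectors for one fixed input. A sketch with a hypothetical `flow_inverse` (a toy affine map standing in for the trained inverse flow); the temperature `tau` is a common knob that trades diversity against staying near the mode:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained inverse flow: maps a latent z and a
# condition lr to an output image.
def flow_inverse(z, lr):
    return z * (1.0 + 0.1 * np.tanh(lr)) + 0.5 * lr

lr = rng.standard_normal(16)        # one low-resolution input

# One input, many outputs: each Gaussian sample z yields a different,
# equally valid super-resolution.
tau = 0.8
samples = [flow_inverse(tau * rng.standard_normal(16), lr) for _ in range(5)]

# The outputs differ from each other: stochastic by construction.
print(any(not np.allclose(samples[0], s) for s in samples[1:]))  # True
```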

SRFlow shows a large variety of 16x super-resolutions. (Same input as above)

Replace conditional GAN with a more stable method

Conditional GANs need careful hyperparameter tuning. As seen in many works that use conditional GANs, the loss comprises a weighted sum of multiple terms. Zhu et al. developed CycleGAN and carefully tuned the weights of eight different losses. In addition, they had to balance the strength of the generator and discriminator to keep the training stable.

Why Normalizing Flow is intrinsically stable. Normalizing Flow has only a single network and a single loss. Therefore it has far fewer hyperparameters and is easier to train. This is especially practical for researchers who develop new models, as it makes comparing different architectures much easier.

The single loss of SRFlow is stable and converges steadily.

No need for multiple losses in Normalizing Flows

A minimax loss is very unstable to train. The GAN training loss comprises two parts: the generator tries to produce fake images the discriminator cannot identify as fake, while the discriminator tries to determine whether an image comes from the generator or from the training set. These two conflicting objectives cause a continuous drift of the learned parameters.
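The two conflicting objectives can be written down directly. A sketch of the standard non-saturating GAN losses (a generic formulation, not any specific paper's implementation), where `D` outputs the probability that its input is real:

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator objective: push real -> 1 and fake -> 0.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Generator objective: fool the discriminator, push fake -> 1.
    # Note it pulls d_fake in the opposite direction to d_loss.
    return -np.mean(np.log(d_fake))

d_real = np.array([0.9, 0.8])   # discriminator scores on real images
d_fake = np.array([0.2, 0.3])   # discriminator scores on generated images
print(d_loss(d_real, d_fake) > 0, g_loss(d_fake) > 0)  # True True
```

Neither player minimizes a fixed objective: each step of one network changes the loss landscape of the other, which is the root of the drift described above.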

Normalizing Flow is trained with a single loss. Normalizing Flow, and its image-conditional version, is trained simply by maximum likelihood. This is possible because the input images are transformed into a Gaussian latent space, where we can directly compute the likelihood of the obtained Gaussian vector. Using an off-the-shelf Adam optimizer, this loss converges stably and steadily.
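A minimal sketch of this single loss, using a hypothetical elementwise affine bijection in place of the real flow network; the negative log-likelihood follows from the change-of-variables formula:

```python
import numpy as np

# Hypothetical conditional bijection (stand-in for the real network):
# an elementwise affine map whose parameters depend on the input `lr`.
def forward(hr, lr):
    scale = 1.0 + 0.1 * np.tanh(lr)
    z = (hr - 0.5 * lr) / scale
    # log|det dz/dhr| of an elementwise map: sum over dimensions.
    log_det = -np.sum(np.log(scale))
    return z, log_det

def nll(hr, lr):
    # Change of variables: -log p(hr|lr) = -log N(z; 0, I) - log|det dz/dhr|
    z, log_det = forward(hr, lr)
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2.0 * np.pi)
    return -(log_pz + log_det)

rng = np.random.default_rng(0)
lr = rng.standard_normal(16)
hr = rng.standard_normal(16)
loss = nll(hr, lr)            # one scalar; minimize with e.g. Adam
print(np.isfinite(loss))      # True
```

This is the entire objective: no discriminator, no loss weights to balance, just one likelihood to maximize.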

Towards more evidence-based models

Measurement of how faithful the super-resolution is to the input. [More quantitative results]

Without further intervention, conditional GANs are not input-consistent. For super-resolution, an important question is whether a super-resolved image is consistent with the low-resolution input. If it is not, it is questionable whether the method actually performs super-resolution or merely image hallucination.

The Effect of Input Consistency

Why Normalizing Flow’s output is consistent with the input. While GANs have an adversarial loss that encourages image hallucination, conditional Normalizing Flow lacks such an incentive. Its only task is to model the distribution of high-resolution images conditioned on the input image. As shown in our SRFlow paper, this yields almost perfect consistency with the input image.
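That consistency can be checked directly: downsample the super-resolved output and compare it to the low-resolution input. A sketch with 1-D arrays and block averaging standing in for images and bicubic downsampling:

```python
import numpy as np

def downsample(x, factor=4):
    # Block averaging as a simple stand-in for bicubic downsampling.
    return x.reshape(-1, factor).mean(axis=1)

def psnr(a, b, peak=1.0):
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
lr = rng.random(16)                        # low-resolution input in [0, 1)
sr = np.repeat(lr, 4)                      # a perfectly consistent 4x output
sr = sr + 0.001 * rng.standard_normal(64)  # plus small high-frequency detail

# A consistent method scores a very high PSNR (tens of dB) here.
print(psnr(downsample(sr), lr) > 40.0)     # True
```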

Further Visual comparison

Comparison of GAN based ESRGAN with Normalizing Flow-based SRFlow on 4x Super-Resolution.

More visuals and details in our ECCV 2020 paper

ECCV 2020 Spotlight: SRFlow: Learning the Super-Resolution Space with Normalizing Flow [Paper]

Use SRFlow in your next project

Our SRFlow ECCV 2020 paper reveals:

  • How to train Conditional Normalizing Flow
    We designed an architecture that achieves state-of-the-art super-resolution quality.
  • How to train Normalizing Flow on a single GPU
    We based our network on GLOW, which uses up to 40 GPUs for image generation. SRFlow needs only a single GPU to train conditional image generation.
  • How to use Normalizing Flow for image manipulation
    We show how to exploit the latent space of Normalizing Flow for controlled image manipulations.
  • Many visual results
    Compare GAN vs. Normalizing Flow yourself. We’ve included many visual results in our paper.

ECCV 2020 Spotlight [Paper]

Code Released [GitHub]

Tell others about GAN — vs — Normalizing Flow

[Share on Twitter]
