ESRGAN: Enhanced Super-Resolution GAN

Vishal Sinha · Published in Analytics Vidhya · May 25, 2020

Adding one more article to the super-resolution series in computer vision (previous implementation: SRGAN), this one walks through the PyTorch implementation of ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.

As the name suggests, this is an enhanced version of the earlier SRGAN. The overall high-level architecture of the network is retained, but a few components are added or changed, which ultimately improves the performance of the network.

Quoting directly from the paper,

To further enhance the visual quality, we thoroughly study three key components of SRGAN — network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN).

Let us discuss each of these three improvements to SRGAN:

Network Architecture

The network structure of the Generator is improved by introducing the Residual-in-Residual Dense Block (RRDB), which increases the capacity of the network and also makes training easier.

[Figure: Generator architecture of ESRGAN (source: ESRGAN paper)]

Above is the architecture of the Generator, where the basic block is the RRDB.

To enhance the quality of the images generated by SRGAN, two main modifications are made to the network architecture:

  1. Removal of all Batch Normalization (BN) layers
  2. Replacing the original basic block with the RRDB
[Figure: left, the residual block with BN layers removed; right, the RRDB used in the deeper model (source: ESRGAN paper)]

In the left image above, it can be seen that the BN layers are removed, and in the right image, the RRDB is used in the deeper model, where β is the residual scaling parameter.

It has been observed in many network architectures that removing BN layers increases performance and reduces computational complexity and memory usage. Meanwhile, the RRDB gives the Generator a deeper and more complex structure than the original residual block in SRGAN, which ultimately boosts the performance of the network. The residual scaling parameter is kept constant between 0 and 1 to scale down the residuals and prevent instability in the network.

Below is the implementation of an RRDB block in the PyTorch framework:

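This is a minimal sketch following the structure described above; the hyperparameter names nf (number of feature maps), gc (growth channels), and res_scale (the residual scaling parameter β) are my own naming, with values matching the common reference implementation:

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Five densely connected convolutions; note there are no BN layers."""
    def __init__(self, nf=64, gc=32, res_scale=0.2):
        super().__init__()
        self.res_scale = res_scale
        self.conv1 = nn.Conv2d(nf, gc, 3, 1, 1)
        self.conv2 = nn.Conv2d(nf + gc, gc, 3, 1, 1)
        self.conv3 = nn.Conv2d(nf + 2 * gc, gc, 3, 1, 1)
        self.conv4 = nn.Conv2d(nf + 3 * gc, gc, 3, 1, 1)
        self.conv5 = nn.Conv2d(nf + 4 * gc, nf, 3, 1, 1)
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        # Each convolution sees the concatenation of all previous feature maps
        x1 = self.lrelu(self.conv1(x))
        x2 = self.lrelu(self.conv2(torch.cat((x, x1), 1)))
        x3 = self.lrelu(self.conv3(torch.cat((x, x1, x2), 1)))
        x4 = self.lrelu(self.conv4(torch.cat((x, x1, x2, x3), 1)))
        x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
        # Residual scaling (β between 0 and 1) stabilizes training
        return x + self.res_scale * x5

class RRDB(nn.Module):
    """Residual-in-Residual Dense Block: three dense blocks plus an outer residual."""
    def __init__(self, nf=64, gc=32, res_scale=0.2):
        super().__init__()
        self.rdb1 = ResidualDenseBlock(nf, gc, res_scale)
        self.rdb2 = ResidualDenseBlock(nf, gc, res_scale)
        self.rdb3 = ResidualDenseBlock(nf, gc, res_scale)
        self.res_scale = res_scale

    def forward(self, x):
        out = self.rdb3(self.rdb2(self.rdb1(x)))
        return x + self.res_scale * out
```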

In the current implementation of ESRGAN, 23 such RRDB blocks are used in the Generator network.
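Assuming the RRDB class sketched above, the trunk of such a Generator could be built by stacking the blocks, for example:

```python
import torch.nn as nn

# Hypothetical generator trunk: 23 RRDB blocks applied sequentially
trunk = nn.Sequential(*[RRDB(nf=64, gc=32) for _ in range(23)])
```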

Adversarial Loss

The second enhancement is improving the discriminator using the concept of the Relativistic average GAN (RaGAN), which makes the discriminator judge "whether one image is more realistic than the other" rather than "whether one image is real or fake".

[Figure: standard discriminator D(x) = σ(C(x)) vs. relativistic average discriminator D_Ra(x_r, x_f) = σ(C(x_r) − E[C(x_f)]), where C(x) is the raw discriminator output (source: ESRGAN paper)]

Above is the difference between the standard discriminator and the relativistic discriminator. Instead of giving the probability that an image is real or fake, as the standard discriminator does, the relativistic discriminator tries to predict the probability that a real image is relatively more realistic than a fake one.

Below is the implementation of the relativistic discriminator in PyTorch:

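This is a minimal sketch of the relativistic average formulation, assuming real_logits and fake_logits are the raw (pre-sigmoid) discriminator outputs C(x_r) and C(x_f) for a batch of real and generated images; the function names are mine, not from the article:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # applies the sigmoid σ internally

def discriminator_loss(real_logits, fake_logits):
    # D_Ra(x_r, x_f) = σ(C(x_r) − E[C(x_f)]): a real image should look
    # more realistic than the average fake, and a fake image less
    # realistic than the average real.
    loss_real = bce(real_logits - fake_logits.mean(), torch.ones_like(real_logits))
    loss_fake = bce(fake_logits - real_logits.mean(), torch.zeros_like(fake_logits))
    return (loss_real + loss_fake) / 2

def generator_adversarial_loss(real_logits, fake_logits):
    # Symmetric form: unlike the standard GAN loss, the generator also
    # receives gradients from the real images' logits.
    loss_real = bce(real_logits - fake_logits.mean(), torch.zeros_like(real_logits))
    loss_fake = bce(fake_logits - real_logits.mean(), torch.ones_like(fake_logits))
    return (loss_real + loss_fake) / 2
```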

Perceptual Loss

The perceptual loss was introduced in super-resolution to optimize the model in feature space instead of pixel space. ESRGAN improves the perceptual loss by using VGG features before activation, instead of after activation as in SRGAN, which leads to better brightness consistency and texture recovery.

Features before activation are used in ESRGAN for the following two reasons (a sketch of this loss follows the list):

  1. The activated features are very sparse, which provides weak supervision and thus leads to inferior performance.
  2. The activated features cause inconsistent brightness in comparison to the ground-truth image.
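As a concrete sketch of the before-activation idea: in torchvision's VGG19, the conv5_4 convolution sits at index 34 of the features module, so slicing the module to include index 34 but exclude its ReLU yields pre-activation features. The class and function names below are my own, not from the article:

```python
import torch.nn as nn
from torchvision.models import vgg19

class VGGFeatureExtractor(nn.Module):
    """Extract VGG19 conv5_4 features *before* the ReLU activation."""
    def __init__(self):
        super().__init__()
        vgg = vgg19(pretrained=True)
        # Index 34 is the conv5_4 convolution; index 35 would be its ReLU,
        # so stopping the slice at 35 keeps pre-activation features.
        self.features = nn.Sequential(*list(vgg.features.children())[:35])
        for p in self.features.parameters():
            p.requires_grad = False  # VGG is a fixed feature extractor

    def forward(self, x):
        return self.features(x)

extractor = VGGFeatureExtractor().eval()
l1 = nn.L1Loss()

def perceptual_loss(sr, hr):
    # L1 distance between pre-activation VGG features of the
    # super-resolved image and the ground-truth HR image
    return l1(extractor(sr), extractor(hr))
```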

Training Details

Similar to SRGAN, ESRGAN also scales the Low-Resolution (LR) image to a High-Resolution (HR) image, from 64 × 64 to 256 × 256, with an up-scaling factor of 4.

Total loss for the Generator is calculated as:

L_G = L_percep + λ · L_G^Ra + η · L_1

where L_1 is the content loss, L_percep is the perceptual loss, and L_G^Ra is the relativistic generator loss. λ and η are coefficients that balance the different loss terms; they are set to 0.005 and 0.01 respectively during training.
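Putting the pieces together, a hypothetical computation of the total Generator loss, reusing the perceptual_loss and generator_adversarial_loss sketches from above:

```python
import torch.nn as nn

lambda_adv, eta = 0.005, 0.01  # balancing coefficients from the paper

def generator_total_loss(sr, hr, real_logits, fake_logits):
    l_percep = perceptual_loss(sr, hr)                            # feature-space loss
    l_adv = generator_adversarial_loss(real_logits, fake_logits)  # RaGAN term
    l_content = nn.L1Loss()(sr, hr)                               # pixel-space L1
    return l_percep + lambda_adv * l_adv + eta * l_content
```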

For optimization, the Adam optimizer is used with a learning rate of 0.0002 and β1 = 0.9, β2 = 0.999, which are the default values.
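In PyTorch this corresponds to the following, assuming generator is the Generator network defined earlier:

```python
import torch

optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.9, 0.999))
```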

Reference

ESRGAN Paper: Wang et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," ECCV Workshops 2018 (arXiv:1809.00219)

Implementation

GitHub

For more details on the overall architecture, kindly refer to the SRGAN article.
