Published in Analytics Vidhya
ESRGAN: Enhanced Super-Resolution GAN


Adding one more to the series on super-resolution in computer vision (the previous article covered SRGAN), this article walks through a PyTorch implementation of ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.

As the name suggests, this is an enhanced version of the earlier SRGAN. The overall high-level architecture of the network is retained, but a few components are added or changed, which ultimately improves the network's results.

Quoting directly from the paper,

To further enhance the visual quality, we thoroughly study three key components of SRGAN — network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN).

Let us discuss each of these three improvements:

Network Architecture

The network structure of the Generator is improved by introducing the Residual-in-Residual Dense Block (RRDB), which increases the capacity of the network and also makes training easier.

In the Generator architecture, the basic block is the RRDB.

To enhance the quality of the images generated by SRGAN, two main modifications are made to the network architecture:

  1. Removal of all Batch Normalization (BN) layers
  2. Replacing the original basic block with the RRDB
Compared with SRGAN's basic block, the BN layers are removed, and the RRDB, with residual scaling parameter β, is used in the deeper model.

It has been observed that removing BN layers increases performance and reduces computational complexity and memory usage across many network architectures. Meanwhile, the RRDB gives the Generator a deeper and more complex structure than the original residual block in SRGAN, which ultimately boosts the performance of the network. The residual scaling parameter is kept constant at a value between 0 and 1 to prevent instability during training.

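An RRDB can be sketched in PyTorch as follows. This follows the structure described in the paper (dense blocks without BN, residual scaling β); the specific channel count nf = 64, growth channels gc = 32, and β = 0.2 are the values commonly seen in public implementations and should be treated as assumptions here:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense block: each conv sees all previous feature maps; no BN layers."""
    def __init__(self, nf=64, gc=32, beta=0.2):
        super().__init__()
        self.beta = beta  # residual scaling parameter (constant in (0, 1))
        self.conv1 = nn.Conv2d(nf, gc, 3, 1, 1)
        self.conv2 = nn.Conv2d(nf + gc, gc, 3, 1, 1)
        self.conv3 = nn.Conv2d(nf + 2 * gc, gc, 3, 1, 1)
        self.conv4 = nn.Conv2d(nf + 3 * gc, gc, 3, 1, 1)
        self.conv5 = nn.Conv2d(nf + 4 * gc, nf, 3, 1, 1)
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        c1 = self.lrelu(self.conv1(x))
        c2 = self.lrelu(self.conv2(torch.cat((x, c1), 1)))
        c3 = self.lrelu(self.conv3(torch.cat((x, c1, c2), 1)))
        c4 = self.lrelu(self.conv4(torch.cat((x, c1, c2, c3), 1)))
        c5 = self.conv5(torch.cat((x, c1, c2, c3, c4), 1))
        return x + self.beta * c5  # scaled residual connection

class RRDB(nn.Module):
    """Residual-in-Residual Dense Block: three dense blocks plus an outer residual."""
    def __init__(self, nf=64, gc=32, beta=0.2):
        super().__init__()
        self.blocks = nn.Sequential(DenseBlock(nf, gc, beta),
                                    DenseBlock(nf, gc, beta),
                                    DenseBlock(nf, gc, beta))
        self.beta = beta

    def forward(self, x):
        return x + self.beta * self.blocks(x)
```

Because every residual is scaled by β before being added back, activations stay bounded even when many such blocks are stacked.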

In the current implementation of ESRGAN, 23 such RRDB blocks are used in the Generator network.

Adversarial Loss

The second enhancement is improving the discriminator using the concept of the Relativistic average GAN (RaGAN), which makes the discriminator judge "whether one image is more realistic than the other" rather than "whether one image is real or fake".

Standard discriminator: D(x_r) = σ(C(x_r)), the probability that an input image is real.

Relativistic average discriminator: D_Ra(x_r, x_f) = σ(C(x_r) − E[C(x_f)]), where C(·) is the raw discriminator output, σ is the sigmoid, and E[·] averages over the fake images in the mini-batch.

Above is the difference between the standard and the relativistic discriminator. Instead of estimating the probability that an image is real or fake, the relativistic discriminator predicts the probability that a real image is relatively more realistic than a fake one.

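The relativistic average losses can be sketched in PyTorch as below, assuming the discriminator returns raw logits C(x); the function names are illustrative:

```python
import torch
import torch.nn.functional as F

def relativistic_d_loss(real_logits, fake_logits):
    """Discriminator loss: push D_Ra(real, fake) -> 1 and D_Ra(fake, real) -> 0."""
    # sigmoid(logit - mean_of_other_logits) is folded into BCE-with-logits
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits - fake_logits.mean(), torch.ones_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits - real_logits.mean(), torch.zeros_like(fake_logits))
    return (loss_real + loss_fake) / 2

def relativistic_g_loss(real_logits, fake_logits):
    """Generator loss: the symmetric form, so gradients flow from both terms."""
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits - fake_logits.mean(), torch.zeros_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits - real_logits.mean(), torch.ones_like(fake_logits))
    return (loss_real + loss_fake) / 2
```

Unlike a standard GAN, the generator loss here also contains the real-image logits, so the generator benefits from gradients of both real and generated data during training.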

Perceptual Loss

The perceptual loss is introduced in super-resolution to optimize the model in feature space instead of pixel space. ESRGAN improves it by using VGG features before activation, instead of after activation as in SRGAN, which helps with brightness consistency and texture recovery.

The features before activation are used in ESRGAN for the following two reasons:

  1. The activated features are very sparse, which provides weak supervision and thus leads to inferior performance.
  2. The activated features cause inconsistent brightness compared to the ground-truth image.

Training Details

Similar to SRGAN, ESRGAN upscales the Low-Resolution (LR) image from 64 × 64 to a 256 × 256 High-Resolution (HR) image, i.e. with an upscaling factor of 4.
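The ×4 upscaling can be done in two ×2 stages; below is a minimal sketch using nearest-neighbor resizing followed by a convolution, the approach used in common ESRGAN implementations (the feature width nf = 64 is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Upsample4x(nn.Module):
    """x4 upsampling as two x2 stages: nearest-neighbor resize + conv + LeakyReLU."""
    def __init__(self, nf=64):
        super().__init__()
        self.conv1 = nn.Conv2d(nf, nf, 3, 1, 1)
        self.conv2 = nn.Conv2d(nf, nf, 3, 1, 1)
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        x = self.lrelu(self.conv1(F.interpolate(x, scale_factor=2, mode='nearest')))
        x = self.lrelu(self.conv2(F.interpolate(x, scale_factor=2, mode='nearest')))
        return x
```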

Total loss for the Generator is calculated as :

L_G = L_percep + λ · L_G^Ra + η · L_1

where L_1 is the content loss (the pixel-wise L1 distance between the generated and ground-truth images), L_percep is the perceptual loss, and L_G^Ra is the relativistic generator loss. λ and η are coefficients that balance the loss terms; they are set to 0.005 and 0.01 respectively during training.

For optimization, the Adam optimizer is used with a learning rate of 0.0002 and the default β1 = 0.9, β2 = 0.999.
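Putting the pieces together, the total generator loss might look like the sketch below; `vgg_extractor` stands in for any frozen pre-activation feature extractor, and the coefficient defaults follow the values above:

```python
import torch
import torch.nn.functional as F

def generator_total_loss(sr, hr, fake_logits, real_logits, vgg_extractor,
                         lambda_adv=0.005, eta=0.01):
    """L_G = L_percep + lambda * L_G^Ra + eta * L_1."""
    l1 = F.l1_loss(sr, hr)                                # content loss, pixel space
    l_percep = F.l1_loss(vgg_extractor(sr), vgg_extractor(hr))  # feature space
    # relativistic average generator loss on raw discriminator logits
    l_ra = (F.binary_cross_entropy_with_logits(
                real_logits - fake_logits.mean(), torch.zeros_like(real_logits)) +
            F.binary_cross_entropy_with_logits(
                fake_logits - real_logits.mean(), torch.ones_like(fake_logits))) / 2
    return l_percep + lambda_adv * l_ra + eta * l1
```

The corresponding optimizer would be created as `torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.9, 0.999))`.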

References

ESRGAN paper

Official GitHub implementation

For more detail on the overall architecture, refer to the SRGAN article.

Vishal Sinha. Deep Learning and Machine Learning Enthusiast. Writer at Medium and Analytics Vidhya.
