Image upscaling with SRGAN

Sridhar G Kumar
5 min read · Aug 2, 2022


We designed and built a web app named ‘Pixel Perfect’ that upscales the resolution of an uploaded image by 4x.

This project was implemented by a team of four (myself, Melissa Siddle, and two others) over a period of two weeks.

Image upscaling finds uses in many fields: restoring old media, enhancing microscope images in the scientific community, or even improving the resolution of CCTV footage. Its applications are limited only by our imagination. This article takes a look at how we implemented it.

The Model

The deep-learning model we used for this task was a Super-Resolution Generative Adversarial Network (SRGAN), with ResNet-style residual blocks forming the generator and a deep stack of convolutional layers forming the discriminator. The specific blocks are shown in the figure below, with the corresponding kernel size (k), number of feature maps (n), and stride (s) indicated for each convolutional layer.

Architecture of Generator and Discriminator Network
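
To make the generator's structure concrete, here is a minimal sketch of one residual block following the k3n64s1 convention from the figure. PyTorch is used purely for illustration; the project's actual framework and layer names may differ.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One generator residual block: conv -> BN -> PReLU -> conv -> BN, plus skip."""
    def __init__(self, channels=64):  # n64 feature maps
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),  # k3s1
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # element-wise skip connection
```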

The overall loss function of our SRGAN model is a perceptual loss: a weighted combination of content loss and adversarial loss at a ratio of 1 : 10⁻³, respectively. The content loss is calculated using the VGG19 model and is defined as the Euclidean distance between the feature representations of the reconstructed image and the reference image. The adversarial loss is calculated from the discriminator's probability that the generated image is a natural HR image.
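
A minimal sketch of that loss, assuming a PyTorch setup with the pretrained VGG19 from torchvision; the truncation point of the feature extractor and the variable names are illustrative, only the 1 : 10⁻³ weighting is taken from the description above.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

# Frozen VGG19 feature extractor, truncated just before the fifth max-pool.
# Inputs are assumed to be ImageNet-normalised tensors.
vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:36].eval()
for p in vgg.parameters():
    p.requires_grad = False

mse = nn.MSELoss()
bce = nn.BCEWithLogitsLoss()

def perceptual_loss(sr, hr, fake_logits):
    """Content loss (MSE over VGG19 feature maps, i.e. an averaged squared
    Euclidean distance) plus the adversarial term weighted by 10^-3."""
    content = mse(vgg(sr), vgg(hr))
    # the generator is rewarded when the discriminator labels its output real
    adversarial = bce(fake_logits, torch.ones_like(fake_logits))
    return content + 1e-3 * adversarial
```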

But what does that mean? To simplify, consider the analogy of an art forger (the generator) and a detective (the discriminator). How does the detective decide whether a piece produced by the forger is genuine? For each image our generator creates, the discriminator tries to tell whether it is real or a forgery (the adversarial loss), and the two keep competing until the forger can reliably fool the detective. This, in essence, is the principle behind the SRGAN model.

The Dataset

This SRGAN model has been trained on the DIV2K dataset, which has been used widely to train super-resolution models. It is made up of 1000 HR images covering a wide range of subjects. The images are downscaled (reduced in detail) to create their low-resolution counterparts, so each training pair consists of a target (the original HR image) and the bicubically downsampled LR image the model starts from.

Top right is the HR image, and then the resulting downsampled images using 2, 3 and 4 downscaling factors
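
Producing such a pair is straightforward; here is a minimal sketch with Pillow, assuming a single image path (the function name is illustrative):

```python
from PIL import Image

def make_pair(path, scale=4):
    """Return an (HR, LR) pair, with the LR image bicubically downsampled."""
    hr = Image.open(path).convert("RGB")
    lr = hr.resize((hr.width // scale, hr.height // scale),
                   Image.Resampling.BICUBIC)
    return hr, lr
```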

Now, the SRGAN model uses these images as the training and validation sets (split 8:2) to learn the mapping from the LR image to the HR image by minimising the perceptual loss, with the validation set used to monitor progress. Once the model has been trained, we can provide a new LR image from outside DIV2K and the trained model will upscale it. The outcome is what we call a super-resolution image. Don't be fooled, though: the super-resolution image is not the reality of what your image would look like at a higher resolution; it is our model's best attempt at what it might look like.
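
A minimal sketch of that 8:2 split, assuming all 1000 HR images sit in one folder (the path and seed are illustrative):

```python
import glob
import random

# Hypothetical layout: every DIV2K HR image in a single directory.
paths = sorted(glob.glob("DIV2K/*.png"))
random.Random(42).shuffle(paths)  # fixed seed for a reproducible split

cut = int(0.8 * len(paths))       # 8:2 train/validation split
train_paths, val_paths = paths[:cut], paths[cut:]
```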

The Auto-Encoder

In theory, we could stop there; our algorithm already does a great job! But we decided to take it further and add an auto-encoder.

Auto-encoders are widely used for many purposes; in this specific case, we use one to remove the noise and distortions in the upscaled images to give them a more realistic look (a minimal sketch follows the list below).

It has two parts:

  • The Encoder: takes the input and compresses it into a lower-dimensional representation
  • The Decoder: takes the compressed representation and reconstructs it into its original shape
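
Here is a minimal sketch of such a denoising auto-encoder in PyTorch; the channel counts and depth are illustrative assumptions, not the project's exact architecture.

```python
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    """Compress the image to a smaller feature map, then reconstruct it;
    the bottleneck tends to discard high-frequency noise and artefacts."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 3, stride=2, padding=1,
                               output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```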

The Split-Merge

And why would we stop there? To ensure the trained model can process your images irrespective of their resolution, we also decided to split them. We quite literally separate the image into tiles/blocks, so that each tile can be upscaled individually by the model, after which the results are merged back together. The main purpose of this is simply to reduce processing time, since upscaling a series of lower-resolution tiles individually is less computationally heavy than upscaling a single large image directly. The splitting-merging process is fully automated and parameterised, and needs no additional input from the user.
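
A minimal sketch of that tiling logic over NumPy image arrays; the tile size is an illustrative parameter, and the project's actual implementation may differ.

```python
import numpy as np

def split_into_tiles(img, tile=96):
    """Split an HxWx3 array into tile x tile blocks (edge tiles may be smaller)."""
    h, w = img.shape[:2]
    return [(y, x, img[y:y + tile, x:x + tile])
            for y in range(0, h, tile)
            for x in range(0, w, tile)]

def merge_tiles(upscaled, h, w, scale=4):
    """Reassemble individually upscaled tiles into one HR canvas."""
    out = np.zeros((h * scale, w * scale, 3), dtype=np.float32)
    for y, x, t in upscaled:  # t is the already-upscaled tile
        out[y * scale:y * scale + t.shape[0],
            x * scale:x * scale + t.shape[1]] = t
    return out
```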

The Results

Since the entire DIV2K dataset was used for training and validation, we used different datasets, such as Set5, General100, and BSD200, to test our SRGAN model. A few of our tests and their results are shown below:
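
One common way to score super-resolution outputs against their HR references on such benchmark sets is peak signal-to-noise ratio (PSNR); the sketch below is illustrative and not necessarily the exact evaluation code behind the results shown here.

```python
import numpy as np

def psnr(hr, sr, max_val=255.0):
    """Peak signal-to-noise ratio between a reference and a super-resolved image."""
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)
```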

The App

The web app was built using Streamlit and deployed on Google Cloud Run through a CI/CD pipeline assembled from GCP services, namely Cloud Build and Container Registry. We welcome you to use our app and upscale your images as well! We always welcome your feedback.
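
For a flavour of the front end, here is a minimal Streamlit sketch; the widget labels and the upscale call are illustrative, and the actual app code lives in the repository linked below.

```python
import streamlit as st
from PIL import Image

st.title("Pixel Perfect")  # hypothetical page layout, not the production code
uploaded = st.file_uploader("Upload an image to upscale 4x",
                            type=["png", "jpg", "jpeg"])
if uploaded is not None:
    lr = Image.open(uploaded)
    st.image(lr, caption="Input")
    # sr = upscale(lr)  # split -> SRGAN -> auto-encoder -> merge pipeline
    # st.image(sr, caption="Super-resolved output")
```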

https://srgan-upscale-2xv4a765yq-ey.a.run.app

Additionally, you can find our project repository here:

https://github.com/sridhar211/SRGANupscaling

References

Our work is based on the following paper:

Ledig et al., “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network” (SRGAN): https://arxiv.org/pdf/1609.04802.pdf
