An Introduction to Super Resolution using Deep Learning

An elaborate discussion of the various Components, Loss Functions and Metrics used for Super Resolution with Deep Learning.

Bharath Raj
Jul 1, 2019 · 11 min read

Written by Bharath Raj with feedback from Yoni Osin.

Introduction

A low resolution image kept beside its high resolution version. (Photo by Jarrad Horne on Unsplash)

Clearly, applying a degradation function to the HR image gives us the LR image. But can we do the inverse? In the ideal case, yes! If we know the exact degradation function, we can recover the HR image by applying its inverse to the LR image.

But therein lies the problem: we usually do not know the degradation function beforehand. Directly estimating the inverse degradation function is an ill-posed problem. In spite of this, deep learning techniques have proven to be effective for Super Resolution.

This blog provides an introduction to performing Super Resolution using supervised deep learning methods. Some important loss functions and metrics are also discussed. A lot of the content is derived from this literature review, which the reader can refer to.

Supervised Methods

In this section, we group various deep learning approaches by the manner in which their convolution layers are organized. Before we move on to the groups, a primer on data preparation and types of convolutions is presented. The loss functions used to optimize the models are presented separately towards the end of this blog.

Preparing the Data

Degrading a high resolution image to obtain a low resolution version of it. (Photo by Jarrad Horne on Unsplash)

One important thing to note is that it is recommended to store the HR image in an uncompressed (or lossless compressed) format. This is to prevent degradation of the quality of the HR image due to lossy compression, which may give sub-optimal performance.
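To make this concrete, here is a minimal sketch (in Python, using Pillow) of generating an LR-HR training pair by degrading a losslessly stored HR image; the file name and scale factor are illustrative, and bicubic downscaling is just one common choice of degradation function.

from PIL import Image

def make_lr_hr_pair(hr_path, scale=4):
    # Load a losslessly stored HR image (e.g. PNG) so that lossy-compression
    # artifacts do not leak into the training data.
    hr = Image.open(hr_path).convert("RGB")
    # Crop so the HR dimensions are divisible by the scale factor.
    w, h = hr.size
    hr = hr.crop((0, 0, w - w % scale, h - h % scale))
    # Bicubic downscaling acts as the (assumed) degradation function.
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    return lr, hr

lr, hr = make_lr_hr_pair("example.png", scale=4)  # hypothetical file name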

Types of Convolutions

Network design strategies. (Source)

The above image mentions a number of network design strategies. You can refer to this paper for more information. For a primer on the different types of convolutions commonly used in deep learning, you may refer to this blog.

Group 1 — Pre-Upsampling

A typical pre-upsampling network. (Source)

You can refer to page 5 of this paper for some models using this technique. The advantage is that since the upsampling is handled by traditional methods, the CNN only needs to learn how to refine the coarse image, which is simpler. Moreover, since we are not using transposed convolutions here, checkerboard artifacts may be circumvented. However, the downside is that the predefined upsampling methods may amplify noise and cause blurring.
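As a minimal sketch of this group in PyTorch, the following SRCNN-style model (with a residual connection in the style of VDSR) upsamples with fixed bicubic interpolation and lets the CNN learn only the refinement; the layer sizes are illustrative.

import torch.nn as nn
import torch.nn.functional as F

class PreUpsamplingSR(nn.Module):
    # Upsample first with a traditional method, then refine with a CNN.
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.refine = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=9, padding=4), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, kernel_size=5, padding=2),
        )

    def forward(self, lr):
        # Fixed (non-learned) upsampling produces a coarse HR estimate.
        coarse = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                               align_corners=False)
        # The CNN only needs to learn the residual refinement.
        return coarse + self.refine(coarse)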

Group 2 — Post-Upsampling

A typical post-upsampling network. (Source)

The advantage of this method is that feature extraction is performed in the lower-dimensional space (before upsampling), and hence the computational complexity is reduced. Furthermore, by using a learnable upsampling layer, the model can be trained end-to-end.
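A minimal post-upsampling sketch in PyTorch, using a sub-pixel convolution (PixelShuffle) as the learnable upsampling layer in the spirit of ESPCN; the layer widths are illustrative.

import torch.nn as nn

class PostUpsamplingSR(nn.Module):
    # Extract features at LR resolution, then upsample with a learned layer.
    def __init__(self, scale=4, channels=64):
        super().__init__()
        # All feature extraction happens in the cheaper low-resolution space.
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # A convolution produces 3 * scale^2 channels, which PixelShuffle
        # rearranges into a single image upscaled by `scale`.
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        return self.upsample(self.features(lr))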

Group 3 — Progressive Upsampling

A typical progressive-upsampling network. (Source)

By decomposing a difficult task into simpler tasks, the learning difficulty is greatly reduced and better performance can be obtained. Moreover, learning strategies like curriculum learning can be integrated to further reduce learning difficulty and improve final performance.
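As a rough sketch of this idea (in the spirit of LapSRN), each stage below doubles the resolution, so a 4x model solves two easier 2x problems instead of one hard 4x problem; the layer sizes are illustrative.

import torch.nn as nn

class ProgressiveSR(nn.Module):
    # Upsample in repeated 2x stages rather than in one large jump.
    def __init__(self, n_stages=2, channels=64):  # 2 stages give 4x in total
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                # Each stage ends with a learnable 2x upsampling.
                nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),
            )
            for _ in range(n_stages)
        ])
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, lr):
        x = self.head(lr)
        for stage in self.stages:
            x = stage(x)  # each pass doubles the height and width
        return self.tail(x)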

Group 4 — Iterative Up and Down Sampling

A typical iterative up-and-down sampling network. (Source)

The models under this framework can better mine the deep relationships between the LR and HR image pairs, and thus provide higher-quality reconstruction results.
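The sketch below shows a simplified up-projection unit in the spirit of DBPN: it upsamples the features, projects them back down, and uses the reconstruction error in LR space to correct the HR estimate. Real architectures stack many such units (and their down-projection counterparts).

import torch.nn as nn

class UpProjectionUnit(nn.Module):
    # One up-projection step: upsample, re-downsample, correct the error.
    def __init__(self, channels=64, scale=2):
        super().__init__()
        k, s, p = 2 * scale, scale, scale // 2  # kernel/stride/padding for `scale`
        self.up1 = nn.ConvTranspose2d(channels, channels, k, stride=s, padding=p)
        self.down = nn.Conv2d(channels, channels, k, stride=s, padding=p)
        self.up2 = nn.ConvTranspose2d(channels, channels, k, stride=s, padding=p)

    def forward(self, lr_feat):
        hr = self.up1(lr_feat)     # first HR estimate
        lr_back = self.down(hr)    # project it back down to LR space
        err = lr_back - lr_feat    # reconstruction error in LR space
        return hr + self.up2(err)  # correct the HR estimate with the error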

Loss Functions

Often, more than one loss function is used by weighting and summing up the errors obtained from each loss function individually. This enables the model to focus on aspects contributed by multiple loss functions simultaneously.

total_loss = weight_1 * loss_1 + weight_2 * loss_2 + weight_3 * loss_3
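In PyTorch, for example, each term is a scalar tensor, so the weighted sum stays differentiable and a single backward() call optimizes all objectives jointly; the loss names and weights below are illustrative.

def total_loss(loss_terms, weights):
    # loss_terms and weights are dicts keyed by loss name, for example:
    # total_loss({"pixel": l_pix, "content": l_content},
    #            {"pixel": 1.0, "content": 0.006})
    return sum(weights[name] * value for name, value in loss_terms.items())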

In this section we will explore some popular classes of loss functions used for training the models.

Pixel Loss

Plot of Smooth L1 Loss. (Source)

The PSNR metric (discussed below) is highly correlated with the pixel-wise difference, and hence minimizing the pixel loss directly maximizes the PSNR metric value (indicating good performance). However, pixel loss does not take perceptual quality into account, and the model often outputs perceptually unsatisfying results (often lacking high-frequency details).
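A sketch of the common pixel-wise losses in PyTorch; all three are built-in, and the smooth L1 variant (plotted above) behaves like L2 near zero and like L1 for large errors.

import torch.nn.functional as F

def pixel_loss(sr, hr, kind="l1"):
    # Pixel-wise distance between the super-resolved and ground-truth images.
    if kind == "l1":         # mean absolute error
        return F.l1_loss(sr, hr)
    if kind == "l2":         # mean squared error, directly tied to PSNR
        return F.mse_loss(sr, hr)
    if kind == "smooth_l1":  # quadratic near zero, linear for large errors
        return F.smooth_l1_loss(sr, hr)
    raise ValueError(f"unknown pixel loss: {kind}")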

Content Loss

Content loss between a ground truth image and a generated image. (Source)

The equation above calculates the content loss between a ground-truth image and a generated image, given a pre-trained network (Φ) and a layer (l) of this pre-trained network at which the loss is computed. This loss encourages the generated image to be perceptually similar to the ground-truth image. For this reason, it is also known as the Perceptual loss.
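A minimal sketch of content loss in PyTorch, using torchvision's pre-trained VGG-19 as Φ; the cutoff index standing in for the layer l is illustrative, and inputs are assumed to be normalized the way the VGG weights expect.

import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class ContentLoss(nn.Module):
    # Euclidean distance between feature maps of a fixed pre-trained network.
    def __init__(self, layer=35):  # cutoff index for layer l (illustrative)
        super().__init__()
        self.phi = vgg19(pretrained=True).features[:layer].eval()
        for p in self.phi.parameters():
            p.requires_grad_(False)  # the feature extractor stays frozen

    def forward(self, sr, hr):
        # Compare the images in feature space rather than pixel space.
        return F.mse_loss(self.phi(sr), self.phi(hr))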

Texture Loss

Computing the Gram Matrix. (Source)

The correlation between the feature maps is represented by the Gram matrix (G), which is the inner product between the vectorized feature maps i and j on layer l (shown above). Once the Gram matrix is calculated for both images, calculating the texture loss is straightforward, as shown below:

Computing the Texture Loss. (Source)

By using this loss, the model is motivated to create realistic textures and more visually satisfying results.
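A sketch of the Gram matrix and the resulting texture loss in PyTorch; normalization conventions vary between papers, so the division by c·h·w is just one common choice. The inputs are feature maps extracted from a pre-trained network, as in the content loss above.

import torch
import torch.nn.functional as F

def gram_matrix(features):
    # Inner products between the vectorized feature maps of one layer.
    b, c, h, w = features.shape
    flat = features.flatten(2)                    # (b, c, h*w)
    gram = torch.bmm(flat, flat.transpose(1, 2))  # (b, c, c) correlations
    return gram / (c * h * w)                     # one common normalization

def texture_loss(sr_features, hr_features):
    # Distance between the Gram matrices of generated and target features.
    return F.mse_loss(gram_matrix(sr_features), gram_matrix(hr_features))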

Total Variation Loss

Total Variation Loss used on a generated High Resolution image. (Source)

Here, i, j, and k iterate over the height, width, and channels respectively.
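A sketch of one common (anisotropic) formulation of total variation in PyTorch, which sums the absolute differences between neighboring pixels along the height and width axes.

def total_variation_loss(img):
    # img has shape (batch, channels, height, width). Penalizing absolute
    # differences between neighboring pixels encourages smooth outputs.
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().sum()  # vertical steps
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().sum()  # horizontal steps
    return dh + dw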

Adversarial Loss

Given a set of target samples, the Generator tries to produce samples that can fool the Discriminator into believing they are real. The Discriminator tries to distinguish real (target) samples from fake (generated) samples. Using this iterative training approach, we eventually end up with a Generator that is really good at generating samples similar to the target samples. The following image shows the structure of a typical GAN.

GANs in action. (Source)
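A minimal sketch of the standard GAN losses in PyTorch; here disc is a hypothetical discriminator returning raw logits, sr is the Generator's output, and hr is a real high resolution image.

import torch
import torch.nn.functional as F

def discriminator_loss(disc, hr, sr):
    # The Discriminator learns to score real HR images as 1 and fakes as 0.
    real_logits = disc(hr)
    fake_logits = disc(sr.detach())  # detach: don't update the Generator here
    real_loss = F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits))
    fake_loss = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))
    return real_loss + fake_loss

def generator_adversarial_loss(disc, sr):
    # The Generator is rewarded when the Discriminator scores its output as real.
    fake_logits = disc(sr)
    return F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))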

Advances to the basic GAN architecture have been introduced for improved performance. For instance, Park et al. used a feature-level discriminator to capture more meaningful potential attributes of real High Resolution images. You can check out this blog for a more elaborate survey of the advances in GANs.

Typically, models trained with adversarial loss have better perceptual quality, even though they might lose out on PSNR compared to those trained on pixel loss. One minor downside is that the training process of GANs is a bit difficult and unstable. However, methods to stabilize GAN training are being actively worked on.

Metrics

Subjective metrics are based on the human observer's perceptual evaluation, whereas objective metrics are based on computational models that try to assess the image quality. Subjective metrics are often more "perceptually accurate"; however, some of them are inconvenient, time-consuming, or expensive to compute. Another issue is that these two categories of metrics may not be consistent with each other. Hence, researchers often display results using metrics from both categories.

In this section, we will briefly explore a couple of the widely used metrics to evaluate the performance of our super resolution model.

PSNR

Calculation of PSNR. (Source)

In the above formula, L is the maximum possible pixel value (for 8-bit RGB images, it is 255). Unsurprisingly, since PSNR only cares about the difference between the pixel values, it does not represent perceptual quality that well.
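A sketch of the PSNR computation with NumPy; max_val plays the role of L in the formula above.

import numpy as np

def psnr(hr, sr, max_val=255.0):
    # max_val is L, the maximum possible pixel value (255 for 8-bit images).
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)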

SSIM

SSIM is a weighted product of comparisons as described above. (Source)

In the above formula, alpha, beta and gamma are the weights of the luminance, contrast and structure comparison functions respectively. The commonly used representation of the SSIM formula is as shown below:

Commonly used representation of the SSIM formula. (Source)

In the above formula, μ(I) represents the mean of a particular image, σ(I) represents the standard deviation of a particular image, σ(I, I′) represents the covariance between the two images, and C1, C2 are constants set to avoid instability. For brevity, the significance of the terms and the exact derivation are not explained in this blog; the interested reader can check out Section 2.3.2 in this paper.
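A sketch of this SSIM formula computed from global image statistics with NumPy; as discussed next, quality is usually assessed over local windows instead, so treat this as illustrative.

import numpy as np

def ssim_global(x, y, max_val=255.0, k1=0.01, k2=0.03):
    # C1 and C2 are the small constants that avoid instability.
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    x, y = x.astype(np.float64), y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))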

Since image statistical features or distortions may be unevenly distributed, assessing image quality locally is more reliable than assessing it globally. Mean SSIM (MSSIM), which splits the image into multiple windows and averages the SSIM obtained at each window, is one such method of assessing quality locally.

In any case, since SSIM evaluates reconstruction quality from the perspective of the Human Visual System, it better meets the requirements of perceptual assessment.

Other IQA Scores

  • Mean Opinion Score (MOS)
  • Task-based Evaluation
  • Information Fidelity Criterion (IFC)
  • Visual Information Fidelity (VIF)

Conclusion

This blog covered an introduction to supervised Super Resolution: how training data is prepared, the major groups of networks based on where upsampling is performed, the loss functions commonly used to train these models, and the metrics used to evaluate them. Hopefully, this serves as a useful starting point for exploring the literature further.
