Super Resolution explained — Ep1

Manas Satish Bedmutha
Jun 12

This article is part of a series. If you are already familiar with the problem and the metrics, feel free to jump directly to the next article to get started with the algorithms.

Today we are surrounded by cameras and surveillance devices almost everywhere. From sparkling clear DSLR photographs to blurry microscope captures, images come in a wide variety of sizes and qualities. Camera hardware is reinventing itself very fast to enable the capture of high resolution images everywhere. But in my opinion, it is yet to catch up with software, which can intelligently scale up the resolution of an image artificially.

That brings us to the need for image super resolution. We cannot employ high resolution cameras everywhere. Devices like robots, remotely operated systems, or even security cameras may not produce sharp images. Whenever we need to process poor quality images from such cameras, we might need to do a lot of things: magnifying, de-noising, or de-blurring, among others. However, getting an enlarged image that matches exactly what it should look like at a higher resolution is the real challenge.

Thus we are always in search of methods that enlarge an image as realistically as possible. While there exist a lot of classical interpolation methods, with the advancements in the fields of Computer Vision and Deep Learning, neural networks have almost completely taken over them. In this series of articles we will discuss some of the famous methods of image super resolution (henceforth referred to as SR).
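For reference, the classical route is nearly a one-liner. Here is a minimal sketch using Pillow; the file name and the 4x scale factor are placeholder assumptions, not anything prescribed by the methods we will discuss:

```python
from PIL import Image

# Classical baseline: plain bicubic interpolation, no learning involved.
# "input.png" and the 4x scale factor are placeholders.
scale = 4
lr = Image.open("input.png")
hr = lr.resize((lr.width * scale, lr.height * scale), resample=Image.BICUBIC)
hr.save("bicubic_x4.png")
```

The neural network methods in this series try to beat exactly this kind of baseline in realism.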

Before we start discussing the algorithms, let us get acquainted with the exact problem and the metrics.

A representation of the Super Resolved images from LapSRN models (Source: http://vllab.ucmerced.edu/wlai24/LapSRN/)

The Problem

Suppose you have an image X that is (m × n × c) pixels in size. We want to see how it would look in reality after scaling it k times. This means we want the output image Y to be (km × kn × c) pixels in dimension; the number of channels c remains the same. The task in SR is specifically to develop a mapping F such that F(X) = Y.
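Compactly, the task is to learn

$$F : \mathbb{R}^{m \times n \times c} \to \mathbb{R}^{km \times kn \times c}, \qquad F(X) = Y$$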

However, the catch lies in the fact that we need to create more data from less: for a scale factor k, we must generate k² output pixels for every input pixel. Another problem is that noise is far more perceptible in lower resolution images. Thus it is always difficult to generate a perfect, generalized mapping.

Consider a low resolution image X. Its corresponding desired output image is called the ground truth image. Let us denote it as Y_true.

Metrics

The quality of our mapping is evaluated by the quality of the output: the closer the output of a mapping is to the desired output, the better the mapping.

For a given mapping F, let F(X) = Y_pred. Using the two images Y_true and Y_pred we define the metrics of evaluation.

Some metrics are used only to measure the quality of an algorithm, while others can also guide the training, with maximizing or minimizing the metric as the final goal. Such a metric is referred to as a loss function. Each algorithm might use a different training setup with a variety of loss functions.

1. Mean Squared Error (MSE):

Mean Squared Error, also called MSE, is the average of the squared error. For our images of size km × kn, for each pixel (i, j) we find the norm of the difference between the two images across all channels. This is squared and averaged over the total number of pixels. That is, the MSE is found by the equation below.
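With Y_true(i, j) and Y_pred(i, j) denoting the c-channel pixel vectors at position (i, j):

$$\mathrm{MSE} = \frac{1}{km \cdot kn} \sum_{i=1}^{km} \sum_{j=1}^{kn} \left\lVert Y_{\mathrm{true}}(i,j) - Y_{\mathrm{pred}}(i,j) \right\rVert^{2}$$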

It is usually used as a loss function when training the networks. The mean squared error is the simplest metric of correspondence that takes into account the size of the image/data.

2. Peak Signal to Noise Ratio (PSNR):

The peak signal to noise ratio, also referred to as PSNR, is a measure of the peak (maximum) error in the image. It is directly related to MSE as —
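$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{R^{2}}{\mathrm{MSE}} \right)$$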

where R represents the maximum range of the data. For an image, which is usually 8 bit, the maximum value is 255 while the minimum is 0, so the value of R is their difference, that is, 255.

PSNR is always defined in decibels (dB) and hence we take the logarithm of the ratio.
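As a quick illustration, here is a minimal NumPy sketch of both metrics. It follows the per-pixel-norm definition above and assumes H × W × C arrays; the function names and the 8-bit default are my own choices, and note that many libraries average over the channels as well, which rescales MSE by a factor of c:

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """MSE as defined above: squared norm of the per-pixel difference
    across channels, averaged over the km x kn pixel positions."""
    diff = y_true.astype(np.float64) - y_pred.astype(np.float64)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

def psnr(y_true: np.ndarray, y_pred: np.ndarray, data_range: float = 255.0) -> float:
    """PSNR in decibels; data_range is R (255 for 8-bit images).
    Assumes the images differ, i.e. MSE > 0."""
    return float(10.0 * np.log10(data_range ** 2 / mse(y_true, y_pred)))
```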

3. Mean Opinion Score (MOS):

Mean Opinion Scores are basically user opinions/ratings of the same image as processed by different algorithms. Multiple users rate these images and the average is reported as the MOS. It is usually a rating from 1 to 5, but other ranges can also be used.

Some papers in recent years, like EnhanceNet, explained that PSNR is not a sufficient metric: every pixel carries similar importance irrespective of its position, and numerical closeness between two pixels may not ensure closeness in the photorealistic sense. Hence the MOS was proposed as an additional metric.

4. Structural Similarity Index (SSIM):

Structural Similarity Index is a measure of perceptual closeness rather than numerical closeness. The idea takes some effort to understand, but the mathematical formulation is straightforward.

It takes into account the image degradation (change) across luminance (l), contrast (c), as well as structure (s).

For two patches x and y, we define the SSIM index as —
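$$\mathrm{SSIM}(x, y) = l(x, y)^{\alpha} \cdot c(x, y)^{\beta} \cdot s(x, y)^{\gamma}$$

where

$$l(x, y) = \frac{2 \mu_x \mu_y + c_1}{\mu_x^{2} + \mu_y^{2} + c_1}, \qquad c(x, y) = \frac{2 \sigma_x \sigma_y + c_2}{\sigma_x^{2} + \sigma_y^{2} + c_2}, \qquad s(x, y) = \frac{\sigma_{xy} + c_3}{\sigma_x \sigma_y + c_3}$$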

Source: Wikipedia (Structural Similarity)

Here, μ and σ are the mean and standard deviation of the respective patches, and σ_xy is their covariance. The constants c1 = (k1 × L)² and c2 = (k2 × L)² stabilize the divisions, where L is the total range of pixel values (the same as R in the PSNR definition). We define c3 = c2 / 2 and set α, β, γ to 1, which collapses the three-term product into the single fraction usually quoted for SSIM.
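In practice you rarely implement SSIM by hand; scikit-image ships an implementation. A minimal sketch, assuming a recent scikit-image version (where channel_axis marks the channel dimension) and using random placeholder images:

```python
import numpy as np
from skimage.metrics import structural_similarity

# Placeholder 8-bit RGB images; in practice use Y_true and Y_pred.
y_true = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
y_pred = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)

# data_range is L (the pixel value range); channel_axis handles RGB input.
score = structural_similarity(y_true, y_pred, data_range=255, channel_axis=-1)
print(f"SSIM: {score:.4f}")
```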

Datasets:

For any model, be it adaptive or a neural network, the mapping needs to be learnt from some data. Some of the commonly used datasets are:

  1. BSDS — the Berkeley Segmentation Dataset, a rich collection of natural images.
  2. Set5 — One of the most popular datasets. It contains five images with many challenging regions for a variety of Computer Vision problems.
Set5 Dataset (Source: http://vllab.ucmerced.edu/wlai24/LapSRN/)

  3. Set14 — Another benchmark dataset that contains photographs as well as graphics; it is commonly used for testing algorithms.

Others include T91, NTIRE, etc.

Note: To simulate a low resolution image, we first downscale the original image and add various types of noise; the original can then be treated as the high resolution ground truth.
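As a rough illustration, here is a minimal sketch of one such degradation pipeline. Bicubic downscaling plus additive Gaussian noise are common choices rather than the only ones, and the function name and parameters below are my own:

```python
import numpy as np
from PIL import Image

def simulate_lr(hr_image: Image.Image, scale: int = 4, noise_sigma: float = 2.0) -> Image.Image:
    """Turn a high resolution image into a simulated low resolution input.

    Uses bicubic downscaling plus additive Gaussian noise; real degradation
    pipelines may also include blur, compression artifacts, etc.
    """
    lr = hr_image.resize((hr_image.width // scale, hr_image.height // scale),
                         resample=Image.BICUBIC)
    arr = np.asarray(lr, dtype=np.float64)
    arr += np.random.normal(0.0, noise_sigma, size=arr.shape)  # sensor-like noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```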


Keeping up with SR —

Conferences:

Solutions to SR have been featured across multiple top-tier (A*) conferences. CVPR usually features many deep learning based algorithms; ICCV and ECCV are other popular venues.

Demos:

These demos will give you a feel of the Super Resolution task and the immense opportunities in the field.

  1. https://bigjpg.com/
  2. http://waifu2x.udp.jp/ for anime lovers!
  3. https://letsenhance.io/

Open Source:

Many repositories can help you get started with the code. A few to start with are —

  1. YapengTian/Single-Image-Super-Resolution — provides a collection of almost all SR algorithms, organized by the approach used.
  2. titu1994/Image-Super-Resolution — has implementations of many SR algorithms in Keras.

  3. tensorlayer/srgan — one of my favourite algorithms!

Let me know your thoughts on the article in the comments. Coming up next: SRCNN; stay tuned!