Image Super Resolution: A Comparison between Interpolation & Deep Learning-based Techniques to Improve Clarity of Low-Resolution Images

Ong Si Ci
HTX S&S COE
13 min read · Apr 13, 2023

Do you have a digital photo that is important to you, but is in low resolution because it was taken when camera technologies were not as advanced as they are today? Are there ways to digitally enhance such an image to improve its quality? The answer is yes. The process of enhancing a low-resolution image into a high-resolution one is known as Image Super Resolution.

Image Super Resolution techniques have evolved over the years. With recent advancements in AI techniques and their applications in image enhancement, the quality of generated high-resolution images has improved significantly. This article will discuss 2 Super Resolution techniques — interpolation and deep learning-based — compare the image qualities generated, and discuss the resources required for these techniques.

When do we use Super Resolution and why do we need it

Super Resolution has a wide range of applications and is most commonly used in medical imaging, surveillance, and the media industry.

In medical imaging, Super Resolution is beneficial in obtaining high-resolution medical images which contain details that allow doctors to make a more accurate diagnosis.

In surveillance, high-resolution images are desired to identify individuals or vehicles of interest. A low-resolution image of a suspect would not be very useful for investigation if their distinguishing features could not be seen clearly.

In the media industry, Super Resolution is often used to enhance image and video contents to give the audience a better viewing experience.

What are some of the causes of poor image quality that require Super Resolution?

  1. External Conditions: External conditions such as poor illumination, heavy rain or relative motion between camera and object can result in poor image quality.
  2. Image Compression: Image compression is often used to reduce the file size of an image by removing some of its data for storage, handling and transmission purposes. For example, videos and images are often compressed before they are sent over communication platforms such as WhatsApp, or when they are saved in lossy compression formats such as JPEG.
  3. Hardware Limitations: The resolution of images/videos captured is dependent on the camera used. The first digital camera invented by Kodak in 1975 was only capable of capturing images with a resolution of 100x100 pixels (0.01 megapixels). Contrast that with the iPhone 14 Pro Max that can capture images at 8000 x 6000 pixels (48 Megapixels). Other types of hardware limitations include lens distortion and lens flare.
  4. Distance of object from sensor source: Objects that are further away from the camera will appear smaller and therefore their images will have fewer pixels.

Poor image quality often arises from resource constraints. A camera with best-of-breed specifications will obviously capture the best-quality images. However, in a network of cameras, having each camera capture and transmit high-resolution images or videos would result in prohibitively high equipment and data storage costs. As such, images/videos captured for storage are often scaled down from their original high-resolution versions. Super Resolution is therefore useful to overcome such limitations.

Super Resolution can be applied using either Single Image Super Resolution (SISR) or Multi Image Super Resolution (MISR). In SISR, a high-resolution image can be generated from just one of its low-resolution image versions. In MISR, more than one low-resolution image of the same scene or object is used to generate a single high-resolution image. The SISR method is more popular because of its 1:1 source-to-output ratio and is used more frequently in image processing applications.

Super Resolution algorithms can be divided into 4 main categories, ranging from traditional techniques, namely interpolation-based, reconstruction-based and learning-based methods, to current deep learning-based techniques. The interpolation-based technique is commonly applied via the resize function from the OpenCV library, whilst deep learning-based techniques are now practical with the prevalence of neural networks.

Below, we will compare the quality of Super Resolution images using the traditional interpolation-based versus deep learning-based techniques.

Using OpenCV’s Interpolation Upsampling Methods

The simplest way to super resolve an image is to perform upsampling using interpolation. Interpolation is a direct way to increase an image's dimensions by adding new pixels or data points to the low-resolution image. Here, we discuss 3 popular interpolation techniques, namely (1) Nearest Neighbour interpolation, (2) Bilinear interpolation and (3) Bicubic interpolation.

  1. Nearest Neighbour Interpolation

This is a simple and fast method as it involves little calculation. Each new pixel is assigned the intensity value of its nearest neighbouring pixel. Although the result is an image with higher resolution, the image can be blocky and look unnatural. In OpenCV, this is achieved through the following code snippet.

import cv2

# read the low-resolution image and upscale it using nearest neighbour interpolation
image = cv2.imread(image_path)
upscaled = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_NEAREST)
Example of a nearest neighbour interpolation with an upsampling factor of 2, applied on a 2x2 array. Source credits: https://theailearner.com/2018/12/29/image-processing-bilinear-interpolation/

  2. Bilinear Interpolation

The bilinear interpolation method uses linear interpolation: the weighted average of the intensity values of the 4 nearest neighbouring pixels is taken to obtain the intensity value of each new pixel. It achieves smoother results than nearest neighbour interpolation, but the outcome at edges (sharp transitions) is not ideal. This can be achieved using the OpenCV resize function with the cv2.INTER_LINEAR method. The output for the same 2x2 array example used in point 1 above is shown below.

Example of a bilinear interpolation with an upsampling factor of 2, applied on a 2x2 array. Source credits: https://theailearner.com/2018/12/29/image-processing-bicubic-interpolation/

  3. Bicubic Interpolation

Bicubic interpolation also uses a weighted average of neighbouring pixels to obtain the output. However, unlike bilinear interpolation, it uses 16 neighbours (a 4x4 neighbourhood) and a different weight distribution calculation. It produces sharper images than the above 2 methods and is often used in image editing software. In OpenCV, this is achieved using the cv2.INTER_CUBIC method.

Example of a bicubic interpolation with an upsampling factor of 2, applied on a 2x2 array. Source credits: https://theailearner.com/2018/12/29/image-processing-bicubic-interpolation/

We demonstrate 2 examples of upsampling performed using interpolation methods below. The original image is obtained from the CelebA-HQ dataset at 1024x1024 pixels. It was downsized to 64x64 pixels using OpenCV's JPEG compression and resize methods. The downsized images were then upsampled to 1024x1024 pixels using OpenCV's nearest neighbour, linear and bicubic interpolation methods with the cv2.resize() function. It is visually evident that all the upsampled images from these 3 methods are pixelated, with the worst pixelation observed from the nearest neighbour method.

Original images (1024x1024 pixels) were downsized to 64x64 using OpenCV’s JPEG compression and resize methods. The 64x64 downsized image was then upsampled using OpenCV’s linear, bicubic and nearest neighbour interpolation methods.
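A minimal sketch of this pipeline is shown below. The input file name and JPEG quality factor are assumptions for illustration, not necessarily the exact settings used in our experiment.

import cv2

original = cv2.imread("celeba_1024.png")  # hypothetical 1024x1024 CelebA-HQ image

# simulate compression artifacts via in-memory JPEG encoding (quality factor assumed)
ok, buffer = cv2.imencode(".jpg", original, [cv2.IMWRITE_JPEG_QUALITY, 50])
compressed = cv2.imdecode(buffer, cv2.IMREAD_COLOR)
low_res = cv2.resize(compressed, (64, 64), interpolation=cv2.INTER_AREA)

# upsample back to 1024x1024 with each of the three interpolation methods
methods = {"nearest": cv2.INTER_NEAREST, "bilinear": cv2.INTER_LINEAR, "bicubic": cv2.INTER_CUBIC}
for name, flag in methods.items():
    upsampled = cv2.resize(low_res, (1024, 1024), interpolation=flag)
    cv2.imwrite(f"upsampled_{name}.png", upsampled)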

Using Deep Learning-Based Techniques

With the recent advancements of AI, deep learning-based super resolution models are being actively researched. There are several open-source models that have achieved impressive performance in benchmark studies. The Super Resolution Convolutional Neural Network (SRCNN) is one of the first deep learning methods to outperform traditional Super Resolution methods. The use of a CNN architecture enabled the model to learn an end-to-end mapping between low-resolution and high-resolution images. With three convolutional layers in its network architecture, SRCNN takes as input the single luminance channel of the YCbCr image, upsampled beforehand using traditional bicubic interpolation, and refines it into the output image; the network is trained with the Mean Squared Error (MSE) loss function.
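As a rough illustration, below is a minimal PyTorch sketch of the SRCNN architecture, using the 9-1-5 filter sizes and 64/32 channel widths from the paper. The padding used here to preserve spatial dimensions is our own simplification; the original formulation used valid convolutions.

import torch
import torch.nn as nn

class SRCNN(nn.Module):
    # three stages: patch extraction, non-linear mapping, reconstruction
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4),  # operates on the luminance (Y) channel
            nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),
        )

    def forward(self, y):
        # y: the bicubic-upsampled Y channel, shape (N, 1, H, W)
        return self.net(y)

model = SRCNN()
loss_fn = nn.MSELoss()  # trained by minimizing MSE against the ground-truth high-resolution Y channel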

Since the conception of SRCNN in 2014, several researchers have developed other deep learning-based Super Resolution models, some of which address the pitfalls of SRCNN. These include FSRCNN, VDSR, PixelCNN and SRResNet. Yang et al. conducted a detailed review of various deep learning-based Super Resolution methods.

State-of-the-art deep learning-based methods: Generative Adversarial Networks (GANs) and Transformers

Over the past 10 years, researchers have been working on state-of-the-art Super Resolution algorithms using Generative Adversarial Networks (GANs) and Transformers. The aim is to recover finer texture details that CNN-based networks were unable to achieve at large upsampling factors. In 2017, Super Resolution GAN (SRGAN) was proposed. During the training process, a high-resolution image is first downsampled to a low-resolution image. The low-resolution image is then upsampled using a generator. A discriminator is then used to distinguish high-resolution images produced by the generator from real high-resolution images. The GAN's loss, which comprises both content loss (reconstruction loss) and adversarial loss, is backpropagated to train the generator and discriminator.
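To make the training loop concrete, here is a heavily simplified PyTorch sketch with toy stand-in networks. These are not the SRGAN architectures, and the content loss is reduced to pixel-wise MSE, whereas the paper uses a VGG-based perceptual loss; the 1e-3 adversarial weighting follows the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

# toy stand-ins for the generator and discriminator (not the paper's architectures)
G = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.PReLU(),
                  nn.Upsample(scale_factor=4), nn.Conv2d(64, 3, 3, padding=1))
D = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

hr = torch.rand(8, 3, 96, 96)                  # a batch of real high-resolution crops
lr_img = F.interpolate(hr, scale_factor=0.25)  # downsample to obtain the low-resolution input

# discriminator step: learn to separate real HR images from generated ones
sr = G(lr_img).detach()
d_loss = (F.binary_cross_entropy_with_logits(D(hr), torch.ones(8, 1))
          + F.binary_cross_entropy_with_logits(D(sr), torch.zeros(8, 1)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# generator step: content (reconstruction) loss plus adversarial loss
sr = G(lr_img)
content_loss = F.mse_loss(sr, hr)  # simplified; SRGAN uses a VGG perceptual loss
adv_loss = F.binary_cross_entropy_with_logits(D(sr), torch.ones(8, 1))
g_loss = content_loss + 1e-3 * adv_loss
opt_g.zero_grad()
g_loss.backward()
opt_g.step()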

In 2018, Enhanced SRGAN (ESRGAN) was released with 3 main improvements over SRGAN: (1) an enhanced network architecture that introduces a Residual-in-Residual Dense Block without batch normalization as the basic network unit; (2) the use of a relativistic discriminator to predict relative realness; and (3) a modified perceptual loss that uses features before activation. Like SRGAN, low-resolution training images were obtained from high-resolution images through a bicubic kernel with a downsampling factor of 4. A comparison of the results from SRGAN and ESRGAN is shown below.

Comparison of super resolution results from SRGAN and ESRGAN. Source credits: https://arxiv.org/pdf/1809.00219.pdf

The authors of ESRGAN released Real-ESRGAN in 2021 with the intent of performing blind super resolution on low-resolution images with more complex degradations. Besides modifying the network architecture, one of the major changes was the introduction of a higher-order degradation modelling process to produce low-resolution images that better resemble real-world degraded images. The degradation modelling process involves repeated degradations to simulate real-world scenarios where images are passed through several parties, thereby introducing several layers of artifacts from image acquisition, digital enhancement, digital transmission, compression, etc. An overview of the degradation process is shown below. Sinc filters are used to generate ringing and overshoot artifacts, which are commonly produced by sharpening algorithms or JPEG compression.

Overview of degradation process for Real-ESRGAN. Source credits: https://arxiv.org/pdf/2107.10833.pdf
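As a simplified illustration of the higher-order idea, the sketch below applies one classical degradation round (blur, downscale, noise, JPEG) twice using OpenCV. The kernel size, noise level and JPEG quality here are arbitrary assumptions; the actual Real-ESRGAN pipeline randomizes these parameters and additionally applies sinc filters, which are omitted here.

import cv2
import numpy as np

def degrade_once(img):
    # one round of classical degradations: blur -> downscale -> noise -> JPEG compression
    img = cv2.GaussianBlur(img, (5, 5), sigmaX=1.5)
    h, w = img.shape[:2]
    img = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_LINEAR)
    img = np.clip(img + np.random.normal(0, 5, img.shape), 0, 255).astype(np.uint8)
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 50])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

hr = cv2.imread("face_1024.png")     # hypothetical high-resolution input
lq = degrade_once(degrade_once(hr))  # higher-order: the whole pipeline is applied twice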

While Real-ESRGAN targets general image Super Resolution, GFPGAN was published in the same year and is aimed at blind face restoration. Training data for GFPGAN was degraded by introducing Gaussian blur, downsampling, Gaussian noise and JPEG compression. A generative facial prior from a pretrained face GAN, which contains information on facial textures and colours, was used. As a result, GFPGAN specializes in the Super Resolution of faces.

Besides GANs, transformers have also been increasingly used for Super Resolution. One of the most recent publications, CodeFormer by Zhou et al., has produced high-fidelity results based on experiments that we have done comparing super resolved images from GFPGAN and CodeFormer. CodeFormer was able to generate facial textures that look more realistic and less digital than those from GFPGAN. It was also able to realistically super resolve faces with a slight side profile, which GFPGAN was unable to achieve.

To illustrate this, we used the low resolution (64x64 pixels) images from the previous CelebA-HQ example, and compared the results obtained by super resolving the low-resolution images using GFPGAN and CodeFormer as shown below.

Using the same 64x64 pixels downsized images from CelebA-HQ dataset, the low resolution images were super resolved to 1024x1024 pixels using GFPGAN and CodeFormer.

How to assess the quality of a super resolved image

Full Reference Image Quality Assessment (FR-IQA)

The most common method is Full Reference Image Quality Assessment (FR-IQA) using the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). However, there are 2 limitations with the use of PSNR and SSIM. First, PSNR and SSIM require a high-resolution reference image of the same scene for comparison. In most deployment scenarios, such reference images are not available.

Second, we observed that PSNR and SSIM scores do not correlate with visual image fidelity. In the examples above, we observed visually that images super resolved using interpolation-based methods paled in comparison to those from GFPGAN and CodeFormer. However, the PSNR and SSIM scores paint a different picture, as shown in the image below. For both metrics, bilinear interpolation produced the highest scores, while visually, we observed that CodeFormer produced the best results.

PSNR and SSIM scores for super resolved images from the CelebA-HQ dataset. For both metrics, a higher score indicates better image quality. The best PSNR and SSIM scores are highlighted in green.
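For reference, these FR-IQA scores can be computed with scikit-image, as in the minimal sketch below. The file names are hypothetical, and both images must have the same dimensions.

import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# hypothetical file names: the original HR image and its super resolved counterpart
reference = cv2.imread("original_1024.png")
super_resolved = cv2.imread("super_resolved_1024.png")

psnr = peak_signal_noise_ratio(reference, super_resolved, data_range=255)
# channel_axis=2 for colour images (scikit-image >= 0.19)
ssim = structural_similarity(reference, super_resolved, channel_axis=2, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")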

No Reference Image Quality Assessment (NR-IQA)

Another approach to assessing the quality of super resolved images is the No Reference Image Quality Assessment (NR-IQA) method, which requires no reference image. Here, we use MUSIQ-PAQ2PIQ and ILNIQE; the results are shown in the table below. For both metrics, the NR-IQA scores better reflect the visual quality of the super resolved images.

NR-IQA scores using the MUSIQ-PAQ2PIQ and ILNIQE methods for super resolved images from the CelebA-HQ dataset. For MUSIQ-PAQ2PIQ, a higher score indicates better image quality; for ILNIQE, a lower score indicates better image quality. Scores indicating the best image quality are highlighted in green.
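One way to compute such NR-IQA scores is with the open-source pyiqa package, as sketched below. We assume here that both metrics are available under these names in pyiqa's model zoo, and the file name is hypothetical.

import pyiqa

# NR-IQA needs no reference image; scores are computed on the super resolved image alone
musiq = pyiqa.create_metric("musiq-paq2piq")  # higher score = better quality
ilniqe = pyiqa.create_metric("ilniqe")        # lower score = better quality

print(musiq("super_resolved_1024.png"))   # hypothetical file name
print(ilniqe("super_resolved_1024.png"))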

DeepFace's VGG-Face

One concern with super resolving images is whether the features that define an individual are altered. In other words, can the person in the super resolved image still be identified?

We took 2 images of Gabriel Jesus from various sources and performed face verification using DeepFace's VGG-Face model, with a threshold of 0.4 on the cosine distance. In Example 1 of the table below, both images were verified as the same individual with a cosine distance of 0.26 (smaller values indicate greater similarity).
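A minimal sketch of this verification step with the deepface package is shown below; the file names are hypothetical.

from deepface import DeepFace

# verify whether two images show the same person using VGG-Face and cosine distance;
# "verified" is True when the distance falls below the model's threshold
result = DeepFace.verify(
    img1_path="gabriel_jesus_1.jpg",  # hypothetical file names
    img2_path="gabriel_jesus_2.jpg",
    model_name="VGG-Face",
    distance_metric="cosine",
)
print(result["verified"], result["distance"])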

Next, we super resolved one of the two images using CodeFormer with an upsampling ratio of 4. This is illustrated in Examples 2 and 3 below. The images were still verified as the same individual, albeit with higher cosine distances of 0.277 and 0.364 respectively.

Lastly, we compared the scenario where both images are super resolved using CodeFormer with an upsampling ratio of 4. The features of Gabriel Jesus look more distinct, but the faces are no longer verified as the same individual because the cosine distance exceeds the threshold value of 0.4, as illustrated in Example 4 below. This could be because super resolution introduces slight changes to the facial features used for face verification and matching, as seen in the increase in cosine distances. When both images are super resolved, the facial features in both images are altered such that there is a larger difference between the two, further increasing the cosine distance.

Comparison of original and super resolved images of Gabriel Jesus from various sources. Successful face verification, where images were identified as the same individual, is shown in green highlights, while unsuccessful face verification, where images were not identified as the same individual, is shown in red highlights. Source credits: https://www.coachesvoice.com/cv/gabriel-jesus-arsenal-scout-report/ and https://www.skysports.com/football/news/11095/12633241/gabriel-jesus-to-arsenal-will-the-manchester-city-striker-thrive-with-more-regular-playing-time

Is Super Resolution Generalisable?

Based on the experiments conducted, we observe that Super Resolution models like GFPGAN and CodeFormer, which are trained specifically on facial images (the FFHQ dataset), perform better in super resolving facial images than models trained on other types of data such as objects or general scenes. This is because the model learns different details from facial features than from objects or general scenes. This is also likely why the Real-ESRGAN inference code (inference_realesrgan.py) has a specific argument to call on GFPGAN to enhance faces.

We attempted to perform Super Resolution using CodeFormer’s background upsampling and GFPGAN on a low-resolution car plate as shown in the figure below. In the low-resolution image, you can vaguely read the first 3 characters of the car plate as “SKF”. After super resolving the image, the first 3 characters of the car plate become unintelligible. We hypothesize that the models are not familiar with shapes and textures of alphanumeric characters as they were trained on facial images.

The original low-resolution car plate image was super resolved using CodeFormer's background upsampling and GFPGAN. The first 3 characters of the car plate become unintelligible after being super resolved. Part of the car plate number has been redacted.

Resource Considerations for each Super Resolution technique

We compared the inference time and CPU memory utilization required to super resolve one 64x64 image from the CelebA-HQ dataset to 1024x1024 pixels for the interpolation and deep learning-based techniques, as shown below. The experiments were performed using an Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz with 6 CPU cores and 32GB RAM. The average inference time and CPU memory utilization were calculated over 3 iterations of super resolving the same image. Evidently, a longer average inference time and higher CPU memory utilization were required for the deep learning-based techniques, as much more processing is involved.

Average inference time to super resolve one 64x64 image from CelebA-HQ dataset with an upsampling factor of 16, to 1024x1024 pixels for interpolation and deep learning-based techniques using CPU.
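A minimal sketch of how such measurements could be taken is shown below, using time.perf_counter and the psutil package; this is illustrative rather than our exact harness, and the file name is hypothetical. The bicubic case is shown; for the deep learning models, the cv2.resize call would be replaced by the model's inference step.

import os
import time
import cv2
import psutil

image = cv2.imread("celeba_64.png")  # hypothetical 64x64 input
process = psutil.Process(os.getpid())

timings = []
for _ in range(3):  # 3 iterations, as in the experiment above
    start = time.perf_counter()
    upscaled = cv2.resize(image, (1024, 1024), interpolation=cv2.INTER_CUBIC)
    timings.append(time.perf_counter() - start)

print(f"average inference time: {sum(timings) / len(timings):.6f} s")
print(f"process memory: {process.memory_info().rss / 1e6:.1f} MB")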

GPUs can be used to improve the inference time of deep learning-based techniques. We demonstrate the average inference time and maximum GPU memory utilization for GFPGAN and CodeFormer using 1x NVIDIA Quadro P1000 GPU below. As in the experiments conducted on CPU, the average inference time was calculated over 3 iterations of super resolving the same image. The authors of CodeFormer have also provisioned a HuggingFace implementation that allows users to tap on an NVIDIA Tesla T4 GPU for inference.

Average inference time and maximum GPU memory utilization required to super resolve one 64x64 image from CelebA-HQ dataset with an upsampling factor of 16, to 1024x1024 pixels for interpolation and deep learning-based techniques using 1x NVIDIA Quadro P1000 GPU.
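With PyTorch, GPU inference time and peak memory can be measured as in the sketch below. The model here is a trivial placeholder for GFPGAN or CodeFormer, and torch.cuda.synchronize is needed because GPU kernels run asynchronously.

import time
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1).cuda()  # placeholder for the actual SR model
x = torch.rand(1, 3, 64, 64).cuda()                 # placeholder 64x64 input tensor

torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    output = model(x)
torch.cuda.synchronize()

print(f"inference time: {time.perf_counter() - start:.4f} s")
print(f"max GPU memory: {torch.cuda.max_memory_allocated() / 1e6:.1f} MB")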

In addition to inference time and computational power requirements, interpolation techniques are also easier to execute as they have fewer library dependencies. The interpolation techniques demonstrated above only required the OpenCV library, whereas GFPGAN requires PyTorch>=1.7 and CodeFormer requires PyTorch>=1.7.1 on top of OpenCV.

The inverse speed-versus-quality relationship between interpolation methods and deep learning-based methods is typical of any AI solution. Rather than concluding that one is better than the other, note that both approaches offer compelling merits in practical applications. Interpolation methods can be used when a quick pass is needed to bring up the resolution of a large number of low-resolution images. On the other hand, deep learning-based techniques can be used when a specific low-resolution image is of interest and a high-fidelity super resolved image is required.

Ethical Considerations

Super Resolution can be a powerful tool for enhancing image quality and bringing clarity to image details. It is, however, important to be aware of biases that can arise from the use of such methods. As illustrated below, a pixelated image of Barack Obama can turn into someone who looks very different after going through Super Resolution.

Example of bias in super resolution models, where a pixelated image of Barack Obama was super resolved to output a white man. Source credits: https://www.theverge.com/21298762/face-depixelizer-ai-machine-learning-tool-pulse-stylegan-obama-bias

In general, model bias can result from biases in the data used to train the model. Hence, it is important to understand not just how the model works, but also what data it was trained on.

Who we are

Sense-making and Surveillance Centre of Expertise (S&S COE) is part of Singapore's HTX (Home Team Science and Technology Agency). We focus on in-house product development, applying some of the techniques discussed above and using Agile methodology to address unique use cases from our Home Team agencies. We modify and fine-tune models using our own curated datasets. To ensure that we offer relevant products and solutions to our users, we keep abreast of developments in Super Resolution techniques from the research and open-source community. Internal benchmarking of various state-of-the-art techniques, such as Real-ESRGAN, GFPGAN and CodeFormer as discussed above, is conducted for fine-tuning and deployment. We develop relevant software modules with a front-end UI to serve the AI models as a fully deployable software product. Our default deployment option is on the Government Commercial Cloud (GCC), either on Amazon Web Services or Microsoft Azure. We also have on-premise options to cater for situations where data cannot leave the users' environment.

If you are interested in this topic and want to find out more, feel free to reach out to me at ONG_Si_Ci@htx.gov.sg. Of course, if you have any comments and suggestions on what we can do better, let us know too!

Meanwhile, do follow us on Medium to stay updated on the projects that we are working on.
