Real ESRGAN: Super-Resolution Model Enhanced for Denoising

David Cochard
Published in axinc-ai
4 min read · Jan 10, 2024

This is an introduction to Real ESRGAN, a machine learning model that can be used with ailia SDK. You can easily use this model, along with many other ready-to-use models from ailia MODELS, to build AI applications with ailia SDK.

Overview

Real ESRGAN is a super-resolution model that upscales images. It builds on ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks). It improves both the way the training data is generated and parts of the model architecture, which strengthens its denoising capabilities. When Real ESRGAN enlarges an image to a higher resolution, it also removes noise present in the original. The result is cleaner and sharper than what conventional upscaling methods produce.

Real ESRGAN results (Source: https://arxiv.org/pdf/2107.10833.pdf)

Architecture

When applying super-resolution to real-world images, the model must account for more than the downsampling caused by resizing. It must also handle the other degradations present in the original image.

These degradations include blur, noise (Gaussian, Poisson, color, and gray noise), and JPEG compression artifacts such as blocking and ringing.

The original ESRGAN assumed that these degradations occur only once. However, images on the internet often go through several rounds of resizing and compression. This produces stronger, compounded degradations that a single-pass model does not capture.
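The contrast between a single round of degradation and repeated rounds can be made concrete with a small synthetic pipeline. The NumPy sketch below is a simplified illustration, not the Real ESRGAN data pipeline. The helper names, kernel size, parameter ranges, box downsampling, and fixed ×2 resize per round are all our own assumptions. The `jpeg_like` function is a crude stand-in for a real JPEG codec: it uniformly quantizes 8×8 block DCT coefficients and does not use JPEG's quantization tables or entropy coding. Images are grayscale floats in [0, 1].

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Isotropic 2D Gaussian blur kernel, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def convolve2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2D convolution of a grayscale image with reflect padding."""
    kh, kw = kernel.shape
    assert kh % 2 == 1 and kw % 2 == 1, "odd kernel sizes only"
    padded = np.pad(img, ((kh // 2,) * 2, (kw // 2,) * 2), mode="reflect")
    flipped = kernel[::-1, ::-1]
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Box (area) downsampling; a stand-in for the resize step."""
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis (JPEG transforms 8x8 blocks with the DCT)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def jpeg_like(img: np.ndarray, step: float) -> np.ndarray:
    """Crude JPEG stand-in: uniform quantization of 8x8 block DCT coefficients."""
    h, w = img.shape
    x = np.pad(img, ((0, (-h) % 8), (0, (-w) % 8)), mode="edge")
    H, W = x.shape
    blocks = x.reshape(H // 8, 8, W // 8, 8).transpose(0, 2, 1, 3)
    c = dct_matrix(8)
    coeffs = np.round((c @ blocks @ c.T) / step) * step  # lossy step
    rec = (c.T @ coeffs @ c).transpose(0, 2, 1, 3).reshape(H, W)
    return rec[:h, :w]


def degrade_once(img: np.ndarray, rng: np.random.Generator, scale: int) -> np.ndarray:
    """One round: blur -> downsample -> noise -> JPEG-like compression."""
    img = convolve2d(img, gaussian_kernel(7, rng.uniform(0.2, 2.0)))
    img = downsample(img, scale)
    if rng.random() < 0.5:
        img = img + rng.normal(0.0, rng.uniform(0.01, 0.05), img.shape)  # Gaussian noise
    else:
        peak = rng.uniform(50.0, 255.0)
        img = rng.poisson(np.clip(img, 0.0, 1.0) * peak) / peak  # Poisson (shot) noise
    img = jpeg_like(np.clip(img, 0.0, 1.0), step=rng.uniform(0.05, 0.3))
    return np.clip(img, 0.0, 1.0)


def degrade(hr: np.ndarray, rng: np.random.Generator, order: int = 2,
            scale_per_round: int = 2) -> np.ndarray:
    """order=1: classical single-pass model; order=2: degradation applied twice."""
    img = hr.astype(np.float64)
    for _ in range(order):
        img = degrade_once(img, rng, scale_per_round)
    return img


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.random((64, 64))
    print(degrade(hr, rng, order=1).shape)  # one round: 64x64 -> 32x32
    print(degrade(hr, rng, order=2).shape)  # two rounds: 64x64 -> 16x16
```

Each round draws fresh random parameters, so the second pass compounds the first. Single-pass training data lacks exactly this effect.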

Real ESRGAN addresses this with a second-order degradation model. When synthesizing the low-resolution training inputs, it applies the degradation process twice. It then trains the model to restore the original image from these more heavily degraded inputs. Separately, a 2D sinc filter is applied to simulate the ringing artifacts that frequently appear near sharp edges, such as the outlines in hand-drawn images. The filter removes frequency components above a randomly chosen cutoff frequency, which produces these ringing artifacts.
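A 2D sinc filter is an ideal circular low-pass filter. Its standard kernel is k(i, j) = ω_c · J1(ω_c · r) / (2π · r), where r = sqrt(i² + j²), ω_c is the cutoff frequency, and J1 is the first-order Bessel function of the first kind. To our understanding this is the form used in the paper. NumPy has no J1, so the sketch below evaluates it from Bessel's integral. The kernel size of 21 and cutoff of π/3 are illustrative assumptions, and the function names are our own. Filtering a step edge then shows the over- and undershoot (ringing) that the filter is meant to simulate.

```python
import numpy as np


def bessel_j1(x, n: int = 1000) -> np.ndarray:
    """J1(x) = (1/pi) * integral_0^pi cos(t - x*sin(t)) dt, via the midpoint rule.

    The integrand is smooth and periodic, so the midpoint rule converges quickly.
    """
    x = np.asarray(x, dtype=np.float64)
    t = (np.arange(n) + 0.5) * np.pi / n
    return np.cos(t - x[..., None] * np.sin(t)).mean(axis=-1)


def sinc_kernel(cutoff: float, size: int = 21) -> np.ndarray:
    """Circularly symmetric 2D sinc kernel; cutoff in radians/pixel, 0 < cutoff <= pi."""
    assert size % 2 == 1 and 0.0 < cutoff <= np.pi
    half = size // 2
    ax = np.arange(-half, half + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    r = np.hypot(xx, yy)
    k = np.empty_like(r)
    nz = r > 0
    k[nz] = cutoff * bessel_j1(cutoff * r[nz]) / (2.0 * np.pi * r[nz])
    k[~nz] = cutoff ** 2 / (4.0 * np.pi)  # limit at r -> 0, since J1(z) ~ z / 2
    return k / k.sum()  # normalize the truncated kernel to unit DC gain


def filter2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2D convolution with reflect padding."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2,) * 2, (kw // 2,) * 2), mode="reflect")
    flipped = kernel[::-1, ::-1]
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


if __name__ == "__main__":
    kernel = sinc_kernel(cutoff=np.pi / 3)
    edge = np.zeros((40, 40))
    edge[:, 20:] = 1.0  # vertical step edge
    ringed = filter2d(edge, kernel)
    # Values outside [0, 1] next to the edge are the ringing artifacts.
    print(ringed.min(), ringed.max())
```

In a training pipeline the cutoff would be drawn at random for each sample. That randomness gives the model ringing of varying strength and spacing to learn to remove.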

Procedure for training dataset creation (Source: https://arxiv.org/pdf/2107.10833.pdf)

The model architecture of Real ESRGAN builds on ESRGAN. The generator keeps ESRGAN's backbone of Residual-in-Residual Dense Blocks (RRDB). The discriminator changes: ESRGAN's VGG-style discriminator is replaced with a U-Net discriminator that has skip connections, and spectral normalization is applied to stabilize training. Because the U-Net discriminator outputs a realness value for each pixel, it gives the generator detailed local feedback. This helps the model handle more complex and varied degradations and contributes to higher-quality super-resolution results.
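Spectral normalization divides each weight matrix by its largest singular value (its spectral norm). This bounds how sharply a layer can amplify its input, which in turn stabilizes discriminator training. The usual way to estimate that singular value cheaply is power iteration. Below is a minimal NumPy sketch. The function name and defaults are our own, not code from Real ESRGAN or ailia SDK. PyTorch provides the same technique as `torch.nn.utils.spectral_norm`. Flattening convolution weights to (out_channels, in_channels·kh·kw) follows common practice.

```python
import numpy as np


def spectral_normalize(weight: np.ndarray, u=None, n_iters: int = 1, eps: float = 1e-12):
    """Return (weight / sigma, u, sigma), with sigma estimated by power iteration.

    `u` is the persistent left singular-vector estimate. Reusing it across
    training steps is why a single iteration per step is usually enough.
    """
    assert n_iters >= 1
    w = weight.reshape(weight.shape[0], -1)  # conv (out, in, kh, kw) -> (out, in*kh*kw)
    if u is None:
        u = np.random.default_rng(0).standard_normal(w.shape[0])
    u = u / (np.linalg.norm(u) + eps)
    for _ in range(n_iters):
        v = w.T @ u                      # right singular-vector estimate
        v = v / (np.linalg.norm(v) + eps)
        u = w @ v                        # left singular-vector estimate
        u = u / (np.linalg.norm(u) + eps)
    sigma = float(u @ w @ v)             # estimate of the largest singular value
    return weight / sigma, u, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.standard_normal((16, 8, 3, 3))  # a conv layer's weights
    w_sn, u, sigma = spectral_normalize(w, n_iters=500)
    print(sigma, np.linalg.norm(w.reshape(16, -1), 2))  # estimate vs. exact value
```

After normalization, the layer's spectral norm is approximately 1.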

Real ESRGAN architecture (Source: https://arxiv.org/pdf/2107.10833.pdf)

As the source of high-resolution ground-truth images, Real ESRGAN uses the DIV2K, Flickr2K, and OutdoorSceneTraining (OST) datasets. Training patches are 256×256 pixels.
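Training on fixed-size patches means cropping the high-resolution (HR) image and the matching region of its low-resolution (LR) counterpart. This holds whether the LR image is precomputed or synthesized on the fly, and the two crops must stay pixel-aligned. The sketch below assumes a ×4 scale and an LR image exactly 1/4 the size of the HR image. The function name and parameters are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def paired_random_crop(hr: np.ndarray, lr: np.ndarray, hr_patch: int = 256,
                       scale: int = 4,
                       rng: Optional[np.random.Generator] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Crop an hr_patch x hr_patch HR patch and the LR patch covering the same area."""
    if hr_patch % scale != 0:
        raise ValueError("hr_patch must be divisible by scale")
    rng = np.random.default_rng() if rng is None else rng
    lr_patch = hr_patch // scale
    h, w = lr.shape[:2]
    if hr.shape[0] != h * scale or hr.shape[1] != w * scale:
        raise ValueError("HR size must be exactly scale x LR size")
    if h < lr_patch or w < lr_patch:
        raise ValueError("image is smaller than the requested patch")
    # Choose the corner on the LR grid so the HR corner lands on a multiple of scale.
    top = int(rng.integers(0, h - lr_patch + 1))
    left = int(rng.integers(0, w - lr_patch + 1))
    lr_crop = lr[top:top + lr_patch, left:left + lr_patch]
    hr_crop = hr[top * scale:(top + lr_patch) * scale,
                 left * scale:(left + lr_patch) * scale]
    return hr_crop, lr_crop


if __name__ == "__main__":
    hr = np.random.default_rng(0).random((400, 480, 3))
    lr = hr[::4, ::4]  # toy LR image, only for checking alignment
    hr_crop, lr_crop = paired_random_crop(hr, lr)
    print(hr_crop.shape, lr_crop.shape)  # (256, 256, 3) (64, 64, 3)
```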

During training, a combination of L1 loss, perceptual loss, and GAN loss is used. The L1 loss keeps the output faithful to the ground truth at the pixel level. The perceptual loss makes the result look convincing to a human viewer. The GAN loss encourages realistic textures and fine detail. Balancing these three objectives is key to Real ESRGAN's ability to produce high-quality super-resolution images.
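The combined generator objective is a weighted sum of the three terms. The sketch below computes it with NumPy and makes several assumptions. Perceptual features (for example from a pretrained VGG network) are assumed to be extracted elsewhere and passed in as arrays. The GAN term uses the standard non-saturating binary cross-entropy form. The default weights of 1, 1, and 0.1 are illustrative assumptions rather than a confirmed training configuration. All function names are hypothetical.

```python
import numpy as np


def l1_loss(sr: np.ndarray, hr: np.ndarray) -> float:
    """Pixel-wise mean absolute error: fidelity to the ground truth."""
    return float(np.mean(np.abs(sr - hr)))


def perceptual_loss(sr_feats, hr_feats, layer_weights=None) -> float:
    """Weighted L1 distance between feature maps of a pretrained network.

    Feature extraction is assumed to happen elsewhere; pass lists of arrays.
    """
    if layer_weights is None:
        layer_weights = [1.0] * len(sr_feats)
    return float(sum(w * np.mean(np.abs(a - b))
                     for w, a, b in zip(layer_weights, sr_feats, hr_feats)))


def generator_gan_loss(fake_logits: np.ndarray) -> float:
    """Non-saturating GAN loss: mean of -log(sigmoid(D(G(x)))).

    -log(sigmoid(z)) = log(1 + exp(-z)), computed stably with np.logaddexp.
    """
    return float(np.mean(np.logaddexp(0.0, -fake_logits)))


def generator_loss(sr, hr, sr_feats, hr_feats, fake_logits,
                   w_pix: float = 1.0, w_percep: float = 1.0, w_gan: float = 0.1) -> float:
    """Weighted sum of the three training objectives."""
    return (w_pix * l1_loss(sr, hr)
            + w_percep * perceptual_loss(sr_feats, hr_feats)
            + w_gan * generator_gan_loss(fake_logits))
```

The small GAN weight reflects the balance described above. The adversarial term adds texture without overriding the fidelity enforced by the L1 and perceptual terms.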

Output Image Quality

The following comparison shows the original ESRGAN next to Real ESRGAN. ESRGAN tends to keep the ringing artifacts around contours that are present in the input image. Real ESRGAN removes most of them, producing noticeably sharper and cleaner output.

Benchmark (Source: https://arxiv.org/pdf/2107.10833.pdf)

Usage with ailia SDK

You can run Real ESRGAN with ailia SDK using the following command.

$ python3 real_esrgan.py -i input_anime.jpg -s output.jpg

By default, the script uses a model trained for real-world images. An anime-specific model is also available and can be selected with the -m option.

$ python3 real_esrgan.py -m RealESRGAN_anime -i input_anime.jpg -s output_anime.jpg
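To process a whole folder, one option is to call the script once per image from Python. The sketch below reuses only the -m, -i, and -s options shown above. Several details are our own assumptions: the helper names, the `_sr` output suffix, the `*.jpg` pattern, and real_esrgan.py being in the current working directory.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(input_path, output_path, model: Optional[str] = None,
                  script: str = "real_esrgan.py") -> List[str]:
    """Assemble the command line shown above for a single image."""
    cmd = ["python3", script]
    if model is not None:
        cmd += ["-m", model]  # e.g. "RealESRGAN_anime"
    cmd += ["-i", str(input_path), "-s", str(output_path)]
    return cmd


def upscale_folder(input_dir, output_dir, model: Optional[str] = None,
                   pattern: str = "*.jpg", dry_run: bool = False) -> List[List[str]]:
    """Run the script once per matching image and return the commands used."""
    out_dir = Path(output_dir)
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for src in sorted(Path(input_dir).glob(pattern)):
        cmd = build_command(src, out_dir / f"{src.stem}_sr{src.suffix}", model)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Calling `upscale_folder("photos", "photos_sr", dry_run=True)` returns the commands without running them, which is a quick way to check the file mapping first.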

Usage with Unity or C#

You can run Real ESRGAN in Unity using the following sample.

You can also run it in vanilla C# in Visual Studio.

ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.

Engineer with 10+ years in game engines & multiplayer backend development. Now focused on machine learning, computer vision, graphics and AR