Comparison of Stable Diffusion’s “HiRes fix” upscalers

In my search for the perfect image, I am investigating the effect of denoising strength on some HiRes fix upscalers

Fabian W.
4 min read · Apr 27, 2023

This is the second article on this topic. In my first article, I examined the effects of the denoising strength and hires steps parameters for the “latent” upscaler.

Before we start, let’s clarify what “HiRes fix” actually does:

  1. Generate an image based on the txt2img settings.
  2. Scale the image up to the desired size using the selected upscaler.
  3. Add noise to the upscaled image.
  4. Create a new upscaled image by denoising the image from step 3.

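Upscalers can be divided into two groups: latent and non-latent. “Latent” refers to the compressed latent representation that Stable Diffusion works on internally, before it is decoded into a visible image. Latent upscalers resize that low-resolution latent directly. Non-latent upscalers work on the decoded pixel image from step 1; the enlarged image is then encoded back into latent space for steps 3 and 4.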
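To make the two groups more concrete, here is a small sketch of the sizes involved for the settings used in this article (512x512, upscale by 2). It assumes a Stable Diffusion 1.x-style VAE with 8x spatial downsampling and 4 latent channels; the function names are made up for illustration.

```python
# Assumption: SD 1.x-style VAE (8x spatial downsampling, 4 latent channels).
VAE_DOWNSCALE = 8
LATENT_CHANNELS = 4


def latent_shape(width, height):
    """Return (channels, rows, cols) of the latent for a pixel image."""
    return (LATENT_CHANNELS, height // VAE_DOWNSCALE, width // VAE_DOWNSCALE)


def hires_shapes(width, height, scale):
    """Return pixel and latent sizes before and after the upscale step."""
    target_w, target_h = int(width * scale), int(height * scale)
    return {
        "base_pixels": (width, height),
        "base_latent": latent_shape(width, height),
        "target_pixels": (target_w, target_h),
        "target_latent": latent_shape(target_w, target_h),
    }


shapes = hires_shapes(512, 512, 2)
for name, shape in shapes.items():
    print(name, shape)
```

Under these assumptions, a latent upscaler resizes a 64x64 latent to 128x128, while a non-latent upscaler resizes the 512x512 pixel image to 1024x1024.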

For this article, the following image was used as a starting point:

Prompt: high quality, masterpiece, extremely detailed, intricate, landscape of a futuristic sci-fi city, neon lights, winter, snow, blizzard

Negative prompt: worst quality, low quality, comic, blurry, text, watermark

Steps: 20, sampler: DPM++ 2M Karras, CFG scale: 7, seed: 631447457, size: 512x512, model hash: 9aba26abdf, model: deliberate_v2

In total 176 images were created.

  • On the X-axis, the denoising strength varies over the range [0, 1] with a step size of 0.1 (11 values in total).
  • On the Y-axis, the 16 upscalers are listed. All images were created with 25 hires steps and an upscale factor of 2.

The following upscalers were tested (in this order):

  • Latent
  • Latent (antialiased)
  • Latent (bicubic)
  • Latent (bicubic antialiased)
  • Latent (nearest)
  • Latent (nearest-exact)
  • None
  • Lanczos
  • Nearest
  • 4x-UltraSharp
  • LDSR
  • R-ESRGAN 4x+
  • R-ESRGAN 4x+ Anime6B
  • ScuNET GAN
  • ScuNET PSNR
  • SwinIR 4x
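The test grid can be sketched in a few lines of Python. This is only an illustration, not the script used for this article: the field names follow the txt2img API of the AUTOMATIC1111 web UI as I understand it, the exact upscaler strings depend on the installation, and the plain “Latent” entry is assumed in order to reach the stated 16 upscalers.

```python
from itertools import product

# Upscaler names as written in the article; exact strings may differ per install.
UPSCALERS = [
    "Latent",  # assumption: included to reach the stated total of 16
    "Latent (antialiased)",
    "Latent (bicubic)",
    "Latent (bicubic antialiased)",
    "Latent (nearest)",
    "Latent (nearest-exact)",
    "None",
    "Lanczos",
    "Nearest",
    "4x-UltraSharp",
    "LDSR",
    "R-ESRGAN 4x+",
    "R-ESRGAN 4x+ Anime6B",
    "ScuNET GAN",
    "ScuNET PSNR",
    "SwinIR 4x",
]

# 0.0, 0.1, ..., 1.0 (rounded to avoid floating-point drift)
DENOISING_STRENGTHS = [round(i * 0.1, 1) for i in range(11)]

# Settings taken from the article; key names are assumed API field names.
BASE_PAYLOAD = {
    "prompt": "high quality, masterpiece, extremely detailed, intricate, "
              "landscape of a futuristic sci-fi city, neon lights, winter, "
              "snow, blizzard",
    "negative_prompt": "worst quality, low quality, comic, blurry, text, watermark",
    "steps": 20,
    "sampler_name": "DPM++ 2M Karras",
    "cfg_scale": 7,
    "seed": 631447457,
    "width": 512,
    "height": 512,
    "enable_hr": True,
    "hr_scale": 2,
    "hr_second_pass_steps": 25,
}


def build_grid():
    """Return one request payload per (upscaler, denoising strength) pair."""
    return [
        {**BASE_PAYLOAD, "hr_upscaler": upscaler, "denoising_strength": strength}
        for upscaler, strength in product(UPSCALERS, DENOISING_STRENGTHS)
    ]


grid = build_grid()
print(len(grid))  # 16 upscalers x 11 strengths = 176
```

Each payload could then be sent to a running web UI instance; the fixed seed keeps the first pass identical, so only the upscaler and the denoising strength differ between images.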

The 4x-UltraSharp upscaler is not automatically installed at the time of writing, but can be downloaded here.

A closer look leads to the following findings:

  • All upscalers tend to change the content of the image at a high denoising strength. The changes are not random: at a denoising strength of 1, the images look very similar regardless of the upscaler.
  • All latent upscalers except LDSR produce blurred images at denoising strengths in the range [0, 0.5]. LDSR does produce sharp images, but the images from the non-latent upscalers look better.
  • At a low denoising strength, all non-latent upscalers produce upscaled images that are very close in content to the original image. However, the image quality varies: at a denoising strength of 0.4, the Lanczos, Nearest and both ScuNET upscalers produce comparatively blurry images. Even at a denoising strength of 0.0, 4x-UltraSharp, both R-ESRGAN 4x+ variants and SwinIR 4x still produce good images.

In addition to image quality, generation time is another important factor. When generating the images, I saved the elapsed time together with the parameters. Let’s take a closer look at this:

The boxplot shows that, for almost all upscalers, generating an image took between 20 and 25 seconds. The exceptions are LDSR, at over 60 seconds per image, and SwinIR 4x, at just under 30 seconds.
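A per-upscaler summary like this can be computed from the saved timings with a few lines of Python. The sketch below assumes the timings are available as (upscaler, seconds) pairs; the function name and the sample records are placeholders, not the measurements from this article.

```python
from collections import defaultdict
from statistics import median


def median_time_per_upscaler(records):
    """Group (upscaler_name, elapsed_seconds) pairs and return the median per upscaler."""
    times = defaultdict(list)
    for name, seconds in records:
        times[name].append(seconds)
    return {name: median(values) for name, values in times.items()}


# Dummy values for illustration only, NOT the article's measurements.
example_records = [
    ("A", 1.0), ("A", 3.0), ("A", 2.0),
    ("B", 10.0), ("B", 20.0),
]

summary = median_time_per_upscaler(example_records)
for name, seconds in summary.items():
    print(f"{name}: {seconds:.1f} s")
```

The median is used here because it is the center line of a boxplot and is less sensitive to a single slow run than the mean.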

Conclusion

Okay, let’s derive recommendations from the previous findings:

  • All upscalers tend to change the content of the image at a high denoising strength. However, the point at which the content is drastically changed seems to differ slightly for each upscaler. At a denoising strength of 0.7, it is clear that the latent upscalers change the course of the road drastically, while the non-latent upscalers roughly maintain the course of the road.
  • Non-latent upscalers can be used to upscale an image faithfully using a low denoising strength.

Image quality is of course subjective, but I personally think that the images look best with the 4x-UltraSharp upscaler. In addition, this upscaler produces good results at any denoising strength, and its image generation time is not extremely high. At this point, I can’t see any reason to use any of the latent upscalers.

Fabian W.

Software developer with special interest in IoT, ML, data science and emerging technologies.