Scaling up Anime with Machine Learning and Smart Real Time Algorithms

Authored By Chris Kennedy, Nick Fujita & Michael Dale

Crunchyroll

May 28, 2020

The Crunchyroll catalog spans a massive amount of video content. Our published material dates back to when content was distributed on DVD or aired on standard-definition TV at 480P, all the way to current anime, which is almost never produced at a resolution higher than 1080P. As displays handle higher and higher resolutions and streaming platforms adapt to variable internet speeds, what does this mean for modern-day anime fans? If you’re using Crunchyroll on a 4K household television or 4K computer monitor, anime ends up being upscaled from its native resolution (again, a maximum of 1080P) up to your display’s resolution. This gets even more complicated if your internet bandwidth becomes constrained or if you want to download an episode for offline viewing at a smaller file size to save space. In both cases, a lower-resolution version of the original video file is accessed and then upscaled to your device’s resolution through video decoding and rendering.

This leaves us with a pretty fascinating challenge: how can we explore thoughtful up-scaling to improve the anime viewing experience? The anime fan and content encoding communities have already been hard at work exploring this question, and we at Crunchyroll have been paying attention. We wanted to explore some of the proposed solutions, namely pre-processing and real-time up-scaling. What follows are our initial findings.

What your TV/Device/Browser does today.

As we look to improve the situation, it’s important to understand the current process. The majority of up-scaling uses efficient generic algorithms like bilinear interpolation, essentially mathematical equations that estimate and fill in visual data to account for the difference between the source video’s resolution and the display’s (for more information on this and other related algorithms, check out the Wikipedia page). In most cases your browser or your mobile app will use bilinear or bicubic interpolation to upscale videos and images. These algorithms compute each output pixel as a weighted average of the adjacent source pixels.

A geometric visualization of bilinear interpolation. The product of the value at the desired point (black) and the entire area is equal to the sum of the products of the value at each corner and the partial area diagonally opposite the corner (corresponding colors). Source Wikipedia
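To make the weighted-average idea concrete, here is a toy pure-Python sketch of bilinear sampling and upscaling. This is an illustration of the math only, not the optimized routines that browsers and GPUs actually use:

```python
def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at fractional coordinates."""
    x0, y0 = int(x), int(y)                 # top-left neighboring pixel
    x1 = min(x0 + 1, len(image[0]) - 1)     # clamp at the right/bottom edges
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0                 # fractional offsets in [0, 1)
    # Each of the four neighbors is weighted by the area diagonally
    # opposite it, exactly as in the geometric visualization above.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def upscale(image, factor):
    """Naive bilinear upscale of a grayscale image by an integer factor."""
    h, w = len(image), len(image[0])
    out_h, out_w = h * factor, w * factor
    return [
        [bilinear_sample(image, x * (w - 1) / (out_w - 1),
                         y * (h - 1) / (out_h - 1))
         for x in range(out_w)]
        for y in range(out_h)
    ]

src = [[0, 100],
       [100, 200]]
big = upscale(src, 2)   # 2x2 source -> 4x4 output
```

Notice that every output value is just a blend of existing neighbors: the algorithm can smooth over the gaps, but it cannot invent detail that was never in the source.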

In 4K (and now 8K) TVs, there is a growing competitive landscape of “smart” scaling. Smart scaling techniques employ a real-time process where the source video image is cropped into overlapping segments or patches, which are in turn mapped to corresponding higher-resolution patches and projected to the target output size. Since these algorithms are proprietary and often unique to particular manufacturers or televisions, a detailed comparison of them and their subjective impact on the quality of the anime viewing experience is well beyond the scope of our initial exploration, but they could be at play in your viewing experience, depending on your display and its configuration.

In any case, depending on television capabilities does not scale well (excuse the pun) toward giving every user the best upscaled anime viewing experience possible on every device and platform.

We wanted to explore some practical approaches for Crunchyroll content, so we evaluated two techniques from the anime up-scaling community: upscaling as part of the encode, and smart real-time upscaling.

ML Upscale Encoding

What you should know first is that we pursued two different types of upscaling: 1. upscaling that occurs at the video file level, before the file is even accessed for streaming (which we will call the encode level), and 2. dynamic upscaling that occurs in real time as the file is being streamed. For upscaling at the encode level, we are using the software Waifu2x. Waifu2x is authored by GitHub user Nagadomi and inspired by SRCNN[1] (Super-Resolution Using Deep Convolutional Networks) techniques, i.e. applied machine learning for upscaling images. It leverages a Convolutional Neural Network (CNN) model trained on the kinds of detail that are commonly lost as content is scaled up, and it uses that model to predict and fill in those details as content is scaled to a resolution larger than its source.
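The overall shape of an SRCNN-style pipeline can be sketched in a few lines: first enlarge the image with a cheap interpolation, then pass it through a small stack of convolution layers whose trained weights restore the lost detail. The toy below, in pure Python with random weights standing in for a trained model, only illustrates that structure; Waifu2x’s actual networks are far deeper and are trained specifically on anime-style imagery:

```python
import random

def conv3x3(image, kernel):
    """Apply one 3x3 convolution (zero padding) followed by a ReLU."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    sy, sx = y + ky - 1, x + kx - 1
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += image[sy][sx] * kernel[ky][kx]
            out[y][x] = max(acc, 0.0)   # ReLU nonlinearity
    return out

def nearest_upscale(image, factor):
    """Cheap first-pass enlargement; the network then refines this estimate."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]

def toy_srcnn(image, layers):
    """Enlarge, then run the conv stack to 'restore' detail (SRCNN shape)."""
    x = nearest_upscale(image, 2)
    for kernel in layers:
        x = conv3x3(x, kernel)
    return x

random.seed(0)
layers = [[[random.uniform(-0.2, 0.2) for _ in range(3)] for _ in range(3)]
          for _ in range(3)]   # 3 untrained layers, purely illustrative
out = toy_srcnn([[0, 100], [100, 200]], layers)
```

The important difference from plain interpolation is in the weights: a trained model has learned, from many pairs of low- and high-resolution anime frames, which details to reintroduce rather than merely blending neighbors.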

Standard upscaled anime — image artifacting happens all the time as content is stretched to large format TVs.

Improved AI upscaled anime with Waifu2x, improves the look of anime on large high resolution displays.

One such example of catalog content with lower resolution source is the first episode of Naruto Shippuden, “Homecoming.” It was released in February of 2007, and Crunchyroll’s source video for this content is at 480P resolution.

A closer look at default scaling approaches (notice the tearing and loss of fidelity) when up-scaling the source video.

Waifu2x is able to retain quality and reduce encode / compression noise as the content is scaled.

Evaluation of upscale performance

For us to roll out these encodes to our fans (even in a limited experiment), it was important to obtain some objective measures of the quality gains from this technique. To test upscale performance we had to establish benchmarks with assets that already included high-resolution source videos. We downsampled a 720P source video to 360P to represent the available quality of some of our early catalog titles, and then scaled it back up to 720P. We then compared the original 720P video to the videos that had been enlarged from 360P, testing bilinear interpolation, bicubic interpolation, and Waifu2x. From these measurements, Waifu2x had significant gains in accurately representing the target material over traditional scaling techniques. An overview of the scripting used to generate this data can be viewed here.

Comparing commonly used up-scaling algorithms to Waifu2x. Peak signal-to-noise ratio (PSNR) measures the ratio between the maximum possible power of an image signal and the power of the corrupting noise that affects the image’s fidelity. Video Multimethod Assessment Fusion (VMAF) is a newer quality metric that may more accurately reflect subjective viewer quality experiences.
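Of the two metrics, PSNR is simple enough to compute directly: it compares the mean squared error between a reference frame and its upscaled counterpart against the maximum possible pixel value. (VMAF, by contrast, fuses several metrics through a trained model and is best computed with Netflix’s reference tooling.) A minimal pure-Python sketch:

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two same-sized grayscale frames."""
    pixels = [(r, t) for ref_row, test_row in zip(reference, test)
              for r, t in zip(ref_row, test_row)]
    mse = sum((r - t) ** 2 for r, t in pixels) / len(pixels)
    if mse == 0:
        return float("inf")   # identical frames: no corrupting noise at all
    return 10 * math.log10(max_value ** 2 / mse)

frame_a = [[10, 20], [30, 40]]   # reference frame
frame_b = [[12, 20], [30, 44]]   # upscaled frame with small errors
score = psnr(frame_a, frame_b)   # roughly 41 dB for these toy values
```

Higher scores mean the upscaled frame is closer to the reference, which is how the chart above ranks the three algorithms against the original 720P video.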

What about upscaling to 4K?

The changes in image resolution from 1080P to 2160P (4K) are much more difficult to notice unless you are using a very large display. When you already have very detailed 1080P source material, there is less opportunity to noticeably clean things up. In our comparison, Waifu2x and traditional scaling were often not noticeably different on a 4K display. With extreme zoom, some deltas between bicubic interpolation and Waifu2x start to become noticeable.

Traditional upscale of 1080P to 2160P

Waifu2x 1080P -> 2160P

Notice the smoother scaling on lines; but the shift is much less dramatic than with the 480P upscale, and only noticeable on a very large display. We may explore this in more depth in future efforts, but for current experiments on encode-level upscaling we focused on content with lower-resolution source material.

Where can I see these upscale encodes?

We are experimenting with this upscaling technique on a few pieces of our historically low-resolution content. Accessing this content on any Crunchyroll-supported platform will serve the higher-quality versions of these encodes.

Naruto Shippuden — Episode 1

Naruto Shippuden — Episode 33

Blue Submarine №6 — Episode 1

Real-time up-scaling

Waifu2x offers a solution at the video file level, but as noted above, we were also exploring real-time upscaling that happens as the file is being streamed (the client level). For client-level upscaling on the GPU, Anime4k is a leading solution. Anime4k was built by GitHub user bloc97 and is a “state of the art real-time anime upscale algorithm that can be implemented in arbitrary programming languages”. In the case of the web experience, a WebGL shader (shaders are computer programs traditionally used to shade 3D objects, but now used widely in video post-processing) supports real-time up-scaling within a web browser at resolutions up to 4K / 2160P. Interestingly, because Crunchyroll leverages soft subtitles (subtitles that are overlaid onto the video instead of being burned into the original video file), we leverage browser compositing to scale the video and subtitle displays separately where possible.

Our video player update for the web experience allows users to enable the Anime4k filter. The experience even includes some basic controls for the filter. Unlike the small Waifu2x sample, which was restricted to the source video files we encoded with the program, this real-time method can be applied to all content played with the Crunchyroll Velocity player when you have a GPU and a modern browser. Here are some screenshots of the experience:

Traditional upscale can be seen on the left, with the Anime4k upscale on the right. Notice some sharpening of edges which Anime4k is optimized for.

Default Scaling (magnified 2x.)

Anime4k Scaling (magnified 2x). Notice less blurring of the shadows and lines as the content is scaled up for large format presentation.

How to play with real-time upscaling?

The real-time upscale tools are available only on the web on select browsers where WebGL is available. The feature is very much still in the alpha phase, so please keep that in mind. You can see the selection under advanced quality controls for Premium Crunchyroll users of Chrome & Edge browsers. Here is the process:

Step 1: Load a video on Crunchyroll.com. Right click on the player and enable “Advanced Quality Controls”.

Step 2: Click on the gear icon in the lower right corner, select “Advanced Controls” then toggle Anime4k.

Step 3: You may now enjoy anime with the fine work of the Anime4k project. Some basic controls are provided if you want to adjust the default values.

Which approach is better?

The encode-level and client-level approaches are very different and address very different use cases. The ideal setup may leverage both as we look toward improving content scaling across the diversity of source files, streaming contexts, and bandwidth constraints.

Future work

We aim to elicit feedback from the community around scaling filters and encodes. Another important aspect of this work would be, of course, to bring these upscale capabilities (and 4K versions of our applications) to game consoles and UHD streaming devices connected to 4K televisions, where these techniques’ impact would be most apparent. The same goes for mobile, where bandwidth-constrained environments would make anime-specific client-side upscaling a clear improvement over the lossy traditional scaling techniques used today.

Also on our radar is how these upscaling strategies are used in the encode process itself. We would like to explore how knowledge of post-process filters could affect encoding trade-offs around where we spend our bytes to maximize visual quality. Developments in the AV1 codec (a video encoding format) ecosystem around film grain synthesis (a technique that lets compression preserve film grain without spending bits on it) are inspirational for this future exploration. See Film Grain Synthesis for AV1 Video Codec by Andrey Norkin and Neil Birkbeck for more.

We would like to, of course, send big thanks to the anime upscale community who have built these amazing tools. We look forward to investing more in this area as this exciting field continues to evolve.

[1] Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, “Image Super-Resolution Using Deep Convolutional Networks”, http://arxiv.org/abs/1501.00092
