Frame Analysis — Minecraft RTX Beta

Published in The Startup · May 22, 2020

A review of the rendering architecture of the new Minecraft RTX renderer, covering its path tracer, its implementation of A-SVGF, and its use of Deep Learning Super Sampling 2.0.

Minecraft is a creative survival game where you mine for resources, build tools and farms, explore the world in search of valuable treasures, and build portals to neighboring worlds: the Nether and the End.

It’s a social game as well, where friends on different platforms can join in and work together to build interesting structures, and it offers an intuitive interface for programming your own servers and mods.

Recently, Minecraft for Windows 10 received an update to its renderer that dramatically changes its design, adding real-time ray tracing, ray tracing denoising, and deep learning super sampling.

So let’s review how Minecraft RTX renders a single frame, referencing research papers where appropriate.

Note: This is not an official analysis of the Minecraft RTX renderer. This is a beta version of the renderer, so the design presented here may change in the final release. Please support Minecraft RTX by grabbing a copy of it on the Windows Store, thanks!

Ray Tracer

Prior to rendering, some preprocessing is done to cache Physically Based Rendering (PBR) material textures into a 2048x2048 lookup texture (LUT).

Each frame of Minecraft RTX begins by building bottom level acceleration structures with BuildRaytracingAccelerationStructure(...) for any animated objects in the scene, such as animals or players.
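
The split between geometry that is rebuilt every frame and geometry that is left alone can be sketched as follows. This is a minimal Python illustration, not D3D12 code: `SceneObject`, `plan_blas_builds`, and the `animated` flag are hypothetical names, and the idea that static geometry simply keeps its previously built structure is an assumption rather than something confirmed from the capture.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    animated: bool  # hypothetical flag: True for meshes that deform or move


def plan_blas_builds(objects):
    """Return the names of objects whose bottom level structure is rebuilt this frame."""
    return [obj.name for obj in objects if obj.animated]


scene = [
    SceneObject("terrain_chunk", animated=False),
    SceneObject("pig", animated=True),
    SceneObject("player", animated=True),
]

# Only the animated objects are scheduled for a rebuild.
rebuild = plan_blas_builds(scene)
```

In a real renderer each name in `rebuild` would correspond to one acceleration structure build recorded on the command list before the top level structure is built.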

The top level acceleration structure is then built, and a DispatchRays call traces the primary rays and light shadow rays, writing to a variety of outputs such as:

Normals
Albedo/Metalness
Emissive/Roughness
Opacity/Material ID
Velocity
Cached Irradiance
Reprojected Primary Path Length
Low Precision Position
Geometric Normals
Volumetric Light Ray Shadows
View Position
View Direction
Throughput

All passes are rendered twice, side by side on the x axis, at 742x835 each for a total of 1484x835 instead of the 2560x1440 output resolution. This odd size appears to be just a quarter of the output resolution on X and half on Y, plus some padding. The left copy holds primary ray hit data; the right copy holds primary ray hit data seen through transmissive surfaces such as water.
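
The arithmetic behind that guess can be checked directly. The short Python sketch below only uses the sizes quoted above; the variable names are mine. It computes how much padding the "quarter on X, half on Y" reading would imply, and, as a side observation that is not confirmed as the renderer's intent, how the combined target compares to the output resolution on each axis.

```python
OUTPUT_W, OUTPUT_H = 2560, 1440  # output resolution of the capture
PASS_W, PASS_H = 742, 835        # size of one copy of each pass

combined_w = 2 * PASS_W          # two copies side by side on the x axis

# Padding implied by "a quarter on X and half on Y of the output resolution".
quarter_w = OUTPUT_W // 4
half_h = OUTPUT_H // 2
pad_x = PASS_W - quarter_w
pad_y = PASS_H - half_h

# Ratio of the combined render target to the output resolution, per axis.
scale_x = combined_w / OUTPUT_W
scale_y = PASS_H / OUTPUT_H
```

Under that reading the padding would be 102 pixels on X and 115 on Y. The combined 1484x835 target also works out to roughly 58% of the output resolution on both axes, which may or may not be a coincidence.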

For sampling, Minecraft uses an array of 128 blue noise textures, each a 256x256 RGBA8 image.
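
How a shader might pick a value out of such an array can be sketched as below. The indexing scheme (cycle through the layers by frame number, tile each texture across the screen) is a common approach and an assumption here; the capture does not show how Minecraft RTX actually indexes the array. `blue_noise_coords` is a hypothetical helper name.

```python
NUM_LAYERS = 128  # textures in the array
TILE = 256        # each texture is 256x256


def blue_noise_coords(pixel_x, pixel_y, frame_index):
    """Map a pixel and a frame number to (layer, u, v) texel coordinates."""
    layer = frame_index % NUM_LAYERS  # a different layer each frame, wrapping around
    return layer, pixel_x % TILE, pixel_y % TILE  # tile the texture over the screen


coords = blue_noise_coords(300, 10, 130)
```

Cycling layers over time means each pixel sees a different noise value every frame, which helps the temporal filter average the noise away.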

More rays are then cast for the non-primary bounces, writing to:

Indirect Diffuse
Indirect Diffuse Chroma

Then another ray dispatch is done, writing to:

Indirect Specular
Reflection Distance

Volumetric fog is computed last.

A-SVGF Denoising

Minecraft uses Spatio-Temporal Filtering [Schied et al. 2019]. It starts with a variance-driven spatio-temporal reprojection step, which takes the moment buffers, history length, and previous frames and tries to place that data where it would be in the current frame.
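
The role of the moment buffers and the history length can be illustrated with a single-pixel accumulation step. This is a minimal sketch of SVGF-style temporal accumulation, assuming the history has already been reprojected to the current pixel. A-SVGF additionally adapts the blend factor from temporal gradients, which is left out here, and `accumulate` and `max_len` are hypothetical names and parameters rather than anything taken from the game.

```python
def accumulate(history_mean, history_m2, history_len, sample, max_len=32):
    """Blend one new sample into reprojected history.

    Returns the new mean, second moment, history length, and variance estimate.
    """
    n = min(history_len + 1, max_len)
    alpha = 1.0 / n  # short history: trust the new sample more
    mean = history_mean + alpha * (sample - history_mean)
    m2 = history_m2 + alpha * (sample * sample - history_m2)
    variance = max(m2 - mean * mean, 0.0)  # Var[x] = E[x^2] - E[x]^2
    return mean, m2, n, variance


# A pixel with no history receives two noisy samples in consecutive frames.
mean, m2, n = 0.0, 0.0, 0
for s in [1.0, 3.0]:
    mean, m2, n, var = accumulate(mean, m2, n, s)
```

The variance estimate is what lets the later spatial filter blur noisy pixels aggressively while leaving converged pixels mostly untouched.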

For more information on real-time ray tracing denoising with A-SVGF, check out my blog post on the subject.

This is followed by a bilateral filtering step in which, over the course of several passes, the frames are adaptively blurred. A form of irradiance caching is also used to help resolve reprojected regions that have little history information more quickly.

Diffuse Global Illumination w/ a 7x bilateral filter
Reflections w/ a 3x bilateral filter
Crepuscular Volumetric Rays with a 5x Guided bilateral filter
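
The idea behind a bilateral filter, blurring while refusing to blend across geometric edges, can be shown in one dimension. The sketch below uses depth as the only edge-stopping term; real denoisers typically also weight by normals and variance. The function name, `radius`, and `sigma_depth` are illustrative assumptions and are not taken from the game's shaders.

```python
import math


def bilateral_1d(colors, depths, radius=1, sigma_depth=0.1):
    """Blur colors, down-weighting neighbours whose depth differs from the centre."""
    out = []
    for i in range(len(colors)):
        total, weight_sum = 0.0, 0.0
        lo, hi = max(0, i - radius), min(len(colors), i + radius + 1)
        for j in range(lo, hi):
            # Weight falls off exponentially with the depth difference.
            w = math.exp(-abs(depths[j] - depths[i]) / sigma_depth)
            total += w * colors[j]
            weight_sum += w
        out.append(total / weight_sum)
    return out


colors = [0.0, 0.0, 1.0, 1.0]
depths = [1.0, 1.0, 5.0, 5.0]  # depth discontinuity between index 1 and 2
filtered = bilateral_1d(colors, depths)
```

Because the depth jump drives the cross-edge weight to nearly zero, the dark and bright sides stay separated instead of smearing into each other.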

Finally, interleaved buffers are combined to produce the final output that will be fed to the DLSS 2.0 kernel.

Deep Learning Super Sampling

Deep Learning Super Sampling 2.0 (DLSS 2.0) uses an autoencoder that takes as input a jittered render target at a fraction of the output resolution along with a jittered velocity buffer, similar to the inputs of Temporal Anti-Aliasing, and outputs an upscaled version of the final output.
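
The jitter mentioned above is a small sub-pixel camera offset that changes every frame so that successive frames sample different positions inside each pixel. A common way to generate such offsets for TAA-style techniques is a Halton (2, 3) sequence. Whether Minecraft RTX uses this exact sequence is an assumption, and `sequence_length` is a hypothetical parameter.

```python
def halton(index, base):
    """Radical inverse of a positive integer index in the given base, in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def jitter_offset(frame_index, sequence_length=8):
    """Sub-pixel offset in [-0.5, 0.5) on each axis for this frame."""
    i = (frame_index % sequence_length) + 1  # Halton is usually started at index 1
    return halton(i, 2) - 0.5, halton(i, 3) - 0.5


offsets = [jitter_offset(f) for f in range(8)]
```

The offsets repeat after `sequence_length` frames, and the same offset has to be known to the upscaler so it can line the jittered samples up again.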

UI Pass

The UI is rendered at full resolution to a separate render target and composited at the end of the frame.
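
Compositing a separate UI target over the scene usually comes down to the standard "over" operator. The sketch below assumes straight (non-premultiplied) alpha in the UI target, which is an assumption about the data layout and not something visible in the capture; `composite_over` is a hypothetical name.

```python
def composite_over(ui_rgba, scene_rgb):
    """Blend a straight-alpha UI pixel over an opaque scene pixel."""
    r, g, b, a = ui_rgba
    return tuple(a * u + (1.0 - a) * s for u, s in zip((r, g, b), scene_rgb))


# A half-transparent white UI pixel over a black scene pixel.
pixel = composite_over((1.0, 1.0, 1.0, 0.5), (0.0, 0.0, 0.0))
```

Keeping the UI in its own full resolution target means text and icons stay sharp regardless of the resolution the 3D scene was traced at.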

Postprocessing

Minecraft finishes the frame with tone mapping and any enabled postprocessing effects such as vignette.
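
As a rough picture of what this last step does per pixel, the sketch below applies a tone mapping curve followed by a vignette. The Reinhard operator and the vignette falloff used here are stand-ins chosen for simplicity; the actual operator and parameters Minecraft RTX uses are not known from the capture, and all names are hypothetical.

```python
def reinhard(luminance):
    """Map HDR luminance in [0, inf) to the displayable range [0, 1)."""
    return luminance / (1.0 + luminance)


def vignette(u, v, strength=0.5):
    """Darkening factor based on distance to the screen centre; u, v in [0, 1]."""
    dx, dy = u - 0.5, v - 0.5
    # Squared distance is 0 at the centre and 0.5 at the corners.
    return 1.0 - strength * min(1.0, (dx * dx + dy * dy) / 0.5)


def finish_pixel(luminance, u, v):
    """Tone map first, then darken towards the screen edges."""
    return reinhard(luminance) * vignette(u, v)


centre = finish_pixel(1.0, 0.5, 0.5)
corner = finish_pixel(1.0, 0.0, 0.0)
```

With these settings a pixel of the same HDR brightness ends up half as bright in the corner as in the centre of the screen.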

Conclusion

Minecraft is a game that really benefits from ray tracing. With its dynamic environments and extreme variance in shadows and lighting, traversing cave systems and exploring structures feels much more immersive and interesting. If this was insightful, or if you have any suggestions for another game to analyse, let me know in the comments, and I’ll see you next time.

More Resources

Digital Foundry’s Minecraft RTX developer interview provides a high-level overview of the RenderDragon DX12 RTX renderer as well as the challenges the development team faced.
Peter Kristof of Microsoft made a really robust RTX Ambient Occlusion example that includes an implementation of SVGF.
Microsoft’s DirectML Super Resolution Example, while not NVIDIA Deep Learning Super Sampling 2.0 (DLSS 2.0), is similar in that both perform upscaling.
GTC 2020 — Creating Physically Based Materials for Minecraft with RTX