Frame Analysis — Minecraft RTX Beta
A review of the rendering architecture of the new Minecraft RTX renderer, covering its path tracer, its A-SVGF denoiser, and its Deep Learning Super Sampling 2.0 implementation.
Minecraft is a creative survival game where you mine for resources, build tools and farms, explore the world in search of valuable treasures, and build portals to neighboring worlds: the Nether and the End.
It’s a social game as well, where friends on different platforms can work together to build interesting structures, and it offers an intuitive interface for programming your own servers and mods.
Recently, Minecraft for Windows 10 received a renderer update that dramatically changed its design, adding real-time ray tracing, ray tracing denoising, and deep learning super sampling.
So let’s review how Minecraft RTX renders a single frame, referencing research papers where appropriate.
Note: This is not an official analysis of the Minecraft RTX renderer. This is a beta version of the renderer, so the design presented here may change in the final release. Please support Minecraft RTX by grabbing a copy of it on the Windows Store, thanks!
Ray Tracer
Prior to rendering, some preprocessing is done to cache Physically Based Rendering (PBR) material textures to a 2048x2048 lookup texture (LUT).
Each frame of Minecraft RTX begins by building bottom-level acceleration structures with BuildRaytracingAccelerationStructure(...) for any animated objects in the scene, such as animals or players.
The top-level acceleration structure is then built, and the renderer proceeds to DispatchRays for the primary rays and light shadows, writing to a variety of different outputs such as:
- Normals
- Albedo/Metalness
- Emissive/Roughness
- Opacity/Material ID
- Velocity
- Cached Irradiance
- Reprojected Primary Path Length
- Low Precision Position
- Geometric Normals
- Volumetric Light Ray Shadows
- View Position
- View Direction
- Throughput
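The start-of-frame ordering described above can be sketched as a small simulation. This is only an illustration of the sequence, not the renderer's code: `SceneObject`, `blas_version`, and `build_frame` are hypothetical names, and the assumption that static geometry keeps its previously built bottom-level structure is mine.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    animated: bool         # e.g. animals or players
    blas_version: int = 0  # bumped each time the bottom-level structure is rebuilt


def build_frame(objects, frame_log):
    """Record the order of acceleration-structure work for one frame."""
    # 1. Rebuild bottom-level structures only for animated geometry
    #    (assumption: static geometry reuses an earlier build).
    for obj in objects:
        if obj.animated:
            obj.blas_version += 1
            frame_log.append(f"build BLAS: {obj.name}")
    # 2. Build the top-level structure over every instance, static or not.
    frame_log.append(f"build TLAS: {len(objects)} instances")
    # 3. Trace primary rays and shadow rays into the outputs listed above.
    frame_log.append("dispatch rays: primary + shadows")
    return frame_log


objects = [
    SceneObject("terrain chunk", animated=False),
    SceneObject("pig", animated=True),
    SceneObject("player", animated=True),
]
log = build_frame(objects, [])
```

Here `log` holds the two bottom-level builds first, then the top-level build, then the dispatch, while the terrain chunk is left untouched.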
All passes are rendered twice, side by side on the x axis, at 742x835 each for a total of 1484x835, instead of at the output resolution of 2560x1440. This odd size appears to be just a quarter of the output resolution on X and half on Y, plus some padding. On the left is primary ray hit data; on the right is primary ray hit data through transmissive surfaces such as water.
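To make the layout concrete, here is a small worked example using only the sizes quoted above. The `view_rect` helper and the view indices are hypothetical names of my own; the point is just to show where each half lives and how many pixels are traced relative to the output.

```python
VIEW_W, VIEW_H = 742, 835   # one view, as seen in the capture
OUT_W, OUT_H = 2560, 1440   # output resolution


def view_rect(view_index):
    """Return (x0, y0, x1, y1) of a view inside the side-by-side target.

    view_index 0 = primary hits (left half),
    view_index 1 = hits through transmissive surfaces (right half).
    """
    x0 = view_index * VIEW_W
    return (x0, 0, x0 + VIEW_W, VIEW_H)


target_w, target_h = 2 * VIEW_W, VIEW_H
# Fraction of output pixels covered by both views together.
pixel_ratio = (target_w * target_h) / (OUT_W * OUT_H)
```

With these numbers the combined target is 1484x835, and `pixel_ratio` comes out at roughly a third of the output pixel count.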
For sampling, Minecraft uses a blue noise array of 128 RGBA8 images, each 256x256.
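A minimal sketch of how such an array could be addressed, assuming the noise is tiled across the screen and a different slice is picked each frame. That indexing scheme is my assumption, not something read out of the shaders, and both function names are hypothetical.

```python
NUM_SLICES = 128  # images in the blue noise array
SIZE = 256        # each image is SIZE x SIZE texels


def blue_noise_coords(x, y, frame):
    """Map a pixel and a frame index to (slice, u, v) in the noise array."""
    # Tile the texture over the screen and cycle through slices over time.
    return (frame % NUM_SLICES, x % SIZE, y % SIZE)


def texel_to_unit(texel):
    """Convert an RGBA8 texel (four ints in 0..255) to four floats in [0, 1)."""
    # The half-texel offset keeps the result strictly below 1.
    return tuple((c + 0.5) / 256.0 for c in texel)
```

For example, `blue_noise_coords(300, 10, 130)` wraps to slice 2 at texel (44, 10), and each RGBA8 texel yields four random numbers for the sampler.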
Non-primary rays are cast next, writing to:
- Indirect Diffuse
- Indirect Diffuse Chroma
Then another ray dispatch is done, writing to:
- Indirect Specular
- Reflection Distance
Volumetric fog is computed last.
A-SVGF Denoising
Minecraft uses Spatio-Temporal Filtering [Schied et al. 2019], which consists of a variance-driven spatio-temporal reprojection step that takes the moment buffers, history length, and previous frames and tries to place that data where it would be in the current frame.
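The role of the moment buffers and history length can be sketched per pixel as below. This is a simplified, SVGF-style accumulation with a fixed history cap; the adaptive part of A-SVGF, which adjusts the blend factor from temporal gradients, is left out. The function names and the `max_history` value are assumptions for illustration.

```python
def accumulate(history, sample, max_history=32):
    """Blend one new luminance sample into a pixel's reprojected history.

    history is (mean, second_moment, length), or None when reprojection
    failed (a disocclusion), in which case the history starts over.
    """
    if history is None:
        return (sample, sample * sample, 1)
    mean, moment2, length = history
    length = min(length + 1, max_history)
    alpha = 1.0 / length  # plain running average while history is short
    mean = mean + alpha * (sample - mean)
    moment2 = moment2 + alpha * (sample * sample - moment2)
    return (mean, moment2, length)


def variance(history):
    """Variance estimate from the first and second moments."""
    mean, moment2, _ = history
    return max(moment2 - mean * mean, 0.0)
```

Pixels with a short history or a high variance are the ones the later spatial filter needs to blur most aggressively.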
For more information on real-time ray tracing denoising with A-SVGF, check out my blog post on the subject.
This is followed by a bilateral filtering step in which, over the course of several passes, the frames are adaptively blurred. A form of irradiance caching is also used to help resolve reprojected regions with little history information more quickly.
- Diffuse Global Illumination w/ a 7x bilateral filter
- Reflections w/ a 3x bilateral filter
- Crepuscular Volumetric Rays with a 5x Guided bilateral filter
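As a rough illustration of what a bilateral filter does, here is a one-dimensional sketch that blurs a colour signal while using depth as the guide. It is a simplification under my own assumptions: a box kernel with only a depth edge-stopping term, with no normal or variance weights, and the `radius` and `sigma_depth` values are arbitrary rather than the filter sizes listed above.

```python
import math


def bilateral_1d(color, depth, radius=2, sigma_depth=0.1):
    """Edge-aware blur of `color`, using `depth` as the guide signal."""
    out = []
    for i in range(len(color)):
        total, weight_sum = 0.0, 0.0
        lo = max(0, i - radius)
        hi = min(len(color), i + radius + 1)
        for j in range(lo, hi):
            # Samples at a very different depth get a weight near zero,
            # so the blur does not bleed across geometric edges.
            w = math.exp(-abs(depth[i] - depth[j]) / sigma_depth)
            total += w * color[j]
            weight_sum += w
        out.append(total / weight_sum)  # weight_sum >= 1 (the centre sample)
    return out


noisy = [0.0, 0.2, 0.1, 1.0, 0.8, 0.9]
depth = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
smooth = bilateral_1d(noisy, depth)
```

The noise on each side of the depth discontinuity is averaged out, but the dark and bright regions stay separate.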
Finally, interleaved buffers are combined to produce the final output that will be fed to the DLSS 2.0 kernel.
Deep Learning Super Sampling
Deep Learning Super Sampling 2.0 (DLSS 2.0) uses an autoencoder that takes as input a jittered render target at a fraction of the output resolution and a jittered velocity buffer, similar to Temporal Anti-Aliasing, and outputs an upscaled version of the final image.
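The jitter itself is typically a low-discrepancy sub-pixel offset that changes every frame. Below is a sketch using a Halton (2, 3) sequence, which is a common choice for Temporal Anti-Aliasing; whether Minecraft RTX uses this exact sequence or period is an assumption on my part, and `jitter_offset` is a hypothetical helper.

```python
def halton(index, base):
    """Radical inverse of `index` (1-based) in the given base, in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def jitter_offset(frame, period=8):
    """Sub-pixel camera offset in [-0.5, 0.5) for the given frame."""
    i = (frame % period) + 1  # Halton is usually started at index 1
    return (halton(i, 2) - 0.5, halton(i, 3) - 0.5)
```

Each frame the projection is shifted by this offset, so that over `period` frames the samples cover different positions inside every pixel, which gives the upscaler more information to work with.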
UI Pass
UIs are rendered out at full resolution to a separate render target and composited at the end of the frame.
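Compositing a separate UI target onto the frame usually comes down to a standard "over" blend. A minimal per-pixel sketch, assuming the UI target stores straight (non-premultiplied) alpha, which I have not verified for this renderer:

```python
def composite_over(ui_rgba, scene_rgb):
    """Blend a straight-alpha UI pixel over an opaque scene pixel."""
    r, g, b, a = ui_rgba
    # Where alpha is 0 the scene shows through; where it is 1 the UI wins.
    return tuple(a * u + (1.0 - a) * s for u, s in zip((r, g, b), scene_rgb))
```

Keeping the UI out of the upscaled image this way means text and icons stay sharp at full resolution.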
Postprocessing
Minecraft finishes the frame with tone mapping and any enabled postprocessing effects such as vignette.
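For a sense of what these last steps do, here is a small sketch that applies a tone mapping curve followed by a vignette. The Reinhard operator and the quadratic vignette falloff are stand-ins I picked for illustration; I don't know which operator or falloff Minecraft RTX actually uses, and the order of the two steps is also an assumption.

```python
def reinhard(x):
    """Map an HDR value in [0, inf) into [0, 1)."""
    return x / (1.0 + x)


def vignette(u, v, strength=0.5):
    """Darkening factor for normalised screen coordinates u, v in [0, 1]."""
    dx, dy = u - 0.5, v - 0.5
    dist2 = (dx * dx + dy * dy) / 0.5  # 0 at the centre, 1 at the corners
    return 1.0 - strength * dist2


def post_process(hdr_rgb, u, v):
    """Tone map an HDR pixel, then darken it towards the screen edges."""
    factor = vignette(u, v)
    return tuple(reinhard(c) * factor for c in hdr_rgb)
```

At the centre of the screen only the tone mapping applies, while a corner pixel is also darkened by the full `strength`.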
Conclusion
Minecraft is a game that really benefits from ray tracing. With its dynamic environments and extreme variance in shadows and lighting, traversing cave systems and exploring structures feels much more immersive and interesting. If this was insightful, or if you have any suggestions for another game to analyse, let me know in the comments, and I’ll see you next time.
More Resources
- Digital Foundry’s Minecraft RTX developer interview provides a high-level overview of the RenderDragon DX12 RTX renderer as well as the challenges the development team faced.
- Peter Kristof of Microsoft made a really robust RTX Ambient Occlusion example that includes an implementation of SVGF.
- Microsoft’s DirectML Super Resolution Example, while not NVIDIA Deep Learning Super Sampling 2.0 (DLSS 2.0), is similar in that both perform upscaling.
- GTC 2020 — Creating Physically Based Materials for Minecraft with RTX