Two-Pass Occlusion Culling

Milos Kruskonja
9 min read · Dec 24, 2022

Occlusion Culling is an optimization technique that improves performance by skipping the rendering of objects hidden behind other objects in the scene. There are various occlusion culling techniques, each with its own set of problems, such as visible popping, data authoring overhead and performance cost. A popular approach in modern real-time rendering is a GPU-driven technique called Hierarchical Z-Buffer (HZB) Occlusion Culling. However, this method has notable flaws of its own, which Two-Pass HZB Occlusion Culling aims to address.

HZB Occlusion Culling overview

An HZB is essentially a MIP chain generated by downsampling a depth buffer, where each texel of the next MIP level takes the min or max depth value of a set of 4 texels from the previous level. Whether the min or max is used depends on whether Reversed-Z is used: the HZB has to keep the farthest depth of each set for the test to stay conservative. Objects can then be culled against the HZB using their bounding volumes. Beyond occlusion culling, an HZB can be utilized in the rendering of Volumetric Fog, Screen Space Reflections and more.
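The downsampling step can be sketched on the CPU as follows. This is only an illustration under stated assumptions: reversed-Z is assumed (near = 1, far = 0), so the farthest depth is the minimum, and the names DepthLevel, downsampleReversedZ and buildHzb are hypothetical. Odd-sized levels fold in one extra row or column so that no source texel is dropped, which is one possible way to handle them, not the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One level of the MIP chain, stored row-major. Hypothetical helper type.
struct DepthLevel {
    int width = 0;
    int height = 0;
    std::vector<float> texels;

    // Clamped fetch, so odd-sized levels can safely read one texel past the 2x2 block.
    float at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Builds the next MIP level. Assumes reversed-Z, so the farthest depth
// of each footprint is its minimum value.
DepthLevel downsampleReversedZ(const DepthLevel& src) {
    assert(src.width > 0 && src.height > 0);
    DepthLevel dst;
    dst.width = std::max(1, src.width / 2);
    dst.height = std::max(1, src.height / 2);
    dst.texels.resize(static_cast<std::size_t>(dst.width) * dst.height);

    // For an odd source dimension, include a third texel so the last
    // column/row still contributes to the reduced level.
    const int extraX = src.width & 1;
    const int extraY = src.height & 1;
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            float farthest = src.at(2 * x, 2 * y);
            for (int dy = 0; dy < 2 + extraY; ++dy)
                for (int dx = 0; dx < 2 + extraX; ++dx)
                    farthest = std::min(farthest, src.at(2 * x + dx, 2 * y + dy));
            dst.texels[static_cast<std::size_t>(y) * dst.width + x] = farthest;
        }
    }
    return dst;
}

// Repeats the reduction until a 1x1 level is reached.
std::vector<DepthLevel> buildHzb(const DepthLevel& depthBuffer) {
    std::vector<DepthLevel> chain{depthBuffer};
    while (chain.back().width > 1 || chain.back().height > 1)
        chain.push_back(downsampleReversedZ(chain.back()));
    return chain;
}
```

With a standard (non-reversed) depth range, the same sketch would use std::max instead of std::min.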

Part of an HZB built from a depth buffer with reversed-Z
An example of a rendered frame that utilizes HZB occlusion culling

There are a few ways to optimize the process of building an HZB. Two such methods are Texture Gathering and Sampler Reduction Modes, both of which allow reducing the number of texture samples, resulting in improved performance.

Testing against an HZB is typically done in a compute shader using Axis Aligned Bounding Boxes (AABB). The result of this shader dispatch is usually a set of indirect draw call arguments representing only visible objects, which are then used to perform an Indirect Draw Call.

An HZB occlusion test is performed by first choosing the correct MIP level. The goal is to find the MIP level at which the screen-space bounds of the AABB cover 4 neighbouring texels (a 2x2 block). This is typically done with a logarithmic formula based on the screen-space side lengths of the AABB. MIP level selection is slightly more complex if the lengths of the HZB sides are not the same.

// MIP level selection example
int mipLevel = floor(log2(max(AABB.pixelWidth, AABB.pixelHeight)));

Once a suitable MIP level is chosen, the depth values of those 4 texels are compared to the depth of the object’s closest point to the camera. The result of this comparison determines whether the object is culled. After the occlusion test is finished for the entire scene, the objects that passed it can be rendered.
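The whole test can be sketched on the CPU as shown here. Several things are assumptions made for illustration: reversed-Z depth (larger means closer), an HZB that stores the farthest (minimum) depth per texel, and the hypothetical types HzbLevel and ScreenBounds. Instead of hard-coding four fetches, the sketch loops over every texel the bounds touch at the chosen level, which keeps it conservative when the rectangle straddles texel boundaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One HZB level, row-major; level 0 is the full-resolution depth buffer.
struct HzbLevel {
    int width;
    int height;
    std::vector<float> texels;
};

// Screen-space bounds of an object's AABB in level-0 pixels,
// plus the depth of its closest point to the camera.
struct ScreenBounds {
    float minX, minY, maxX, maxY;
    float closestDepth;
};

bool isOcclusionCulled(const std::vector<HzbLevel>& hzb, const ScreenBounds& bounds) {
    assert(!hzb.empty());
    const float boxWidth = bounds.maxX - bounds.minX;
    const float boxHeight = bounds.maxY - bounds.minY;

    // Same selection formula as in the text, clamped to the available levels.
    int mip = static_cast<int>(std::floor(std::log2(std::max({boxWidth, boxHeight, 1.0f}))));
    mip = std::min(mip, static_cast<int>(hzb.size()) - 1);
    const HzbLevel& level = hzb[static_cast<std::size_t>(mip)];

    // Map level-0 pixel coordinates to texel indices of the chosen level.
    const float scaleX = static_cast<float>(level.width) / static_cast<float>(hzb[0].width);
    const float scaleY = static_cast<float>(level.height) / static_cast<float>(hzb[0].height);
    auto toTexel = [](float pixel, float scale, int size) {
        const int texel = static_cast<int>(std::floor(pixel * scale));
        return std::min(std::max(texel, 0), size - 1);
    };
    const int x0 = toTexel(bounds.minX, scaleX, level.width);
    const int x1 = toTexel(bounds.maxX, scaleX, level.width);
    const int y0 = toTexel(bounds.minY, scaleY, level.height);
    const int y1 = toTexel(bounds.maxY, scaleY, level.height);

    // Farthest occluder depth over the covered texels (minimum under reversed-Z).
    float farthestOccluder = level.texels[static_cast<std::size_t>(y0) * level.width + x0];
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            farthestOccluder = std::min(
                farthestOccluder, level.texels[static_cast<std::size_t>(y) * level.width + x]);

    // Culled only if the object's closest point lies behind the farthest occluder.
    return bounds.closestDepth < farthestOccluder;
}
```

A GPU version would typically replace the loop with a small fixed number of texture fetches; the comparison itself stays the same.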

Problem

As already stated, a depth buffer is required to build an HZB. The question is how this depth buffer is obtained in the first place. One might presume that rendering all objects is necessary to fill the depth buffer, but this would defeat the purpose of culling in general.

Potential Solutions

One common approach involves rendering a small subset of objects, which would serve as occluders. These occluders are typically handpicked and authored by artists. They are usually large objects such as buildings, walls and terrain. The rendering of occluder objects is typically done only to the depth buffer, without fragment shader invocations, a process known as Depth Prepass. Using the resulting depth buffer enables the construction of an incomplete but conservative HZB. This HZB is then used in the occlusion culling process. Although effective, this technique demands a significant amount of manual effort to maintain its usefulness.

Depth prepass of foreground buildings from Assassin’s Creed Unity, captured with NSight

In many scenarios, it is reasonable to assume that objects visible in the previous frame will mostly remain visible in the current frame as well. One way to make use of this insight is to recycle the previous frame’s depth buffer in order to generate an approximation of the depth buffer for the current frame. This technique, known as Depth Buffer Reprojection, involves taking into account the new camera transform and the Velocity Vector Buffer from the previous frame. Depth reprojection can also be useful for building shadow map HZBs. A notable advantage of depth reprojection is that it reduces the need for manual occluder selection. However, combining depth reprojection with manual occluder selection can yield even better results. An example use case of this technique can be seen in Assassin’s Creed Unity (Ubisoft 2014).

Reprojected depth buffer of background building, NPCs and ground geometry composited on top of the previously rendered depth prepass from Assassin’s Creed Unity, captured with NSight
Fully rendered frame from Assassin’s Creed Unity, captured with NSight

Despite its numerous advantages, depth reprojection is not without significant limitations. One major problem is precision. Since the HZB is built from an approximated depth buffer, the culling process may become non-conservative in some cases. This can lead to issues like visible popping, where objects abruptly appear as they move relative to the camera.

Two-Pass Solution

Combining insights from the two previously mentioned solutions, it should be possible to achieve the best of both worlds: the precision of a depth prepass and the absence of data authoring from depth reprojection. The proposed approach resembles depth reprojection in that it reuses data from the previous frame. However, instead of reprojecting depth, only the visible objects from the previous frame are “reprojected”. Essentially, this involves rendering these previously visible objects in a kind of prepass at the beginning of the frame. These objects make excellent occluder candidates for the current frame, assuming no drastic camera movement. This prepass is referred to as the First Pass from now on. Unlike the depth prepass, the first pass usually stores more than just depth, depending on which rendering pipeline is used, for example G-Buffers in the case of Deferred Rendering. After the first pass, the remaining culling process closely resembles standard HZB occlusion culling. This technique is called Two-Pass Occlusion Culling, and it has been gaining popularity since its appearance in Unreal Engine 5’s Nanite.

As the name suggests, two-pass occlusion culling divides scene objects into two groups and renders each group in its own pass. Each pass consists of a single compute shader dispatch, whose purpose is to fill indirect draw call arguments, followed by an indirect draw call, preferably using Multi Draw Indirect Count if the hardware supports it. This technique fits the GPU-driven approach very well.
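To make the "fill indirect draw call arguments" step more concrete, here is a minimal CPU sketch of how surviving objects could be appended to a compact argument list with an atomic counter. The IndirectDrawArgs field layout mirrors Vulkan's VkDrawIndexedIndirectCommand; DrawList and appendDraw are hypothetical names, and std::atomic stands in for the atomic add a compute shader thread would perform on a GPU counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Field layout mirrors Vulkan's VkDrawIndexedIndirectCommand.
struct IndirectDrawArgs {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t vertexOffset;
    std::uint32_t firstInstance;
};

// Output of one culling dispatch: a tightly packed argument array plus
// the number of valid entries. Hypothetical helper type.
struct DrawList {
    std::vector<IndirectDrawArgs> args;  // sized for the worst case: one entry per object
    std::atomic<std::uint32_t> drawCount{0};

    explicit DrawList(std::size_t maxDraws) : args(maxDraws) {}
};

// What a single culling thread would do for an object that should be drawn:
// reserve the next free slot atomically, then write the arguments there.
void appendDraw(DrawList& list, const IndirectDrawArgs& draw) {
    const std::uint32_t slot = list.drawCount.fetch_add(1, std::memory_order_relaxed);
    assert(slot < list.args.size());
    list.args[slot] = draw;
}
```

With a count-based indirect draw (for example vkCmdDrawIndexedIndirectCount in Vulkan), the GPU reads the draw count from a buffer, so the CPU never needs to know how many objects survived culling.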

Two-Pass Occlusion Culling process visualized

First Pass

As previously stated, the first pass is only responsible for processing objects that were visible in the previous frame. To achieve this, a compute shader is dispatched with the same number of threads as the total number of objects in the scene. In addition, each thread can perform optional Frustum Culling and LOD Selection on objects that were previously visible, while skipping previously non-visible objects entirely. The result of this compute shader is stored in a GPU buffer as indirect draw call arguments.
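The optional frustum culling step is not spelled out in this article, so the following is only one common way it could look: an AABB tested against six frustum planes. The types Vec3, Plane and Aabb, the plane convention (points inside satisfy dot(normal, p) + d >= 0) and the parameter list are all assumptions made for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Plane in the form dot(normal, p) + d = 0, with the normal pointing into the frustum.
struct Plane { Vec3 normal; float d; };

// AABB stored as center and half extents.
struct Aabb { Vec3 center; Vec3 halfExtents; };

// Returns true only when the box lies entirely outside at least one plane.
bool isFrustumCulled(const Aabb& box, const std::array<Plane, 6>& frustum) {
    for (const Plane& plane : frustum) {
        // Radius of the box projected onto the plane normal.
        const float radius = box.halfExtents.x * std::fabs(plane.normal.x) +
                             box.halfExtents.y * std::fabs(plane.normal.y) +
                             box.halfExtents.z * std::fabs(plane.normal.z);
        // Signed distance from the box center to the plane.
        const float distance = plane.normal.x * box.center.x +
                               plane.normal.y * box.center.y +
                               plane.normal.z * box.center.z + plane.d;
        if (distance < -radius)
            return true;
    }
    return false;
}
```

The test is conservative: a box that intersects a plane is kept, so some boxes just outside a frustum corner may still pass.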

After the indirect draw call arguments are gathered, they are executed with an indirect draw call. To track the visibility of objects from the previous frame, another GPU buffer is used, known as the Visibility Buffer. Each element in this buffer corresponds to a single object in the scene, where 0 indicates that an object is not visible and 1 that an object is visible. The visibility buffer should be initialized with either 0s or 1s. Also, multiple visibility bits can be packed in a single buffer element.
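The bit-packing variant can be sketched like this, storing 32 visibility bits per 32-bit word. PackedVisibility is a hypothetical name and the code is a CPU illustration only; on the GPU, threads that share a word would need atomic operations to update their bits safely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Visibility buffer with one bit per object, 32 objects per word.
class PackedVisibility {
public:
    // Initialized with either all 0s or all 1s, as described above.
    explicit PackedVisibility(std::size_t objectCount, bool initiallyVisible = false)
        : words_((objectCount + 31) / 32, initiallyVisible ? 0xFFFFFFFFu : 0u) {}

    bool get(std::size_t objectIndex) const {
        return ((words_[objectIndex / 32] >> (objectIndex % 32)) & 1u) != 0;
    }

    void set(std::size_t objectIndex, bool visible) {
        const std::uint32_t mask = 1u << (objectIndex % 32);
        if (visible)
            words_[objectIndex / 32] |= mask;
        else
            words_[objectIndex / 32] &= ~mask;
    }

private:
    std::vector<std::uint32_t> words_;
};
```

Packing cuts the buffer size by a factor of 32 compared to one 32-bit element per object, at the cost of a shift and a mask on every access.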

...

// Read object's visibility from the previous frame
bool visible = visibilityBuffer[drawIndex];

// [Optional] Check if previously visible object
// is frustum culled in the current frame
if (visible)
{
    bool frustumCulled = isFrustumCulled(...);
    visible = visible && !frustumCulled;
}

// Only objects that were visible in the
// previous frame should be drawn in the first pass
bool shouldDraw = visible;

if (shouldDraw)
{
    // [Optional] Select LOD
    ...

    // Fill indirect draw call arguments
    IndirectDrawArgs drawArgs;
    ...

    // Store the arguments in the output buffer (called drawArgsBuffer
    // here so it doesn't clash with the local variable above)
    drawArgsBuffer[drawArgsIndex] = drawArgs;
}

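After the first pass is finished, an HZB can be generated from the resulting depth buffer. Since this depth buffer contains only geometry actually rendered in the current frame, the resulting HZB is completely conservative, unlike one built from the approximated depth buffer of the reprojection technique.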

Second Pass

In the second pass, once again, a compute shader is dispatched with the same number of threads as the total number of objects in the scene. However, this time, each thread performs occlusion culling in addition to potential frustum culling and LOD selection, regardless of whether the object was previously visible or not. The result of this dispatch is again a set of indirect draw call arguments, representing objects that were found to be visible in this pass but were not drawn in the first pass. These draw arguments are stored in a GPU buffer and later executed as an indirect draw call, similar to the first pass.

To prevent the redrawing of objects that were already drawn in the first pass, it is necessary to skip objects that had a visibility of 1 in the previous frame. Additionally, the visibility buffer should be updated for each object in preparation for the next frame, based on the results of frustum and occlusion culling, regardless of whether the object is drawn in the second pass or not.

...

// [Optional] Check if object is frustum culled in the current frame
bool frustumCulled = isFrustumCulled(...);

bool visible = !frustumCulled;

// Check if object is occlusion culled in the current frame
if (visible)
{
    bool occlusionCulled = isOcclusionCulled(...);
    visible = visible && !occlusionCulled;
}

// Only objects that are visible in the current frame
// and were not drawn in the first pass should be drawn in the second pass
bool shouldDraw = visible && !visibilityBuffer[drawIndex];

if (shouldDraw)
{
    // [Optional] Select LOD
    ...

    // Fill indirect draw call arguments
    IndirectDrawArgs drawArgs;
    ...

    // Store the arguments in the output buffer (called drawArgsBuffer
    // here so it doesn't clash with the local variable above)
    drawArgsBuffer[drawArgsIndex] = drawArgs;
}

// Fill visibility buffer for the next frame
visibilityBuffer[drawIndex] = visible;

Example

As an illustration, consider a scene featuring 5 static objects and a moving camera, and follow the two-pass occlusion culling process for a single frame.

Example test scene with 5 enumerated objects and a camera

In the first pass, all 5 objects are processed. If an object had a visibility of 0 in the previous frame, it is skipped. For objects that had a visibility of 1, optional frustum culling and LOD selection can be performed. Say that 3 objects were visible in the previous frame, but since the camera is moving to the left, one of them has left the camera’s frustum and is no longer visible. Assuming that frustum culling is enabled, only the two objects that are still visible are rendered and the third one is skipped.

Previous frame where the object with an index of 1 is about to leave the camera view
Visibility buffer from a previous frame during the first pass, where green indicates objects that are drawn in the first pass, which are still visible in the current frame

In the second pass, all 5 objects in the scene are processed again regardless of their previous visibility. Optional frustum culling, optional LOD selection and occlusion culling are then performed on all 5 objects. Say that, compared to the previous frame, one previously non-visible object enters the camera frustum and becomes visible in the current frame. This means that in this pass, only this additional object is drawn on top of the two objects already drawn in the first pass, resulting in a total of three rendered objects. The visibility buffer is also updated accordingly for the next frame.

Current frame where the object with an index of 0 has entered the view
Updated visibility buffer in the second pass, where green indicates objects that are drawn in the second pass and not in the first pass
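The example frame can be replayed with a small CPU simulation of both passes. This is a sketch under assumptions, not the actual GPU implementation: the frustum and occlusion results are supplied as plain inputs rather than computed, and the names FrameInput, FrameResult and runTwoPassFrame are hypothetical. Which objects besides index 0 and index 1 are visible is also an assumption made for the usage note.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-object culling results for the current frame (given as inputs here;
// a real implementation computes them in the culling shaders).
struct FrameInput {
    std::vector<bool> frustumCulled;
    std::vector<bool> occlusionCulled;  // result of testing against the first-pass HZB
};

// Indices of the objects drawn in each pass.
struct FrameResult {
    std::vector<std::size_t> firstPassDraws;
    std::vector<std::size_t> secondPassDraws;
};

// Runs both passes for one frame and updates the visibility buffer in place.
FrameResult runTwoPassFrame(std::vector<bool>& visibility, const FrameInput& input) {
    const std::size_t objectCount = visibility.size();
    assert(input.frustumCulled.size() == objectCount);
    assert(input.occlusionCulled.size() == objectCount);
    FrameResult result;

    // First pass: draw objects that were visible in the previous frame
    // and are still inside the frustum.
    for (std::size_t i = 0; i < objectCount; ++i)
        if (visibility[i] && !input.frustumCulled[i])
            result.firstPassDraws.push_back(i);

    // (The HZB would be built here from the first-pass depth buffer.)

    // Second pass: test every object, draw only the newly visible ones,
    // and store the visibility for the next frame.
    for (std::size_t i = 0; i < objectCount; ++i) {
        const bool visible = !input.frustumCulled[i] && !input.occlusionCulled[i];
        if (visible && !visibility[i])
            result.secondPassDraws.push_back(i);
        visibility[i] = visible;
    }
    return result;
}
```

Assuming objects 1, 2 and 3 were visible in the previous frame, object 1 leaves the frustum, object 0 enters it and object 4 stays occluded, the first pass draws objects 2 and 3, the second pass draws only object 0, and the visibility buffer ends up marking objects 0, 2 and 3 as visible.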

Conclusion

Two-pass occlusion culling is generally a very effective optimization technique, but it may show its limits when objects move very abruptly relative to the camera. Such movements are uncommon and mostly occur at cutscene transitions. When they do occur, they may impact performance for a single frame after the cut. One potential solution is to introduce an additional depth prepass.

This technique can also be used for meshlet and triangle occlusion culling, although triangle culling is usually not worth the effort. It can be used with Forward Rendering, Deferred Rendering, Deferred Materials, Visibility Buffering and more. For highly dense geometry, deferred materials and visibility buffering (not to be confused with the visibility buffer mentioned previously) are likely to work best. Considering that the majority of modern game engines are adopting the GPU-driven approach, it is reasonable to assume that most of them will incorporate some form of the proposed occlusion culling solution, given its performance gains and simplicity. An example implementation of two-pass occlusion culling can be found on my GitHub page.
