GPU Particle Research — Bevy Hanabi, Part 2

Sou1gh0st
Mar 22, 2024 · 4 min read


Introduction

In our previous article, we explored the fundamental concepts and key features of Bevy Hanabi that make it a high-performance GPU particle simulator. Building on that foundation, we will examine the pipeline design of Bevy Hanabi in more detail.

A Bird’s-Eye View of the Compute Pipeline

We can use RenderDoc to capture a frame from Bevy, which gives the following result:

From here we can see three compute passes: the Init, Indirect, and Update passes, which we discussed in the previous article. Let's revisit them with RenderDoc.

Init Pass

We can see that the dispatch configuration of the Init Pass is (40, 1, 1). In our firework example, the spawner bursts 2500 particles every 2 seconds:

let effect = EffectAsset::new(
    32768,
    Spawner::burst(2500.0.into(), 2.0.into()),
    writer.finish(),
)
.with_name("firework")
// ...

Each workgroup has 64 threads, so the workgroup count is calculated from the spawn count of each frame. In the first frame the spawner bursts 2500 particles, so the spawn count is 2500 and the workgroup count of the Init Pass is 40:

const WORKGROUP_SIZE: u32 = 64;
let workgroup_count = (spawn_count + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE;
// (2500 + 64 - 1) / 64 = 40

The Init Pass uses 4 buffers:

  • particle_buffer: stores the properties of the particles.
  • indirect_buffer: stores the indices of the alive and dead areas.
  • spawner: stores the transform of the spawner, the spawn count, the random seed, etc. (a simplified sketch follows this list).
  • render_indirect: stores the data used across the whole compute pipeline.
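
As a rough illustration of what the spawner binding carries for a single effect, here is a simplified sketch. The struct name, field names, and layout are illustrative only, not Hanabi's actual GPU-side definition.

// Illustrative sketch only: a hypothetical, simplified view of the per-effect
// spawner data. Bevy Hanabi's real GPU struct has a different layout.
#[repr(C)]
#[derive(Clone, Copy)]
struct SpawnerParamsSketch {
    // Spawner transform, e.g. a 3x4 affine matrix stored as 12 floats.
    transform: [f32; 12],
    // Number of particles to spawn this frame (2500 on a burst frame).
    spawn_count: i32,
    // Seed for the GPU-side random number generator.
    seed: u32,
}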

This capture also lets us verify the conclusion from the previous article: the dead area is initialized with particle indices in reverse order.
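
To make that idea concrete, here is a minimal sketch of the reverse-order initialization (not Hanabi's actual code): the dead list is filled with indices from highest to lowest.

// Minimal sketch, not Hanabi's actual code: fill the dead list with particle
// indices in reverse order, so the highest index comes first.
let capacity: u32 = 32_768;
let dead_list: Vec<u32> = (0..capacity).rev().collect();
assert_eq!(dead_list[0], capacity - 1);    // highest index first
assert_eq!(*dead_list.last().unwrap(), 0); // index 0 last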

Indirect Pass

The workgroup count of the Indirect Pass is based on the batch count of the particle effects. A batch is similar to instancing on the GPU: Bevy Hanabi tries to batch particle effects together to reduce draw calls. For now, it is enough to know that the batch count is related to the number of particle effects in the Bevy world; we will cover the details of batching in upcoming articles. In our firework example the batch count is 1, which means the workgroup count is also 1. This can be confirmed in the first capture above.
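
As a rough sketch, and assuming the Indirect Pass follows the same ceil-division pattern as the Init Pass (an assumption on my part, not taken from Hanabi's source), the workgroup count can be derived from the batch count like this:

// Assumed sketch: one thread per batched effect, rounded up to whole
// 64-thread workgroups. With a single batch this yields one workgroup,
// matching the (1, 1, 1) dispatch in the RenderDoc capture.
const WORKGROUP_SIZE: u32 = 64;
let batch_count: u32 = 1;
let workgroup_count = (batch_count + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE; // = 1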

Building on the batch and workgroup configuration, we now examine the buffers used by the Indirect Pass, which uses 3 buffers:

  • render_indirect_buffer: the same as render_indirect in the Init Pass; it stores the data used across the whole compute pipeline.
  • dispatch_indirect_buffer: stores the workgroup count for the Update Pass, so the Update Pass reads its workgroup count on the GPU via the dispatch_workgroups_indirect call.
  • spawner_buffer: stores the data of all Spawners. We met it in the Init Pass as spawner at group 2, binding 0. The difference is that the Init Pass binds a slice of spawner_buffer at the offset of its specific spawner, while the Indirect Pass binds the entire buffer (see the sketch after this list).
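
To illustrate that difference, here is a hypothetical wgpu-style sketch. The function names, bind group names, group index, and offset variable are my own placeholders, not Hanabi's actual bind group layout.

use wgpu::{BindGroup, ComputePass};

// Init pass: the bind group covers the shared spawner buffer, and a dynamic
// offset selects the slice belonging to this one effect.
fn bind_spawner_for_init<'a>(
    pass: &mut ComputePass<'a>,
    spawner_bind_group: &'a BindGroup,
    spawner_byte_offset: u32,
) {
    pass.set_bind_group(2, spawner_bind_group, &[spawner_byte_offset]);
}

// Indirect pass: the entire spawner buffer is bound with no dynamic offset,
// so a single dispatch can read every spawner.
fn bind_spawner_for_indirect<'a>(
    pass: &mut ComputePass<'a>,
    spawner_bind_group: &'a BindGroup,
) {
    pass.set_bind_group(2, spawner_bind_group, &[]);
}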

Update Pass

The workgroup count of the Update Pass is calculated by the Indirect Pass and stored in the dispatch_indirect_buffer. To read the workgroup count from the GPU, we invoke the dispatch_workgroups_indirect method with the buffer and an offset:

if let Some(buffer) = effects_meta.dispatch_indirect_buffer.buffer() {
    trace!(
        "dispatch_workgroups_indirect: buffer={:?} offset={}",
        buffer,
        dispatch_indirect_offset
    );
    compute_pass.dispatch_workgroups_indirect(buffer, dispatch_indirect_offset);
    // TODO - offset
}

From the previous article, we learned that the Update Pass is responsible for processing particle updates, and it uses the following buffers:

  • particle_buffer: the same as the one in the Init Pass.
  • indirect_buffer: the same as the one in the Init Pass.
  • render_indirect: the same as the one in the Init and Indirect Passes.
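
Since render_indirect is the one buffer that every pass touches, it helps to picture the kind of counters it carries. The sketch below is illustrative only; Bevy Hanabi's actual render-indirect struct has more fields and a different layout.

// Illustrative sketch only, not Hanabi's actual GPU struct: the kind of
// counters the render-indirect data carries across the compute passes.
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct RenderIndirectSketch {
    // Number of particles currently alive.
    alive_count: u32,
    // Number of free slots remaining in the particle buffer.
    dead_count: u32,
    // Upper bound on how many particles the next Init Pass may spawn.
    max_spawn: u32,
}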

The Data Flow of the Compute Pipeline

Now that we have covered the buffer usage of Bevy Hanabi's compute pipeline, let's visualize the data flow with a few images to avoid any confusion.

What’s Next

We'll discuss the batching mechanism of Bevy Hanabi in upcoming articles. Thanks for reading, and have a nice day!
