GPU Particle Research — Bevy Hanabi, Part 3

Sou1gh0st
11 min read · Mar 31, 2024


Introduction

In this article, we will talk about the batching mechanism of Bevy Hanabi. Before that, we need to figure out how Bevy Hanabi manages our particle effects.

The Journey of a Particle Effect

We can create a particle effect by spawning a ParticleEffectBundle into the Bevy world:

fn setup(mut commands: Commands, mut effects: ResMut<Assets<EffectAsset>>) {
// ...
let effect = EffectAsset::new(
32768,
Spawner::burst(2500.0.into(), 2.0.into()),
writer.finish(),
)
.with_name("firework")
.init(init_pos)
.init(init_vel)
.init(init_age)
.init(init_lifetime)
.update(update_drag)
.update(update_accel)
.render(ColorOverLifetimeModifier {
gradient: color_gradient1,
})
.render(SizeOverLifetimeModifier {
gradient: size_gradient1,
screen_space_size: false,
});

let effect1 = effects.add(effect);

commands.spawn((
Name::new("firework"),
ParticleEffectBundle {
effect: ParticleEffect::new(effect1),
transform: Transform::IDENTITY,
..Default::default()
},
));
}

First, we create an EffectAsset and configure its modifiers. Next, we add it to the Assets<EffectAsset> collection, which gives us a handle to it. Finally, we create a ParticleEffectBundle with that handle. If you have any questions about Assets<EffectAsset>, refer to the Bevy asset management documentation.
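The following toy model in plain Rust makes the asset/handle relationship concrete. Assets, Handle, EffectAsset and ParticleEffect below are simplified, hypothetical stand-ins, not Bevy's real types. Unlike Bevy's reference-counted handles, this Handle is a plain Copy index. The model only captures the idea that the collection owns the asset and returns a cheap handle that any number of effect instances can share.

```rust
use std::collections::HashMap;

// Hypothetical, simplified handle: just an index into the collection.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Handle(u32);

// Hypothetical stand-in for Bevy's Assets<T>: it owns the assets.
struct Assets<T> {
    next_id: u32,
    storage: HashMap<Handle, T>,
}

impl<T> Assets<T> {
    fn new() -> Self {
        Assets { next_id: 0, storage: HashMap::new() }
    }

    // Store an asset and return a handle that refers to it.
    fn add(&mut self, asset: T) -> Handle {
        let handle = Handle(self.next_id);
        self.next_id += 1;
        self.storage.insert(handle, asset);
        handle
    }

    fn get(&self, handle: &Handle) -> Option<&T> {
        self.storage.get(handle)
    }
}

// Hypothetical, heavily reduced versions of the types used above.
struct EffectAsset {
    name: String,
    capacity: u32,
}

struct ParticleEffect {
    handle: Handle,
}

fn main() {
    let mut effects: Assets<EffectAsset> = Assets::new();
    let effect1 = effects.add(EffectAsset { name: "firework".to_string(), capacity: 32768 });

    // Two instances share one asset through copies of the same handle.
    let a = ParticleEffect { handle: effect1 };
    let b = ParticleEffect { handle: effect1 };
    assert_eq!(a.handle, b.handle);

    let asset = effects.get(&a.handle).unwrap();
    println!("{} (capacity {})", asset.name, asset.capacity);
}
```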

The ParticleEffectBundle has a compiled_effect field that makes the JIT shader mechanism work. When we create the bundle with ..Default::default(), this field is default-initialized. As a result, the asset field of CompiledParticleEffect starts out as a weak handle pointing to DEFAULT_UUID, through Handle's Default implementation shown below. It is set to the correct handle after compilation.

#[derive(Bundle, Clone)]
pub struct ParticleEffectBundle {
/// The particle effect instance itself.
pub effect: ParticleEffect,
/// A compiled version of the particle effect, managed automatically.
///
/// You don't need to interact with this component, but it must be present
/// for the effect to work. This is split from the [`ParticleEffect`] itself
/// mainly for change detection reasons, as well as for semantic.
pub compiled_effect: CompiledParticleEffect,
/// Transform of the entity, representing the frame of reference for the
/// particle emission.
///
/// New particles are emitted relative to this transform, ignoring the
/// scale.
pub transform: Transform,
// ...
}

#[derive(Debug, Clone, Component)]
pub struct CompiledParticleEffect {
/// Weak handle to the underlying asset.
asset: Handle<EffectAsset>,
/// Cached simulation condition, to avoid having to query the asset each
/// time we need it.
simulation_condition: SimulationCondition,
/// Handle to the effect shader for this effect instance, if configured.
effect_shader: Option<EffectShader>,
/// Force field modifier values.
force_field: [ForceFieldSource; ForceFieldSource::MAX_SOURCES],
/// Main particle texture.
particle_texture: Option<Handle<Image>>,
/// 2D layer for the effect instance.
#[cfg(feature = "2d")]
z_layer_2d: FloatOrd,
/// Layout flags.
layout_flags: LayoutFlags,
}

impl<A: Asset> Default for Handle<A> {
fn default() -> Self {
Handle::Weak(AssetId::default())
}
}
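The default-handle trick can be modeled without Bevy. In this hypothetical sketch, AssetId::Placeholder plays the role of DEFAULT_UUID, and deriving Default mimics what ..Default::default() does for compiled_effect. Until compilation runs, the compiled id cannot match the effect's real asset id.

```rust
// Hypothetical stand-in for Bevy's AssetId.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum AssetId {
    // Plays the role of the DEFAULT_UUID placeholder.
    #[default]
    Placeholder,
    Index(u32),
}

// Deriving Default fills `asset` with the placeholder id.
#[derive(Debug, Default)]
struct CompiledParticleEffect {
    asset: AssetId,
}

// Hypothetical, reduced bundle: only the fields needed for the example.
#[derive(Debug, Default)]
struct ParticleEffectBundle {
    effect_asset: AssetId,
    compiled_effect: CompiledParticleEffect,
}

fn main() {
    // Mirrors `..Default::default()` in the setup code.
    let bundle = ParticleEffectBundle {
        effect_asset: AssetId::Index(7),
        ..Default::default()
    };
    // Before compilation, the compiled id is still the placeholder, so it
    // does not match the real asset id and the effect will be compiled.
    println!("{:?} vs {:?}", bundle.compiled_effect.asset, bundle.effect_asset);
}
```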

The Bevy Schedules

In a typical Bevy app, three schedules run every frame: Main, Extract and Render. If you want to know more about them, refer to the Bevy schedules documentation. Bevy Hanabi hooks its systems into these schedules to manage particle effects correctly.

The compile_effects system is added to PostUpdate, one of the schedules run by the Main schedule:

.add_systems(
PostUpdate,
(
tick_spawners.in_set(EffectSystems::TickSpawners),
compile_effects.in_set(EffectSystems::CompileEffects),
update_properties_from_asset.in_set(EffectSystems::UpdatePropertiesFromAsset),
gather_removed_effects.in_set(EffectSystems::GatherRemovedEffects),
),
);

Next, the extract_effects system is added to the ExtractSchedule, which runs after the Main schedule:

.edit_schedule(ExtractSchedule, |schedule| {
schedule.add_systems((extract_effects, extract_effect_events));
})

Finally, three systems are added to the Render schedule to prepare the particle data for the GPU:

  • prepare_effects: runs in the RenderSet::PrepareAssets set, which prepares assets that were created, modified or removed this frame.
  • queue_effects: runs in the RenderSet::Queue set, which queues drawable entities as phase items in render phases.
  • prepare_resources: runs in the RenderSet::Prepare set, which prepares GPU render resources from the extracted data based on their sorted order.

.configure_sets(
Render,
(
EffectSystems::PrepareEffectAssets.in_set(RenderSet::PrepareAssets),
EffectSystems::QueueEffects.in_set(RenderSet::Queue),
EffectSystems::PrepareEffectGpuResources.in_set(RenderSet::Prepare),
),
)
// ...
.add_systems(
Render,
(
prepare_effects.in_set(EffectSystems::PrepareEffectAssets),
queue_effects.in_set(EffectSystems::QueueEffects),
prepare_resources
.in_set(EffectSystems::PrepareEffectGpuResources)
.after(prepare_view_uniforms),
),
);

From the above, we can see that compile_effects runs before the other systems. The systems in the Extract and Render schedules can therefore rely on the particle effects already being compiled. Now let's examine them one by one.
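The ordering above can be summarized in a small sketch. Here the schedules and render sets are flattened into a single hypothetical Stage enum whose declaration order is the run order, and the EffectSystems sets are left out. Real Bevy schedules are graphs of system sets that can run systems in parallel, so the sketch only illustrates the relative order of the Hanabi systems.

```rust
// Hypothetical flattening of one frame: variants are declared in run order,
// so the derived Ord sorts systems the way they execute.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Stage {
    PostUpdate,    // Main schedule
    Extract,       // ExtractSchedule
    PrepareAssets, // Render schedule, RenderSet::PrepareAssets
    Queue,         // Render schedule, RenderSet::Queue
    Prepare,       // Render schedule, RenderSet::Prepare
}

// Returns system names in execution order, regardless of registration order.
fn frame_order(mut systems: Vec<(Stage, &'static str)>) -> Vec<&'static str> {
    // sort_by_key is stable, so systems in the same stage keep registration order.
    systems.sort_by_key(|(stage, _)| *stage);
    systems.into_iter().map(|(_, name)| name).collect()
}

fn main() {
    let order = frame_order(vec![
        (Stage::Prepare, "prepare_resources"),
        (Stage::Queue, "queue_effects"),
        (Stage::Extract, "extract_effects"),
        (Stage::PrepareAssets, "prepare_effects"),
        (Stage::PostUpdate, "compile_effects"),
    ]);
    println!("{order:?}");
}
```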

Compile Effects

The ParticleEffectBundle contains all the components needed for a particle effect. Spawning it creates an entity with the ParticleEffect and CompiledParticleEffect components. In case you've forgotten, here is the code again.

#[derive(Bundle, Clone)]
pub struct ParticleEffectBundle {
/// The particle effect instance itself.
pub effect: ParticleEffect,
/// A compiled version of the particle effect, managed automatically.
///
/// You don't need to interact with this component, but it must be present
/// for the effect to work. This is split from the [`ParticleEffect`] itself
/// mainly for change detection reasons, as well as for semantic.
pub compiled_effect: CompiledParticleEffect,
// ...
}

// create an entity with the components listed in the ParticleEffectBundle
commands.spawn((
Name::new("firework"),
ParticleEffectBundle {
effect: ParticleEffect::new(effect1),
transform: Transform::IDENTITY,
..Default::default()
},
));

The compile_effects system then queries these components through a Bevy ECS Query and checks whether each effect needs to be rebuilt:

fn compile_effects(
effects: Res<Assets<EffectAsset>>,
mut shaders: ResMut<Assets<Shader>>,
mut shader_cache: ResMut<ShaderCache>,
mut q_effects: Query<(Entity, Ref<ParticleEffect>, &mut CompiledParticleEffect)>,
) {
trace!("compile_effects");

// Loop over all existing effects to update them, including invisible ones
for (asset, entity, effect, mut compiled_effect) in
q_effects
.iter_mut()
.filter_map(|(entity, effect, compiled_effect)| {
// Check if asset is available, otherwise silently ignore as we can't check for
// changes, and conceptually it makes no sense to render a particle effect whose
// asset was unloaded.
let Some(asset) = effects.get(&effect.handle) else {
return None;
};

Some((asset, entity, effect, compiled_effect))
})
{
// If the ParticleEffect didn't change, and the compiled one is for the correct
// asset, then there's nothing to do.
let need_rebuild = effect.is_changed();
if !need_rebuild && (compiled_effect.asset == effect.handle) {
continue;
}

if need_rebuild {
debug!("Invalidating the compiled cache for effect on entity {:?} due to changes in the ParticleEffect component. If you see this message too much, then performance might be affected. Find why the change detection of the ParticleEffect is triggered.", entity);
}

#[cfg(feature = "2d")]
let z_layer_2d = effect
.z_layer_2d
.map_or(FloatOrd(asset.z_layer_2d), |z_layer_2d| {
FloatOrd(z_layer_2d)
});

compiled_effect.update(
need_rebuild,
#[cfg(feature = "2d")]
z_layer_2d,
effect.handle.clone_weak(),
asset,
&mut shaders,
&mut shader_cache,
);
}
}

After this system runs, every effect whose asset is loaded has been compiled and is ready for subsequent processing.
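The rebuild decision in compile_effects comes down to a couple of checks. The following is a simplified, hypothetical model of it. A changed flag stands in for Bevy's change detection (effect.is_changed()), a HashMap of loaded assets stands in for Res<Assets<EffectAsset>>, and a builds counter records how often the expensive update would run.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Handle(u32);

// Hypothetical, simplified components.
struct ParticleEffect {
    handle: Handle,
    changed: bool, // stands in for Ref<ParticleEffect>::is_changed()
}

struct CompiledParticleEffect {
    asset: Option<Handle>, // None plays the role of the default placeholder handle
    builds: u32,           // how many times the (expensive) update ran
}

fn compile_effects(
    loaded_assets: &HashMap<Handle, &'static str>,
    effects: &mut [(ParticleEffect, CompiledParticleEffect)],
) {
    for (effect, compiled) in effects.iter_mut() {
        // Asset not loaded: silently skip the effect.
        if !loaded_assets.contains_key(&effect.handle) {
            continue;
        }
        // Nothing changed and already compiled for this asset: nothing to do.
        let need_rebuild = effect.changed;
        if !need_rebuild && compiled.asset == Some(effect.handle) {
            continue;
        }
        compiled.asset = Some(effect.handle);
        compiled.builds += 1;
    }
}

fn main() {
    let mut loaded = HashMap::new();
    loaded.insert(Handle(1), "firework");
    let mut effects = vec![(
        ParticleEffect { handle: Handle(1), changed: false },
        CompiledParticleEffect { asset: None, builds: 0 },
    )];
    compile_effects(&loaded, &mut effects); // handle mismatch: compiles
    compile_effects(&loaded, &mut effects); // up to date: skipped
    println!("builds = {}", effects[0].1.builds);
}
```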

Extract Effects

The extract_effects system collects newly added particle effects into extracted_effects.added_effects for later GPU allocation. It also rebuilds the extracted_effects.effects map from all existing effects, taking their visibility into account, for subsequent use.

pub(crate) fn extract_effects(
real_time: Extract<Res<Time<Real>>>,
virtual_time: Extract<Res<Time<Virtual>>>,
time: Extract<Res<Time<EffectSimulation>>>,
effects: Extract<Res<Assets<EffectAsset>>>,
_images: Extract<Res<Assets<Image>>>,
mut query: Extract<
ParamSet<(
// All existing ParticleEffect components
Query<(
Entity,
Option<&InheritedVisibility>,
Option<&ViewVisibility>,
&EffectSpawner,
&CompiledParticleEffect,
Option<Ref<EffectProperties>>,
&GlobalTransform,
)>,
// Newly added ParticleEffect components
Query<
(Entity, &CompiledParticleEffect),
(Added<CompiledParticleEffect>, With<GlobalTransform>),
>,
)>,
>,
mut removed_effects_event_reader: Extract<EventReader<RemovedEffectsEvent>>,
mut sim_params: ResMut<SimParams>,
mut extracted_effects: ResMut<ExtractedEffects>,
) {
// ...
// Collect added effects for later GPU data allocation
extracted_effects.added_effects = query
.p1()
.iter()
.map(|(entity, effect)| {
let handle = effect.asset.clone_weak();
let asset = effects.get(&effect.asset).unwrap();
let particle_layout = asset.particle_layout();
assert!(
particle_layout.size() > 0,
"Invalid empty particle layout for effect '{}' on entity {:?}. Did you forget to add some modifier to the asset?",
asset.name,
entity
);
let property_layout = asset.property_layout();

trace!("Found new effect: entity {:?} | capacity {} | particle_layout {:?} | property_layout {:?} | layout_flags {:?}", entity, asset.capacity(), particle_layout, property_layout, effect.layout_flags);
AddedEffect {
entity,
capacity: asset.capacity(),
particle_layout,
property_layout,
layout_flags: effect.layout_flags,
handle,
}
})
.collect();

// Loop over all existing effects to update them
extracted_effects.effects.clear();
for (
entity,
maybe_inherited_visibility,
maybe_view_visibility,
spawner,
effect,
maybe_properties,
transform,
) in query.p0().iter_mut()
{
// ...
extracted_effects.effects.insert(
entity,
ExtractedEffect {
handle: effect.asset.clone_weak(),
particle_layout: asset.particle_layout().clone(),
property_layout,
property_data,
spawn_count,
transform: transform.compute_matrix(),
// TODO - more efficient/correct way than inverse()?
inverse_transform: transform.compute_matrix().inverse(),
layout_flags,
image_handle,
effect_shader,
#[cfg(feature = "2d")]
z_sort_key_2d,
},
);
}
}
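The two jobs of extract_effects can be illustrated with a stripped-down model. Everything in it is hypothetical: effects are plain structs, Entity is a u32, and visible: Option<bool> stands in for the optional visibility components in the query. The visibility handling is elided above, so the sketch also assumes that an effect reported as hidden is left out of extracted_effects.effects.

```rust
use std::collections::HashMap;

type Entity = u32; // hypothetical stand-in for Bevy's Entity

struct MainWorldEffect {
    entity: Entity,
    added_this_frame: bool, // stands in for Added<CompiledParticleEffect>
    visible: Option<bool>,  // None: the entity has no visibility component
    capacity: u32,
}

#[derive(Default)]
struct ExtractedEffects {
    added_effects: Vec<(Entity, u32)>, // (entity, capacity) for GPU allocation
    effects: HashMap<Entity, u32>,     // effects to process this frame
}

fn extract_effects(world: &[MainWorldEffect], extracted: &mut ExtractedEffects) {
    // Newly added effects need GPU storage, regardless of visibility.
    extracted.added_effects = world
        .iter()
        .filter(|e| e.added_this_frame)
        .map(|e| (e.entity, e.capacity))
        .collect();

    // Rebuilt from scratch every frame; hidden effects are skipped (assumption).
    extracted.effects.clear();
    for e in world {
        if e.visible == Some(false) {
            continue;
        }
        extracted.effects.insert(e.entity, e.capacity);
    }
}

fn main() {
    let world = [
        MainWorldEffect { entity: 1, added_this_frame: true, visible: Some(true), capacity: 32768 },
        MainWorldEffect { entity: 2, added_this_frame: false, visible: Some(false), capacity: 1024 },
        MainWorldEffect { entity: 3, added_this_frame: false, visible: None, capacity: 512 },
    ];
    let mut extracted = ExtractedEffects::default();
    extract_effects(&world, &mut extracted);
    println!("added: {:?}", extracted.added_effects);
}
```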

Prepare Effects

After extraction, we need to allocate GPU buffers for the added effects and create batches for all active effects in this frame. The prepare_effects function is quite long, so I will only show its important parts.

pub(crate) fn prepare_effects(
mut commands: Commands,
sim_params: Res<SimParams>,
render_device: Res<RenderDevice>,
render_queue: Res<RenderQueue>,
pipeline_cache: Res<PipelineCache>,
dispatch_indirect_pipeline: Res<DispatchIndirectPipeline>,
init_pipeline: Res<ParticlesInitPipeline>,
update_pipeline: Res<ParticlesUpdatePipeline>,
mut specialized_init_pipelines: ResMut<SpecializedComputePipelines<ParticlesInitPipeline>>,
mut specialized_update_pipelines: ResMut<SpecializedComputePipelines<ParticlesUpdatePipeline>>,
// update_pipeline: Res<ParticlesUpdatePipeline>, // TODO move update_pipeline.pipeline to
// EffectsMeta
mut effects_meta: ResMut<EffectsMeta>,
mut extracted_effects: ResMut<ExtractedEffects>,
mut effect_bind_groups: ResMut<EffectBindGroups>,
) {
// ...
// allocate the GPU buffers for the new added effects
effects_meta.add_remove_effects(
std::mem::take(&mut extracted_effects.added_effects),
removed_effect_entities,
&render_device,
&render_queue,
&mut effect_bind_groups,
);

// Build batcher inputs from extracted effects
let effects = std::mem::take(&mut extracted_effects.effects);
let mut effect_entity_list = effects
.into_iter()
.map(|(entity, extracted_effect)| {
let id = *effects_meta.entity_map.get(&entity).unwrap();
let property_buffer = effects_meta.effect_cache.get_property_buffer(id).cloned(); // clone handle for lifetime
let effect_slice = effects_meta.effect_cache.get_slice(id);

BatchInput {
handle: extracted_effect.handle,
entity_index: entity.index(),
effect_slice,
property_layout: extracted_effect.property_layout.clone(),
effect_shader: extracted_effect.effect_shader.clone(),
layout_flags: extracted_effect.layout_flags,
image_handle: extracted_effect.image_handle,
spawn_count: extracted_effect.spawn_count,
transform: extracted_effect.transform.into(),
inverse_transform: extracted_effect.inverse_transform.into(),
property_buffer,
property_data: extracted_effect.property_data,
#[cfg(feature = "2d")]
z_sort_key_2d: extracted_effect.z_sort_key_2d,
}
})
.collect::<Vec<_>>();
// ...
batcher.batch(effect_entity_list);
// ...
}
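The mapping step in prepare_effects can be sketched as follows. EffectSlice and BatchInput keep only the fields that matter for batching, and build_batch_inputs is a hypothetical helper. The final sort by (buffer index, slice start) is an assumption of this sketch. The batcher only merges neighbors, and the extracted effects come from a map keyed by entity whose iteration order need not follow slice order. Whether the elided parts of prepare_effects order the list this way is not shown above.

```rust
use std::collections::HashMap;
use std::ops::Range;

type Entity = u32; // hypothetical stand-in for Bevy's Entity
type EffectCacheId = u32; // hypothetical stand-in for the real EffectCacheId

#[derive(Debug, Clone, PartialEq)]
struct EffectSlice {
    group_index: u32,  // index of the effect buffer
    slice: Range<u32>, // particle range inside that buffer
}

#[derive(Debug)]
struct BatchInput {
    entity_index: Entity,
    effect_slice: EffectSlice,
}

fn build_batch_inputs(
    extracted: &[Entity],
    entity_map: &HashMap<Entity, EffectCacheId>,
    slices: &HashMap<EffectCacheId, EffectSlice>,
) -> Vec<BatchInput> {
    let mut inputs: Vec<BatchInput> = extracted
        .iter()
        .map(|entity| {
            // Like the unwrap() calls in prepare_effects, indexing panics if missing.
            let id = entity_map[entity];
            BatchInput { entity_index: *entity, effect_slice: slices[&id].clone() }
        })
        .collect();
    // Assumption: order by buffer, then slice start, so contiguous slices are adjacent.
    inputs.sort_by_key(|i| (i.effect_slice.group_index, i.effect_slice.slice.start));
    inputs
}

fn main() {
    let entity_map: HashMap<Entity, EffectCacheId> = vec![(10, 0), (11, 1)].into_iter().collect();
    let slices: HashMap<EffectCacheId, EffectSlice> = vec![
        (0, EffectSlice { group_index: 0, slice: 64..128 }),
        (1, EffectSlice { group_index: 0, slice: 0..64 }),
    ]
    .into_iter()
    .collect();
    for input in build_batch_inputs(&[10, 11], &entity_map, &slices) {
        println!("{:?}", input);
    }
}
```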

Here, each extracted effect is mapped to a BatchInput and stored in effect_entity_list. This brings us to the focus of this article: the batching mechanism. A sequence of items can only be batched together if they can be dispatched with the same pipeline state, so we need to check every setting that affects it. Here is the code:

impl Batchable<BatchState, EffectBatch> for BatchInput {
fn try_merge(self, state: &mut BatchState, batch: &mut EffectBatch) -> Result<(), BatchInput> {
// 2D effects need the same sort key; we never batch across sort keys because
// they represent the drawing order, so effects shouldn't be reordered past
// them.
#[cfg(feature = "2d")]
let is_2d_compatible = self.z_sort_key_2d == batch.z_sort_key_2d;
#[cfg(not(feature = "2d"))]
let is_2d_compatible = true;

let has_property_data = self.has_property_data();

let is_compatible = self.effect_slice.group_index == batch.buffer_index
&& self.effect_slice.slice.start == batch.slice.end // continuous
&& self.effect_slice.particle_layout == batch.particle_layout
&& self.property_layout == state.property_layout
&& self.effect_shader.init == state.init_shader
&& self.effect_shader.update == state.update_shader
&& self.effect_shader.render == batch.render_shader
&& self.layout_flags == batch.layout_flags
&& self.image_handle == batch.image_handle
&& is_2d_compatible
&& (!has_property_data || !state.has_property_data);

if !is_compatible {
return Err(self);
}

// Merge self into batch
batch.slice.end = self.effect_slice.slice.end;
batch.entities.push(self.entity_index);
state.has_property_data = has_property_data;
// TODO - add per-effect spawner stuffs etc. which are "batched" but remain
// per-effect

Ok(())
}
}

Several properties are checked for compatibility, so let's go through them one by one.

The Batching Mechanism

The first property checked in the batching process is effect_slice. We have not covered it yet, so let's start there. As shown above, a newly added particle effect is inserted into the extracted_effects.added_effects list. The add_remove_effects function is then called to allocate GPU buffers for these effects. This function obtains an id (an EffectCacheId) for each effect, which is later used by the slicing mechanism.

pub fn add_remove_effects(
&mut self,
mut added_effects: Vec<AddedEffect>,
removed_effect_entities: Vec<Entity>,
render_device: &RenderDevice,
render_queue: &RenderQueue,
effect_bind_groups: &mut ResMut<EffectBindGroups>,
) {
// ...
for added_effect in added_effects.drain(..) {
let cache_id = self.effect_cache.insert(
added_effect.handle,
added_effect.capacity,
&added_effect.particle_layout,
&added_effect.property_layout,
added_effect.layout_flags,
// update_pipeline.pipeline.clone(),
render_queue,
);

let entity = added_effect.entity;
self.entity_map.insert(entity, cache_id);
// ...
}
// ...
}

Here, effect_cache.insert looks for an existing effect buffer that is compatible with the effect. If it finds one, it tries to allocate a slice in that buffer. If no compatible buffer has enough room, it allocates a new buffer.

pub fn insert(
&mut self,
asset: Handle<EffectAsset>,
capacity: u32,
particle_layout: &ParticleLayout,
property_layout: &PropertyLayout,
layout_flags: LayoutFlags,
// pipeline: ComputePipeline,
_queue: &RenderQueue,
) -> EffectCacheId {
let (buffer_index, slice) = self
.buffers
.iter_mut()
.enumerate()
.find_map(|(buffer_index, buffer)| {
if let Some(buffer) = buffer {
// The buffer must be compatible with the effect layout, to allow the update pass
// to update all particles at once from all compatible effects in a single dispatch.
if !buffer.is_compatible(&asset) {
return None;
}

// Try to allocate a slice into the buffer
buffer
.allocate_slice(capacity, particle_layout)
.map(|slice| (buffer_index, slice))
} else {
None
}
})
.or_else(|| {
// Cannot find any suitable buffer; allocate a new one
let buffer_index = self.buffers.iter().position(|buf| buf.is_none()).unwrap_or(self.buffers.len());
let byte_size = capacity.checked_mul(particle_layout.min_binding_size().get() as u32).unwrap_or_else(|| panic!(
"Effect size overflow: capacity={} particle_layout={:?} item_size={}",
capacity, particle_layout, particle_layout.min_binding_size().get()
));
trace!(
"Creating new effect buffer #{} for effect {:?} (capacity={}, particle_layout={:?} item_size={}, byte_size={})",
buffer_index,
asset,
capacity,
particle_layout,
particle_layout.min_binding_size().get(),
byte_size
);
let mut buffer = EffectBuffer::new(
asset,
capacity,
particle_layout.clone(),
property_layout.clone(),
layout_flags,
//pipeline,
&self.device,
Some(&format!("hanabi:buffer:effect{buffer_index}_particles")),
);
let slice_ref = buffer.allocate_slice(capacity, particle_layout).unwrap();
if buffer_index >= self.buffers.len() {
self.buffers.push(Some(buffer));
} else {
debug_assert!(self.buffers[buffer_index].is_none());
self.buffers[buffer_index] = Some(buffer);
}
Some((buffer_index, slice_ref))
})
.unwrap();
let id = EffectCacheId::new();
trace!(
"Insert effect id={:?} buffer_index={} slice={:?}x{}B particle_layout={:?}",
id,
buffer_index,
slice.range,
slice.particle_layout.min_binding_size().get(),
slice.particle_layout,
);
self.effects.insert(id, (buffer_index, slice));
id
}
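The buffer search in insert is essentially a first-fit allocator over a list of buffer slots. The sketch below models it with hypothetical types. Buffers are matched by an asset id, like is_compatible below. Slices are bump-allocated, and freed slots (None) are reused before the list grows. New buffers get a hypothetical min_buffer_capacity of headroom. The sizing policy of the real EffectBuffer is not shown above, but without some headroom a buffer sized exactly for one effect could never hold a second slice.

```rust
use std::ops::Range;

#[derive(Debug)]
struct EffectBuffer {
    asset: u32,    // hypothetical asset id the buffer was created for
    capacity: u32, // total particles the buffer can hold
    used: u32,     // particles already handed out as slices
}

impl EffectBuffer {
    // Bump-allocate `count` particles at the end of the buffer, if they fit.
    fn allocate_slice(&mut self, count: u32) -> Option<Range<u32>> {
        if self.capacity - self.used < count {
            return None;
        }
        let start = self.used;
        self.used += count;
        Some(start..self.used)
    }
}

struct EffectCache {
    buffers: Vec<Option<EffectBuffer>>, // None marks a freed slot
    min_buffer_capacity: u32,           // hypothetical headroom for new buffers
}

impl EffectCache {
    // Returns (buffer_index, slice), creating a buffer when nothing fits.
    fn insert(&mut self, asset: u32, capacity: u32) -> (usize, Range<u32>) {
        // First fit: a compatible buffer with enough room left.
        let found = self.buffers.iter_mut().enumerate().find_map(|(i, slot)| {
            let buffer = slot.as_mut()?;
            if buffer.asset != asset {
                return None;
            }
            buffer.allocate_slice(capacity).map(|slice| (i, slice))
        });
        if let Some(found) = found {
            return found;
        }
        // Reuse the first free slot, or append a new one.
        let index = self.buffers.iter().position(|b| b.is_none()).unwrap_or(self.buffers.len());
        let mut buffer = EffectBuffer { asset, capacity: capacity.max(self.min_buffer_capacity), used: 0 };
        let slice = buffer.allocate_slice(capacity).unwrap();
        if index == self.buffers.len() {
            self.buffers.push(Some(buffer));
        } else {
            self.buffers[index] = Some(buffer);
        }
        (index, slice)
    }
}

fn main() {
    let mut cache = EffectCache { buffers: Vec::new(), min_buffer_capacity: 100 };
    println!("{:?}", cache.insert(1, 40));
    println!("{:?}", cache.insert(1, 40));
    println!("{:?}", cache.insert(2, 10));
}
```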

An effect buffer holds all the GPU buffers needed for particle simulation and rendering. If a new effect can be stored in an existing buffer, we allocate a new slice of that buffer and can process the effects together in a single dispatch. However, the current compatibility check is overly conservative. It only accepts buffers created for the same EffectAsset handle. As the TODO notes, it could be improved by checking the particle layout instead:

pub fn is_compatible(&self, handle: &Handle<EffectAsset>) -> bool {
// TODO - replace with check particle layout is compatible to allow tighter
// packing in less buffers, and update in the less dispatch calls
*handle == self.asset
}
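The TODO suggests comparing particle layouts instead of asset handles. The following minimal sketch shows the difference using a hypothetical ParticleLayout made of (attribute name, byte size) pairs: two different assets that happen to use the same layout become compatible. A real check would likely need to consider more than this, such as the property layout and layout flags that EffectBuffer::new also receives.

```rust
// Hypothetical layout: ordered (attribute name, size in bytes) pairs.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ParticleLayout {
    attributes: Vec<(&'static str, u32)>,
}

struct EffectBuffer {
    asset: u32,             // id of the asset the buffer was created for
    layout: ParticleLayout, // layout of the particles stored in it
}

impl EffectBuffer {
    // Current behavior described above: same asset only.
    fn is_compatible_by_asset(&self, asset: u32) -> bool {
        self.asset == asset
    }

    // Looser check suggested by the TODO: same particle layout.
    fn is_compatible_by_layout(&self, layout: &ParticleLayout) -> bool {
        self.layout == *layout
    }
}

fn main() {
    let pos_vel = ParticleLayout { attributes: vec![("position", 12), ("velocity", 12)] };
    let buffer = EffectBuffer { asset: 1, layout: pos_vel.clone() };
    // A second asset (id 2) that uses an identical layout:
    println!("by asset: {}", buffer.is_compatible_by_asset(2));
    println!("by layout: {}", buffer.is_compatible_by_layout(&pos_vel));
}
```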

From the above, we know that effects using the same EffectAsset can share the same effect buffer, as long as it has room. However, sharing the buffer binding is only one of the prerequisites for a single dispatch. Let's go back to the final compatibility check in the batching process:

let is_compatible = self.effect_slice.group_index == batch.buffer_index
&& self.effect_slice.slice.start == batch.slice.end // continuous
&& self.effect_slice.particle_layout == batch.particle_layout
&& self.property_layout == state.property_layout
&& self.effect_shader.init == state.init_shader
&& self.effect_shader.update == state.update_shader
&& self.effect_shader.render == batch.render_shader
&& self.layout_flags == batch.layout_flags
&& self.image_handle == batch.image_handle
&& is_2d_compatible
&& (!has_property_data || !state.has_property_data);

When two adjacent effects share the same effect buffer and their slices were allocated back to back, the condition self.effect_slice.slice.start == batch.slice.end is satisfied. The first condition, self.effect_slice.group_index == batch.buffer_index, is also satisfied, because both effects live in the same effect buffer group (a slice's group_index equals its buffer_index):

pub fn get_slice(&self, id: EffectCacheId) -> EffectSlice {
self.effects
.get(&id)
.map(|(buffer_index, slice_ref)| EffectSlice {
slice: slice_ref.range.clone(),
group_index: *buffer_index as u32,
particle_layout: slice_ref.particle_layout.clone(),
})
.unwrap()
}

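The remaining conditions are easier to understand. The effects must have the same particle layout and property layout, the same init, update and render shaders, the same layout flags, the same texture and, with the 2d feature, the same 2D sort key. In addition, the new effect cannot carry property data if the batch already does. Otherwise, they cannot be dispatched in a single pass. If the current effect passes the compatibility check, it is merged into the current batch: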

// Merge self into batch
batch.slice.end = self.effect_slice.slice.end;
batch.entities.push(self.entity_index);
state.has_property_data = has_property_data;
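A reduced version of try_merge shows how these checks interact. It keeps only the buffer index, slice contiguity, a single shader id and the property-data rule. The other equality checks (layouts, flags, texture, 2D sort key) follow the same pattern. The types are hypothetical. As a design choice, this sketch accumulates the property-data flag with |=, so the batch remembers that one of its effects already carries property data.

```rust
use std::ops::Range;

#[derive(Debug, Clone)]
struct BatchInput {
    entity: u32,
    buffer_index: u32,
    slice: Range<u32>,
    shader: u32, // stands in for the init/update/render shader handles
    has_property_data: bool,
}

#[derive(Debug)]
struct EffectBatch {
    buffer_index: u32,
    slice: Range<u32>,
    shader: u32,
    has_property_data: bool,
    entities: Vec<u32>,
}

impl EffectBatch {
    // Start a new batch from a single input.
    fn from_input(input: BatchInput) -> Self {
        EffectBatch {
            buffer_index: input.buffer_index,
            slice: input.slice,
            shader: input.shader,
            has_property_data: input.has_property_data,
            entities: vec![input.entity],
        }
    }
}

fn try_merge(input: BatchInput, batch: &mut EffectBatch) -> Result<(), BatchInput> {
    let is_compatible = input.buffer_index == batch.buffer_index
        && input.slice.start == batch.slice.end // contiguous in the same buffer
        && input.shader == batch.shader
        && (!input.has_property_data || !batch.has_property_data);
    if !is_compatible {
        return Err(input); // hand the item back so the caller can start a new batch
    }
    batch.slice.end = input.slice.end;
    batch.entities.push(input.entity);
    batch.has_property_data |= input.has_property_data;
    Ok(())
}

fn main() {
    let first = BatchInput { entity: 1, buffer_index: 0, slice: 0..64, shader: 0, has_property_data: false };
    let second = BatchInput { entity: 2, buffer_index: 0, slice: 64..128, shader: 0, has_property_data: false };
    let mut batch = EffectBatch::from_input(first);
    let merged = try_merge(second, &mut batch).is_ok();
    println!("merged: {merged}, batch: {:?}", batch);
}
```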

Now let's look at the batch function. It walks through the items in order and tries to merge each one into the current batch. When a merge fails, it emits the current batch and starts a new one from the incompatible item:

pub fn batch(&mut self, items: impl IntoIterator<Item = I>) {
// Loop over items in order, trying to merge them into the current batch (if
// any)
let mut current: Option<(S, B)> = None;
for item in items.into_iter() {
if let Some((mut state, mut batch)) = current {
match item.try_merge(&mut state, &mut batch) {
Ok(_) => current = Some((state, batch)),
Err(item) => {
// Emit current batch, which is now completed since the new item cannot be
// merged.
self.do_emit(batch);

// Create a new batch from the incompatible item. That batch becomes the new
// current batch.
current = Some(self.do_into_batch(item));
}
}
} else {
// First item, create a new batch
current = Some(self.do_into_batch(item));
}
}

// Emit the last batch if any
if let Some((_, batch)) = current {
self.do_emit(batch);
}
}
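The batch loop does not depend on particles at all, so it can be reproduced in a few lines. This sketch uses associated types instead of the Batchable<BatchState, EffectBatch> generics above. It passes the emit callback directly, much like the emit closure that prepare_effects provides. batch_items and the toy u32 implementation are hypothetical. The toy rule merges an integer into the current batch only if it directly follows the last one, which mirrors the slice-contiguity check.

```rust
trait Batchable: Sized {
    type State;
    type Batch;
    fn try_merge(self, state: &mut Self::State, batch: &mut Self::Batch) -> Result<(), Self>;
    fn into_batch(self) -> (Self::State, Self::Batch);
}

fn batch_items<I: Batchable>(items: impl IntoIterator<Item = I>, mut emit: impl FnMut(I::Batch)) {
    let mut current: Option<(I::State, I::Batch)> = None;
    for item in items {
        current = Some(match current.take() {
            // Try to extend the current batch with this item.
            Some((mut state, mut batch)) => match item.try_merge(&mut state, &mut batch) {
                Ok(()) => (state, batch),
                // Incompatible: the current batch is complete, start a new one.
                Err(item) => {
                    emit(batch);
                    item.into_batch()
                }
            },
            // First item: it starts the first batch.
            None => item.into_batch(),
        });
    }
    // Emit the last batch, if any.
    if let Some((_, batch)) = current {
        emit(batch);
    }
}

// Toy item: consecutive integers end up in the same batch.
impl Batchable for u32 {
    type State = ();
    type Batch = Vec<u32>;
    fn try_merge(self, _state: &mut (), batch: &mut Vec<u32>) -> Result<(), u32> {
        if batch.last().map_or(false, |&last| last + 1 == self) {
            batch.push(self);
            Ok(())
        } else {
            Err(self)
        }
    }
    fn into_batch(self) -> ((), Vec<u32>) {
        ((), vec![self])
    }
}

fn main() {
    let mut batches = Vec::new();
    batch_items(vec![1u32, 2, 3, 7, 8, 10], |b| batches.push(b));
    println!("{batches:?}");
}
```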

do_emit invokes a closure defined in prepare_effects, which spawns an entity carrying the merged EffectBatch into the Bevy world for later use.

emit: |batch: EffectBatch| {
// assert_ne!(asset, Handle::<EffectAsset>::default());
assert!(batch.particle_layout.size() > 0);
trace!(
"Emit batch: buffer #{} | spawner_base {} | spawn_count {} | slice {:?} | particle_layout {:?} | render_shader {:?} | entities {}",
batch.buffer_index,
batch.spawner_base,
batch.spawn_count,
batch.slice,
batch.particle_layout,
batch.render_shader,
batch.entities.len(),
);
commands.spawn(batch);
num_emitted += 1;
}

Then the queue_effects system, which runs after prepare_effects, emits draw calls based on the number of effect batches.

What’s Next

We’ll discuss the simulation and rendering features of Bevy Hanabi in upcoming articles. Thanks for reading, and have a nice day!
