How to implement a Fluid Simulation on the CPU with Unity (ECS/Job System)

Léonardo Montes
9 min read · Dec 12, 2018


In summer 2018, while I was working at Atomic Raccoon (an indie game studio in Paris), we were wondering how to run a simulation with many characters interacting with each other like a fluid.

Unity had released the Entity Component System (ECS), the Job System, and the Burst compiler a few months earlier, advertising big speedups of up to 100x.

I suggested that we implement a fluid simulation with those new tools: it would let us get familiar with this new way of coding by prototyping a small physics engine. I also knew that fluid simulations were already being run on GPUs, so the algorithm was clearly parallelizable.

I started looking into which physics simulation I could try to implement, and I settled on Smoothed Particle Hydrodynamics (SPH).

Before implementing it, though, I looked at how Unity made their boid (or “fish”) demo, which shows many fish interacting with each other. I watched several presentations on the ECS and the Job System and dug into that project to understand how it all works.

This post is more about how to port something to the ECS and Job System than about how to implement SPH in Unity itself. That said, I do talk about how I optimized some parts of the algorithm beyond plain parallelization. The source code is available here: https://github.com/leonardo-montes/Unity-ECS-Job-System-SPH.

Single-threaded implementation

First, I implemented SPH “the old way”, using MonoBehaviours. I followed this well-written tutorial from the Big Theta Θ website (in C++, 2D, and single-threaded). My implementation is very similar, but there are some differences because I had parallelization and world interaction in mind.

First, I call InitSPH() to create the particles’ GameObjects and initialize their properties (I set the position and a reference to the parameter set to use for the simulation; this allows one group of particles to be more viscous than another, for example).
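A minimal sketch of what InitSPH() can look like, with a hypothetical Particle class and fields (particlePrefab, amount, rowSize, particles) standing in for the project’s actual names:

void InitSPH()
{
    for (int i = 0; i < amount; i++)
    {
        // Spawn the particles in a rough grid, with a small jitter so the
        // stack isn't perfectly regular and starts moving right away.
        float jitter = Random.Range(-0.1f, 0.1f);
        float x = (i % rowSize) + jitter;
        float y = 2f + (i / rowSize / rowSize) * 1.1f;
        float z = ((i / rowSize) % rowSize) + jitter;

        GameObject go = Instantiate(particlePrefab);
        go.transform.position = new Vector3(x, y, z);

        // Each particle keeps a reference to its parameter set, so two
        // groups of particles can simulate with different properties.
        particles.Add(new Particle { go = go, parameterID = 0 });
    }
}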

Every frame, I then call ComputeDensityPressure(), ComputeForces(), and Integrate(), just like in the blog post mentioned above. Together they handle the particle-to-particle collisions.
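For reference, here is a sketch of the classic density/pressure pass (after Müller et al., the formulation the Big Theta tutorial uses), with the same illustrative Particle class plus assumed constants MASS, H (the smoothing radius), GAS_CONST, and REST_DENSITY:

void ComputeDensityPressure()
{
    for (int i = 0; i < particles.Count; i++)
    {
        particles[i].density = 0f;
        for (int j = 0; j < particles.Count; j++)
        {
            Vector3 rij = particles[j].position - particles[i].position;
            float r2 = rij.sqrMagnitude;
            if (r2 < H * H)
            {
                // Poly6 kernel: W(r, h) = 315 / (64 * pi * h^9) * (h^2 - r^2)^3
                particles[i].density += MASS
                    * (315f / (64f * Mathf.PI * Mathf.Pow(H, 9f)))
                    * Mathf.Pow(H * H - r2, 3f);
            }
        }
        // Ideal-gas equation of state: pressure grows with excess density.
        particles[i].pressure = GAS_CONST * (particles[i].density - REST_DENSITY);
    }
}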

I also added two other methods: ComputeColliders() and ApplyPosition(). ComputeColliders() handles collisions with the wall GameObjects marked with the “SPHCollider” tag, and ApplyPosition() simply copies each particle’s simulated position onto its GameObject.
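Sketched out with the same illustrative names:

void ComputeColliders()
{
    // The walls are any GameObjects tagged "SPHCollider".
    GameObject[] walls = GameObject.FindGameObjectsWithTag("SPHCollider");
    // ... push penetrating particles back out of each wall ...
}

void ApplyPosition()
{
    for (int i = 0; i < particles.Count; i++)
        particles[i].go.transform.position = particles[i].position;
}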

And that’s it! Now, let’s switch to the ECS and the Job system!

250 particles

Implementation with Unity’s ECS, Job System, and Burst Compiler

Now, we need to think differently. Instead of having one script initializing and updating the particles and their GameObjects, we need multiple scripts.

Initialization

I created a SPHManager.cs script. It handles the initialization of new particles and walls.

In the single-threaded version, I could just grab the GameObjects defined as colliders and run the collision-solving algorithm. Here, I first need to turn them into entities so they can be processed by our systems. I put them into a NativeArray (don’t forget to call Dispose() after using it, or Unity will report memory-leak errors). I then instantiate the entities from a GameObject prefab that serves as a model for the components each entity gets.

I loop through all the entities to set their values, based on the GameObjects defined as colliders.
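A sketch of that initialization, using the entity API as it existed in the preview packages of the time (manager stands for the active World’s EntityManager, and the SPHCollider fields are illustrative):

GameObject[] colliderObjects = GameObject.FindGameObjectsWithTag("SPHCollider");

NativeArray<Entity> entities =
    new NativeArray<Entity>(colliderObjects.Length, Allocator.Temp);
manager.Instantiate(sphColliderPrefab, entities);

for (int i = 0; i < colliderObjects.Length; i++)
{
    Transform t = colliderObjects[i].transform;
    manager.SetComponentData(entities[i], new SPHCollider
    {
        position = new float3(t.position.x, t.position.y, t.position.z)
        // ... plus the other collider fields (right, up, scale, ...)
    });
}

// NativeArrays must be disposed manually, or Unity reports a leak.
entities.Dispose();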

The sphColliderPrefab is composed of two components: a GameObjectEntity and an SPHCollider.

My custom component holds the data I need to perform the wall collision later (it’s the same as in the single-threaded version, except that I use the new Mathematics library’s float3 instead of Vector3). The ComponentDataWrapper part allows the component to be added to the GameObject in the Inspector.
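A sketch of the component and its Inspector wrapper (the exact fields are illustrative):

using System;
using Unity.Entities;
using Unity.Mathematics;

[Serializable]
public struct SPHCollider : IComponentData
{
    public float3 position;
    public float3 right;
    public float3 up;
    public float2 scale;
}

// The wrapper exposes the component on a GameObject in the Inspector.
public class SPHColliderComponent : ComponentDataWrapper<SPHCollider> { }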

Now, let’s do the same with the particles. I loop through them just to set their position.

On the other hand, the sphParticlePrefab is a bit more complicated.

Let’s break this down. There’s a PositionComponent so the entity has a position to be rendered at; an SPHVelocityComponent that just stores the particle’s velocity; an SPHParticleComponent that stores the entity’s simulation properties; and a MeshInstanceRendererComponent, which plays the role of a MeshFilter plus a MeshRenderer (it allows the entity to be rendered by Unity).

The SPHVelocityComponent is the simple case: we only have to create a component that wraps a single float3, just like the Position component does.
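A sketch:

[Serializable]
public struct SPHVelocity : IComponentData
{
    public float3 value;
}

public class SPHVelocityComponent : ComponentDataWrapper<SPHVelocity> { }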

But the SPHParticleComponent isn’t a simple component, it’s a shared component!

Using a shared component lets us work with entity chunks: all entities in a chunk share the same shared component values. Instead of giving each particle a parameter id, we just create one shared component value per parameter set, which gives us two fluids with different properties, for example. It’s a pretty nice feature!
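A sketch of the particle settings as a shared component (the field names are illustrative):

[Serializable]
public struct SPHParticle : ISharedComponentData
{
    public float radius;
    public float smoothingRadius;
    public float restDensity;
    public float viscosity;
    public float gravityMult;
    public float mass;
    public float drag;
}

// Same wrapper idea as before, but for a shared component.
public class SPHParticleComponent : SharedComponentDataWrapper<SPHParticle> { }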

We’re done with the initialization. Now, onto the system and jobs!

2500 particles being instantiated and rendered

The system and jobs

I start by getting my entities.

From the ECS documentation: “ComponentGroup lets you extract individual arrays of entities based on their components.” So I declare my component groups, marking some components as ReadOnly since I won’t write to them.
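A sketch of the group setup, using the preview-era API (several of these names were renamed in later versions):

ComponentGroup SPHCharacterGroup;
ComponentGroup SPHColliderGroup;

protected override void OnCreateManager()
{
    SPHCharacterGroup = GetComponentGroup(
        ComponentType.ReadOnly(typeof(SPHParticle)),
        typeof(Position), typeof(SPHVelocity));
    SPHColliderGroup = GetComponentGroup(
        ComponentType.ReadOnly(typeof(SPHCollider)));
}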

Let’s create the update method. It runs every frame, like the classic Update() method (we can use Time.deltaTime here, for example). It’s a long method, so I’ll fill it in as we go, starting from the skeleton sketched after the next paragraph.

I start by getting the unique shared components (uniqueTypes is a List of SPHParticle). I then get the SPHCollider components into a ComponentDataArray (an array of components). After this, I loop through all the unique sets of fluid particles (the ones with different properties). This is based on the boid (or “fish”) demo by Unity; if you don’t need different sets of properties, I don’t think you have to do it.
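Put together, a sketch of the OnUpdate() skeleton (index 0 of uniqueTypes holds the default shared component value, so the loop starts at 1):

List<SPHParticle> uniqueTypes = new List<SPHParticle>(10);

protected override JobHandle OnUpdate(JobHandle inputDeps)
{
    EntityManager.GetAllUniqueSharedComponentData(uniqueTypes);

    ComponentDataArray<SPHCollider> colliders =
        SPHColliderGroup.GetComponentDataArray<SPHCollider>();

    for (int typeIndex = 1; typeIndex < uniqueTypes.Count; typeIndex++)
    {
        SPHParticle settings = uniqueTypes[typeIndex];

        // Only process the chunks that share this settings value.
        SPHCharacterGroup.SetFilter(settings);

        // ... cache the data and schedule the jobs (detailed below) ...
    }

    uniqueTypes.Clear();
    return inputDeps;
}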

Now, let’s cache the data. I get the settings (fluid properties), the components, and all the values we’re going to iterate on: particlesPosition, particlesVelocity, particlesForces, etc. I put them in NativeArrays, with the allocator set to TempJob (it lasts for the duration of a job). You might wonder why we create a position NativeArray when there’s already a Position ComponentDataArray. It’s for the same reason we don’t set transform.position multiple times per frame: we read the data once at the start, modify the copy, and then write it back to the component at the end.

You can already see some changes to the original single-threaded workflow. We improve performance by using a hash map here (this is also inspired by the boid demo). I’ll talk more about it in a moment.

But first, I put all of the NativeArrays we created earlier into a struct, and then into a list. We do this so we can dispose of the old sets of NativeArrays later (needed when we have multiple unique shared components).

Now, let’s fill the NativeArrays using jobs. As you’d expect, I fill particlesPosition and particlesVelocity with the components’ values. These are the first jobs to be scheduled! To do that, I create the job with ParticlesPositionJob (the name of the job’s struct) and set its values. I then schedule the job, passing the particle count, the batch size, and the JobHandle it depends on (here it depends on nothing, so we just pass inputDeps). You can learn more about this here (Unity recommends keeping the batch size small when the work done per element is expensive).

We also use MemsetNativeArray, a utility job that fills an array with a default value.
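As a sketch, the position-copy job and a MemsetNativeArray reset can look like this (the batch size of 64 is arbitrary):

[BurstCompile]
struct ParticlesPositionJob : IJobParallelFor
{
    [ReadOnly] public ComponentDataArray<Position> positions;
    public NativeArray<float3> particlesPosition;

    public void Execute(int index)
    {
        particlesPosition[index] = positions[index].Value;
    }
}

// Scheduling: element count, batch size, and the handle it depends on.
ComponentDataArray<Position> positions =
    SPHCharacterGroup.GetComponentDataArray<Position>();
int particleCount = positions.Length;

JobHandle particlesPositionJobHandle = new ParticlesPositionJob
{
    positions = positions,
    particlesPosition = particlesPosition
}.Schedule(particleCount, 64, inputDeps);

// Reset the densities to zero before the simulation jobs accumulate into them.
JobHandle particlesDensityJobHandle = new MemsetNativeArray<float>
{
    Source = particlesDensity,
    Value = 0f
}.Schedule(particleCount, 64, inputDeps);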

It’s time to schedule the more important jobs, and we start with some optimization right away: I schedule a job that puts the particle positions into a hash map. You’ll notice that this time the job doesn’t depend on inputDeps but on particlesPositionJobHandle. This means it will wait for ParticlesPositionJob to finish before starting. I need this, otherwise I would read the positions array while it’s still being filled.

CombineDependencies allows me to combine multiple JobHandles into one, so that a job can depend on several previous jobs.
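A sketch of the hashing job and of combining handles (the .Concurrent hash map and its implicit conversion come from the preview API of the time; newer versions use AsParallelWriter()):

[BurstCompile]
struct HashPositions : IJobParallelFor
{
    [ReadOnly] public float cellRadius;
    public NativeArray<float3> positions;
    public NativeMultiHashMap<int, int>.Concurrent hashMap;

    public void Execute(int index)
    {
        // Hash the grid cell the particle falls into and store its index there.
        hashMap.Add(GridHash(positions[index], cellRadius), index);
    }

    public static int GridHash(float3 position, float radius)
    {
        return (int)math.hash(new int3(math.floor(position / radius)));
    }
}

JobHandle hashPositionsJobHandle = new HashPositions
{
    cellRadius = settings.radius,
    positions = particlesPosition,
    hashMap = hashMap // implicit conversion to .Concurrent
}.Schedule(particleCount, 64, particlesPositionJobHandle);

// A later job can wait on several handles at once:
JobHandle mergedHandle = JobHandle.CombineDependencies(
    hashPositionsJobHandle, particlesDensityJobHandle);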

HashPositions and MergeParticles both come from the boid demo, I think. I heavily modified MergeParticles to suit this project’s needs. It’s an IJobNativeMultiHashMapMergedSharedKeyIndices job, which is specific to hash maps (it was really hard to find out how it works, as it wasn’t documented anywhere except in one Unite talk). The point of this job is to give each particle the id of the hash-map bucket it’s in.

Finally! I can schedule the jobs that solve the particle-to-particle collisions. It’s quite simple: I just set each job’s data and schedule it. They all depend on the previous jobs.

I finish the job scheduling with the wall-collision solving and with writing the particle positions back to the Position components. Then the loop continues. Don’t forget to add uniqueTypes.Clear() before returning inputDeps once you’ve exited the loop.

The job structs aren’t really different from the methods I was calling in the single-threaded workflow. When we schedule a job, Unity calls its Execute(index) method for each particle, much like a compute shader. There are a few things to point out, though. Don’t forget to add [BurstCompile] at the top of the struct; it makes the job run up to 10x faster. I mark [ReadOnly] the values I only read from (a [WriteOnly] attribute also exists, in case you need it).
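As an example, here’s a sketch of the integration job, the simplest of the bunch (semi-implicit Euler, with acceleration = force / density as in the single-threaded version):

[BurstCompile]
struct Integrate : IJobParallelFor
{
    public float deltaTime;
    [ReadOnly] public NativeArray<float3> particlesForces;
    [ReadOnly] public NativeArray<float> particlesDensity;
    public NativeArray<float3> particlesPosition;
    public NativeArray<float3> particlesVelocity;

    public void Execute(int index)
    {
        float3 velocity = particlesVelocity[index]
            + deltaTime * particlesForces[index] / particlesDensity[index];
        particlesVelocity[index] = velocity;
        particlesPosition[index] += deltaTime * velocity;
    }
}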

Finally, we need to add an OnStopRunning() method to get rid of the NativeArrays we created but didn’t dispose of (when we quit the scene for example).
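A sketch, assuming the cached containers were stored in a list of structs (called previousParticles here for illustration):

protected override void OnStopRunning()
{
    for (int i = 0; i < previousParticles.Count; i++)
    {
        previousParticles[i].hashMap.Dispose();
        previousParticles[i].particlesPosition.Dispose();
        previousParticles[i].particlesVelocity.Dispose();
        // ... dispose the remaining NativeArrays of the set ...
    }
    previousParticles.Clear();
}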

And that’s it! It’s all working and we can look at how much better it runs!

2500 particles

Further optimizations

I’d like to explain how this version is better optimized than the single-threaded one. Instead of looping through all the particles to find colliding ones (O(n²)), we’re “only” checking the 26 neighboring cells plus the particle’s own cell (because of the size of the particles, we can expect a maximum of one particle per hash bucket).

We use this lookup everywhere we used to scan all the particles for collisions, as the sketch below shows.
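Concretely, inside a job’s Execute(index), the neighbor lookup is a 3×3×3 loop over the surrounding cells, reusing the GridHash helper from the hashing sketch above:

float3 position = particlesPosition[index];

for (int x = -1; x <= 1; x++)
for (int y = -1; y <= 1; y++)
for (int z = -1; z <= 1; z++)
{
    // Hash of the neighboring cell offset by (x, y, z).
    int hash = GridHash(position + new float3(x, y, z) * cellRadius, cellRadius);

    int j;
    NativeMultiHashMapIterator<int> iterator;
    if (hashMap.TryGetFirstValue(hash, out j, out iterator))
    {
        do
        {
            // ... accumulate density (or forces) between index and j ...
        } while (hashMap.TryGetNextValue(out j, ref iterator));
    }
}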

Benchmark

Here’s a video showing the three different versions: single-threaded, and ECS/Job system/Burst compiler with and without hashing.

Conclusion

As you can see, the ECS version is a lot faster (as expected). Using hashing also improved performance significantly. In my next projects, I won’t use only ECS: it’s still early, and I find it very hard to prototype directly with it. The Job System is a lot easier to adopt, though (you don’t always have to use both at the same time!). That’s why there’s a single-threaded version: now I prototype with MonoBehaviours while keeping parallelization in mind, and then, once it all works, I port it to ECS and/or the Job System.

The hardest part for me to grasp was the memory management: how to deal with NativeArrays, components, etc. That’s why I detailed how everything is allocated in the OnUpdate() method. I still don’t know much about how to create and remove entities efficiently while a game is running, though (I’ll have to look into that).

I recently realized that the Burst compiler is a big deal for optimization too! It works “out of the box”: we just add [BurstCompile] before a job struct and the code runs up to 10x faster. It’s also a very easy thing to forget (I tend to assume my jobs run with it by default, which is why I sometimes forget to add it and then wonder why performance isn’t as good as expected).

This is my first post on this topic and I’d love to have your opinion on this. If you have any advice or feedback regarding this post, I’d love to hear! Thanks for reading!

You can download the Unity project here: https://github.com/leonardo-montes/Unity-ECS-Job-System-SPH
