How To Populate Real-Time Worlds With Thousands Of Animated Characters
Making use of instanced static meshes in UE4 to render animated characters.
Introduction
Recently, we were tasked with making a system that needed to support thousands of animated characters in real-time, while still allowing enough performance overhead for a scene with several dynamic lights, dense particle effects, and photoreal environments and characters.
Here’s an example of the final result — with 200,000 characters.
The Problem
Anyone who has worked with large amounts of skeletally animated components will most likely have run into performance issues. At the time of writing, there is no native support for instancing skeletal meshes in UE4, so we incur at least one unique draw call cost for each character. On top of this, deforming skinned geometry on the CPU is inherently slower than just serving up a good old-fashioned static mesh.
As a result, when we came to create our system, the decision was made that we would render our characters as instanced static meshes and use the ever-popular technique of baking animation data to a texture and animating our characters in the vertex shader.
This decision presented some new problems, however. Traditionally, for every frame of animation data for a skeletal mesh, you bake each vertex position to a texture. This means that there is a direct relationship between an animation texture and a model’s topology.
As we needed to support multiple character variants, this would mean that each variant, with its own topology, would require an individually baked vertex position texture for the same animation.
Even worse, each LOD (level of detail) would require its own unique animation texture. Ultimately, we ended up with 19 different animation loops and four unique characters — each with three LOD states — so this approach would have required 228 unique vertex animation textures. That’s a lot of texture memory, especially when you consider that the data needs to be stored in an uncompressed HDR format — and that’s before we consider using additional textures for animation blending (we’ll come to this later!).
The Solution
It was clear that we needed a more lightweight and flexible solution that supported sharing animation data between characters with different topologies.
We had done something similar at Magnopus UK back in 2016, for an unreleased project where characters were made up of rigid, blocky joints. This allowed us to treat each joint as a rigid object, and therefore, instead of storing a position value per-vertex, we only needed to store a position and rotation value per joint.
For any given animation, we wrote the component location and rotation of each joint per frame to a texture. Then, we sampled this texture in the vertex shader to displace the vertices, to resolve each pose per frame.
As a pre-process, we also separated the character into its individual limbs and zeroed out their transforms. This meant that the pivot for each limb was shared with the component’s, so in order to resolve the pose, we could just rotate each limb about the component’s pivot and then translate it into position.
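As an illustrative sketch (not the project’s actual code), resolving a block character’s limb in this scheme boils down to a rotate-then-translate per joint — here simplified in Python to a single yaw rotation about the component origin:

```python
import math

def resolve_limb_vertex(vertex, yaw_degrees, joint_location):
    """Rotate a limb vertex about the component origin (simplified here
    to a yaw about Z), then translate it to the joint's location."""
    a = math.radians(yaw_degrees)
    x, y, z = vertex
    rx = x * math.cos(a) - y * math.sin(a)
    ry = x * math.sin(a) + y * math.cos(a)
    # Because each limb's pivot was zeroed to the component origin,
    # rotation happens first, then the joint's location places the limb.
    return tuple(c + t for c, t in zip((rx, ry, z), joint_location))
```

In the real system the rotation was a full quaternion sampled from the animation texture, but the ordering — rotate about the shared pivot, then translate into position — is the same.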
Below is what the unperturbed character geometry looked like…
…and here is a breakdown of how the pose was resolved in the vertex shader. In practice, this is, of course, resolved instantaneously for each frame before being served up for rasterization.
By decoupling the animation data from the topology, we were able to retarget multiple meshes to the same rig and have them use the same animation data.
The other benefit was that the height of our texture was defined not by the number of vertices in our mesh, but by the number of joints in our rig (multiplied by two, as we needed an entry for both location and rotation). The joint count was almost certainly going to be much smaller than the vertex count. In fact, this rig only had 16 joints, making the animation texture 32 pixels tall instead of roughly 2,000 pixels tall for the equivalent per-vertex bake.
Unfortunately for us, the character system we were tasked to create wasn’t made up of block people, so we needed to adapt this system to support more naturalistically animated characters.
Constructing a static mesh for vertex animation
We were able to easily adapt this workflow to our needs by putting one constraint in place: each vertex of a skinned mesh could be influenced by only one joint.
Rather than write the animation data in component space and break the joints apart — as we did for the block people — it is more elegant to encode translations and rotations as deltas from a bind pose, and serialise each joint’s bind-pose position in the UVs of the mesh. This meant that our mesh bounding boxes were more representative of our character, and the unperturbed mesh was just a T pose.
We wrote a script that evaluated a skinned mesh and set all of its vertices to be influenced by a single joint. Then, for each vertex, the script queried the single influencing joint’s location in scene space, and remapped this location between 0 and 1, given a bounds scale. We settled on a bounds scale of ‘200’ as it was large enough to encompass the entire mesh comfortably, but not so large as to introduce precision errors.
We then wrote this position to the UV coordinates of two UV sets — for the purposes of this article, let’s call them ‘A’ and ‘B’. The X and Y scene positions were written to the U and V coordinates of the A UV set respectively, and the Z scene positions were written to the U coordinate of the B UV set.
As a bonus, we also got a human-readable visualisation of the joint hierarchy inside the UV viewport!
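A minimal Python sketch of that encoding step, assuming the pivot is remapped from [-200, 200] into [0, 1] (the exact remap convention and the names here are illustrative, not the project’s Maya script):

```python
BOUNDS_SCALE = 200.0  # half-extent comfortably enclosing the whole mesh

def encode_pivot(joint_scene_position):
    """Remap a joint's scene-space pivot into 0..1 and split it across
    two UV sets: X/Y into set A, Z into the U coordinate of set B."""
    def remap(p):
        return (p / BOUNDS_SCALE) * 0.5 + 0.5  # [-200, 200] -> [0, 1]
    x, y, z = joint_scene_position
    uv_a = (remap(x), remap(y))
    u_b = remap(z)  # V of set B is reserved for the data-row index
    return uv_a, u_b
```

Every vertex influenced by a given joint receives that joint’s encoded pivot, which is what produces the joint-hierarchy visualisation in the UV viewport.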
The V component of the B UV set is then reserved for placing each group of vertices in its row of data.
The above example shows the vertices indexing into the location data. To index into the rotation data, it was just a case of subtracting 0.5 from the V coordinate in the shader.
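In Python terms, that row indexing might look like this — assuming location rows occupy one half of the texture and rotation rows the other, with pixel-centre sampling (an assumption, not the project’s exact layout):

```python
def data_row_v(joint_index, num_joints):
    """V coordinate of a joint's location row, assuming location data
    sits in one half of the texture and rotation data in the other."""
    return 0.5 + (joint_index + 0.5) / (2.0 * num_joints)

def rotation_row_v(joint_index, num_joints):
    # The shader simply subtracts 0.5 to jump from the location row
    # to the corresponding rotation row.
    return data_row_v(joint_index, num_joints) - 0.5
```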
This automated Maya process can then be applied to any other character that shares the same rig, to allow it to use the same animation data.
Once we set up the process for authoring a static mesh for vertex animation, we turned our attention to writing our animation data to textures.
Writing the animation loop textures
We generated all of the vertex animation textures as an offline editor process in UE4. We built a simple blueprint that could take a skeletal mesh and an array of animations and export a 16-bit HDR texture for each animation.
Firstly, we cached all of the joints’ component-space location and rotation data at the bind pose and spun up a temporary render target buffer, whose dimensions were determined by the number of joints in the skeleton (height) and the number of frames in the animation (width).
Secondly, we stepped through the animation at a given framerate (in our case, 30fps), and calculated the delta between each joint’s new component-space location and rotation and the bind-pose data we cached in the previous step.
We then wrote this data to the render target, where each row represented a joint, and each column represented a frame of animation. Rotation data was written to the top half of the texture and location data to the bottom. Writing this data to a single texture made swapping out animations trivial, as all animation data is stored in a single image.
We wrote a new column of data for each frame of the animation. Once we had exhausted all of the frames, we exported the render target to disk. Finally, once all of our animations were baked, we were able to import them back into the editor to be consumed as standard UTexture2Ds.
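The bake can be sketched as follows — a hypothetical Python stand-in for the blueprint, where the “texture” is just a 2-D list and rotation deltas are computed as the frame rotation times the inverse bind rotation (the quaternion convention here is an assumption):

```python
def quat_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_mul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def bake_animation(bind_poses, sampled_frames):
    """bind_poses: per joint, (location, rotation) at the bind pose.
    sampled_frames: per frame, a list of (location, rotation) per joint.
    Returns texture[row][column]: rotation-delta rows in one half,
    location-delta rows in the other, one column per animation frame."""
    num_joints = len(bind_poses)
    num_frames = len(sampled_frames)
    tex = [[None] * num_frames for _ in range(2 * num_joints)]
    for f, frame in enumerate(sampled_frames):
        for j, (loc, rot) in enumerate(frame):
            bind_loc, bind_rot = bind_poses[j]
            # Delta rotation takes the joint from bind pose to this frame.
            tex[j][f] = quat_mul(rot, quat_conj(bind_rot))
            # Delta location in component space.
            tex[num_joints + j][f] = tuple(c - b for c, b in zip(loc, bind_loc))
    return tex
```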
Writing the vertex animation shader
Once we had all the data we needed to get our characters moving, we had to write the shader to unpack it all appropriately.
Anyone who has played around with some of Houdini’s vertex animation pipeline for UE4 or Unity may already be familiar with most of the concepts covered in this section.
Animation playback
We fed the animation position into the material via a scalar parameter. The AnimPosition was interpreted as seconds. We then scaled this value by the framerate (FPS) at which our animation was recorded, before adding a random offset for animation variation (more on this later!).
Finally, we scaled this value by the length of our animation, so that we could index into the correct frame of the texture for the given anim position.
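A hedged sketch of that playback maths in Python (the pixel-centre offset and the way the random offset is applied are assumptions, not the project’s exact material graph):

```python
def frame_u(anim_position, fps, num_frames, instance_offset=0.0):
    """Convert an AnimPosition in seconds to a normalized U coordinate
    into the animation texture (one column per frame, looping)."""
    frame = anim_position * fps + instance_offset * num_frames
    frame = frame % num_frames          # wrap so the animation loops
    return (frame + 0.5) / num_frames   # sample pixel centres
```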
Reading the baked vertex anim data
Next, we needed to construct a UV coordinate to index into the vertex animation texture and return the correct value for the vertex being evaluated. We did this by making a Float2, with the X (or U) component being the anim position, and the Y (or V) component being the row in which our data was stored.
As mentioned, we stored both location and rotation data in a single texture. However, we needed to perform an individual texture lookup for each before combining the results.
Unpacking the joint pivots from the UV channels
Once we were querying the right animation data, we just needed to use it to transform our vertices. We had the quaternion rotation data from the texture, but we still needed the pivot about which to rotate our vertices. We had encoded this into the UVs of our mesh as part of our Maya automation process, so it was then just a matter of converting the normalized UV coordinates back to the correct scale.
Note that Maya uses the Y-Up coordinate system, whereas UE4 uses Z-Up, which is why we swapped these two values in our MakeFloat3 node.
Next, we resolved the quaternion delta rotation about the pivot (transformed to local space) and combined this with the result of our translational delta.
Finally, we needed to convert this from local to world space using the TransformVector node, and pass it into the World Position Offset pin on the result node, and — success!
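Putting the shader steps together, a CPU-side Python sketch of the per-vertex resolution might look like this (the quaternion convention and names are illustrative; the real work happens in the material’s vertex shader):

```python
def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    cx = y * v[2] - z * v[1] + w * v[0]
    cy = z * v[0] - x * v[2] + w * v[1]
    cz = x * v[1] - y * v[0] + w * v[2]
    return (v[0] + 2.0 * (y * cz - z * cy),
            v[1] + 2.0 * (z * cx - x * cz),
            v[2] + 2.0 * (x * cy - y * cx))

def offset_vertex(vertex, pivot, delta_rotation, delta_location):
    """World Position Offset equivalent: rotate the vertex about its
    joint's bind-pose pivot by the sampled delta rotation, add the
    sampled delta translation, and return the offset from the
    original vertex position."""
    local = tuple(a - b for a, b in zip(vertex, pivot))
    rotated = quat_rotate(delta_rotation, local)
    posed = tuple(r + p + t for r, p, t in zip(rotated, pivot, delta_location))
    return tuple(p - v for p, v in zip(posed, vertex))
```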
Once we had a performant, animated character, we could use Unreal’s hierarchical instanced static mesh component to produce several thousand instances of them.
Variation
There’s nothing more unsettling than 200,000 people moving in perfect unison, so we were quick to add some variation.
We made extensive use of the per-instance random variable (made available to us in the shader by the instanced static mesh component) to make additional variations. This is a random value between 0 and 1 that is unique for each instance of a mesh. We used this value to interpolate between different scales for the characters, and also evaluated it to flip our characters horizontally for more variation in the same draw call.
Most importantly, we used this to offset our anim position value so that our characters started their animation loops at different times.
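As a sketch, deriving all three variations from the single per-instance random value might look like this in Python (the scale range and flip threshold are invented for illustration):

```python
def instance_variation(per_instance_random, min_scale=0.9, max_scale=1.1):
    """Derive scale, mirror flag, and anim-position offset from the one
    per-instance random value in [0, 1] the instanced static mesh
    component exposes to the shader."""
    scale = min_scale + per_instance_random * (max_scale - min_scale)
    flip = per_instance_random > 0.5   # mirror roughly half the instances
    anim_offset = per_instance_random  # desynchronise loop start times
    return scale, flip, anim_offset
```

One value driving several visually independent variations is what lets a single draw call read as a crowd rather than an army of clones.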
This gave us quite a lot of mileage out of the single mesh draw call. At the expense of more draw calls, we peppered in some more character variants.
For additional variation, we used different animation loops per instance. We were paying the cost of a unique draw call per mesh instance, so why not serve up a different animation too?
Although this meant that all characters of a particular type would be playing the same animation, we found that for our use case, this was unnoticeable.
LODs (Levels of Detail)
As our vertex animation textures were decoupled from the vertex count, it was trivial to add support for mesh LODs — we could simply provide a lower resolution mesh for the Maya automation process, and we had ourselves a LOD.
For our lowest LOD, we considered adopting the tried-and-tested approach of camera-facing sprites. However, this presented a few problems:
- The technique would not scale well to VR platforms, as the illusion would break when rendered stereoscopically.
- We were not limited to a single direction from which we could view the characters, therefore we would need to adopt an ‘impostor’ billboard approach to have multiple viewing angles of each character.
- The characters still needed to convey movement, which had to match those of the fully-realised characters, so we would need a bank of animation loops — all from multiple angles — in giant sprite sheets.
- Dense groups of characters would result in a lot of translucency overdraw.
Instead, we made a ‘spline’ representation of a character in Maya that was only 70 triangles, which used the exact same animation data as the LOD 0 characters. We encoded a width value into the G channel of the vertex colours, and fed this into a simplified version of the ‘SplineThicken’ node, which allowed us to direct the faces of the geometry towards the camera. The effect was not without artefacts, but held up well at a distance, and, most importantly, blended seamlessly between LOD states without the need for any additional animation logic/data.
Future work / extending the system
We also implemented a blending system to allow for smooth transitions between loop states at runtime. As an offline process in UE4, a skeletal mesh performed a procedural animation blend between two animation states, and baked this out to a bespoke, one-shot blend texture. We then read this pre-baked texture at runtime when we wanted to switch from one animation loop to another.
The main issue with this approach was that it didn’t scale well. In fact, the storage footprint scaled quadratically with the number of animation loops, and with our 19 loops, we had to author 462 transition textures, adding up to just under 60MB of data.
We could have mitigated this by baking the animation data at runtime to a dynamic render target. This would have allowed us to do more sophisticated animation blueprint driven blending logic at runtime, without requiring a bank of blend textures, at the cost of some runtime performance.
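A runtime blend of two baked joint samples could be sketched like this — lerping the location deltas and normalised-lerping (nlerp) the rotation deltas, a common cheap stand-in for slerp when writing blended frames into a dynamic render target (names and the nlerp choice are assumptions):

```python
def blend_pose(loc_a, rot_a, loc_b, rot_b, alpha):
    """Blend two (location delta, rotation delta) joint samples."""
    loc = tuple(a + (b - a) * alpha for a, b in zip(loc_a, loc_b))
    # Flip one quaternion if needed so we interpolate the shorter arc.
    dot = sum(a * b for a, b in zip(rot_a, rot_b))
    rot_b = rot_b if dot >= 0 else tuple(-c for c in rot_b)
    raw = tuple(a + (b - a) * alpha for a, b in zip(rot_a, rot_b))
    norm = sum(c * c for c in raw) ** 0.5
    rot = tuple(c / norm for c in raw)  # renormalise back onto the unit sphere
    return loc, rot
```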
Another way of extending the system would be to introduce more variation per mesh instance by encoding morph targets into the vertex colors and/or normals of the geometries, similar to the way that the ‘Static Mesh Morph Targets’ system works. (https://docs.unrealengine.com/en-US/WorkingWithContent/Types/StaticMeshes/MorphTargets/index.html).
In conclusion, we were able to create a system that allowed us to extensively reuse animation data, which scaled well and ran at a low, mostly-fixed performance overhead.