How To Populate Real-Time Worlds With Thousands Of Animated Characters

Making use of instanced static meshes in UE4 to render animated characters.

Ross Beardsall
XRLO — eXtended Reality Lowdown
12 min read · Jan 27, 2021


Introduction

Recently, we were tasked with making a system that needed to support thousands of animated characters in real-time, while still leaving enough performance headroom for a scene with several dynamic lights, dense particle effects, and photoreal environments and characters.

Here’s an example of the final result — with 200,000 characters.

200,000 characters in UE4, with ~0.5ms of frame time for the character system on an NVIDIA GTX 1080 at full HD

The Problem

Anyone who has worked with large numbers of skeletally animated components will most likely have run into performance issues. At the time of writing, there is no native support for instancing skeletal meshes in UE4, so we incur at least one unique draw call for each character. On top of this, deforming skinned geometry on the CPU is inherently slower than just serving up a good old-fashioned static mesh.

As a result, when we came to create our system, we decided to render our characters as instanced static meshes, using the ever-popular technique of baking animation data to a texture and animating the characters in the vertex shader.

This decision presented some new problems, however. Traditionally, for every frame of animation data for a skeletal mesh, you bake each vertex position to a texture. This means that there is a direct relationship between an animation texture and a model’s topology.

A 1:1 example of a baked per-vertex animation texture from one of our character models. This example's positional data is normalized for output as LDR, trading precision for lower texture memory consumption, but it is still explicitly tied to the source mesh's geometry

As we needed to support multiple character variants, this would mean that each variant, with its own topology, would require an individually baked vertex position texture for the same animation.

Even worse, each LOD (level of detail) would require its own unique animation texture. Ultimately, we ended up with 19 different animation loops and four unique characters, each with three LOD states, so this approach would have required 19 × 4 × 3 = 228 unique vertex animation textures. That's a lot of texture memory, especially when you consider that the data needs to be in an uncompressed HDR format, and that's before we add further textures for animation blending (we'll come to this later!).

The Solution

It was clear that we needed a more lightweight and flexible solution that supported sharing animation data between characters with different topologies.

We had done something similar at Magnopus UK back in 2016, for an unreleased project where characters were made up of rigid, blocky joints. This allowed us to treat each joint as a rigid object, and therefore, instead of storing a position value per-vertex, we only needed to store a position and rotation value per joint.

An example of some static mesh characters that are fully vertex animated.

For any given animation, we wrote the component location and rotation of each joint per frame to a texture. Then, we sampled this texture in the vertex shader to displace the vertices, to resolve each pose per frame.

As a pre-process, we also separated the character into its individual limbs and zeroed out their transforms. This meant that the pivot for each limb was shared with the component’s, so in order to resolve the pose, we could just rotate each limb about the component’s pivot and then translate it into position.
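
As a rough sketch of that pose resolution (plain Python rather than the actual vertex shader, and assuming the per-joint rotations are stored as quaternions, as they are in the later system), resolving a limb's vertex for a given frame looked conceptually like this:

    # Conceptual sketch (not the production shader) of posing a rigid limb:
    # because each limb's transform was zeroed to the component's pivot,
    # the pose is simply "rotate about the component pivot, then translate".

    def quat_rotate(q, v):
        """Rotate vector v by unit quaternion q = (x, y, z, w)."""
        qx, qy, qz, qw = q
        # t = 2 * cross(q.xyz, v)
        tx = 2.0 * (qy * v[2] - qz * v[1])
        ty = 2.0 * (qz * v[0] - qx * v[2])
        tz = 2.0 * (qx * v[1] - qy * v[0])
        # v' = v + w * t + cross(q.xyz, t)
        return (v[0] + qw * tx + (qy * tz - qz * ty),
                v[1] + qw * ty + (qz * tx - qx * tz),
                v[2] + qw * tz + (qx * ty - qy * tx))

    def pose_limb_vertex(vertex, joint_rotation, joint_location):
        """joint_rotation and joint_location are the per-joint values read from
        the animation texture for the current frame, in component space."""
        rotated = quat_rotate(joint_rotation, vertex)
        return (rotated[0] + joint_location[0],
                rotated[1] + joint_location[1],
                rotated[2] + joint_location[2])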

Below is what the unperturbed character geometry looked like…

…and here is a breakdown of how the pose was resolved in the vertex shader. In practice, this is, of course, resolved instantaneously for each frame before being served up for rasterization.

By decoupling the animation data from the topology, we were able to retarget multiple meshes to the same rig and have them use the same animation data.

Three characters with unique topologies targeting the same set of animation data.

The other benefit was that the height of our texture was defined not by the number of vertices in our mesh, but by the number of joints in our rig (multiplied by two, as we needed a row for both location and rotation). The joint count was almost certainly going to be much smaller than the vertex count. In fact, this rig had only 16 joints, making the animation texture 32 pixels tall instead of the roughly 2,000 pixels needed for the equivalent per-vertex bake.

The above animation as a vertex animation texture (1:1 scale): 32 pixels tall (per-joint location and rotation data) by 129 pixels wide (4.3 seconds of animation at 30fps)
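
For reference, the sizing in that example works out as simple arithmetic:

    # Texture dimensions for the per-joint bake in the example above.
    joints = 16
    rows = joints * 2                        # a location row and a rotation row per joint
    fps = 30
    duration_seconds = 4.3
    columns = round(duration_seconds * fps)  # one column per sampled frame

    print(rows, columns)                     # 32 x 129, versus ~2000 rows for a per-vertex bake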

Unfortunately for us, the character system we were tasked to create wasn’t made up of block people, so we needed to adapt this system to support more naturalistically animated characters.

Constructing a static mesh for vertex animation

We were able to easily adapt this workflow to our needs by putting one constraint in place: each vertex of a skinned mesh could be influenced by only one joint.

Rather than writing the animation data in component space and breaking apart the joints, as we did for the block people, it was more elegant to encode translations and rotations as deltas from a bind pose and to serialise each joint's bind pose position in the UVs of the mesh. This meant that our mesh bounding boxes were more representative of our character, and the unperturbed mesh was just a T-pose.

We wrote a script that evaluated a skinned mesh and set all of its vertices to be influenced by a single joint. Then, for each vertex, the script queried the single influencing joint’s location in scene space, and remapped this location between 0 and 1, given a bounds scale. We settled on a bounds scale of ‘200’ as it was large enough to encompass the entire mesh comfortably, but not so large as to introduce precision errors.

We then wrote this position to the UV coordinates of two UV sets — for the purposes of this article, let’s call them ‘A’ and ‘B’. The X and Y scene positions were written to the U and V coordinates of the A UV set respectively, and the Z scene positions were written to the U coordinate of the B UV set.
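
A minimal sketch of that packing step is below (plain Python; in production this ran as a Maya script over every vertex). The symmetric [-200, 200] to [0, 1] remap is an assumption about the exact convention; the text above only fixes the bounds scale itself.

    # Hypothetical sketch of the UV packing. BOUNDS_SCALE and the symmetric remap
    # are assumptions; the real script queried the influencing joint via the Maya API.
    BOUNDS_SCALE = 200.0

    def normalize(value):
        """Remap a scene-space coordinate into the 0-1 range stored in the UVs."""
        return (value + BOUNDS_SCALE) / (2.0 * BOUNDS_SCALE)

    def pack_joint_position(joint_position_xyz):
        """Returns the 'A' UV pair (X, Y) and the U coordinate of the 'B' UV set (Z)."""
        x, y, z = joint_position_xyz
        uv_a = (normalize(x), normalize(y))
        uv_b_u = normalize(z)
        return uv_a, uv_b_u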

The script at work, packing the joint location data into the UVs.

As a bonus, we also got a human-readable visualisation of the joint hierarchy inside the UV viewport!

The V component of the 'B' UV set (the one carrying Z in its U coordinate) was then reserved for indexing each group of vertices into its row of data.

The isolated V component of the 'B' UV set, with an example vertex animation texture overlaid. Notice how each group of vertices indexes directly into the centre of its row of pixels.

The above example shows the vertices indexing into the location data. To index into the rotation data, it was just a case of subtracting 0.5 from the V coordinate in the shader.
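As a sketch, the reserved V coordinate can be derived like this (assuming, as described above, that each joint owns one location row in one half of the texture and one rotation row in the other, with V snapped to pixel centres):

    # Sketch of the row-indexing convention. Which half holds which data type is an
    # assumption; what matters is that the two halves are exactly 0.5 apart in V.

    def location_row_v(joint_index, joint_count):
        rows = joint_count * 2
        return (joint_count + joint_index + 0.5) / rows   # centre of this joint's location row

    def rotation_row_v(joint_index, joint_count):
        return location_row_v(joint_index, joint_count) - 0.5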

This automated Maya process could then be applied to any other character that shares the same rig, allowing it to use the same animation data.

Four different character meshes that have been through the exact same process to encode joint positions into the UVs

Once we had set up the process for authoring a static mesh for vertex animation, we turned our attention to writing our animation data to textures.

Writing the animation loop textures

We generated all of the vertex animation textures as an offline editor process in UE4. We built a simple blueprint that could take a skeletal mesh and an array of animations and export a 16-bit HDR texture for each animation.

Firstly, we cached all of the joints' component space location and rotation data at the bind pose, and spun up a temporary render target buffer whose dimensions were determined by the number of joints in the skeleton (height, with two rows per joint) and the number of frames in the animation (width).

Secondly, we scrubbed through the animation at a given framerate (in our case, 30fps), and calculated the delta between each joint's new component space location and rotation and the data we cached in the previous step.

We then wrote this data to the render target, where each row represented a joint and each column represented a frame of animation. Rotation data was written to the top half of the texture and location data to the bottom. Keeping everything in a single texture made swapping out animations trivial, as each loop's data lives in one image.

An example of a per-joint animation texture. The top half is quaternion rotation (W in alpha) and the bottom half is location, all relative to the bind pose. The actual texture is stored in a 16-bit HDR format.

We wrote a new column of data for each frame of the animation. Once we had exhausted all of the frames, we exported the render target to disk. Finally, once all of our animations were baked, we were able to import them back into the editor to be consumed as standard UTexture2Ds.
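
Structurally, the bake boils down to a nested loop over frames and joints. Here is a hedged sketch in Python, with NumPy standing in for the render target; bake_animation, sample_pose and the (location, rotation) tuples are hypothetical stand-ins, since the real tool was a UE4 editor blueprint:

    # Structural sketch of the offline bake; not the actual UE4 blueprint.
    import numpy as np

    def quat_conjugate(q):
        x, y, z, w = q
        return (-x, -y, -z, w)

    def quat_mul(a, b):
        ax, ay, az, aw = a
        bx, by, bz, bw = b
        return (aw * bx + ax * bw + ay * bz - az * by,
                aw * by + ay * bw + az * bx - ax * bz,
                aw * bz + az * bw + ax * by - ay * bx,
                aw * bw - ax * bx - ay * by - az * bz)

    def bake_animation(bind_pose, length_seconds, sample_pose, fps=30):
        """bind_pose: list of (location, rotation) per joint, component space.
        sample_pose(time) returns the same structure for the pose at `time` seconds."""
        joint_count = len(bind_pose)
        frames = int(round(length_seconds * fps))
        texture = np.zeros((joint_count * 2, frames, 4), dtype=np.float16)  # 16 bits per channel

        for frame in range(frames):
            pose = sample_pose(frame / fps)
            for j in range(joint_count):
                bind_loc, bind_rot = bind_pose[j]
                loc, rot = pose[j]
                # Rotation delta from the bind pose -> one half of the texture, W in alpha.
                texture[j, frame] = quat_mul(rot, quat_conjugate(bind_rot))
                # Location delta from the bind pose -> the other half.
                texture[joint_count + j, frame] = (loc[0] - bind_loc[0],
                                                   loc[1] - bind_loc[1],
                                                   loc[2] - bind_loc[2],
                                                   1.0)
        return texture  # exported to disk as a 16-bit HDR image, one column per frame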

Writing the vertex animation shader

Once we had all the data we needed to get our characters moving, we had to write the shader to unpack it all appropriately.

Anyone who has played around with some of Houdini’s vertex animation pipeline for UE4 or Unity may already be familiar with most of the concepts covered in this section.

Animation playback

We fed the animation position into the material via a scalar parameter, AnimPosition, interpreted as seconds. We then scaled this value by the framerate (FPS) at which our animation was recorded, before adding a random offset for animation variation (more on this later!).

Finally, we scaled this value by the length of our animation so that we could index into the correct frame of the texture for the given anim position.

Shader graph for converting an AnimPosition (seconds) into a normalized U coordinate for looking up into an animation texture
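
In plain Python, the playback logic above amounts to something like this (the exact wrap and offset arithmetic is our reading of the graph, not a verbatim transcription):

    # Sketch of converting AnimPosition (seconds) into a normalized U coordinate.
    # per_instance_random is the 0-1 value exposed by the instanced static mesh component.

    def anim_position_to_u(anim_position_seconds, per_instance_random, frame_count, fps=30.0):
        frame = anim_position_seconds * fps             # seconds -> frames
        frame += per_instance_random * frame_count      # per-instance offset for variation
        return (frame % frame_count) / frame_count      # wrap the loop and normalize to 0-1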

Reading the baked vertex anim data

Next, we needed to construct a UV coordinate to index into the vertex animation texture and retrieve the correct value for the vertex we were evaluating. We did this by making a Float2, with the X (or U) component being the anim position, and the Y (or V) component being the row our data lived in.

As mentioned, we stored both location and rotation data in a single texture. However, we needed to perform an individual texture lookup for each before combining the results.

Shader graph composing the anim position with the row position, to create UV coordinates to retrieve the data from the vertex animation texture
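
A CPU-side analogue of those two lookups, reading from the baked array produced by the earlier bake sketch, might look like this (the row arithmetic mirrors the 0.5 V offset described earlier):

    # Sketch of the two texture fetches the material performs per vertex.
    import numpy as np

    def sample_anim_data(texture, anim_u, location_row_v):
        """texture: the (rows x frames x 4) array from the bake sketch.
        Returns (rotation_delta, location_delta) for this vertex's joint at this frame."""
        rows, frames, _ = texture.shape
        column = min(int(anim_u * frames), frames - 1)
        loc_row = min(int(location_row_v * rows), rows - 1)
        rot_row = int((location_row_v - 0.5) * rows)    # the other half of the texture
        rotation = texture[rot_row, column]             # quaternion delta, W in alpha
        location = texture[loc_row, column, :3]         # translation delta
        return rotation, location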

Unpacking the joint pivots from the UV channels

Once we were querying the right animation data, we just needed to use it to transform our vertices. We had the quaternion rotation data from the texture, but we still needed the pivot about which to rotate our vertices. We had encoded this into the UVs of our mesh as part of our Maya automation process, so it was just a matter of converting the normalized UV coordinates back to the correct scale.

Note that Maya uses the Y-Up coordinate system, whereas UE4 uses Z-Up, which is why we swapped these two values in our MakeFloat3 node.

Using Remap ValueRange to unpack the joint pivots from the UVs

Next, we resolved the quaternion delta rotation about the pivot (transformed to local space) and combined the result with our translational delta.

Performing a quaternion rotation about each joint’s pivot

Finally, we converted this from local to world space using the TransformVector node and passed it into the World Position Offset pin on the result node. Success!
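
Putting the pieces together, a plain-Python mirror of what the vertex shader does might look like the sketch below. It reuses the quat_rotate helper from the earlier block-people sketch, and the Y/Z swap plus the inverse of the assumed [-200, 200] remap are our assumptions, as noted above:

    # End-to-end sketch of resolving one vertex: unpack the joint pivot from the UVs,
    # rotate the vertex about that pivot by the baked rotation delta, then add the
    # baked translation delta. quat_rotate is the helper from the earlier sketch.

    BOUNDS_SCALE = 200.0  # same assumed bounds scale as the Maya-side packing sketch

    def unpack_pivot(uv_a, uv_b_u):
        def expand(value):
            return value * 2.0 * BOUNDS_SCALE - BOUNDS_SCALE  # inverse of the 0-1 remap
        maya_x, maya_y, maya_z = expand(uv_a[0]), expand(uv_a[1]), expand(uv_b_u)
        return (maya_x, maya_z, maya_y)                       # swap Y/Z: Maya Y-up -> UE4 Z-up

    def world_position_offset(local_position, uv_a, uv_b_u, rotation_delta, location_delta):
        pivot = unpack_pivot(uv_a, uv_b_u)
        relative = tuple(local_position[i] - pivot[i] for i in range(3))
        rotated = quat_rotate(rotation_delta, relative)
        posed = tuple(rotated[i] + pivot[i] + location_delta[i] for i in range(3))
        # World Position Offset expects a delta from the unperturbed vertex position.
        return tuple(posed[i] - local_position[i] for i in range(3))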

“Look ma’, no bones!”

Once we had a performant, animated character, we could use Unreal's hierarchical instanced static mesh component to produce several thousand instances of it.

Variation

There’s nothing more unsettling than 200,000 people moving in perfect unison, so we were quick to add some variation.

We made extensive use of the per-instance random variable (made available in the shader by the instanced static mesh component), a random value between 0 and 1 that is unique to each instance of a mesh. We used it to interpolate between different scales for the characters, and also to flip some characters horizontally for more variation within the same draw call.

Most importantly, we used this to offset our anim position value so that our characters started their animation loops at different times.
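
Here is a sketch of how that single per-instance random value was stretched across several kinds of variation (the scale range and flip rule are illustrative, not our production values):

    # Illustrative sketch: one 0-1 per-instance random value driving scale,
    # mirroring, and the animation start offset.

    def instance_variation(per_instance_random, min_scale=0.9, max_scale=1.1):
        scale = min_scale + per_instance_random * (max_scale - min_scale)
        mirror_x = -1.0 if per_instance_random > 0.5 else 1.0  # flip half the crowd
        anim_offset = per_instance_random                       # staggers each loop's start
        return scale, mirror_x, anim_offset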

One mesh draw call, but all different scales (including mirrored) and animation offsets

This gave us quite a lot of mileage out of the single mesh draw call. At the expense of more draw calls, we peppered in some more character variants.

Same as before, but with four different character meshes, meaning an additional draw call per unique mesh

For additional variation, we used a different animation loop per character mesh. We were already paying the cost of a unique draw call per mesh, so why not serve up a different animation too?

For roughly the same cost (a bit of additional texture memory), we can have a unique animation per character mesh, as above

Although this meant that all characters of a particular type would be playing the same animation, we found that for our use case, this was unnoticeable.

LODs (levels of detail)

As our vertex animation textures were decoupled from the vertex count, it was trivial to add support for mesh LODs — we could simply provide a lower resolution mesh for the Maya automation process, and we had ourselves a LOD.

Three mesh LODs, all referencing the same vertex animation texture

For our lowest LOD, we considered adopting the tried-and-tested approach of camera-facing sprites. However, this presented a few problems:

  • The technique would not scale well to VR platforms, as the illusion would break when rendered stereoscopically.
  • We were not limited to a single direction from which the characters could be viewed, so we would need to adopt an ‘impostor’ billboard approach to support multiple viewing angles of each character.
  • The characters still needed to convey movement that matched the fully-realised characters, so we would need a bank of animation loops, all from multiple angles, in giant sprite sheets.
  • Dense groups of characters would result in a lot of translucency overdraw.

Instead, we made a ‘spline’ representation of a character in Maya that was only 70 triangles, which used the exact same animation data as the LOD 0 characters. We encoded a width value into the G channel of the vertex colours, and fed this into a simplified version of the ‘SplineThicken’ node, which allowed us to direct the faces of the geometry towards the camera. The effect was not without artefacts, but held up well at a distance, and, most importantly, blended seamlessly between LOD states without the need for any additional animation logic/data.

An example of the lowest LOD for our characters. Notice how the faces are mostly aligned towards the camera vector.

Future work / extending the system

We also implemented a blending system to allow for smooth transitions between loop states at runtime. As an offline process in UE4, a skeletal mesh performed a procedural animation blend between two animation states, and baked this out to a bespoke, one-shot blend texture. We then read this pre-baked texture at runtime when we wanted to switch from one animation loop to another.

The main issue with this approach was that it didn’t scale well: the number of transition textures grows combinatorially with the number of animation loops. With our 19 loops, we had to author 462 transition textures, adding up to just under 60MB of data.

We could have mitigated this by baking the animation data at runtime to a dynamic render target. This would have allowed us to do more sophisticated animation blueprint driven blending logic at runtime, without requiring a bank of blend textures, at the cost of some runtime performance.

Another way of extending the system would be to introduce more variation per mesh instance by encoding morph targets into the vertex colors and/or normals of the geometries, similar to the way that the ‘Static Mesh Morph Targets’ system works (https://docs.unrealengine.com/en-US/WorkingWithContent/Types/StaticMeshes/MorphTargets/index.html).

In conclusion, we were able to create a system that allowed us to extensively reuse animation data, which scaled well and ran at a low, mostly-fixed performance overhead.
