Creating a custom animation system for Unity

Blaž Tomažič
Published in Outfit7
Jul 10, 2023

TL;DR: Building a custom animation system to reduce animation asset size by 50%, and solving all the problems that come with it.

I’m Blaž Tomažič, a Senior Software Engineer at Outfit7. I’ve been developing with the company for five years, working mostly on low-level parts of game development. I just love controlling the hardware and trying to get the most out of it. This is just one of the reasons why I was interested in decreasing our app size.

In this article, I’ll touch upon the problem of app size in the mobile space and how animations are one of its primary causes. Next, I’ll demonstrate the implementation of a custom animation system that can replace Unity’s but has a smaller asset size footprint. Finally, I’ll go over the results and evaluate our solution.

The app size problem

With each new game, and even with updates to existing games, we’re trying to deliver better quality content: nicer graphics, smoother animations, crisper sound, more assets, and so on. But each of these elements increases app size, and the problem is that a larger app size leads to a lower install rate.

We aim to make sure our apps are less than 150MB, so that they can be downloaded in one go. Therefore, we’re always trying to reduce app size however we can.

One of the opportunities to do that comes with the animation system. We discovered that by replacing the Unity animation system with a custom one we could lower the size of animation assets by as much as 50%.

Base measurement

For our testing ground, we’ll take the My Talking Angela game. For comparison, let’s throw in the sequel as well:

We can see that animations account for upwards of 15% of the final app size. Newer games use even more space for animations because there are more of them and they are also more detailed.

Animations: A quick refresher

Before delving into how exactly to replace the Unity animation system, let’s first go through a small refresher on how animations work. (Feel free to skip this section if you’re already familiar.)

Objects in games are normally animated using skeletons and animations.

A skeleton is a hierarchy of joints (or bones) that represent the animated object. For a humanoid character that would be shoulder, elbow, wrist, and so on.

An animation is a collection of animation tracks. Each animation track represents a change of one joint over time relative to its parent joint. For example, the position of the wrist relative to the elbow at times 0, 1, 2, etc.

Game engines use the skeleton to find which transforms are affected, and the animations to sample the values that should be applied to those transforms.

Modified transforms later affect the vertices of the rendered mesh (skin) so that the displayed object appears animated.
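
To make this concrete, here is a minimal, illustrative data model (not the article’s code) of a joint track storing keyframes of a joint’s local transform:

using UnityEngine;

// One keyframe of a joint's transform, expressed relative to its parent joint.
struct JointKeyframe {
    public float Time;
    public Vector3 LocalPosition;
    public Quaternion LocalRotation;
    public Vector3 LocalScale;
}

// One animation track: the keyframes of a single joint over time.
struct JointTrack {
    public string JointName;          // e.g. "Wrist"
    public JointKeyframe[] Keyframes; // interpolated between at playback time
}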

For a more in-depth explanation of skeletal animation, check out the wiki.

Ozz animation system

Writing our own animation system from scratch would take too much time, so let’s use the tried and tested Ozz animation system. (We used it in our own proprietary game engine and it turned out great.) It works on all mobile (Android, iOS) and desktop (Windows, macOS, Linux) platforms, and has an open-source license.

For greater speed, it uses linear interpolation of animation data, and for a lower runtime/package size it stores animation data using half-precision floats.

Import Ozz into C#

Ozz is compiled as a native library, so we need to dynamically link it from C#. But, instead of linking the whole Ozz library (libozz_animation.so) directly, we’ll create a wrapper library (libozz_animation_bindings.so). The wrapper will expose only the functionality that we need. Doing this will lead to a smaller API/ABI surface — and a smaller library size.

In the C++ wrapper library we expose the functions (that mostly just forward to Ozz library) like so:

#define EXPORT __attribute__((visibility("default")))

extern "C" {

EXPORT ozz::animation::Animation* ozz_animation_new() {
    return new ozz::animation::Animation();
}

EXPORT void ozz_animation_free(ozz::animation::Animation* animation) {
    delete animation;
}

}  // extern "C"

And in C# we import them:

using System.Runtime.InteropServices;

static class Bindings {
    [DllImport("ozz_animation_bindings", EntryPoint = "ozz_animation_new")]
    public static extern System.IntPtr OzzAnimationNew();

    [DllImport("ozz_animation_bindings", EntryPoint = "ozz_animation_free")]
    public static extern void OzzAnimationFree(System.IntPtr animation);
}

Next, we link the shared wrapper library (libozz_animation_bindings.so) with the static build of Ozz library (libozz_animation.a) by using CMake:

set(BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE)
add_subdirectory("subprojects/ozz-animation" "ozz_animation" EXCLUDE_FROM_ALL)

add_library(ozz_animation_bindings SHARED src/ozz_bindings.cpp)
target_link_libraries(ozz_animation_bindings PRIVATE ozz_animation)

target_link_options(ozz_animation_bindings PRIVATE -Wl,--gc-sections)
target_link_options(ozz_animation_bindings PRIVATE -Wl,--exclude-libs,ALL)

With the last step, the linker automatically removes unused code for us! This results in a 0.4MB library, instead of the 0.9MB we would have without the wrapper.

To be really sure that unused code is removed, we can check the function symbols defined in the produced library:

$ llvm-nm -DC libozz_animation_bindings.so
0000000000002310 T ozz_animation_free
00000000000022e0 T ozz_animation_new
…Only a handful of other symbols…

Wrap Ozz classes

To use the bindings safely from C#, we wrap them in C# classes. We do this to uphold memory safety and make sure all the calls into the library have valid inputs and outputs. All the code should use these classes and not the imported functions directly.

To avoid reinventing the Ozz API, we simply mirror the Ozz C++ methods in C#. For example, the OzzAnimation class on the C# side looks like this:

public class OzzAnimation : IDisposable {
    private IntPtr NativePtr;

    public OzzAnimation() {
        NativePtr = Bindings.OzzAnimationNew(); // 👈 (1)
    }

    ~OzzAnimation() {
        Dispose();
    }

    public void Dispose() {
        if (NativePtr != IntPtr.Zero) {
            Bindings.OzzAnimationFree(NativePtr); // 👈 (2)
            NativePtr = IntPtr.Zero;
        }
        GC.SuppressFinalize(this); // once disposed, the finalizer has nothing to do
    }

    // …other Ozz functions…
}

(1) and (2) are function calls into the native library.
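
As a usage sketch, the wrapper can then be treated like any other disposable C# object (Load is the wrapper method used later by OzzAnimationClip; animationBytes is a placeholder):

// Illustrative usage; animationBytes would come from a serialized asset.
using (var animation = new OzzAnimation()) {
    animation.Load(animationBytes);
    // ...sample the animation through other wrapper methods...
} // Dispose() frees the native ozz::animation::Animation instance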

Expose Ozz to Unity

Our work up to this point is not Unity-specific and holds for any C#/C++ project. For the Ozz library to be usable in Unity we still need to create Unity-specific classes.

Animation data

First, we create two classes that will hold our animation data. One will represent the skeleton, used to bind animation data to transforms in a scene. The other will contain the actual animation data.

OzzSkeleton is a Unity-only class. It doesn’t depend on the Ozz library and is our custom representation of the skeleton:

[Serializable]
public struct OzzSkeletonBinding {
    [SerializeField]
    public string path;
    [SerializeField]
    public string componentType;
    [SerializeField]
    public string propertyName;
}

public class OzzSkeleton : ScriptableObject {
    [SerializeField]
    private OzzSkeletonBinding[] Bindings;
}

Looking closer, we can see that this class is just a collection of bindings (or joints). Each binding represents a path from the root of the animated game object hierarchy to the affected game object. By default, a transform is animated, but if the componentType and propertyName fields are present, the matching property on the component is animated instead.
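
For illustration, a couple of hypothetical bindings could look like this (the paths and names here are made up; real bindings are generated by the importer):

var bindings = new OzzSkeletonBinding[] {
    // Animates the transform at this path (componentType/propertyName left empty).
    new OzzSkeletonBinding { path = "Root/Hips/Spine/LeftShoulder" },
    // Animates a component property instead of the transform.
    new OzzSkeletonBinding {
        path = "Root/Hips/Spine/Head/Hat",
        componentType = "UnityEngine.MeshRenderer",
        propertyName = "m_Enabled"
    },
};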

OzzAnimationClip is a representation of animation data. This one does use the Ozz library class we created before (OzzAnimation):

public class OzzAnimationClip : ScriptableObject {
    [SerializeField]
    private byte[] data; // 👈 (1)
    private OzzAnimation Animation; // 👈 (2)
    [SerializeField]
    private OzzSkeleton Skeleton; // 👈 (3)

    private void OnEnable() {
        Animation = new OzzAnimation();
        Animation.Load(data); // 👈 (4)
#if !UNITY_EDITOR
        data = null;
#endif
    }

    private void OnDestroy() {
        Animation.Dispose();
    }
}

The data field (1) holds the Ozz animation asset data that is converted into runtime data (2) at runtime (4). Each animation clip is linked to a skeleton (3) in order to apply this animation correctly.

The skeletons are deduplicated and shared between animation clips. This way, we have only one OzzSkeleton asset for all the animation clips with the same set of bindings.
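
A minimal sketch of how that deduplication might look in the importer (the cache, key format, and helper are assumptions, and it presumes the skeleton’s bindings can be assigned to the created asset):

using System.Collections.Generic;
using System.Linq;
using UnityEngine;

// Editor-side sketch: reuse an existing OzzSkeleton asset when a clip
// produces the same binding list as a previously imported clip.
static readonly Dictionary<string, OzzSkeleton> SkeletonCache =
    new Dictionary<string, OzzSkeleton>();

static OzzSkeleton GetOrCreateSkeleton(OzzSkeletonBinding[] bindings) {
    // Key built from all binding fields; equal keys mean an identical skeleton.
    var key = string.Join("|", bindings.Select(
        b => $"{b.path}:{b.componentType}:{b.propertyName}"));
    if (!SkeletonCache.TryGetValue(key, out var skeleton)) {
        skeleton = ScriptableObject.CreateInstance<OzzSkeleton>();
        // ...assign the bindings and save the asset (e.g. AssetDatabase.CreateAsset)...
        SkeletonCache[key] = skeleton;
    }
    return skeleton;
}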

Populate animation data

Now that we have our data classes, we need some way of populating them. We could generate them directly from .fbx files, but let’s create them from Unity animation clips instead:

OzzSkeleton ozzSkeleton;
OzzAnimationClip ozzClip;

foreach (var binding in AnimationUtility.GetCurveBindings(unityClip)) { // 👈 (1)
    var curve = AnimationUtility.GetEditorCurve(unityClip, binding);

    // Fill Skeleton bindings 👈 (2)
    int track = ozzSkeleton.AddBinding(binding.path, binding.type, binding.propertyName);

    // Sample animation clip data 👈 (3)
    for (float time = 0.0f; time <= unityClip.length; time += 1.0f / unityClip.frameRate) {
        switch (binding.propertyName) {
            case "m_LocalPosition.x":
                data[track][time].position.x = curve.Evaluate(time);
                break;
            // ...similar for `m_LocalPosition.{y,z}`, `m_LocalRotation.{x,y,z,w}` and `m_LocalScale.{x,y,z}`...
        }
    }
}

ozzClip.data = Ozz.Convert(data); // 👈 (4)
ozzClip.Skeleton = ozzSkeleton;

With the Unity animation utility, we get the bindings for the animation (1). We add each binding into our own skeleton (2) and sample the animation data for each binding into some custom structure (3). We then use that custom structure to fill the structures that Ozz library requires to build compressed animation data (4).

An explanation of how to convert custom data to Ozz animation data [what Ozz.Convert does here at (4)] can be found in the Ozz docs and samples.

The Ozz.* calls in the code snippets are calls into the Ozz library, sometimes with some preprocessing involved.

Before using the code above, we just need to make sure that Unity animations don’t have compression enabled; otherwise, Ozz will try to compress an already-compressed animation, which can lead to compression artifacts or even reduced compression overall.
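
If the clips come in through model (.fbx) imports, one way to enforce this is a small editor script that turns Unity’s animation compression off at import time (this is an assumption about the pipeline, not something from the original setup):

using UnityEditor;

// Editor-only: disable Unity's keyframe reduction/compression for imported models
// so that Ozz compresses the raw, unmodified curves itself.
class DisableAnimationCompression : AssetPostprocessor {
    void OnPreprocessModel() {
        var importer = (ModelImporter)assetImporter;
        importer.animationCompression = ModelImporterAnimationCompression.Off;
    }
}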

Animation behavior

To be able to use the data classes, we need to create a class that will apply that data. This will be the OzzAnimator:

Its C# API mimics the one of UnityEngine.Animation so that the conversion from existing Unity projects is easier and faster.

OzzAnimator exposes a list of animation states (OzzAnimationState class) that allow the developer to control how and which animations to play. The properties that can be set on an animation state can be seen on the screenshot above.

Based on the configuration of those states, if enabled, OzzAnimator plays assigned animation clips on the game object hierarchy it belongs to. It also updates the animation states while playing, for example, the Time property.
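
As a rough sketch, an animation state might carry something like the following. Clip, Time, and Additive are referenced elsewhere in the article; the remaining members are illustrative guesses modeled on UnityEngine.AnimationState:

using UnityEngine;

// Hypothetical sketch of an animation state; not the exact production class.
public class OzzAnimationState {
    public OzzAnimationClip Clip;         // which animation clip to play
    public bool Enabled;                  // whether this state contributes at all
    public float Time;                    // current playback time, updated by OzzAnimator
    public float Speed = 1.0f;            // playback speed multiplier
    public float Weight = 1.0f;           // blend weight used when mixing states
    public bool Additive;                 // apply additively instead of blending
    public WrapMode Wrap = WrapMode.Loop; // looping behavior
}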

The high-level overview of how we animate inside the OzzAnimator looks something like this:

// Data
OzzSkeleton Skeleton;
List<OzzAnimationState> States;
List<UnityEngine.Transform> Transforms;

// Once
foreach (var binding in Skeleton.Bindings) {
    Transforms.Add( transform.Find(binding.path) ); // 👈 (1)
}

// Update
foreach (var state in States) {
    samples.Add( Ozz.Sample(state) ); // 👈 (2)
}
var final = Ozz.Mix(samples, ...); // 👈 (3)
for (int i = 0; i < Skeleton.Bindings.Length; i++) {
    Transforms[i].localPosition = final[i].position; // 👈 (4)
    // ...same for rotation and scale...
}

We have a skeleton, which is used to find the transforms (1) that will be animated. We sample animations assigned to animation states on each frame by calling into the Ozz library (2). When we collect all the samples, we mix/blend them together into a final sample (3). The final sample values can then be applied directly on the animated transforms (4).

Problems & solutions

That should be all… right? We can replace the Unity animation system with our own and call it a day!

But keen readers have probably noticed that the implementation still has a few problems. Let’s take care of them before moving on.

Animating properties

Our custom animator is currently only able to animate transforms. To be on par with the Unity animation system, we should also be able to animate object properties (for example, changing MonoBehaviour.enabled).

The properties when exported from Unity animations to our custom skeleton representation look like this:

The main problem here is that property names refer to internal Unity C/C++ properties and not the C# properties that we can modify from our code with C# reflection. Luckily, the Unity C# animation job system exposes those properties for us via IAnimationJob.

We modify our animator to apply the transforms and properties inside an animation job:

// Once
foreach (var binding in Skeleton.Bindings) {
var transform = transform.Find(binding.path);
PropertyHandles.Add(
Animator.BindStreamProperty(transform, binding.type, binding.propertyName) // 👈 (1)
);
}

// Update
var final = Ozz.Mix(samples, ...);
for (int i = 0; i < PropertyHandles.Length(); i++) {
PropertyValues[i] = final[i].value; // 👈 (2)
}

// IAnimationJob update // 👈 (3)
void ProcessAnimation(AnimationStream stream) {
for (int i = 0; i < PropertyHandles.Length(); i++) {
PropertyHandles[i].SetFloat(stream, PropertyValues[i]); // 👈 (4)
}
}

We use the UnityEngine.Animator for running the animation job. For transforms/properties to be available inside the animation job, we bind them with the animator (1). On each frame, we sample and mix the animations as before, but instead of applying the changes directly to objects, we save them to a NativeArray (2) (a plain managed array would also work). That array is passed to an animation job where the values are applied to the bound transforms/properties (4) through an AnimationStream provided by IAnimationJob (3). The stream is able to modify all the transforms/properties bound in (1).

Blend and additive animations

An animation can be applied to a skeleton in two ways: blend or additive. A blend operation mixes the animation with the other blending animations, while an additive operation adds it on top after all the blending animations have already been mixed.

For example, let animation A animate some property to 4 and animation B to 6. Blending A and B with equal weights sets that property to 5 = (4+6)/2. Adding B on top of A sets it to 10 = 4+6.

If we want to be able to apply the same animation in both ways, we need to actually have different animation data for each operation. This is because blend animation data can’t be applied as additive and vice-versa. (Well, technically it can be, but the result is not what you’d expect.)

Having two animations for each animation asset doubles our asset size. To solve this problem we will calculate the additive animation from a blending one at runtime and only have the blending animation in the asset.

If we create additive animation data from a blend one at build time, we “subtract” the first frame from all the other frames:

additivePosition[i] = blendPosition[i] - blendPosition[0];
additiveRotation[i] = blendRotation[0].conjugate() * blendRotation[i];
additiveScale[i] = blendScale[i] / blendScale[0];

This way, the produced additive animation represents only what is added on top of the first (reference) frame of the blend animation.
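
In Unity types, that build-time subtraction could look roughly like this (a sketch; the array names and helper are made up, and for unit quaternions the conjugate equals the inverse):

using UnityEngine;

// e.g. in the editor import code:
// Illustrative conversion of one track from blend data to additive data,
// using the first keyframe as the reference pose.
static void MakeAdditive(
        Vector3[] blendPos, Quaternion[] blendRot, Vector3[] blendScale,
        Vector3[] addPos, Quaternion[] addRot, Vector3[] addScale) {
    Vector3 refPos = blendPos[0];
    Quaternion refRotInv = Quaternion.Inverse(blendRot[0]); // conjugate of a unit quaternion
    Vector3 refScale = blendScale[0];

    for (int i = 0; i < blendPos.Length; i++) {
        addPos[i] = blendPos[i] - refPos;
        addRot[i] = refRotInv * blendRot[i];
        addScale[i] = new Vector3(blendScale[i].x / refScale.x,
                                  blendScale[i].y / refScale.y,
                                  blendScale[i].z / refScale.z);
    }
}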

We can instead do the same at runtime:

// Once
state.Clip.FirstFrameInverse = Ozz.SampleInverseFirstFrame(state.Clip); // 👈 (1)

// Update
var sample = Ozz.Sample(state);
sample = Ozz.Additive(state.Clip.FirstFrameInverse, sample); // 👈 (2)

For each clip, we sample the first frame and invert it (1). In the update, we sample the animation as before, but after that we additively apply the sample on top of the first frame inverse sample to get the additive animation sample (2).

How does this actually work? Inverse is a “negative” operation that undoes the original operation. For example, for 3D position (0, 0, 3) the inverse is (0, 0, -3). So, if the position at the first frame is (0, 0, 3) and at frame n is (0, 0, 4), we calculate the additive position for frame n like this:

(0, 0, -3) + (0, 0, 4) = (0, 0, 1)

It works similarly for rotation and scale, but instead of negating the value to produce the inverse, we use the quaternion conjugate and the reciprocal respectively. For a rotation (in Euler angles) of (0, 0, 30) at the first frame and (0, 0, 70) at frame n, and a scale of (2, 2, 2) at the first frame and (6, 6, 6) at frame n, we get:

(0, 0, -30) “+” (0, 0, 70) = (0, 0, 40)

(1/2, 1/2, 1/2) * (6, 6, 6) = (3, 3, 3)

Note that with this runtime solution, we trade runtime performance for storage/memory usage, so this may not be the right tradeoff for everyone.

Skeleton mixing

You’ve probably noticed that the custom animator only has a single skeleton reference. So, what happens if we want to mix animations with different skeletons? Currently, it won’t work.

To be able to animate the different bindings from all the playing animations, we need to merge all the different skeletons into a skeleton superset (or merged skeleton) at runtime. This is a skeleton that has all the bindings from all the playing animations.
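
A sketch of how that merge might be implemented (assuming the states’ skeleton bindings are accessible; the real MergeSkeletons used later would wrap the result back into an OzzSkeleton):

using System.Collections.Generic;

// inside the OzzAnimator:
// Illustrative merge: the superset is simply the union of all bindings
// from all playing animation states, with duplicates removed.
static List<OzzSkeletonBinding> MergeBindings(IEnumerable<OzzAnimationState> states) {
    var merged = new List<OzzSkeletonBinding>();
    var seen = new HashSet<string>();
    foreach (var state in states) {
        foreach (var binding in state.Clip.Skeleton.Bindings) {
            var key = $"{binding.path}:{binding.componentType}:{binding.propertyName}";
            if (seen.Add(key)) {
                merged.Add(binding);
            }
        }
    }
    return merged;
}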

Once we use that skeleton, we need to take special care when sampling and blending the animations as we need to properly remap the samples from the animation skeleton to the new merged skeleton. We’ll skip the code here and just hand-wave the implementation:

When an Ozz animation is sampled, it returns the sample in an array that corresponds to that of its skeleton. We create another array corresponding to the merged skeleton. We copy the sampled values to it at the indices that have matching bindings. The not-filled indices can be filled with identity values [(0, 0, 0) position/rotation and (1, 1, 1) scale].

After filling the sample array for the merged skeleton, we also need to create a matching weights array. This array will mask the fields that were not sampled. We pass this array to the mixing operation so that it only uses the sampled values and ignores the rest.

Looking at the above picture, for animation with skeleton joints [a, d] we would get a sample [va, vd]. We remap that sample to a new sample [va, id, id, vd] and create weights [1, 0, 0, 1]. This sample and weights now correspond to the merged skeleton [a, b, c, d] and can be freely mixed with other remapped samples/weights.
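
In code, that remapping step might look roughly like this (TransformSample and trackToMerged are made-up names; the index map would be built once per clip whenever the merged skeleton changes):

using UnityEngine;

// Made-up value type for one joint's sampled local pose.
struct TransformSample {
    public Vector3 Position;
    public Quaternion Rotation;
    public Vector3 Scale;
    public static readonly TransformSample Identity = new TransformSample {
        Position = Vector3.zero, Rotation = Quaternion.identity, Scale = Vector3.one
    };
}

// inside the OzzAnimator:
// Remaps one sampled animation onto the merged skeleton and fills the weights
// used to mask untouched joints during mixing.
static void RemapSample(TransformSample[] clipSample, int[] trackToMerged,
                        TransformSample[] mergedSample, float[] mergedWeights) {
    for (int i = 0; i < mergedSample.Length; i++) {
        mergedSample[i] = TransformSample.Identity; // identity pose
        mergedWeights[i] = 0.0f;                    // masked out when mixing
    }
    for (int track = 0; track < clipSample.Length; track++) {
        mergedSample[trackToMerged[track]] = clipSample[track];
        mergedWeights[trackToMerged[track]] = 1.0f; // this joint was actually sampled
    }
}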

Solution evaluation

We fixed all of the problems we had (or, more precisely, the ones we knew we had), so now we can take a step back and check the final implementation of our custom animator class:

// Once
Skeleton = MergeSkeletons(States); // 👈 (1)
foreach (var binding in Skeleton.Bindings) {
    TransformHandles.Add( Animator.BindStreamTransform(transform.Find(binding.path)) ); // 👈 (2)
}

// Update
foreach (var state in States) {
    var sample = Ozz.Sample(state);
    if (state.Additive) {
        sample = Ozz.Additive(state.Clip.FirstFrameInverse, sample); // 👈 (3)
    }
    sample = SampleRemap(state.Clip.Skeleton, Skeleton, sample); // 👈 (4)
    samples.Add(sample);
}
var final = Ozz.Mix(samples, ...);

// IAnimationJob update // 👈 (5)
void ProcessAnimation(AnimationStream stream) {
    for (int i = 0; i < Skeleton.Bindings.Length; i++) {
        TransformHandles[i].SetLocalTRS(stream, final[i].position, ...); // 👈 (6)
    }
}

For all the animations that are being played, we need to merge their skeletons into a skeleton superset (1). Then, for all the bindings in that merged skeleton, we find and bind the transforms and properties (2) from the scene. All this is done only once at the start and each time the animation skeleton set changes.

Each frame we sample all the animations. In case the animation should be applied additively, we convert the blend animation to additive at runtime (3). We also remap the samples to the samples for the merged skeleton (4).

We pass the final mixed sample to the animation job (5), where we apply the changes to transforms and properties (6).

Taking one more step back, we can catch a glimpse of the architecture:

Here, the OzzAnimator is the main developer interface. Users of the custom animation system interact only with it.

The OzzAnimator internally controls the UnityEngine.Animator instance, which in turn drives the playable graph. All of the animation state for the graph is provided from the OzzAnimator.

That playable graph contains most of the code shown earlier. It does all the sampling, blending and remapping in the PlayableBehaviour. The computed samples are then passed to the IAnimationJob which applies them on the bound transforms and properties. Without IAnimationJob, we wouldn’t have a way to apply property changes.
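
A minimal sketch of that wiring (ApplySamplesJob and OzzAnimatorGraphExample are made-up names; the real setup also builds the PlayableBehaviour that does the sampling and blending):

using UnityEngine;
using UnityEngine.Animations;
using UnityEngine.Playables;

// Illustrative job; the real one writes the precomputed samples through the
// bound TransformStreamHandle/PropertyStreamHandle arrays.
struct ApplySamplesJob : IAnimationJob {
    public void ProcessRootMotion(AnimationStream stream) { }
    public void ProcessAnimation(AnimationStream stream) { /* ...apply samples... */ }
}

class OzzAnimatorGraphExample : MonoBehaviour {
    PlayableGraph graph;

    void OnEnable() {
        var animator = GetComponent<Animator>();
        graph = PlayableGraph.Create("OzzAnimator");
        var jobPlayable = AnimationScriptPlayable.Create(graph, new ApplySamplesJob());
        var output = AnimationPlayableOutput.Create(graph, "OzzOutput", animator);
        output.SetSourcePlayable(jobPlayable);
        graph.Play();
    }

    void OnDisable() {
        graph.Destroy();
    }
}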

Ideally, we would move the sampling and blending to the animation job so it can run on worker threads instead of the main thread, but it was easier to implement it this way. We can always improve on it later.

Improvement measurement

Testing the app size again with the new custom animation system we get:

  • My Talking Angela: 110MB (16MB of animations)
  • My Talking Angela (Custom): 102MB (8MB of animations)

That is a 50% animation asset size decrease! With the saved space we can now get a higher install rate or we can add more content for the same install rate, whichever we choose.

We also made sure that the animations stayed the same quality. You can try to spot a difference (hint: the one on the left is Unity):

Advantages & disadvantages

By using our own custom animation system, the main advantage is that animation size decreases by half.

On top of that, Ozz also has other advantages:

  • Additive weights in range [-inf,inf] (Unity has a range of [0,1])
  • Per-joint weights in range [0,1] (Unity only has masking with 0 or 1)

But using a custom system also comes with its own disadvantages:

  • No animation state machine (ASM)
  • No animation clip events
  • Maintenance of the custom animation system
  • Probably more stuff…

With this in mind, everything should be carefully weighed to establish whether a custom system will benefit your project. The disadvantages are significant, and not everyone will gain enough from the advantages to outweigh them.

For us it was an easy decision, because we already had a custom ASM that also supported clip events. (Events should be easy to add on top of what is described here, but ASM is a different beast.) We also have enough manpower to maintain the animation system ourselves (though we’re always looking for more).

Unresolved problems

Even though we’re using the custom animation system in production (My Talking Angela), there are still some problems that we haven’t yet been able to resolve.

One is that Ozz uses 16-bit floats for animation data, leading to a loss of accuracy. If joints on the skeleton need to stay in sync with some other non-parent joint, this can lead to visible tearing, even without any animation data compression enabled.
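
As a rough illustration of that precision loss (not the article’s code), round-tripping a value through Unity’s half-float conversions shows the snapping:

// The stored value snaps to the nearest representable 16-bit float, which is
// what causes slight mismatches between joints that should line up exactly.
float original = 100.37f;
float stored = Mathf.HalfToFloat(Mathf.FloatToHalf(original));
Debug.Log($"{original} -> {stored}"); // prints a value close to, but not exactly, 100.37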

The other is that performance isn’t great when applying transforms inside the animation job. Unity adds a lot of overhead for checking whether each operation is valid. Ideally, Unity would provide an API on the transform stream that supported setting multiple transforms at once. But, for now at least, it’s fast enough for our games.

Conclusion

We’re really happy with the results of using a custom animation system. We reached our goal of reducing the app size while maintaining decent performance. The 50% animation asset size decrease reduces the My Talking Angela game size by about 9% overall, freeing up quite a bit of space for additional content.

The Ozz animations are of the same quality as Unity’s, while having a lower storage footprint. The library is also really nice to work with. Without it, we’d have needed quite a bit more time to achieve a similar solution.

There’s still more work that can be done on the custom system. For instance, we can move all the processing to threads to make it faster. Maybe we can compress the animation data with zstd to gain even more space. We could also import the animations directly from .fbx. In short, there is a lot of stuff that we can still improve upon!

Even though our custom animation system is not open-source, I hope I outlined it sufficiently for you to implement your own or at least take something useful from this. And if you have any suggestions of what we could do better, please let me know in the comments. I’d love to hear it!
