How We Implemented the Puppet Tool in Animatron: a Step By Step Guide

One of the upcoming tools in Animatron (our awesome online animation maker) is what we call puppet. It allows to go from this:

into this:

which is very cool! 
(Note: reduced the quality of this animation causing color artifacts not present in the original)

In this article I’ll try to explain how this has been done in a fairly detailed fashion.

So, at first you have an image you’d like to animate in some funny way, like the sesame puppet we use throughout the article.

We take your image and use some basic image processing to remove the background. This processing is very simple at the moment, it just tries to estimate background color from the image corners and then erase the same color within some tolerance. In addition, it tries to apply some feather to smoothen the edges. Both tolerance and feather are set to some sensible default values that seem to work well with cartoon images we tried. Here is how it looks like for our image:

Next, we apply some simple computer vision to the image — we first try to find out what is the component with the largest area in the image (which we assume is the character you want to animate) and then extract its contour. We end up with the following image:

The blue line here is the contour of the image. We use this contour to prepare a triangulation of the character. There are many possible choices for the type of triangulation, such as regular triangulation, triangulation using equilaterals etc. We decided to use regular triangulation due to the type of “morphing” we ended up using. Internally, we first generate a regular mesh of points covering the bounding box of the character, then test which points are “inside” the contour using the usual even-odd rule. This might not work 100% for border triangles, so in addition we test if triangles created from each 3 neighboring points cover at least one point on a contour. This gets us perfect results.

So, the picture with regular triangulation now looks like this:

Now for those of you experienced with computer graphics you probably understand why this is needed. We basically render individual triangles covering the character using WebGL and change shape of these triangles according to some method making character bending look realistic. This way we provide an illusion that the character is changing its shape naturally, while we just render triangles that adjust their shape and the texture mapping does the trick for us.

In the editor itself, we allow you to place pins at portions of your character corresponding to vertices of triangulation grid. Then you can navigate to some time and move these pins as you like. The shape will update as you do that. Here you can see 5 pins, three moving hands and top of head, two keeping nose and bottom edge at the same place; blue dots show origin of movement and green dots destination while the yellow pins show the current position:

One of the methods you can use for moving triangles (and which we actually use here) is physics simulation with mass-spring networks. This simulation method is extremely popular among game developers to simulate cloth, hair, particle systems, collision deformation etc. and even some structural engineers use it, though its origins are in biological systems in simulation of proteins. It’s a very fast and stable method these days.

To explain how it works, we first talk real physics! Let’s look at a spring and its properties. Each spring has what we call rest state, which is the state the spring is naturally in and in which no internal forces exist. Each spring also has a stiffness property, let’s call it k, which is some ratio how much force do we need to actuate spring movement. This can be summarized by Hooke’s Law:

This says that the force needed to expand/shrink a spring is proportional to the expanding/shrinking distance x from its resting state length. As any simple physics formula, this one also has a catch — it works only for very small values of x. But like most high-school physicists, we will use it anyway!

Spring forces act outwards if we try to squeeze the spring and inwards if we try to stretch it:

What we do is that we link all neighboring points in our triangulation grid with these springs. We actually link them in this fashion:

Each side of a rectangle is a spring, and there are two additional springs, one for each diagonal, to keep it stable. Now you probably understand why we chose to use regular triangulation.

Our picture then has an undergoing physical layer that responds to the movement of actuating pins in the same way a set of springs would. The actuating pins, which you inserted in the editor, are causing the movement and all the other points to adjust according to spring network behavior. All grid points also have the same mass. This is what makes the image deformation seem somewhat realistic. Of course, this model is simplistic, it only has one 2D spring layer whereas with normal objects you’d expect to have some 3D structure, however for many cases this is good enough for the purpose of animation.

Alongside stiffness we also use damping acting against the spring movements proportional to spring’s internal velocity. This allows us to control the amount of oscillations — low damping creates large oscillations, high damping removes them.

Material-wise, if we set both stiffness and damping to low values, we get a behavior similar to liquids. If we set both to medium values, we get fabrics-like behavior. If we set both stiffness and damping to high values, we get a rigid material behavior (however here the stability of computations might decrease along areas where material might break). By default, we set it to above average stiffness and below average damping to behave more like a rubber layer.

The algorithm and math to compute physics is as follows:

First, the spring force exerted on end points pi and pj could be written as:

where k is the spring stiffness, lij is the current distance between pi and pj, rij is the rest length of spring, or the original distance between pi and pj and dij is the unit direction of the vector between pi and pj. You need to make sure the direction dij corresponds to the expected behavior at each endpoint, i.e. the force needs to go inward the spring if the current length is larger than rest length and outwards if it’s the other way round.

As for damping, with the need to slow down the current movement, the force exerted on a point pi is simply proportional to point’s velocity vi:

The overall force on a point will be then:

where N(i) are all neighbors of pi connected to it via springs. We can afford to ignore mass as we assume all points have the same mass, so we can just pretend it could be also unit mass. We can then assume our computed forces correspond to individual point accelerations.

For each time step, the algorithm starts with first moving actuating pins to their respective places at a given time, then computing all forces in all springs and at last adding damping force; each point in the end will have a single vector which will be the sum of all forces exerted on it from all the springs it is connected to.

Subsequently, in the second phase we need to compute the new positions for each point. Here we use Verlet integration with constant step dt, a very simple yet very stable method for solving dynamics equations. This method doesn’t need to keep velocities, it rather computes them from the previous positions. The final positions will be:

where pi^n is the position of pi in the simulation step n and dt is the integration step that should be small enough to provide stable solutions and large enough to make computation fast. This is the trick that makes cloth, hair and deformations fast and real-time in many current games while allowing you to adjust it to what your computing device is capable of.

If you want to learn more about physics beyond Hooke’s law and mass-spring systems, with real use in structural engineering or even rocket science, I recommend this MIT course.

As described, we compute coordinates of points using this method in the editor. It seems however like something that could be slow when running in a browser, right? Mind you, browsers have a very simple OpenGL functionality in a form of WebGL 1.0, no compute shaders, no transform feedback that could be used and JavaScript, despite recent advancements, is a far cry performance-wise from what you can get by using C++ in games or even on mobile devices.

How to make this fast then?

Well, we use a WebGL 1.0 shader trick. Instead of computing physics for every single frame during playback, we compute physics simulation in the editor only and store coordinates at certain key frames in the editor (such as whenever a new movement is initiated/ends and in some frames in between if the time between movements is too long) and player uses only these stored positions instead of computing anything. We then use a special vertex shader in WebGL that blends the coordinates to present an approximation of a shape at a given time. Our shader looks like this:

#ifdef GL_ES
precision highp float;
attribute vec2 initialShape;
attribute vec2 finalShape;
attribute vec2 textureCoordinates;
varying vec2 finalTextureCoordinates;
uniform float alpha;
uniform mat4 projectionMatrix;
	void main()\n" +
	    vec4 initialPosition = vec4(initialShape.x, initialShape.y, 0, 1);
vec4 finalPosition = vec4(finalShape.x, finalShape.y, 0, 1);
vec4 position = initialPosition * (1.0 - alpha) + alpha *
finalTextureCoordinates = textureCoordinates;
gl_Position = projectionMatrix * position;

Here the initial shape is given by initialShape VBO, final shape in finalShape VBO and the blending amount in uniform alpha. Projection matrix is just for proper scaling/positioning of a 2D image within WebGL viewbox.

An example — imagine you have character coordinates recorded at times 0s, 1s, 3s, 4s. Now you want to view shape corresponding to the state at 2.5s. You look at coordinate times and pick the pre-computed coordinates at time 1s as your initial shape, the coordinates at time 3s as your final shape. Next you compute alpha for time 2.5s, which is (2.5s — 1s) / (3s — 1s) = 0.75. Then you pass these into the shader and shader renders an approximated shape at time 2.5s. On the picture above, there is a depiction of this scenario — imagine the blue polygon is the state of the object at the time 1s, the black is the state at 3s. The red one is the interpolated one at 2.5s, having all vertices ¾ on the line between the blue and black ones.

So that’s it about puppet implementation I wanted to share with you!

Alternative implementations

You might want to consider using as-rigid-as-possible shape manipulation algorithm for moving triangles based on quadratic error metrics to minimize distortions of triangles both in rotation and scale. Benefits are that you can explicitly compute positions at any time without any kind of physics simulation and that the deformations seem to be more predictable, disadvantage is the need to handle large sparse matrices and their inverses in a browser, as well as sometimes looking less realistic. You can learn more about it in this interesting course from University of Tokyo.


- Animatron is written in Java using GWT; getting WebGL renderer working nicely was a major challenge fusing together WebGL, GWT’s native methods and JavaScript with very limited debugging possibility

- server-side rendering in our case is running on cloud machines without a GPU, hence we can’t use our WebGL renderer. Instead, we figured out a math trick allowing us to render the same quality with Java2D using TexturePaint. Yes, it’s possible, the animation shown at the beginning of this article is rendered this way!