Into Vertex Shaders Part 3: Memory Management

Szenia Zadvornykh
Jun 28, 2017 · 9 min read

This is the third in a series of articles about advanced WebGL animation using Three.js and Three.bas, my extension for complex and highly performant animation systems.

In the previous post we emulated parts of the 3D graphics pipeline to better understand how some key pieces fit together. In this post we will build on this, and take a closer look at what happens on the hardware level when you tell WebGL to do stuff.

My Three.js extension helps create animations where many objects can be moved (transformed) simultaneously. The key to my approach is moving some logic that is typically done in JavaScript (which is executed on the CPU) to GLSL (which is executed on the GPU). To understand the reasons for this, we will first create an animation system using a more traditional approach.

Many Cubes

The animation in question will be quite straightforward. We will create a number of cubes and transition them from a start position to an end position over time.

First we define a Geometry and a Material. Then we use these to create a number of Meshes. We also define some custom properties to prepare for the animation logic.

var geometry = new THREE.BoxGeometry(...);
var material = new THREE.MeshPhongMaterial();

for (var x = 0; x < gridLength; x++) {
  for (var y = 0; y < gridLength; y++) {
    for (var z = 0; z < gridLength; z++) {
      var mesh = new THREE.Mesh(geometry, material);

      // define animation properties
      mesh.startPosition = new THREE.Vector3(...);
      mesh.endPosition = new THREE.Vector3(...);
      mesh.duration = 1;
      mesh.startTime = 0;

      // store a reference for the update loop
      cubes.push(mesh);
      scene.add(mesh);
    }
  }
}

There are a handful of ways you could set up the animation logic, but we will try to do it in a way that minimizes overhead (to give the CPU a fighting chance).

The crux of our animation logic is linear interpolation in the form of

currentValue = startValue + (endValue - startValue) * progress

This is a generalized formula to calculate a value between a start and an end value based on a progress between 0.0 and 1.0. It has many applications and its use has become quite ubiquitous.
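To make the formula concrete, here it is as a tiny JavaScript helper (a sketch; the lerp name is just for illustration, and the update loop below uses the equivalent Vector3.lerpVectors from Three.js):

// linear interpolation between two scalar values
function lerp(startValue, endValue, progress) {
  return startValue + (endValue - startValue) * progress;
}

lerp(0, 10, 0.0);  // 0, the start value
lerp(0, 10, 0.25); // 2.5, a quarter of the way there
lerp(0, 10, 1.0);  // 10, the end value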

Each frame, we use linear interpolation to determine the current position of each cube. The progress is calculated from each cube's startTime and duration, and from a global time value that controls the state of the animation.

function update() {
  time += 1/60;

  for (var i = 0; i < cubes.length; i++) {
    var cube = cubes[i];
    var st = cube.startTime;
    var d = cube.duration;
    var sp = cube.startPosition;
    var ep = cube.endPosition;

    // progress in the range [0.0, 1.0]
    var p = THREE.Math.clamp(time - st, 0, d) / d;

    // lerpVectors performs linear interpolation
    cube.position.lerpVectors(sp, ep, p);
  }
}

Below you can see the running animation. Please take a look at the JavaScript to see the complete code. The buttons can be used to control the number of cubes that are created.

Depending on your device, the animation should run fine with a few hundred cubes. It may even be able to handle a thousand. But if you add many more, you will see a sharp decrease in performance.

Have we hit the limits of what WebGL can do?

Too many cubes?

The short answer is no.

For the long answer, we need to examine what is actually happening on the hardware level.

In the previous post we saw how matrices are used to store the transformation for a mesh in a scene. A transformation matrix consists of 16 numbers. Each mesh in our scene has its own transformation matrix. When the position of a mesh is changed in JavaScript (on the CPU), its matrix needs to be updated and sent to the GPU.
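As a rough sketch of what this means in Three.js terms (simplified; the renderer also derives world and model-view matrices from this local matrix):

// changing the position in JavaScript...
mesh.position.x += 0.1;
// ...means the 4x4 local matrix has to be recomposed from
// position, quaternion and scale before the next draw
mesh.updateMatrix();
// mesh.matrix.elements holds the 16 numbers that end up on the GPU
console.log(mesh.matrix.elements.length); // 16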

The CPU and the GPU are two separate entities connected by a strict communication protocol. The GPU is optimized to perform computation in parallel, meaning it can do a lot of math at the same time. However, when you send new data to the GPU, these computations are interrupted. The new data needs to be processed before the GPU can resume. The more data you send to the GPU, the bigger the interruption becomes.

When our animation system has too many cubes (each representing 16 numbers), the GPU has to spend more time processing new data than it gets to spend on doing what it loves to do: math.

The key to getting the most out of the GPU is minimizing the quantity of new data that it has to process each frame. While we cannot significantly reduce the amount of data we need to calculate the animation state, we can change how the data is handled between the CPU and the GPU.

More Cubes!

Currently, the bottleneck is the JavaScript position calculation, which forces each mesh's transformation matrix to be updated every frame. To resolve this bottleneck, we can move the animation update logic from JavaScript to GLSL. Since the CPU and the GPU are separate entities in terms of memory, we also need to store the required data on the GPU.

Custom Attributes

When we created the meshes for the animation, we defined four custom properties: startPosition, endPosition, startTime, and duration. This is the data required for the animation state calculation. Since this data is different for each mesh, the best way to store it on the GPU is using custom attributes in the cube geometry.

As mentioned in the previous post, a 3D geometry consists of a list of vertices. You can think of a vertex as a JavaScript object with a number of properties, called attributes. Position is one such attribute, but like a JavaScript object, a vertex can have any number of attributes.

Thinking in terms of JavaScript, adding additional properties would be simple:

var vertex = {
  position: new THREE.Vector3(...),
  startPosition: new THREE.Vector3(...),
  endPosition: new THREE.Vector3(...),
  startTime: 0.0,
  duration: 1.0
};

Unfortunately, adding vertex attributes is a little more involved. While the object representation of a vertex is easy to reason about, objects are an inefficient way of storing (lots of) data in memory. Instead, vertex attributes are flattened (destructured) and stored in arrays called buffers.

Each vertex attribute has its own buffer. Buffers have a fixed size, which is determined upon creation. The size of the buffer depends on two things: the number of vertices and the size of the attribute. For instance, position is defined by 3D vectors, so the size of each attribute is 3 (x, y, z). A single cube has 8 vertices, so the position buffer length would be 8 × 3 = 24. The buffer itself would look something like this:

[v0.x, v0.y, v0.z, v1.x, v1.y, v1.z, v2.x, v2.y, v2.z, ...etc]

As for our custom properties, the attribute size for startPosition and endPosition is 3. The attribute size for startTime and duration is 1. Later on, Three.bas will help us create the buffers of the correct size.
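To illustrate what such a flattened buffer looks like in code, here is a sketch using a plain Float32Array wrapped in a THREE.BufferAttribute (the sizes match the single-cube position example above):

// 8 vertices with 3 components each: 24 floats for one cube's positions
var positions = new Float32Array(8 * 3);

for (var i = 0; i < 8; i++) {
  positions[i * 3 + 0] = 0; // x of vertex i
  positions[i * 3 + 1] = 0; // y of vertex i
  positions[i * 3 + 2] = 0; // z of vertex i
}

// the BufferAttribute wrapper is what Three.js uploads to the GPU
var positionAttribute = new THREE.BufferAttribute(positions, 3);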

Buffer Geometry

Before we continue, there is one more important thing we need to consider. If you look at how we created the meshes, you may notice that each mesh uses the same geometry and material, instead of creating new instances. This way Three.js can optimize its internal render logic and draw all the cubes at once. Each mesh is essentially a copy of the geometry rendered with a different transformation matrix. It is much faster to render the same geometry a thousand times than it is to render a thousand different geometries once.

We will be adding additional vertex attributes to the geometry, and these values will be different for each mesh. Because of this, Three.js can no longer perform its optimizations, so we will have to do it ourselves. Fortunately, Three.js provides a class that will help us: THREE.BufferGeometry. BufferGeometry makes it easier to work with vertex attribute buffers directly (instead of the object representations found in THREE.Geometry).

In Three.bas, I created a wrapper around this class that helps create buffer geometries where a base geometry is repeated a given number of times. I refer to this base geometry as a prefab. This wrapper also helps create and populate buffers as discussed above.

// create the geometry that will be repeated in the buffer geometry
prefab = new THREE.BoxGeometry(cubeSize, cubeSize, cubeSize);

// create the buffer geometry where the prefabs are repeated
geometry = new THREE.BAS.PrefabBufferGeometry(prefab, cubeCount);

// create buffers with the appropriate item size
var startPositionBuffer = geometry.createAttribute('startPosition', 3);
var endPositionBuffer = geometry.createAttribute('endPosition', 3);
var durationBuffer = geometry.createAttribute('duration', 1);
var startTimeBuffer = geometry.createAttribute('startTime', 1);
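The buffers created above still need to be filled, once, with a value per cube. A sketch of one way to do this, writing directly into the attributes' underlying arrays (this assumes createAttribute returns a standard THREE.BufferAttribute, and getStartPosition and getStartTime are placeholder helpers you would write yourself):

// every vertex of a prefab gets the same value repeated
var vertexCount = prefab.vertices.length;

for (var i = 0; i < cubeCount; i++) {
  var startPosition = getStartPosition(i); // placeholder helper
  var startTime = getStartTime(i);         // placeholder helper

  for (var j = 0; j < vertexCount; j++) {
    var offset = i * vertexCount + j;

    startPositionBuffer.array[offset * 3 + 0] = startPosition.x;
    startPositionBuffer.array[offset * 3 + 1] = startPosition.y;
    startPositionBuffer.array[offset * 3 + 2] = startPosition.z;

    startTimeBuffer.array[offset] = startTime;
  }
}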

Creating an Animation Shader

Now that the data required to calculate the state of our animation is available on the GPU, we need to move the update logic there too. We will do this by “extending” a built-in Three.js material with a few lines of GLSL.

Three.js has a number of built-in materials, and the corresponding shaders are represented as strings. Because some logic is shared between these materials, the shader code is broken down into chunks, which are concatenated together into the final shader code.
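As a simplified illustration of this chunk system, a vertex shader can be assembled from entries in THREE.ShaderChunk (the real built-in shaders concatenate many more chunks than shown here):

var vertexShader = [
  THREE.ShaderChunk['common'],

  'void main() {',

  // 'begin_vertex' declares "vec3 transformed = vec3( position );",
  // the variable our injected animation code will modify
  THREE.ShaderChunk['begin_vertex'],

  // 'project_vertex' applies the model-view and projection matrices
  THREE.ShaderChunk['project_vertex'],

  '}'
].join('\n');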

Three.bas builds on this principle by injecting custom shader chunks in key places inside the original Three.js shader code. This allows you to define arbitrary (animation) logic in GLSL, while still being able to use all of the built-in features of Three.js, like lighting. Pretty cool.

The GLSL equivalent of our JavaScript update logic is as follows:

// define attributes and uniforms
uniform float time;

attribute vec3 startPosition;
attribute vec3 endPosition;
attribute float startTime;
attribute float duration;

// ...then inside the main function
float progress = clamp(time - startTime, 0.0, duration) / duration;
position += mix(startPosition, endPosition, progress);

The attribute declarations must have the same name and type as the buffers we created. We also use two built-in GLSL functions: clamp and mix. clamp ensures a value stays between a lower and an upper bound. In this case, it makes sure progress is always between 0.0 and duration. Dividing this by duration gives us a value between 0.0 and 1.0. mix performs linear interpolation as described earlier in this post.

Below we use Three.bas to inject this GLSL code into the built-in THREE.MeshPhongMaterial. We also define time as a uniform to be used in the vertex shader. When put together, the custom buffer geometry and shader material can be used to create a single mesh, which is added to the scene.

material = new THREE.BAS.PhongAnimationMaterial({
  uniforms: {
    time: {value: 0}
  },
  vertexParameters: [
    'uniform float time;',

    'attribute vec3 startPosition;',
    'attribute vec3 endPosition;',
    'attribute float startTime;',
    'attribute float duration;',
  ],
  vertexPosition: [
    'float p = clamp(time - startTime, 0.0, duration) / duration;',
    'transformed += mix(startPosition, endPosition, p);'
  ]
});

cubes = new THREE.Mesh(geometry, material);
scene.add(cubes);

As in the previous update method, time is used to control the state of the animation. We still have to set its value in JavaScript, however, which makes our update function look like this:

function update() {
  cubes.material.uniforms.time.value += 1/60;
}

Analyzing the Result

Now let’s bask in the glory of the new version of our animation. Check out the full source code for some more details of how everything fits together.

Depending on your hardware, you should be able to go as high as a million cubes. In any case, you should be able to see between 100 and 1,000 times more cubes move at 60 fps.

Wow.

This approach is so much faster because the amount of data that has to be updated and sent to the GPU each frame went from 16 × the number of cubes down to 1. One number, regardless of how many cubes are on screen.

The update logic itself is now executed once for each vertex, instead of once per mesh. That doesn't feel great, but the impact of the duplicate computations is negligible. The GPU is just that good at math.

The boost in performance does come at a cost however.

As you may have noticed, there is a significant delay when you create a large number of cubes. This should be no surprise, as we create and store significantly more data when the cubes are created. We are basically opting to pay our CPU cycle dues up front, instead of in frame-by-frame installments.

This approach is also less flexible. Since we are operating close to the hardware, there are some memory considerations we need to work around.

When the vertex shader is executed, it only has access to the attributes of that particular vertex. This makes it impossible to implement animation systems where positions depend on each other, like flocking.

Shaders are stateless. They cannot store values between executions. They also cannot modify the values of attributes passed to them. This means that "dynamic" logic, where values change incrementally based on various inputs, is difficult to implement. This will become much easier in WebGL 2. While it's still far away, I'm very much looking forward to playing around with transform feedback.

There is an alternate animation approach that works around these limitations using textures in the vertex shader. This is a powerful technique, but it has its own drawbacks. It's a different beast altogether.

This concludes the overview of the structure and reasoning behind Three.bas. The animation we created as proof may not have been particularly spectacular, but in the next post we will kick it up a notch by throwing rotation, scale, and more interesting interpolations into the mix.
