Instancing with three.js — Part 1

This is a three-part series on a fairly broad technical 3d topic. The first part is a theoretical overview of the technique, the second part edits an existing demo to apply instancing, and the last part explores optimizations.

Intro

Imagine we have a 3d world with lots of trees, or lamp posts. When we render such a world we issue a lot of draw calls. Draw calls have overhead and are expensive, so for the sake of interactive frame rates we want to eliminate them where possible.

If we have something like:

```
const myGeom = new THREE.BoxGeometry()
const myMaterial = new THREE.MeshBasicMaterial()
const myGroup = new THREE.Group()

for (let i = 0; i < 25; i++) {
  const myMesh = new THREE.Mesh(myGeom, myMaterial)
  myMesh.frustumCulled = false
  myMesh.position.set(random(), random(), random())
  myGroup.add(myMesh)
}
```

And add `myGroup` to a scene and render it, without any optimization, we will cause 25 different draw calls to happen (in addition to anything else that may be in the scene, including a clear call).

`.frustumCulled = false` turns off an optimization that aims to reduce these draw calls. Frustum culling intersects each mesh’s bounding sphere with the camera’s frustum; if the sphere is entirely outside of it, no draw call is issued for that mesh. With culling left on, we would potentially see fewer than 25 draw calls, depending on how the meshes are laid out and where the camera is.
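To make the culling test concrete, here is a sketch of the core check in plain JavaScript (not three.js’s actual implementation): a bounding sphere is entirely outside one frustum plane when its center lies further behind the plane than its radius. The real test runs this against all six planes.

```
// One plane of the frustum test. A plane is stored as a unit normal plus a
// constant, so that signed distance(p) = n.p + constant (the same convention
// THREE.Plane uses).
function sphereOutsidePlane(center, radius, plane) {
  const dist =
    plane.normal.x * center.x +
    plane.normal.y * center.y +
    plane.normal.z * center.z +
    plane.constant;
  // entirely on the negative side of the plane, so it can be culled
  return dist < -radius;
}

const plane = { normal: { x: 0, y: 0, z: 1 }, constant: 0 }; // the z = 0 plane, facing +z

console.log(sphereOutsidePlane({ x: 0, y: 0, z: -5 }, 1, plane));   // true, fully behind, culled
console.log(sphereOutsidePlane({ x: 0, y: 0, z: -0.5 }, 1, plane)); // false, intersects the plane
```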

This, with some trade offs, could be one draw call.

The “brute force” way

One approach we can take to optimize the draw calls is to merge these meshes (geometries) into one.

If we consider these meshes to be static (a lamp post doesn’t have to change its position or scale during the life of the app), we only partially care about what describes the “group”.

In the snippet above it’s a scene graph node `Group` that holds instances of `Mesh` nodes. Each node is part of that cluster; we can translate them individually, but we can also move them all together in sync by translating the parent.

```
const geom = new THREE.BoxGeometry()
const material = new THREE.MeshBasicMaterial()

const mergedGeometry = new THREE.BufferGeometry()

for (let i = 0; i < 25; i++) {
  const nodeGeometry = geom.clone()
  nodeGeometry.translate(random(), random(), random())
  mergedGeometry.merge(nodeGeometry)
}

const myCluster = new THREE.Mesh(mergedGeometry, material)
```

We merge all the individual lamp posts into one cluster. We still have access to a scene graph node `myCluster` and we can move the entire group, but, for example, we lose the ability to easily adjust the spacing between them (there is no individual lamp node any more).

We can however render all the lamp posts in the world, or a tile, with one draw call.

This approach is a memory hog.

Since we effectively “unroll” this geometry, the GPU now has to store much more data. In the first example, it stores only one copy of the geometry and references it with each draw call. In the second, it’s still one geometry, but 25 times larger, since that’s how many times we duplicated it.
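Rough arithmetic makes the difference tangible. The numbers below are illustrative, counting position data only (a `BoxGeometry` stores 24 unique vertices, since its faces don’t share them):

```
// Position data only; normals and uvs scale the same way.
const VERTS = 24;           // unique vertices in one box
const FLOATS_PER_VERT = 3;  // x, y, z
const BYTES_PER_FLOAT = 4;  // Float32
const INSTANCES = 25;

// Shared geometry: one copy on the GPU, referenced by all 25 draw calls.
const sharedBytes = VERTS * FLOATS_PER_VERT * BYTES_PER_FLOAT;

// Merged geometry: the same vertices duplicated once per instance.
const mergedBytes = sharedBytes * INSTANCES;

console.log(sharedBytes, mergedBytes); // 288 7200
```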

The merge operation itself may be slow and cause a lot of GC activity.

It is complicated to update individual instances within the cluster.

On the flip side, we could remove a matrix multiplication from the shader.

The clever way

GPUs and WebGL are all about managing memory and issuing commands. WebGL exposes a feature called “instancing” that lets us perform the optimization we just did with merging, but in a much more efficient way.

It is possible to describe what we want with a much smaller set of data and render the same result. First let’s refresh a bit on the scene graph and what it maps to in GLSL.

When we make some geometry:

`const geometry = new THREE.PlaneGeometry()`

We can always expect three to produce GLSL like this:

`attribute vec3 position;`

Leaving aside normals for lighting and uvs for mapping, this is the variable through which the shader accesses the position of a vertex in model space. This is the value from `lamp_post.obj` or some corner of a plane.
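On the JavaScript side, that attribute is just a flat typed array, three numbers per vertex. A hypothetical unit plane’s corners could be laid out like this (illustrative values, not three.js’s actual `PlaneGeometry` ordering):

```
// Four corners of a 1x1 plane in model space, xyz per vertex.
const positions = new Float32Array([
  -0.5,  0.5, 0, // top left
   0.5,  0.5, 0, // top right
  -0.5, -0.5, 0, // bottom left
   0.5, -0.5, 0, // bottom right
]);

// In three.js this array would back a BufferAttribute with itemSize 3,
// which is what `attribute vec3 position;` reads from.
console.log(positions.length / 3); // 4
```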

The GLSL that `Material` produces is of no interest yet, so let’s move onto the scene graph:

`const mesh = new THREE.Mesh(geometry)`

We usually manipulate `mesh.rotation`, `mesh.position` and `mesh.scale`, but these all get baked into a single 4x4 matrix on their way to GLSL, yielding:

`uniform mat4 modelMatrix;`

Whenever we change the position, for example, the engine recomputes the appropriate `THREE.Matrix4` and the shader sees a fresh `modelMatrix` variable.

While not directly related to instancing, let’s note how the camera maps:

`const camera = new THREE.PerspectiveCamera()`

GLSL:

```
uniform mat4 projectionMatrix;
uniform mat4 viewMatrix;
uniform mat4 modelViewMatrix; // camera + mesh/line/point node
```

`modelViewMatrix` here actually belongs to both `camera` and `mesh`; more on that in a bit.

Let’s transform the mesh with a very simple vertex shader. `THREE.ShaderMaterial` actually injects all these uniforms for us so we don’t have to:

```
void main(){
  gl_Position =
    projectionMatrix * viewMatrix * modelMatrix * vec4(position, 1.);
}
```

Going right to left:

• we cast the `attribute` from `BufferGeometry` to a `vec4`, since it comes in as a `vec3`
• we apply the world transformation derived from position, scale and rotation
• we apply the camera’s view transformation and project into clip space

As mentioned, you’d have to use `THREE.RawShaderMaterial` in order to declare all these uniforms yourself. `THREE.ShaderMaterial` receives them from wrappers and abstractions.

In the first example, where the scene graph holds a parent with 25 children, the engine will compute 25 different `modelMatrix` values. If you move the parent, three will do something along the lines of:

`child.matrixWorld.multiplyMatrices( parentWorldMatrix, childLocalMatrix )`

because the GLSL shader needs it:

```
vec4 worldPosition =
  modelMatrix * // moves the instance into world space (parent + child)
  vec4(position, 1.); // model space
```

When we merge, we remove the need to do the 25 matrix updates because we remove the child-parent relationship.

```
cluster.position.set(1, 1, 1)
cluster.rotation.set(Math.PI, 0, 0)
cluster.scale.set(2, 1, 2)
```

This still affects the `modelMatrix`, but three only ever has to compute one.

In the first example, the matrix from `Group` never directly appears in GLSL, since no draw call is issued for such a node. It affects every draw call though: the CPU multiplies it into the matrices that are actually used in the shader (each mesh’s `modelMatrix`).
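The CPU-side work hiding behind that multiplication can be sketched in plain JavaScript (no three.js), using the same column-major 4x4 layout that three.js and GLSL use. Here two pure translations compose into one:

```
// Multiply two column-major 4x4 matrices: out = a * b.
function multiply4(a, b) {
  const out = new Float32Array(16);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[k * 4 + row] * b[col * 4 + k];
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

// Identity matrices with a translation in the last column:
// parent moves by (1, 0, 0), child by (0, 2, 0).
const parent = new Float32Array([1,0,0,0, 0,1,0,0, 0,0,1,0, 1,0,0,1]);
const child  = new Float32Array([1,0,0,0, 0,1,0,0, 0,0,1,0, 0,2,0,1]);

const world = multiply4(parent, child);
console.log(world[12], world[13], world[14]); // 1 2 0
```

Elements 12, 13 and 14 hold the translation in this layout; the two offsets have composed into a single world matrix, which is exactly what the engine has to redo for every child whenever the parent moves.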

In the second example, we actually conflate this with `attribute vec3 position;`:

```
const mergedGeometry = new THREE.BufferGeometry()

for (let i = 0; i < 25; i++) {
  const nodeGeometry = geom.clone() // `geom` is the source asset from before
  nodeGeometry.applyMatrix(myMatrix[i])
  mergedGeometry.merge(nodeGeometry)
}
```

We do a one-time CPU operation, where we apply the matrix directly to the vertices:

```
vec4 worldSpace =
  modelMatrix * // moves the entire cluster (parent)
  vec4(position, 1.); // not really model space any more, since the transformation is "baked in" from outside (child)
```

`attribute vec3 position;` no longer maps to `lamp_post.obj` . We’ve burnt in part of the scene graph, and lost the uniqueness of model space.
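A tiny sketch of what “baking in” means for the data: after merging, each copy’s vertices carry their instance transform directly (plain JavaScript, translation only for brevity):

```
// Apply a translation directly to a flat xyz position array,
// the way merging bakes each instance's transform into the vertices.
function translateVertices(positions, x, y, z) {
  const out = Float32Array.from(positions);
  for (let i = 0; i < out.length; i += 3) {
    out[i] += x;
    out[i + 1] += y;
    out[i + 2] += z;
  }
  return out;
}

const modelSpace = new Float32Array([0, 0, 0, 1, 0, 0]); // two vertices of the asset
const baked = translateVertices(modelSpace, 5, 0, 0);    // one instance's copy

console.log(Array.from(baked)); // [ 5, 0, 0, 6, 0, 0 ]
```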

Instancing

Let’s take a step back and consider some of the elements we have after the lengthy overview so far:

• some 3d world ( `THREE.Scene` )
• some spatial entity, like a neighborhood, a village or a tile ( `THREE.Group` )
• some asset, like a tree or a lamp post ( `THREE.BufferGeometry` )
• some intent on how the asset fits the world, i.e. 25 lamp posts scattered in the world in some pattern ( `THREE.Mesh` )

The basic idea is:

```
const asset = OBJLoader.load('lamp_post.obj') // load a small asset once

// scatter the asset
const tile = new THREE.Mesh(new THREE.PlaneGeometry())

myPositions.forEach(pos => {
  const mesh = new THREE.Mesh(asset, myMaterial)
  mesh.position.copy(pos)
  tile.add(mesh)
})
```

When we call `tile.position.set()` we move all the instances of the asset with it. We want to retain that convenience.

When we call `tile.children[i].position.set()` we can move a single asset relative to the others (we lose this with merging), but we cause extra draw calls and expensive CPU-side matrix computation through the child-parent relationship of the scene graph.

We want the convenience without the side effects. We can address all of these, with a gotcha.

Low level

The only instancing referenced in the docs is in the classes `InstancedBufferAttribute` and `InstancedBufferGeometry`.

There are a few examples but they are all low level, including this one.

You’ll notice that the convenience is gone:

`myLampPost.clone().position.copy(myPosition)`

And in its place is something like this:

```
var offsets = new Float32Array( INSTANCES * 3 ); // xyz
var colors = new Float32Array( INSTANCES * 3 ); // rgb
var scales = new Float32Array( INSTANCES * 1 ); // s

for ( var i = 0, l = INSTANCES; i < l; i ++ ) {
  var index = 3 * i;

  // per-instance position offset
  offsets[ index ] = positions[ i ].x;
  offsets[ index + 1 ] = positions[ i ].y;
  offsets[ index + 2 ] = positions[ i ].z;

  // per-instance color tint - optional
  colors[ index ] = 1;
  colors[ index + 1 ] = 1;
  colors[ index + 2 ] = 1;

  // per-instance scale variation
  scales[ i ] = 1 + 0.5 * Math.sin( 32 * Math.PI * i / INSTANCES );
}

geometry.addAttribute( 'instanceOffset', new THREE.InstancedBufferAttribute( offsets, 3 ) );
geometry.addAttribute( 'instanceColor', new THREE.InstancedBufferAttribute( colors, 3 ) );
geometry.addAttribute( 'instanceScale', new THREE.InstancedBufferAttribute( scales, 1 ) );
```

Looks pretty gnarly, and that’s only a small portion of the code needed to get (partial) instancing running on what’s trivially done through the scene graph.

Still, this snippet is useful for telling what’s going on. But let’s also include a portion of the shader.

The relevant part from the example is:

```
#ifdef INSTANCED
attribute vec3 instanceOffset;
attribute float instanceScale;
#endif
```

This is a little different from the format we used in the previous two examples, since it does not use a matrix, but individual components (and it’s missing the rotation).

Unfortunately, setting up a `mat4` with instancing is a bit involved, but let’s pretend for a moment that we can:

```
attribute mat4 instanceMatrix; // instance attribute
attribute vec3 position; // regular attribute

void main(){
  gl_Position =
    projectionMatrix * viewMatrix * // from THREE.Camera
    modelMatrix * // from THREE.Mesh
    instanceMatrix * // we add this to the chain
    vec4(position, 1.) // from THREE.BufferGeometry
  ;
}
```

We add another transformation step to the shader. Unlike `projectionMatrix`, `viewMatrix` and `modelMatrix`, `instanceMatrix` is not a uniform but an attribute. It’s a special kind of instanced attribute, part of the memory management magic that WebGL does for us.

Three.js doesn’t let us declare a `mat4` attribute directly, so we have to compose it out of several `vec4` attributes. (GLSL itself can handle `mat4` attributes, which occupy four attribute slots under the hood, but composing from `vec4`s is the straightforward way of going about the problem here.)

Compared to the previous two examples, this shader still runs once for each of the 25 × (vertices per instance) vertices.

It will do an extra matrix multiplication, but in the case of 25 unique nodes, it will be done on the GPU not the CPU. The way GPUs work, they might be idle waiting for the draw call to be issued, so the gain can be made here by keeping them busier at the same overhead cost (draw call).

Memory-wise, `attribute vec3 position;` still references the vertices of a `BufferGeometry` or `lamp_post.obj` only once, the same way it would if we reused the `BufferGeometry` with 25 unique nodes. This saves a lot of memory over merging.

The WebGL API doesn’t let us supply a per-instance `uniform mat4 instanceMatrix;` the way we use the other transformation matrices. Per-instance data has to come in as an attribute, hence:

`geometry.addAttribute( 'instanceOffset', new THREE.InstancedBufferAttribute( offsets, 3 ) );`

Three.js under the hood sets up the other uniforms before each draw call. Since we compress many draw calls into one, we need all of this information available within that single call. This is what the `InstancedBufferAttribute` does. It’s an array of numbers formatted in such a way that each slice corresponds to the data that would otherwise come with its own draw call.

In this snippet, we don’t use a matrix, but a simple 3d vector, to move instances individually. For our 25 lamp posts, this would mean 75 numbers to represent 25 different 3d vectors.
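Packing those 25 vectors is plain array bookkeeping. A minimal sketch (the `positions` array here is a stand-in for whatever scatter pattern we want):

```
const INSTANCES = 25;

// Stand-in scatter pattern, one {x, y, z} per instance.
const positions = [];
for (let i = 0; i < INSTANCES; i++) positions.push({ x: i, y: 0, z: -i });

// Flatten into the layout an InstancedBufferAttribute expects:
// 25 vec3s laid out as 75 consecutive numbers.
const offsets = new Float32Array(INSTANCES * 3);
for (let i = 0; i < INSTANCES; i++) {
  offsets[3 * i]     = positions[i].x;
  offsets[3 * i + 1] = positions[i].y;
  offsets[3 * i + 2] = positions[i].z;
}

console.log(offsets.length); // 75
```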

In a simple shader, for each draw call we would have:

`uniform vec3 offset;`

Since we compress these into one draw call:

`attribute vec3 offset; //this is actually 25 different values that will be referenced`

The draw call then draws the same vertices over and over, once per instance; with each instance, the attribute yields a different value:

```
void main(){
  vec3 myPosition = position + offset; // offset will change value 25 times during the draw call
}
```

This attribute can be much larger than the geometry, for example, drawing a simple plane a million times would require a much larger instance attribute than the geometry ones. Vice versa, rendering detailed geometry a few times would make it smaller.
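The two extremes in numbers (illustrative counts, position data only):

```
// A plane drawn a million times: instance data dwarfs the geometry.
const planeFloats  = 4 * 3;       // 4 vertices, xyz each
const offsetFloats = 1000000 * 3; // one vec3 offset per instance

// A detailed asset drawn ten times: geometry dwarfs the instance data.
const detailedFloats = 100000 * 3; // 100k vertices, xyz each
const fewOffsets     = 10 * 3;     // ten vec3 offsets

console.log(offsetFloats / planeFloats);  // 250000
console.log(fewOffsets / detailedFloats); // 0.0001
```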

With this kind of a setup, we retain some control over per instance positioning, being limited by the same limitations as we have when updating geometry.

I.e. modifying the offsets of the instances is the same as modifying the vertices of a mesh (it operates on an attribute, and the attribute needs to be updated); this can still be very performant. But the convenience is gone:

```
for ( var i = 0, l = INSTANCES; i < l; i ++ ) {
  var index = 3 * i;

  offsets[ index ] = positions[ i ].x;
  offsets[ index + 1 ] = positions[ i ].y;
  offsets[ index + 2 ] = positions[ i ].z;
}
```

Just like any other attribute, three.js leaves us to fill up an array with appropriate values.

Problems

As mentioned, this is all pretty low level, and it’s all three.js offers, but with good reason. A game engine can guess how assets are being used and optimize this under the hood. Three.js can be used to build a game engine, but it can also be used for something else where the need for instancing is vastly different.

So in order to use instancing with three.js we need to know GLSL and how WebGL data structures work. And that’s just for generic instancing; making it work with three.js’s entire system becomes quite a bit more involved.

The three.js example only tackles the lambert material, through one approach for material extensions (copying the shader). There would be more code involved for a more complex material such as `MeshStandardMaterial` .

The user has to format the attribute properly, which involves converting `BufferGeometry` to `InstancedBufferGeometry`.

The responsibility of the scene graph gets a bit conflated with `Geometry`, which is not part of it. The bare bones of the scene graph is a node on which we can set `position` and which holds a `Matrix4`. Since our uniform turned into an attribute, we have to set `position` as if it were a vertex of a mesh (on `Geometry`).

`InstancedBufferGeometry` ends up doing the job of both `Geometry` and `Object3D` ( `Group` ).

In the next part we are going to actually write some code and apply instancing to a demo.