Into Vertex Shaders part 2: Emulating the 3D Graphics Pipeline

Szenia Zadvornykh
10 min read · Jun 18, 2017

This is the second in a series of articles about advanced WebGL animation using Three.js and Three.bas, my extension for complex and highly performant animation systems.

In this post we will emulate part of the 3D graphics pipeline using plain old JavaScript and some of the utility of Three.js. There will be many omissions and simplifications, but this exercise should help you understand how some of the key elements fit together, without having to deal with everything at once.

If you want to skip ahead, the code we will be working toward can be found here. The whole thing is about 200 lines (excluding comments), so it should be pretty easy to step through even if you don’t read the rest of this post.

Our goal will be to draw a rectangle (or a Plane in 3D terms). We will defer the actual drawing to the canvas 2D API, but the steps to do so will be based on the WebGL graphics pipeline.

Like in Three.js, we will start by creating a scene, a camera, and a renderer. The scene will contain everything we want to render. The camera will determine our view of the scene. We will use THREE.PerspectiveCamera because it encapsulates all of the properties we need. The renderer will contain the bulk of the logic to make things appear on screen.

var scene = {
  children: []
};
var camera = new THREE.PerspectiveCamera();
var renderer = new Renderer();

Before we dive into the Renderer, let’s look at the model for our Plane, which will be represented by a Mesh with a Geometry and a Material.

Defining Geometry

3D models are represented by a list of “points” in 3D space called vertices. It is very important to realize that position is a property of a vertex, and not the vertex itself. Think of a vertex as a plain JavaScript object with a property called position.

Properties of vertices are called attributes. Alongside position, a typical vertex will have an attribute called a normal (a 3D vector perpendicular to the surface of the model), used for lighting calculations, and a UV coordinate, used for texture mapping. Like JavaScript objects, vertices can have any number of attributes. This is limited only by the capabilities of your GPU (there is a hard limit on the number of attributes per vertex), though you are much more likely to run into other issues before that limit becomes a problem.

var vertex = {
  position: {x: 0, y: 0, z: 0},
  normal: {x: 0, y: 0, z: 1},
  uv: {u: 0, v: 0}
  // etc...
};

To keep things simple, our vertices will only have a position attribute. Our plane will have 4 vertices, centered around its own origin at {x: 0, y: 0, z: 0}. Let’s make it 200 units wide and 100 units high. This gives us the following:

var mesh = {
  geometry: {
    vertices: [
      // top left corner
      { position: {x: -100, y: 50, z: 0} },
      // top right corner
      { position: {x: 100, y: 50, z: 0} },
      // bottom right corner
      { position: {x: 100, y: -50, z: 0} },
      // bottom left corner
      { position: {x: -100, y: -50, z: 0} }
    ]
  }
};

Meshes are rendered in triangles called faces. Our plane will have two faces, and we will need to tell our renderer how to construct those faces from the vertices. This might seem somewhat arbitrary for a simple plane, but remember that all of this also applies to more complex geometries.

We will describe the faces using an array called indices. Each number in this array represents the index of a vertex in the vertices array. Since we have 2 faces, the indices array will have 6 numbers (3 per face).

var mesh = {
  geometry: {
    vertices: [...],
    indices: [
      // first face
      0, 1, 3,
      // second face
      1, 2, 3
    ]
  }
};
Visual representation of our Plane.

Note that our faces are indexed clockwise. While we will not take this into account for now, the winding of faces is very important in WebGL. Consistent winding enables WebGL to quickly determine whether a face is facing the camera or not. Faces that do not face the camera (the sides of an object you don’t see) are generally not rendered, boosting performance.
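To make this concrete, here is a small sketch (not part of the renderer we are building) of how such a test could work once a face has been projected to 2D: the sign of the triangle’s signed area tells you which way it winds on screen. The counter-clockwise-equals-front convention below matches WebGL’s default, but it is configurable.

// Returns true if the projected triangle winds counter-clockwise on screen.
// WebGL's default front face is counter-clockwise, so a renderer could skip
// any face for which this returns false (back-face culling).
function isFrontFacing(p0, p1, p2) {
  // twice the signed area of the triangle
  // (the z component of the 2D cross product of two edges)
  var signedArea =
    (p1.x - p0.x) * (p2.y - p0.y) -
    (p2.x - p0.x) * (p1.y - p0.y);

  return signedArea > 0;
}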

Defining Material

Geometry determines the shape of a mesh. The material determines its appearance. There are a number of different materials built into Three.js, each with its own set of properties. These properties are applied to all vertices. In our emulation, the material will only have one property: color.

In addition, the material will reference the shaders we will be using. In WebGL, shaders are written in a language called GLSL and sent to the GPU as strings. Our JavaScript emulation will use functions to approximate this process.

var mesh = {
  geometry: {...},
  material: {
    color: '#ff0000',
    vertexShader: basicVertexShader,
    fragmentShader: basicFragmentShader
  }
};

Defining Transformation

For the most part, geometries in WebGL should be considered static; once a model is loaded or generated, changing geometry data every frame will be very slow (and we want things to go fast). To facilitate animation, we will add 3 properties to our mesh that represent its transformation relative to the scene.

var mesh = {
  geometry: {...},
  material: {...},
  position: new THREE.Vector3(0, 0, 0),
  rotation: new THREE.Euler(0, 0, 0),
  scale: new THREE.Vector3(1, 1, 1)
};

We use a Vector3 for position and scale, and an Euler for rotation. The Euler defines rotation around the three axes in radians (comparable to rotateX/rotateY/rotateZ in CSS transforms). These Three.js classes will make it easier to work with the math API later on.
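For example (the values here are purely illustrative), rotating our mesh 45 degrees around the Y axis looks like this:

// rotations are specified in radians
mesh.rotation = new THREE.Euler(0, Math.PI / 4, 0);

// or, equivalently, using the degree-to-radian helper Three.js provides
mesh.rotation.y = THREE.Math.degToRad(45);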

The Renderer

As mentioned earlier, the renderer is responsible for drawing our mesh on screen. Since the mesh is defined in 3D coordinates, and the screen is flat, an important part of this responsibility is converting 3D coordinates to 2D. This process is called projection.

In order to correctly project a vertex on screen, we will need to take a number of things into account:

  1. The position of the vertex relative to the model.
  2. The transformation (rotation, scale, translation) of the model relative to the scene.
  3. The transformation of the camera relative to the scene.
  4. Projection properties of the camera, like field of view and aspect ratio.

This might seem like a lot, but luckily this process can be greatly streamlined using matrix math.

The section ahead deals with matrices and vectors (linear algebra). If you are not familiar with this topic, check out this article that covers the basics of matrices and vectors in Three.js.
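As a preview of where we are heading, the whole list above collapses into a couple of matrix multiplications with the Three.js math API. This is only a conceptual sketch; the renderer below performs the same steps, broken out per child and per vertex.

// modelMatrix covers item 2 from the list above, camera.matrixWorldInverse
// covers item 3, and camera.projectionMatrix covers item 4
var modelMatrix = new THREE.Matrix4();

var modelViewProjectionMatrix = new THREE.Matrix4()
  .multiplyMatrices(camera.projectionMatrix, camera.matrixWorldInverse)
  .multiply(modelMatrix);

// item 1: a vertex position defined relative to the model,
// run through the combined matrix
var projected = new THREE.Vector3(0, 0, 0)
  .applyMatrix4(modelViewProjectionMatrix);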

Our renderer has one main method: render.

renderer.render(scene, camera);

This will render the scene as viewed through the camera. Let’s look at this method line by line.

Step 1: Clear the screen

this.ctx.fillStyle = this.clearColor;
this.ctx.fillRect(-1, -1, 2, 2);

This will clear the previous frame by filling the screen with a solid clear color. Since we are working in Normalized Device Coordinates (see previous post), we can clear the entire screen using fillRect(-1, -1, 2, 2).
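This only works because the renderer’s 2D context is set up with a transform that maps Normalized Device Coordinates onto canvas pixels. The constructor below is my own sketch of what that setup might look like (the real source linked above may differ):

function Renderer() {
  this.clearColor = '#000000';

  this.canvas = document.createElement('canvas');
  this.canvas.width = window.innerWidth;
  this.canvas.height = window.innerHeight;

  this.ctx = this.canvas.getContext('2d');

  // map NDC (-1..1 on both axes) to canvas pixels,
  // flipping Y so that +y in NDC points up on screen
  this.ctx.setTransform(
    this.canvas.width / 2, 0,
    0, -this.canvas.height / 2,
    this.canvas.width / 2, this.canvas.height / 2
  );
}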

Step 2: Update the camera

camera.updateProjectionMatrix();
camera.updateMatrixWorld();
camera.matrixWorldInverse.getInverse(camera.matrixWorld);

The projection properties of the camera, including field of view and aspect ratio, are represented internally in camera.projectionMatrix. This matrix needs to be updated whenever any of those values change between draw calls. The same applies to camera.matrixWorld, which represents the transformation of the camera relative to the scene. Because we are looking at the scene through the camera, we also need the inverse of camera.matrixWorld.
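For these matrices to produce anything sensible, the camera itself needs some values to work with. Something along these lines (the numbers are just an example):

camera.fov = 60; // vertical field of view, in degrees
camera.aspect = window.innerWidth / window.innerHeight;
camera.near = 1; // near clipping plane
camera.far = 1000; // far clipping plane
camera.position.z = 400; // move the camera back so the plane is in view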

Step 3: Render each child

scene.children.forEach(function(child) {
  this.renderChild(child, camera);
}.bind(this));

Once the camera matrices are updated, we can move on to rendering the children of the scene. Since the steps for each child are the same, we can move this logic into a separate method renderer.renderChild(), passing the child in question and the camera along.
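Of course, renderChild only has something to do once the mesh has been added to the scene and render is called every frame. A minimal usage sketch (the animation itself is arbitrary):

scene.children.push(mesh);

function loop() {
  // animate through the transformation properties, not the geometry
  mesh.rotation.y += 0.01;

  renderer.render(scene, camera);
  requestAnimationFrame(loop);
}

loop();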

Step 3.1: Create a world matrix

var matrixWorld = new THREE.Matrix4();
var quaternion = new THREE.Quaternion().setFromEuler(child.rotation);

matrixWorld.compose(child.position, quaternion, child.scale);

Inside renderer.renderChild() we first need to create a matrix representation of the child’s transformation. This is similar to what we did for the camera. Matrix4.compose will help us do just that, but first we need to turn the Euler we used for rotation into a THREE.Quaternion to conform to the API of Matrix4.compose. After calling this method, we get a matrix that represents the combined transformation of our object.

Step 3.2: Create a model view matrix

var modelViewMatrix = new THREE.Matrix4().multiplyMatrices(
  camera.matrixWorldInverse,
  matrixWorld
);

Next we need a matrix that represents the transformation of the child relative to the position and rotation of the camera. In other words, this is how the camera sees the child. We call this a modelViewMatrix, composed by multiplying the inverse of camera.matrixWorld by the matrixWorld we just composed for the child.

var uniforms = {
  modelViewMatrix: modelViewMatrix,
  projectionMatrix: camera.projectionMatrix,
  color: child.material.color
};

The modelViewMatrix and projectionMatrix, alongside material.color, will be used in our shaders as uniforms. Uniforms are values that are globally accessible in both the vertex shader and the fragment shader, which will come into play in the next step. We will also need the vertices and indices we set up earlier.

Step 3.3: Render the child face by face

var vertices = child.geometry.vertices;
var indices = child.geometry.indices;
var indexCount = indices.length;
var faceCount = indexCount / 3;

for (var i = 0; i < faceCount; i++) {
  // STEP 3.3.1: Retrieve vertices for this face
  var vertex0Index = indices[i * 3 + 0];
  var vertex0 = vertices[vertex0Index];

  var vertex1Index = indices[i * 3 + 1];
  var vertex1 = vertices[vertex1Index];

  var vertex2Index = indices[i * 3 + 2];
  var vertex2 = vertices[vertex2Index];

  // STEP 3.3.2: Project vertices using vertex shader
  var projectedVertex0 = this.applyVertexShader(
    child.material.vertexShader,
    uniforms,
    vertex0
  );
  var projectedVertex1 = this.applyVertexShader(
    child.material.vertexShader,
    uniforms,
    vertex1
  );
  var projectedVertex2 = this.applyVertexShader(
    child.material.vertexShader,
    uniforms,
    vertex2
  );

  // canvas 2D render logic
}

The code above renders a mesh face by face. First, the vertices corresponding to the current face are retrieved from geometry.vertices based on the indices array. Then material.vertexShader is executed once for each vertex in the face, returning a projected point in Normalized Device Coordinates.

Once all vertices in a face are projected, the area of the screen covered by this face is determined. Now the fragment shader takes over. The fragment shader is executed once for each pixel (fragment) in the area of the screen covered by the face. I couldn’t quite figure out how to effectively emulate this process, so we will just use the canvas 2D API to fill the shape using material.color.

this.ctx.fillStyle = this.applyFragmentShader(
  child.material.fragmentShader,
  uniforms
);
this.ctx.fill();
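For completeness, here is a sketch of what the // canvas 2D render logic placeholder in the loop could contain (the actual source may differ slightly): the three projected vertices, already in Normalized Device Coordinates, are turned into a triangular path, which the fillStyle and fill() calls above then paint.

// build a triangular path from the projected (NDC) vertices
this.ctx.beginPath();
this.ctx.moveTo(projectedVertex0.x, projectedVertex0.y);
this.ctx.lineTo(projectedVertex1.x, projectedVertex1.y);
this.ctx.lineTo(projectedVertex2.x, projectedVertex2.y);
this.ctx.closePath();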

What is a shader, anyway?

In the section ahead we will be looking at shaders and GLSL. If you want an introductory look into GLSL itself, check out this post.

WebGL shaders have two distinct but tightly coupled parts: a vertex shader and a fragment shader. These are essentially functions that can be injected into the graphics pipeline. Below is a very basic vertex shader written in GLSL.

attribute vec4 position;

uniform mat4 modelViewMatrix;
uniform mat4 projectionMatrix;

void main() {
  mat4 modelViewProjectionMatrix = projectionMatrix * modelViewMatrix;

  gl_Position = modelViewProjectionMatrix * position;
}

This shader first calculates a modelViewProjectionMatrix using the supplied uniforms. This matrix represents the transformation for a vertex as seen by the camera, projected on a flat surface (the screen). This is the combined transformation for all of the steps we have taken so far. The shader then multiplies the vertex position attribute by this matrix, which gives us the position of this vertex on screen. gl_Position is a reserved keyword in GLSL, representing this final output value.

Each vertex has a different position, but the transformation represented by the matrix is the same (because it is based on uniforms). When this transformation is applied, each vertex is therefore transformed in the same way, so the shape (the relative positions of the vertices) of the mesh does not change. This is called an affine transformation.

Now let’s look at a JavaScript emulation of this shader. Below you can see each line and its JavaScript approximation.

function basicVertexShader() {
  // attribute vec4 position;
  var position = this.attributes.position;
  // uniform mat4 modelViewMatrix;
  var modelViewMatrix = this.uniforms.modelViewMatrix;
  // uniform mat4 projectionMatrix;
  var projectionMatrix = this.uniforms.projectionMatrix;

  // mat4 modelViewProjectionMatrix =
  //   projectionMatrix * modelViewMatrix;
  var modelViewProjectionMatrix = new THREE.Matrix4().multiplyMatrices(
    projectionMatrix,
    modelViewMatrix
  );

  // gl_Position = modelViewProjectionMatrix * position;
  return position.applyMatrix4(modelViewProjectionMatrix);
}

Since JavaScript does not support operator overloading for custom objects, we need to use the Three.js math API to multiply our matrices and vectors.
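As a quick, purely illustrative reference, here is how the GLSL operators used above map onto the Three.js math API:

var a = new THREE.Matrix4();
var b = new THREE.Matrix4();
var v = new THREE.Vector3(1, 2, 3);

// GLSL: mat4 * mat4
var ab = new THREE.Matrix4().multiplyMatrices(a, b);

// GLSL: mat4 * vec4
// (in recent versions of Three.js, Vector3.applyMatrix4 also performs the
// perspective divide by w, which is exactly what projection needs)
var transformed = v.clone().applyMatrix4(ab);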

Now let’s examine the applyVertexShader method we used earlier to project our vertices.

var projectedVertex0 = this.applyVertexShader(
  child.material.vertexShader,
  uniforms,
  vertex0
);

// ...

function applyVertexShader(shader, uniforms, vertex) {
  var context = {
    attributes: {
      // copy the position into a THREE.Vector3 so the shader can use the
      // Three.js math API without mutating the original geometry data
      position: new THREE.Vector3(
        vertex.position.x,
        vertex.position.y,
        vertex.position.z
      )
    },
    uniforms: uniforms
  };

  return shader.apply(context);
}

To emulate how real shaders work, I created a context object which stores the values for attributes and uniforms expected by the shader. The shader is then executed against this context (using this to retrieve the supplied values).

While not exactly accurate, this is a fair representation of how shaders work at a high level. In JavaScript terms, shaders are stateless functions with a particular syntax for inputs and outputs. Inputs are represented by uniforms (which are the same for each execution of the shader within a single draw call) and attributes (which vary for each vertex). The output of the vertex shader is represented by gl_Position, which is essentially the return value of the shader.

This process is similar for the fragment shader, which is used in the next step of the pipeline. The fragment shader does not have access to vertex attributes, but it does have access to the uniforms. This is, however, where my emulation tapers off. You can see the complete code here, with some additional info in the comments.
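To round out the picture, here is what the fragment shader side of the emulation could look like, mirroring the vertex shader setup above. The names basicFragmentShader and applyFragmentShader come from the earlier snippets, but these bodies are my own sketch and may differ from the actual source.

function basicFragmentShader() {
  // uniform vec3 color; -> output the material color for every fragment
  // (in real GLSL this value would be written to gl_FragColor)
  return this.uniforms.color;
}

function applyFragmentShader(shader, uniforms) {
  // the fragment shader sees only uniforms, never vertex attributes
  var context = {
    uniforms: uniforms
  };

  return shader.apply(context);
}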

If you are inclined to dig into the code and experiment some more, I created a little CodePen project that can be your playground. It’s mostly the same as discussed in this post, but the scene graph is a little more fleshed out, and there are utility classes to create different plane and polygon geometries.

There is much more to the graphics pipeline, but I hope this overview has given you new insight into how some important bits fit together. In the next post, we will dive deeper into vertex shaders, and examine how we can make use of the graphics pipeline to maximize performance.

