Ray Tracing Adventures Part II : Acceleration Structures

Doruk B.
10 min read · Nov 14, 2022


In the previous post we built a decent raytracer capable of rendering cool scenes with some basic effects. However, it wasn’t very performant, and some scenes with high triangle and mesh counts took hours to render. We also had no way of positioning and orienting the models apart from what is described by their vertex data. In this part, we will address those issues.

As an overview, a Bounding Volume Hierarchy (BVH) and multithreading will be used to gain a significant runtime performance boost. Time is not the only resource we are trying to optimize, so we will also implement CPU Mesh Instancing to drastically reduce the memory overhead of using the same model multiple times.

Lastly, to be able to design scenes at ease, and also create the possibility of animations, 3D transformations will be introduced.

Let’s start our journey by implementing a simplistic BVH.

Bounding Volume Hierarchy

BVH is a special tree data structure that organizes “primitives”, in our case triangles, in a way that makes performing many intersection tests efficient.

All geometric objects, which form the leaf nodes of the tree, are wrapped in bounding volumes.

We use the AABB, or Axis-Aligned Bounding Box, as our bounding volume since it is simple to define and store, and very fast to run intersection tests against.

Creating an AABB for each triangle is simple enough: we just find the maximum and minimum values along each axis and store them. Thus, an AABB is compactly defined by just 6 float values.

AABB structure
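
A minimal sketch of such a struct in C++ (the Vec3 type and the method names here are illustrative, not necessarily the project’s):

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

// An axis-aligned bounding box: just 6 floats, stored as two corners.
struct AABB
{
    Vec3 min;
    Vec3 max;

    // Grow the box so that it encloses point p.
    void extend(const Vec3& p)
    {
        min.x = std::min(min.x, p.x); max.x = std::max(max.x, p.x);
        min.y = std::min(min.y, p.y); max.y = std::max(max.y, p.y);
        min.z = std::min(min.z, p.z); max.z = std::max(max.z, p.z);
    }

    // Build the box of a triangle from its three vertices.
    static AABB ofTriangle(const Vec3& a, const Vec3& b, const Vec3& c)
    {
        AABB box{ a, a };
        box.extend(b);
        box.extend(c);
        return box;
    }
};
```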

After each primitive has its own AABB, we are ready to form the BVH tree.

The main idea is to partition objects and group them in a hierarchical manner. Let’s see how this looks for a scene with only 6 triangles. Notice that the green and red triangles, or more precisely their bounding boxes, are grouped into a larger bounding box enclosing the two.

This helps us because if a ray does not intersect that higher-level bounding box, it is guaranteed not to intersect any of the lower-level bounding boxes, and certainly none of the triangles within those. Thus, we can perform one ray-versus-AABB intersection test, which is quite cheap, and discard many triangles, even whole mesh structures with millions of triangles inside. That is the core improvement we are after.

illustration of a sample BVH

Here is an overview, or pseudo-pseudo-code, of the whole construction process for a top-down-built BVH.

BVH Construction, credits: [1]
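
In rough C++ terms, the recursion might look like the sketch below. This is not the project’s actual code; the node layout, helper functions and leaf-size threshold are all assumptions.

```cpp
#include <vector>

struct BVHNode
{
    AABB bounds;                 // box enclosing everything below this node
    BVHNode* left  = nullptr;
    BVHNode* right = nullptr;
    std::vector<int> tris;       // triangle indices, filled only at leaves
};

// Hypothetical helpers assumed to exist elsewhere:
AABB computeBounds(const std::vector<int>& tris);      // union of the triangles' boxes
void partitionByMidpoint(const std::vector<int>& tris, // disjoint left/right split
                         std::vector<int>& left,
                         std::vector<int>& right);

constexpr size_t kLeafSize = 4;  // stop recursing below this many triangles

BVHNode* buildBVH(const std::vector<int>& tris)
{
    BVHNode* node = new BVHNode();
    node->bounds = computeBounds(tris);

    if (tris.size() <= kLeafSize)        // few enough primitives: make a leaf
    {
        node->tris = tris;
        return node;
    }

    // Split along the largest axis at its midpoint (discussed below),
    // producing two disjoint sets of triangles.
    std::vector<int> left, right;
    partitionByMidpoint(tris, left, right);

    if (left.empty() || right.empty())   // degenerate split: fall back to a leaf
    {
        node->tris = tris;
        return node;
    }

    node->left  = buildBVH(left);
    node->right = buildBVH(right);
    return node;
}
```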

I will not go into much detail but will briefly discuss each step instead.

Since this is a top-down process, we start with one huge bounding box that wraps everything and gradually divide it into smaller sub-boxes. The obvious question is how to divide that box. We will simply select its largest extent. But this only chooses among the x, y and z axes; we still need a value along that axis to compare against.

Let’s say our box was very tall, so the y extent was the largest and we decided to split along the y-axis. Now we need a y value to compare against the y coordinate of each triangle’s center point. We then create two disjoint sets (a triangle must belong to exactly one of the sets) depending on whether the y coordinate of a triangle’s center is higher or lower than the selected value.

Lastly, we build a bounding box for each of the two groups and continue to divide recursively.

The selection of that split point has serious performance implications; however, this project implements the simplest possible strategy by just selecting the middle point.

Here is a snippet of code for selecting the split axis and point:

Selecting x-axis and the middle point on it.
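
The idea, reconstructed as a hedged C++ sketch (assuming the AABB layout from earlier; names are illustrative):

```cpp
// Choose the split axis as the largest extent of the node's box, and
// the split position as the midpoint of that extent.
void chooseSplit(const AABB& box, int& axis, float& splitPos)
{
    float ex = box.max.x - box.min.x;
    float ey = box.max.y - box.min.y;
    float ez = box.max.z - box.min.z;

    axis = 0;                                   // 0 = x, 1 = y, 2 = z
    float extent = ex;
    if (ey > extent) { axis = 1; extent = ey; }
    if (ez > extent) { axis = 2; extent = ez; }

    // Midpoint split: triangles whose centroid lies below splitPos on
    // this axis go to the left set, the rest go to the right set.
    switch (axis)
    {
        case 0:  splitPos = 0.5f * (box.min.x + box.max.x); break;
        case 1:  splitPos = 0.5f * (box.min.y + box.max.y); break;
        default: splitPos = 0.5f * (box.min.z + box.max.z); break;
    }
}
```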

The end result might look like this:

BVH, retrieved from: https://en.wikipedia.org/wiki/Bounding_volume_hierarchy

Let’s quickly take a look at multithreading before rendering some complex scenes.

Adding basic multithread support is easier than it might sound at first.

Since the rendering of each pixel is independent of every other pixel, and thus free from any kind of synchronization issue, we can simply assign a separate set of pixels to each thread.

Render thread main
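
A minimal sketch of such a render loop with std::thread; the Image, Scene and renderPixel names are placeholders for whatever the project actually uses.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Placeholders for the project's actual types:
struct Color { float r, g, b; };
struct Scene;
struct Image { int width, height; void setPixel(int x, int y, const Color& c); };
Color renderPixel(const Scene& scene, int x, int y);

void renderImage(Image& image, const Scene& scene)
{
    const int threadCount =
        static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
    std::vector<std::thread> threads;

    // Thread t renders rows t, t + N, t + 2N, ... Every pixel is written
    // by exactly one thread, so no synchronization is needed.
    for (int t = 0; t < threadCount; ++t)
    {
        threads.emplace_back([&image, &scene, t, threadCount]() {
            for (int y = t; y < image.height; y += threadCount)
                for (int x = 0; x < image.width; ++x)
                    image.setPixel(x, y, renderPixel(scene, x, y));
        });
    }

    for (std::thread& th : threads)
        th.join();
}
```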

It’s time for rendering some detailed scenes!

4K Render of a model with 1.8 Million triangles.
15 seconds for a 4K render with reflective & conductive materials.

Same scene with a more modest resolution of 1920x1080 and a different camera angle.

1.8M triangles, @1920x1080, 2.4 seconds.

A detailed Lobster model with reflections on the ground.

3.3 seconds @1024x1024

Lastly, a simple test case: 0.26 seconds for an 800x800 image.

We are now able to render complex scenes at great resolution, in a reasonable time. There is definitely much room for improvement in our BVH implementation to make this even faster, but that will have to wait.

Now let’s work on manipulating the geometry via transformations so that we can design more interesting scenes with ease.

3D Transformations

We will support all the modelling transformations: translation, scaling and rotation. There are also viewing transformations used in a forward rendering pipeline, but we won’t need those in our raytracer.

credits: [1]

Until now, we were only able to move the camera around by changing its parameters in our scene file. Moving a complex object, though theoretically possible by manually modifying each and every vertex it has, is impossible in practice. Instead, we will get help from mathematics and matrices.

Each transformation, and any combination of them, called a “Composite Transformation”, can be represented by a 4x4 matrix, and applying a transformation becomes a simple matrix multiplication. This does not happen by itself, of course; certain structures like homogeneous coordinates need to be created, but clever mathematicians have already beaten us to this task.

Here is what the matrices look like for 2D transformations; 3D is analogous but more verbose.
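
In homogeneous coordinates, the 2D translation, scaling and rotation matrices are:

```latex
T(t_x, t_y) = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix}
\qquad
S(s_x, s_y) = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix}
\qquad
R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}
```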

Here is a code snippet that iteratively computes the composite transformation matrix, here named “transform”, for an object. Only the translation part is shown.

Combining transformations into a composite.
Creating a translation matrix
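
A hedged reconstruction of both pieces; the Matrix and Transformation APIs shown are assumptions based on the surrounding description:

```cpp
// Fold the object's transformation list into one composite matrix.
// Left-multiplying each new piece means the first listed transformation
// is the first one applied to the object.
Matrix transform = Matrix::identity(4);
for (const Transformation& t : mesh.transformations)
{
    if (t.type == TransformationType::Translation)
        transform = Matrix::translation(t.x, t.y, t.z) * transform;
    // ... scaling and rotation are handled analogously ...
}

// Creating a 4x4 translation matrix: an identity with the offsets
// placed in the last column.
Matrix Matrix::translation(float tx, float ty, float tz)
{
    Matrix m = Matrix::identity(4);
    m(0, 3) = tx;
    m(1, 3) = ty;
    m(2, 3) = tz;
    return m;
}
```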

The Matrix class is a helper that handles generic matrix operations as well as computer-graphics-specific tasks like creating translation, scaling and rotation matrices, and applying them to vectors and points.

Now, in a framework like OpenGL, this transform matrix, or Model matrix as it is called there, is passed on to the GPU and each vertex of the model is multiplied by it.

This process is costly to perform on the CPU, and in our case is also mostly unnecessary. We have a more interesting solution: inverse-transform the ray we cast into the space of the object we are currently testing for intersection.

The inverse transform is normally computed from the transform matrix by taking the matrix inverse; however, our simplistic Matrix class does not have that, so I sidestepped the issue by creating the inverse on the fly. This is only possible because we are never given a ready-made model matrix; instead, we build it piece by piece by combining the given transformations. Creating the inverse of each piece is straightforward: just negate the input values for rotation and translation, and divide instead of multiply for scaling.

Then we combine those inverse pieces in the reverse of the order used for the regular transform matrix. This works due to the matrix inverse property from Linear Algebra:

(AB)⁻¹ = B⁻¹A⁻¹ — the inverse of a product distributes over the factors in reverse order.
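
Putting the two ideas together, a sketch of building the inverse composite on the fly (same assumed Matrix and Transformation APIs as above):

```cpp
// Walk the transformation list backwards, inverting each piece as we go.
Matrix inverse = Matrix::identity(4);
for (auto it = mesh.transformations.rbegin();
     it != mesh.transformations.rend(); ++it)
{
    Matrix piece = Matrix::identity(4);
    switch (it->type)
    {
        case TransformationType::Translation:  // T(t)^-1 = T(-t)
            piece = Matrix::translation(-it->x, -it->y, -it->z);
            break;
        case TransformationType::Scaling:      // S(s)^-1 = S(1/s)
            piece = Matrix::scaling(1.0f / it->x, 1.0f / it->y, 1.0f / it->z);
            break;
        case TransformationType::Rotation:     // R(angle)^-1 = R(-angle)
            piece = Matrix::rotation(-it->angle, it->x, it->y, it->z);
            break;
    }
    // Left-multiply while iterating in reverse: this undoes the forward
    // build's ordering, per (AB)^-1 = B^-1 A^-1.
    inverse = piece * inverse;
}
```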

Please note that in a real-life scenario one would use a library such as glm to handle operations like taking an inverse; we created our own matrix class here only for educational purposes.

Now, we will still use the model matrix, but only to transform the top-most, all-encapsulating bounding box of each mesh. If the ray intersects that, we transform the ray into the object’s local space and continue there, without touching the mesh vertices at all.

Note that we safeguard the original direction and origin of the ray, and restore that state after we are done intersecting, since this ray will likely be tested against other objects after this one. Our transformation of the ray is therefore temporary.

Let’s see how that looks in code. Here is the intersection test for instanced meshes.
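
A sketch of how that test might look; the member and method names are illustrative:

```cpp
bool MeshInstance::intersect(Ray& ray, HitRecord& rec) const
{
    // Cheap early-out against the world-space box: no ray transform needed.
    if (!worldBounds.intersects(ray))
        return false;

    // Safeguard the ray's original state; our transform is temporary.
    Vec3 origOrigin    = ray.origin;
    Vec3 origDirection = ray.direction;

    // Move the ray into the base mesh's local space. Points take the full
    // inverse transform; directions ignore the translation part.
    ray.origin    = inverseTransform.applyToPoint(ray.origin);
    ray.direction = inverseTransform.applyToVector(ray.direction);

    // Reuse the base mesh's BVH: instances own no geometry of their own.
    bool hit = baseMesh->intersectBVH(ray, rec);

    // Restore the ray for the objects tested after this one.
    ray.origin    = origOrigin;
    ray.direction = origDirection;

    return hit;
}
```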

Let’s put that to use. We have a berserker model, and without any transformations it appears as on the right. After applying some rotation, translation and scaling we obtain the one on the left, using exactly the same vertex data!

Two Berserkers

Let’s take a look at some weird scenes that use instancing and chained transformations.

Multiple coloured lights, instanced dragons with accumulating transformations rendered @4K, first one in 16:9 ratio, second in 4:3.

The Great Dragon Spiral Vol.I
The Great Dragon Spiral Vol. II

Transformations are great and all, but we still need to talk about the Mesh Instancing we used to create the scenes above.

Mesh Instancing

A Mesh Instance is a special kind of mesh in that it has no geometry data of its own.

Mesh Instance input in scene.xml
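
Roughly, the input might look like this; apart from “resetTransform”, the element and attribute names are guesses:

```xml
<!-- Hypothetical sketch of a mesh instance entry in scene.xml -->
<MeshInstance id="3" baseMeshId="1" resetTransform="false">
    <Material>2</Material>
    <Transformations>r1 t2</Transformations>
</MeshInstance>
```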

It refers to another mesh that is either a regular mesh (with geometry data) or another Mesh Instance!

It can define its own transformations and material if it wants to. Furthermore, via the “resetTransform” attribute, a mesh instance can “build on top” of the transformations applied to its base mesh. This is particularly useful for creating scenes such as The Great Dragon Spiral above, by applying the same transformation repeatedly.

It’s basically a way of saying “whatever transformations my base mesh is doing plus these”.

Worth noting: mesh instances do not have their own BVH; they can reuse the BVH built for the base mesh once the ray is correctly transformed into the instance’s space. They do, however, have a single bounding box in world space. This way, we can check the top-most bounding box for intersection without transforming the ray, and only transform it if it hits this encapsulating box.

One small but important point is that transforming the bounding box is not straightforward. One might think that applying the transformation matrix to the max and min corners of the box is enough. This does not always work, since after certain rotations and/or combinations the box may no longer be axis-aligned.

The previous maximum corner may not be the correct maximum after the transformation. Remember that the bounding box is indeed a 3D box with 8 corners, but we were only storing the 2 corners that hold the min and max values. To correctly compute the AABB after a transformation, one needs to transform all 8 corners of the box and select the max and min again.
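
In code, the fix is a small loop over the corners (a sketch reusing the AABB helpers from earlier; applyToPoint is an assumed Matrix method):

```cpp
// Correctly transform an AABB: transform all 8 corners, then rebuild
// the box by taking a fresh min/max over the results.
AABB transformAABB(const AABB& box, const Matrix& m)
{
    const Vec3 corners[8] = {
        { box.min.x, box.min.y, box.min.z }, { box.max.x, box.min.y, box.min.z },
        { box.min.x, box.max.y, box.min.z }, { box.max.x, box.max.y, box.min.z },
        { box.min.x, box.min.y, box.max.z }, { box.max.x, box.min.y, box.max.z },
        { box.min.x, box.max.y, box.max.z }, { box.max.x, box.max.y, box.max.z },
    };

    Vec3 first = m.applyToPoint(corners[0]);
    AABB result{ first, first };
    for (int i = 1; i < 8; ++i)
        result.extend(m.applyToPoint(corners[i]));
    return result;
}
```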

Otherwise we might obtain results such as these. The black box visualizes the bounding box after an incorrect transformation.

Incorrect bounding box transformation.
Correct bounding box transformation saves the day.

Programmatically, an Instanced Mesh can be implemented as a wrapper around Mesh.

It does not own any geometry like vertex or face data and redirects most operations to its base mesh, but it does have its own transformation matrices, which are inherited from Shape.

Mesh Instance class
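
A plausible shape for the class; only the wrapper structure and the absence of geometry data come from the text, the rest is an assumption:

```cpp
// A MeshInstance owns no vertex or face data. It keeps a pointer to its
// base mesh and its own transforms (inherited from Shape).
class MeshInstance : public Shape
{
public:
    MeshInstance(Mesh* base, bool resetTransform)
        : baseMesh(base), resetTransform(resetTransform) {}

    // Transforms the ray and forwards the test to the base mesh's BVH
    // (see the intersection sketch above).
    bool intersect(Ray& ray, HitRecord& rec) const override;

    AABB worldBounds;       // single world-space box for the early-out test

private:
    Mesh* baseMesh;         // resolves to a regular mesh's geometry, even
                            // when declared on top of another instance
    bool  resetTransform;   // if false, build on top of the base's transforms
};
```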

This covers it; next time we will look into multisampling, blur and more.

Thank you for reading, feel free to check out various renders below.

Grass desert scene with many instances, took 22.8 seconds.
Marching Dragons with 8 instances, 2 seconds.
Cascading Dragons with 12 instances, 1.47 seconds (due to scaling!).
Scene with glass and metal objects.
Fake ellipsoids created by transformations and ray inverse transform.
Simple spheres scene with transformations.
