Porting Smallpt with Visionaray

Stefan Zellmann
Jan 17, 2018 · 18 min read


Smallpt is a path tracing renderer written by Kevin Beason in 99 lines of C++. Let’s reimplement smallpt using the Visionaray ray tracing template library.

  • We’ll use Visionaray’s builtin naive path tracing kernel for global illumination rendering.
  • Our implementation will be cross platform and can run on NVIDIA GPUs and on CPUs.
  • Since Visionaray is a template library, it is trivial to provide two implementations using either 32-bit single precision or 64-bit double precision floating point arithmetic.
  • The tutorial is kept very simple by intention. It shows how to use Visionaray’s basic types and algorithms, and how to put a cross-platform application together. It also illustrates Visionaray’s data management philosophy — the user maintains all the data and has to copy it to the GPU himself.
  • We will learn how to write a custom camera class with custom logic to generate primary rays, and how to use Visionaray’s material system along with simple geometric primitives.

Smallpt uses spheres as the only 3-D primitive type, and so will our application. The walls of the Cornell Box appear as if they were flat, but they are actually made of huge spheres.

View from the outside — the walls, floor, and ceiling of the Cornell Box are actually huge spheres. The front sphere was removed for better clarity, and white ambient lighting was added so the spheres don’t appear black on the outside, but with their actual material color.

Smallpt uses a purely diffuse material for the walls, and perfect specular materials for the mirror and glass sphere. A sphere sticking out of the ceiling serves as a light source and has an emissive material assigned to it.

The original smallpt program evaluates the rendering equation with a naive path tracer that traces five bounces and applies Russian roulette termination after that. We are going to use Visionaray’s builtin utilities to mimic smallpt as closely as possible; however, our implementation will differ in a few details. For example, we don’t use Russian roulette path termination, and some of the materials are implemented in a slightly different way — the glass material from smallpt uses the Schlick approximation to the Fresnel equations for unpolarized light, while Visionaray’s builtin glass material uses the actual Fresnel equations. Smallpt also uses supersampling to remove the jaggies at the edge of the light-emitting sphere, which we omit for brevity.

Setting up the Scene

Visionaray already provides all the facilities to specify the smallpt geometry and material descriptions. All we have to do is create a bunch of spheres, assign material properties to them, and store the spheres and materials in memory so we can later render them.

So as always we start out with some includes:

// Setup
#include <visionaray/math/sphere.h>
#include <visionaray/material.h>
using namespace visionaray;

Then, in order to create a sphere, we issue the following:

// Add the left sphere
vec3d center(1e5+1.0, 40.8, 81.6); // constants from smallpt
double radius = 1e5;
basic_sphere<double> sphere(center, radius);
sphere.prim_id = 0;
sphere.geom_id = 0;

Note how we assign ids to the sphere primitive. prim_id refers to the id of the primitive in a geometry group, while geom_id refers to the id of the geometry group itself. prim_id should be unique among all primitives, while geom_id may be shared so that a group of primitives shares properties (e.g. texture images or material properties).
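As a small illustration, several primitives can share one geom_id while keeping distinct prim_ids (a hypothetical snippet; the sphere positions are taken from smallpt, but in our actual port each sphere gets its own material):

// Hypothetical example: two spheres that keep unique prim_ids
// but share a geom_id, and thus later share the same material.
basic_sphere<double> a(vec3d(27.0, 16.5, 47.0), 16.5);
a.prim_id = 2;
a.geom_id = 2;

basic_sphere<double> b(vec3d(73.0, 16.5, 78.0), 16.5);
b.prim_id = 3;
b.geom_id = 2; // same geometry group as sphere a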

Note also that we use double precision to specify the geometry, since that is what the original smallpt implementation does. However, rendering with double precision is usually wasteful. This article about porting smallpt to SYCL provides the right constants to set up the sphere dimensions so that we can render with 32-bit floating point precision without getting nasty rendering artifacts. So with single precision (and with the corrected constants), the code for the sphere representing the right wall would look like this:

// Add the right sphere
vec3f center(-1e4+99.0f, 40.0f, 81.6f);
float radius = 1e4;
basic_sphere<float> sphere(center, radius);
sphere.prim_id = 1;
sphere.geom_id = 1;

Note how the only difference is that we use float and vec3f rather than double and vec3d for the radius and center, and float instead of double as the template parameter for basic_sphere. (Of course we also assign new unique ids, since this is another sphere.) Now, in order to use the spheres later, let’s store them in a vector (it could be any other forward-iterable data structure, a static array, or whatever — std::vector is convenient and about right for this simple example). Let’s also slightly rewrite the code from above so we can easily switch between double and float at compile time:

#if USE_DOUBLE_PRECISION
typedef basic_sphere<double> sphere_type;
#else
typedef basic_sphere<float> sphere_type;
#endif
std::vector<sphere_type> spheres;

// Create the spheres
sphere_type left(center_left, radius_left);
left.prim_id = 0; left.geom_id = 0;
spheres.push_back(left);
sphere_type right(center_right, radius_right);
right.prim_id = 1; right.geom_id = 1;
spheres.push_back(right);
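The centers and radii passed above (center_left, radius_left, and friends) should follow the same precision switch. A matching pair of typedefs keeps that consistent — a small sketch, where the names scalar_type and vec_type are our own and not part of Visionaray:

#if USE_DOUBLE_PRECISION
typedef double scalar_type;
typedef vec3d  vec_type;
#else
typedef float  scalar_type;
typedef vec3f  vec_type;
#endif

// Note that, as discussed above, the scene constants themselves
// also differ between the two precision modes.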

So now that we have the spheres all set up, we can add material properties to them. We will also store the materials in an std::vector, and we will store one material per sphere, so that the sphere and material vectors have the same size. With the ids, it is however possible to have a material library with only a few materials, with primitives referring to them via their geom_id.

Visionaray’s material handling is quite unique. Since Visionaray is cross platform, we cannot use runtime-polymorphic materials, i.e. a Material base class, derived material classes implementing functions like shade() or sample(), and vectors storing pointers to the Material base class. This doesn’t work because CPUs and GPUs don’t share address spaces: if we set up our data structures on the CPU and later copy them over to the GPU, the pointers will be invalid there (and, in addition, GPUs usually have only very limited support for runtime polymorphism — virtual function calls on objects stored in device memory are usually prohibited by the compute APIs).

So if we only had a single material type, we could just do something like this:

typedef plastic<double> material_type;
std::vector<material_type> materials;
material_type m(...);
materials.push_back(m);

However, smallpt uses four different types of materials, so in order to store all the materials in a single material library, we use compile-time polymorphism, which Visionaray exposes through a variant-like template data type. The specifics are explained in the paper that can be found here. The following code shows how to use compile-time polymorphism with Visionaray for statically typed materials (for single precision, we would just exchange double for float wherever applicable).

#include <visionaray/generic_material.h>

// Statically determine all the material types we will encounter.
// Then build up a data structure that can be assigned any type
// from this set.
typedef matte<double>    type1;
typedef emissive<double> type2;
typedef mirror<double>   type3;
typedef glass<double>    type4;
typedef generic_material<type1, type2, type3, type4> material_t;

// Storage for the materials
std::vector<material_t> materials;

Now, in order to add the purely diffuse material for the left wall, we can simply write:

matte<double> left_material;
left_material.cd() = from_rgb(vec3d(0.75, 0.25, 0.25));
left_material.kd() = 1.0;
materials.push_back(left_material);

This code needs some explanation. cd() returns a reference to the diffuse color of the material, which is of type spectrum<T> (in our case spectrum<double>). To convert an RGB color to a spectral power distribution, we use the function from_rgb(). With kd() we access the material’s diffuse scaling factor, which is there for convenience; we simply set it to 1.0. This scheme is similar for all material types. The plastic<T> material, for example, has a diffuse material property (cd() and kd()) as well as a specular one (cs() and ks()). Materials may have additional properties; plastic, for instance, also has a specular exponent assigned to it.
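As an aside, this is how a plastic material might be set up using the accessors just mentioned (a sketch; the color and scaling values are made up and not part of the smallpt scene):

plastic<double> p;
p.cd() = from_rgb(vec3d(0.2, 0.4, 0.8)); // diffuse color (illustrative)
p.kd() = 0.7;                            // diffuse scaling factor
p.cs() = from_rgb(vec3d(1.0, 1.0, 1.0)); // specular color
p.ks() = 0.3;                            // specular scaling factor
// plastic additionally carries a specular exponent; see the
// Visionaray material headers for its accessor.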

So the code for adding the glass sphere would look something like this:

glass<double> glass_material;

// Transmissive color
glass_material.ct() = from_rgb(0.999); // r=g=b=0.999
glass_material.kt() = 1.0;
// Reflective color
glass_material.cr() = from_rgb(0.999);
glass_material.kr() = 1.0;
materials.push_back(glass_material);

Note how we add the glass material to the same std::vector as the diffuse material for the left wall. Note also that the order in which we add the materials matters. All spheres whose geom_id equals 0 will be assigned the first material in the vector — geom_id is an index into the material list (note that this requires the material list to be a random-access container like std::vector or a plain C array). So to assign the glass material to a sphere, we set its geom_id to 2, since glass_material was the third material we added to the materials vector.
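Spelled out, that looks like this (a hypothetical snippet; some_sphere is a placeholder name, not part of the actual scene setup):

// Hypothetical: point a sphere at the third entry of the
// materials vector, i.e. at glass_material.
some_sphere.geom_id = 2;
spheres.push_back(some_sphere);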

Setting up the spheres and materials for the remaining walls, the floor, and the ceiling is conceptually similar. Note that the sphere sticking out of the ceiling has an emissive material assigned to it, with its emissive color component (ce()) set to the RGB triple (12, 12, 12). This emissive sphere serves as a light source; without it, our rendered image would be completely black. Check out the example source code on GitHub for the complete geometry setup.
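For reference, the emissive material for the light sphere follows the same pattern as before (a sketch using the ce() accessor mentioned above and the (12, 12, 12) emissive color from smallpt):

emissive<double> light_material;
light_material.ce() = from_rgb(vec3d(12.0, 12.0, 12.0)); // emissive color
materials.push_back(light_material);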

Prepare for Path Tracing

Now that we have set up all the data instances, let’s render them with Visionaray. We’ll start with the render mode that runs on the CPU. We will use Visionaray’s builtin path tracing implementation, which is implemented as a Visionaray kernel. Kernels are functions or function objects that are passed a single ray (usually a primary ray originating from the camera), perform some type of traversal, and in the end return a result record containing values such as the radiance integrated over the path of the ray, or an (OpenGL-compatible) depth value for the first hit. Kernels are usually user-written, but Visionaray comes with a handful of builtin kernels that may already be sufficient for many users. The Visionaray path tracing kernel implements naive path tracing as described in the original paper by James Kajiya. We therefore first set up a data structure that contains the scene data to be rendered, so we can later pass it to the kernel:

#include <visionaray/kernels.h>
//...
auto kernel_params = make_kernel_params(
    spheres.begin(),   // iterator to first primitive
    spheres.end(),     // iterator to last+1 primitive
    materials.begin(), // iterator to materials
    lights_begin,      // we need lights!
    lights_end,        // ..pass anything here!
    num_bounces,       // bounces along path
    vec4(0.f),         // rgba background color
    vec4(0.f)          // rgba ambient light color
    );

The parameter data structure is created with the make_kernel_params utility function. This function takes forward iterators to the primitives and materials (no end iterator is needed for the materials, since materials are not traversed). make_kernel_params also requires the user to specify iterators to the light sources. We don’t really have light sources in our tiny example (the emissive sphere could be treated as an area light and used for next event estimation, but that is a different story), so we can basically put anything in here, like

struct ignore {};
ignore *lights_begin = nullptr, *lights_end = nullptr;

(Note again how pointers fulfill the C++ ForwardIterator concept and can thus simply be passed to the kernel for traversal.) The remaining parameters are the number of (reflective, refractive, …) bounces the path tracing kernel performs before terminating or hitting a light source, the background color in case not even the primary ray hits an object, and an ambient light color that can be used for simple lighting in the absence of any actual light source or light-emitting material.

The reason we set up the parameters data structure first is that we can now create a kernel instance based on the type of this object:

// Use C++11 decltype() to find out the type of kernel_params.
// The actual type is a complicated template that we
// don't want to write out by hand..
using parameters_t = decltype(kernel_params);

// Now we know how to instantiate and create an object
// of the path tracing kernel:
pathtracing::kernel<parameters_t> our_kernel;

// We immediately pass the parameters on to the kernel
our_kernel.params = kernel_params;

Now that we have set up the path tracing kernel with the scene data (which may generally change on a per-frame basis), we have to use a Visionaray scheduler to call the kernel for rendering. Depending on the device we are rendering on, we need the right scheduler. For rendering on the CPU, we use the multi-threaded builtin scheduler coming with Visionaray:

tiled_sched<ray_type> sched;

But wait, what is the ray type? Well, that depends on whether we want to do double precision (typedef basic_ray<double> ray_type;) or single precision (typedef basic_ray<float> ray_type;) ray traversal. Since everything is a template, a real-world application would choose the appropriate type (probably float) and stick with it — our implementation can support both modes using the USE_DOUBLE_PRECISION compile-time switch from above.
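Spelled out, the precision switch for the ray type looks like this (a sketch reusing the USE_DOUBLE_PRECISION macro from the geometry setup):

#if USE_DOUBLE_PRECISION
typedef basic_ray<double> ray_type;
#else
typedef basic_ray<float> ray_type;
#endif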

The way Visionaray’s schedulers work is that we call their member function frame(); frame() will generate primary rays, call our_kernel for each primary ray, and write the traversal result (e.g. radiance converted to RGB(A), and pixel depth) to a render target. So before we can call sched.frame() in our case, we have to perform some setup first.

Implementing a Custom Camera Type

In order to generate primary rays, we need a virtual camera that specifies how those rays are created. Visionaray comes with two builtin cameras: pinhole_camera (#include <visionaray/pinhole_camera.h>) generates primary rays that originate from a common origin; matrix_camera (#include <visionaray/matrix_camera.h>) is useful when we already have modelview and projection camera matrices, e.g. coming from a graphics API, and want to generate primary rays that adhere to those matrices.

Since smallpt uses a hardcoded perspective camera description, so will we. By writing a custom Visionaray camera, we will learn how to extend Visionaray with classes for primary ray generation.

As with anything in the Visionaray world, camera classes must adhere to a C++ concept, in this case the Camera concept specified by Visionaray: any class that implements the functions begin_frame(), end_frame(), and primary_ray() (with a specific function parameter list) is eligible to be used as a camera and is then compatible with Visionaray’s schedulers.

So our smallpt camera will look as follows:

struct smallpt_camera
{
    void begin_frame() {}
    void end_frame() {}

    ray_type primary_ray(ray_type /* */, double x, double y,
                         double width, double height)
    {
        vec3d eye(50, 52, 295.6); // constants from smallpt
        vec3d dir = normalize(vec3d(0, -0.042612, -1));

        // Vectors that are interpolated across the screen
        // in x- and y direction. Usually this involves
        // the tangent of the field of view angle to determine
        // the stepping delta (hardcoded in smallpt)
        vec3d cx(width * 0.5135 / height, 0, 0);
        vec3d cy = normalize(cross(cx, dir)) * 0.5135;

        // The ray direction vector
        vec3d d = cx * ((x - width / 2) / width)
                + cy * ((y - height / 2) / height)
                + dir;

        // Construct and return ray from origin and direction.
        // The ray's origin is translated by 140 in ray direction
        // (hardcoded constant from smallpt)
        return ray_type(eye + d * 140, d);
    }
};

The actual implementation of primary_ray() is not really that important; it is pretty straightforward, although some values are hardcoded, which doesn’t exactly help legibility. However, what’s noteworthy about this piece of code is that the parameters x, y, width, and height passed to primary_ray(), which refer to the pixel position to sample and to the dimensions of the image to render, are floating point values rather than integers. This is because the scheduler, which will eventually call the primary_ray() function, treats pixels as tiny rectangles having an area, and potentially samples a pixel at a position slightly off its center, depending on how we set up the scheduler. So the x and y coordinates we use to calculate the ray direction are potentially already jittered.

If we want our class to be compatible with both double and single precision, we would rather make primary_ray() a template like this:

template <class R, class T>
R primary_ray(R /* */, T x, T y,
T width, T height);

From that you can see why the (unused) first parameter of type R is necessary — Visionaray uses it to perform C++ tag dispatching; the parameter is passed to the function so Visionaray knows the type of the ray we want to generate.

Our implementation leaves the begin_frame() and end_frame() functions empty because in our case it is not necessary to perform any pre- or post-rendering tasks; this may however be necessary for other camera types, which is why the functions are there.

Prepare for Rendering

So now that we have the rendering kernel set up and have implemented a camera class to generate primary rays, we are almost ready to render a frame. The only thing left to take care of is a render target that the traversal result is stored to.

Visionaray render targets are an abstraction over linear memory regions: they provide functions color() and depth() returning pointers to the memory regions that store pixel colors and pixel depth values. A memory region can be basically anything, from a color buffer on a GPU to an std::vector stored inside the render target, or a memory location in an open file. Some of the render targets coming with Visionaray have a function display_color_buffer(), so their color buffer can be presented in an efficient way; this is however not a necessity. This tutorial will not go into the details of render targets (they are a topic for a whole tutorial of their own), but note that they can also be user-written and provide a simple abstraction so that traversal results can be written immediately to any memory location.

For our purposes, we will use the cpu_buffer_rt<ColorFormat, DepthFormat> render target, which internally stores colors and depth values in vectors and allows us to present the color buffer with OpenGL:

#include <visionaray/cpu_buffer_rt.h>

cpu_buffer_rt<PF_RGBA32F, PF_UNSPECIFIED> rt;

Our render target uses 32-bit single precision floating point values for each RGBA color component (this is necessary because our path tracer performs compositing of individual frames in the memory region encapsulated by the render target), and does not store depth values at all, because we simply don’t need them. With that, we have everything up and ready to render a single frame:

int w = 1024, h = 768;
smallpt_camera cam;

rt.resize(w, h);
// Pack the camera and render target into
// a parameter struct as well
auto sched_params = make_sched_params(
    pixel_sampler::jittered_blend_type{},
    cam, rt);

// Render the frame
sched.frame(our_kernel, sched_params, ++frame_num);

// Present the frame (we need an OpenGL context here!)
rt.display_color_buffer();

The first thing to note is that we again have a parameter data structure, but this time we don’t pass it to the kernel as a member; instead, we pass it to the scheduler on a per-frame basis (because camera parameters generally change with every frame, even though they don’t in our case). Also note how we pass an empty pixel_sampler::jittered_blend_type instance to the factory function. This is tag dispatching again: the scheduler will then know that we want pixel sampling with jittering, and that individual frames should be blended on top of each other.

Also note the variable frame_num. This is an integer variable we store somewhere, presumably alongside the rest of our application state. With this setup, sched.frame() will render a new path tracing frame and blend it over the colors that are already in memory. Resetting frame_num to 0 will start a new image. This procedure blends noisy frames so that they converge to a noise-free result. This is what our render looks like after 1, 8, 40, 200, 1,000, and 5,000 frames, respectively (the image on the front matter of this tutorial shows the converged result after blending 25,000 frames):
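A minimal sketch of the resulting render loop (the loop structure and the place where frame_num lives are assumptions; the example on GitHub drives this from its display callback):

unsigned frame_num = 0;

for (;;)
{
    // Render one path tracing frame and blend it over the previous ones
    sched.frame(our_kernel, sched_params, ++frame_num);

    // Present the accumulated image (OpenGL context required)
    rt.display_color_buffer();

    // If the scene or camera changed, we would reset frame_num to 0
    // here to restart accumulation.
}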

Smallpt also uses gamma correction to transform the 32-bit floating point color channels into an sRGB color representation. This simple operation can be performed in multiple ways; for simplicity’s sake, and since our implementation uses OpenGL to present the result images anyway, we resort to an OpenGL feature to perform gamma correction: we call glEnable(GL_FRAMEBUFFER_SRGB) (provided by the ARB_framebuffer_sRGB extension) before calling rt.display_color_buffer().
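Concretely, the presentation step then looks like this (assuming an OpenGL context whose default framebuffer is sRGB-capable):

// Let OpenGL apply the linear-to-sRGB conversion when writing
// to the default framebuffer (ARB_framebuffer_sRGB).
glEnable(GL_FRAMEBUFFER_SRGB);
rt.display_color_buffer();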

Porting Our Application to CUDA

So now that we have a working implementation of our interpretation of smallpt up and running on the CPU, what we’re left with is porting the whole thing to the GPU. We will do this using NVIDIA’s CUDA framework and the builtin facilities provided by Visionaray.

As I mentioned initially, Visionaray users are entirely responsible for data handling themselves. That implies that the user also decides when to copy data to and from the GPU. This approach is most flexible: imagine you are running some type of simulation on the GPU and now want to visualize its result. With the user being responsible for maintaining the data, the simulation result can be ray traced directly, without copying the data to host memory and then back to the GPU. This data handling philosophy also means that Visionaray is fully explicit about when data copies occur.

Since we’re responsible for preparing the data for rendering on the GPU, we can use our favorite API to do so. A simple way to get the spheres and materials that are already stored in std::vectors onto the device is to use the Thrust C++ library. Thrust provides a template device_vector<T> that wraps CUDA global device memory. device_vector<T> provides a copy constructor and an assignment operator of roughly this shape:

device_vector<T>::device_vector(std::vector<T> const& rhs);
device_vector<T>& device_vector<T>::operator=(std::vector<T> const& rhs);

These take an std::vector<T> and copy its contents from the host to CUDA global memory, which resides in the GPU’s video RAM. So in addition to the spheres and materials vectors, our application state needs to maintain two additional device data containers, d_spheres and d_materials:

#include <thrust/device_vector.h>

thrust::device_vector<sphere_type> d_spheres(spheres);
thrust::device_vector<material_t> d_materials(materials);

The code above allocates storage for the spheres and materials, and copies the data to the device.

Now, in order to render our scene on the GPU, we need to create a kernel from the GPU scene data. We again create kernel parameters, which now however refer to device memory:

auto kernel_params = make_kernel_params(
    thrust::raw_pointer_cast(d_spheres.data()),
    thrust::raw_pointer_cast(d_spheres.data()) + d_spheres.size(),
    thrust::raw_pointer_cast(d_materials.data()),
    lights_begin,
    lights_end,
    num_bounces,
    vec4(0.f),
    vec4(0.f)
    );

The way we pass iterators to the factory function is noteworthy: thrust iterators may not be used inside CUDA kernels, so we use Thrust functions to obtain raw CUDA device pointers from the device_vectors.

Note that a completely viable, alternative implementation would be to store CUDA device pointers directly instead of device_vectors, and to use functions like cudaMalloc(), cudaFree(), and cudaMemcpy() to copy the data from the host to the GPU. Especially in cases where the data is already resident on the GPU, this might be the only way to avoid unnecessary host/device data transfers.
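A hedged sketch of that raw CUDA alternative (error checking omitted for brevity):

sphere_type* d_sphere_ptr = nullptr;
cudaMalloc((void**)&d_sphere_ptr, spheres.size() * sizeof(sphere_type));
cudaMemcpy(d_sphere_ptr, spheres.data(),
        spheres.size() * sizeof(sphere_type),
        cudaMemcpyHostToDevice);

// d_sphere_ptr and d_sphere_ptr + spheres.size() can now serve as the
// primitive iterators passed to make_kernel_params().

// ... render ...

cudaFree(d_sphere_ptr);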

Creating the path tracing kernel happens analogously to the CPU case, now that we can infer the type of the kernel_params object. Note however that in general the types of the device kernel_params and the host kernel_params will differ, and so will the instantiated type of the path tracing kernel template.

In order to render our images on the GPU, we also need special schedulers and render targets: cuda_sched<ray_type> generates primary rays using a CUDA kernel rather than C++ threads, and gpu_buffer_rt<ColorFormat, DepthFormat> is a render target that wraps GPU video memory (using device_vectors). Instead of gpu_buffer_rt (which is used analogously to cpu_buffer_rt from above), we however opt for a more specialized render target:

#include <visionaray/pixel_unpack_buffer_rt.h>

pixel_unpack_buffer_rt<PF_RGBA32F, PF_UNSPECIFIED> d_rt;

pixel_unpack_buffer_rt uses CUDA/OpenGL interoperability to render directly into an OpenGL pixel buffer object from a CUDA kernel. This is useful when we want to display the rendered image immediately, without an extra device-to-host copy followed by a host-to-device copy. CUDA and OpenGL share the buffer, and the render target manages synchronization so that the buffer is correctly bound when accessed by the respective API.
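For completeness, the CUDA scheduler mentioned above is declared like the CPU one (a minimal sketch; d_sched is the name used in the rendering code below, and the corresponding scheduler include is omitted here):

// Generates primary rays in a CUDA kernel rather than with C++ threads
cuda_sched<ray_type> d_sched;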

At last we need to make a simple modification to the smallpt_camera, where we need to annotate all functions as being callable from both CPU and GPU:

struct smallpt_camera
{
    __host__ __device__ void begin_frame() {}
    __host__ __device__ void end_frame() {}

    template <class R, class T>
    __host__ __device__ R primary_ray(R /* */, T x, T y,
                                      T width, T height);
};

With those modifications applied, the final rendering code looks like this:

// Like before, but d_kernel has a different type
// than our_kernel from above!
using parameters_t = decltype(kernel_params);

pathtracing::kernel<parameters_t> d_kernel;
d_kernel.params = kernel_params;

// Now perform frame rendering
int w = 1024, h = 768;
smallpt_camera cam;

d_rt.resize(w, h);
auto sched_params = make_sched_params(
    pixel_sampler::jittered_blend_type{},
    cam, d_rt);

// Render the frame
d_sched.frame(d_kernel, sched_params, ++frame_num);

// Present the frame with an OpenGL PBO
d_rt.display_color_buffer();

Results

So after porting to the GPU, we are interested in the performance of our code. I therefore timed several runs, both on my mid-2014 MacBook Pro and on a Linux graphics workstation equipped with an NVIDIA GTX 1080.

The following are average execution times for the two setups, for both the single precision and double precision code paths:

Workstation:

  • Intel Core i7-3960X, 3.3 GHz, 6 cores, 12 threads with Hyper-Threading
  • NVIDIA GTX 1080
          GPU        CPU
float:    0.033 sec  0.244 sec
double:   0.083 sec  0.342 sec

MacBook Pro:

  • Intel Core i7, 2.8GHz, 4 cores, 8 threads
  • NVIDIA GeForce GT 750M
          GPU        CPU
float:    0.131 sec  0.425 sec
double:   0.636 sec  0.506 sec

Conclusions

This tutorial showed how to port smallpt with the Visionaray API. Smallpt is, admittedly, not a particularly useful application in itself. However, by porting it, one can learn a variety of concepts exposed by the Visionaray API. Switching between double and single precision is achieved by simply changing template parameters. GPU data handling must be explicit; libraries like Thrust can help with this and provide a convenient interface that maps well to Visionaray’s internal data structures and requirements. By porting smallpt to Visionaray, we also got a glimpse of types like compile-time polymorphic materials and special render targets for use with CUDA and OpenGL.

As always, a complete version of the source code for this tutorial can be found on github: https://github.com/szellmann/visionaray/tree/master/src/examples/smallpt
