Creating a 3D Model Previewer with WebAssembly and WebGPU

Adam Gerhant
10 min read · Feb 16, 2024


Currently, the best way to take an image of a 3D model in JavaScript is to use a rendering library such as Three.js and convert the output to an image. However, these libraries tend to have a slow first render due to initialization overhead. A large model can take multiple seconds to render, especially on mobile devices. I figured this would be a perfect use case for WebAssembly, since benchmarks show at least a 1.5x speed improvement compared to JavaScript. In this article I will explain how I created the wasm-stl-thumbnailer package, and walk through a performance analysis.

NPM: https://www.npmjs.com/package/wasm-stl-thumbnailer

Github: https://github.com/adamgerhant/wasm-stl-thumbnailer

Demo site: https://adamgerhant.github.io/wasm-stl-thumbnailer/

What is WebAssembly?

Unlike JavaScript, which is interpreted as it runs, WebAssembly (wasm) is compiled ahead of time from a low-level language such as Rust or C++. Because it is pre-compiled, it is generally faster than JavaScript, since the machine code can run at near-native speed on the CPU. The compiled module is essentially a “black box” that takes input data and returns an output, which is perfect for my goal of producing an image from a 3D STL model. WebAssembly can also access the GPU through WebGPU for fast renders.

Starting the Project

In order to learn about WebGPU, I read through Learn Wgpu, which I would highly recommend. For this project, I will start with tutorial 9 — model loading. This tutorial creates a 9x9 grid of cubes from an OBJ file.

Although this may seem close to our goal, there is plenty of work to be done. Here is an outline of all the steps to create the thumbnailer.

  • The model needs to be passed from JavaScript to wasm as a byte stream, since wasm generally cannot access the file system.
  • The render should be windowless, and return a PNG byte stream back to JavaScript.
  • The model loading needs to be adapted for STL files instead of OBJ.
  • Lighting, camera position, and scaling need to be added.

Inputs and outputs

The following Rust function defines the inputs and outputs of the program, which are both u8 byte streams containing the STL and PNG data.

#[wasm_bindgen]
pub async fn stl_to_png(stl_bytestream: &[u8]) -> Result<JsValue, JsValue> {
    let _ = console_log::init_with_level(Level::Debug);
    let render = render(stl_bytestream).await;
    let uint8_array = Uint8Array::from(render.as_slice());
    let js_value: JsValue = uint8_array.into();
    Ok(js_value)
}

The #[wasm_bindgen] attribute instructs the compiler to generate the necessary glue code to make the function callable from JavaScript. Since it expects a u8 slice from JavaScript, the file needs to be converted to a Uint8Array before passing it to the function. Since the function is asynchronous, it must return a Result, which wasm-bindgen exposes as a Promise. Once the PNG byte stream is returned from wasm, it is converted to an image and appended to the document.

const file = event.target.files[0];
try {
    const arrayBuffer = await file.arrayBuffer();
    const uint8Array = new Uint8Array(arrayBuffer);

    const pngByteStream = await wasm.stl_to_png(uint8Array);

    const blob = new Blob([pngByteStream], { type: 'image/png' });
    const dataUrl = URL.createObjectURL(blob);
    const img = document.createElement('img');
    img.src = dataUrl;
    document.body.appendChild(img);
} catch (error) {
    console.error(error);
}

Windowless Render

Normally, the GPU outputs image data directly to a window. However, for this project the data needs to be copied back to CPU-accessible memory so that it can be returned to JavaScript. To do this, a buffer is created on the GPU to store the render data. Then, the copy_texture_to_buffer command is recorded by the encoder, which queues instructions for the GPU. Finally, the buffer data is copied to memory with flume, an asynchronous channel library.

// the buffer stores every pixel as a u32 (4 bytes of RGBA data)
let output_buffer_size = (u32_size * texture_size * texture_size) as wgpu::BufferAddress;
let output_buffer_desc = wgpu::BufferDescriptor {
    size: output_buffer_size,
    usage: wgpu::BufferUsages::COPY_DST | wgpu::BufferUsages::MAP_READ,
    label: None,
    mapped_at_creation: false,
};
let output_buffer = device.create_buffer(&output_buffer_desc);

// ... create encoder and record GPU instructions to render the model

// copy the render to the buffer
encoder.copy_texture_to_buffer(
    wgpu::ImageCopyTexture {
        aspect: wgpu::TextureAspect::All,
        texture: &texture,
        mip_level: 0,
        origin: wgpu::Origin3d::ZERO,
    },
    wgpu::ImageCopyBuffer {
        buffer: &output_buffer,
        layout: wgpu::ImageDataLayout {
            offset: 0,
            bytes_per_row: Some(u32_size * texture_size),
            rows_per_image: Some(texture_size),
        },
    },
    texture_desc.size,
);

// submit instructions to the GPU
queue.submit(iter::once(encoder.finish()));

let mut pixel_data = Vec::<u8>::new();
// create a CPU-accessible view of the GPU buffer
let buffer_slice = output_buffer.slice(..);
// initialize an asynchronous channel to send and receive the mapping result
let (sender, receiver) = flume::bounded(1);
// asynchronously map the buffer for reading, sending the result through the channel
buffer_slice.map_async(wgpu::MapMode::Read, move |r| sender.send(r).unwrap());
// wait for the GPU to finish
device.poll(wgpu::Maintain::Wait);
// receive the mapping result from the channel, then read the data into pixel_data
receiver.recv_async().await.unwrap().unwrap();
{
    let view = buffer_slice.get_mapped_range();
    pixel_data.extend_from_slice(&view[..]);
}

output_buffer.unmap();

// convert the pixel data to a DynamicImage
let img = DynamicImage::ImageRgba8(ImageBuffer::from_raw(texture_size, texture_size, pixel_data).unwrap());

// write the image to an in-memory buffer as PNG
let mut png_data = Cursor::new(Vec::<u8>::new());
img.write_to(&mut png_data, ImageFormat::Png).unwrap();

let png_bytes = png_data.into_inner();

png_bytes

After modifying the code to process the byte stream instead of a file, along with a few other changes, this is what the render looks like.
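The byte-stream handling can be sketched with the standard library alone: std::io::Cursor wraps the &[u8] received from JavaScript in a reader that implements Read + Seek, which is what STL parsers such as stl_io expect. As a hypothetical illustration (the real code delegates parsing to the stl_io crate), this reads the triangle count from a binary STL, which consists of an 80-byte header followed by a little-endian u32 count:

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

// Parse the triangle count out of a binary STL byte stream.
// Binary STL layout: 80-byte header, then a little-endian u32 triangle count.
fn triangle_count(stl_bytes: &[u8]) -> std::io::Result<u32> {
    let mut reader = Cursor::new(stl_bytes);
    reader.seek(SeekFrom::Start(80))?; // skip the header
    let mut count_bytes = [0u8; 4];
    reader.read_exact(&mut count_bytes)?;
    Ok(u32::from_le_bytes(count_bytes))
}

fn main() {
    // build a minimal fake STL: 80 zero bytes of header, then a count of 2
    let mut bytes = vec![0u8; 80];
    bytes.extend_from_slice(&2u32.to_le_bytes());
    println!("{}", triangle_count(&bytes).unwrap()); // prints 2
}
```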

Improving the Render

In order to improve the quality of the render, I will add lighting, scaling, and a better camera position. To implement the lighting I will add diffuse and ambient lighting, outlined in tutorial 10 — working with lights. Diffuse lighting compares the normal vector (a unit vector perpendicular to the surface at each vertex) to the light vector (the direction the light comes from), and appropriately shades each face. Ambient lighting is simpler: a base level of brightness applied to all faces. The lighting calculations are done in the fragment shader, which is used to calculate the color of each pixel during the render.

@fragment
fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> {
    let ambient_strength = 0.5;
    let ambient_color = light.color * ambient_strength;

    let light_dir = normalize(light.position - in.world_position);

    // clamp to zero so faces pointing away from the light get no diffuse term
    let diffuse_strength = max(dot(in.world_normal, light_dir), 0.0);
    let diffuse_color = light.color * diffuse_strength;

    let result = (ambient_color + diffuse_color * 0.5) * 0.5;

    // alpha is a normalized float in [0.0, 1.0], not 0-255
    return vec4<f32>(result, 1.0);
}
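As a quick sanity check of the shader's lighting math, here is the same per-channel calculation in plain Rust, assuming a white light and clamping negative dot products to zero as is conventional:

```rust
// Mirror of the fragment shader's per-channel lighting calculation:
// result = (ambient + diffuse * 0.5) * 0.5, with ambient_strength = 0.5.
fn shade(light_color: f32, normal_dot_light: f32) -> f32 {
    let ambient_strength = 0.5;
    let ambient = light_color * ambient_strength;
    let diffuse = light_color * normal_dot_light.max(0.0);
    (ambient + diffuse * 0.5) * 0.5
}

fn main() {
    println!("{}", shade(1.0, 1.0)); // face toward the light: 0.5
    println!("{}", shade(1.0, 0.0)); // face perpendicular to the light: only ambient, 0.25
}
```

A face pointing directly at the light is twice as bright as one receiving only ambient light, which is what gives the faces their shading.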

Next, the model needs to be appropriately scaled. Some models are larger than others, and can even be offset from the center. Since a 3D model is simply a set of vertices (points in 3D space) and indices (connections between vertices that form faces), the dimensions and offset can be calculated and used to scale each vertex. To find the center of the model and its maximum dimension, I use the following functions.

fn calculate_centroid(vertices: &[(f32, f32, f32)]) -> (f32, f32, f32) {
    let (sum_x, sum_y, sum_z) = vertices.iter()
        .fold((0.0, 0.0, 0.0), |acc, &v| (acc.0 + v.0, acc.1 + v.1, acc.2 + v.2));
    let num_vertices = vertices.len() as f32;
    (sum_x / num_vertices, sum_y / num_vertices, sum_z / num_vertices)
}

fn calculate_max_dimension(vertices: &[(f32, f32, f32)]) -> f32 {
    let min_x = vertices.iter().map(|v| v.0).min_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let max_x = vertices.iter().map(|v| v.0).max_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let width = max_x - min_x;

    let min_y = vertices.iter().map(|v| v.1).min_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let max_y = vertices.iter().map(|v| v.1).max_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let height = max_y - min_y;

    let min_z = vertices.iter().map(|v| v.2).min_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let max_z = vertices.iter().map(|v| v.2).max_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let depth = max_z - min_z;

    width.max(height).max(depth)
}
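A quick sanity check of these two helpers (re-declared here so the snippet is self-contained): a unit cube spanning (0, 0, 0) to (1, 1, 1) should have a centroid of (0.5, 0.5, 0.5) and a maximum dimension of 1.0.

```rust
fn calculate_centroid(vertices: &[(f32, f32, f32)]) -> (f32, f32, f32) {
    let (sum_x, sum_y, sum_z) = vertices.iter()
        .fold((0.0, 0.0, 0.0), |acc, &v| (acc.0 + v.0, acc.1 + v.1, acc.2 + v.2));
    let num_vertices = vertices.len() as f32;
    (sum_x / num_vertices, sum_y / num_vertices, sum_z / num_vertices)
}

fn calculate_max_dimension(vertices: &[(f32, f32, f32)]) -> f32 {
    let min_x = vertices.iter().map(|v| v.0).min_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let max_x = vertices.iter().map(|v| v.0).max_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let min_y = vertices.iter().map(|v| v.1).min_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let max_y = vertices.iter().map(|v| v.1).max_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let min_z = vertices.iter().map(|v| v.2).min_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    let max_z = vertices.iter().map(|v| v.2).max_by(|a, b| a.partial_cmp(b).unwrap()).unwrap();
    (max_x - min_x).max(max_y - min_y).max(max_z - min_z)
}

fn main() {
    // corners of a unit cube spanning (0,0,0) to (1,1,1)
    let mut cube = Vec::new();
    for x in 0..2 {
        for y in 0..2 {
            for z in 0..2 {
                cube.push((x as f32, y as f32, z as f32));
            }
        }
    }
    println!("{:?}", calculate_centroid(&cube)); // (0.5, 0.5, 0.5)
    println!("{}", calculate_max_dimension(&cube)); // 1
}
```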

Then, it is as simple as looping through the vertices and applying the offset and scaling factor.

let vertices: Vec<(f32, f32, f32)> = m.mesh.positions
    .chunks(3)
    .map(|chunk| (chunk[0], chunk[1], chunk[2]))
    .collect();

let centroid = calculate_centroid(&vertices);
let offset = (-centroid.0, -centroid.1, -centroid.2);

let max_dimension = calculate_max_dimension(&vertices);
let desired_height = 6.0;
let scaling_factor = desired_height / max_dimension;

// loop through the vertices, applying the offset and scaling factor to each position
position: [
    (m.mesh.positions[i * 3] + offset.0) * scaling_factor,
    (m.mesh.positions[i * 3 + 1] + offset.1) * scaling_factor,
    (m.mesh.positions[i * 3 + 2] + offset.2) * scaling_factor,
]
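A numeric sanity check of the offset-and-scale step, using hypothetical values: a model centered at (2, 2, 2) whose largest dimension is 12 units gets a scaling factor of 6.0 / 12.0 = 0.5, so a vertex 6 units to the right of the center lands 3 units from the origin.

```rust
// Offset-and-scale a single vertex. Adding the offset is the same as
// subtracting the centroid, mirroring the per-vertex loop in the article.
fn offset_and_scale(v: (f32, f32, f32), centroid: (f32, f32, f32), scaling_factor: f32) -> (f32, f32, f32) {
    (
        (v.0 - centroid.0) * scaling_factor,
        (v.1 - centroid.1) * scaling_factor,
        (v.2 - centroid.2) * scaling_factor,
    )
}

fn main() {
    // hypothetical model: centered at (2, 2, 2), largest dimension 12 units
    let centroid = (2.0, 2.0, 2.0);
    let desired_height = 6.0_f32;
    let max_dimension = 12.0_f32;
    let scaling_factor = desired_height / max_dimension; // 0.5

    // a vertex 6 units to the right of the center
    println!("{:?}", offset_and_scale((8.0, 2.0, 2.0), centroid, scaling_factor)); // (3.0, 0.0, 0.0)
}
```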

Finally, I found better camera and lighting positions to create the final render. I also added anti-aliasing to smooth the edges.

Rendering STL instead of OBJ

The STL file type has various differences compared to OBJ. Most notably, the stl_io library I use to read the STL provides normals for faces, not vertices. This means each vertex normal has to be calculated as the normalized sum of the normals of all faces sharing that vertex.

let mut vertex_normals = vec![Vector3::zeros(); stl.vertices.len()];

for triangle in stl.faces.iter() {
    let normal = Vector3::new(triangle.normal[0], triangle.normal[1], triangle.normal[2]);

    // add the face normal to the running total for each of the triangle's vertices
    for i in 0..3 {
        vertex_normals[triangle.vertices[i] as usize] += normal;
    }
}

// normalize the summed vertex normals
for normal in &mut vertex_normals {
    *normal = normal.normalize();
}
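The same averaging can be illustrated with the standard library alone, using plain [f32; 3] arrays in place of nalgebra's Vector3. A vertex shared by two perpendicular faces, one facing +X and one facing +Y, ends up with a normal pointing diagonally between them:

```rust
// Sum the normals of all faces sharing a vertex, then normalize the result.
fn average_normal(face_normals: &[[f32; 3]]) -> [f32; 3] {
    let mut sum = [0.0_f32; 3];
    for n in face_normals {
        for i in 0..3 {
            sum[i] += n[i];
        }
    }
    let len = (sum[0] * sum[0] + sum[1] * sum[1] + sum[2] * sum[2]).sqrt();
    [sum[0] / len, sum[1] / len, sum[2] / len]
}

fn main() {
    // two faces meeting at a right angle: one faces +X, one faces +Y
    let n = average_normal(&[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]);
    // the shared vertex normal points diagonally between them
    println!("{:?}", n);
}
```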

After adapting the rest of the code to use the STL data instead of OBJ data, this is the result.

Vertex vs Face Normals

Although the bunny model looks great, a problem arises for models with faces which meet at a sharp angle. For example, the following picture shows a render of 2 rectangles.

Since adjacent faces share the same vertex normals (drawn in red), areas on different faces near a vertex will have nearly identical normals. This leads to a very flat render, which makes high-polygon models look smoother but makes edges hard to see. Since I don’t want low-polygon models to look so flat, I duplicate the vertices so that each triangle has its own unique vertices. Each vertex can then store the normal of its face.

let mut vertices = Vec::new();
let mut indices = Vec::new();

for (index, face) in stl.faces.iter().enumerate() {
    for i in 0..3 {
        let current_index = index * 3 + i;
        indices.push(current_index);

        let vertex_index = face.vertices[i];
        let current_vertex = stl.vertices[vertex_index].clone();
        let offset_vertex = [
            (current_vertex[0] + offset.0) * scaling_factor,
            (current_vertex[1] + offset.1) * scaling_factor,
            (current_vertex[2] + offset.2) * scaling_factor,
        ];
        let model_vertex = model::ModelVertex {
            position: offset_vertex,
            tex_coords: [0.0, 0.0],
            normal: face.normal.into(),
        };
        vertices.push(model_vertex);
    }
}

This is what the rectangles look like now.

The bunny may not look as smooth, but I think the better edge visualization is worth it. I might add a parameter to the package to allow the user to choose between rendering vertex or face normals.

Performance

Now to see if switching to wasm and wgpu was worth it. Since I originally created this package to replace react-stl-viewer, which uses Three.js, that’s what I will compare it to. To measure the speed, I use performance.now() to save the time when the file is loaded, then record the elapsed time after the wasm function and react-stl-viewer finish. The code is available here. The first test is a simple speed test on a variety of model sizes. I refresh the page before every render to measure first-render speed, and take the average of 5 renders for each model. For reference, my PC has a Ryzen 7 3700X CPU and an RX 5700 XT GPU. The full data is available on this Google Sheet.

From the data, the wasm package improves performance by 12.74% on average, with large models improved by 15–20%. It’s reassuring to see that the claims of wasm being 10–20% faster than JavaScript seem to hold, at least in this case.

Improving Performance 2–3x with a Simple Optimization

While using the wasm-stl-thumbnailer, I noticed something interesting: the first render after reloading the page takes 2–3 times as long as every following render. To analyze this further, I added log statements to the Rust code to see what was causing the delay. I found that the initial delay came entirely from the request_adapter and request_device calls, which initialize communication with the GPU. I'm not entirely sure why this process is so much quicker after the first time, but I would guess there is some behind-the-scenes caching and preparation of the GPU. By moving these calls into a function, initialize_gpu, and calling it during the initialization process, the 2–3x performance improvement is available on the first render.

#[wasm_bindgen]
pub async fn initialize_gpu() {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::all(),
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    let (_device, _queue) = adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: None,
                required_features: wgpu::Features::empty(),
                required_limits: wgpu::Limits::downlevel_webgl2_defaults(),
            },
            None,
        )
        .await
        .unwrap();
}

Performance Caveats

Although the first-render performance is significantly better, most of the time discrepancy comes from the initialization of the Three.js component. For subsequent renders on the same component, the wasm thumbnailer goes back to the 10–20% speed improvement. Another note on my testing: I start both the wasm thumbnailer and react-stl-viewer at the same time. When they are run independently, the performance more or less equalizes. I'm not sure why this happens, but I suppose it means the wasm gets a higher priority on the CPU than the JavaScript.

Summary

Overall, this project was great practice with Rust, WebGPU, and WebAssembly. I learned a lot about GPU programming, low-level systems, and how to optimize them. I’m glad to see that for the first render the WebAssembly version is 2–3x faster than the JavaScript version, but it’s important to note that subsequent renders will be around 10% faster.
