Benchmarking WebAssembly Runtimes

📊 Analyzing the results of Wasmer’s new speed center

WebAssembly is a new binary format that provides a secure sandbox while being designed to run at native speeds.

With new WebAssembly runtimes getting a lot of attention, a question that frequently comes up is: how fast can we run WebAssembly?

Wasmer has recently added two new compiler backends, a single-pass compiler (Dynasm) and an LLVM-based compiler, alongside the existing Cranelift backend. We were eager to measure how they perform.

So, we started to benchmark with a number of goals in mind:

  1. Compare performance between wasm runtimes
  2. Compare wasm vs native performance
  3. Measure compilation times
  4. Compare Wasmer backends (LLVM, Cranelift, Dynasm)
  5. Track Wasmer runtime performance over time

Benchmark Code

Because one of our goals was to compare wasm code with native code, we wrote the benchmarks in Rust, which compiles easily to both native and wasm binaries.
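Here is a minimal sketch of how a kernel such as fibonacci can be written once and built for both targets. This is not the code from our benchmark repository. The `#[no_mangle] pub extern "C"` export and the build commands are standard Rust. The `crate-type` setting mentioned in the comments is an assumption about how such a crate would be configured.

```rust
// Build natively:  cargo build --release
// Build for wasm:  cargo build --release --target wasm32-unknown-unknown
// Assumes `crate-type = ["cdylib", "rlib"]` in Cargo.toml: "rlib" lets a native
// Rust harness link it directly, "cdylib" makes the wasm file export `fib`.
// On the Rust 2024 edition the attribute is spelled `#[unsafe(no_mangle)]`.

/// Naive recursive Fibonacci. It is deliberately call-heavy, so it mostly
/// exercises function-call overhead and integer arithmetic.
#[no_mangle]
pub extern "C" fn fib(n: u32) -> u64 {
    if n < 2 {
        n as u64
    } else {
        fib(n - 1) + fib(n - 2)
    }
}
```

Calling `fib` directly from the Rust harness gives the native baseline. For each runtime, the wasm build is loaded and its exported `fib` is called instead.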

We ran our benchmarks in Rust using Criterion, a statistics-driven benchmarking library. Since a number of wasm runtimes are implemented in Rust and others offer C APIs, Rust was also a good fit for driving the wasm runtimes from the benchmark harness.

When benchmarking, we wanted to measure two main things:

  • Runtime performance: the time it takes to execute a function
  • Compile time performance: the time it takes to compile WebAssembly bytecode into machine code

You can find the source for these benchmarks here. If you have any feedback or run into problems, please open an issue in our benchmark repository.
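Both measurements come down to "mean wall-clock time per iteration". The helper below is a minimal std-only sketch of that idea. `mean_time_per_iter` is a hypothetical name, not part of Criterion's API. Criterion does much more, including sampling, outlier detection, and confidence intervals.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` `warmup` times untimed, then time `iters`
/// calls and return the mean wall-clock time per call.
fn mean_time_per_iter<T, F: FnMut() -> T>(warmup: u32, iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one timed iteration");
    for _ in 0..warmup {
        black_box(f());
    }
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / iters
}

/// Example workload for the runtime measurement.
fn fib(n: u32) -> u64 {
    if n < 2 { n as u64 } else { fib(n - 1) + fib(n - 2) }
}
```

For runtime performance, the closure would call the benchmark function, for example `mean_time_per_iter(10, 100, || fib(black_box(20)))`. For compile time performance, the same helper could wrap the step that turns wasm bytes into a module.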

Runtime Performance

We ran a small set of benchmarks designed to test runtime performance: fannkuch, fibonacci, n-body, and sha1. N-body and fannkuch were taken from the Benchmarks Game because they compile cleanly to both native and wasm targets. The following chart shows the time taken per iteration of each benchmark (less is better).

Chart: Runtime execution ratio (less is better)

To summarize, relative to native execution time, the Wasmer Dynasm backend took 5–10x as long (we are working on improving these times), Wasmer Cranelift 2.5–4.6x, Wasmer LLVM 1.2–2.1x, and V8 via wasm-c-api 1.5–2.1x.

The wasmi runtime is an interpreter, and we measured it at 150–190x slower than native. We therefore omitted it from the charts, since the large difference made the smaller runtime values hard to read.
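The multipliers in this section compare a runtime's time per iteration with the native build's time per iteration for the same benchmark. A ratio of 1.0 would mean native speed. Below is a minimal sketch of that normalization. The helper names are hypothetical and are not taken from the Speed Center.

```rust
/// Ratio of a runtime's time per iteration to the native time per iteration.
/// 1.0 means "as fast as native"; 2.0 means "takes twice as long".
fn execution_ratio(runtime_ns: f64, native_ns: f64) -> f64 {
    assert!(native_ns > 0.0, "native time must be positive");
    runtime_ns / native_ns
}

/// Summarize per-benchmark ratios as (min, max), the form used for
/// ranges like "1.2–2.1x". Returns None for an empty slice.
fn ratio_range(ratios: &[f64]) -> Option<(f64, f64)> {
    let mut iter = ratios.iter().copied();
    let first = iter.next()?;
    Some(iter.fold((first, first), |(lo, hi), r| (lo.min(r), hi.max(r))))
}
```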

Compile Time Performance

Our compilation time measurements are split into a small and a large compile benchmark, using 47 KB and 514 KB wasm files respectively. We measure the time it takes for the runtime to compile the wasm bytes into a WebAssembly module.

The following chart shows the large differences in compile times between the three Wasmer backends.

Chart: Compilation time ratio (less is better)

The single-pass Wasmer Dynasm compiler is the fastest, since it emits machine code while parsing the wasm file.
Next comes the Wasmer Cranelift compiler, which is still relatively fast but about 22x slower than Dynasm.
Finally, the Wasmer LLVM compiler is about 154x slower than Dynasm, but it produces highly optimized machine code.
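The toy sketch below shows why a single-pass compiler is so quick. It translates each stack instruction into output the moment that instruction is decoded, with no intermediate representation and no optimization passes. The instruction set and the x86-64-style pseudo-assembly are hypothetical. This is not Dynasm's actual code generator.

```rust
/// A toy subset of wasm-like stack instructions (hypothetical).
#[derive(Debug, Clone, Copy)]
enum Op {
    I32Const(i32),
    I32Add,
    I32Mul,
}

/// Single-pass translation: one linear walk over the input, emitting
/// pseudo-assembly per instruction. The wasm operand stack is mapped
/// directly onto the machine stack, so no register allocation is done.
fn compile_single_pass(code: &[Op]) -> Vec<String> {
    let mut asm = Vec::new();
    for op in code {
        match *op {
            Op::I32Const(v) => asm.push(format!("push {v}")),
            Op::I32Add | Op::I32Mul => {
                let instr = if let Op::I32Add = *op { "add eax, ecx" } else { "imul eax, ecx" };
                asm.push("pop rcx".to_string()); // right operand
                asm.push("pop rax".to_string()); // left operand
                asm.push(instr.to_string());
                asm.push("push rax".to_string()); // result back on the stack
            }
        }
    }
    asm
}
```

Compiling `2 + 3` this way yields six instructions instead of one folded constant. Skipping that kind of analysis keeps compilation cheap, and it is largely why single-pass output runs slower than Cranelift or LLVM output.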

In other words, the additional optimizations that produce faster code come at the cost of slower compile times.

These charts were generated by the Wasmer Speed Center (speed.wasmer.io) which is our benchmark dashboard to compare performance and track it over time.

Conclusion

Some observations from the results above:

  • WebAssembly runs at near-native speed when compiled with LLVM or with V8 via wasm-c-api, at the cost of longer compilation times.
  • There is a tradeoff between wasm compilation time and runtime performance, because longer compilation allows for more optimizations.
  • Dynasm has the fastest compilation and the lowest peak performance. Cranelift balances fast compilation with good peak performance. LLVM has the slowest compilation and the highest peak performance.
  • Interpreted code (wasmi) is one to two orders of magnitude slower than AOT (Ahead of Time) compiled code.

Because of this tradeoff between compiling quickly and producing optimized code, a wasm runtime can use tiering to get both fast startup and good peak performance. With tiering, the runtime starts executing code from a fast compiler and switches to more optimized code once that code is ready.
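Below is a minimal std-only sketch of the tiering idea. It is not Wasmer's design, which was still being implemented at the time of writing. `TieredFunction` is a hypothetical function slot that starts on a "baseline" implementation. A background thread stands in for the optimizing compiler and swaps in the "optimized" implementation when it finishes. The two plain Rust functions only stand in for differently compiled code.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

type Func = fn(u32) -> u64;

/// Stand-in for quickly compiled, less optimized code.
fn fib_baseline(n: u32) -> u64 {
    if n < 2 { n as u64 } else { fib_baseline(n - 1) + fib_baseline(n - 2) }
}

/// Stand-in for slower-to-produce, faster code (same results).
fn fib_optimized(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

/// Hypothetical function slot: starts on the baseline tier and can be
/// switched to the optimized tier while callers keep using it.
struct TieredFunction {
    current: RwLock<(&'static str, Func)>,
}

impl TieredFunction {
    fn new(baseline: Func) -> Self {
        TieredFunction { current: RwLock::new(("baseline", baseline)) }
    }

    /// Copy out the current function pointer, release the lock, then call.
    fn call(&self, n: u32) -> u64 {
        let (_, f) = *self.current.read().unwrap();
        f(n)
    }

    fn tier(&self) -> &'static str {
        self.current.read().unwrap().0
    }

    fn promote(&self, optimized: Func) {
        *self.current.write().unwrap() = ("optimized", optimized);
    }
}

/// Start on the baseline tier and "compile" the optimized tier in the background.
fn start_tiered() -> (Arc<TieredFunction>, thread::JoinHandle<()>) {
    let func = Arc::new(TieredFunction::new(fib_baseline));
    let bg = Arc::clone(&func);
    let handle = thread::spawn(move || {
        // A real runtime would run its optimizing compiler here.
        bg.promote(fib_optimized);
    });
    (func, handle)
}
```

Callers never wait for the optimizing compiler. They pick up the faster code on their first call after the swap.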

Good news… tiering in Wasmer is being implemented now! 🎉

Another option that reduces the impact of long compilation times for optimized code is caching the compiled code.
Wasmer already implements caching of compiled WebAssembly code, which allows heavily optimized code to be JIT compiled once, serialized, and reused.
With caching, you pay the compilation cost only once and get fast startup from the serialized modules afterward. This is ideal when your wasm code changes infrequently.
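Below is a minimal sketch of the caching idea, using a hypothetical in-memory cache keyed by a hash of the wasm bytes. It is not Wasmer's caching API. `expensive_compile` and the byte-vector "artifact" are placeholders for a real optimizing compile and its serialized output.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical serialized compiled module: just bytes.
type Artifact = Vec<u8>;

/// Hypothetical stand-in for a slow, optimizing compile. It merely
/// transforms the input so results are easy to check.
fn expensive_compile(wasm: &[u8]) -> Artifact {
    wasm.iter().rev().copied().collect()
}

/// In-memory compile cache that counts how many real compilations ran.
struct CompileCache {
    entries: HashMap<u64, Artifact>,
    compilations: usize,
}

impl CompileCache {
    fn new() -> Self {
        CompileCache { entries: HashMap::new(), compilations: 0 }
    }

    /// DefaultHasher is fine in memory, but its algorithm may change between
    /// Rust releases and a 64-bit hash can collide. An on-disk cache would
    /// need a fixed, collision-resistant hash instead.
    fn key(wasm: &[u8]) -> u64 {
        let mut hasher = DefaultHasher::new();
        wasm.hash(&mut hasher);
        hasher.finish()
    }

    /// Return the cached artifact, compiling only on a cache miss.
    fn get_or_compile(&mut self, wasm: &[u8]) -> Artifact {
        let key = Self::key(wasm);
        if let Some(artifact) = self.entries.get(&key) {
            return artifact.clone();
        }
        self.compilations += 1;
        let artifact = expensive_compile(wasm);
        self.entries.insert(key, artifact.clone());
        artifact
    }
}
```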

See this blog post for more details.

If you have any additional thoughts about the benchmarks, join us in our Spectrum Community Chat!