Helping WebAssembly reach near-native speeds would be a big step toward wide adoption.
The current WebAssembly SIMD proposal is closing the performance gap by allowing numerically intensive WebAssembly programs to use SIMD instructions to improve their runtime performance.
Over the past couple of months at Wasmer, we’ve been hard at work adding SIMD support to our server-side WebAssembly runtime. Our speed analysis of the new implementation, comparing native code, Wasm with SIMD, and Wasm without SIMD, shows some great results.
Wasmer is the first Wasm runtime to fully support WASI and SIMD! 🎉
What is SIMD?
SIMD stands for Single Instruction, Multiple Data.
With just one instruction, we can perform the same arithmetic operation on multiple data lanes.
Let’s say we want to multiply four numbers (i32) by two and get the results back:
1 × 2 = 2
2 × 2 = 4
3 × 2 = 6
4 × 2 = 8
Normally, to get the results (2, 4, 6, 8) we would need four multiplication operations (one for each number).
By using SIMD, we can do the same multiplication with just one CPU instruction.
(1, 2, 3, 4) × (2, 2, 2, 2) = (2, 4, 6, 8)
This will be much faster to compute ⚡️.
In this example we are using a 128-bit SIMD register with 4 data lanes (32 bits each). The number of data lanes, and the total width of the register, can vary depending on our needs.
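The multiplication above can be sketched in C++ as follows. This is a minimal illustration, assuming a GCC or Clang compiler: the `vector_size` attribute is a compiler extension rather than standard C++, and the names `i32x4`, `mul_scalar` and `mul_simd` are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Four 32-bit integer lanes packed into one 128-bit value.
// vector_size is a GCC/Clang extension, not standard C++.
typedef std::int32_t i32x4 __attribute__((vector_size(16)));

using Int4 = std::array<std::int32_t, 4>;

// Scalar version: one multiplication per element, four in total.
Int4 mul_scalar(const Int4& a, const Int4& b) {
    Int4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = a[i] * b[i];
    }
    return out;
}

// Vector version: the four products come from a single lane-wise multiply,
// which the compiler can lower to one SIMD instruction where available.
Int4 mul_simd(const Int4& a, const Int4& b) {
    i32x4 va = {a[0], a[1], a[2], a[3]};
    i32x4 vb = {b[0], b[1], b[2], b[3]};
    i32x4 vr = va * vb;
    return {vr[0], vr[1], vr[2], vr[3]};
}
```

Calling `mul_simd({1, 2, 3, 4}, {2, 2, 2, 2})` gives the same (2, 4, 6, 8) as the scalar loop. Whether the multiply really becomes one instruction depends on the compiler, target and flags.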
Where can SIMD be useful?
SIMD can be especially useful for programs that perform many numeric operations (addition, subtraction, multiplication, …) over a large set of numbers.
Examples of this are:
- Image/Audio/Video Processing
- Physics engines
By leveraging SIMD, a WebAssembly program could see speedups of up to 16× on operations over 8-bit numbers (values 0 to 255 when unsigned), or up to 2× on 64-bit numbers, since WebAssembly SIMD operates on 128 bits at a time (128 / 8 = 16 lanes, 128 / 64 = 2 lanes).
Note: outside of WebAssembly instructions, SIMD can operate on registers wider than 128 bits. Thus the speedups can be even higher in certain native implementations.
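The 16× and 2× figures come from dividing the 128-bit width by the lane width. A minimal sketch of that arithmetic follows; the names `kV128Bits` and `lanes_per_v128` are illustrative and not taken from any library. The results are theoretical upper bounds per instruction, not measured speedups.

```cpp
#include <cstddef>

// Width in bits of a WebAssembly SIMD value, as stated in the text above.
constexpr std::size_t kV128Bits = 128;

// Number of lanes of the given width that fit in one 128-bit value.
// This is also the best-case number of elements handled per instruction.
constexpr std::size_t lanes_per_v128(std::size_t lane_bits) {
    return kV128Bits / lane_bits;
}

static_assert(lanes_per_v128(8) == 16, "sixteen 8-bit lanes");
static_assert(lanes_per_v128(16) == 8, "eight 16-bit lanes");
static_assert(lanes_per_v128(32) == 4, "four 32-bit lanes");
static_assert(lanes_per_v128(64) == 2, "two 64-bit lanes");
```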
We decided to start the SIMD implementation in the LLVM backend to take advantage of LLVM’s implementation of SIMD instructions.
Adding SIMD support into Wasmer touched other external open-source projects such as:
- WABT: The WebAssembly binary toolkit
- WABT-rs: the Rust bindings to WABT
- Wasmparser: a fast Rust WebAssembly parser
In the process of working on the SIMD feature for the Wasmer WebAssembly runtime, we created (and successfully merged) over 10 different PRs into these projects.
We also added an extensive set of SIMD spec tests to ensure we comply with the original specification.
We would like to use this article to personally thank all the maintainers of these repos for their incredible support and quick response time: Yuri Delendik (wasmparser project), Ben Smith and Thomas Lively (wabt), and Sergey Pepyakin (wabt-rs).
SIMD Speed Analysis
Now that SIMD support has landed in Wasmer (LLVM backend), we can analyze the speedup that we can achieve with it.
We created a SIMD example that emulates particle physics using C++, WASI and of course... WebAssembly!
Here are the timings of running our physics simulation:
As you can see, the SIMD example runs almost as fast in Wasmer as it does as a native executable!
C++ WASM WASI SIMD128 Example
This repo is made to showcase how to emit Wasm SIMD 128 instructions from C++, and use it with Wasmer.
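To give a rough idea of what SIMD-friendly C++ for a particle simulation can look like, here is a hypothetical sketch. It is not the code from the repository above: the `Particles4` layout, the `step` function and the explicit Euler update are assumptions made for illustration, and the `vector_size` attribute is a GCC/Clang extension. When targeting WebAssembly with Clang, the `-msimd128` flag is what allows operations like these to be emitted as 128-bit SIMD instructions.

```cpp
#include <cassert>

// Four 32-bit float lanes in one 128-bit value (GCC/Clang extension).
typedef float f32x4 __attribute__((vector_size(16)));

// Hypothetical structure-of-arrays layout: each vector holds one
// component for four particles at once.
struct Particles4 {
    f32x4 x, y;    // positions
    f32x4 vx, vy;  // velocities
};

// Advance four particles by one time step (explicit Euler integration).
// Each line updates the same component of all four particles together.
void step(Particles4& p, float dt) {
    assert(dt > 0.0f);  // a time step is expected to be positive
    f32x4 vdt = {dt, dt, dt, dt};
    p.x += p.vx * vdt;
    p.y += p.vy * vdt;
}
```

Storing each component in its own vector keeps the update free of per-particle branching, which is the kind of code that maps well onto SIMD lanes.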
How can you use it?
The latest release of Wasmer (0.6.0) has shipped with SIMD support.
You can install Wasmer with:
curl https://get.wasmer.io -sSfL | sh
Note: you can also use Wasmer with SIMD on Windows by downloading the Wasmer installer.
Running WebAssembly SIMD programs in Wasmer is as easy as choosing the LLVM backend and passing the --enable-simd flag when executing a .wasm program:
wasmer run --backend=llvm --enable-simd particle-repel-simd.wasm
We will have SIMD enabled in the other backends soon. Stay tuned! 🙂