How to get a performance boost using WebAssembly
The new year has just started, and with it come new resolutions to accomplish. How about learning how to use WebAssembly and getting a performance boost?
This article continues a series of articles that we are writing about performance. Go and check out “How to get a performance boost using Node.js native addons” and “A 1300% performance gain with Ruby time parsing optimization!” ✌️
We are going to cover the same 3 techniques we already covered in the previous article: loop, recursion and memoization.
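As a reference point, the three techniques that appear in the results (loop, recursive, memoization) can be sketched in JavaScript for the Fibonacci sequence. This is a minimal sketch, not the benchmark code behind the numbers in this article; the function names `fibLoop`, `fibRecursive` and `fibMemo` are assumptions made for illustration.

```javascript
// Loop: walk up from F(0), keeping only the last two values.
function fibLoop(n) {
  let a = 0;
  let b = 1;
  for (let i = 0; i < n; i++) {
    [a, b] = [b, a + b];
  }
  return a;
}

// Recursive: the textbook definition; it recomputes the same
// subproblems over and over, so it slows down quickly as n grows.
function fibRecursive(n) {
  return n < 2 ? n : fibRecursive(n - 1) + fibRecursive(n - 2);
}

// Memoization: same recursion, but each result is cached so every
// F(k) is computed only once.
const fibCache = new Map();
function fibMemo(n) {
  if (n < 2) return n;
  if (fibCache.has(n)) return fibCache.get(n);
  const value = fibMemo(n - 1) + fibMemo(n - 2);
  fibCache.set(n, value);
  return value;
}
```

All three return the same values; they differ only in how much work they repeat, which is what the benchmark measures.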
A bit of history
Building + Loading module
Let’s take a look at how we transform our C program into wasm. I decided to go with the standalone output, which, instead of producing a combination of .js and WebAssembly, produces just the WebAssembly code, without the system libraries included.
This approach is based on Emscripten’s side module concept. A side module makes sense here: it is a form of dynamic library that does not link in system libraries automatically, so the compilation output is self-contained.
$ emcc fibonacci.c -Os -s WASM=1 -s SIDE_MODULE=1 -o fibonacci.wasm
Once we have the binary, we just need to load it in the browser. To do so, the WebAssembly JavaScript API exposes a top-level object, WebAssembly, which contains the methods we need to compile and instantiate the module. Here is a simple method, based on Alon Zakai’s gist, which works as a generic loader:
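What follows is a minimal sketch of such a loader, written from the steps described in this article rather than copied from the gist. The function names (`loadWebAssembly`, `instantiateSideModule`) and the default `env` imports (`memoryBase`, `tableBase`, `memory`, `table`) are assumptions about what a side module built with this Emscripten setup expects; other Emscripten versions may use different import names.

```javascript
// Compile raw wasm bytes and instantiate the resulting module.
// `bytes` can be an ArrayBuffer or a typed array.
function instantiateSideModule(bytes, imports = {}) {
  return WebAssembly.compile(bytes).then((module) => {
    // Assumed defaults for the imports a side module asks for;
    // check what your Emscripten version actually emits.
    const env = Object.assign({}, imports.env);
    if (env.memoryBase === undefined) env.memoryBase = 0;
    if (env.tableBase === undefined) env.tableBase = 0;
    if (!env.memory) {
      env.memory = new WebAssembly.Memory({ initial: 256 });
    }
    if (!env.table) {
      env.table = new WebAssembly.Table({ initial: 0, element: 'anyfunc' });
    }
    // Synchronous instantiation of an already compiled module.
    // WebAssembly.instantiate(module, imports) is the async variant.
    return new WebAssembly.Instance(module, Object.assign({}, imports, { env }));
  });
}

// Fetch the .wasm file, read it as an ArrayBuffer, then hand the
// bytes to the function above. Returns a Promise of the instance.
function loadWebAssembly(url, imports) {
  return fetch(url)
    .then((response) => response.arrayBuffer())
    .then((bytes) => instantiateSideModule(bytes, imports));
}
```

Assuming fibonacci.c exports a function that ends up exposed as `_fib` (a hypothetical name), it could then be called with `loadWebAssembly('fibonacci.wasm').then((instance) => instance.exports._fib(10))`.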
The cool thing here is that everything happens asynchronously. First we fetch the file content and convert it into an ArrayBuffer, which holds the raw binary data in a fixed-length buffer. Those bytes cannot be executed as they are, which is why we pass them to WebAssembly.compile. It returns a WebAssembly.Module, which you can finally instantiate with WebAssembly.Instance.
Take a look at the binary encoding format that WebAssembly uses if you want to go deeper into that topic.
Results (You can check a live demo here)
JS loop x 8,605,838 ops/sec ±1.17% (55 runs sampled)
JS recursive x 0.65 ops/sec ±1.09% (6 runs sampled)
JS memoization x 407,714 ops/sec ±0.95% (59 runs sampled)
Native loop x 11,166,298 ops/sec ±1.18% (54 runs sampled)
Native recursive x 2.20 ops/sec ±1.58% (10 runs sampled)
Native memoization x 30,886,062 ops/sec ±1.64% (56 runs sampled)
Fastest: Native memoization
Slowest: JS recursive
- The best C implementation is about 3.6 times as fast as the best JS implementation (30,886,062 vs. 8,605,838 ops/sec).
- The fastest implementation in C is memoization, while in JS it is the loop.
- The second fastest implementation in C is still faster than the fastest JS one.
- The slowest C implementation is about 3.4 times as fast as the slowest JS one (2.20 vs. 0.65 ops/sec).
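The comparisons in this list can be recomputed from the ops/sec figures in the results. The snippet below is a small sketch that does so; the object layout and the helper names `ranked` and `speedup` are assumptions made for illustration, while the numbers are copied from the results above.

```javascript
// Figures copied from the results listed above (ops/sec).
const opsPerSec = {
  js: { loop: 8605838, recursive: 0.65, memoization: 407714 },
  native: { loop: 11166298, recursive: 2.2, memoization: 30886062 },
};

// Sort one group's figures from fastest to slowest.
function ranked(group) {
  return Object.entries(group).sort((a, b) => b[1] - a[1]);
}

// How many times as fast `a` is compared with `b`.
function speedup(a, b) {
  return a / b;
}

const js = ranked(opsPerSec.js);
const native = ranked(opsPerSec.native);

const comparisons = {
  bestVsBest: speedup(native[0][1], js[0][1]),
  secondNativeVsBestJs: speedup(native[1][1], js[0][1]),
  slowestVsSlowest: speedup(native[2][1], js[2][1]),
};

for (const [name, value] of Object.entries(comparisons)) {
  console.log(`${name}: ${value.toFixed(2)}x`);
}
```

A ratio of r reads as "r times as fast"; the matching "percent faster" figure is (r - 1) × 100.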
Hope you guys have enjoyed this introduction to WebAssembly and what you can do with it today. In the next article I want to cover non-standalone modules, different techniques to communicate between C and JS, and linking and dynamic linking.
Happy 2017 🐣