WebAssembly Is Fast: A Real-World Benchmark of WebAssembly vs. ES6

Disclaimer: All opinions, thoughts, and work here are my own and do not represent those of my employer.

Introduction

When developing for the web, there have been plenty of times when I couldn’t bring an idea to fruition because of browser performance. Browsers do not run instructions directly like a compiled executable written in C. Browsers have to download, parse, interpret, and Just-In-Time (JIT) compile JavaScript (JS / ES6). I’ve built more than a handful of Cordova / Ionic, Electron, and Progressive Web Apps (PWA) to get the portability and flexibility of the web, but I knew it always came at the cost of performance. So the moment I heard the whispers of WebAssembly (Wasm), I knew I had to jump in on it.

About a year ago, I started a new personal project called WasmBoy. WasmBoy is a Gameboy / Gameboy Color emulator, written for WebAssembly, to help me learn WebAssembly. Gameboy emulation has been playable in browsers on mediocre desktop devices for a while now, but is hardly playable in browsers on mobile devices. Therefore, one of the goals I had with WasmBoy was to bring playable Gameboy emulation to budget mobile phones and Chromebooks. More importantly, with WasmBoy, I wanted to answer my question: “Will WebAssembly allow web developers to write almost as fast as native code for web browsers, and work alongside the ES6 code that we write today?” I think a growing number of other JavaScript developers have the same question.

WasmBoy, at a high level, is organized in two sections: the “lib” (JavaScript API interface) and the “core” (GameBoy emulation “backend”). The core of WasmBoy is written in AssemblyScript, a language that compiles a strict subset of TypeScript to WebAssembly using Binaryen. AssemblyScript is amazing. It allows web developers to write more performant code in a new technology, using tools they are already comfortable with. WasmBoy is compiled to WebAssembly using the AssemblyScript compiler. However, if we take a step back, we realize that we can mock out the AssemblyScript global functions that we call within our TypeScript code base. Therefore, we can use the TypeScript compiler on the same code base that we use the AssemblyScript compiler with. This gives us two different outputs in two different languages, using almost exactly the same source code! Using this process, I was able to make multiple cores: an AssemblyScript core, a JavaScript core, and a JavaScript (Closure compiled) core. These cores are compared in a WasmBoy benchmarking tool that we will get into in greater detail later.
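To make the “mock out the globals” idea concrete, here is a minimal sketch of what such a shim could look like. It assumes the core only touches memory through AssemblyScript’s load<u8> / store<u8> built-ins. The shim bodies, the memory size, and the incrementByte helper are hypothetical and are not taken from WasmBoy’s source.

```typescript
// Hypothetical shim: lets the plain TypeScript compiler build code
// that was written against AssemblyScript's globals.

// AssemblyScript's numeric types become simple aliases for number.
type u8 = number;
type i32 = number;

// Stand-in for Wasm linear memory (size chosen arbitrarily for this sketch).
const mockWasmMemory = new Uint8Array(0x10000);

// Byte-only mocks of AssemblyScript's load<T> / store<T> built-ins.
// TypeScript erases generic type arguments, so only the u8 case is modelled.
function store<T extends number>(offset: i32, value: T): void {
  mockWasmMemory[offset] = value;
}

function load<T extends number>(offset: i32): T {
  return mockWasmMemory[offset] as T;
}

// Example "core" code: the same source could be fed to either compiler.
function incrementByte(offset: i32): u8 {
  const next: u8 = (load<u8>(offset) + 1) & 0xff;
  store<u8>(offset, next);
  return next;
}
```

With a shim like this in scope, the core source stays the same and only the build step changes. A real shim would also need to handle wider types such as u16 and i32, which this sketch leaves out.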

Other WebAssembly Benchmarks

There are a handful of other benchmarks out there that test WebAssembly vs. JavaScript performance. Commonly, you will find Stack Overflow questions that do a micro benchmark with wild results. This is because what WebAssembly offers over JavaScript isn’t a peak performance boost, but consistent / predictable performance that can’t “fall off of the fast path” the way JIT-compiled JavaScript can.

Another common benchmark is a comparison of the two compiler outputs of Emscripten. Emscripten takes LLVM bitcode from C/C++ and compiles it down to asm.js or WebAssembly. asm.js is a kind of precursor to WebAssembly: a highly optimizable subset of JavaScript that is intended to improve JavaScript performance and is not meant to be written by hand by day-to-day developers.

Colin Eberhardt, who runs WebAssemblyWeekly on Twitter, has a great response / TL;DR to one of the micro-benchmark Stack Overflow questions. It covers the problems with micro benchmarking, and how Wasm should give about a 30% increase over asm.js in a real-world case. Here is a link to the paper that the Stack Overflow response refers to for the claimed Wasm performance increase. Also, Colin has an A M A Z I N G talk on WebAssembly. The talk has a section that does a ton of comparisons of Wasm vs. native vs. JS performance, and it illustrates this in much more detail than the response linked above.

In terms of other “real world” WebAssembly benchmarks, PSPDFKit has a great benchmarking tool and article on WebAssembly performance in a production application. I highly suggest giving that article a read if you are interested in this topic, as it provides another point of view, and they did a great job comparing the two. However, the PSPDFKit benchmark compares WebAssembly and asm.js, not WebAssembly and ES5/ES6. Therefore, the PSPDFKit benchmark is great if you are a developer with a large C/C++ application and want to know whether moving from asm.js to WebAssembly is a good idea (which it is). It doesn’t really answer the question for JavaScript / Node developers of how WebAssembly will perform as a replacement for a computationally demanding piece of JavaScript code in their web application, especially if those developers would be learning a new language or platform to find out.

Gameboy emulators make great benchmarks, and even the Chrome team used a Gameboy emulator to benchmark browsers at some point. Game emulation in general stresses almost every part of a language / platform, since it requires graphics, sound, and controller input, and presents interesting challenges in both performance and flexibility. Emulation tends to be very computationally intensive, which makes it a great fit for WebAssembly. Also, WasmBoy is in the unique position of being able to compare transpiled ES5 code from a popular compiler (TypeScript) to WebAssembly. Therefore, I thought WasmBoy would be a great fit for this type of benchmark. We mentioned before that asm.js is a faster subset of JavaScript, so let’s assume that in this benchmark we should notice a performance increase of around 30% (1.3 times as fast).

WasmBoy Benchmarking Explained

As mentioned earlier, this benchmark uses the WasmBoy benchmarking tool (source code). As of today, the benchmark features three different cores: AssemblyScript (WebAssembly built with the AssemblyScript compiler), JavaScript (ESNext output by the TypeScript compiler), and JavaScript Closure compiled (the previous JavaScript core, run through Google’s Closure Compiler, which was built to optimize JavaScript to run faster). Each core is then imported by the benchmarking application using standard ES6 imports, and built into an IIFE using rollup.js.

The WasmBoy benchmarking tool works by loading each of the available WasmBoy core configurations, and then running a specified number of frames of an input ROM / game. The time it took to run each frame of the ROM is recorded in microseconds, using the npm package microseconds. The tool does not use the popular benchmark.js, since benchmark.js focuses more on running the exact same code multiple times. When benchmarking frame by frame, the work varies: in one frame we could be doing a ton of sound processing, and in the next frame we could just be moving memory around. Once we have the times that it took to run each individual frame, we can process the data into other statistical values, and visualize it on charts.
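The frame-by-frame timing described above can be sketched roughly as follows. This is not the benchmarking tool’s actual code: the FrameRunner interface and the benchmarkFrames function are hypothetical names, and the default clock uses performance.now() scaled to microseconds as an assumption, where the real tool uses the microseconds npm package.

```typescript
// Hypothetical minimal view of an emulator core: it can run one frame.
interface FrameRunner {
  executeFrame(): void;
}

// Runs `frameCount` frames and records how long each one took.
// `nowMicroseconds` is injectable so any clock source can be swapped in.
function benchmarkFrames(
  core: FrameRunner,
  frameCount: number,
  nowMicroseconds: () => number = () => performance.now() * 1000
): number[] {
  const frameTimes: number[] = [];
  for (let frame = 0; frame < frameCount; frame++) {
    const start = nowMicroseconds();
    core.executeFrame();
    frameTimes.push(nowMicroseconds() - start);
  }
  return frameTimes;
}
```

Keeping one measurement per frame, rather than a single total, is what makes it possible to drop warm-up frames and compute per-frame statistics afterwards.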

WasmBoy Benchmarking Setup

The benchmarking tool has some open source ROMs that can be run directly from the tool, or any GameBoy / GameBoy Color ROM can be uploaded to be tested. As mentioned before, every frame of a ROM is different, and so is every ROM! In our tests, we run the first 2500 frames of each ROM. However, we drop the first 10 percent of frames, as they can greatly skew our data: in this benchmark, JavaScript needs a bit of time before “hot” code starts getting JIT compiled and performance levels out at a stable speed. For this test, we are running Tobu Tobu Girl and Back To Color. Tobu Tobu Girl is a standard GameBoy game, and thus does a normal game intro. It shows its title screen graphics, with sound effects here and there, for about the first 1000 frames. Then it switches to a fully animated title screen with a full-featured song. Back To Color is a GameBoy Color demo. Demos are usually built to do cool effects and push the limits of the system. Back To Color starts with a rapidly changing bass line and color text that scrolls in. In about the last 1000 frames it shows an awesome cityscape with an increasingly complicated song. These details are important to keep in mind, as sound is the most demanding part of WasmBoy, followed by graphics (where color is more complicated), and running standard CPU opcodes is the least demanding. Because of this, Back To Color should run slower than Tobu Tobu Girl. You will also notice that other ROMs can give greatly different results.
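As a rough illustration of the warm-up handling described above, the sketch below drops the first 10 percent of recorded frame times and then summarizes the rest. The function names and the exact rounding (Math.floor) are assumptions for this example, not the tool’s actual implementation.

```typescript
// Drops the leading warm-up frames so JIT warm-up does not skew the statistics.
function dropWarmupFrames(frameTimes: number[], warmupFraction: number = 0.1): number[] {
  const framesToSkip = Math.floor(frameTimes.length * warmupFraction);
  return frameTimes.slice(framesToSkip);
}

// Computes the total and the mean of the remaining frame times.
function summarizeFrameTimes(frameTimes: number[]): { sum: number; mean: number } {
  const sum = frameTimes.reduce((total, time) => total + time, 0);
  const mean = frameTimes.length > 0 ? sum / frameTimes.length : 0;
  return { sum, mean };
}
```

With 2500 recorded frames and the default fraction, this skips 250 frames and leaves 2250 for analysis.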

I then ran it on a variety of devices, and took screenshots (and merged them together into one large full page screenshot). The devices I tested on were:

I tested the benchmark on the major browsers that each device supported. The browsers were Chrome 70, Firefox 63.0.2, and Safari 12.1. I didn’t test on Edge because Microsoft recently announced that Edge will be replaced with a Chromium-based browser. Feel free to use the link to the tool mentioned at the beginning of this section to test on your own devices and their respective browsers.

Results

To keep the article shorter, we will only highlight and embed some of the results in this article. However, the images and results for all other configurations can be found in the WasmBoy repo. To interpret our results, we will use the “Sum” row in the tables to represent the performance of each core. The “Sum” is the total time it took to run all of the frames, added together. Also, we will interpret our results using a clear “X times as fast” format, as explained by this article on explaining performance improvements.
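For readers who want to reproduce the “X times as fast” figures from the “Sum” rows, the calculation can be sketched as the slower core’s total time divided by the faster core’s total time. The timesAsFast name and the sums in the usage note are made up for illustration and are not measurements from this benchmark.

```typescript
// "X times as fast": how many times the candidate's total fits into the baseline's total.
// Both sums must be in the same unit (for example, microseconds).
function timesAsFast(baselineSum: number, candidateSum: number): number {
  return baselineSum / candidateSum;
}
```

For example, if a JavaScript core took a total of 300 time units and a Wasm core took 200, then timesAsFast(300, 200) gives 1.5, which would be reported as “Wasm is ~1.5 times as fast”.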

Desktop

For desktop, let’s take a look at the results of Back To Color on the 2015 MBP in Chrome, Firefox, and Safari. I chose these because Back To Color is the more demanding of the two ROMs tested, and the 2015 MBP is what I use to develop the emulator and supports all three major browsers.

Back To Color Results on MBP 2015

MBP 2015, Chrome

  • Wasm vs. JavaScript: Wasm is ~1.67 times as fast.
  • Wasm vs. JavaScript Closure compiled: Wasm is ~1.45 times as fast.
  • JavaScript Closure compiled vs. JavaScript: JavaScript Closure compiled is ~1.15 times as fast.

MBP 2015, Firefox / Edit: (Bug / Issue)

  • Wasm vs. JavaScript: Wasm is ~11.71 times as fast.
  • Wasm vs. JavaScript Closure compiled: Wasm is ~6.00 times as fast.
  • JavaScript Closure compiled vs. JavaScript: JavaScript Closure compiled is ~1.95 times as fast.

MBP 2015, Safari

  • Wasm vs. JavaScript: Wasm is ~1.35 times as fast.
  • Wasm vs. JavaScript Closure compiled: Wasm is ~1.38 times as fast.
  • JavaScript Closure compiled vs. JavaScript: JavaScript is ~1.02 times as fast.

As you can see here, WebAssembly is the fastest on every browser, and JavaScript (TypeScript, without Closure Compiler) is the slowest in most cases.

Mobile

For mobile, let’s take a look at the Back To Color results for Chrome and Firefox on the Moto G5 Plus, and for Safari on the iPhone 6s. As stated in the introduction, I wanted to get this running on more “budget friendly” devices, as these devices will have a harder time keeping up with the emulation.

Back To Color Results on Moto G5 Plus and iPhone6s

Moto G5 Plus, Chrome

  • Wasm vs. JavaScript: Wasm is ~2.59 times as fast.
  • Wasm vs. JavaScript Closure compiled: Wasm is ~2.07 times as fast.
  • JavaScript Closure compiled vs. JavaScript: JavaScript Closure compiled is ~1.25 times as fast.

Moto G5 Plus, Firefox / Edit: (Bug / Issue)

  • Wasm vs. JavaScript: Wasm is ~16.11 times as fast.
  • Wasm vs. JavaScript Closure compiled: Wasm is ~8.72 times as fast.
  • JavaScript Closure compiled vs. JavaScript: JavaScript Closure compiled is ~1.84 times as fast.

iPhone 6s, Safari

  • Wasm vs. JavaScript: Wasm is ~1.23 times as fast.
  • Wasm vs. JavaScript Closure compiled: Wasm is ~1.15 times as fast.
  • JavaScript Closure compiled vs. JavaScript: JavaScript Closure compiled is ~1.07 times as fast.

Again, as you can see WebAssembly is the fastest on every browser, and JavaScript (TypeScript, without Closure Compiler) is the slowest.

Result Analysis

From the results we can tell that you get wildly different performance boosts depending on the browser, device, and core being used. To start, let’s refer back to our original expectation that “Wasm is about 30% faster than asm.js / JavaScript”. On desktop Chrome, this is mostly true! I say “mostly” because Wasm was about 40% faster than the core that doesn’t use Closure Compiler. Personally, I have not seen many web apps / libraries use Closure Compiler day-to-day, but running the JavaScript through Closure gets us to the expected 30%. On all other configurations, though, this can’t really be considered true. Mobile Chrome is about 60% faster, mobile and desktop Firefox are insanely faster at about 90%, and mobile and desktop Safari are only a bit faster at about 20%.
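One way to relate the “X times as fast” figures to the percentages used in this analysis is to treat “N% faster” as the share of total run time saved. That reading is an assumption on my part, but it lines up with the numbers reported earlier. The percentTimeSaved function below is a hypothetical helper for this example.

```typescript
// Converts an "X times as fast" ratio into the percentage of run time saved.
// Example: 2 times as fast means the work takes half the time, so 50% is saved.
function percentTimeSaved(timesAsFastRatio: number): number {
  return (1 - 1 / timesAsFastRatio) * 100;
}

// Ratios taken from the Back To Color results listed earlier in this article.
const desktopChromeVsJs = percentTimeSaved(1.67);      // roughly 40
const desktopChromeVsClosure = percentTimeSaved(1.45); // roughly 31
const mobileChromeVsJs = percentTimeSaved(2.59);       // roughly 61
```

Under this reading, a very large ratio such as 11.71 times as fast still only approaches 100%, which is why the Firefox results show up as “about 90%” rather than as a four-digit percentage.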

The next thing to note is that Closure Compiler is an easy win for an existing JavaScript application, provided Closure Compiler doesn’t throw any errors on your application when you try it out. Particularly on mobile and desktop Firefox, you can get a huge ~40% performance boost, and the ~10% performance boost on mobile and desktop Chrome would also make it worth fighting for. If this interests you before making the full-on leap to Wasm, I’d highly recommend taking a look at my colleague’s Closure Compiler rollup plugin. This is what I used to generate the Closure compiled WasmBoy core.

Personally, my biggest question with WebAssembly was its mobile performance. Taking a deeper look into this, you can tell WebAssembly is D E F I N I T E L Y worth investing time into. A ~60% increase in mobile web performance on Android opens up a whole new realm of possibilities in terms of what we can run on mobile browsers for PWAs, hybrid applications built with Cordova, or frameworks built on top of Cordova like Ionic. I was very pleasantly surprised to see these results, and they definitely made my day!

WasmBoy Benchmarking Gotchas

I wanted to make sure we highlighted some Gotchas before closing out the article. By “Gotchas”, I mean some things about the benchmarking tool or WasmBoy that could possibly give skewed results.

For example, one big red flag is how slowly the JavaScript core runs in Firefox compared to other browsers. This is probably because, even though I compiled my AssemblyScript code through the TypeScript compiler, I didn’t write my code like a typical ES6 application. (EDIT: In fact, this was the case! See the bug / issue). For instance, even though my code is all completely valid ES6, I’m not using Arrays or instance objects, or built-in helper functions like Array.prototype.forEach. Another gotcha is that WebAssembly only has a single linear memory. Even though Game Boy memory is divided into separate sections that you would want to store as different Uint8Arrays, I have to use one giant Uint8Array and export constants that represent the index of the start of each section. Perhaps Firefox is more optimized for the everyday ES5 / ES6 code that gets transpiled for common websites, and isn’t too friendly with the mocked-out Wasm interface and odd memory management that I use to get the AssemblyScript code running in JS.
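The single linear memory layout described above can be sketched like this. The section names, sizes, and helper functions are hypothetical and chosen only for illustration. They are not WasmBoy’s real memory map.

```typescript
// Hypothetical section layout inside one linear memory (sizes are illustrative).
const VIDEO_RAM_START = 0x0000;
const VIDEO_RAM_SIZE = 0x4000;
const WORK_RAM_START = VIDEO_RAM_START + VIDEO_RAM_SIZE;
const WORK_RAM_SIZE = 0x8000;
const CARTRIDGE_ROM_START = WORK_RAM_START + WORK_RAM_SIZE;
const CARTRIDGE_ROM_SIZE = 0x20000;

// One giant byte array stands in for what would otherwise be separate Uint8Arrays.
const linearMemory = new Uint8Array(CARTRIDGE_ROM_START + CARTRIDGE_ROM_SIZE);

// Every access adds the section's start constant to a section-relative offset.
function writeWorkRam(offset: number, value: number): void {
  linearMemory[WORK_RAM_START + offset] = value;
}

function readWorkRam(offset: number): number {
  return linearMemory[WORK_RAM_START + offset];
}
```

In plain JavaScript you would more naturally keep one typed array per section, which is part of why code structured this way may not look like what a JS engine usually sees.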

Another Gotcha, and one that may actually understate our Wasm results, is the overhead of jumping between Wasm and JS. Even though the overhead of each jump is small, and continually being improved, it is still there and is counted by this tool. This is important to bring up, because maybe you are building an application that does not need to jump back and forth 60 times per second to update the screen. Maybe you have a long-running task that doesn’t require as many calls per second, in which case your relative Wasm performance would be even better.

There is also a “Gotcha” in the ROMs that I chose to use. If you play around with the tool and try the cpu_instrs ROM, you may notice JavaScript is way faster! As mentioned before, what WebAssembly offers over JavaScript isn’t a peak performance boost, but consistent / predictable performance that can’t “fall off of the fast path” the way JIT-compiled JavaScript can. For example, the cpu_instrs ROM spends most of its time in the CPU, just testing instructions and moving memory around. There is very little graphical work, and no sound work. Because of this, JS engines won’t need to jump around as much, can continually optimize just the CPU code, and can then appear to give better results than Wasm. Also, the CPU is the least demanding part of WasmBoy, so it should be easier to optimize as well. Let’s not forget that JavaScript has had plenty of years to get its optimizations right and running as fast as possible. Even though Wasm is in production, it is still very young compared to JS, and will get faster over time.

Lastly, I would like to highlight the Gotcha that different languages which compile to Wasm will give different performance results. For example, here is another great introduction to WebAssembly and performance deep dive article that compares performance between Rust (one of the most popular languages for Wasm currently), AssemblyScript, and JavaScript. In most of the performance tests, Rust outspeeds AssemblyScript by a fair bit. There is also another awesome Gameboy / Gameboy Color emulator, written in Go by Daniel Harper, called gomeboycolor, which has a very interesting article on porting the emulator using the new experimental Go Wasm output. In the article they go over the emulator’s performance, and they notice Chrome to be much slower than Firefox and Safari. In my results you can find a similar pattern, but Firefox isn’t as fast relative to Safari as it is in the gomeboycolor port. Ben Smith, a buddy of mine who is on the Chrome Wasm team and also built his own super fast and accurate GameBoy emulator called binjgb, gave a quick answer as to why this may be in a Twitter conversation between us. AssemblyScript and the Wasm output for Go are both still young projects. AssemblyScript has made G R E A T strides since I first started using it and is already being used a bit in the WebAssembly crypto scene. But as of today, Wasm engines are better suited to some languages’ Wasm output than to others’, until they catch up with one another.

Conclusion

Thanks for reading this ridiculously long deep dive into Wasm performance! I hope you got as much out of this as I did. Let’s do a quick recap of the original questions we had in the introduction:

  • “Was I able to bring playable Gameboy emulation to budget mobile devices and Chromebooks using WebAssembly?”

Yes! Looking at the results of the Moto G5 Plus, a 44 FPS average doesn’t seem like it would be playable, and you are right, it isn’t. But what this benchmark doesn’t show is that I put a bit of elbow grease into some configurable options to increase performance, sometimes at the expense of accuracy. For example, I have an option for “Batch Processing” (or Lazy Evaluation), where we won’t update anything about the sound until we actually need to do something with it, allowing us to skip it most of the time during silent sections or sustained notes. Also, the npm install-able WasmBoy lib, which has a cleaner / easier to use API than the core, was eventually rewritten to move its intensive code into Web Workers. This greatly improves performance and, along with the other options, gets even demanding ROMs like Back To Color running at 60 FPS on the Moto G5 Plus! That in turn let me build a Vaporwave inspired PWA GB / GBC emulator with Preact and Preact-CLI, called VaporBoy (Source Code).
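To give a feel for the “Batch Processing” idea mentioned above, here is a minimal sketch of deferring sound work until it is needed. The LazySoundChannel class, its fields, and the cycle counts are hypothetical. WasmBoy’s actual option is more involved and is not reproduced here.

```typescript
// Hypothetical sound channel that defers its work until the state is needed.
class LazySoundChannel {
  private pendingCycles = 0;
  expensiveUpdates = 0; // counts how often the costly update path actually ran
  position = 0;         // stand-in for the channel's internal state

  // Called after every CPU instruction: cheap bookkeeping, no sound work.
  addCycles(cycles: number): void {
    this.pendingCycles += cycles;
  }

  // Called only when the sound state is needed (e.g. an audio sample is due).
  catchUp(): void {
    if (this.pendingCycles === 0) return;
    this.position += this.pendingCycles; // one batched update instead of many small ones
    this.pendingCycles = 0;
    this.expensiveUpdates += 1;
  }
}
```

In this sketch, a hundred calls to addCycles followed by a single catchUp run the expensive path once instead of a hundred times. The trade-off is that the state is only accurate at the moments it is caught up, which is the kind of accuracy cost described above.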

  • “Will WebAssembly allow web developers to write almost as fast as native code for web browsers, and work alongside the ES6 code that we write today?”

This performance benchmark deep dive only answers part of this question. Yes, WebAssembly definitely allows us to write faster code that can run alongside the ES6 that we write today, which was an intended goal of WebAssembly. Even though WebAssembly is much faster than JavaScript, it is meant to play nicely with it, not replace it entirely. Please keep in mind that emulation is the kind of thing WebAssembly is meant to be good at: highly computational tasks that involve playing with numbers and memory. However, the first part of the question, “Will WebAssembly allow web developers to write almost as fast as native code for web browsers…”, was answered by the other benchmarks covered in Colin Eberhardt’s talk mentioned earlier, and by the paper analyzing WebAssembly. As a TL;DR to this question, Wasm is usually about 10% slower than native C code, which to some developers is a fair trade-off for the portability and flexibility of the web.

  • “Is Wasm about 30% faster than asm.js / JavaScript?”

Honestly, I will say that this depends on a lot of factors. We can tell from these results that the answer depends on your device, browser, language, and use case for WebAssembly. If you plan to use WebAssembly on desktop Chrome, written in AssemblyScript, for Wasm’s intended use case (computational heavy-lifting tasks), then yes, Wasm is about 30% faster. But on mobile it can be much faster at around 60%, and on Firefox it can be much, much faster at around 90%. The only time we really observed it being less than 30% faster was on Safari, and that was more because its JS engine handled WasmBoy very well than because its Wasm engine fell short.

In conclusion, playing around with Wasm has been a very fun and rewarding experience for me. I am definitely grateful to see such a huge leap in performance, and for the doors that have been opened for JavaScript / Node developers like myself. WasmBoy has been a very fun and engaging side project of mine that I am still excited to work on at 3 am almost every night. I am extremely excited about where WebAssembly is today, and where it will be going in the future. Thanks for reading!

Edit (12/21/18): Added Firefox performance bug / issue