Pragmatic compiling of C++ to WebAssembly. A Guide.

Thomas Deniffel
Jan 17

Most C++ programmers I know have already heard of WebAssembly, but most have had trouble getting started. This guide takes you beyond a simple “Hello World”: to stateful applications with interactions between C++ and JavaScript.

I could not find a single article that covered more than the bare minimum. It took a lot of effort to get from the simplest “Hello World” to a system that can solve real-world problems. This post delivers that.

Yes, there is a GitHub repo with the final code, but it makes much more sense to follow this tutorial step by step: https://github.com/tom-010/webassembly-example

Note: This article does not show how to pass arrays from JS to WebAssembly, but this article does.

This article is not an introduction to WebAssembly itself or to why you should use it, so there is no big motivational speech at the beginning. Nevertheless, here is the definition from https://webassembly.org/:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

The homepage lists four significant reasons to use it: “Efficient and fast”, “Safe”, “Open and debuggable”, and “Part of the open web platform”.

However, let’s be honest: the only real reason is number one, “Efficient and fast.” Everything else can be achieved with JavaScript as well.

So, let’s get started!

The Operating System

I am using Ubuntu 18.10 with just the standard build tools for C++:

All of the tooling supports Windows and macOS as well.

Compile the Toolchain

First of all, we need the Emscripten toolchain (based on Clang). The best place to start is the Getting Started guide, which also contains the steps for the other operating systems.

This takes a while (among other things, Clang gets compiled) and requires some disk space.

Feel free to use the time to play around with WebAssembly in the “WebAssembly Explorer” (similar to Compiler Explorer).

First Compilation: “Hello World”

It is time to compile “Hello World”:

Before compiling, we have to source the environment of our compiled toolchain. Run the following command in the directory you cloned before:

It is even better to append the following line to your ‘.bashrc’ (run this in the cloned folder as well!):

The result is a hello.html and, more importantly, the hello.wasm file. The latter contains the compiled code.

Run The Code

If you open hello.html directly in the browser, you get CORS problems. You have to serve the files via a web server. Emscripten brings one with it:

This starts a web server, opens a browser, and shows the current directory. Just click the newly created hello.html. And voilà:

emscripten result

Emscripten provides a console automatically and executes your program.

Calling It Yourself (JavaScript)

The console in the web browser is nice but not very useful in production. Let’s write a minimal script to call our “Hello World.”

Emscripten has created a hello.js alongside the hello.html. The HTML file is bloated and contains nothing particularly useful for further use.

The ‘hello.js’ is very helpful. It loads and instantiates our WebAssembly code and provides a JavaScript interface to it. Therefore, we keep it and replace the HTML file with:

Refresh the browser and open the web console (Ctrl+Shift+I, then the ‘Console’ tab). And here we have:

“Hello World” in the web-console

I can’t make the HTML more minimal than this, so this is a good point to move on to more complex stuff. Most tutorials end here, but no real project consists of a single file!

Two Or More Files

Just some throw-away code (Fibonacci numbers):
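For orientation, here is a minimal sketch of such a Fibonacci function. It assumes the usual definition fib(0) = 0 and fib(1) = 1, which matches the later console output “fib(6) = 8”. The main function that prints the result with std::cout is left out here.

```cpp
// Sketch: naive recursive Fibonacci, good enough as throw-away code.
// fib(0) = 0, fib(1) = 1, fib(n) = fib(n - 1) + fib(n - 2)
int fib(int n) {
    if (n < 2) {
        return n;  // base cases
    }
    return fib(n - 1) + fib(n - 2);
}
```

With this definition, fib(6) returns 8 and fib(10) returns 55.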

Compiled by:

My web server is still running, so after a refresh:

The good news: it works. The bad news: it has overwritten my hello.html. The trick is to specify ‘hello.js’ as the output instead of ‘hello.html’.

This regenerates only ‘hello.wasm’ and ‘hello.js’, but not ‘hello.html’. To add a little automation, here is a build script together with an appropriate folder structure:

In the newly created directory:

With a little run script for typing convenience:

And an adjusted index.html:

Nice. Now we can develop the web app independently. The generated sources live in a separate folder, which makes it easy to remember that you shouldn’t modify them (as with any generated source).

Claiming that the previous example has multiple files is cheating (no headers, etc.). So we split fib into a header and an implementation:
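Here is a sketch of what the split can look like. Both files are shown in one listing so that it compiles on its own. The file names are the article’s, but the include guard name FIB_H is my own choice. In the real project, the part below the second marker lives in fib.cpp and starts with #include "fib.h". hello.cpp includes only the header.

```cpp
// ---- fib.h ----
#ifndef FIB_H  // include guard (name chosen for this sketch)
#define FIB_H

// Declaration only: this is all that hello.cpp gets to see.
int fib(int n);

#endif  // FIB_H

// ---- fib.cpp ----
// (the real file starts with: #include "fib.h")

// Definition: compiled as its own translation unit and
// connected to the call in hello.cpp by the linker.
int fib(int n) {
    if (n < 2) {
        return n;  // fib(0) = 0, fib(1) = 1
    }
    return fib(n - 1) + fib(n - 2);
}
```

Because hello.cpp sees only the declaration, the definition must come from a second translation unit at link time. If fib.cpp is not compiled along with it, the build fails.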

Note that hello.cpp does not include fib.cpp, only the header. Therefore, a linking step has to happen, and this is why the build fails:

Adding the fib.cpp to the build-script fixes the problem:

Note: ‘|| exit 1’ causes the script to stop if the build fails!

The build passes now. Please note that I changed the argument for fib to 6:

So we can see an actual difference now:

Compiling multiple files works! As long as you are able to extend the simple build script, this approach is fine. We will look at build systems (CMake) later for more complex projects. But let’s look at argument passing first!

Disassembling

Sometimes (as in the next section), it is useful to disassemble your code into S-expressions. You can do this with ‘wasm-dis hello.wasm -o hello.wast’.

In the output, you can easily find functions, globals, and so on. S-expressions are the textual representation of WebAssembly. To understand the building blocks, check out the great guide by MDN.

Disassembling becomes particularly useful when the C++ code was compiled with the flag ‘emcc … -s ONLY_MY_CODE=1 …’. Then the result is only a few lines long, and you can analyze it carefully.

As C. Gerard Gallant suggested in the comments, Emscripten can also generate the wast file directly while compiling the wasm, using the flag ‘-g’.

You can now find the always-up-to-date ‘wast’-file in the build folder (build/hello.wast).

Function Calls & Passing Arguments

The “fib(6) = 8” in the console comes from the cout in the main function of the C++ program, which is executed after loading. Now, I want to call fib from JavaScript myself:

Now, I am facing two problems:

  1. The program is not loaded yet when I want to execute fib(10)
  2. The function fib is not exported from C++ by emcc and is therefore not available to JS

Note: To see all exported functions, you can decompile the wasm file via ‘wasm-dis hello.wasm -o hello.wast’ and search the wast file for “(export”. Because of C++ name mangling, the function names are decorated. The file is written in S-expressions.

Without modification, only ‘main’ is exported. We have to change the build script:

With ‘-s EXPORT_ALL=1’, the fib function gets exported as well, but under its mangled C++ name ‘__Z3fibi’. I found the name by looking at the decompiled code (don’t worry, it is easy).

As you may have noticed, the resulting hello.wasm is several kB in size (9.6 kB in my case), just for a very simple algorithm. This comes from the ‘iostream’ module. When you remove the include and the call, and set the flag in build.sh that only our own code should end up in the wasm file (‘-s ONLY_MY_CODE=1’), the file gets much smaller:

Now the file is only 96 bytes, which is more appropriate. With this, we can call our C++ function:

Note that we don’t include ‘gen/hello.js’ anymore. Here we do the minimal work to load the wasm ourselves: in the first block, we load, compile, and instantiate the program; in the second, we run it. Not that complicated. Later in the article, I will deal with memory, but for now, this works.

Nice: the first C++ code called from JavaScript!

Imagine our code gets bigger. I simulate this by including ‘iostream’ again and removing the corresponding flag from the build script:

We are back at a file size of 165.4 kB for ‘hello.wasm’. This is realistic enough to simulate a larger code base. We didn’t change anything about the binding or the algorithm, so it should work:

An error occurs:

This makes sense. We do not want to block our main thread with loading, compiling, and instantiating a large WebAssembly file.

Google provides good docs on how to handle WebAssembly efficiently.

In fact, from here on it gets very nasty, because we would have to define bindings for every exported function; otherwise, we get many weird errors. Defining the bindings for a few functions would be okay, but remember: we exported all functions, which includes all of ‘iostream’. I tried it for a few hours and realized that the code generated by Emscripten is the easiest way for now. Therefore:

I still use the mangled name of the fib function. I register my function as ‘Module.onRuntimeInitialized’, which makes sure that it is executed after our (big) program has been loaded, compiled, and instantiated. It works:

Not the best solution, but it works. WebAssembly is still very brittle (at least the tooling around it), so we have to live with this. A first step would be to whitelist the exported functions.

Stateful C++ Code

Stateless functions alone are enough only in rare cases. Therefore, I want to design a simple class:
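Here is a minimal sketch of such a class. It assumes the class hands out one Fibonacci number per call of a next() method; the integration test later in the article only checks that next returns an integer. The member names and the start of the sequence at 0 are my assumptions.

```cpp
// Hypothetical stateful class: each instance walks through
// its own Fibonacci sequence, one number per call.
class Fib {
public:
    // Returns the current Fibonacci number and advances the sequence.
    int next() {
        int result = current_;
        int following = current_ + next_;
        current_ = next_;
        next_ = following;
        return result;
    }

private:
    int current_ = 0;  // fib(k)
    int next_ = 1;     // fib(k + 1)
};
```

A fresh instance returns 0, 1, 1, 2, 3, 5, 8 on successive calls. Each instance keeps its own position, which is exactly the state that a plain function like fib cannot hold.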

Nothing special here. Just a dumb, stateful class. Let’s use it from JavaScript. I start small and do the instantiation in C++:

Note: I call fib in main so that the compiler does not optimize my fib function away.

The name of my ‘fib’ binding changed, so my JS code now looks like this:

This works out of the box:

One object (state) per public function is not enough. The next easiest way is to do dispatching:

The idea is simple, and I took it from functional programming: ‘new_fib’ is our constructor, and the integer it returns is the object’s ‘address.’ Maybe not the most elegant solution, but it works and is easy to understand, and therefore easy to change. We now have two names for the required functions:
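The dispatching idea can be sketched as follows. It assumes the objects are kept in a std::vector (the article mentions a vector a few paragraphs later), and the vector index serves as the integer ‘address’. new_fib is the article’s name; next_fib is a hypothetical name for the second function, and the Fib class is the hypothetical one from above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stateful class (same as in the previous sketch).
class Fib {
public:
    int next() {
        int result = current_;
        int following = current_ + next_;
        current_ = next_;
        next_ = following;
        return result;
    }

private:
    int current_ = 0;
    int next_ = 1;
};

// Every object created on behalf of JS lives here;
// its index is the handle ("address") that JS keeps.
std::vector<Fib> instances;

// "Constructor": creates a new object and returns its handle.
int new_fib() {
    instances.push_back(Fib());
    return static_cast<int>(instances.size()) - 1;
}

// Dispatches to the object behind the handle (next_fib is a hypothetical name).
// at() throws std::out_of_range for an unknown handle.
int next_fib(int handle) {
    return instances.at(static_cast<std::size_t>(handle)).next();
}
```

Indices stay valid even when the vector reallocates its storage. That is why handing out indices rather than raw pointers is safe here.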

Calling is easy:

It is time to abstract away the ugly C++-interface:

The output is still the same, but now we have a very nice JS interface that encapsulates the C++ part:

Sure, the next step could be to actually call the constructor of the ‘Fib’ class from JS. However, for me, that does not make sense (yet): ‘new_fib’ is a specialized constructor optimized for JS, and it keeps us language-agnostic. Replacing the C++ implementation with C would require no conceptual change in the instantiation.

My next step would be to replace the vector with a map and provide a delete method to get rid of no longer needed objects.
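Here is a sketch of that next step. It assumes a std::unordered_map keyed by a running counter, and delete_fib is a hypothetical name. Handles are never reused, so a stale handle held by JS cannot accidentally reach a newer object.

```cpp
#include <unordered_map>

// Hypothetical stateful class (same as in the earlier sketches).
class Fib {
public:
    int next() {
        int result = current_;
        int following = current_ + next_;
        current_ = next_;
        next_ = following;
        return result;
    }

private:
    int current_ = 0;
    int next_ = 1;
};

// Live objects, keyed by handle.
std::unordered_map<int, Fib> instances;
int next_handle = 0;  // running counter; handles are never reused

int new_fib() {
    int handle = next_handle++;
    instances.emplace(handle, Fib());
    return handle;
}

// at() throws std::out_of_range for unknown or deleted handles.
int next_fib(int handle) {
    return instances.at(handle).next();
}

// Frees the object; JS must not use the handle afterwards (hypothetical name).
void delete_fib(int handle) {
    instances.erase(handle);
}
```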

A stable interface between C++ and JavaScript

As you may have noticed, the names of our functions changed after each refactoring, which broke our integration. The weird names come from C++ name mangling.

To prevent this, we export their signatures as C code (extern "C").
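Here is a sketch of such a C-linkage interface. It reuses the hypothetical Fib class and handle functions from the dispatching step, and next_fib remains a hypothetical name. extern "C" turns off C++ name mangling, so the wasm export names no longer depend on the C++ signatures. With the Emscripten version used in this article, they show up with just a leading underscore (_new_fib, _next_fib). They can then be listed explicitly, for example with -s EXPORTED_FUNCTIONS='["_new_fib","_next_fib"]'.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stateful class (same as in the earlier sketches).
class Fib {
public:
    int next() {
        int result = current_;
        int following = current_ + next_;
        current_ = next_;
        next_ = following;
        return result;
    }

private:
    int current_ = 0;
    int next_ = 1;
};

namespace {
// Internal state; not visible outside this translation unit.
std::vector<Fib> instances;
}  // namespace

// C linkage: no name mangling, so the exported names stay stable
// no matter how the C++ side is refactored.
extern "C" {

int new_fib() {
    instances.push_back(Fib());
    return static_cast<int>(instances.size()) - 1;
}

int next_fib(int handle) {
    return instances.at(static_cast<std::size_t>(handle)).next();
}

}  // extern "C"
```

Only plain types such as int cross the boundary; the class itself stays on the C++ side.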

This is nice because I wanted to list the exported functions anyway. Now we have more consistent names (just prefixed by an underscore):

You may notice the size of ‘script.js’, the large number of exported functions in the decompiled ‘wast’ file, and the resulting size of the ‘Module’ object in the JS context. All of this comes from ‘EXPORT_ALL’:

Emscripten exports all the functions of all included packages and generates bindings for them. With the consistent names, we can export only what we need.

With ‘EXPORTED_FUNCTIONS’, we can specify what we want to export. The generated ‘hello.js’ is now much smaller, and we no longer leak internals (check it out).

Integration

So far, I am satisfied with the integration of C++ and JavaScript. However, it has many moving parts: the export of the C functions, the build script, the facade, and the usage of the facade. This complexity screams for an integration test.

Remember that an integration test should not break because of a flaw in the logic, but only when the integration of the two components itself no longer works.

This article is not a tutorial on JS testing frameworks. Therefore, I just write vanilla JavaScript without a test runner. Feel free to integrate the logic into the framework of your choice!

This checks whether the functions are available and whether next returns an integer, which is part of the interface. With this, I can refactor steps in the pipeline with the confidence that I don’t break anything, for example the build system (which is still very bad).

CMake Integration

The current “build-system” is, well let’s say, not optimal:

Therefore, I created the following ‘CMakeLists.txt’ in the cpp directory:

I can now modify my build script:

It would be possible to pull the rest into CMake as well, but I don’t see the point, because these steps are project- and platform-specific, and I will likely not modify them anymore.

Resources & Tutorials

Random Thoughts and Experiences

Here is a collection of random thoughts of mine regarding WebAssembly. I will extend the collection as I get new insights. Feel free to ignore it or suggest some points.

  • You cannot access the DOM directly from WebAssembly. Therefore, a natural boundary between logic (C++) and UI (JavaScript) is enforced. It also means that you have to define your modules carefully, because you cannot easily refactor from one side of the boundary to the other: the languages are different.
  • Wouldn’t it be nice if you could compile JavaScript to WebAssembly and an engine decided which parts get compiled? Does that even make sense?
  • If you call ‘printf()’ without a trailing “\n”, you get no output and a warning in the Chrome dev console that the content was not flushed. Add “\n” to fix this.
  • The flag ‘emcc … -s SIDE_MODULE=1 …’ prevents the generation of the HTML and JS files.
  • The flag ‘emcc … -s ONLY_MY_CODE=1 …’ prevents the generation of the HTML and JS files and also compiles only the self-written code. Not even ‘stdio’ and the rest of the standard library get compiled in. This makes the resulting wasm file way smaller.
  • To decompile a wasm file, you can use ‘wasm-dis hello.wasm -o hello.wast’. This converts the file into the WebAssembly text format, encoded in S-expressions. You can find details on the structure of the file in this great guide on MDN.

Lessons learned

Just a collection of the things I noticed while using C++ with WebAssembly in real projects.

  • Whenever your browser just hangs and Chrome says that the website crashed, the cause may well be related to WebAssembly. Wrapping the calls into WebAssembly in try { … } catch (e) { … } helps most of the time.

Written by Thomas Deniffel: Programmer, CTO at Skytala GmbH, Software Craftsman, DDD, Passion for Technology