Even better Source Maps with C++, WebAssembly and Cheerp

A few months ago I saw this blog post by Mozilla, which is about rewriting part of their source-map library in rust code compiled to WebAssembly.

I highly recommend reading it, as it shows in details a practical example of using WebAssembly to improve the performance of a JavaScript library.

I also noticed that the current limitations of WebAssembly lead to what I think are some sub-optimal design choices, and I thought that it could be an interesting experiment to write an implementation in C++ using the Cheerp compiler.

Cheerp features summary

In order to explain why I think Cheerp is particularly fit to solve some of the problems that I saw in the Rust version, I will briefly summarize what makes Cheerp unique as a compiler for the web.

Cheerp 2.0 (currently in RC2) supports two different memory models, that can coexist in a single codebase:

  • The first one is the traditional memory model of Cheerp 1.x, that maps C++ objects to garbage collected JavaScript objects, and compile C++ functions to regular JavaScript functions. The main strength of this model is the seamless interoperability with handwritten JavaScript code. A function or a class that follows this memory model is tagged with the attribute [[cheerp::genericjs]].
  • The second one is the linear memory model introduced with Cheerp 2.0: This is the model of traditional architectures like x86 and Arm, and the one used by Emscripten. The code in this case is compiled to Asm.js or WebAssembly. The main strength of this model is the better performance and the better support for type-unsafe operations (pointer arithmetic, arbitrary casting, unions). Functions and structs/classes that follow this memory model is tagged with the attribute [[cheerp::wasm]].

With Cheerp, functions with different tags can call each other, but some restrictions apply: for example it is not allowed for functions with the [[cheerp::wasm]] attribute to have parameters (or return values) of types with the [[cheerp::genericjs]] attribute. The reason is that genericjs objects are compiled to garbage collected JavaScript objects, and currently it is not possible to handle them from WebAssembly.

Handling WebAssembly objects from a genericjs function is allowed, since the memory of the WebAssembly module can be exported, and freely read and written from JavaScript functions.

genericjs functions can not only handle WebAssembly objects through pointers, but they can also allocate values both on the stack and on the heap.

Library Architecture

The architecture of the source-map library is based on the strengths of both memory models:

  • The core algorithms and data structures are compiled to WebAssembly, in order to gain the maximum performance benefits. They are exposed by the RawMappings class in raw_mappings.h and raw_mappings.cpp.
  • The API of the library, designed to be called from handwritten JavaScript, is compiled to JavaScript, so we can directly use JavaScript types (like Array and String) as arguments and return values. The API is provided by the Mappings class in mappings.h and mappings.cpp.

In the Rust implementation, the API is fairly low level, and a manually written JavaScript wrapper is responsible for exposing a nicer interface to the users.

In the Cheerp implementation, I tried to make the C++ API directly consumable by users, replacing the manual JavaScript. The original JavaScript API is still there for compatibility with the test suite and the benchmark, but it mostly just forwards arguments.

In the following section, when I compare the Cheerp and the Rust implementation APIs, I am considering the JavaScript wrapper as a “user” of the library, since my point is about Cheerp and Rust interoperability with JavaScript.

Improving the Interoperability

Most of the code in raw_mappings.cpp is just a direct translation from Rust to C++. What I think is more interesting is the code in mappings.cpp, because it showcases some of the interoperability capabilities of Cheerp.

In particular, I want to focus on three design choices that I believe are suboptimal in the original Rust implementation, and on how I implemented them with Cheerp.

1. Allocating memory for the mappings string

In the Rust implementation, a user must perform the following steps in order to pass the encoded mappings to the library:

  1. Call allocate_mappings(size: usize) -> *mut u8 in order to receive a pointer to a buffer allocated in the linear memory:

Since only numbers can be returned to JavaScript (and this includes pointers to the WebAssembly linear memory), the Rust code needs to manually encode the information needed to reassemble the Vec in the returned raw memory buffer.

2. Copy the JavaScript string to the buffer :

The JavaScript code needs to wrap the buffer in a typed array, and write all the chars from the string into it

3. Call the actual parse_mappings function:

Now that the buffer is filled, we can finally call the parsing function.

This function manually reconstructs the Vec from the raw pointer, and calls the real implementation function.

This whole procedure is quite convoluted and error-prone. It is hidden from the actual users of the library by some handwritten JavaScript code, but it is still a burden for the library author to write all this glue code. In the Cheerp version, I wanted to spare the manually written JavaScript from manually handling memory and pointers.

The signature of the entry point is:

It directly takes a JavaScript String as argument (we also take the list of sources and names, because we are directly going to return that info as strings instead of numeric indexes) and it returns an instance of a JavaScript class, and all the query operations of the library are exposed as methods of this class:

The create function is pretty simple:

It first converts the JS string to std::string. This internally does the same loop as the manual JS code above, but it is written in C++, so we can avoid manually handling memory from JS. Then it passes the std::string to the RawMappings creation function, that will parse it in a vector of RawMapping objects and sort it (The code for this is pretty similar to the Rust version).

Even though we can lift the user from the burden of manually allocating and passing around memory buffers and pointers in JavaScript, we still require to call the destroy method when the Mappings are not needed anymore, since Javascript does not support destructors or finalizers.

2- Error handling

The Rust version handles parsing errors with a global variable last_error, that is set from Rust and read from JavaScript in order to detect if the parsing succeeded or not. The JavaScript wrapper around the Rust API throws an Error with a human readable message in case of failure, based on the error code stored in last_error.

The error codes are defined both in Rust and in JavaScript, and they must be kept in sync.

Ironically, I found that in the JavaScript side the VlqOverflow error (code 5) is missing.

This is the JavaScript code that calls parse_mappings and checks for errors:

In the Cheerp version, I defined the error codes in C++, and threw the error directly from there. There is no need for the global error variable, and within the C++ code errors are passed around as return values:

The __asm__ statement works with the same syntax as the normal one, just with JavaScript instead of real Assembly.

The full code of the Mappings::create function is then:

Note that the throw_error function does not throw a real C++ exception (they are not supported in Cheerp yet), and so it does not guarantee that destructors will be called. It should be used carefully and manual resource cleanup may be needed before calling it.

3- Returning objects to JavaScript

This is the most painful drawback of the Rust implementation: since the only types that can actually be passed from and to js are integers and floating point numbers, a neat hack is used in the code in order to be able to populate a JavaScript object from a WebAssembly function: the function calls a JavaScript callback function with all the Mapping fields as arguments. The callback constructs a JavaScript object from the arguments and calls another JavaScript function provided by the user with the object as an argument.

This is the declaration of the callback on the Rust side:

There is an utility function used to call the callback with a Rust Mapping object:

This function is then called every time that a Mapping object should be returned from Rust to JavaScript, for example:

In this case, the callback is responsible for populating a JavaScript Array with the elements that are passed in the loop, but in other cases only one element needs to be returned.

The callback function that is linked as imported in the WebAssembly module is just a dispatcher of the real callback function, that is set from JavaScript just before calling the Rust function, so that the actual behavior match what is expected.

The dispatcher function keeps a stack of callbacks to avoid reentrancy issues:

Every time that some JavaScript code needs some results from Rust, it needs to push a callback to the stack, call the Rust function, and then pop the callback.

Also, the actual callback needs to be defined. Here is the matching callback of the for loop example seen above:

While this method for returning complex values is pretty clever and flexible, it involves a lot of boilerplate for every data type that needs to be returned. It also crosses the boundary between WebAssembly and JavaScript, which is expensive.

In Cheerp we have first class integration with the JavaScript world, so the solution to this problem is very simple: The functions that return values to the users are tagged with [[cheerp::genericjs]], and directly return JavaScript objects. This is the corresponding code snippet using Cheerp:

We fill a JavaScript Array in a for loop that iterates through the raw mappings objects (which resides in the linear memory), and populate a plain JavaScript object with the help of the CHEERP_OBJECT macro.

There are many ways to handle JavaScript objects from Cheerp (like using the [[cheerp::jsexported]] attribute on a class, or declaring a class in the special client namespace), but this is an easy way of creating a simple object literal initialized with a few properties.

By just returning objects, we can avoid a convoluted control flow, with no performance penalties. In fact, it should be faster to directly populate an object in this way than using the callback trick.

Benchmarks

I ran the same benchmarks as the original Rust version (look here for a detailed description of each benchmark and test source map).

All tests were performed on a MacBook Pro (Retina, 13-inch, Early 2015) with a 2.9GHz dual-core Intel Core i5 processor and 8GB of 1866MHz LPDDR3 onboard memory. The tests were performed in Chrome 67, Firefox 60.01, and Safari 11.0.3.

The Rust code was compiled from this version with Rust 1.25. The Cheerp code was compiled from this version of my repository and this version of the Cheerp compiler.

The JavaScript code containing the wrapper library and the benchmarks was built from this version of my fork of mozilla/source-map.

Performance

The Rust and the Cheerp versions have pretty similar performance, with some important exceptions in favour of Cheerp. After all, they are both low level languages, and both use llvm as a backend. However, there are a few differences in the generated code. Some of the code generated by Cheerp is JavaScript code, and not WebAssembly.

An example is the Mappings::all_generated_location method, which compiles to JavaScript so we can return a JS Array directly from C++. Most of the time is still spent in the sorting function, which is compiled to WebAssembly, so this should have no performance impact. Indeed, if we look at the set.first.breakpoint benchmark, the two are almost the same.

Another interesting difference is the implementation of the Mappings::each_mapping method (benchmarks parse.and.iterate and iterate.already.parsed). It just iterates through all the mappings (the first also parse them first), calling a user provided callback for each one.

Compiling it in WebAssembly is inefficient, as the Mozilla blog post also points out:

This benchmark was the one we worried about: it involves the most JavaScript ↔ WebAssembly boundary crossing, an FFI call for every mapping in the source map. For all other benchmarks tested, our design minimized the number of such FFI calls.

The Rust implementation still performs better than the handwritten JavaScript one (except for Safari, in which it performs the same), but here Cheerp is a clear winner: in the iterate.already.parsed benchmark with the Scala.js source map, C++ compiled with Cheerp is 1.80x faster in Firefox, 1.99x faster in Chrome and 2.05x faster in Safari, compared to Rust.

In the Cheerp implementation the method is tagged with the [[cheerp::genericjs]] attribute and compiled to JavaScript, so there is no costly FFI boundary crossing for each iteration anymore.

In the other benchmarks, the FFI overhead of the callbacks is indeed not noticeable.

The standard deviations on all the benchmarks seem to be higher for Cheerp than for Rust, but I am not exactly sure why. It is possible that having some code compiled to JavaScript put more pressure on the GC. Anyway, it is still well below the pure JavaScript version.

set.first.breakpoint benchmak
first.pause.at.exception benchmark
subsequent.setting.breakpoints benchmark
subsequent.pausing.at.exceptions benchmark
parse.and.iterate benchmark
iterate.already.parsed benchmark

Code Size

Comparing code size fairly is not as simple as it seems. The Mozilla blog post compares the size of the entire library before and after the Rust rewrite of the BasicSourceMapConsumer component. The whole library though contains also the IndexedSourceMapConsumer and SourceMapGenerator components, and some utility code that is used by both.

The result is that even considering only the manually written JavaScript code, the Rust version is bigger. A similar situation exists for the Cheerp implementation: I was able to remove most of the manual JavaScript code of the BasicSourceMapConsumer, and implement it in C++, but the external API of the module is still there, mostly to ensure compliance with the test suite.

I also moved to C++ some of the utility code, but I could not remove it from the JavaScript codebase because it is used also elsewhere.

In the end, in the Cheerp implementation the manually written JavaScript size is slightly reduced from the original, but the total size is bigger.

Compared to the Rust implementation, the total size in C++ with Cheerp is 0.69x.

If we compare only the WebAssembly size, the Cheerp version size is 0.41x the Rust one, and if we combine the generated JavaScript and WebAssembly for Cheerp, it is still 0.60x.

In order to achieve the size shown here, the Rust .wasm binary is passed through the following external programs (the original size is above 110KB):

  • wasm-gc: removes all the code that is not reachable from any exported function
  • wasm-snip: replaces a function with an unreachable statement. Used to remove functions that the author knows will never be called
  • wasm-opt: part of the binaryen project, runs some wasm-specific optimizations

The JavaScript code is also post-processed by the Google Closure Compiler.

Cheerp does not need any external tool (and using Closure for the JavaScript part results in performance penalties, for a very modest size reduction), since the dead code elimination, the JavaScript minification, and some WebAssembly size optimizations are all performed directly by the compiler.

Conclusions and future improvements

C++ and Rust are both viable options to improve the performance and maintainability of JavaScript libraries.

I think that right now Cheerp offers better interoperability with JavaScript and smaller code size.

There are plans to improve Rust in these regards: the wasm-bindgen project seems promising, and hopefully some of the tooling necessary for the post-processing of the wasm file will be integrated in the compiler. It would also be nice to have a way to automatically generate the boilerplate for loading the WebAssembly module (which Cheerp has).

Cheerp will also keep improving: for example right now there is no native support for 64 bit integers. This means for example that memcpy only copies 32 bits at a time, and this was a bottleneck in one of the benchmarks.

There are currently some limitations in how code and data with different memory models can interact, and some of them can probably be removed, resulting in better integration of JavaScript and WebAssembly code.