This article was made a lot better with feedback from Cristiano Calcagno, Simon Fowler, and Thomas Barras.
In response to this, browser vendors created WebAssembly (https://webassembly.org), which provides a much better parsing story (easily 20x faster) and is actually intended to be targeted by compilers.
The WebAssembly MVP, which is implemented by all major browsers, is mostly aimed at C/C++, Rust, and Emscripten applications. Support for managed languages is expected to improve, though, through extensions to WebAssembly. OCaml is perfectly positioned to become one of the first languages to take advantage of this and to use the upcoming garbage collection extension (https://github.com/WebAssembly/gc, https://github.com/lars-t-hansen/moz-gc-experiments). It takes effort, however, to make this happen, and help is very welcome. We are active on the OCaml Discord and the OCaml Labs Slack in the #ocaml-wasm channels. The repository for the backend is located here: https://github.com/SanderSpies/ocaml/tree/before_gc.
Adding WebAssembly support to OCaml opens a lot of interesting opportunities. To name a few:
- better visual representation of financial data (e.g. Jane Street)
- a blockchain with better integration on the web (e.g. Tezos, Coda)
- a version of an app that matches the native experience (e.g. Facebook Messenger)
- running a library OS directly in the browser (e.g. MirageOS)
The rest of this article describes the details of the current implementation of the WebAssembly backend for OCaml.
Some background info
The OCaml compiler consists of several parts. The most important part for this article is the CMM intermediate layer, from which OCaml targets the native backends. Instead of targeting a specific CPU, we are targeting WebAssembly. There are several advantages to taking this approach, which we will discuss throughout this article. More info on the OCaml compiler can be found on the always great Real World OCaml website; https://dev.realworldocaml.org/compiler-backend.html and https://dev.realworldocaml.org/runtime-memory-layout.html are good starting points.
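To get a feel for what the CMM layer looks like, a small function is enough. The snippet below is only an illustration and not part of the backend; the file name example.ml and the function sum_to are made up for this example. The stock native compiler can print the CMM it generates when given the -dcmm flag.

```ocaml
(* example.ml: hypothetical file, used only to look at the CMM output.
   Compile with:  ocamlopt -dcmm -c example.ml
   The CMM generated for [sum_to] is printed during compilation. *)
let rec sum_to n acc =
  if n = 0 then acc else sum_to (n - 1) (acc + n)
```

The recursive call is in tail position, which makes this a handy function for seeing how tail calls show up at this level.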
You cannot talk about targeting WebAssembly without talking about correctness.
The sound type system of OCaml guarantees that types do not change at runtime. This allows the compiler to remove type information, as it is not needed when executing the code: the code is guaranteed to be correct. A WebAssembly engine, however, cannot trust this guarantee, as it does not have the entire picture that the OCaml compiler has. It needs to check for correctness again.
To perform this check efficiently, WebAssembly bytecode is designed so that it can be validated in a single pass. This is accomplished by using structured control flow instead of jumps. The structured control flow avoids irreducible loops, misaligned stack heights, and branches into the middle of multi-byte instructions. It also requires extra type information at certain locations to be able to perform this single pass.
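To make the single-pass idea concrete, here is a minimal sketch of a validator for a toy stack machine. It is loosely modelled on WebAssembly but is not the real instruction set or the real validation algorithm: the types valtype and instr and all function names are assumptions made for this example, and branches are left out entirely. What it does show is why a declared result type on each block helps: every block can be checked on its own, in one walk over the code.

```ocaml
(* Toy value types and instructions; a tiny subset loosely modelled on
   WebAssembly. All names here are hypothetical. *)
type valtype = I32 | F64

type instr =
  | Const of valtype                          (* push a constant *)
  | Add of valtype                            (* pop two operands, push one *)
  | Drop                                      (* pop one operand *)
  | Block of valtype option * instr list      (* block with declared result *)
  | If of valtype option * instr list * instr list

exception Invalid of string

(* Pop one operand of the expected type (head of the list = top of stack). *)
let pop expected stack =
  match stack with
  | t :: rest when t = expected -> rest
  | _ -> raise (Invalid "operand type mismatch or stack underflow")

let push_result result stack =
  match result with None -> stack | Some t -> t :: stack

(* Walk the instructions once, threading the operand type stack. *)
let rec check_seq stack instrs = List.fold_left check_instr stack instrs

and check_instr stack = function
  | Const t -> t :: stack
  | Add t -> t :: pop t (pop t stack)
  | Drop ->
    (match stack with
     | _ :: rest -> rest
     | [] -> raise (Invalid "drop on empty stack"))
  | Block (result, body) ->
    check_body result body;
    push_result result stack
  | If (result, then_, else_) ->
    let stack = pop I32 stack in              (* the condition *)
    check_body result then_;
    check_body result else_;
    push_result result stack

(* A body starts from an empty stack and must leave exactly its result. *)
and check_body result body =
  let final = check_seq [] body in
  let expected = match result with None -> [] | Some t -> [t] in
  if final <> expected then raise (Invalid "block leaves the wrong stack")

let validate instrs =
  match check_seq [] instrs with
  | _ -> Ok ()
  | exception Invalid msg -> Error msg
```

For example, validate [Const I32; Const I32; Add I32; Drop] is accepted, while validate [Const I32; Const F64; Add I32] is rejected because the operand on top of the stack has the wrong type.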
CMM, the layer from which we target WebAssembly and also the last layer before targeting other backends, already provides some type information, but not all that is required for WebAssembly. Therefore the OCaml WebAssembly backend uses an additional intermediate representation that adds the missing type information.
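As a rough illustration of what such an extra representation has to do, the sketch below annotates a toy expression language with result types. This is not the backend's actual intermediate representation: the types expr, typed, and wasm_type and the function annotate are invented for this example, and the assumption that a conditional arrives without a result type is a simplification chosen to show the idea.

```ocaml
type wasm_type = I32 | F64

(* Toy CMM-like expressions: the conditional carries no result type. *)
type expr =
  | Int_const of int
  | Float_const of float
  | Add_int of expr * expr
  | Add_float of expr * expr
  | If_then_else of expr * expr * expr

(* Typed form: nodes whose type a validator must know are annotated. *)
type typed =
  | T_const_i32 of int
  | T_const_f64 of float
  | T_add of wasm_type * typed * typed
  | T_if of wasm_type * typed * typed * typed

let type_of = function
  | T_const_i32 _ -> I32
  | T_const_f64 _ -> F64
  | T_add (t, _, _) | T_if (t, _, _, _) -> t

(* Bottom-up pass that computes the missing annotations. *)
let rec annotate = function
  | Int_const n -> T_const_i32 n
  | Float_const f -> T_const_f64 f
  | Add_int (a, b) -> T_add (I32, annotate a, annotate b)
  | Add_float (a, b) -> T_add (F64, annotate a, annotate b)
  | If_then_else (c, a, b) ->
    let a' = annotate a and b' = annotate b in
    if type_of a' <> type_of b' then failwith "branches disagree on type";
    (* The result type of the conditional is derived from its branches. *)
    T_if (type_of a', annotate c, a', b')
```

With this in place, annotate (If_then_else (Int_const 1, Float_const 1.0, Float_const 2.0)) produces a T_if node tagged with F64, which is the kind of information a WebAssembly if needs up front.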
(System) Stack access
Access to the stack allows code to be changed during execution. As this is a security risk, WebAssembly does not provide access to the stack. This makes implementing the OCaml runtime challenging, since it requires stack access to implement garbage collection, tail calls, and exceptions.
One could choose to implement these missing features within the constraints of the current version of WebAssembly, but this has certain drawbacks. It would require implementing an additional stack, which significantly increases the size of the code and duplicates work that is already done by most browsers. The memory representations would also differ from those of the browser runtime and therefore would not allow for easy interop with existing web APIs.
A better approach here would be to wait for browsers to implement the additional WebAssembly extensions for garbage collection, exception handling, and tail calls. This does come at the cost of not being able to finish the WebAssembly backend in the short term. For now the intended approach is to implement experimental APIs when they become available in browsers and to help with the implementation process as much as possible.
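To give an idea of what such an additional stack involves, here is a minimal sketch that simulates one in OCaml. Linear memory is modelled as a byte buffer, and the stack pointer is an ordinary variable managed by the program itself. The buffer size, the names memory, sp, push, pop, and live_slots, and the choice of 32-bit slots are all assumptions for this example; it is not code from the backend.

```ocaml
(* Linear memory simulated as a byte buffer; the size is arbitrary. *)
let memory = Bytes.make 1024 '\000'

(* Stack pointer: starts at the top of memory and grows downwards. *)
let sp = ref (Bytes.length memory)

let push (v : int32) =
  if !sp < 4 then failwith "stack overflow";
  sp := !sp - 4;
  Bytes.set_int32_le memory !sp v

let pop () =
  if !sp >= Bytes.length memory then failwith "stack underflow";
  let v = Bytes.get_int32_le memory !sp in
  sp := !sp + 4;
  v

(* Because the stack lives in memory the program controls, a collector
   could scan its live part for roots. Returns slots from top to bottom. *)
let live_slots () =
  let rec go addr acc =
    if addr >= Bytes.length memory then List.rev acc
    else go (addr + 4) (Bytes.get_int32_le memory addr :: acc)
  in
  go !sp []
```

Every function that keeps values alive across a call would have to emit this kind of bookkeeping, which is where the extra code size comes from.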
Advantage of CMM
During compilation OCaml creates and uses object files that have the same format as those generated by C compilers. As a result OCaml can use linkers that are intended for C. To have the same kind of behaviour for WebAssembly, OCaml only needs to follow C’s lead here, which in this case manifests itself through the “WebAssembly Object File Linking specification”: https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md. This specification has been implemented in LLVM’s LLD linker. This allows for the same kind of behaviour and interoperability as already present in the OCaml compiler.
A WebAssembly object file is mostly similar to a normal WebAssembly file, but it has a custom section for linking that contains a symbol table, and a relocation section that lists the locations that need to be patched during linking. The relocation entries refer to symbols in the symbol table.
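The interplay between the symbol table and the relocations can be sketched in a few lines. The model below is heavily simplified and does not follow the binary layout of the linking specification: the record types, the link function, and the use of fixed 32-bit little-endian slots are assumptions made for this example.

```ocaml
(* A relocation names a patch site and the symbol whose final address
   belongs there; [symbol] is an index into the symbol table. *)
type relocation = { offset : int; symbol : int }

type object_file = {
  code : Bytes.t;                 (* bytes with placeholder slots *)
  symbols : string array;         (* simplified symbol table *)
  relocations : relocation list;
}

(* Patch every relocation site with the address the linker chose for the
   symbol. [resolve] stands in for the linker's final layout decisions. *)
let link (resolve : string -> int32) (obj : object_file) : Bytes.t =
  let out = Bytes.copy obj.code in
  List.iter
    (fun r -> Bytes.set_int32_le out r.offset (resolve obj.symbols.(r.symbol)))
    obj.relocations;
  out
```

A compiler emitting such an object file only has to leave placeholders and record where they are; deciding the final addresses is left to the linker.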
The only downside to using LLD is that it requires you to change LLD if you want to implement experimental WebAssembly APIs.
As mentioned before, implementing the OCaml runtime in WebAssembly is challenging because there is no stack access. Therefore there is no implementation of the OCaml runtime yet. It seems wise to wait at least until there is support for garbage collection before implementing the runtime.
The generated code can be made to work if you combine it with the js_of_ocaml (JSOO) runtime and some extra code to convert the memory representations. It was decided, however, not to continue down this path, as it seems like a dead end. Ideally you want to have garbage collection support here and implement the runtime in WebAssembly itself.
An advantage of having everything in one compiler is that you can also give guarantees to the garbage collector. This guarantee is lost, however, when targeting WebAssembly: extra checks are required to make sure the garbage collection is correct. Ideally these checks would be kept to a minimum. The initial versions of the garbage collection extension do not offer this yet, though.
Another area of concern is the debugging story. It should be possible to target the WebAssembly DWARF specification from OCaml once DWARF support lands in OCaml. Note that the current version of the backend is a bit dated and needs to be synced with the trunk version of the compiler again at some point.
There are most likely more challenges waiting; this is by no means a comprehensive list.
Status of the backend
The WebAssembly backend is currently based on the WebAssembly MVP spec. It supports the WebAssembly object file format and can compile the OCaml part of the stdlib to WebAssembly. It does not compile any part of the OCaml runtime to WebAssembly.
To progress the WebAssembly backend further, support is required for the following upcoming WebAssembly extensions: Garbage Collection, Tail Calls, and Exceptions. Once garbage collection support is available it makes sense to start work on the OCaml runtime.