Using LLVM from Rust to generate WebAssembly binaries

UPDATE 08/2018: Binaryen no longer supports s2wasm and the official "wasm32-unknown-unknown-wasm" target is now stable enough to use via llc and lld. See the footnotes for more info.


To be clear, this article is intended for people who are writing their own compiler in Rust and want to use LLVM as their backend. If you’re looking to compile the Rust language itself to WebAssembly, check out the rust-wasm online book.

While I won’t dive deep or explain what all the LLVM APIs do, hopefully this post helps jumpstart those who want to use LLVM from Rust. I also include how one could use this to generate WebAssembly, but the first part of this post is target platform agnostic.

It’s no secret the LLVM documentation is notoriously lacking (aside from the Kaleidoscope tutorials). Combine that with bleeding edge tech like Rust and WebAssembly and you should be prepared for lots of long nights of hacking and sharpening of your Google skills!

Setup

I’m going to gloss over creating a new Rust project, building, etc., as there are plenty of tutorials on that.

We’ll also assume you’ve already set up Rust, Binaryen, and a build of LLVM with -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly. The gist describing this setup was recently changed to say that it is out of date, but I personally disagree: in my experience with lld over the last week, it’s still not as stable as the Binaryen route. Soon though!

We’ll also be using some of the binaries from both LLVM (found in $LLVM_PATH/bin) and Binaryen (found in $BINARYEN_PATH/bin), so either add these directories to your $PATH or invoke the binaries by their full paths.

Because LLVM doesn’t have official Rust bindings (as of this writing), we’ll need to pull in some from the community to save us from having to write our own. I evaluated several, including attempting to use the internal ones that rustc uses, but in the end I found llvm-sys to be the easiest. It doesn’t wrap the C API in Rust conventions like snake_case, but in this case I’m glad, because translating between naming styles would make Googling more annoying.

[dependencies]
llvm-sys = "60.0.2"

For now, just create a simple main.rs.

Providing our LLVM build to llvm-sys

It’s really important to provide the custom LLVM we built with WebAssembly support, not any other version that might be in your $PATH. Because LLVM doesn’t guarantee compatibility between versions, the version of LLVM we’re going to use needs to be compatible with the version llvm-sys expects. See their compatibility guide in the README to match up your versions. I used v60.0.2 of llvm-sys and v6.0.0 of LLVM.

To tell llvm-sys where to find the correct LLVM version, you need to set the correct environment variable following the format LLVM_SYS_{VERSION}_PREFIX e.g. LLVM_SYS_60_PREFIX in my case.

LLVM_SYS_60_PREFIX=/path/to/llvm cargo build
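
If you want to double-check which variable your build is going to look for, here’s a minimal sketch that derives the name from the llvm-sys version string. It assumes {VERSION} is the first (major) component of the crate version, which matches the 60.0.2 → LLVM_SYS_60_PREFIX example above; the helper names prefix_var_name and configured_prefix are made up for illustration and are not part of llvm-sys.

```rust
use std::env;

/// Derives the environment variable name from an llvm-sys crate version
/// such as "60.0.2". Assumption: only the major component is used.
fn prefix_var_name(llvm_sys_version: &str) -> Option<String> {
    let major = llvm_sys_version.split('.').next()?;
    if major.is_empty() || !major.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    Some(format!("LLVM_SYS_{}_PREFIX", major))
}

/// Returns the LLVM prefix configured for that version, if the variable is set.
fn configured_prefix(llvm_sys_version: &str) -> Option<String> {
    env::var(prefix_var_name(llvm_sys_version)?).ok()
}
```

Calling configured_prefix("60.0.2") before kicking off a build is a cheap way to catch a misspelled or missing variable.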

Generating bitcode

Now that our project builds, let’s generate some LLVM bitcode, which is the binary version of LLVM’s intermediate representation (aka IR). Since we’ll be calling LLVM’s C API from Rust, we need an easy way to create C strings and pass pointers to them around. Let’s create a macro to do this for us:
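
A minimal sketch of such a macro, assuming it only ever receives string literals: it appends a NUL byte at compile time and hands back a raw pointer to the resulting static data. The helpers bitcode_path and read_back are hypothetical and exist only to show the round trip.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Turns a string literal into a `*const c_char` that points at
/// NUL-terminated static data, which is what C APIs expect.
macro_rules! c_str {
    ($s:expr) => {
        concat!($s, "\0").as_ptr() as *const c_char
    };
}

/// Example use: the path we would hand to a C function.
fn bitcode_path() -> *const c_char {
    c_str!("main.bc")
}

/// Reads a pointer produced by `c_str!` back into a `&str` (for checking only).
/// Safety: `ptr` must point at a valid NUL-terminated string with 'static lifetime.
unsafe fn read_back(ptr: *const c_char) -> &'static str {
    unsafe { CStr::from_ptr(ptr) }
        .to_str()
        .expect("literal should be valid UTF-8")
}
```

Because the literal lives in static memory, the pointer stays valid for the whole program, so there’s no CString to keep alive across the call.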

Now we’re ready to create the most basic LLVM module:

If you build and run this, you should see a main.bc file get created. It’s our LLVM bitcode!

btw if you haven’t discovered cargo-watch yet, I’m a big fan.

If you want to view a human-readable form of the LLVM bitcode, you can use the llvm-dis command:

llvm-dis main.bc -o main.ll

Right now the result isn’t very interesting:

; ModuleID = 'main.bc'
source_filename = "main"

WebAssembly? Yes, please!

Now that we can generate LLVM bitcode, let’s take the next step and convert this into WebAssembly!

Before we get ahead of ourselves we need to make one change to our code. LLVM’s IR, the bitcode, isn’t always platform independent. In the case of WebAssembly, we need to tell LLVM to generate bitcode that is compatible with a specific target triple. It’s called a “triple” because it used to contain three segments; it can now have four, but the name stuck. So it’s a string that has the format <arch><sub>-<vendor>-<sys>-<abi>, e.g. "wasm32-unknown-unknown-wasm".

There are two WebAssembly triples LLVM currently supports, with differing ABIs: "wasm32-unknown-unknown-wasm" and "wasm32-unknown-unknown-elf".
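
To make the format concrete, here’s a rough sketch that splits a triple into its segments. It is not how LLVM parses triples (LLVM is more lenient and normalizes them); the Triple struct and parse_triple function are hypothetical, and <arch><sub> is kept together as a single segment.

```rust
/// The segments of a target triple. `abi` is optional because
/// older three-segment triples don't carry one.
#[derive(Debug, PartialEq)]
struct Triple<'a> {
    arch: &'a str,
    vendor: &'a str,
    sys: &'a str,
    abi: Option<&'a str>,
}

/// Splits "<arch><sub>-<vendor>-<sys>-<abi>" on dashes.
/// Returns None if any of the first three segments is missing or empty.
fn parse_triple(triple: &str) -> Option<Triple<'_>> {
    let mut parts = triple.splitn(4, '-');
    let arch = parts.next()?;
    let vendor = parts.next()?;
    let sys = parts.next()?;
    let abi = parts.next();
    if [arch, vendor, sys].iter().any(|p| p.is_empty()) {
        return None;
    }
    Some(Triple { arch, vendor, sys, abi })
}
```

With this, the two WebAssembly triples differ only in their abi field: Some("wasm") versus Some("elf").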

To set the target triple we call LLVMSetTarget():

LLVMSetTarget(module, c_str!("wasm32-unknown-unknown-elf"));
// etc
LLVMWriteBitcodeToFile(module, c_str!("main.bc"));

The "elf" ABI in this triple is mostly a convention of the Binaryen tooling, such as s2wasm, which converts LLVM .s files to the final .wasm format (more on this later). The "wasm" ABI, by contrast, is what clang uses to generate .wasm files directly, without Binaryen at all [1]. Soon that will be the preferred triple, but in my experience it’s not yet ready, so we’ll focus on the "elf" ABI for use with Binaryen.

Here’s what our current code should look like:

Compiling LLVM bitcode to WebAssembly

Now that we have our triple set to "wasm32-unknown-unknown-elf" and we’ve recompiled our project, we can compile our LLVM bitcode .bc files to WebAssembly by using llc and s2wasm.

The llc command, provided by LLVM, compiles the bitcode to assembly code for the target architecture. Depending on the target platform it can output either textual assembly or binary object files, which then can be assembled and linked using other tooling.

In the case of WebAssembly and the "elf" ABI we’ll be generating textual assembly in the form of .s files.

Side note: if you view the contents of the .s files you might not recognize everything. It’s not clear where the convention for this particular textual assembly language comes from. To me it looks like a combination of traditional x86 assembly structure with wat (WebAssembly Textual representation) instructions. If you know or learn the answer, I’d love to hear it!

Let’s go ahead and convert our bitcode .bc file to our .s assembly file:

llc -march=wasm32 main.bc -o main.s

Now we can provide that .s file to Binaryen’s s2wasm utility to get our final .wasm file. We’ll first use it to emit a textual representation of WebAssembly, so we can verify it looks correct:

s2wasm main.s -o main.wast

Here’s what the result should look like, generally:

(module
  (table 0 anyfunc)
  (memory $0 1)
  (export "memory" (memory $0))
)

If you’re familiar with WebAssembly already, this should be pretty close to what you expected — though arguably the function table and memory are extraneous at this point.
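
If you’d rather drive these two steps from your Rust program than type them by hand, something like the following sketch could work. It only uses the exact arguments shown above and assumes llc and s2wasm are on your $PATH; the function names llc_command, s2wasm_command, and run are hypothetical.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Builds `llc -march=wasm32 <bitcode> -o <assembly>` without running it.
fn llc_command(bitcode: &Path, assembly: &Path) -> Command {
    let mut cmd = Command::new("llc");
    cmd.arg("-march=wasm32").arg(bitcode).arg("-o").arg(assembly);
    cmd
}

/// Builds `s2wasm <assembly> -o <wast>` without running it.
fn s2wasm_command(assembly: &Path, wast: &Path) -> Command {
    let mut cmd = Command::new("s2wasm");
    cmd.arg(assembly).arg("-o").arg(wast);
    cmd
}

/// Runs a prepared command and turns a non-zero exit status into an error.
fn run(mut cmd: Command) -> io::Result<()> {
    let status = cmd.status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{:?} exited with {}", cmd, status),
        ))
    }
}
```

Keeping construction separate from execution means you can print or inspect the command line before anything touches the disk.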

To instead generate the true, binary .wasm file you can use in your browser, pass the --emit-binary flag:

s2wasm --emit-binary main.s -o main.wasm
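
A quick way to sanity-check the output is to look at its first eight bytes: a WebAssembly binary starts with the magic bytes \0asm followed by the version number 1 as a little-endian 32-bit integer. Here’s a small sketch of that check (the function name looks_like_wasm is my own); it only inspects the header and does not validate the rest of the module.

```rust
/// The four magic bytes every WebAssembly binary starts with: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, b'a', b's', b'm'];
/// Binary format version 1, encoded as a little-endian u32.
const WASM_VERSION_1: [u8; 4] = [0x01, 0x00, 0x00, 0x00];

/// Returns true if `bytes` begins with the WebAssembly magic and version 1.
fn looks_like_wasm(bytes: &[u8]) -> bool {
    bytes.len() >= 8 && bytes[..4] == WASM_MAGIC && bytes[4..8] == WASM_VERSION_1
}
```

You could feed it the result of std::fs::read("main.wasm") to confirm you didn’t accidentally write the textual format to a .wasm file.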

Hello, world.

Let’s generate a WebAssembly module that logs “hello world”.

Because WebAssembly itself has no direct I/O access, it always needs to import and call a function from the runtime environment to do so. For this example we’ll assume that the environment has a void log(const char *msg); function we can import and call to print a string to the console.

Since this isn’t intended as a tutorial on the numerous LLVM APIs, here’s the final Rust code without further ado:

When we run this and compile the resulting bitcode through llc and s2wasm, we get:

(module
  (type $FUNCSIG$vi (func (param i32)))
  (import "env" "log" (func $log (param i32)))
  (table 0 anyfunc)
  (memory $0 1)
  (data (i32.const 16) "hello world\00")
  (export "memory" (memory $0))
  (export "main" (func $main))
  (func $main (; 1 ;)
    (call $log
      (i32.const 16)
    )
  )
)

Running the WebAssembly from JavaScript

Now that we have our “hello world” .wasm file compiled, we want to actually execute it. For simplicity, we’ll use a recent version of a web browser like Chrome or Firefox. The easiest way to compile and run a WebAssembly module is to use fetch() with WebAssembly.instantiateStreaming().

Remember our void log(const char *msg); function? Since our WebAssembly module imports it, we need to implement it and provide it when we instantiate the module.
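
The one subtlety is that log only receives an i32: a pointer into the module’s exported memory, not a string. Whatever the host language, the import has to walk that memory from the pointer up to the terminating NUL byte and decode what it finds. In the browser you’d do this in JavaScript; to stay in this post’s language, here’s the same logic sketched in Rust over a plain byte slice standing in for the module’s memory. The names read_c_string and demo_memory are hypothetical.

```rust
/// Reads the NUL-terminated string that starts at `ptr` inside `memory`.
/// Returns None if the pointer is out of bounds, no NUL byte follows it,
/// or the bytes are not valid UTF-8.
fn read_c_string(memory: &[u8], ptr: usize) -> Option<&str> {
    let tail = memory.get(ptr..)?;
    let len = tail.iter().position(|&byte| byte == 0)?;
    std::str::from_utf8(&tail[..len]).ok()
}

/// Builds a small stand-in for the module's memory, with the same data
/// segment as the listing above: "hello world\0" at offset 16.
fn demo_memory() -> Vec<u8> {
    let mut memory = vec![0u8; 64];
    let data = b"hello world\0";
    memory[16..16 + data.len()].copy_from_slice(data);
    memory
}
```

Here read_c_string(&demo_memory(), 16) yields Some("hello world"), which is exactly the i32.const 16 our main function passes to log.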

Summary

The final project can be found on GitHub: https://github.com/jayphelps/using-llvm-from-rust-to-generate-webassembly (though it doesn’t include LLVM/Binaryen).

Hopefully this was helpful to you! Feel free to follow me on Twitter, or donate to my open source efforts on Patreon.

Does your company need help with WebAssembly, Rust, or LLVM? Hire me and the team at This Dot to help!

Thanks to Sven Sauleau, Peter Marheine, and Adam Perry for reviewing and providing feedback!


Footnotes

[1] If instead of using "wasm32-unknown-unknown-elf" and Binaryen you’d like to use the more bleeding edge "wasm32-unknown-unknown-wasm", this is what worked for me:

llc -march=wasm32 -filetype=obj main.bc -o main.o
lld -flavor wasm --allow-undefined main.o -o main.wasm

Soon this will be the preferred way, maybe even by the time you’re reading this, but as of this writing I still prefer the Binaryen route.