WebAssembly — The missing tutorial

I decided to take a look at WebAssembly in case it is going to be the next big thing. It is a project that evolves pretty fast, and most tutorials and examples are already outdated, so I’ve written down my findings here, in case other people are also interested.

In this first blog post I’m diving into the basics of the binary format. I know that no one will write this by hand, but in order to deal with it programmatically, we must at least know how it works.

The specification is currently available in the form of a reference implementation and some loosely written notes on the design and semantics, but the following may be easier for most people to digest.

Basics

The first thing we need to learn is how two of the basic data types work: variable length integers and variable length arrays.

To avoid wasting too many bits on representing small values (the majority of values in practice), integers in WebAssembly are usually encoded with a variable length, similar in spirit to UTF-8 (the scheme is known as LEB128). Each byte carries seven bits of the original integer, starting with the seven least significant ones, and its most significant bit indicates whether the encoding is complete or continues in the next byte, which carries the next seven bits plus another indicator bit, and so on. So the number 5 is encoded simply as 0x05 (00000101), while 517 is encoded as 0x85 0x04 (10000101 00000100).
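The scheme above can be sketched in a few lines of javascript. The helper names encodeVarUint and decodeVarUint are my own; the sketch assumes non-negative integers that fit safely in a JavaScript number, and it uses plain arithmetic because bitwise operators would truncate values to 32 bits.

```javascript
// Unsigned LEB128 helpers (a sketch). Arithmetic instead of bitwise shifts
// avoids JavaScript's 32-bit truncation for larger values.
function encodeVarUint(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const bytes = [];
  do {
    let byte = value % 128;          // the seven least significant bits
    value = Math.floor(value / 128); // move on to the next seven bits
    if (value !== 0) byte |= 0x80;   // set the "continues" bit
    bytes.push(byte);
  } while (value !== 0);
  return bytes;
}

// Reads one varuint starting at `offset`; returns its value and the offset
// of the byte right after it.
function decodeVarUint(bytes, offset = 0) {
  let value = 0;
  let scale = 1;
  for (;;) {
    const byte = bytes[offset++];
    if (byte === undefined) throw new Error("truncated varuint");
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: offset };
    scale *= 128;
  }
}

console.log(encodeVarUint(5));                  // [ 5 ]
console.log(encodeVarUint(517));                // [ 133, 4 ], i.e. 0x85 0x04
console.log(decodeVarUint([0x85, 0x04]).value); // 517
```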

Variable length arrays (or vectors, as they are called in the spec) simply consist of a variable length integer specifying the number of entries in the array, followed by that many entries (not necessarily of the same size). Thus an empty array is represented by a single byte (0x00), and an array containing the two variable length integers 5 and 517 takes up four bytes (0x02, 0x05, 0x85, and 0x04).
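A vector is then just a length prefix in front of the concatenated entries. In this sketch (encodeVector is a hypothetical helper name), each entry is passed in already encoded, which is what allows entries of different sizes.

```javascript
// Same unsigned LEB128 encoder as in the previous sketch, repeated so this
// snippet runs on its own.
function encodeVarUint(value) {
  const bytes = [];
  do {
    let byte = value % 128;
    value = Math.floor(value / 128);
    if (value !== 0) byte |= 0x80;
    bytes.push(byte);
  } while (value !== 0);
  return bytes;
}

// A vector is its entry count (as a varuint) followed by the entries
// themselves; each entry is an array of already encoded bytes.
function encodeVector(encodedEntries) {
  return [...encodeVarUint(encodedEntries.length), ...encodedEntries.flat()];
}

console.log(encodeVector([]));                                    // [ 0 ]
console.log(encodeVector([5, 517].map((n) => encodeVarUint(n)))); // [ 2, 5, 133, 4 ]
```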

Sections

A WebAssembly module can contain ten sections, all of which are optional (unless required by other sections, of course), and those present must occur in the following order. The sections are:

  • type — contains all type signatures used in the module
  • import — enumerates all external functions together with a reference to their type signatures
  • function — enumerates all internal functions together with a reference to their type signatures
  • table — currently only used to support indirect function calls, as required by C/C++
  • memory — specifies the initial and maximum heap size and whether the heap is accessible to the outside environment
  • export — lists internal functions exposed to the outside environment together with their names
  • start — specifies a function to run when the module is loaded
  • code — contains the actual function bodies with all the bytecode instructions
  • data — initializes the heap with predefined data (like strings)
  • name — names all functions and their local variables to help debugging

In this blog post we are only going to look at the type, function, export, and code sections, but all sections share a common format: a variable length byte array containing the id of the section (the name listed above), followed by a variable length byte array containing the actual content of the section.

A function section, for instance, will take the following shape:

0x08 // the size of the section id
0x66 0x75 0x6e 0x63 0x74 0x69 0x6f 0x6e // “function”
0x02 // the size of the rest of the section
0x01 // the actual content of the section
0x00 // …
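The common layout can be captured in one small helper. encodeSection is a hypothetical name, and the sketch follows the name-based section layout described in this post; the payload is passed in as raw bytes.

```javascript
// Same unsigned LEB128 encoder as before, so this snippet runs on its own.
function encodeVarUint(value) {
  const bytes = [];
  do {
    let byte = value % 128;
    value = Math.floor(value / 128);
    if (value !== 0) byte |= 0x80;
    bytes.push(byte);
  } while (value !== 0);
  return bytes;
}

// A section is its name as a length-prefixed byte string, followed by its
// content as a length-prefixed byte string.
function encodeSection(name, payload) {
  const nameBytes = [...new TextEncoder().encode(name)];
  return [
    ...encodeVarUint(nameBytes.length), ...nameBytes,
    ...encodeVarUint(payload.length), ...payload,
  ];
}

const functionSection = encodeSection("function", [0x01, 0x00]);
console.log(functionSection.map((b) => "0x" + b.toString(16).padStart(2, "0")).join(" "));
// 0x08 0x66 0x75 0x6e 0x63 0x74 0x69 0x6f 0x6e 0x02 0x01 0x00
```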

A minimal module

Let us now see some WebAssembly in action. We will start by writing a minimal module with no sections (remember, they are all optional), leaving only the preamble consisting of a magic number and a version. The magic number is always the four bytes 0x00, 0x61, 0x73, and 0x6d (“\0asm”). The version is a 32-bit little endian number, currently 0x0b 0x00 0x00 0x00 (version 11).

So after enabling WebAssembly in your browser (if you haven’t already done so), you can execute the following javascript code:

Wasm.instantiateModule(new Uint8Array([
0x00, 0x61, 0x73, 0x6d, 0x0b, 0x00, 0x00, 0x00
]));

It may not be a particularly interesting example, but notice that we get an error in the javascript console if we change even a single bit of it.
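Why a single flipped bit is fatal becomes clear from what a loader has to check first. The following is a simplified sketch (readPreamble is a hypothetical helper, not what a browser actually runs) that verifies the magic number and reads the version as a 32-bit little endian number, assuming version 0x0b is the one expected.

```javascript
// Checks the 8-byte preamble: the magic number "\0asm" followed by a
// 32-bit little-endian version. The expected version defaults to 0x0b.
function readPreamble(bytes, expectedVersion = 0x0b) {
  const magic = [0x00, 0x61, 0x73, 0x6d];
  if (bytes.length < 8) throw new Error("too short to hold a preamble");
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic[i]) throw new Error(`bad magic number at byte ${i}`);
  }
  // getUint32(offset, true) reads the four version bytes as little endian.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const version = view.getUint32(4, true);
  if (version !== expectedVersion) throw new Error(`unsupported version ${version}`);
  return version;
}

const minimal = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x0b, 0x00, 0x00, 0x00]);
console.log(readPreamble(minimal)); // 11

const corrupted = minimal.slice();
corrupted[2] ^= 0x01; // flip one bit of the magic number
try {
  readPreamble(corrupted);
} catch (err) {
  console.log(err.message); // bad magic number at byte 2
}
```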

Adding a function signature

A function signature is a specification of the types a function accepts as input (parameters) and output (return value), and many functions can share the same signature. WebAssembly operates with four different types:

  • 0x01 — 32-bit integer (i32)
  • 0x02 — 64-bit integer (i64)
  • 0x03 — 32-bit floating point number (f32)
  • 0x04 — 64-bit floating point number (f64)

Strings, arrays, objects, functions, and so on, must be modelled using these primitives (e.g. by using a 32-bit integer as a pointer to a place in the heap where the more complex data structure is located).

A function signature consists of a type (currently the only available type is 0x40), a variable length array of input types, and a variable length array of at most one output type. Thus a function taking no parameters and returning no value is encoded as 0x40 0x00 0x00, and a function taking two 32-bit floating point numbers and returning a 64-bit floating point number is encoded as 0x40 0x02 0x03 0x03 0x01 0x04.
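Encoding a signature is mostly a table lookup. This sketch (encodeFuncSignature and typeCode are hypothetical names) uses the type codes listed above and assumes fewer than 128 parameters, so each count fits in a one-byte variable length integer.

```javascript
// Value type codes as listed above for this pre-release format.
const VALUE_TYPES = { i32: 0x01, i64: 0x02, f32: 0x03, f64: 0x04 };
const FUNC_TYPE = 0x40; // currently the only available signature type

function typeCode(name) {
  if (!Object.prototype.hasOwnProperty.call(VALUE_TYPES, name)) {
    throw new Error(`unknown value type: ${name}`);
  }
  return VALUE_TYPES[name];
}

// Encodes one function signature: type, input vector, output vector.
function encodeFuncSignature(params, results) {
  if (results.length > 1) throw new Error("at most one output type is allowed");
  if (params.length > 127) throw new Error("sketch only handles < 128 parameters");
  return [
    FUNC_TYPE,
    params.length, ...params.map((t) => typeCode(t)),
    results.length, ...results.map((t) => typeCode(t)),
  ];
}

console.log(encodeFuncSignature([], []));                  // [ 64, 0, 0 ]
console.log(encodeFuncSignature(["f32", "f32"], ["f64"])); // [ 64, 2, 3, 3, 1, 4 ]
```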

The content of the “type” section is a variable length array of such function signatures. An entire section defining a single function signature taking two 32-bit integers and returning another 32-bit integer looks as follows:

0x04 // the size of the section id
0x74 0x79 0x70 0x65 // “type”
0x07 // the size of the rest of the section
0x01 // the number of function signatures
0x40 // the kind of function
0x02 0x01 0x01 // the two input types (i32, i32)
0x01 0x01 // the single output type (i32)
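Going the other direction, the content of a type section can be decoded back into readable signatures. This is a sketch with hypothetical names (decodeTypeSection, TYPE_NAMES); it takes the bytes after the section's size byte, as in the listing above.

```javascript
const TYPE_NAMES = { 0x01: "i32", 0x02: "i64", 0x03: "f32", 0x04: "f64" };

// Unsigned LEB128 decoder, repeated so this snippet runs on its own.
function decodeVarUint(bytes, offset) {
  let value = 0;
  let scale = 1;
  for (;;) {
    const byte = bytes[offset++];
    if (byte === undefined) throw new Error("truncated varuint");
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: offset };
    scale *= 128;
  }
}

// Decodes the content of a type section into { params, results } objects.
function decodeTypeSection(content) {
  let pos = 0;
  const readVarUint = () => {
    const { value, next } = decodeVarUint(content, pos);
    pos = next;
    return value;
  };
  const readTypes = () => {
    const types = [];
    for (let n = readVarUint(); n > 0; n--) {
      const name = TYPE_NAMES[content[pos++]];
      if (name === undefined) throw new Error(`unknown value type at byte ${pos - 1}`);
      types.push(name);
    }
    return types;
  };

  const signatures = [];
  for (let count = readVarUint(); count > 0; count--) {
    if (content[pos++] !== 0x40) throw new Error("expected signature type 0x40");
    const params = readTypes();
    const results = readTypes();
    signatures.push({ params, results });
  }
  return signatures;
}

console.log(JSON.stringify(decodeTypeSection([0x01, 0x40, 0x02, 0x01, 0x01, 0x01, 0x01])));
// [{"params":["i32","i32"],"results":["i32"]}]
```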

Adding a function

A function is defined by a reference to a function signature, a declaration of local variables, and the actual bytecode. It is located in both the function section and the code section, and both of these must contain the same number of entries. Functions don’t have names, but are referenced by their indices in these sections.

The function section simply contains a variable length array of variable length integers referencing a type signature from the type section. A module implementing a single function with the type signature from above has the following function section:

0x08 // the size of the section id
0x66 0x75 0x6e 0x63 0x74 0x69 0x6f 0x6e // “function”
0x02 // the size of the rest of the section
0x01 // the number of functions
0x00 // reference to the type signature with index 0
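Since the function section is only a list of type indices, building it is straightforward. In this sketch, encodeFunctionSection and its typeCount parameter are hypothetical; the count is only used to catch references to signatures that do not exist.

```javascript
// Same unsigned LEB128 encoder as before, so this snippet runs on its own.
function encodeVarUint(value) {
  const bytes = [];
  do {
    let byte = value % 128;
    value = Math.floor(value / 128);
    if (value !== 0) byte |= 0x80;
    bytes.push(byte);
  } while (value !== 0);
  return bytes;
}

// Builds a complete function section from each function's type index.
function encodeFunctionSection(typeIndices, typeCount) {
  for (const index of typeIndices) {
    if (index >= typeCount) throw new Error(`no type signature with index ${index}`);
  }
  const content = [
    ...encodeVarUint(typeIndices.length),
    ...typeIndices.flatMap((index) => encodeVarUint(index)),
  ];
  const name = [...new TextEncoder().encode("function")];
  return [...encodeVarUint(name.length), ...name, ...encodeVarUint(content.length), ...content];
}

console.log(encodeFunctionSection([0], 1).length); // 12
```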

The code section contains a variable length array of the actual function bodies. Each body starts with its size in bytes, followed by a variable length array of local variable declarations and then the bytecode itself, which fills the rest of the body.

To save space, each item in the array of local variables consists of a variable length integer and a byte, specifying a number of variables and their type. For instance, 0x02 0x03 0x01 0x04 0x02 would be a list with two entries, declaring three 32-bit integers and four 64-bit integers, for a total of seven local variables. Also note that in WebAssembly the parameters of a function are considered local variables too; they come before any of the local variables declared here, so they get the lowest indices.
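The compact local declarations are essentially run-length encoding. This sketch (encodeLocals and localTypeAt are hypothetical names) groups consecutive locals of the same type and assumes all counts stay below 128, so each fits in a single byte; localTypeAt illustrates that parameters occupy the first indices.

```javascript
const VALUE_TYPES = { i32: 0x01, i64: 0x02, f32: 0x03, f64: 0x04 };

// Run-length encodes a list of local variable types into (count, type) entries.
function encodeLocals(localTypes) {
  const entries = [];
  for (const type of localTypes) {
    if (!Object.prototype.hasOwnProperty.call(VALUE_TYPES, type)) {
      throw new Error(`unknown value type: ${type}`);
    }
    const last = entries[entries.length - 1];
    if (last !== undefined && last.type === type) last.count++;
    else entries.push({ type, count: 1 });
  }
  if (entries.length > 127 || entries.some((e) => e.count > 127)) {
    throw new Error("sketch only handles counts below 128");
  }
  const bytes = [entries.length];
  for (const { type, count } of entries) bytes.push(count, VALUE_TYPES[type]);
  return bytes;
}

// Parameters occupy the first local indices; declared locals follow them.
function localTypeAt(index, paramTypes, localTypes) {
  return [...paramTypes, ...localTypes][index];
}

const declared = ["i32", "i32", "i32", "i64", "i64", "i64", "i64"];
console.log(encodeLocals(declared));                   // [ 2, 3, 1, 4, 2 ]
console.log(localTypeAt(2, ["f32", "f32"], declared)); // i32
```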

The bytecode itself is worth its own blog post, but to understand the following example, it is enough to know that WebAssembly is stack based (like java bytecode), so most instructions take values off the stack as input and put the resulting value back on top of the stack again as output. The following example implements a function returning the sum of its two parameters.

0x04 // the size of the section id
0x63 0x6f 0x64 0x65 // “code”
0x0a // the size of the rest of the section
0x01 // the number of function bodies
0x08 // size of the first function body
0x00 // number of local variables
0x14 0x00 // load first parameter onto the stack
0x14 0x01 // load second parameter onto the stack
0x40 // replace the two top values on the stack with their sum
0x09 0x01 // return one result (from the top of the stack)
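To make the stack semantics concrete, here is a toy interpreter that understands only the three instructions used in this example, with the opcodes as given above. runBody is a hypothetical name, it is nothing like a real engine, and it assumes local indices and the result count each fit in one byte.

```javascript
// Toy interpreter for the sum example only; anything else is rejected.
function runBody(body, args) {
  const locals = [...args]; // parameters come first in the local index space
  const stack = [];
  let pc = 0;
  if (body[pc++] !== 0x00) throw new Error("toy interpreter: declared locals not supported");
  while (pc < body.length) {
    const opcode = body[pc++];
    switch (opcode) {
      case 0x14: // load a local (here: a parameter) onto the stack
        stack.push(locals[body[pc++]]);
        break;
      case 0x40: { // pop two values, push their sum wrapped to 32 bits
        const right = stack.pop();
        const left = stack.pop();
        stack.push((left + right) | 0);
        break;
      }
      case 0x09: { // return; the immediate says how many results to return
        const arity = body[pc++];
        return arity === 1 ? stack.pop() : undefined;
      }
      default:
        throw new Error(`unsupported opcode 0x${opcode.toString(16)}`);
    }
  }
  return undefined;
}

// The body from above without its size byte: the local declaration
// count (0x00) followed by the bytecode.
const sumBody = [0x00, 0x14, 0x00, 0x14, 0x01, 0x40, 0x09, 0x01];
console.log(sumBody.length);           // 8, matching the body size byte
console.log(runBody(sumBody, [1, 2])); // 3
```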

Exporting the function

So we have implemented our first function, but in order to see it work we must expose it to the outside environment, and, as you may have guessed, this is handled by the export section.

Each export declaration is an index to a function together with a variable length byte array holding its name. The export section contains a variable length array of these export declarations. So to export the function we just implemented (having index 0) with the name sum, we add the following section:

0x06 // the size of the section id
0x65 0x78 0x70 0x6f 0x72 0x74 // "export"
0x06 // the size of the rest of the section
0x01 // the number of exported functions
0x00 // reference to the 0th function
0x03 // the size of the name
0x73 0x75 0x6d // ‘sum’

Remember that sections must occur in a particular order, so this section is actually placed between the function and code sections.
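The export section can be generated the same way as the others. In this sketch, encodeName, encodeExportSection, and the { funcIndex, name } input shape are hypothetical choices; names are encoded as length-prefixed UTF-8 bytes.

```javascript
// Same unsigned LEB128 encoder as before, so this snippet runs on its own.
function encodeVarUint(value) {
  const bytes = [];
  do {
    let byte = value % 128;
    value = Math.floor(value / 128);
    if (value !== 0) byte |= 0x80;
    bytes.push(byte);
  } while (value !== 0);
  return bytes;
}

// A name is a length-prefixed UTF-8 byte string.
function encodeName(text) {
  const bytes = [...new TextEncoder().encode(text)];
  return [...encodeVarUint(bytes.length), ...bytes];
}

// Builds an export section from { funcIndex, name } declarations.
function encodeExportSection(declarations) {
  const content = [
    ...encodeVarUint(declarations.length),
    ...declarations.flatMap(({ funcIndex, name }) => [
      ...encodeVarUint(funcIndex),
      ...encodeName(name),
    ]),
  ];
  return [...encodeName("export"), ...encodeVarUint(content.length), ...content];
}

const exportSection = encodeExportSection([{ funcIndex: 0, name: "sum" }]);
console.log(exportSection.length); // 14
```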

Putting it all together

Combining all the above, the resulting program will look like this:

var bytecode = new Uint8Array([
0x00, 0x61, 0x73, 0x6d, 0x0b, 0x00, 0x00, 0x00,
0x04, 0x74, 0x79, 0x70, 0x65, 0x07, 0x01, 0x40,
0x02, 0x01, 0x01, 0x01, 0x01, 0x08, 0x66, 0x75,
0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x02, 0x01,
0x00, 0x06, 0x65, 0x78, 0x70, 0x6f, 0x72, 0x74,
0x06, 0x01, 0x00, 0x03, 0x73, 0x75, 0x6d, 0x04,
0x63, 0x6f, 0x64, 0x65, 0x0a, 0x01, 0x08, 0x00,
0x14, 0x00, 0x14, 0x01, 0x40, 0x09, 0x01
]);

Loading this module will result in an object with a property called exports. This property is itself an object containing all exported functions, which we can call just like regular javascript functions.

var mod = Wasm.instantiateModule(bytecode);
mod.exports.sum(1, 2); // returns 3
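As a sanity check on the hand-written hex, the same 63 bytes can be produced from small helpers. All helper names here (section, vector, encodeName, I32) are my own illustrative choices, following the layout described in this post; the sketch only builds the bytes and does not load them.

```javascript
// Unsigned LEB128 encoder.
function encodeVarUint(value) {
  const bytes = [];
  do {
    let byte = value % 128;
    value = Math.floor(value / 128);
    if (value !== 0) byte |= 0x80;
    bytes.push(byte);
  } while (value !== 0);
  return bytes;
}

// Length-prefixed UTF-8 string, used for section ids and export names.
function encodeName(text) {
  const bytes = [...new TextEncoder().encode(text)];
  return [...encodeVarUint(bytes.length), ...bytes];
}

// Vector: entry count, then the already encoded entries.
function vector(entries) {
  return [...encodeVarUint(entries.length), ...entries.flat()];
}

// Section: id as a name, then the length-prefixed content.
function section(name, content) {
  return [...encodeName(name), ...encodeVarUint(content.length), ...content];
}

const I32 = 0x01;
const preamble = [0x00, 0x61, 0x73, 0x6d, 0x0b, 0x00, 0x00, 0x00];

// One signature: (i32, i32) -> i32.
const typeSection = section("type", vector([[0x40, ...vector([[I32], [I32]]), ...vector([[I32]])]]));
// One function using type index 0.
const functionSection = section("function", vector([encodeVarUint(0)]));
// Export function 0 as "sum".
const exportSection = section("export", vector([[...encodeVarUint(0), ...encodeName("sum")]]));
// Body: no declared locals, load params 0 and 1, add, return one result.
const body = [...vector([]), 0x14, 0x00, 0x14, 0x01, 0x40, 0x09, 0x01];
const codeSection = section("code", vector([[...encodeVarUint(body.length), ...body]]));

// Sections in the required order: type, function, export, code.
const generated = new Uint8Array([
  ...preamble, ...typeSection, ...functionSection, ...exportSection, ...codeSection,
]);
console.log(generated.length); // 63
```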

So by now we are actually able to write modules in WebAssembly and call them from javascript. Going the other way around, calling javascript from inside a WebAssembly module, is a completely different beast and will be the subject of my next blog post.