Exploring Compilation from TypeScript to WebAssembly

Twelve weeks ago, three of us — interns on the Web Platform team — set out to build a compiler from TypeScript to WebAssembly. WebAssembly is a size and load-time-efficient binary format suitable for compilation to the web, with the potential to provide huge performance benefits for web applications. (We found this explanation of WebAssembly useful.)

Before this, none of us had ever written TypeScript or even heard of WebAssembly, and we didn’t know much about compilers either. With that in mind, to say the project goal was ambitious was an understatement.

We started with the basics: building a working mental model of a compiler and abstracting out any details we didn’t really need to know. We read the WebAssembly documentation, tried compiling and running WebAssembly with wasmFiddle, and wrote some simple TypeScript programs.

At first, we developed on a prototype compiler on a fork of the official TypeScript compiler. This is when we hit our first major challenge: once we implemented a few basic features (bitwise operations, binary expressions), we realized that the type system in TypeScript was fundamentally incompatible with the four native numeric types in WebAssembly. We made the decision to assume that number always referred to a float64, because we didn’t want to deviate from TypeScript too much. However, that meant we couldn’t handle wasm operations that could only be done with an int type (remainder, for example) without losing efficiency. We set the issue aside, and instead worked towards emitting the appropriate wasm bytecode for an if statement.

In implementing if statements, we found a new set of challenges that required that we understand more about the structure of the wasm bytecode itself. We started by going through the abstract syntax tree generated by the TypeScript compiler [1], and then mapping that to emit if and else instructions, but we could not figure out how to evaluate a condition and deal with opcode immediates properly. We took a look at the output from wasmFiddle [2] for an if statement and noticed that they were using blocks and break-if opcodes to create the structure. We realized that the two approaches were essentially equivalent, but the blocks were easier for us to keep track of.

[1] Part of the AST typescript generates for an if statement
[2] Bytecode output from wasmFiddle: highlighted section is the main function which contains if, else if, and else statements

We still weren’t able to figure out exactly what to emit, so we tried modifying the wasm bytecode by hand to see if we could get an example if statement to produce the result we expected. This led to a whole slew of additional problems, namely that we needed to also modify the length arguments in the bytecode for each section. We weren’t able to identify that this was the issue right away, but tracking it down meant we were looking at the opcodes individually and learning what section they belonged to, which operations required opcode immediates, and which parts were actually just ASCII encoded characters. Once we had this foundational understanding of wasm, it became much easier to identify the mistakes we were making previously with how to emit opcodes and their immediates.

Soon after making that progress with if statements, we found two open source projects that were attempting to compile a subset or variation of TypeScript to WebAssembly: AssemblyScript and TurboScript. Given what we had learned from our prototype compiler, we liked that AssemblyScript was new and had still had room for us to contribute. It also leveraged the TypeScript compiler, which we had come to be familiar with. Thus, we reached out to AssemblyScript. The main contributor, Daniel Wirtz, was very open to collaborate and accept pull requests from us.

We spent some time experimenting with AssemblyScript, encountering a fair number of build issues and discovering features that we didn’t realize had already been implemented (strings or classes) and features that we expected to work, but didn’t (arrays and floats). We opened several issues along the way, which was new to us as this was the first big open-source project any of us had contributed to. We were soon ready to contribute a feature: compiling array-list initializations in TypeScript to WebAssembly. The AssemblyScript compiler needed to use malloc and memset to set a section of WebAssembly’s linear memory for the array, and then emit the opcodes to set the capacity and the size of the array in memory and then the elements of the array at set intervals of 4 or 8 based on the size of the type.

Once we had started contributing to AssemblyScript and the compiler features, frontend, and testing infrastructure began to mature, we needed to identify what was working well and what could be improved. We then wrote a few demo programs and compiled them to wasm with AssemblyScript, but to our dismay found that the wasm was slower than the JavaScript. Furthermore, we found that writing a compelling demo with a basic compiler was complicated, and that we were missing a lot of features, such as library functions. It was particularly challenging to make fair comparisons between JS and WebAssembly, and to update the UI of the demos as each program did its computation, since both are single-threaded. We used web workers to solve this, which was something we hadn’t done before. We then started doing some detective work to figure out why we were seeing a slowdown in WebAssembly.

The first step we took was to look at the code we were compiling, and break that into parts. We had implemented a sudoku solver that did a lot of computation, but it was also creating and iterating through large arrays and making a lot of function calls. We started compiling small programs that did each of these tasks individually and saw that while computation was much quicker (as we had expected), array access, tail recursion, and function calls were quite slow. For example, the iterative calculation of Fibonacci run 5x faster, yet an array traversal of 4000 elements run 4x slower. We realized we were missing some optimizations that pipelines like Emscripten + binaryen took advantage of. We then created a set of tests based on the Sunspider benchmarks that we could compile with AssemblyScript and Emscripten + binaryen, and wrote them in JavaScript, AssemblyScript-friendly TypeScript, and C. We were very limited on the benchmarks we could write since they needed to be compatible with both compilers, yet we managed to obtain significant results. We then compared the performance to identify in what tasks AssemblyScript was still slow.

Over the course of this project, we learned to test early and constantly, and to never take a feature for granted- the complexity in something even as fundamental as an if statement can be very tricky to debug in WebAssembly. We also learned that making a compiler is a lot easier if you leverage powerful existing tools. We had the privilege of utilizing the incredible work Daniel Wirtz did in building AssemblyScript, as well as the work done by the TypeScript team here at Microsoft, and all the people who worked on binaryen. The overhead to maintain a backend or build a parser would have prevented us from accomplishing as much as we did if we hadn’t embraced the existing open-source tools in the TypeScript and binaryen compiler infrastructure.

Apurva Raman, Ignacio Ramirez Piriz, and Sanat Sharma