Exploring Compilation from TypeScript to WebAssembly
Twelve weeks ago, three of us — interns on the Web Platform team — set out to build a compiler from TypeScript to WebAssembly. WebAssembly is a size and load-time-efficient binary format suitable for compilation to the web, with the potential to provide huge performance benefits for web applications. (We found this explanation of WebAssembly useful.)
Before this, none of us had ever written TypeScript or even heard of WebAssembly, and we didn’t know much about compilers either. With that in mind, to say the project goal was ambitious was an understatement.
We started with the basics: building a working mental model of a compiler and abstracting out any details we didn’t really need to know. We read the WebAssembly documentation, tried compiling and running WebAssembly with wasmFiddle, and wrote some simple TypeScript programs.
At first, we developed on a prototype compiler on a fork of the official TypeScript compiler. This is when we hit our first major challenge: once we implemented a few basic features (bitwise operations, binary expressions), we realized that the type system in TypeScript was fundamentally incompatible with the four native numeric types in WebAssembly. We made the decision to assume that number always referred to a float64, because we didn’t want to deviate from TypeScript too much. However, that meant we couldn’t handle wasm operations that could only be done with an int type (remainder, for example) without losing efficiency. We set the issue aside, and instead worked towards emitting the appropriate wasm bytecode for an if statement.
In implementing if statements, we found a new set of challenges that required that we understand more about the structure of the wasm bytecode itself. We started by going through the abstract syntax tree generated by the TypeScript compiler , and then mapping that to emit if and else instructions, but we could not figure out how to evaluate a condition and deal with opcode immediates properly. We took a look at the output from wasmFiddle  for an if statement and noticed that they were using blocks and break-if opcodes to create the structure. We realized that the two approaches were essentially equivalent, but the blocks were easier for us to keep track of.
We still weren’t able to figure out exactly what to emit, so we tried modifying the wasm bytecode by hand to see if we could get an example if statement to produce the result we expected. This led to a whole slew of additional problems, namely that we needed to also modify the length arguments in the bytecode for each section. We weren’t able to identify that this was the issue right away, but tracking it down meant we were looking at the opcodes individually and learning what section they belonged to, which operations required opcode immediates, and which parts were actually just ASCII encoded characters. Once we had this foundational understanding of wasm, it became much easier to identify the mistakes we were making previously with how to emit opcodes and their immediates.
Soon after making that progress with if statements, we found two open source projects that were attempting to compile a subset or variation of TypeScript to WebAssembly: AssemblyScript and TurboScript. Given what we had learned from our prototype compiler, we liked that AssemblyScript was new and had still had room for us to contribute. It also leveraged the TypeScript compiler, which we had come to be familiar with. Thus, we reached out to AssemblyScript. The main contributor, Daniel Wirtz, was very open to collaborate and accept pull requests from us.
We spent some time experimenting with AssemblyScript, encountering a fair number of build issues and discovering features that we didn’t realize had already been implemented (strings or classes) and features that we expected to work, but didn’t (arrays and floats). We opened several issues along the way, which was new to us as this was the first big open-source project any of us had contributed to. We were soon ready to contribute a feature: compiling array-list initializations in TypeScript to WebAssembly. The AssemblyScript compiler needed to use malloc and memset to set a section of WebAssembly’s linear memory for the array, and then emit the opcodes to set the capacity and the size of the array in memory and then the elements of the array at set intervals of 4 or 8 based on the size of the type.
Over the course of this project, we learned to test early and constantly, and to never take a feature for granted- the complexity in something even as fundamental as an if statement can be very tricky to debug in WebAssembly. We also learned that making a compiler is a lot easier if you leverage powerful existing tools. We had the privilege of utilizing the incredible work Daniel Wirtz did in building AssemblyScript, as well as the work done by the TypeScript team here at Microsoft, and all the people who worked on binaryen. The overhead to maintain a backend or build a parser would have prevented us from accomplishing as much as we did if we hadn’t embraced the existing open-source tools in the TypeScript and binaryen compiler infrastructure.
Apurva Raman, Ignacio Ramirez Piriz, and Sanat Sharma