Today, I am going to try a bold experiment.
Optimize for what metric? Performance of code execution (eg. how long does it take to calculate the n-th prime number).
Rules of the game:
- I have to do it in a mostly automated way
- Adding type annotation is the only source modification allowed
TLDR: I mostly managed to succeed! Go to https://carlopi.github.io/js-opt-benchmark/ to run the benchmarks on your own device/browser of choice.
How to go on solving such a challenge? Is this even possible at all?
We just have to find a way to perform these more complex optimizations ahead of time.
First we will do a step back, who am I? I am Carlo, compiler engineer at Leaning Technologies.
I am in a privileged position: I have working knowledge of a tool that would make the optimization and the code generation steps doable.
I took what seemed the faster to prototype path: compiling JS to C++.
(Note, one of TypeScript explicit non-goals is optimizing the resulting code, so the TypeScript Compiler is mostly busy checking that no forbidden operations are done, using the type information whenever provided)
TypeScript AST has currently 300+ kinds of nodes. Like PlusToken, AsteriskEqualsToken, Identifier, IfStatement, and many more. Here the complete list: typescript.d.ts.
Given that I had access to the AST already parsed, I began implementing a recursive visit of every node of Abstract Syntax Tree, implementing one by one every node I encountered.
I kept adding implementations to this visitor as long as it was required to run a few benchmarks.
I also took another shortcut, I will require parameters and returns values to have a type assigned explicitly. This simplifies the project a lot, allowing also more optimizations to be made.
How does this intermediate C++ code look like?
It’s basically all there, taking advantages of 2 Cheerp’s features:
- JSExport tagging to export the given classes or functions
Now we just have to run Cheerp on it.
/opt/cheerp/bin/clang++ intermediate.cpp -o optmized.js -O3 -target cheerp
One thing to note is that the non-readable version ends up being roughly as big as the original (minimized) code. This comes from two factors:
- we are not shipping our own standard library but we are relying on the language functionalities
- Cheerp generates quite compact code
Micro Benchmark Time
Benchmarks are hard to get right, and benchmarks on micro programs have their own additional set of limitations. Read for example the disclaimer in the central part of Surma’s article or “Why measure toy benchmark programs?”.
I run this toy compiler of mine on this programs:
- algorithmic blurring of an image
- implementation of a BinaryHeap data structure (= push objects in, pop always the most valuable item from the heap)
- bubble sort
- computation of the n-th prime number
- computation of n-th Fibonacci number
- Monte Carlo estimate of Pi
- N-body simulation
While there is huge variability (some benchmarks are only faster on certain engines and more or less equivalent in others), some benchmarks show significant performance boosts (eg. 1.8x speedup across many devices / engines combinations). Especially isPrime and BinaryHeap show consistent improvements that I trust have to do with actual optimization being done.
There are a few notes to be made here:
- This numbers are only lower bound on what’s doable with such a tool. There are many optimizations (eg. integers everywere!) that were out of scope.
We did it.
The ideal workflow for such an optimizer would be mostly automated, with the only step requiring domain knowledge being adding type information.
The actual process I iterated to add a new benchmark/test looked like this:
- if there are type errors (like: “original.ts (12,34): Missing type in ParameterDeclaration”), add the relevant types, go back to 2
- if there are missing AST node to be implemented, implement them, go back to 2
- run Cheerp on the generated C++ file
- if there are compilation errors, find the error in the AST visitor and fix it
- tests + benchmark
It’s far from ideal and brittle, but worked enough to get this experiment up and running in a few hundreds lines of code.
These are some related projects:
Google’s Closure Compiler
Performance improvements are explicitly not in the scope of the TypeScript compiler.
I know very little of AssemblyScript, but there is plenty of cool work being done, check it out.
This post is long enough, and probably I could fill many more paragraphs talking about what WebAssembly is or is not.
Thanks for following until here, I would be very curious to hear any feedback.
I am available on Twitter at @carlop54002226.