A Step Towards Compiling TypeScript to Native

One of my pastimes is writing compilers; they’re just fun to work on. One of those projects, which I’ve been working on for a couple of months now, is a TypeScript to native compiler.

Just to be absolutely clear, by native I mean actually native, as in AOT (ahead-of-time) compilation to machine code, and not another buzzword for yet another JavaScript framework.

A Quick Breakdown

The compiler is fairly simple. It’s built as a layer on top of the TypeScript compiler and adds a couple of transformations to the compilation pipeline before emitting C++ source files, which are then compiled and assembled with a traditional toolchain.

Let’s do a quick breakdown and highlight some of the steps the compiler takes when converting from a strict subset of TypeScript to what I consider to be fairly readable and concise C++ code.

Automatic Reference Counting and Escape Analysis

So first things first: one of the more obvious and major differences between TypeScript and C++ is that TypeScript is garbage collected while C++ is not.

I could have implemented this with garbage collection in C++, but the readability and clarity of the output source code was an important factor to me, so I ended up going with reference counted pointers, also known as shared pointers.

However, having shared pointers everywhere leads to extra indirection and cache misses, so to deal with this we do some escape analysis to determine whether a value actually needs to be a pointer at all.

In terms of the implementation, a transform function checks every potential reference node in the AST (Abstract Syntax Tree). If a value escapes its owning scope, for example by being returned, it becomes a reference counted pointer; otherwise it stays a plain value.
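To make the decision rule a bit more concrete, here is a rough sketch of it in C++. It is not the actual transform, which works on the TypeScript AST; the `Use`, `Allocation`, `escapes` and `emittedType` names are all made up for illustration, and a real analysis would have to discover the uses by walking the tree rather than receive them as a list.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified summary of how an allocated value is used.
enum class Use { LocalRead, Returned, StoredInField, CapturedByClosure };

// Hypothetical record for one `new T()` expression and its uses in the owning scope.
struct Allocation {
    std::string typeName;   // e.g. "Shape"
    std::vector<Use> uses;  // every use found in the owning scope
};

// A value escapes if any use lets it outlive the scope that created it.
bool escapes(const Allocation& allocation) {
    for (Use use : allocation.uses) {
        if (use != Use::LocalRead) {
            return true;
        }
    }
    return false;
}

// Picks the C++ type to emit: a counted pointer only when it is really needed.
std::string emittedType(const Allocation& allocation) {
    if (escapes(allocation)) {
        return "Reference<" + allocation.typeName + ">";
    }
    return allocation.typeName;
}
```

Under these assumptions, a `Shape` that is returned from a function maps to `Reference<Shape>`, while a `Point` that is only read locally stays a plain `Point`.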

For example, the following TypeScript code

class Point {
    x : number;
    y : number;
    constructor() {
        this.x = 0;
        this.y = 0;
    }
}
class Shape {
    points : Array<Point>;
}
function createShape() {
    return new Shape();
}

Is translated into the following C++ code

#include "builtin.cc"
class Point {
public:
    double x;
    double y;
public:
    Point() {
        this->x = 0;
        this->y = 0;
    }
};
struct Shape {
    Array<Point> points;

    Shape() {
        this->points = Array<Point>();
    }
};
Reference<Shape> createShape() {
    return Reference<Shape>(new Shape());
}
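The generated code leans on `Reference<T>` and `Array<T>` from builtin.cc, which isn't shown here. As a minimal stand-in, assuming `Reference<T>` is little more than a wrapper around `std::shared_ptr` and `Array<T>` a wrapper around `std::vector`, something like the following would be enough to compile the example. The member names (`useCount`, `push`, `length`) are assumptions, not the real builtin API.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for Reference<T>: takes ownership of a raw pointer
// and shares it through std::shared_ptr's reference count.
template <typename T>
class Reference {
public:
    Reference() = default;
    explicit Reference(T* raw) : ptr_(raw) {}

    T* operator->() const { return ptr_.get(); }
    T& operator*() const { return *ptr_; }
    explicit operator bool() const { return static_cast<bool>(ptr_); }

    // Number of Reference instances currently sharing the object.
    long useCount() const { return ptr_.use_count(); }

private:
    std::shared_ptr<T> ptr_;
};

// Hypothetical stand-in for Array<T>, backed by std::vector.
template <typename T>
class Array {
public:
    void push(const T& value) { items_.push_back(value); }
    std::size_t length() const { return items_.size(); }
    T& operator[](std::size_t index) { return items_[index]; }
    const T& operator[](std::size_t index) const { return items_[index]; }

private:
    std::vector<T> items_;
};

// Condensed versions of the generated classes, to show the pieces fit together.
struct Point {
    double x = 0;
    double y = 0;
};

struct Shape {
    Array<Point> points;  // held by value: it never escapes on its own
};

Reference<Shape> createShape() {
    return Reference<Shape>(new Shape());  // escapes via return, so it is counted
}
```

Copying a `Reference` bumps the count and destroying the last copy frees the object, which is all the generated code needs from it in this example.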

Read Only Types

When a class is immutable, that is, it only has read-only fields, it’s always treated as a value type: passed around as const references and returned by value.

For example

class Vector2 {
    readonly x : number;
    readonly y : number;

    constructor(x : number, y : number) {
        this.x = x;
        this.y = y;
    }

    static add(a : Vector2, b : Vector2) : Vector2 {
        return new Vector2(a.x + b.x, a.y + b.y);
    }
}

Would result in the following C++ code

class Vector2 {
public:
    double x;
    double y;
public:
    Vector2(double x, double y) {
        this->x = x;
        this->y = y;
    }

    static Vector2 add(const Vector2& a, const Vector2& b) {
        return Vector2(a.x + b.x, a.y + b.y);
    }
};
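To show what being a value type buys on the C++ side, here is a small sketch built around the same `Vector2`. The `translateTwice` caller is hypothetical and only there to exercise the class; the `static_assert` simply checks that the class can be copied like a plain struct, with no pointer or reference count involved.

```cpp
#include <type_traits>

// Same shape as the generated Vector2 above.
class Vector2 {
public:
    double x;
    double y;

    Vector2(double x, double y) {
        this->x = x;
        this->y = y;
    }

    static Vector2 add(const Vector2& a, const Vector2& b) {
        return Vector2(a.x + b.x, a.y + b.y);
    }
};

// No heap allocation and no reference count: copies are plain memory copies.
static_assert(std::is_trivially_copyable<Vector2>::value,
              "Vector2 is expected to behave like a plain value type");

// Hypothetical caller: the temporary returned by the inner add() binds
// directly to the const reference parameter of the outer add().
Vector2 translateTwice(const Vector2& origin, const Vector2& step) {
    return Vector2::add(Vector2::add(origin, step), step);
}
```

Calling `translateTwice(Vector2(1, 2), Vector2(3, 4))` yields a vector with `x == 7` and `y == 10`, and every intermediate result lives on the stack.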

An Example

Wait a minute, this is TypeScript so where is the main function and the runtime?

The main function is generated, and it contains a call to “run”, which is also a user function. Its role is to bootstrap everything, then call the supplied function, which executes all side effects in the order they’re imported, and exit once the event queue is empty.

So this

function fibonacci(n : number) : number
{
    if (n <= 1) {
        return n;
    }

    return fibonacci(n - 1) + fibonacci(n - 2);
}
fibonacci(42);

Becomes this

#include "builtin.cc"
double fibonacci(double n)
{
    if (n <= 1) {
        return n;
    }

    return fibonacci(n - 1) + fibonacci(n - 2);
}
int main(int argc, char *argv[]) {
    return run(argc, argv, []() {
        fibonacci(42);
    });
}

The Node version runs in about five seconds, whereas the C++ version clocks in at a little under two seconds, but it’s a toy benchmark so I wouldn’t read too much into it.
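As for what “run” might look like, here is a minimal sketch of the bootstrap-then-drain behaviour described earlier. It is only an approximation under stated assumptions: the global `eventQueue` and the `enqueue` helper are hypothetical, and the real function in builtin.cc presumably does more setup than this.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Hypothetical event queue holding pending callbacks.
static std::queue<std::function<void()>> eventQueue;

// Hypothetical scheduling helper, loosely comparable to a zero-delay setTimeout.
void enqueue(std::function<void()> task) {
    eventQueue.push(std::move(task));
}

// Runs the top-level code once, then keeps executing queued tasks
// until the queue is empty, and returns the process exit code.
int run(int argc, char* argv[], const std::function<void()>& entry) {
    (void)argc;  // a real runtime would probably expose these to user code
    (void)argv;

    entry();  // top-level side effects, in import order

    while (!eventQueue.empty()) {
        std::function<void()> task = std::move(eventQueue.front());
        eventQueue.pop();
        task();  // a task may enqueue more tasks
    }
    return 0;
}
```

Popping each task before invoking it means a callback can safely schedule further work, and the loop ends only once nothing is left to run.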

In Conclusion

So is this a viable project? Well yes, but mostly no. The examples work, but it’s not very robust at the moment, being one of many toy projects. There are features that just aren’t implemented yet, and some features are so dynamic in nature that they can’t reasonably be implemented with the C++ generator (there’s also a branch which has some very bad LLVM IR generation, but that’s another story).

There’s also the library aspect. Being compatible with Node would be neat, but it’s no small feat to catch up with Node and maintain perfect compatibility. On the other hand, there’s also Deno, which aims to ship with a tiny runtime, but it’s still young, so it’s hard to tell right now what that runtime environment is going to look like.

Stay tuned for a deeper dive coming to you soon.