Diving into the Internals of TypeScript: How I Built TypeWiz

Adventures in TypeScript, Abstract Syntax Trees, and Creating Useful Development Tools

This is the story of how I created TypeWiz, a tool that automatically adds missing types to existing JavaScript and TypeScript code bases, allowing you to easily take advantage of type checking.

I wrote about what exactly TypeWiz is, what it does, and why you might want to use it in a previous blog post. This post is all about digging into details, sharing all of the many things I learned while creating this project, as well as showing you step-by-step how I did it.

Game Plan

Before I wrote a single line of code, I sat down and thought about how best to approach the problem at hand. I wanted to create a tool that would collect the types of arguments that are passed to functions at run-time. I considered two options for achieving this:

  1. Use some kind of debugging mechanism to add a breakpoint at the beginning of every function (or at every function call), and then use the same debugging mechanism to read the values of the arguments and determine their types
  2. Alter the source code to report the types for each function argument by adding some code at the beginning of every function.

I decided to go with the latter approach, as it would not be tied to a specific browser/platform (Firefox, Chrome and Edge all have somewhat different debugging protocols), and it would eliminate the need to deal with Source Maps, which are often missing or incorrectly produced. It would also give me an opportunity to play with the TypeScript Abstract Syntax Tree (AST), something I had wanted to explore for a long time!

AST — Abstract Syntax Tree

An Abstract Syntax Tree is how compilers represent your program in memory, independent of its surface form. For example, in JavaScript, both "hello" and 'he\l\lo' have the same meaning: a five-character string containing the word “hello”. They differ only in form (the backslashes in the second literal are no-op escapes). In the JavaScript AST, both will be represented as a StringLiteral with the value hello, regardless of how they were written in the source code.

In this post, since we’re going to work closely with the TypeScript AST, we will use an amazing tool called AST Explorer. The principle of operation is quite simple: you type any TypeScript expression on the left pane, and the right pane will show you the generated AST (you can also do other languages, such as HTML, SQL or even Scala, but let’s stick with TypeScript for this post). Let’s start with the above example: 'he\l\lo' (view it online):

As you can see, the tool spits out a huge amount of data; the AST is quite verbose. Luckily, when we click on an element in our source code, the relevant AST node will be highlighted in yellow, so we can quickly find what we’re looking for.

The name that appears at the top of the example, StringLiteral, is the type of the AST node. Below it is an object with all the relevant properties, such as text, which contains the value of the node.

So now that we have some idea what the AST looks like, let’s go back to the original mission: find function declarations and the arguments for each of the functions so we can report the argument values at run time, and try to figure out their types.

Finding Function Declarations

We will start with simple function declarations, such as:

function add(n1, n2) {
  // whatever
}

What we would basically like to do is detect this function declaration, figure out the names of its parameters (n1 and n2), and find the offset in the source code where the function body begins, in order to inject our custom code (for now, let’s not worry about what that code would actually be; we’ll get to that shortly).

We’ll start by pasting this function into AST Explorer and clicking on the word function:

This reveals that function declarations are represented by the FunctionDeclaration node. Not very surprising :)

We can also see that this node has a name property, and more interestingly, parameters and body properties. Let’s expand the parameters property:

parameters is an array, whose elements are Parameter nodes. Each such node contains a property called name, of type Identifier, whose text property holds the name of the parameter as a string. Bingo!

Also, if we look into the body property of our FunctionDeclaration, we can see it has a pos property saying where the body of our function begins (note that pos also covers any whitespace or comments that precede the opening brace). This will come in handy in a moment.

Our First Lines of Code: Loading the AST

With all this in mind, I started writing the first lines of code. I’d try to load a source file, ask TypeScript to parse it and hand over the AST, and then go over it and try to print out the names of the parameters of every function I encountered. Let’s walk through this together.

First, we need to install TypeScript:

npm install typescript

Next, we import the typescript package and call the createSourceFile method to load our source file. A SourceFile is the top-level container of the AST, or the root of the tree.

Once we have loaded the source file, we will convert it to JSON and print it:
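
As a minimal sketch of this step (not the original listing), the snippet below parses a source string and prints a compact JSON summary of each node rather than every internal property, which also sidesteps circular references such as parent pointers. The names NodeSummary, summarize and dumpAst are hypothetical and exist only for this illustration.

```typescript
import * as ts from 'typescript';

// Plain, JSON-friendly summary of an AST node (hypothetical shape for this sketch).
interface NodeSummary {
  kind: string;
  pos: number;
  end: number;
  children: NodeSummary[];
}

// Recursively convert a node and its children into NodeSummary objects.
function summarize(node: ts.Node): NodeSummary {
  const children: NodeSummary[] = [];
  // The callback has a block body on purpose: returning a truthy value
  // from it would make forEachChild stop iterating.
  ts.forEachChild(node, (child) => {
    children.push(summarize(child));
  });
  return { kind: ts.SyntaxKind[node.kind], pos: node.pos, end: node.end, children };
}

// Parse the given source text and return the summarized AST as a JSON string.
function dumpAst(fileName: string, sourceCode: string): string {
  const sourceFile = ts.createSourceFile(fileName, sourceCode, ts.ScriptTarget.Latest);
  return JSON.stringify(summarize(sourceFile), null, 2);
}

console.log(dumpAst('example.ts', 'function add(n1, n2) {}'));
```

To dump a real file, read it with fs.readFileSync(path, 'utf8') and pass the contents as sourceCode.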

We can run this script using ts-node. Simply pass the name of a TypeScript (or JavaScript) source file as an input (you can even use the script itself), and it will dump the entire AST to the console, in a similar fashion to what the AST explorer does.

As you can see, so far our project is quite simple. The only thing that requires a little explanation is the ts.ScriptTarget.Latest part, which basically tells TypeScript to parse the source file using the latest language features (ECMAScript 2018, at the time of this writing).

Traversing the AST and Finding Function Declarations

Now that we’ve loaded the AST, it is time to traverse it and look for function declarations!

As we have already seen, different AST nodes have different properties, but fortunately, TypeScript comes with a handy forEachChild function that we can use to iterate over the children of each of the AST nodes. We will take advantage of this and write a recursive function that goes over the tree in depth-first order:

function visit(node: ts.Node) {
  // TODO do something with node
  node.forEachChild(visit);
}

Then, we simply need to call visit with our sourceFile to traverse the tree:

visit(sourceFile);

So now we have some code that walks the entire AST. Next, we can check each node to see whether it is a FunctionDeclaration node.

We could do this by checking the kind property of each node and comparing it with the constant ts.SyntaxKind.FunctionDeclaration, but TypeScript actually has a nice shortcut function called ts.isFunctionDeclaration() that does exactly that. This function is also a type guard: it tells TypeScript that the type of this node is ts.FunctionDeclaration, so we get auto-completion for all the relevant properties, such as parameters, which exist for function declarations but not for other kinds of nodes such as StringLiteral.

Let’s modify our visit function to check for function declaration, and for each one found, iterate over its parameters and print their names:
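
A sketch of what that could look like is shown below. It collects the names into an array instead of printing them directly, so the result is easy to inspect; the wrapper function listParameters is a hypothetical name used only here. Note that setParentNodes (the fourth argument of createSourceFile) is set to true, so that getText() can find the source file from any node.

```typescript
import * as ts from 'typescript';

// Return the parameter names of every function declaration in the source text.
function listParameters(sourceCode: string): string[] {
  const sourceFile = ts.createSourceFile(
    'example.ts', sourceCode, ts.ScriptTarget.Latest, /* setParentNodes */ true);
  const found: string[] = [];

  function visit(node: ts.Node) {
    if (ts.isFunctionDeclaration(node)) {
      // Inside this block, node is narrowed to ts.FunctionDeclaration.
      for (const param of node.parameters) {
        found.push(param.name.getText());
      }
    }
    node.forEachChild(visit);
  }

  visit(sourceFile);
  return found;
}

console.log(listParameters('function add(n1, n2) {}'));
```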

The getText() method is defined for all kinds of nodes, and is quite useful, as it returns the text representation of the node as it appears in the source code.

I learned a whole lot about TypeScript just by looking at the auto complete suggestions of my code editor (VSCode). Since TypeScript is written in TypeScript, we have full type information for the compiler — which turns out to be very useful when developing projects like TypeWiz!

Sprinkling Some Magic in the Source Code

Now that we have code that finds all the function declarations and iterates over their parameters, let’s modify the source code to report their values.

Originally, I thought that modifying the AST and adding the new code would be the best way to go. There is even a project called ts-emitter that claims to do that, even preserving the original formatting of the source code. However, it seems like it has some corner cases and bugs, so I decided to google for alternatives, and found this comment from Daniel Rosenwasser. Then, I remembered that the TSLint project has an auto-fix option, where it automatically corrects styling issues found in your code, such as missing semicolons, so I went to see how they implemented it.

Their method is quite simple and pretty much fool-proof: they just create an array of “replacements”, each a combination of a position in the source file and text to insert at that position. Then, once they have all the replacements, they sort them by position in descending order, so that changes at later positions in the file are applied first and don’t affect the offsets of the replacements that follow. For the purposes of TypeWiz, we’re only interested in insertion, but it’s good to note that TSLint also removes or replaces text using this method, hence the name “replacements”.
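
To make the idea concrete, here is a small self-contained sketch of such a mechanism. It is neither TSLint’s code nor the TypeWiz implementation; the Replacement shape and the applyReplacements name are assumptions made for this illustration. An insertion is simply a replacement whose start and end are equal.

```typescript
// A replacement swaps the text between start (inclusive) and end (exclusive)
// for the given text. When start === end, it is a pure insertion.
interface Replacement {
  start: number;
  end: number;
  text: string;
}

function applyReplacements(source: string, replacements: Replacement[]): string {
  // Apply the edits from the end of the file backwards, so that an edit
  // never shifts the offsets of the edits that are still pending.
  const sorted = [...replacements].sort((a, b) => b.start - a.start);
  let result = source;
  for (const r of sorted) {
    result = result.slice(0, r.start) + r.text + result.slice(r.end);
  }
  return result;
}

const fixed = applyReplacements('let a = 1', [
  { start: 0, end: 3, text: 'const' }, // replace "let" with "const"
  { start: 9, end: 9, text: ';' },     // insert the missing semicolon
]);
console.log(fixed); // const a = 1;
```

This sketch assumes the replacements do not overlap; overlapping edits would need extra checks.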

You can find my initial implementation of this mechanism here, which is simplified and will work for the purpose of this post. If you’re interested in seeing the more robust implementation, here is the most up-to-date version.

The implementation is quite straightforward, so we won’t go into the details. Rather, we’re going to modify our visit() function to accept a Replacement array, and then add a console.log call at the beginning of each function. For each argument, it reports the name, the value, the offset in the source code just after the name (hint: we will use it later as the point where we insert the type info we discover), and the current filename (which will also prove useful later, as we insert the types):
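
The sketch below shows one way this step could be written. It is a simplified stand-in rather than the original TypeWiz listing: the Insertion type and the way the edits are applied are assumptions made for this example, and only plain function declarations are handled.

```typescript
import * as ts from 'typescript';

// Text to insert at a given offset of the original source (hypothetical type).
interface Insertion {
  position: number;
  text: string;
}

function instrument(fileName: string, sourceCode: string): string {
  const sourceFile = ts.createSourceFile(
    fileName, sourceCode, ts.ScriptTarget.Latest, /* setParentNodes */ true);
  const insertions: Insertion[] = [];

  function visit(node: ts.Node) {
    if (ts.isFunctionDeclaration(node) && node.body) {
      const logs = node.parameters.map((param) => {
        const name = param.name.getText();
        // Reports: name, run-time value, offset just after the name, filename.
        const args = [JSON.stringify(name), name, param.name.getEnd(), JSON.stringify(fileName)];
        return `console.log(${args.join(', ')});`;
      });
      // getStart() points at the opening brace; + 1 moves just past it.
      insertions.push({ position: node.body.getStart() + 1, text: logs.join('') });
    }
    node.forEachChild(visit);
  }
  visit(sourceFile);

  // Apply insertions from the end of the file backwards to keep offsets valid.
  let result = sourceCode;
  for (const { position, text } of insertions.sort((a, b) => b.position - a.position)) {
    result = result.slice(0, position) + text + result.slice(position);
  }
  return result;
}

console.log(instrument('a.ts', 'function add(n1, n2) { return n1 + n2; }'));
```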

You can see above that the insertion point for the generated code is node.body.getStart() + 1. The reason for adding 1 to the offset is that the start of the body is at the opening brace of the function ({), and we want to insert the new code after the brace. We also need to JSON.stringify any value we put into the code (unless we know for sure it’s something innocent like a number), otherwise we may hit weird edge-cases when generating code due to special characters in file names, such as backslash, quote or double quote (which all are valid characters in a Unix filename).
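
To illustrate why the escaping matters, here is a small sketch with a deliberately awkward file name. The helper buildReportCall and its parameters are hypothetical; it only builds the text of one reporting call.

```typescript
// Build the text of one reporting call to insert into the instrumented source.
// `callee` is the function to call, for example 'console.log'.
function buildReportCall(callee: string, paramName: string, offset: number, fileName: string): string {
  const args = [
    JSON.stringify(paramName), // string literal holding the parameter name
    paramName,                 // the identifier itself, so the value is read at run time
    String(offset),            // a number is safe to embed as-is
    JSON.stringify(fileName),  // escaped, so quotes and backslashes cannot break the code
  ];
  return `${callee}(${args.join(', ')});`;
}

// A file name containing double quotes and a backslash.
const trickyFileName = 'my "odd" file\\name.ts';
const reportCall = buildReportCall('console.log', 'n1', 15, trickyFileName);
console.log(reportCall);
// console.log("n1", n1, 15, "my \"odd\" file\\name.ts");
```

Had the file name been wrapped in quotes by hand, the embedded double quotes would have terminated the string literal early and produced a syntax error in the generated code.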

Let’s also modify our instrument function to call the new visit function, and then print out the modified source code with all the added console.log statements:

Here you can find the complete program with everything, including the Replacement class and applyReplacement method implementation.

Let’s Start Collecting Type Information!

So far, we have the basic mechanism in place: we can gather type information at runtime! Still, we only print it to the console, and we print the values of the arguments, not their types (yet). Let’s add a few more tweaks to make it more useful.

First, we want to skip arguments that already have types, or have an initializer (default value), as TypeScript can infer the type from that. We can do this by adding an if statement that checks for a given parameter whether it has a type or a default value:

if (!param.type && !param.initializer) {
  // here we can instrument this parameter
}

Next, let’s replace the console.log with calls to a custom function that we will implement shortly. We’ll call it $_$twiz (it will be a global function, so we need a pretty unique name).

Congratulations! We have completed the first part of our puzzle. At this stage we have recreated instrument.ts, as it appeared in the first version of TypeWiz. Note that the real file contains a few more additions: we also check that our functions have a body (which is not always the case in TypeScript, for example in overload signatures and ambient declarations), we handle optional function parameters, and in addition to functions, we also instrument class methods, which we identify using ts.isMethodDeclaration().

What’s Your Type?

The next part is pretty straightforward — we want to implement $_$twiz, which will look at the value it gets, find out the type, and record it into an array. We will define a helper function, called getTypeName, which gets a value and returns its type. For starters, let’s handle all primitive types and null:
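
A first version along these lines might look like the sketch below. This is an illustration rather than the original listing; in particular, returning null for values that are not handled yet is an assumption of this sketch.

```typescript
// Primitive types for which typeof already gives the right answer.
const PRIMITIVE_TYPES = ['undefined', 'number', 'string', 'boolean', 'symbol'];

function getTypeName(value: unknown): string | null {
  // typeof null is 'object', so null needs its own check.
  if (value === null) {
    return 'null';
  }
  const kind = typeof value;
  if (PRIMITIVE_TYPES.indexOf(kind) >= 0) {
    return kind;
  }
  // Objects, arrays and functions are not handled at this stage.
  return null;
}

console.log(getTypeName(42));      // number
console.log(getTypeName('hello')); // string
console.log(getTypeName(null));    // null
```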

Then, for many complex types, such as Date, Promise, Set and even HTMLDivElement, we can get their types by simply looking at their constructor names:

if (value.constructor && value.constructor.name) {
  return value.constructor.name;
}

This will also work for user-defined types, as long as the object was created with the new keyword. For other objects, such as object literals, we will simply get Object as their constructor name. Also, if some of the code has been minified (say, a third-party library that we’ve been using), the constructor names will be mangled and we’ll get garbage.

Finally, as arrays are very common in JavaScript, let’s extend the implementation to look inside them and try to figure out recursively which types they contain, so we can specify the type of the array elements using TypeScript’s generics:

Note how we use the Set data structure to eliminate repeated values in the itemTypes array.

And now, at last, we have the same implementation as the one in the first version of TypeWiz!
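
Putting the pieces described so far together, a getTypeName with array support could be sketched as follows. The exact output format (T[] for a single element type, Array<A|B> for several) and the null result for empty arrays are assumptions of this sketch, not necessarily what TypeWiz emits.

```typescript
function getTypeName(value: unknown): string | null {
  if (value === null) {
    return 'null';
  }
  if (['undefined', 'number', 'string', 'boolean', 'symbol'].indexOf(typeof value) >= 0) {
    return typeof value;
  }
  if (Array.isArray(value)) {
    // Recursively find the element types; the Set removes duplicates.
    const itemTypes = Array.from(new Set(value.map((item) => getTypeName(item))))
      .filter((t): t is string => t !== null)
      .sort();
    if (itemTypes.length === 0) {
      return null; // empty array: nothing to learn from it
    }
    return itemTypes.length === 1 ? `${itemTypes[0]}[]` : `Array<${itemTypes.join('|')}>`;
  }
  // Date, Promise, user-defined classes and so on: use the constructor name.
  const ctor = (value as { constructor?: { name?: string } }).constructor;
  if (ctor && ctor.name) {
    return ctor.name;
  }
  return typeof value;
}

console.log(getTypeName([1, 2, 3]));    // number[]
console.log(getTypeName([1, 'a', 2]));  // Array<number|string>
console.log(getTypeName(new Date()));   // Date
```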

As you can see above, the $_$twiz function gets the type for the passed value, and then keeps a map from filename/offset to the types declared there. The offset, as you might recall from earlier, is just right after the parameter name at exactly the spot where we’d later insert the types we found.

A global helper function, called $_$twiz.get(), returns the collected type information as an array of triplets: filename, offset, and the discovered types for that offset.
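
The bookkeeping described above could be sketched like this. It is a simplified illustration, not the TypeWiz source: typeNameOf is a deliberately naive stand-in for getTypeName, and the way the map key is built is an assumption of this sketch.

```typescript
// Triplets of filename, offset and the types observed at that offset.
type CollectedTypes = Array<[string, number, string[]]>;

// Naive stand-in for getTypeName, to keep this sketch self-contained.
function typeNameOf(value: unknown): string {
  return value === null ? 'null' : typeof value;
}

// Maps a JSON-encoded [filename, offset] pair to the set of types seen there.
const observedTypes = new Map<string, Set<string>>();

// _name is accepted to match the reporting call, but unused in this sketch.
function $_$twiz(_name: string, value: unknown, offset: number, filename: string): void {
  const key = JSON.stringify([filename, offset]);
  let types = observedTypes.get(key);
  if (!types) {
    types = new Set<string>();
    observedTypes.set(key, types);
  }
  types.add(typeNameOf(value));
}

// Returns everything collected so far as an array of triplets.
$_$twiz.get = function (): CollectedTypes {
  return Array.from(observedTypes.entries()).map(([key, types]) => {
    const [filename, offset] = JSON.parse(key) as [string, number];
    return [filename, offset, Array.from(types)] as [string, number, string[]];
  });
};

$_$twiz('n1', 5, 15, 'a.ts');
$_$twiz('n1', 'five', 15, 'a.ts');
$_$twiz('n1', 7, 15, 'a.ts');
console.log(JSON.stringify($_$twiz.get())); // [["a.ts",15,["number","string"]]]
```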

So we have now completed the second piece of the puzzle. We have a function that records all the type information at runtime, and we can automatically modify the source code to call it at every start of a function. Not bad! Just one more thing…

Closing the Loop — Applying the Types

Now for the final piece of the puzzle. To do this, we just go over all the types we have collected, and then for each triplet of [filename, offset, types] returned by $_$twiz.get(), we go to the specified offset in that file, add a colon character (:), a space and the types, joined by ‘|’ — so in case we saw several different types for the same argument, we create a Union Type, meaning it can receive any of these types:
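
A minimal sketch of this step, assuming a single source file and the triplet format described above (the function body is an illustration, not the original listing):

```typescript
// Triplets of filename, offset and the types observed at that offset.
type CollectedTypes = Array<[string, number, string[]]>;

function applyTypes(source: string, collected: CollectedTypes): string {
  // Work from the highest offset down, so earlier offsets stay valid.
  const sorted = [...collected].sort((a, b) => b[1] - a[1]);
  let result = source;
  // The filename (first element) is ignored in this single-file sketch.
  for (const [, offset, types] of sorted) {
    const annotation = ': ' + types.join('|');
    result = result.slice(0, offset) + annotation + result.slice(offset);
  }
  return result;
}

const typed = applyTypes('function add(n1, n2) {}', [
  ['a.ts', 15, ['number']],
  ['a.ts', 19, ['number', 'string']],
]);
console.log(typed); // function add(n1: number, n2: number|string) {}
```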

We need to sort the triplets by offset, similar to how we did it above with the replacements. In fact, I used the same replacements mechanism as above. This implementation is pretty much what I created in the first commit of TypeWiz, and as you might have already spotted, it discards the file information. It was only in a later version that I actually added code to support working with multiple source files.

Putting everything together

At this point, I had all the pieces I needed for creating a proof-of-concept. Still, there were a few caveats:

  1. You had to run instrument.ts on some TypeScript source file manually and save the output to some temporary script file.
  2. You would then have to paste the implementation of the $_$twiz function (and all the other relevant code) at the beginning of that temporary script file.
  3. You needed to arrange for applyTypes(originalSource, $_$twiz.get()) to be called when that temporary script terminated (using process.on('exit'), for instance), and save the result over the original script.
  4. Then you had to run the temporary script, and repeat all of this for each source file in your project.

That sounds like a lot of work for something that is supposed to automate things and save you time!

For this very reason, while developing TypeWiz, I decided to create unit tests that would allow me to verify each piece separately, as well as an integration test that would automatically check the whole flow. This enabled me to focus on building the core functionality of TypeWiz, without having to manually perform all the above steps for every change I made.

After getting the core of TypeWiz ready, I also created typewiz-node, a small wrapper that does all the above for you by basically wrapping ts-node and adding a little functionality to automatically process all your TS source files with TypeWiz’s instrument() function first. I then created another wrapper that does the same for Webpack projects, called typewiz-webpack.

I won’t dive into details about the inner workings of TypeWiz’s integration tests, but I did use some neat tricks you may find useful. One good trick was calling the TypeScript compiler directly using the ts.transpile() method, and using Node’s vm module in order to compile and execute the instrumented input script in an isolated execution environment, so that the type discovery mechanism would run. This would let me apply the discovered types to the original source code and compare it with the expected results. The code is documented and shouldn’t be too hard to follow, so I strongly encourage you to take a look.
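
For readers who have not used these two APIs together, here is a rough sketch of the trick. It is not the TypeWiz test code; runIsolated and the report callback are hypothetical names, and the sandbox object is the only thing the executed script can see from the outside.

```typescript
import * as ts from 'typescript';
import * as vm from 'vm';

// Transpile TypeScript source to JavaScript and run it in a fresh context.
// The properties of `sandbox` become the globals of the executed script.
function runIsolated(tsSource: string, sandbox: Record<string, unknown>): void {
  const js = ts.transpile(tsSource, { target: ts.ScriptTarget.ES2015 });
  vm.runInNewContext(js, sandbox);
}

const reported: unknown[] = [];
runIsolated(
  'function add(a: number, b: number) { report(a + b); } add(1, 2);',
  { report: (value: unknown) => { reported.push(value); } },
);
console.log(reported); // [ 3 ]
```

In a test for a tool like TypeWiz, the sandbox would expose the type-collecting function instead of report, and the collected results would be read back after the script finishes.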

This single integration test has now evolved to a set of multiple tests for various cases, so you may also be interested to check out the latest version of the integration test suite.

Taking this further

I spent a weekend prototyping and building the first version, following essentially the steps I’ve explained in this post. Since then, I have made multiple improvements, such as adding Arrow Function support and finding the types for class fields in addition to methods and functions. Recently, my friend Pavel 'PK' Kaminsky added support for figuring out the type of functions more accurately.

There is still much left to do, such as:

  • Automatically adding imports for types we discover (so if you defined SomeClass in modulea.ts, and then we discover this type for some other function in moduleb.ts, we would also add an import statement for it)
  • Discovering types for object literals
  • Automatically scanning your project and dependencies for interfaces and using them where they fit
  • Discovering type arguments for Promises
  • Trying to figure out types of parameters for callbacks
  • And more!

I hope we can, over time, add all these features and together make TypeWiz even more useful for everybody. You are invited to try it on your own code base, report bugs and contribute!

This would be a good place to give a shout out to Madara Uchiha, who will be leading a group that will work on contributing to and improving TypeWiz during the next JavaScript Israel Goodness Squad event (and thank you to Amit Zur for making these events a reality).

Final Thoughts

I created TypeWiz because I was lazy: I didn’t like repetitively going over all the places with missing types and adding them manually. I shared TypeWiz because I wanted others to enjoy the fruits of my work. When I told the world about TypeWiz, it was received with much enthusiasm. This gave me the idea to explain exactly how I did it, so you can build on what I learned and create more awesome tooling, or even better, improve existing tooling to better fit the needs of real, working developers. These same principles can be used to create new TSLint rules and auto-fixes, or to write tools that analyze your dependency trees (like Minko Gechev did in his AngularUP Talk).

I’d also like to give a big shout out to AST Explorer, which proved to be a magnificent time saver when working on this project, and it helped me quickly make sense of the vast amounts of data in TypeScript’s AST. Thanks to Felix Kling for creating it!

TypeWiz is all about learning and challenging the assumption that types had to either be inferred statically by TypeScript or typed-in manually. We are developers. We possess the gift of coding. Let’s create better tools for ourselves!