Writing a simple transpiler in JavaScript

Lambda calculus to JS, the transpiler everyone was crying out for.

Stephen Leigh
The Startup
8 min read · May 21, 2019


Not pictured: a metaphor for my transpiler

I don’t know why I decided to write a transpiler, but I do know I was expecting it to be much harder than it was. I also don’t know why I decided to learn about lambda calculus, but that was as hard as I was expecting.

This article is about how I wrote a very basic transpiler to try to get to grips with lambda syntax. You don’t need to know anything about lambda calculus or compilation — I’ll go step by step through how I wrote the tool, explaining the weirdness of lambdas as required. The full source can be found here.

A note on terminology: ‘compilation’ refers to translating something into another language, regardless of how ‘high-level’ either of them is. ‘Transpilation’ refers to translating a language into another that is roughly as high-level. I refer to my compiler as a ‘transpiler’, but you could argue that lambda syntax is actually higher-level than JavaScript. I’ll use both terms in this article.

Why would anyone want to do this?

My reason was that it’s a structured way of learning an unfamiliar syntax — one that doesn’t have its own interpreter or anything to interactively play around with, because lambda calculus isn’t a programming model so much as a mathematical system. Other reasons might include writing a DSL for your own specific use case, or even just an interest in how the words you write get turned into machine code.

Mercifully brief overview of lambda syntax

This was taken in Greece, where the letter λ comes from

This could be a whole article of its own, but I’ll keep it short. Lambda syntax, invented by genius mathematician Alonzo Church in the 1930s, is a way of expressing computable functions. In that sense it’s a bit like its own programming language, except one where there are only functions and nothing else. Each function can take only one argument. Functions look something like this: λx.λy.y

The lambda symbol λ denotes a function, and the dot separates the function’s argument from its body. Letters without a λ in front of them are variables. So written in JS, λx.λy.y is the same as x => y => y. Because you can only have one argument per function (in fancy terms, functions have an arity of 1) you end up using currying a lot. (x => y => y)()(5) gives you 5, and it looks like a completely useless function in both JavaScript and lambda calculus (which doesn’t even have numbers anyway), but I promise it isn’t.
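To make the mapping concrete, here are those examples side by side (my own illustration in plain JavaScript, not part of the transpiler):

```javascript
// Lambda expressions alongside their JavaScript equivalents:
const identity = x => x;      // λx.x
const second = x => y => y;   // λx.λy.y (curried: one argument per function)

// Applying arguments one at a time, i.e. currying in action:
second()(5); // 5 (the first argument is ignored, the second comes straight back)
```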

With such restrictive rules, lambda syntax is actually the perfect thing to write a compiler for, because there are so few cases to worry about.

The aim

Trapped in mathematical abstractions

I wanted something that takes a lambda expression and gives back the equivalent in JavaScript, e.g. going from λx.λy.y to x => y => y. I got there, for basic lambda expressions anyway. Things I learnt:

  • Writing a compiler is a great use case for test-driven development. I’m not really a TDD guy by nature — I suspect that it’s only a thing because it’s so easy to give talks on at conferences — but despite my cynicism, this is a fantastic use case for it. Compiler design should strongly adhere to separation of concerns, and it should always be very clear what each unit’s output should be given an input. Following TDD principles made the whole experience much more straightforward.
  • Writing a compiler is a good learning experience, but you will end up reaching diminishing returns. I could have ploughed on and wrapped up all sorts of edge cases, different lambda syntax styles, more complex functions, and so on. But at some point it’s going to be a case of fiddling around with regex and shuffling logic without learning anything new. I might continue to mess around with it, but I feel like I’ve definitely got the most value out of the experience already.

Compiler anatomy

Compilers can get incredibly complex, but they all have three basic parts:

  • Lexer: breaks up an expression into chunks (‘lexemes’ or ‘tokens’). We’ll be passing in a whole lambda expression at once, but to reason about it, we’ll need to separate it into the different lexemes that mean different things.
  • Parser: converts the list of lexemes into a tree of logical operations — an Abstract Syntax Tree, or AST. This sounds complex, but it’s not too bad. The AST is language-agnostic — it bears no relation to the source or target language. This might make more sense when you see one below.
  • Compiler/Transpiler: turns the AST into the target language (JavaScript). This is a case of following the AST logic tree and working out how to change it into JS, depending on what’s happening before and after each node.

Compilers might also describe the grammar of their source language. This is typically written in a format called EBNF, and you can see my example here. I’m not using mine programmatically, but still found it a useful exercise, as I wasn’t entirely confident in how to best reason about the syntax. It also helped when writing the lexer, to determine how to break up the expressions it’s given. The grammar of lambdas is extremely simple, as you can see from my example, but you can imagine how complex it can be for proper programming languages.

Let’s go

Greece again

I’ll go over my first pass, which was to support the simplest test case I could think of: the ‘identity function’, λx.x, which should translate to x => x (a function that just returns whatever it’s given). At this point I couldn’t be bothered to copy and paste the λ character, so I’ve used l instead.

Writing the lexer

This was super easy, because I’d already worked out the basics of lambda syntax, and the grammar is very simple. Behold my lexer in its entirety:
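Pared back to what the identity function needs, it boils down to something like the following (the exact regex in the full source is a little hairier, with lookaheads as well):

```javascript
// Split the expression on '.', keeping the dot attached to the
// preceding lexeme via a positive lookbehind.
const lex = expression => expression.split(/(?<=\.)/);

// lex('lx.x') -> ['lx.', 'x']
```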

Unfortunately there’s no getting away from regex, so I bit the bullet and entered the hell that is positive and negative lookaheads and lookbehinds. This takes the string and returns an array that has split the string on '.' (but keeping it in the preceding element), so our identity function has become ['lx.', 'x'].

I flip-flopped a lot on how I should actually break up the expression. Should the ‘apply’ . be its own lexeme? Should the lambda symbol l? Ultimately it came down to the simplest implementation that could be parsed, which is what we see here.

Writing the parser

This is the most wordy section, partly because I wanted to throw an error if the parser identifies a lexeme as more than one type of thing (which it should never do).

We have the parser:
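Stripped back to the identity-function case, it looks roughly like this (Lambda and BoundVariable come from the model just below; treat the regexes and names as illustrative rather than the exact source):

```javascript
// Regexes for the two kinds of lexeme we care about so far:
// a lambda looks like 'lx.' and a bound variable is a lone letter.
const LAMBDA = /^l([a-z])\.$/;
const BOUND_VARIABLE = /^([a-z])$/;

const parse = lexemes => {
  // Nothing left to parse: this becomes the null apply at the end of the tree.
  if (lexemes.length === 0) return null;

  const [lexeme, ...rest] = lexemes;

  // Collect every parser that claims this lexeme...
  const matches = [];
  if (LAMBDA.test(lexeme)) {
    matches.push(() => new Lambda(lexeme.match(LAMBDA)[1], parse(rest)));
  }
  if (BOUND_VARIABLE.test(lexeme)) {
    matches.push(() => new BoundVariable(lexeme.match(BOUND_VARIABLE)[1], parse(rest)));
  }

  // ...and complain unless exactly one did.
  if (matches.length !== 1) {
    throw new Error(`Could not unambiguously parse lexeme: ${lexeme}`);
  }

  return matches[0]();
};
```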

And the model:
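The model is just a pair of tiny classes, along these lines:

```javascript
// Each node in the tree knows its variable and what it applies to next.
class Lambda {
  constructor(variable, apply) {
    this.variable = variable; // the argument the lambda binds, e.g. 'x'
    this.apply = apply;       // the next node in the tree, or null
  }
}

class BoundVariable {
  constructor(variable, apply) {
    this.variable = variable; // the variable being referenced
    this.apply = apply;       // the next node in the tree, or null
  }
}
```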

So to break it down and simplify, the steps are:

  1. Take the first lexeme (in our example, lx.)
  2. If it matches the regex for a lambda, add the lambda parser to our list of functions to execute later
  3. If it matches the regex for a bound variable, add that parser to the list
  4. If the list is empty or has more than one thing in it, something has gone wrong because each lexeme should only match exactly one case. Throw an error.
  5. Parse the lexeme, returning a new instance of Lambda or BoundVariable. For the new instance’s apply field, use the result of another call to parse with the next lexeme (this will be null if this was the last lexeme in the array).

As you’ll see from step 5, we’re using recursion to build up a tree of operations. The end result will be our Abstract Syntax Tree:
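For the identity function it’s a very small tree. Built by hand with the classes sketched above, it looks roughly like this:

```javascript
// Roughly what parse(lex('lx.x')) gives us back:
const identityAst = new Lambda('x', new BoundVariable('x', null));

// Lambda { variable: 'x',
//          apply: BoundVariable { variable: 'x', apply: null } }
```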

We’re really lucky with lambdas, because the tree will only ever have one branch — you only apply a function to one thing, another function. This removes an enormous amount of complexity. Because the identity function has only two elements (the lambda and the bound variable) this tree is only two deep. We know we’ve hit the end when the next apply field is null.

Now we need to take the AST and turn it into JavaScript.

Writing the transpiler

Trapped in recursion

The transpiler is going to traverse the AST and build up a JS expression:
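For the identity case it comes down to something like this (again a sketch: the class names follow the model above, and the real source covers more cases):

```javascript
// Walk the tree, building up a string of JavaScript as we go,
// then eval it once we run out of nodes.
const transpile = (node, output = '') => {
  if (node === null) return eval(output); // end of the tree: evaluate what we've built
  if (node instanceof Lambda) {
    return transpile(node.apply, output + `${node.variable} => `);
  }
  if (node instanceof BoundVariable) {
    return transpile(node.apply, output + `(${node.variable})`);
  }
};

// transpile(identityAst) evaluates 'x => (x)', which is the identity function.
```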

So it goes down the AST (recursively again): if it sees a lambda it outputs a string with the associated variable followed by =>, and if it sees a bound variable it wraps it in brackets. These string fragments are all concatenated, and when it reaches an apply field of null it returns an eval() of the built-up expression. The end result is x => (x), which is the identity function!

Next steps

I am become master of compilation

As you can see from the full source, things got slightly more complex as I added more use cases. I put in a wrapper/controller so the tool could be called with command-line arguments and output a stringified result, which meant I could also add end-to-end tests. The transpiler can also handle unbound variables that are applied to the function, so lx.x 6 will return 6, but of course that means adding support for unbound variables and spaces in the lexer and parser. As I added more test cases, the thing I noticed the most was that it only tended to be one or two of the modules that failed, because of the separation of concerns in the structure of the compiler — converting to and from an abstract syntax decouples the translation logic, which is why you do it in the first place.

I’d recommend anyone have a go at creating a compiler. As I’ve hopefully demonstrated, it’s not that hard to whip one up for at least trivial use cases. It’s a great way to learn an obscure grammar, as I did, or just to investigate the internals of how languages are translated. Maybe I’m naturally drawn more to the esoteric, but I actually enjoyed the process, so let me know if you’ve done anything similar!
