Writing a simple transpiler in JavaScript

Lambda calculus to JS, the transpiler everyone was crying out for.

Stephen Leigh
May 21, 2019 · 8 min read
Not pictured: a metaphor for my transpiler

I don’t know why I decided to write a transpiler, but I do know I was expecting it to be much harder than it was. I also don’t know why I decided to learn about lambda calculus, but that was as hard as I was expecting.

This article is about how I wrote a very basic transpiler to try to get to grips with lambda syntax. You don’t need to know anything about lambda calculus or compilation — I’ll go step by step through how I wrote the tool, explaining the weirdness of lambdas as required. The full source can be found here.

A note on terminology: ‘compilation’ refers to translating something into another language, regardless of how ‘high-level’ either of them are. ‘Transpilation’ refers to translating a language into one roughly as high-level as another. I refer to my compiler as a ‘transpiler’, but you could argue that lambda syntax is actually higher-level than JavaScript. I’ll use both terms in this article.

My reason was that it’s a structured way of learning an unfamiliar syntax — one that doesn’t have its own interpreter or anything to interactively play around with, because lambda calculus isn’t a programming model so much as a mathematical system. Other reasons might include writing a DSL for your own specific use case, or even just out of an interest for how the words you write get turned into machine code.

This was taken in Greece, where the letter λ comes from

This could be its whole own article, but I’ll keep it short. Lambda syntax, invented by genius mathematician Alonso Church in the 1930s, is a way of expressing computable functions. In that sense it’s a bit like its own programming language, except one where there are only functions and nothing else. Each function can take only one argument. Functions look something like this:

λx.λy.y

The lambda symbol λ denotes a function, and the dot means that you apply the next thing to that function. Letters without a λ in front of them are arguments. So written in JS, λx.λy.y is the same as x => y => y. Because you can only have one argument per function (in fancy terms, functions have an arity of 1) you end up using currying a lot. (x => y => y)()(5) gives you 5, and it looks like a completely useless function in both JavaScript and lambda calculus (which doesn’t even have numbers anyway), but I promise it isn’t.

With such restrictive rules, lambda syntax is actually the perfect thing to write a compiler for, because there are so few cases to worry about.

Trapped in mathematical abstractions

I wanted something that takes a lambda expression and gives back the equivalent in JavaScript, e.g. going from λx.λy.y to x => y => y . I got there, for basic lambda expressions anyway. Things I learnt:

  • Writing a compiler is a great use case for test-driven development. I’m not really a TDD guy by nature — I suspect that it’s only a thing because it’s so easy to give talks on at conferences — but despite my cynicism, this is a fantastic use case for it. Compiler design should strongly adhere to separation of concerns, and it should always be very clear what each unit’s output should be given an input. Following TDD principles made the whole experience much more straightforward.
  • Writing a compiler is a good learning experience, but you will end up reaching diminishing returns. I could have ploughed on and wrapped up all sorts of edge cases, different lambda syntax styles, more complex functions, and so on. But at some point it’s going to be a case of fiddling around with regex and shuffling logic without learning anything new. I might continue to mess around with it, but I feel like I’ve definitely got the most value out of the experience already.

They can get incredibly complex, but compilers all have three basic parts:

  • Lexer: breaks up an expression into chunks (‘lexemes’ or ‘tokens’). We’ll be passing in a whole lambda expression at once, but to reason about it, we’ll need to separate it into the different lexemes that mean different things.
  • Parser: converts the list of lexemes into a tree of logical operations — an Abstract Syntax Tree, or AST. This sounds complex, but it’s not too bad. The AST is language-agnostic — it bears no relation to the source or target language. This might make more sense when you see one below.
  • Compiler/Transpiler: turns the AST into the target language (JavaScript). This is a case of following the AST logic tree and working out how to change it into JS, depending on what’s happening before and after each node.

Compilers might also describe the grammar of their source language. This is typically written in a format called EBNF, and you can see my example here. I’m not using mine programmatically, but still found it a useful exercise, as I wasn’t entirely confident in how to best reason about the syntax. It also helped when writing the lexer, to determine how to break up the expressions it’s given. The grammar of lambdas is extremely simple, as you can see from my example, but you can imagine how complex it can be for proper programming languages.

Greece again

I’ll go over my first pass, which was to support the simplest test case I could think of: the ‘identity function’, λx.x, which should translate to x => x (a function that just returns whatever it’s given). At this point I couldn’t be bothered to copy and paste the λ character, so I’ve used l instead.

This was super easy, because I’d already worked out the basics of lambda syntax, and the grammar is very simple. Behold my lexer in its entirety:

(str) => str.split(/(?<=\.)/);

Unfortunately there’s no getting away from regex, so I bit the bullet and entered the hell that is positive and negative lookaheads and lookbehinds. This takes the string and returns an array that has split the string on '.' (but keeping it in the preceding element), so our identity function has become [’lx.’, 'x’].

I flip-flopped a lot on how I should actually break up the expression. Should the ‘apply’ . be its own lexeme? Should the lambda symbol l? Ultimately it came down to the simplest implementation that could be parsed, which is what we see here.

This is the most wordy section, partly because I wanted to throw an error if the parser identifies a lexeme as more than one type of thing (which it should never do).

We have the lexer:

And the model:

So to break it down and simplify, the steps are:

  1. Take the first lexeme (in our example, lx.)
  2. If it matches the regex for a lambda, add the lambda parser to our list of functions to execute later
  3. If it matches the regex for a bound variable, add that parser to the list
  4. If the list is empty or has more than one thing in it, something has gone wrong because each lexeme should only match exactly one case. Throw an error.
  5. Parse the lexeme, returning a new instance of Lambda or BoundVariable. For the new instance’s apply field, return the result of another call to parse with the next lexeme (this will be null if this was the last lexeme in the array).

As you’ll see from step 5, we’re using recursion to build up a tree of operations. The end result will be our Abstract Syntax Tree:

{
symbolType: 'lambda',
variable: 'x',
apply: {
symbolType: 'boundVariable',
variable: 'x',
apply: null
}
}

We’re really lucky with lambdas, because the tree will only ever have one branch — you only apply a function to one thing, another function. This removes an enormous amount of complexity. Because the identity function has only two elements (the lambda and the bound variable) this tree is only two deep. We know we’ve hit the end when the next apply field is null.

Now we need to take the AST and turn it into JavaScript.

Trapped in recursion

The transpiler is going to traverse the AST and build up a JS expression:

So it goes down the AST (recursively again), and if it sees a lambda it outputs a string with the associated variable + => and if it sees a bound variable, it adds brackets. These string expressions are all added together, and when it receives an apply node of null it returns an eval() of the built-up expression. The end result is x => (x), which is the identity function!

I am become master of compilation

As you can see from the full source, things got slightly more complex as I added more use cases. I put in a wrapper/controller so the tool could be called with command-line arguments and output a stringified result, which meant I could also add end-to-end tests. The transpiler can also handle unbound variables that are applied to the function, so lx.x 6will return 6, but of course that means adding support for unbound variables and spaces in the lexer and parser. As I added more test cases, the thing I noticed the most was that it only tended to be one or two of the modules that failed, because of the separation of concerns in the structure of the compiler — converting to and from an abstract syntax decouples the translation logic, which is why you do it in the first place.

I’d recommend anyone have a go at creating a compiler. As I’ve hopefully demonstrated, it’s not that hard to whip one up for at least trivial use cases. It’s a great way to learn an obscure grammar like me, or just if you’re interested in investigating the internals of how languages are translated. Maybe I’m naturally drawn more to the esoteric, but I actually enjoyed the process, so let me know if you’ve done anything similar!

The Startup

Get smarter at building your thing. Join The Startup’s +788K followers.

Sign up for Top 10 Stories

By The Startup

Get smarter at building your thing. Subscribe to receive The Startup's top 10 most read stories — delivered straight into your inbox, once a week. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

Stephen Leigh

Written by

Software Engineer with a taste for the obscure. I specialise in Kotlin and JS, but like most things I can use to try to make computers do what I want.

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +788K followers.

Stephen Leigh

Written by

Software Engineer with a taste for the obscure. I specialise in Kotlin and JS, but like most things I can use to try to make computers do what I want.

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +788K followers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store