Minimum-Effort Transpiler: Writing a Passable Programming Language in Half-a-Day

Nate Rush
trymito
Published in
6 min readOct 5, 2020

--

TL;DR: We wrote the critical logic of Mito to support 10x more functionality than we expected while taking 10x less time. Our trick? We didn’t write any code.

Mito: Edit a spreadsheet, generate Python

About a month ago, we launched Mito.

Mito is a spreadsheet plugin for your JupyterLab notebook. You can use Mito to display and edit data in a spreadsheet-like interface.

That’s all well and good — but you can do that in Excel! Mito is special because it takes the edits you make and coverts them into corresponding Python code.

It’s just like recording a macro in Excel — except instead of recording in slimey-grimey VBA, you can record it in pretty-spitty Python!

This generated Python allows you to document what transformations you make to your data, manipulate larger datasets than current spreadsheets allow, and also rerun your analysis on new data. All with the ease of editing a spreadsheet!

We launched with not a ton of code — just a basic proof-of-concept that we could get a spreadsheet displayed in a JupyterLab notebook — and there was a ton of interest in the tool! We quickly reached the top spot on the Excel subreddit for the month.

And so we got to work. Our first step: allowing users to write spreadsheet formulas like they would in Excel.

For example, if cell A1 has 1 in it, a user should be able to write =SUM(A1, 100) in B1, and have the value 101 appear. More over, the formula =SUM(A1, 100) should translate to Python code that can be rerun.

The Mito Spreadsheet Language

To review, here are our goals:

  1. Users should to be able to write Excel-like spreadsheet functions that execute and display results.
  2. These same spreadsheet functions should be transformed into Python code, so users can reuse their analyses (as well as have an beginner-friendly way of writing Python!).
  3. Given that we’re a JupyterLab extension, this execution and compilation should all occur in Python as well.

So: how do we go from our as-of-yet-underspecified spreadsheet programming language to Python?

Writing a Transpiler

The solution to our woes, we quickly realized, was the fabeled transpiler. Before we explain our transpiler, a little background on related and relevant technology.

Compiler ⇒ Transpiler

Before understanding a transpiler, it’s helpful to understand it’s sister: the compiler. A complier is a tool that you’ve probably heard of; it’s a program that takes source code as input, and returns executable machine code as output.

Internally, it performs a couple steps to transform source code to executable machine code. Highly simplified, you can understand the compiler pipeline as:

  1. Take the source code as input, and use the parser to transform the program string into a “tree” that represents the program. This tree is known as an abstract syntax tree (AST).
  2. For example, the valid Python program 10 + 11 could be represented as the following abstract syntax tree (note that BinOp is the Python representation of any operation that has something on either side, e.g. +, -, ?, *, ...) :
A Python Abstract Syntax tree for the expression “10 + 11”
  1. After making sure this tree is all sorts of valid (e.g. all +'s have numbers on both sides of them), the compiler transforms this tree into an "intermediate representation" that is easier to manipulate in the final step.
  2. Finally, the compiler takes this easy to manipulate “intermediate representation” and transforms it into machine code that can actually be executed by a real, live computer.

As is noted above, a transpiler can be understood as the sister of the compiler.

Where a compiler takes source code and transforms it into machine code that a computer can execute, a transpiler takes source code from one programming language and transforms it into source code from another programming language.

In our case, we wanted to transform source code of our as-of-yet-unspecified spreadsheet programming language to valid Python.

Approach One: Baby’s First Transpiler

We started where we always do: completely inventing our own architecture with the little bits of background knowledge we’ve collected from partially-read Hacker News blog posts.

Our planned first approach:

  1. Write a parser that would convert our spreadsheet programming language to a parsed tree.
  2. Map each type of node in the tree to a Python term. E.g the function ADD would correspond to a + b , etc.
  3. Perform some tree-walking on this tree, and using the above specified map, output valid Python expressions code.

Before starting implementation, we decided we should read at least one thing about transpilers in the wild. Maybe we’d get some cool ideas…

Approach Two: Robust? We’ll go bust.

We luckily stumbled across this amazing How to Write a Transpiler instructional essay — and learned a whole lot real quick. Then we found this gem:

Now, there is a single biggest mistake we see in persons trying to implement a transpiler without experience in this: they try to generate directly the code of the target language from the AST of the original language.

Shit. Ok. There goes Baby’s First Transpiler. Time for a new compiler approach, heavily based on the instructions given in the above essay:

  1. Write a formal grammar for our still-goddamn-unspecified spreadsheet programming language.
  2. Use this formal grammar to generate a parser for our spreadsheet language, that could then parse any term to a parse tree.
  3. Transform the parse tree to an abstract syntax tree (still of our spreadsheet programming language) — to get rid of unnecessary language details.
  4. Convert the spreadsheet AST into an intermediate AST representation, for easier manipulation in later steps.
  5. Convert this intermediate AST into a Python AST.
  6. Finally, convert this Python AST into valid Python code.

If a lot of the above sounds a bit, um, esoteric and expensive to you — it did to us as well. And we’re also certainly missing some steps!

Approach Three: Give Up.

Deeply upset that we were going to have to spend a few weeks implementing a grammar for our language, debugging the grammar, writing a parse tree to AST converter, an AST to AST to AST converter, and finally a Python de-compiler, we went back to the drawing board.

All we were looking to do is to convert a single term in our this-is-harder-than-we-thought spreadsheet programming language to a term in Python.

Why’d it have to be so complicated?

Dramatic Simplifications

That’s when it hit us: given that our spreadsheet programming language isn’t specified yet, let’s just make it as easy to transpile as possible.

This idea is greatly supported by the following observation: the spreadsheet programming languages we’re all used to using anyways are super close to valid Python. For example, consider the following example spreadsheet function

B1 = LEFT(A1, 3)

This is just setting the cell B1 equal to the first three letters from the string in A1. But what would this look like in Python?

B1 = A1[0:3]

But… what if we just implemented a function LEFT?As input, it took a string an integer, and just sliced the string. Then, our transpilation would look like:

def LEFT(string, length):
return string[0:length]
B1 = LEFT(A1, 3)

To transpile this function, we didn’t have to change anything at all. Our lives just got a lot easier.

New Specification

With this simplification of keeping our spreadsheet language as close to Python as we possibly could, we had a new approach to the tranpiler:

  1. Do nothing.

Our Final Transpiler

At the end of the day, our transpiler is just a bit more complicated than what is specified above.

For one, to know what order to place spreadsheet evaluation statements in, we have to go though and find what other cells are referenced.

We also want good error reporting — so we check what functions you use and make sure we actually have them implemented in Python. If not, we let you know we don’t support that function yet!

It’s not the world’s most robust compiler, but it took about 3 hours to write, and it’s worked great in practice so far!

Lessons Learned

The first lesson: building a “full” transpiler ain’t the move for Mito, right now. It’s way more expensive than we can justify.

By returning to our goal, and critically thinking about what we were aiming for, we realized a small compromise could save us weeks of work.

Without this realization, there’s no way we would be launching next week. Thank god “the largest technical challenge” in the initial version of Mito only took an afternoon :)

Getting in Touch

If Mito sounds helpful to you, or you’re just interested in checking it out, sign up on our waitlist.

If you want to hear more about our technical work, follow us on Medium and on Twitter!

Bye, Mito!

--

--