Six Months With Julia: Parse-time Transpilation in 80 Lines or Less

Image for post
Image for post

In 2019, during the month of November, I picked up Julia while looking through the available languages on Exercism. I wrote an article talking about some of the neat features I got to play around with, and a fair number of folks in the Julia community saw it. I started at an internship position writing Julia code soon after, and have been happily plugging away since.

Today, to celebrate half a year of learning from the kind, whip-smart folks at my job and on the Julia Slack, I’m going to show you a fun little weekend project I’ve been working on. It lets you run Brainfuck code inside a Julia file without using a runtime interpreter. It does this by transpiling a Brainfuck program into a Julia function prior to compilation. It’s tiny, clocking in at less than 80 lines of code. It’s also extremely funny to show to people:

Image for post
Image for post

Let’s look at how it works — and see what we can do with it.

Table of Contents

  1. Understanding Brainfuck
  2. Julia Code is a Data Structure: Parse-time Code Generation
  3. Lexing and Parsing Brainfuck with Multiple Dispatch
  4. Putting it all Together
  5. Gaze Into the Abyss: Nesting Programs Inside Programs
  6. FAQ & Resources

1. Understanding Brainfuck

Image for post
Image for post
You are but a little baby. Watch this.

Brainfuck is an esoteric programming language that’s famous for its extreme minimalism and ha-ha-funny-swear name. Without getting into the nitty-gritty of handling size restrictions or data interpretations, here’s a quick rundown of how a program written in Brainfuck works:

  1. First, initialize an array of bytes, each with a value of zero, with a reference that points to the first element in that array. The program has input and output points for reading and writing bytes of information (usually, these are stdin and stdout).
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..., 0] 
^
|
Reference

2. Read the instructions from left to right. They are the following:

  • > and < move the reference to the right and left by one space.
  • + and - increment and decrement the referenced byte by one.
  • , reads a byte from the input and sets the referenced byte to its value.
  • . writes the referenced byte to the output.
  • [ is a conditional: if the referenced byte is zero, jump to the matching ] instruction and continue the program. Otherwise, continue to the next instruction as normal.
  • ] is also a conditional: if the referenced byte is not zero, jump to the matching [ instruction and continue the program. Otherwise, continue to the next instruction as normal.

3. That’s it.

Astute readers will notice that [ and ] combine to form a while loop. For example, a set of instructions like [>+] would roughly translate into the following code:

while memory[index] != 0
index += 1
memory[index] += 1
end

Despite its stark minimalism, Brainfuck is Turing-complete without restricted memory, and while it’s extremely hard to program in, writing interpreters for its eight instructions is a trivial task that makes for a fun “learn a new language” project after you get your Hello Worlds and your FizzBuzzes out of your system. Interpreters are already out there for Java, C++, Python, Haskell, and, yes, Julia.

So why didn’t I make an interpreter, instead of a transpiler? The answer is that many of Julia’s built-in language features allow us to arrive at some more elegant solutions. Let’s talk about metaprogramming.

2. Julia Code is a Data Structure: Parse-time Code Generation

Image for post
Image for post
This is your brain on macros.

Depending on who you ask, Julia either descends from Lisp or is secretly Lisp in disguise (shhhh, don’t tell anyone). One of the more notable ways in which Lisp gives a programmer control over their code is through macros, which allow Lisp programmers to generate code in any number of ways, from implementing list comprehension syntax to generating entirely new domain-specific languages. Metaprogramming, as the name suggests, is programming the program itself, and in Julia, we have access to tools that can give us some of the power that those funky Lisp wizards have been conjuring for decades.

At its core, a Brainfuck program is a single function with an input and an output, an allocated space for memory, and a set number of instructions for manipulating that memory. To translate a Brainfuck program into Julia code, we need to define such a function and replace the Brainfuck program with the transpiled function definition at parse time.

Here’s our first attempt:

Let’s break this down.

  1. We suffix bf with str to take advantage of Julia’s built-in support for non-standard string literals. Regular expressions, for example, can be built using the prefix r , allowing for concise representations like r"\s+" in pattern matching routines. Here, this will allow us to embed Brainfuck commands inside a string literal. For example: bf">>+>>+[-<]"
  2. We wrap our function using quote block syntax to ensure that we are returning the code itself, instead of the evaluated function. If our function was f(x) = 2 * x, for example, and we wanted to generate the expression f(3), this ensures that we’d return f(3) itself, rather than the evaluated value 6 .
  3. We define our function anonymously, so that it can be assigned as a variable in the context of something like foo = bf"->+++,<." , which might be called later as foo(; input=stdin) or something similar.
  4. Per the original distribution of the language, the memory size defaults to 30000 cells but is configurable for programs requiring larger spaces.
  5. We let the input be any kind of abstract string in addition to an IO source for user convenience. Internally, the string is converted to an IO source.
  6. esc is a way of removing macro hygiene. Since our variables are all defined within a local scope, there’s no need to worry about variable scoping conflicts, so we don’t scope our variables to the module we’re define our macro in. Don’t do this unless you’re absolutely sure it’ll work without unintended side effects.

After all this, we have a macro that generates a function. That function initializes the Brainfuck program’s memory space, and then returns nothing. We can generate the expressions for a Brainfuck program once we understand how to insert them into our function expression. This is easy, because Julia code is a data structure. Let’s dig into Expr.

Julia expressions have a head, which is a Symbol denoting the expression’s type, and args , a variable number of of other objects. Take this while loop:

while i != 0
i -= 1
end

Running dump on this expression lets us see how Julia sees this syntax.

julia> dump(:(while i != 0; i -= 1; end))
Expr
head: Symbol while
args: Array{Any}((2,))
1: Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol !=
2: Symbol i
3: Int64 0
2: Expr
head: Symbol block
args: Array{Any}((2,))
1: LineNumberNode
line: Int64 1
file: Symbol none
2: Expr
head: Symbol -=
args: Array{Any}((2,))
1: Symbol i
2: Int64 1

We see that at the top of the tree there’s a while symbol denoting the start of a loop construct. From there, the args are a conditional expression and a block expression, which is used to construct multi-line blocks of statements.

Let’s run dump on our quoted function. I’ve abridged the function definitions for space and added some hints for things to look for.

julia> dump(quote
function (; <parameters>)
<memory initialization>
<IO initialization>
end
end)
Expr
head: Symbol block
args: Array{Any}((2,))
1: LineNumberNode
line: Int64 2
file: Symbol none
2: Expr
head: Symbol function
args: Array{Any}((2,))
1: Expr
head: Symbol tuple
args: Array{Any}((1,))
1: Expr
head: Symbol parameters <-- <parameters>
args: Array{Any}((4,))
1: Expr
2: Expr
3: Expr <we want to insert expressions here>
4: Expr |
2: Expr |
head: Symbol block v
args: Array{Any}((4,)) <function body>
1: LineNumberNode
line: Int64 4
file: Symbol none
2: Expr <----- <memory initialization>
head: Symbol =
args: Array{Any}((2,))
1: Symbol memory
2: Expr
3: LineNumberNode
line: Int64 5
file: Symbol none
4: Expr <----- <IO initialization>
head: Symbol =
args: Array{Any}((2,))
1: Symbol characters
2: Expr

Look at that! We have an expression tree, and we know exactly where the function body is! It’s a bit ugly, but we can now access the function body at

last(last(brainfuck_function.args).args).args

Our code is just a data structure, and now, all we have to do is feed it expressions that are equivalent to the Brainfuck instructions we’ll be receiving in our program. For that, we’ll need to lex and parse our Brainfuck code into a series of expressions.

3. Lexing and Parsing Brainfuck with Multiple Dispatch

Image for post
Image for post
Rare image of a higher-dimensional dispatching to multiple realms

Now that we know the fundamental structure of the function our Brainfuck program will translate to, we will now need to actually translate those instructions. Thankfully, Julia’s type system and multiple dispatch have set us up for success here. Using dispatch on value types, we can directly map the instructions to individual instances of instruction. First, we convert the program string into a vector of Char structs, and then convert each of those Char values into a Symbol, which we can then dispatch on using lex(Val(symbol)). First, we define our instruction types, our characters, and our op codes.

Then we zip the two const lists and iterate through them, generating an instruction and a relevant lex function for each op code:

For example, for the second iteration in this loop, we have (:-, :Decrement), which generates the following code:

struct Decrement <: AbstractBFInstruction end
lex(::Val{:-}) = Decrement()

What if we run into a symbol that doesn’t match an instruction? For now, we assign it to a fallback type, which won’t inherit from AbstractBFInstruction. Finally, we dispatch lex on an iterable of Chars, which, in turn, dispatches lex element-wise on each character in the program:

Great! Now we have a list of instructions. The only thing left to do before we put this all back together is to parse the instructions into a collection of expressions that we can insert back into the function body. Here’s the entire parsing routine:

After initializing our expression and loop stacks, this function processes each instruction in two steps: parsing the next instruction and adding the result to our list of expressions.

  1. Parsing

Are we ending an existing loop? Let’s check if we actually started a loop — and if we haven’t, let’s throw an error, because that means our program isn’t valid. On the other hand, if we have started at least one loop, we can pop the most recently added loop expression from the loop stack and use that for step 2.

Otherwise, let’s parse our current instruction into an expression. Here, multiple dispatch saves the day again, by giving us a concise way to map each instruction to an expression:

2. Figuring out where to push the expression

Is our instruction the start of a loop? No problem — we can just push it onto the loop stack. Otherwise, let’s check if the loop stack is empty. That means our expression is at the top level of the function’s local scope, and we can just push it into the expression stack instead. But what if we’re in the middle of a loop? Well, remember when I dumped a while loop expression earlier on in the article? Let’s look at that again:

julia> dump(:(while i != 0; i -= 1; end))
Expr
head: Symbol while
args: Array{Any}((2,))
1: Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol !=
2: Symbol i
3: Int64 0 <we want to put our expressions here!>
2: Expr |
head: Symbol block v
args: Array{Any}((2,)) <---- <while loop body>
1: LineNumberNode
line: Int64 1
file: Symbol none
2: Expr
head: Symbol -=
args: Array{Any}((2,))
1: Symbol i
2: Int64 1

Hey hey! We’ve found the body of the while loop. Pushing our expression into the body of the most recent while expression in our loop stack will always place our expression into the correct scope, since any loops that have already ended have been popped off the stack and pushed into the expressions list by now. With that, we’ve processed all our instructions and are ready to send them to the function body.

4. Putting it all Together

Say hello to GalaxyBrain.jl:

Image for post
Image for post
My programs are too strong for you, traveler. You will want to find a program seller with weaker programs.

Let’s finalize our macro call and get some examples running.

Some key additions:

  1. Filter out any instructions that aren’t valid Brainfuck instructions.
  2. Locate the function body and append our instructions there.
  3. Add return nothing because I’m a sucker for the YASGuide.

To test that we’re returning a complete function definition, let’s make a program that uses each instruction at least once. Using @macroexpand, we can see that we’ve generated the correct function:

julia> @macroexpand bf">,<-[+>[<.]]" <not a meaningful program>
quote
function ($(Expr(:parameters, <formatted for medium>
:(input::Union{IO, AbstractString}=""),
:(output::IO=stdout),
:(index::Int=1),
:(memory_size::Int=30000))),)
memory = repeat([zero(UInt8)], memory_size)
characters = if input isa AbstractString
IOBuffer(input)
else
input
end
index += 1
memory[index] = if eof(characters)
memory[index]
else
first(read(characters, 1))
end
index -= 1
memory[index] -= one(UInt8)
while !(iszero(memory[index]))
memory[index] += one(UInt8)
index += 1
while !(iszero(memory[index]))
index -= 1
print(output, Char(memory[index]))
end
end
return nothing
end
end

And… that’s it! A fully embedded Brainfuck-to-Julia parse-time transpiler, in less than 80 lines. Let’s test it out.

We can declare buffer = IOBuffer() and pass it as an output point to our program, which will let us test if the program is giving the correct output. Let’s start with a “Hello World” program from the Wikipedia examples section.

Side note: since the memory array initializes every value to 0, Brainfuck programs will always skip over the first loop instruction, which lets us use enter any characters we’d like inside the loop. “Comment headers” are commonplace.

Now let’s give a program some input and see how it does.

Cool! But can we go deeper?

5. Gaze Into the Abyss: Nesting Programs Inside Programs

Image for post
Image for post
This is Olympic-level navel-gazing.

This program is not a runtime interpreter, and I made that decision early on because I believed that there were ways to translate Brainfuck that played to Julia’s strengths as a language and made for a unique final product. But what if we decided that we actually did want to write an interpreter for Brainfuck in Julia? We could rewrite our code pretty easily to do just that, but a few days ago, I had a thought:

Has someone written a Brainfuck interpreter in Brainfuck?

Yes — someone absolutely has. Here’s the final flourish: a Brainfuck interpreter, written in Brainfuck, transpiled to Julia, processing its input, which is itself a second Brainfuck program and its input.

I have nothing else to add. This is an extremely silly package.

6. FAQ & Resources

FAQ

What are the transpiler’s constraints?

I tried to strike a balance between staying faithful to the language’s original constraints (default memory size, reading input on a per-byte basis) while fleshing out some options for convenience and flexibility (configurable IO/sources, adjustable memory size, interpreting bytes as unsigned instead of signed, and so on). This means that program I/O is generally restricted to ASCII characters. Additionally, I do not implement any kind of memory bounds handling, opting to throw an error if the program heads out of bounds — it’s up to the function’s caller to allocate a large enough array to run their function. These were trade-offs that I felt comfortable making for the extent of this project.

What’s your favorite take on the interpreter?

I’m personally a huge fan of the Rust implementation, which takes advantage of Rust’s strengths as a language with respect to pattern matching.

Why didn’t you do any optimization strategies?

I didn’t need to — since the function is generated prior to compile time, Julia’s compiler is smart enough to do many of the optimizations in the article for me. The first two examples involve consolidating groups of similar instructions into a single instruction. Here’s Julia’s compiler doing exactly that:

julia> a = function (x); x += 1; x += 1; return x; end
#13 (generic function with 1 method)
julia> @code_llvm a(3); @ none:1 within `#13'
define i64 @"julia_#13_17607"(i64) {
top:
; ┌ @ int.jl:53 within `+'
%1 = add i64 %0, 2 <-- <one instruction, not two>
; └
ret i64 %1
}
julia> b = function (x); x[1] += 1; x[1] += 1; return x[1]; end
#19 (generic function with 1 method)
julia> @code_llvm b([1, 2]); @ none:1 within `#19'
define i64 @"julia_#19_17627"(%jl_value_t addrspace(10)* nonnull align 16 dereferenceable(40)) {
top:
; <miscellaneous indexing instructions>
oob: ; preds = %top
<bounds checking instructions>
idxend8: ; preds = %top
%8 = bitcast %jl_value_t addrspace(11)* %1 to i64 addrspace(13)* addrspace(11)*
%9 = load i64 addrspace(13)*, i64 addrspace(13)* addrspace(11)* %8, align 8
%10 = load i64, i64 addrspace(13)* %9, align 8
; └
; ┌ @ int.jl:53 within `+'
%11 = add i64 %10, 2 <-- <again, one instruction instead of two!>
; └
; ┌ @ array.jl:782 within `setindex!'
store i64 %11, i64 addrspace(13)* %9, align 8
; └
ret i64 %11
}

The optimization article is absolutely still an amazing read, and you should learn similar strategies for any program you write — I just happened to not need it and also had the tools to verify that.

Couldn’t you just return the expression with `lex` instead of `parse`? Why not do that instead?

That one was more of a stylistic choice. I found that my code read the clearest when I mapped each symbol to a specific instruction, and then that instruction became an expression. The intermediary representation makes the parse_instructions function read extremely straightforwardly, letting me use isa, which is great for helping your code read like prose.

Why?

It was there, no one else had done it, and it seemed like fun. I didn’t start programming for the joy of it — like many folks who did a CS undergrad, I did it in the hopes that I’d have a sustainable career. That’s a totally valid reason to enter the field, I think, but it can be helpful to find joy in programming along the way, even if that’s not why you started.

Image for post
Image for post

Don’t tempt me.

Resources

If you want to check out some more fun examples and see the full source, here’s GalaxyBrain.jl one more time.

Thanks for reading. Here’s where you can find me:

Github

LinkedIn

YouTube

Written by

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store