Writing a Compiler for Brainf*ck
For my Software Systems class, we’re learning about C, operating systems, and all of those fun low-level things. For this sprint, my team and I decided to each pick a topic and do research and a mini-project. I chose to look into compilers, and ended up writing a small compiler for brainf*ck.
I wanted to learn more about compilation without having to deal with reading source for something incredibly complicated like gcc. To do this, I decided to write my own compiler for one of the most simple languages out there — brainf*ck!
If you don’t know, brainf*ck is an esoteric programming language with only 8 commands. Its basic style is to have a single instruction pointer that can move across a “tape” of memory blocks and increment or decrement each. It also has loops and i/o. This program in brainf*ck prints “Hello World!”:
So I wrote subfusc. Named as such because because it’s a word that contains the letters “b”, “f”, “c” in order, and because it means “dull, gloomy, and devoid of appeal”, and while this compiler was A LOT of fun to write and figure out, it is devoid of appeal for use in any real situation.
How I Wrote It
I was able to write it without knowing assembly by simply translating brainf*ck commands into C code. Then, I’d compile the C code with gcc -S and look at the assembly output, and copy the relevant lines into my subfusc code.
Brainf*ck works by moving a cursor along a tape of memory. So in C, we can have an empty array to be our tape. A pointer to that array points to the memory address of  in the tape. And incrementing/decrementing would just be changing the array values, and moving the cursor is just incrementing/decrementing the pointer value.
The following example shows how I started translating some of the simpler brainf*ck commands into their corresponding C code:
char tape; // the tape
char *i = tape; // pointer to the array - the cursor
(*i)++; // increment the value - in bf, +
(*i)--; // decrement the value - in bf, -
i++; // increment the pointer - in bf, >
i--; // decrement the pointer - in bf, <
The development of subfusc was heavily influenced by the naegleria compiler, which was written in PHP. I used the same process as illustrated in the linked blog post, since I don’t actually know how to write assembly code, but went through the process on my own to understand the C translation of each brainf*ck command.
First, compile the C program into an executable
$ gcc subfusc.c -o subfusc
Then, run the program with the brainf*ck code as arguments. The argument structure of the program is as follows:
$ ./subfusc [brainf*ck source code] [target assembly file] [target executable file]
If you don’t specify a target executable file, subfusc will create the assembly code, but you will have to assemble it yourself by running:
$ gcc [assembly file] -o [executable file name]
For example, the following lines will compile, assemble, and run the Hello World brainf*ck program (assuming you have compiled the subfusc source):
$ ./subfusc examples/hello.bf hello.s hello
subfusc does absolutely no optimization — it goes through each command at face level without regard to what comes before or after.
subfusc does implement minimal syntax error checking — it takes two passes through the program, and on the first checks solely for syntax errors. Currently, the only syntax error it will detect are mismatched loop brackets. The following are all examples of mismatched loops it will catch:
and so on. If subfusc finds a syntax error, it will halt compilation and print an error message to the screen. In some cases, it will be able print the character location of the mismatching error.
Another limitation is that subfusc is a compiler in the most pure sense of the word — it compiles source code to assembly code, but does not perform the extra steps such as assembling, which programs like gcc typically include.
Link to the GitHub repo, which contains the source code as well as some example brainf*ck programs to compile and run.
Written October 4, 2015