A gentle introduction to Compile-Time Computing — Part 1
I am finally sitting down writing about the aspect of software
engineering that is most important to me.
It is compile-time computing. Compile-time computing means that the code you later compile isn’t just what you wrote down. What you compile later is also code that is “written” by your code. Code generating code. Macros in a way, but not messy tag-on macros. In Lisp you even use the same language for both, enabling to re-use all your previous work for computation at both compile time and run time.
Now what’s the point? If done correctly, compile-time computing allows you to keep every single assumption that you make during programming in a single place.
Every time you leak one single assumption into more than one single place you create a problem when you later change your mind about that assumption. I think everybody here knows that a reasonable, complete specification of a program before it is written has never happened and never will. Changes are the nature of software development, and they start a long, long time before the program makes its first release. You need to be able to change your mind about assumptions and have a single central control knob for every single one of your old assumptions. You want to change one place and have it propagate through your entire system, from that one assumption you change in that one place. Only one human-written, human-maintained place for each assumption.
Small print first: Editing help from @dcooper8. Thank you so much. This was written when I still worked for Google on QPX, which is not the case anymore. This article is an important part of my writing on programmer’s focus and attention management, also see https://medium.com/@MartinCracauer/on-attention-focus-and-autism-in-the-tech-workplace-8246526fbbc0
Let’s first have a look at what other languages are doing:
Compile-time computing was a bit in fashion in the C++ community a couple years ago, when templates became powerful enough to do some limited amount of compile-time computing (which I will abbreviate as “CTC” from here on). It didn’t go very far due to the severe limitations of the compile-time programming language in C++ (a language that only has one data type — C++ types — and only one control construct — recursion — and it has no concept of collection types and repeatedly iterating over them). I ended up disappointed. I am still very glad it happened because it put CTC on the map in many people’s mind.
Other forms of CTC include the macros from C and C++’s preprocessor, various forms of m4/m5 usage, and the D language, all of which are also limited in both their expressive power and in their ability to interact with the main language. For my blog series I will use Common Lisp (“CL”), because it is the only language right now that has fully developed compile time computing. I am aware that this makes it a bit harder to have a large audience follow this. While CL has problem points even in my opinion, it is what we have for compile-time computing. Whatever CL’s shortcoming might be, having used CTC for decades I will not do without it. Nothing else will allow us to write code that we will still voluntarily maintain in 30+ years. Everything else is write-once and then there either is “the pain of maintaining The Legacy System” or a constant stream of re-writes with the associated second-system effect.
In Common Lisp, compile-time computing uses the same language for compile and run time. Every function that you write in CL you can invoke at runtime (like in a “normal” program) or at compile time (to generate or modify other code based on a concentrated specification).
Want to follow with code?
Now, for the reader who wants to follow examples, this requires a small amount of set-up work. You have to get a Common Lisp (CL) environment to try this out. A fairly minimal set of things will do:
1. You must have a code editor that shows parenthesis that match (when typing a closing parenthesis it will highlight the matching open one). You must have this. Everything else is unusable for Lisp. It should also have automatic indentation if you can get it.
2. you should have a Repl (commandline) that allows you to repeat and edit commands. You can either use a CL implementation that uses GNU readline for its input (such as CLISP), or you can use an implementation like SBCL inside an Emacs shell buffer
3. you can set up SLIME, which is a real IDE inside Emacs. My examples will be simple enough that you will not need this. On the other hand, Slime might help to play with debuggers and explore the language.
4. if you want to use any third-party libraries (not required for my examples, but good for personal experimentation), look into Quicklisp
I recommend SBCL inside Emacs. SBCL is a high-performance implementation with a compiler that allows you to generally write overhead-free code; that means the compiled code can run as fast as C code (exceptions and hacks apply, see my conference talks about how ITA uses SBCL to be fast). It also has a good set of compile-time warnings, including type mismatch warnings, and a thriving community which does active development and helps people. SBCL is available for Linux, the BSDs, on various processors including ARM running on the Raspberry Pi, OSX and even Windows. [Note from editor — You can get free pre-built install packages with Emacs, Slime, and Gendl (built on CCL but with plans to offer SBCL as well), for Linux, Mac, and Windows, from gendl.org]
Alternatives: various efforts have been made to have mechanisms related to Lisp macros in other languages — — usually Lisp derivatives that try to make Lisp look more accessible by dropping the parenthesis-based syntax. I haven’t found that they allow full compile time computing in the sense that I will outline here. It is always a “we implemented the most common uses of CL macros” and I find that that they usually drop what I need to do. I would be very thankful of readers would contribute implementations of what I will show here in Dylan or any of the JVM based Lisp derivatives. For now I will stick with CL. Don’t get me wrong. Even in my opinion CL can get tough to read once do you clever enough macros. But if you take into account what kind of expressive power those macros provide, and how much more maintainable they make things compared to spreading facts all over the source code, it looks reasonable. I would be all for having an even better language with full-featured compile time computing.
Lisp macros can also enhance readability significantly. One early example is that Lisp compile-time computing allows you to create any literal object you want — without needing compiler or parser support for it. If you want a literal object in your source code that is baked into your binary that is a hashtable pointing to a bunch of arrays with complex numbers — not a problem. Even if the language doesn’t have syntax for the literal you want yet, you can add it thanks to compile-time computing. You will have to get over (into)the parenthesis thing, though, and for that you need an editor with parenthesis matching display and ideally indentation support for Lisp.
Last but certainly not least, there is also Diff Efficiency. What the heck is “diff efficiency”? It means that if you look at a
git diff of your code versus a different state of your code, then you only see the actual functional change. Not a lot of boilerplate. Not a lot of copy and pasted slight variants of that functional change. If you use syntax highlighting then it will work correctly, because you don’t embed an alien language inside your primary language’s source code. One assumption, one place. One assumption changed, one place changed.
(side note: IDEs that “help” you do large boilerplate changes automatically but then do not re-collapse them when you want to see diffs deserve to be burned at the stake)
Compile-time computing allows you to keep every single assumption you make during programming in one place, greatly enhancing changeability.
Programmers actually want to do that — but irreducible repetition frustrates them greatly. Our tools suck.
Common Lisp is the language I currently use for full compile-time computing.
Inside the main source code, you will make little “custom” mini-languages to concentrate information in a compact space. Inside the main source code. Without going to a different syntax inside special markers. You will do so without needing an external program with a parser, much less by having to generate code from Python. Those mini-languages are just defining ways to express what you want in the source code that are most convenient for you to tell them. Macros then make them usable without having to modify the compiler or invoke any external tool. As a side effect, it greatly keeps down the amount of boilerplate your program requires. Boilerplate is bad for later changes since it blows up the size of diffs generated from changes and you can’t see “the point” about a given diff because it’s all full of interface adjustments that have no impact on the program’s flow from this change.
About myself: I got introduced to Common Lisp by Rainer Joswig @RainerJoswig (thank you so much!) at University of Hamburg’s AI department ca. 1992 (I studied Physics, but who’s counting?). At the time I had looked for the solution to all these problems in Smalltalk. That was a first-class introduction using a Symbolics Lisp Machine running Genera (of which I would get a couple later) and the famous IDE that Macintosh Common Lisp was at the time. I started using CL and FreeBSD at work, picked CMUCL and eventually did the work to open source CMU’s Common Lisp — another thank you to Scott Fahlman @ScottFahlman. CMUCL later developed into SBCL. In 2000–2001 I was hired by ITA software to make their QPX search engine run stably on the open source CMUCL, for the first commercial QPX customers. (also a heartfelt thank you to everybody at Orbitz from me. It was an honor working with you)
Fact is that the maintainability and ability-to-change-your-mind brought to us by Lisp was a key factor to the success of ITA. If you work in the airline pricing field you really need the ability to change your assumptions, on what all those funny fare rules (turing-complete rules no less) actually mean. ITA’s success led to being bought by Google, where I am still today (update — nope). Still fighting with the same codebase 15 years later and it’s still not clear who’s winning (no, really, I love QPX).