Configuration files suck.


Just use a programming language.


Every configuration file introduces a morass of unknown syntax, unknown semantics, poor debuggability, poor documentation, poor maintainability, insufficient abstraction, and insufficient generalization.

But there are multiple alternatives to the configuration file, and they might work better. Compilation options move configuration from run time to compile time. ‘Librification’ is a more general version of compilation flags. Runtime ‘configuration programs’ are a more general version of run-time configuration files.


What is a configuration file? Conceptually, it’s an argument to a generalized service, which yields a specific program useful to you. For example:

apache : Configuration -> HttpServer

Apache isn’t an HTTP server; it’s a function that yields HTTP servers. Let’s say we own a website that sells magic beans. The configuration file might sit at /etc/apache2/magicbeans.conf and have content like

Listen 80
NameVirtualHost *:80

<VirtualHost *:80>
DocumentRoot /var/www/magicbeans
ServerName www.magicbeans.com

</VirtualHost>

When we want to launch our site, we would then run something like apachectl -f /etc/apache2/magicbeans.conf. Apache would start, read the configuration file, and see that it’s meant to serve files from /var/www/magicbeans when it receives requests for resources at http://www.magicbeans.com/.


The Apache project chose to represent the Configuration argument as a set of files. The apache function is represented as a start-up phase, which is given the path to those files to parse and compile, and the yielded HttpServer is represented as the main phase of the program, to which the start-up phase transitions.
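
To make the function view concrete, here is a minimal Python sketch of the same shape. Everything in it is my own simplification for illustration: the Configuration fields, the apache function, and the idea that an ‘HTTP server’ is just a function from (host, path) to the file it would serve. None of it is anything Apache actually provides.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Configuration:
    # Hypothetical fields mirroring the magicbeans.conf example above.
    server_name: str
    document_root: str
    port: int = 80


# Toy model of a server: (host, request path) -> file path to serve, or None.
HttpServer = Callable[[str, str], Optional[str]]


def apache(conf: Configuration) -> HttpServer:
    """Specialise the general service into one concrete server."""
    def serve(host: str, path: str) -> Optional[str]:
        if host != conf.server_name:
            return None  # not one of our virtual hosts
        return conf.document_root + path
    return serve


# Applying the function to a configuration yields the server we actually want.
magicbeans = apache(Configuration("www.magicbeans.com", "/var/www/magicbeans"))
```

Calling magicbeans("www.magicbeans.com", "/index.htm") gives back the path under /var/www/magicbeans; any other host gets None. The point is only that ‘configure, then run’ is function application.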

But in computing, we have many ways to represent functions and arguments, and this ‘start-up configuration file’ scheme is just one of them. I will outline the alternatives and suggest that they can be simpler and more powerful.

1. Compile-time configuration


Apache also has many compile-time options. These are set using a script called ./configure. These options overlap with the options that can be set using configuration files at run-time.

In this scheme, the Configuration is represented as arguments to a ./configure command, the function is the make command, and the yielded HttpServer is the compiled binary.

Theoretically, it should be possible to move all Apache .conf files to the compile phase, so that they are passed to the build system, which yields a binary that starts up immediately in the ‘main phase’ as an HTTP server. Rather than running apachectl -f /etc/apache2/magicbeans.conf at run-time, we would run something like ./configure -f /etc/apache2/magicbeans.conf -o magicbeans && make && ./magicbeans.

One advantage of moving from run-time to compile-time configuration is simplicity: where the distinction between multiple configuration phases was a bit arbitrary, you now have just a single configuration phase.

Another advantage is that the run-time binary does not have to carry around baggage for its configuration. Where the apache binary on every application server used to carry around logic for parsing configuration files, compiling configuration files, loading and unloading modules, etc., you now just have a single magicbeans binary which does exactly what you want and no more.

This also means that configuring your application server is easier. Where you once had to ensure that all the necessary .conf files were present on every application server, and that Apache was restarted every time they changed, you now only have to ensure that the magicbeans binary is present.

This shift from run-time to compile-time configuration also has a benefit similar to static typing. It’s better to catch errors in your configuration files at compile time than wait until you’re deploying.

We could go further. If you squint, your static files under /var/www are more configuration files. The file /var/www/index.htm is a configuration rule that says ‘when you get a request for /index.htm or for /, serve this content’. This rule could be bundled into your binary at compile time, too. With all these configuration options, the question is how ‘static’ you want things to be — bundling your static content into the binary makes it more static, in the sense that you can no longer hot-swap that file without restarting the application.
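
As a rough illustration of treating static files as rules, the Python sketch below reads a document root once, at ‘build’ time, and turns it into an in-memory routing table. The bundle function and the choice to alias / to /index.htm are assumptions of mine, made only to mirror the example in the previous paragraph.

```python
from pathlib import Path


def bundle(root):
    """Read every file under root into a table of URL path -> content."""
    root = Path(root)
    table = {}
    for file in root.rglob("*"):
        if file.is_file():
            # /var/www/a/b.htm under root /var/www becomes the rule "/a/b.htm".
            url = "/" + file.relative_to(root).as_posix()
            table[url] = file.read_bytes()
    # The rule from the text: a request for / serves the same content as /index.htm.
    if "/index.htm" in table:
        table["/"] = table["/index.htm"]
    return table
```

Once the table is built, the files on disk are no longer consulted, which is exactly the trade-off described above: more static, and no hot-swapping without a rebuild.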

So why does Apache lean towards run-time configuration files? I suspect it’s historical. In times past, our application servers were multi-user UNIX systems running many different websites, where separation of users’ configuration was critical, and ‘graceful’ restarting of Apache was frequent. Nowadays, we run servers dedicated to single applications, where these concerns are less important.

2. Librification


Apache is better viewed not as a program but as a library. The file apache2.conf is its API.

This view reveals another approach to configuration. Instead of Apache being a program which reads your configuration file and then transmutes into your end-user program, Apache could be a function that you call in your end-user program.

What I am suggesting is that you would instead write /etc/mywebserver.c as

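#include <apache2.h>  /* hypothetical header: Apache exposed as a library */

int main(void)
{
    Apache2Config* conf = apache2_defaults();

    /* Equivalent of a <VirtualHost *:80> block in a .conf file. */
    Apache2VHost* vhost1 = apache2_vhost();
    vhost1->address = "*";
    vhost1->port = "80";
    vhost1->document_root = "/www/example1";
    vhost1->server_name = "www.example.com";
    apache2_add_vhost(conf, vhost1);

    /* Hand control to the server's main phase. */
    return apache2(conf);
}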

and then use gcc to compile it, link it against Apache, and run it. Or rather, your /etc/init.d/apache2 compiles the configuration when you start the service.

Substitute for Apache whatever service you wish to use, and substitute for C whatever language is used to implement that service. If using a Haskell web server library, you could instead write:

module Main where

import HttpServer

main = http [defaultVHost { address = "*", port = "80", documentRoot = "/www/example1" }]

I’m talking about ‘compiled languages’, but of course the same applies to ‘interpreted languages’ (which just bundle the ‘compile’ step into the execution phase).
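
For an interpreted example that uses a real library rather than a hypothetical one, here is a minimal sketch built on Python’s standard http.server module. The make_server helper and its parameter names are mine; HTTPServer, SimpleHTTPRequestHandler and its directory argument come from the standard library.

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(document_root, address="", port=80):
    """Build an HTTP server that serves static files from document_root."""
    # Bind the directory into the handler the server instantiates per request.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=document_root)
    # Constructing HTTPServer binds the listening socket; nothing is served yet.
    return HTTPServer((address, port), handler)


# Usage: the "configuration file" is now an ordinary program.
#   make_server("/www/example1").serve_forever()
```

There is no configuration syntax to learn here: the document root, address and port are just function arguments.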

3. Configuration program



3.1. Prolog

Gerrit Code Review takes a really interesting approach to configuration. After a change request is submitted for review, Gerrit needs to know whether that change is allowed to be merged. Does the change need to be reviewed? Does the change require tests to pass? How many +1s does the change require? Can someone +1 their own change? Does a -1 cancel out a +1, or count as a veto? Do the rules differ depending on what branch you are merging to? Do some users have privileged rights?

One way they could have allowed users to configure this is to try to split up all these questions into orthogonal parameters which administrators can provide values for. For example:

[branch=master]
numberOfRequiredPositiveReviews=2
negativeReviewIsVeto=true
requiresTestSuitePass=true

… and so on. They didn’t take this approach. Instead, they took a far simpler and more powerful approach. User-provided configuration is provided as a Prolog file, rules.pl. In this file, the user effectively defines a function from ‘facts about the change’ to a boolean that says whether it is submittable. Those facts, such as ‘this change is authored by James’ or ‘Richard says +1’, are provided by Gerrit to the Prolog program whenever it needs to know whether a change is submittable.
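
The shape of such a rule, a function from facts about the change to a boolean, can be sketched outside Prolog too. The Python below is not Gerrit’s API and not a translation of any real rules.pl; the Change fields and the particular policy (two approvals on master, any -1 is a veto, self-approval doesn’t count) are assumptions I picked to answer the questions listed above.

```python
from dataclasses import dataclass


@dataclass
class Change:
    # The "facts about the change" that the host program would supply.
    author: str
    branch: str
    votes: dict          # reviewer name -> score, e.g. {"Richard": +1}
    tests_passed: bool


def submittable(change):
    """Decide whether a change may be merged (hypothetical policy)."""
    # Any negative review is a veto, rather than cancelling out a +1.
    if any(score < 0 for score in change.votes.values()):
        return False
    # Authors cannot approve their own change.
    approvals = [r for r, s in change.votes.items()
                 if s > 0 and r != change.author]
    # The rule differs by branch: master needs two approvals, others need one.
    required = 2 if change.branch == "master" else 1
    return change.tests_passed and len(approvals) >= required
```

Every one of the ‘orthogonal parameters’ from the INI-style version has become an ordinary line of code that you can change, combine, or delete.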

So far, that’s the same as if they were to have taken the ‘configuration program’ approach, with the user providing a callback in their program. But since this is Prolog,

Advantages

  • You already know the syntax. You don’t have to learn the syntax rules for every new program you have to configure. And how many projects bother giving you a syntax definition for their config files? Next to none. There isn’t even a syntax definition for Apache config files. If you’re lucky, the program uses this month’s structured data syntax: XML, JSON, YAML, TOML, … all of which are obviously isomorphic.
  • No programs bloated with parsers. That little program that uses XML for its configuration file probably spends 99% of its program size on a bundled XML parser. The UNIX philosophy is that each program should do one thing, and do it well. The job of Apache is to parse HTTP requests, not configuration files. The job of GCC is to parse
  • More obvious semantics. Or at least, more easily defined, and more guessable. You don’t have to use trial and error to figure out what the scope of that variable is. You know the scoping rules for the programming language you’re using.
  • Debuggability. If you want to find out if your addition to that configuration is being read by the program, just set a breakpoint.
  • Tooling. It’s a lot easier to use a standard documentation generator for your API than to write custom documentation for your ad hoc configuration file format.
  • Manageable configuration. The latest fashion in ‘devops’ is to use Puppet with fragile search-and-replace rules and templates to ensure the configuration files on your application servers are correct. Why? Just use a real programming language to construct the configuration data, compile the exact program you want, and put it on the application server.
  • Compositionality. Want to run two different MySQL instances? No problem: write two programs and run them. By the way, those files under /etc/mysql are global variables. I won’t repeat the sermons on global variables that you’ve heard already.
  • Abstraction. The power of a general-purpose programming language. If you want to define 1,000 near-identical virtual hosts, use a loop. If you want the program to behave differently on Debian, use an if/else. The thing is, all configuration file languages suffer from expressiveness creep through the lifetime of the project until they become a terrible, ad hoc, inconsistent programming language of their own. It might start off as a key-value syntax. Then someone wants to use the same value for several keys, so variables are introduced. Then someone wants to change a variable, so a notion of assignment is introduced. Then someone wants to split the configuration file over several files, so a notion of importation is introduced. Then someone notices that variables defined in imported files clobber those in the parent file, so some kind of variable scope and shadowing is introduced. Then someone wants a value to depend on which version of the software is being run, so a branching construct is introduced, and a boolean expression language over versions. Then people want to branch based on more general properties of the environment, so the expression language is expanded.
  • Generalized APIs. This is the most important advantage in my opinion, since this one really can’t be captured by a configuration file at all. Instead of giving me a million-and-one configuration options for URL rewriting with the limited power of regular expressions, just let me provide a function of type String -> String and be done with it. Instead of implementing lots of cute ways for me to define a predicate over filepaths, just let me provide a function of type String -> Bool, and be done with it. Configuration files are chock-a-block with configuration options which should just be replaced with higher-order functions.
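
The last point is the easiest to show in code. Below is a minimal Python sketch of a service that takes a String -> String rewrite function and a String -> Bool predicate instead of a pile of options. The names make_resolver, rewrite and is_public, and the specific rules inside them, are hypothetical and exist only for illustration.

```python
from typing import Callable, Optional


def make_resolver(rewrite: Callable[[str], str],
                  is_public: Callable[[str], bool]) -> Callable[[str], Optional[str]]:
    """Build a URL resolver from two user-supplied functions."""
    def resolve(url_path: str) -> Optional[str]:
        target = rewrite(url_path)
        # Refuse anything the predicate rejects.
        return target if is_public(target) else None
    return resolve


def rewrite(path: str) -> str:
    # Serve the index for "/" and map legacy /old/... URLs onto /new/...
    if path == "/":
        return "/index.htm"
    if path.startswith("/old/"):
        return "/new/" + path[len("/old/"):]
    return path


def is_public(path: str) -> bool:
    # Hide dotfiles and dot-directories such as /.git/config.
    return not any(part.startswith(".") for part in path.split("/"))


resolve = make_resolver(rewrite, is_public)
```

Here resolve("/old/a.htm") yields "/new/a.htm" and resolve("/.git/config") yields None. The service needs no rewrite-rule language and no path-matching syntax; the user brings whatever logic they like.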

Programs that have seen the light

  • Vagrant. The Vagrantfile used to configure it is just Ruby.
  • XMonad. The xmonad.hs file is just Haskell.
  • Shake. The build system is a Haskell library, not a program.
  • … know of more? Let me know!



Logback.xml config — conditionals in XML, ugh

JSP???


http://ofb.net/~wnoise/build-system/