Monday, 15 August 2016

In an attempt to distract myself from what was really stopping me from sleeping last night, I started thinking about configuration management systems again. I’ve always had a bit of an interest in these things, ever since I was introduced to LCFG on the Tardis Project as a student, back in the 90s. In those days, I even had a wee rack of SPARCstation 1s and 2s (the pizza-box-shaped ones) sitting in my bedroom, all managed with LCFG!

I’ve just started reading up on Ansible, out of curiosity, to see what’s new in the world of configuration management. Chances are I’ll go take a look at SaltStack, too, see what it’s all about. (Can you recommend any others?) It’s really good to see that there’s some innovation in this field, in addition to the incumbents of CFEngine, Puppet and Chef. (I still think of Chef as the newcomer, but I suppose it’s getting on for a decade old now!)

But I still have this nagging feeling that they’re all doing it wrong, and there must be a better way. I’ve written about this extensively in the past (though I suspect I’ve never shared most of those notes with anyone else, so I shall have to try and dig them out of my Day One archive!).

One of the key aspects of the old-skool configuration management systems is that they’re declarative. You are essentially describing what you want the target system to be, and it’s the agent’s job to figure out how to get from the current state to the desired state. Roughly speaking, I can express that I want, for example:

  • A pair of web server nodes, listening on the well-known port 80, serving a particular set of static files, and proxying to an application at the backend;
  • Three nodes running the backend application, listening on some port that’s been mutually agreed with the front-end web service; and
  • A database node, listening on some port that’s been agreed with the application service hosts, with a safely stored set of persistent data.

A good configuration management setup should allow me to express those high level concepts, then allow me to decompose them into the building blocks needed to achieve them (which inevitably mostly comes down to package management, configuration file tweaking, and daemon-herding, with a few extras thrown in, too).
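To make that concrete, here’s a wee sketch of what I mean by expressing the service at a high level and then decomposing it. Everything here is hypothetical — the `Role` class, the field names, the building-block step names — it’s just an illustration of the shape of the thing, not any real tool’s API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: none of these names come from a real tool.
@dataclass
class Role:
    name: str
    count: int                 # how many nodes should fill this role
    port: int                  # port the role's daemon listens on
    depends_on: list = field(default_factory=list)

# The service-level description from the bullet list above
service = [
    Role("web", count=2, port=80, depends_on=["app"]),
    Role("app", count=3, port=8080, depends_on=["db"]),
    Role("db",  count=1, port=5432),
]

def decompose(roles):
    """Expand the service description into per-node building blocks:
    package installs, config-file tweaks, and daemons to herd."""
    steps = []
    for role in roles:
        for i in range(role.count):
            node = f"{role.name}{i}"
            steps.append((node, "install_packages", role.name))
            steps.append((node, "write_config",
                          {"listen": role.port,
                           "backends": role.depends_on}))
            steps.append((node, "enable_daemon", role.name))
    return steps

for step in decompose(service):
    print(step)
```

The point being that the top half is the business requirement, and everything node-level falls out mechanically from it — rather than the node-level bits being where you start.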

And that’s one of the places where I’ve always felt existing configuration management systems fall short: they’re fundamentally focused on the individual node, rather than the service as a whole. Almost all of the business requirements we’re trying to express are at the service level, and will require more than a single node to provide that service. I’ve done quite a bit of work with managing the relationships between nodes cooperating to provide a service using exported resources in Puppet, but it’s always been so fiddly (not to mention slightly fragile in production!) that the high level concepts get lost in the details.
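For anyone who hasn’t had the pleasure, the exported-resources dance looks roughly like this — a sketch assuming the puppetlabs-concat module, with the file paths and ports being purely illustrative:

```puppet
# On every backend node: export a record of this host's app endpoint.
@@concat::fragment { "backend-${facts['networking']['fqdn']}":
  target  => '/etc/nginx/conf.d/upstream.conf',
  content => "server ${facts['networking']['ip']}:8080;\n",
}

# On the front-end node: collect everything the backends exported.
Concat::Fragment <<| target == '/etc/nginx/conf.d/upstream.conf' |>>
```

It works, but the service-level intent — “the front end proxies to all the backends” — is buried in plumbing, and it all hinges on PuppetDB having seen every node recently enough.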

So there’s that. From my brief skim of Ansible, it seems to be a departure from the declarative ways of the incumbents. Its approach feels more like automating a documented set of commands for bringing up a new server, or changing an existing service: essentially taking that documentation you’ve got for installing a new node and ‘scripting’ it. We’ve all done this in the past — having a shell script which takes a brand new machine and configures it to our liking. It’s the first step on the path to DevOps enlightenment, and there are some definite advantages to its relative simplicity.

Ansible gives you a bit more than just a shell script, so you’ve got a slightly ‘nicer’ (depending on your opinion of templated YAML!) language to express the steps in. What is it, though, about configuration management systems, that even an introductory chapter gets caught up in the intricacies of lexical scoping for variables?
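A flavour of that ‘scripted documentation’ style, as I understand it so far — a minimal sketch, where the package name, paths and template are illustrative rather than anything from a real deployment:

```yaml
# Minimal Ansible playbook sketch; names and paths are illustrative.
- hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present

    - name: Drop in our site config
      ansible.builtin.template:
        src: site.conf.j2
        dest: /etc/nginx/conf.d/site.conf
      notify: Reload nginx

    - name: Make sure nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

  handlers:
    - name: Reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```

It reads very much like the runbook you’d have written anyway, just executable — which is both its charm and, I suspect, its ceiling.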

Another common pattern in configuration management is the separation of what from how. This is a good pattern, in that I can express what I want to achieve (e.g. that I want to have the Apache web service running) and leave the details of how that happens to an underlying provider (e.g. poking at the init system to make sure the httpd process is enabled on boot and is currently running). It’s what gives these systems their ability to deal with heterogeneous systems (e.g. managing a mixed cluster of Linux, Solaris and Windows boxen).
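The shape of that separation, sketched in Python — an illustration only, with made-up class names, not any real tool’s provider API (though the `systemctl` and `svcadm` commands themselves are real):

```python
from abc import ABC, abstractmethod

# Illustrative sketch of the what/how split; class names are invented.
class ServiceProvider(ABC):
    """The 'how': each platform supplies its own implementation."""
    @abstractmethod
    def ensure_enabled(self, name: str) -> str: ...
    @abstractmethod
    def ensure_running(self, name: str) -> str: ...

class SystemdProvider(ServiceProvider):
    def ensure_enabled(self, name):
        return f"systemctl enable {name}"
    def ensure_running(self, name):
        return f"systemctl start {name}"

class SmfProvider(ServiceProvider):
    def ensure_enabled(self, name):
        return f"svcadm enable {name}"        # persists across boots
    def ensure_running(self, name):
        return f"svcadm enable -t {name}"     # -t: temporary, this boot only

def ensure_service(provider: ServiceProvider, name: str) -> list:
    """The 'what': this service should be enabled on boot and running now."""
    return [provider.ensure_enabled(name), provider.ensure_running(name)]

print(ensure_service(SystemdProvider(), "httpd"))
```

The `ensure_service` function is the same regardless of platform — which is exactly the property that makes the heterogeneous case tractable, and exactly what forces the interface down to whatever every provider can manage.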

But with heterogeneity comes the lowest common denominator. I was never able to take full advantage of the Solaris Service Management Framework (its beautiful init system — Solaris got a lot of things right, even if it was a pain in the ass in other respects!) because I was hobbled by the lowest common denominator that was the Linux init system at the time. In providing a standard interface for the what, you can only expose the features that can be implemented across all the providers. Chances are the same is true now, where systemd has much richer functionality, but we’re still straitjacketed by the need to support older init systems.

But I don’t need heterogeneity. I deploy all my systems on the same platform (usually the most recent Ubuntu LTS release). Most people standardise on a single Linux distribution for their production clusters. The community itself still needs support for heterogeneous systems, though, because we can’t all agree on which distribution that is. (And some poor sods still have to deploy to other platforms, too.) I suspect that’s hampering us, and I’d really like to see an opinionated framework that fully embraces (much as I hate to admit it) systemd, fully exposing the richness of expression I’m beginning to gather it supports.

I still think it’s important to separate the what from the how — that’s just good sense for writing maintainable code — but a system that mandates (say, for example) apt for package management (including configuration files), and systemd for service lifecycle management could make the high level expression that much cleaner.
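As a taste of what such a system could expose, here’s the sort of thing a plain cross-platform ‘ensure this service is running’ abstraction can’t usually express, but a systemd-native one could just pass through. The `myapp` service is hypothetical; the directives are all standard systemd ones:

```ini
# Illustrative unit for a hypothetical 'myapp' service.
[Unit]
Description=My application backend
After=network-online.target

[Service]
ExecStart=/usr/bin/myapp --port 8080
Restart=on-failure
WatchdogSec=30
DynamicUser=yes          # ephemeral unprivileged user, managed by systemd
ProtectSystem=strict     # read-only view of most of the filesystem
PrivateTmp=yes
MemoryMax=512M

[Install]
WantedBy=multi-user.target
```

Sandboxing, watchdogs, resource limits — all declarative, all sitting right there in the init system, and mostly invisible through a lowest-common-denominator ‘service’ resource.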

(I suspect at this point I should be looking at what Canonical are up to, and there’s some wizardly named project that takes this sort of approach.)

Then there’s what I was actually trying to contemplate last night — in an attempt to distract myself from dwelling on things I cannot change! — which was wondering whether the pattern we use for running database migrations is, or could be, applied to configuration management…

But I’ll save that thought for tomorrow morning. :)