Configuration Management and the Cloud — the Operations One-Two Punch

If I were George, I would have automated this

Possibly the best piece of career advice I ever received was given to me when I was just starting out at QAD. I was in the process of taking over the fulfillment systems, which was a largely manual process of creating custom-built software CDs for customers. The architect who had previously been in charge of maintaining those systems told me: “Do everything you can to automate yourself out of a job.” Being young and almost entirely free of expenses and life responsibilities, I saw no reason not to follow that advice and applied that principle to every job I had afterwards. Over the years, that goal has been made much easier by the introduction of configuration management tools and AWS. And, amazingly, I still have a job.

Configuration Management

Configuration management gives us the ability to automate the installation and configuration of system and software packages. Ideally, it’s used to provision a system from the minimal configuration required to run the configuration management utility. So, for example, when I was responsible for configuration management at NBCUniversal, we created a base VM image that just had the patched OS, Puppet’s dependencies, and Puppet itself. The provisioning process was simply to spin up the VM, add its hostname to Puppet with the appropriate configuration, and then run Puppet on the host. Once we wrote some scripts to automatically add and remove hosts from the Puppet configuration, the whole thing worked seamlessly, and we could provision a server in about ten minutes (with the majority of that time spent having Puppet do its work unattended).
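
The post does not show the scripts that added and removed hosts, so the following is only a minimal sketch of the idea. It assumes a hypothetical layout in which each host gets its own manifest file in a directory that Puppet is configured to load, and that each host is assigned a single role class. The function names, the directory layout, and the example hostname and class name are all illustrative assumptions, not the setup described above.

```python
import re
import tempfile
from pathlib import Path

# Conservative patterns so arbitrary text cannot end up in a manifest.
HOSTNAME_RE = re.compile(r"^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$")
CLASS_RE = re.compile(r"^[a-z][a-z0-9_]*(::[a-z][a-z0-9_]*)*$")


def render_node(hostname: str, role_class: str) -> str:
    """Return a Puppet node definition that includes one role class."""
    if not HOSTNAME_RE.match(hostname):
        raise ValueError(f"invalid hostname: {hostname!r}")
    if not CLASS_RE.match(role_class):
        raise ValueError(f"invalid class name: {role_class!r}")
    return f"node '{hostname}' {{\n  include {role_class}\n}}\n"


def add_host(manifest_dir: Path, hostname: str, role_class: str) -> Path:
    """Write (or overwrite) the manifest file for one host."""
    content = render_node(hostname, role_class)
    manifest_dir.mkdir(parents=True, exist_ok=True)
    path = manifest_dir / f"{hostname}.pp"
    path.write_text(content)
    return path


def remove_host(manifest_dir: Path, hostname: str) -> bool:
    """Delete the manifest file for one host; return True if it existed."""
    path = manifest_dir / f"{hostname}.pp"
    if path.exists():
        path.unlink()
        return True
    return False


if __name__ == "__main__":
    # Use a temporary directory so the sketch does not touch a real Puppet tree.
    with tempfile.TemporaryDirectory() as tmp:
        nodes = Path(tmp) / "nodes"
        created = add_host(nodes, "web01.example.com", "role::webserver")
        print(created.read_text())
        remove_host(nodes, "web01.example.com")
```

In practice a script like this would write into a checked-out copy of the Puppet repository and commit the result, so that adding or removing a host is itself a reviewable change.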

Configuration management is an incredibly powerful tool for operations teams; done correctly, it can completely eliminate the antiquated system of configuring servers by hand, greatly reducing the opportunity for user error and freeing engineers to spend time on other activities. However, the single most important (and less obvious) benefit of configuration management is that it turns your processes into code, and code management is an established practice that’s been refined over decades. This enables you to apply peer reviews, source control, validation, and a slew of other quality control and efficiency measures that you’re probably already using for your application code.

[Side note: there is a lot of debate about where to draw the line between what is “baked in” to the image and what is provisioned via configuration management. For me, the answer is “it depends”, and the details of that could be a whole post in and of themselves.]

Configuration management is great for provisioning and maintaining operating systems and application platforms, but of course operations folks do a lot more than that; they’re also responsible for provisioning and maintaining the infrastructure itself.

AWS CloudFormation et al.

For companies using AWS, one of the most powerful and often ignored capabilities is representing your infrastructure as code. CloudFormation (as well as third-party products like Terraform and Fugue) lets operations teams describe entire AWS infrastructures in declarative files. Besides extending the previously described “as code” benefits all the way to the infrastructure level, this opens up a new set of possibilities:

  • Automated testing — AWS infrastructures can go through the same testing process as application code. This includes validation, unit tests, and so on.
  • On-demand infrastructure for code testing — rather than having dedicated servers or VMs for pre-release testing, you can spin up those environments when you need them (and power them down when you don’t).
  • Audit trail — with source control and CloudFormation, you can know exactly what has been done to your infrastructure, and by whom. Implementing CloudTrail can also give you the ability to capture and log “manual” interventions. For regulated industries, this is an enormous benefit.
  • Self-service infrastructure — in the spirit of “automate yourself out of a job”, companies can implement Service Catalog for their developers. With Service Catalog, developers can create any number of pre-configured environments for testing. Using tagging and scripting, these environments can be charged back to a specific department or project, and even programmed to “self-destruct” after a predetermined amount of time.
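
To make a couple of these ideas concrete, here is a small sketch of how a template could be treated like any other code under test. It builds a minimal CloudFormation template as a plain dictionary, checks that every resource carries a set of required tags, and decides from a tag whether an environment is past its expiry time. It makes no AWS calls. The tag names (CostCenter, ExpiresAt), the function names, and the single S3 bucket resource are hypothetical choices for illustration; not every CloudFormation resource type accepts tags, and the actual teardown step is not shown.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical tagging convention: who pays, and when the environment may be removed.
REQUIRED_TAGS = frozenset({"CostCenter", "ExpiresAt"})


def build_template(cost_center: str, ttl_hours: int, now: datetime) -> dict:
    """Return a minimal CloudFormation template for a throwaway test environment."""
    expires_at = (now + timedelta(hours=ttl_hours)).isoformat()
    tags = [
        {"Key": "CostCenter", "Value": cost_center},
        {"Key": "ExpiresAt", "Value": expires_at},
    ]
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Throwaway test environment (illustrative)",
        "Resources": {
            "ArtifactBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {"Tags": tags},
            },
        },
    }


def resources_missing_tags(template: dict, required=REQUIRED_TAGS) -> dict:
    """Map each resource's logical ID to the required tags it lacks."""
    missing = {}
    for logical_id, resource in template.get("Resources", {}).items():
        tags = resource.get("Properties", {}).get("Tags", [])
        absent = set(required) - {tag["Key"] for tag in tags}
        if absent:
            missing[logical_id] = sorted(absent)
    return missing


def is_expired(tags: list, now: datetime) -> bool:
    """Return True if the ExpiresAt tag is present and in the past."""
    for tag in tags:
        if tag["Key"] == "ExpiresAt":
            return datetime.fromisoformat(tag["Value"]) <= now
    return False  # no expiry tag: never treat the environment as expired


if __name__ == "__main__":
    created = datetime(2024, 1, 1, tzinfo=timezone.utc)
    template = build_template("qa-team", ttl_hours=8, now=created)
    print(json.dumps(template, indent=2))
    print("Untagged resources:", resources_missing_tags(template))
```

A check like resources_missing_tags could run in the same pipeline as application unit tests, failing the build before an untagged resource is ever created, while a scheduled job could use something like is_expired to find environments that are due for removal.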

For a lot of companies, this can seem like an unattainable goal, but it’s really not — the most important part is to just get started, and there are real rewards for each step. I always enjoy discussions on this topic, so please also post your thoughts in the comments, or connect with me if you want to take it offline.