Containers Vs. Config Management
Do containers replace your need for configuration management, or can both co-exist? Should they?
Every development and operations team has more or less the same goals: write code that is clean, maintainable, and performant; deploy as often as possible without downtime; and give users a fast, enjoyable experience that scales to meet demand. Deploying continuously without downtime and scaling to meet demand are typically easier said than done, and the options for achieving these goals number in the hundreds if not more. All of them require building a hosting platform capable of a lot of the things system administrators used to do manually.
Platform is a slightly overused and loaded term these days. When you think about what a platform is, it’s really just the computer(s) your code runs on and how you got it there. Your platform might be a server in your closet and some shell scripts, a public PaaS provider that does it all for you, or one of the gazillion options AWS and other hosting providers give you.
Configuration Management Review
Up until about a year ago, the gold standard, and theoretically the best way, to automate your server infrastructure and deployment workflow was to use a configuration management system. Some of the more popular options include:
- Chef
- Puppet
- Ansible
- SaltStack
With configuration management systems, you write code that describes how you want some component of your systems to be installed and configured, and when you execute the code on your server, it should end up in the desired state. The benefit of using one of these systems is that you can abstract away the differences in how various operating systems handle functions like package management.
For example, you could write a bash script to install libxml2 on Ubuntu or Debian:
#!/bin/bash
apt-get install -y libxml2
But what happens when you want to use this script on CentOS or Fedora? It's not going to work, because those distros use a different package manager.
Instead you can write a block of code with Chef, for example, that abstracts away the differences between distributions. You can execute this same Chef recipe and it will work anywhere that a libxml2 package exists.
package 'libxml2' do
  action :install
end
Config Management for Deployments
Since I am most familiar with Chef and have it fresh in my mind, I'm going to discuss its deploy resource and some of the pitfalls I have run into when using it. Chef's deploy resource is based on Capistrano, which is widely regarded as a great option for deploying code, so nothing I write here is meant to harp on it.
In its simplest form, the deploy resource looks like this in your Chef recipe:
deploy 'private_repo' do
  repo 'git@github.com:acctname/private-repo.git'
  user 'ubuntu'
  deploy_to '/tmp/private_code'
  ssh_wrapper '/tmp/private_code/wrap-ssh4git.sh'
  action :deploy
end
It’s not uncommon to deploy multiple projects to the same server during a single run of chef-client via a loop of some kind.
A simple example of that might look like the following:
%w(project1 project2).each do |project|
  deploy project do
    repo "git@github.com:acctname/#{project}-repo.git"
    user 'ubuntu'
    deploy_to "/var/www/#{project}"
    ssh_wrapper '/tmp/private_code/wrap-ssh4git.sh'
    action :deploy
  end
end
There are a few problems with how this ends up working.
- If the deployment of project1 fails for some reason, like a permissions issue on the git repo, it can cause the rest of the deployments to bomb out if you don't build in error handling for that.
- If you have to build modules on deployment via bundler or npm or similar, deploys can be incredibly slow, taking minutes or longer in some cases. If a module build fails, your Chef run can bomb out unless you were smart enough to build in error handling for that case ahead of time.
- In general, pulling from git is slow.
- If GitHub goes down, you can't deploy at all. This is especially bad if you are autoscaling and deploying code during the bootstrap of new servers.
There are ways around these issues, like building the modules ahead of time, storing them in an object store like S3, and having your recipe pull them down and unpack them instead of building them on the fly (a sketch of that approach follows below). Some folks also mirror their GitHub repos to a local git server and pull from that. My point is that, generally speaking, deploying with configuration management tools is a pain in the ass and error-prone.
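As an illustration, here is a minimal sketch of that S3 workaround using only core Chef resources (remote_file, directory, execute). The bucket URL, artifact name, and paths are hypothetical stand-ins for whatever your CI system publishes:

# Pull a prebuilt application bundle instead of building it on every server.
artifact = 'project1-1.2.3.tar.gz'
artifact_url = "https://example-builds.s3.amazonaws.com/#{artifact}"

directory '/var/www/project1' do
  owner 'ubuntu'
  recursive true
end

# Download the tarball from the object store.
remote_file "/tmp/#{artifact}" do
  source artifact_url
  mode '0644'
end

# Unpack it, and drop a marker file so the step is idempotent across runs.
execute "unpack #{artifact}" do
  command "tar -xzf /tmp/#{artifact} -C /var/www/project1 && touch /var/www/project1/.#{artifact}"
  user 'ubuntu'
  not_if { ::File.exist?("/var/www/project1/.#{artifact}") }
end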
Container Review
Containers are the new kid on the block, but they are not a new technology by any means. Support for containers has existed in the Linux kernel since version 2.6.24, when cgroup support was added, and Google has been using them for over a decade to power its massive global infrastructure. In the last two years, startups like Docker have made containers one of the most popular topics amongst developers and operations engineers alike, with large companies like Red Hat, Amazon, Google, and IBM adding support in their hosting products. Docker was able to make such a big impact by taking tools that were already present in Linux, like cgroups and namespaces, and making them much simpler and more accessible to the average Joe.
Containers make cross-platform portability of applications easier than ever before, and they solve the age-old problem of development vs. production environment disparity by allowing the same image that was built and tested on a developer workstation to run in production. As a former operations engineer, I can't count how many times I heard "Something is wrong with production… this code works fine locally." It created a situation where the developers writing the code and the ops team deploying it were at odds with each other instead of working together.
Containers for Deployments
Containers have some distinct advantages over configuration management systems, especially when it comes to deployments.
- All of the logic that used to live in your cookbooks/playbooks/manifests now lives in a Dockerfile that resides directly in the repository of the application it builds. Things stay much more organized, and the person managing the git repo can also modify and test the automation for the app.
- Containers and Docker are a lot easier for developers to wrap their heads around for local development, and they eliminate the requirement that the whole team understand your config management system of choice glued together with Vagrant.
- All of the application's dependencies are bundled with the container, which means there is no need to build on the fly on every server during deployment. This results in much faster deployments and rollbacks.
- Not having to pull from git on every deployment eliminates the risk of a GitHub outage keeping you from deploying. Of course, it's still possible to run into this issue if you rely on Docker Hub or another hosted image registry.
- Containers bring standardization, which allows systems like centralized logging, monitoring, and metrics to snap into place no matter what is running in the container. Overall this can have a huge impact on detecting issues between deployments and on rolling out these types of monitoring solutions.
But there have to be some downsides, right? Well… yeah.
- Dockerfiles do not give you the same level of control over configuration as your application transitions between environments like dev, staging, and production. You can end up with a Dockerfile that calls an external script to edit config files in the image on the fly based on environment variables, and in certain cases you may even need a different Dockerfile per environment. This is usually only a problem when your application is not following 12-factor app best practices.
- A lot of negative things have been said about Docker's security model, even spawning a competing container runtime in the process. As of fairly recently it's possible to verify checksums of individual layers within an image, but for a long time it was not, and someone who controlled an intermediate layer could poison downstream images with little effort.
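For contrast, that per-environment configuration problem is exactly where config management still shines. Here is a minimal sketch, assuming one Chef environment per deployment stage; the file path, hostnames, and values are made up:

# Render one template whose values vary with the Chef environment,
# instead of baking per-environment files into an image.
template '/etc/myapp/config.yml' do
  source 'config.yml.erb'
  variables(
    db_host:   node.chef_environment == 'production' ? 'db-prod.internal' : 'db-staging.internal',
    log_level: node.chef_environment == 'production' ? 'warn' : 'debug'
  )
end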
Can Config Management & Containers Play Nice?
Absolutely.
Most, if not all, of the popular configuration management systems now have hooks for Docker integration. Chef has an integration that allows you to build Docker images from your cookbooks and recipes, as well as manage how your containers are deployed to your servers. Ansible also has an integration that accomplishes similar objectives.
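To give a flavor of the Chef side, here is roughly what pulling and running a container from a recipe looks like with the community docker cookbook. The image name, tag, and port mapping are hypothetical:

# Make sure the Docker daemon is installed and running.
docker_service 'default' do
  action [:create, :start]
end

# Pull a specific, immutable tag rather than :latest.
docker_image 'acctname/myapp' do
  tag 'v1.2.3'
  action :pull
end

# Run the container, publishing the app's port on the host.
docker_container 'myapp' do
  repo 'acctname/myapp'
  tag 'v1.2.3'
  port '8080:8080'
  action :run
end

Rolling back then amounts to converging again with the previous tag.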
There are a few cases where I would say “yes, use both”:
- You’re already using one of these configuration management systems and you’re toying with the idea of utilizing containers.
It will be easy to try out, and you will get immediate feedback on whether you want to continue on to more complex or full-featured hosting platforms. All you will really be changing about your infrastructure is adding Docker as a wrapper around your applications; you will continue to deploy them in the same way, to the same servers. That is unfortunate, though, because using a clustered hosting platform can significantly improve your resource utilization and save you tons of money.
- You’re deploying things that can’t easily run inside a container.
There are some really ancient pieces of software out there that some of you need to run and manage. Some of them won’t play nice with containers, and the only way to automate their installation and management is with config management. This is also potentially an issue if you have serious compliance requirements that cannot be met when using containers, or that would require you to run them in privileged mode.
Should You Use Both?
Ideally, no. With the release of full-stack container management systems, you generally do not need both to fully automate your infrastructure in the year 2015. Of course, some container management systems actually require config management to automate their own setup… I guess they didn’t think that through.
Generally speaking, if you are going to use both:
- Config management is only used to install Docker and an orchestration system, configure PAM/SSH auth, and tune OS sysctl values: basically, anything not having to do with application deployment (see the sketch after this list).
- Docker and your orchestration system of choice are used to run applications and specific software packages.
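A rough sketch of what that host-level recipe might shrink down to in Chef. The package name and sysctl value are illustrative, and on older Chef versions the sysctl resource came from a community cookbook:

# Install and start Docker; the orchestrator handles everything app-related.
package 'docker.io'

service 'docker' do
  action [:enable, :start]
end

# Host-level kernel tuning stays in config management.
sysctl 'vm.max_map_count' do
  value 262144
end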
It’s best to settle on a single standard to automate your hosting environment. It makes it much easier for new team members to get up to speed, and enables more teams within the company to get on board and help out with changes and improvements to the entire infrastructure.
What do you think? Leave a comment and let me know!