Forging A Path Away From Containers and Config Tools

Michael DeHaan
Nov 20

In light of Docker just giving up and rolling over, I think it's as good a time as any to talk about some assumptions we hold about software deployment in the IT operations space.

In short, I believe we don’t need containers or configuration management systems at all…and I have an idea for a tool I’d like to see if anyone is interested in. It would be incredibly simple…but let’s start with some background to show why I believe this is true — particularly now.

Immutability

We believe in immutability not because images are better or faster. On the contrary, building images requires waiting, and the process of constructing a new revision of an image reduces "flow". For a developer, getting a new image right via a complex automation system can actually take all day.

We instead believe in immutability because of a historical problem — scripts written to upgrade systems could have errors when the systems being upgraded are in different states. If we simply start with clean systems every time, the need for images goes away. This is why, even back in 2006 or so, the best practice for software deployment was re-imaging systems and PXE’ing them to a base image before installing software. Not everyone did it, but it was exceptionally reliable. Some of the best shops would reimage their entire clusters every night.

Why? Because starting with a clean system every time is still immutable. However, baking an image for every type of software is not required to get there.

Declarativeness / Idempotence

The whole idea of declarative configuration is about configuring systems that are not treated immutably. Once we treat systems immutably, the scripts that install them do not need to assume anything about the state of those systems.

Scripts Are Bad?

We only believe scripts are bad because they are unreliable when used for non-immutable management, or because there is no good way to share parts of them.

Adding a minimal template layer on top of a script-based system makes it much more powerful.
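For instance, here is a minimal sketch of such a template layer, assuming a hypothetical deploy tool written in Python. The variable names, script contents, and install command are made up for illustration.

```python
from string import Template

# Hypothetical values that would come from the deploy tool's variable system.
variables = {"app_version": "1.4.2", "listen_port": "8080"}

# A plain bash script with $placeholders filled in before it is shipped and run.
script_template = Template("""#!/bin/bash
set -eu
echo "installing app $app_version on port $listen_port"
/usr/bin/install-my-app --version "$app_version" --port "$listen_port"
""")

with open("go.sh", "w") as f:
    f.write(script_template.substitute(variables))
```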

Docker is *missing* this kind of system in Dockerfiles, and instead stacks images on top of images, which is awkward and inefficient.

Scripts are actually good, because instead of having to learn both an Ansible module and a Unix command, you only need to learn the Unix command, and it will work that way forever.

Application / Container Images

We believe in baking application images because they are a means to ensure immutability, but if we instead start with a fully clean system, the need for a specific image for every version of an application is reduced. Images have instead encouraged us to track dependencies with less rigor, and often we do not have historical copies of all the dependencies necessary to reproduce prior versions of an application.

We also, mistakenly, believe that images are required for A/B deployments. This is only partially correct — they are really only required for efficient autoscaling.

(Below, I'm about to propose an imageless deployment system, but this is *NOT* to say the script system used could not also be used to bake images and provide a bit more structure. It's just that it isn't necessary.)

Configuration Systems

We have historically believed in configuration management systems because we believe bash scripts are unreliable — but this is only true when the states of the configured systems are unknown. Further, in an immutable world, whether image-based or otherwise, the need for a configuration management system is greatly reduced — as witnessed by the fact that most of these systems are now used to bootstrap clouds or networking hardware rather than to define application templates.

We also believe in configuration management systems because bash templating is a little awkward, and often a variable system is required to manage commonality between applications.

Cloud Abstraction

We believe in cloud abstraction because we believe the pricing models of clouds may change from under us, or that a given cloud may offer a feature advantage.

However, in using any cloud abstraction, we cut ourselves off from the specific advantages offered by that particular cloud, while also now having to maintain and reason about a cloud abstraction layer.

This may be a cloud DSL tool (Terraform) or a fully featured platform layer (Kubernetes, etc.). Both involve dragging our feet to avoid embracing a platform, while amplifying human workloads in the process.

Containers

We believe in containers because images were slow, but also because we believed our software had utilization problems.

However, in web architectures, the utilization argument quickly proves to be a myth — applications end up using more than one VM's worth of compute anyway as they scale out horizontally to meet user load.

By accepting containers and going a bit too far into microservices, we have made debugging complex (debugging by logging and murder-mystery outages, in particular), and in fact, due to single-process constraints, have made our computer systems less flexible.

Containers were an outgrowth of an attempt at getting to PaaS, but PaaS always traded away developer flexibility and productivity — without really delivering.

Simply put, we don’t need them.

In Contrast, The Unix Philosophy

The Unix philosophy values small tools that work together, that do not have to change frequently, and that once you learn them, keep stable interfaces sometimes for thirty or forty years.

The Unix philosophy does not value consumption of the latest shiny object, but rather making something work, getting out of your way, and leaving you more free time to work on the application software that is unique to your own particular mission.

By contrast, the tools pushed by the popular DevOps market these days are large, non-pluggable beasts that require more and more of your attention, not less, and are a major barrier to entry for new employees.

We have lost an understanding of how things work. In avoiding cloud lock-in, we have ignored the best features of our cloud providers and instead locked ourselves into abstraction layers that are less useful than the clouds themselves.

In short, the character of the operations environment has rapidly lost many of the values that made Unix so great.

Minimalism is great for a reason — fewer moving parts are more reliable, easier to understand, easier to learn, and won't wake you up in the middle of the night.

These were the values I was going for in 2012 when making Ansible. I just don't think I went far enough, because I didn't know enough at the time. By contrast, all this stuff is growing instead of shrinking — and maybe that's backwards.

Stepping Back

I created Ansible in 2012 out of a personal quest to improve upon Puppet, which was quickly growing to be the most popular config tool at the time. As time has gone on, I have both won the configuration management war, perhaps posthumously, and seen configuration management itself largely replaced by containers.

What Is Needed

Nobody wants to keep up with a configuration management tool, especially not learn another new one (which is why opsmop failed), learn another YAML dialect, or maintain a full active cloud abstraction layer running on top of their cloud itself.

We do some of these things because they are fashionable resume experience. However, what if we could reclaim the time we spend on these tools and instead invest it back into our production software?

When I was creating Ansible, I said this tool should never be someone's day job — but realistically, it does become people's day job. The goal of an automation tool should not be to acquire user-hours, but to acquire users and give them their hours back.

People know scripts, and they don't want more moving infrastructure parts to keep running (Kubernetes, etc.).

Everyone should be able to describe their application configurations entirely in a bash script, using, where needed, basic templates. And they should be able to deploy easily without using a container cloud.

We need to be able to define "services", which are mappings between an install script, an ALB/ELB (or other cloud equivalent), and the size of the service.
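As a rough illustration (not a real API), a "service" inside such a tool might be nothing more than a small record like this; the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str             # logical name of the service
    load_balancer: str    # ALB/ELB (or other cloud equivalent) to attach to
    size: int             # number of instances to run behind the load balancer
    install_script: str   # bash script that configures each node

web = Service(name="web", load_balancer="nameOfElb", size=10,
              install_script="./config/go.sh")
```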

Then, all that is really needed when initiating an upgrade is to attach a new ASG with a number of instances.
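A hedged sketch of that step on AWS, using boto3; the launch template name and subnet IDs are placeholders.

```python
import time
import boto3

autoscaling = boto3.client("autoscaling")

# One ASG per deploy, so an A/B deployment is simply two ASGs side by side.
asg_name = f"web-{int(time.time())}"

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName=asg_name,
    LaunchTemplate={"LaunchTemplateName": "base-image-template", "Version": "$Latest"},
    MinSize=10,
    MaxSize=10,
    DesiredCapacity=10,
    VPCZoneIdentifier="subnet-aaaa,subnet-bbbb",
)
```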

These can be configured off any base image, and still configured via classic SSH.

Once the systems are configured, all we have to do is flip the ELB from one ASG to another.

Parallel SSH here is simple, and can be managed with the classic ssh-agent, using GNU tools.
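A minimal sketch of that parallel SSH step in Python, assuming keys are already loaded into ssh-agent; the host list (which would really come from the new ASG) and the remote command are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

hosts = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]

def run_remote(host, command):
    # BatchMode avoids password prompts; ssh-agent supplies the key.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True)
    return host, result.returncode, result.stdout, result.stderr

with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(lambda h: run_remote(h, "bash /opt/config/go.sh"), hosts))

# Any nonzero return code marks that host's configuration as failed.
failed = [host for host, code, _, _ in results if code != 0]
```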

ELB switchover support would need to be modular, and could easily use boto via python for AWS.
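One possible shape for that switchover module, assuming the classic ELB case with boto3 (an ALB version would attach and detach target groups instead); the ELB and ASG names are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")

def switch_elb(elb_name, old_asg, new_asg):
    # Register the freshly configured ASG's instances with the load balancer...
    autoscaling.attach_load_balancers(
        AutoScalingGroupName=new_asg, LoadBalancerNames=[elb_name])
    # ...then deregister the old ASG once the new one is serving traffic.
    autoscaling.detach_load_balancers(
        AutoScalingGroupName=old_asg, LoadBalancerNames=[elb_name])

switch_elb("nameOfElb", old_asg="web-1574200000", new_asg="web-1574286400")
```

Keeping this behind a small function like the one above is what makes the switchover modular per cloud.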

Ultimately, we're talking about a simple program that is a few *thousand* lines of code to start, with absolutely *zero* modules to maintain and no automation "code" for users to write to speak of.

A call at the end of a Jenkins build might look like this:

deploy_thing.py --load-balancer=nameOfElb --size=10 --base-image=imageName --push-config-dir=./config/ --script=/config/go.sh

What this would do is simple:

It creates a new ASG with 10 nodes, rsyncs the local config/ dir to them as /opt/config, and then runs /opt/config/go.sh.
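A sketch of that push-and-run step, assuming plain rsync and ssh under the hood; the hosts, paths, and script name are placeholders, and the return code is kept so a failed script can block the load balancer switchover later.

```python
import subprocess

def push_and_run(host, config_dir="./config/", script="go.sh"):
    # Ship the local config directory to the node...
    subprocess.run(
        ["rsync", "-az", "--delete", config_dir, f"{host}:/opt/config/"],
        check=True)
    # ...then run the install script and report its exit status.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, f"bash /opt/config/{script}"])
    return result.returncode

# Only flip the load balancer if every node configured cleanly.
codes = [push_and_run(h) for h in ["10.0.1.11", "10.0.1.12"]]
ready_to_switch = all(code == 0 for code in codes)
```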

There would probably not even be any config files (one might be available to avoid needing any parameters, like a deploy_thing.toml), though it would likely use your local AWS config environment.

load_balancer=nameOfElb
size=10
base_image=foo
push_config=./config
script=./config/go.sh

(This is a very early draft)

Gains

Establishment of the ELB itself would be left up to CloudFormation/Terraform, but look at what you have:

  • Deployment without specific images for every service
  • Deployment without a config management system
  • Deployment without operating a container cloud

This should not be controversial — this is how best-practice deployments were done in physical datacenters, with physical load balancers, before clouds were commonplace.

In fact, there is little to say this couldn’t also support a “metal cloud” deployment system for physical data centers as well.

What about stateful systems? Management of mutable systems is sometimes a confusing part of modern IT. In this case, the deploy tool could just take an "--in-place" flag and wouldn't attempt to spin up new nodes, instead finding which nodes to talk to by asking the ASG attached to the ELB.

The only downside to this approach is that it DOES require you to write your applications to install software from a local mirror — whether an apt mirror or something like Artifactory. This is a good practice anyway, because you MUST have the ability to deploy past versions of the application in case you ever need to revert to an earlier one.

Today, we are very apt to include dependencies that are nothing more than pointers to SHAs on GitHub. In deploying this way, we lose the ability to roll back, and we ALSO tend to slam public services when we deploy. We must not do this. Really, you shouldn't be doing this when your CI/CD system builds images either — you *SHOULD* have mirrors of such things.

Additional features should allow quickly listing the machines behind the ELB, quickly running commands against all of them, and collecting a tree of logfiles from those remote machines.
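A hedged sketch of those helpers for a classic ELB, again assuming boto3 plus plain scp; the ELB name and remote log path are placeholders.

```python
import subprocess
import boto3

elb = boto3.client("elb")
ec2 = boto3.client("ec2")

def elb_instance_ids(elb_name):
    # List the instances currently registered with the load balancer.
    health = elb.describe_instance_health(LoadBalancerName=elb_name)
    return [state["InstanceId"] for state in health["InstanceStates"]]

def instance_ips(instance_ids):
    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    return [i["PrivateIpAddress"] for r in reservations for i in r["Instances"]]

def collect_logs(host, local_dir="./logs"):
    # Pull the remote log tree down for inspection after a deploy.
    subprocess.run(["scp", "-r", f"{host}:/opt/config/logs", f"{local_dir}/{host}"],
                   check=True)

for ip in instance_ips(elb_instance_ids("nameOfElb")):
    collect_logs(ip)
```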

This is more of an "I need a user first" kind of thing, but ultimately, I feel we have a mandate to simplify IT operations again. There's been too much pressure to absorb vendor tooling endlessly, and we've hit a local maximum of the worst sort.

For those of us who *HAVE* been around for the last 15–20 years, we can take the best lessons from today and combine them with the best lessons of yesterday, and make an even better reset than Ansible achieved in 2012.

This would be ideal for companies of all sizes, from small startups who want to reduce the complexity of their tech stack to the bare minimum, to larger IT enterprises who want to invest more in their own technology and stop chasing vendor shiny objects.

Out of respect for the Unix philosophy, it doesn't address definition of cloud topology, and in the strong belief that a network management layer requires active intelligence and graph awareness, it will not configure network devices. It will do one thing well — ship software to cloud and datacenter configurations.

Output is a bit of a consideration — the system must, like Ansible, show errors when they occur on remote systems, and provide ways to fetch those configuration logs. The return codes of those configuration scripts can then be used to decide whether or not to make the load balancer switchover.

Are you running on a cloud and want to end all of your expenses with Kubernetes, Ansible, Puppet, Chef, and more? I do too. My earlier attempts at building open source projects in the modern era haven't been super successful, so I'd like to do this in partnership with some companies that have the need.

How To Make This A Reality

I like building software because I enjoy conversations. The best software has real user needs and testing to drive it.

If this sounds useful to you, and you want to get off the vendor software crazy train, email me at michael@michaeldehaan.net and we can start building something along those lines to help you claw back IT operational costs by merging the best ideas of 2019 with lots of great old school knowledge.

Nothing here is difficult; in fact, it might be exceptionally quick. And if it works out for you, we could share it with the world — and it might help change it. What do you think?

