I’d like to talk quickly about my definition of immutable infrastructure. I’ve heard the term used recently to mean a few different things, and it’s caused people to talk at cross purposes. I’m not saying I’m right, but here’s one definition for discussion.
Let’s recap why we want immutability. Why not update software in place? Partly because it’s very hard to do that without downtime. Partly because in-place file and database schema updates are hard, especially when you have to test the combinatorial explosion of every version that might need to migrate to every other. But also because state is a pain. The more state you build up, the more chance of it being erroneously structured, or pathologically big. There’s a reason systems like Erlang and Akka just flush state as their first attempt to fix a crash. State also doesn’t have to be deliberate: maybe you write logs up to the rotation maximum somewhere, then update to a different app version with a different log location. You’ve just accidentally doubled your quota; cue a weird bug due to a full disk in three months.
To me, immutability of a container or VM doesn’t just mean I don’t intend to update the software in place. It means I want to ensure nothing about that environment changes. All state should be off-box in a hosted DB. All logs go straight to Elasticsearch without touching the filesystem. All config comes through the environment, not a file. Immutable also means not hacking up fixes in place: it means keeping everything that describes an app and its environment in version control, and baking and pushing a new image for every change, à la GitOps.
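As a rough illustration of “config through the environment, logs without touching the filesystem”, here is a minimal Python sketch. The setting names (`APP_DB_URL`, `APP_LOG_LEVEL`) are hypothetical, and it assumes something outside the app (a log shipper, not shown) forwards stdout to Elasticsearch; the app itself never writes a log file.

```python
import json
import os
import sys
import time
from typing import Mapping

# Hypothetical setting names; only APP_DB_URL is required.
REQUIRED = ("APP_DB_URL",)


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read every setting from the environment, never from a file on disk."""
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        # Fail fast: a missing setting is a deployment error to fix in
        # version control, not something to patch on the box.
        raise RuntimeError("missing required config: " + ", ".join(missing))
    return {
        "db_url": env["APP_DB_URL"],  # state lives off-box in a hosted DB
        "log_level": env.get("APP_LOG_LEVEL", "info"),
    }


def log(event: str, **fields) -> None:
    """Write one JSON log line to stdout instead of to a log file.

    Assumes something outside the app ships stdout to Elasticsearch.
    """
    record = {"ts": time.time(), "event": event, **fields}
    sys.stdout.write(json.dumps(record) + "\n")
    sys.stdout.flush()


if __name__ == "__main__":
    # Example values only; in a real deployment, call load_config() with no
    # arguments so it reads the process environment.
    cfg = load_config({"APP_DB_URL": "postgres://db.example/app"})
    log("started", log_level=cfg["log_level"])
```

Passing the environment in as a mapping is just a convenience for testing; the point is that nothing the app needs at runtime lives in a file it could edit.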
Why not take steps to prevent mutation? An app like the one I described above should be able to run in a container with its filesystem mounted read-only. Do that, and let apps with unwanted side effects fail fast. I can also try to extend this to my AWS estate by having very restrictive IAM roles and only allowing deployments via my CD system. I know it’s difficult in practice to make everything 100% read-only, but I think it’s a good thing to aspire to.
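The read-only mount itself is normally a platform setting, for example `docker run --read-only`, or `readOnlyRootFilesystem: true` in a Kubernetes container’s `securityContext`. As an extra, purely hypothetical guard, an app could refuse to start if that setting was forgotten. This Python sketch checks the `ST_RDONLY` flag reported by `os.statvfs`, so it only works on Unix; the function names are illustrative.

```python
import os
import sys


def is_read_only(path: str = "/") -> bool:
    """Return True if the filesystem holding `path` is mounted read-only.

    Reads the ST_RDONLY mount flag via statvfs, so this is Unix-only.
    """
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def require_read_only(path: str = "/") -> None:
    """Hypothetical startup guard: exit unless `path` is on a read-only mount.

    If the platform forgot to mount the filesystem read-only, crash now
    rather than quietly accumulate state for months.
    """
    if not is_read_only(path):
        sys.exit(f"refusing to start: {path} is writable")


if __name__ == "__main__":
    # In a real app, call require_read_only() first thing in main().
    print("root filesystem read-only:", is_read_only("/"))
```

Once the mount really is read-only, any stray write raises `OSError` with errno `EROFS`, which is the fail-fast behaviour described above.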
As a final backstop, I can undo mutation. Ironically, some of the best tools for this are the “old-fashioned” converging infra tools, such as Puppet, which can detect changes and optionally revert them proactively. On the infra side we have terradiff, kubediff, etc. With such tools, we can continue to work the same way as with “immutable infrastructure”. For example, images can still be pre-built with Packer, meaning we can boot and scale quickly, even while a dependency we need at build time is down. Repeatability is the same too: we still have a version-controlled declaration of what the infrastructure should look like at all levels, and with tools like Packer we can get it there and freeze it before anyone can use it, so it never changes under our users’ feet. By pulling in tools from the converging toolbox, we also gain a “watchdog” that puts any remaining mutable parts of the system back where we want them, should they change.
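The core of such a watchdog is a small loop: compare the declared state from version control with what is actually live, report the drift, and optionally put it back. The Python sketch below is not how Puppet, terradiff or kubediff are implemented; the flat key/value model of state and the `read_actual`/`apply` callbacks are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# A flat key -> value view of "state" is a simplifying assumption;
# real tools model files, packages, services, cloud resources, etc.
State = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def diff(desired: State, actual: State) -> Drift:
    """Return {key: (actual, desired)} for every key that has drifted.

    A key missing on one side shows up as None on that side.
    """
    keys = desired.keys() | actual.keys()
    return {
        k: (actual.get(k), desired.get(k))
        for k in sorted(keys)
        if actual.get(k) != desired.get(k)
    }


def reconcile(
    desired: State,
    read_actual: Callable[[], State],
    apply: Callable[[str, Optional[str]], None],
    revert: bool = True,
) -> Drift:
    """One watchdog pass: detect drift and, if revert is True, roll it back.

    `desired` comes from version control; `read_actual` observes the live
    system. With revert=False this only reports drift, like a diff tool.
    """
    drift = diff(desired, read_actual())
    if revert:
        for key, (_, want) in drift.items():
            apply(key, want)  # want is None: remove the unexpected key
    return drift


if __name__ == "__main__":
    desired = {"nginx.workers": "4", "tls": "on"}
    live = {"nginx.workers": "4", "debug": "true"}  # someone hacked a fix in place

    def apply(key: str, value: Optional[str]) -> None:
        if value is None:
            live.pop(key, None)
        else:
            live[key] = value

    print(reconcile(desired, lambda: dict(live), apply))
    # {'debug': ('true', None), 'tls': (None, 'on')}
    print(live == desired)  # True
```

Running a pass with `revert=False` first is the cautious option: you see exactly what the watchdog would undo before letting it do so.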
To summarise: I’m not advocating upgrading anything in place. I’m saying we should use any appropriate tools to roll unwanted changes back, not forward, to ensure things don’t mutate, or at least not for long. Yes, this means tools capable of making in-place changes are installed, so immutability becomes a function of process and culture (I strongly recommend you read about GitOps). All I’m saying is that, given that intention, we have the technology to help.
I don’t yet have a great name for this. Although it uses “converging infrastructure” tools, it’s not that, and I’m hearing “immutable infrastructure” used very loosely, to describe much less than I’m talking about here. My best name is “actively immutable infrastructure”, but I’d love to hear better suggestions in the comments or on Twitter.