Sometimes Change is Bad: Immutable Infrastructure

Season of Scale

Stephanie Wong
Google Cloud - Community
5 min read · Jul 28, 2020

Introduction

“Season of Scale” is a blog and video series to help enterprises and developers build scale and resilience into their design patterns. In this series we walk you through patterns and practices for creating apps that are resilient and scalable, two essential goals of many modern architecture exercises.

In Season 1, we’re covering Infrastructure Automation and High Availability:

  1. Patterns for scalable and resilient applications
  2. Infrastructure as code
  3. Immutable infrastructure (this article)
  4. Where to scale your workloads
  5. Globally autoscaling web services
  6. High Availability (Autohealing & Auto updates)

In this article I’ll walk you through the basics behind immutable infrastructure.

Review

Earlier in the series, we met Critter Junction, a multiplayer gaming company that has gained massive popularity in the last few months. Online players interact with one another in a virtual world, living out a simulated life as a critter.

They were used to the classic approach to infrastructure creation: provisioning it manually, which was time-consuming and error-prone. Luckily, they’ve since adopted infrastructure as code (IaC) to automate the provisioning of cloud resources. They can create templates for reproducibility and store config files in version control. Now they’re looking to optimize their DevOps practices during upgrades and follow best practices for scalability and auditability.

Mutable infrastructure

Have you ever heard the phrase “Change is the only constant in life”? While that may be accurate for most things, you don’t want it to be true for your infrastructure.

“Mutable” means “capable of change”

Mutable infrastructure means cloud resources that can be changed: the same server is used for updates, patches, and configuration changes. Our friends at Critter Junction started with virtual machines that hosted their web servers.

Their VMs were originally designed to be mutable because mutability gave them short-term flexibility. Each VM could more precisely fit the apps that ran on it, and it was easy to run custom updates for each machine. The team often became attached to individual machines, which was occasionally useful if they needed to fix specific problems quickly.

For example, they ran a VM with NGINX and version 1 of their web server with data written locally. When they wanted to switch to Apache, it was easy for them to perform an upgrade on the server directly and run version 2.

On the surface this seems great. They were using an existing server and didn’t have to worry about moving the data around to other machines, or creating a new machine. But in the real world, things can go wrong. The Apache upgrade could have failed for a number of reasons, like networking issues. In their case, some of the servers updated quickly, while others got hung up on installation.

They were left with some servers running version 2 successfully, some that upgraded the app but couldn’t run it, and some that were still running version 1. Many of their VMs were failing in slightly different ways, and they were now facing:

  • Complexity, with user traffic being routed to different running versions of their web server
  • An inability to roll back to a previous version
  • Undocumented update statuses, making it impossible to track versions
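
To make the failure mode above concrete, here is a minimal sketch that simulates the in-place upgrade across a small fleet. Everything in it is hypothetical: the `Server` class, the `upgrade_in_place` function, and the hard-coded outcomes are illustrative stand-ins, not Critter Junction's tooling or any Google Cloud API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Server:
    """A mutable VM: its software is changed in place."""
    name: str
    web_server: str = "nginx"
    app_version: int = 1
    healthy: bool = True


def upgrade_in_place(server: Server, outcome: str) -> None:
    """Mutate a running server; `outcome` stands in for real-world luck."""
    if outcome == "hung":
        return  # installer never finished: still NGINX and version 1
    server.web_server = "apache"
    server.app_version = 2
    if outcome == "broken":
        server.healthy = False  # upgraded, but the app will not start


fleet = [Server(f"web-{i}") for i in range(6)]
# Hypothetical outcomes, hard-coded so the example is deterministic.
outcomes = ["ok", "ok", "broken", "hung", "ok", "broken"]
for server, outcome in zip(fleet, outcomes):
    upgrade_in_place(server, outcome)

# Count the distinct states the fleet ended up in.
states = Counter((s.web_server, s.app_version, s.healthy) for s in fleet)
for state, count in sorted(states.items()):
    print(state, count)
```

One upgrade command leaves the fleet in three different states, and nothing records which server is in which state unless someone inspects each machine.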

Immutable infrastructure: the better way

Instead of changing each of their existing VMs, they should have changed their environment to be immutable.

Once a VM is deployed, it can’t be modified: no updates, no patches, no configuration changes.

If you want to change application code or apply a patch, you instead build a new image and deploy it as a replacement. Because new environments can be spun up in the cloud in a matter of minutes, this replace-rather-than-patch approach is practical to architect and deploy.
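
The "no modification after deployment" rule can be illustrated in a few lines of Python. This is only an analogy, assuming a hypothetical `Image` record: a frozen dataclass rejects in-place changes, so the only way to get different contents is to build a new object.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Image:
    """Hypothetical description of a VM image; frozen means read-only."""
    name: str
    web_server: str
    app_version: int


v1 = Image(name="web-v1", web_server="nginx", app_version=1)

# Patching in place is rejected...
try:
    v1.web_server = "apache"
    patched_in_place = True
except FrozenInstanceError:
    patched_in_place = False

# ...so a change means building a new image and leaving the old one intact.
v2 = replace(v1, name="web-v2", web_server="apache", app_version=2)

print(patched_in_place)
print(v1)
print(v2)
```

Because `v1` still exists unchanged after `v2` is built, rolling back is a matter of deploying `v1` again rather than undoing a patch.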

Take 2

After some long hours restoring version 1 of their application, here’s how Critter Junction handled their next deployment. Instead of making changes in place to each machine, they created a brand new VM image with Apache and version 2 of their app and moved any locally written data to an attached persistent disk. After some testing to make sure the new image was working, they spun up multiple servers using the new image. Now they could safely switch traffic to their new VMs and shut down the instances running version 1.
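
The rollout described above (build, test, launch, switch traffic, retire the old instances) can be sketched as follows. The `Instance`, `LoadBalancer`, and `roll_out` names and the health check are assumptions made for illustration; a real deployment would use the cloud provider's instance groups and load balancer rather than these toy classes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Instance:
    """Hypothetical VM instance created from a named image."""
    name: str
    image: str


class LoadBalancer:
    """Toy load balancer that just tracks which instances get traffic."""

    def __init__(self, backends: List[Instance]) -> None:
        self.backends = list(backends)


def roll_out(
    lb: LoadBalancer,
    new_image: str,
    count: int,
    health_check: Callable[[Instance], bool],
) -> List[Instance]:
    """Launch new instances, switch traffic only if all are healthy.

    Returns the retired instances, or an empty list if the rollout
    was abandoned and traffic stayed on the old instances.
    """
    candidates = [Instance(f"{new_image}-{i}", new_image) for i in range(count)]
    if not all(health_check(c) for c in candidates):
        return []  # old instances were never touched, so nothing to undo
    retired = lb.backends
    lb.backends = candidates
    return retired


lb = LoadBalancer([Instance(f"web-v1-{i}", "web-v1") for i in range(3)])
retired = roll_out(lb, "web-v2", count=3, health_check=lambda inst: True)
print([b.name for b in lb.backends])
print([r.name for r in retired])
```

The old instances are never modified, so a failed health check costs nothing: traffic simply stays where it was.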

On Google Cloud, it looked like this for their team:

  1. They used Cloud Build to set up continuous integration with their repository, automating building and testing.
  2. VMs were specified using image families, so they could have multiple versions of their images and deprecate the newest version if they needed to roll back.
  3. And, they specified all of their resources using Cloud Deployment Manager in order to automatically provision them.
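
The rollback trick in step 2 relies on an image family pointing at its most recent image that has not been deprecated. Here is a small model of that rule, assuming a hypothetical `Image` record and `resolve_family` function; it is not the Compute Engine API, only a sketch of the selection logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Image:
    """Hypothetical image record. Deprecation is metadata about the
    image, not a change to the image contents."""
    name: str
    created: int  # assumed build number; higher means newer
    deprecated: bool = False


def resolve_family(images: List[Image]) -> Optional[Image]:
    """Return the newest image in the family that is not deprecated."""
    active = [img for img in images if not img.deprecated]
    return max(active, key=lambda img: img.created, default=None)


family = [Image("web-v1", created=1), Image("web-v2", created=2)]
current = resolve_family(family)
print(current.name)

# Roll back: deprecate the newest image, and the family resolves to v1.
family[1].deprecated = True
rolled_back = resolve_family(family)
print(rolled_back.name)
```

New VMs that reference the family rather than a specific image pick up the rollback automatically, with no change to the templates that create them.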

What about containers?

It’s the same pattern when working with containers: create a complete runtime environment and deliver that as an image. With containers, the runtime environment is a program and the files it needs to run. With immutable VMs, you’re delivering a whole operating system configured as needed, along with the programs and their files. In both cases you continuously deliver versioned images.

Conclusion

Creating immutable infrastructure gave Critter Junction protection from configuration drift, since they now know the exact state of their machines and can avoid surprises. In addition, they can now easily track versions, roll back releases, and run a more consistent testing process thanks to the documented differences between their environments. Immutable infrastructure might sound at odds with an agile architecture, but in reality it gives you agility because you know the specifics and code status of every resource in your environment. Stay tuned to find out what’s in store for Critter Junction.

And remember, always be architecting.

Stephanie Wong

Google Cloud Developer Advocate and producer of awesome online content. Creator of the series, GCP Networking End-to-End; host of Google’s Next onAir. @swongful