Why you should treat infrastructure like software

Season of Scale

Introduction

“Season of Scale” is a blog and video series that helps enterprises and developers build scale and resilience into their design patterns. In this series, we walk through patterns and practices for creating apps that are resilient and scalable, two essential goals of many modern architecture exercises.

In Season 1, we’re covering Infrastructure Automation and High Availability:

  1. Patterns for scalable and resilient applications
  2. Infrastructure as code (this article)
  3. Immutable infrastructure
  4. Where to scale your workloads
  5. Globally autoscaling web services
  6. High Availability (Autohealing & Auto updates)

In this article I’ll walk you through the basics behind infrastructure as code.

Review

In the last article, we learned about Critter Junction, a multiplayer gaming company that’s gained massive popularity in the last few months. Online players can interact with one another in a virtual world that simulates life as a critter.

They’ve been great at running individual machines on premises, but haven’t been able to scale automatically to many machines to handle peaks and dips in traffic. On top of that, they’re facing business and operational constraints. So our team has stepped in to help them define scale and resilience, along with the three themes that design best practices fall into:

Automation

Loose coupling

And Data-driven design

The old way of provisioning infra

While Critter Junction has traditionally run on premise, they’ve just started to port workloads and rapidly spin up new servers in the cloud.

They’re used to the classic approach to infrastructure creation, which was to file a ticket. Someone would then log into a management portal and go through a series of manual steps to provision that piece of infrastructure.

This works if you have a relatively small footprint or if the churn of your infrastructure is minimal, which was usually the case with Critter Junction’s private data centers. A VM would live for months or years with a limited scale of deployment. But with cloud elasticity, VMs can and should live shorter lives. In the cloud, Critter Junction no longer carries the fixed costs of depreciating hardware. Instead, they pay per second, minute, or hour. The challenge is delivering a great user experience even when demand spikes.

Being able to rapidly spin resources up and back down is make-or-break for both cost optimization and user experience.

But for Critter Junction, manually filing tickets and trying to reproduce infrastructure at scale has been time-consuming and prone to human error. If a spike occurs, provisioning the needed infrastructure could take hours or even months, far too slow to respond in time. Plus, changes were hard to track and audit.

Infrastructure as code

This is where infrastructure as code (IaC) comes into play. It’s a technique for treating your infrastructure provisioning and configuration the same way you handle application code. You can automate the provisioning of your cloud resources with code, create templates for reproducibility, and store config files in source version control so that they’re discoverable and can be audited.

Automating infrastructure with a CI/CD pipeline means any changes to your configuration can be automatically tested and deployed. By adopting IaC, Critter Junction was able to boost their ability to scale.
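As a rough illustration of what "automatically tested" could mean for a configuration file, here is a minimal sketch of a check that a pipeline step might run before deploying. It assumes the config has already been parsed into a Python dictionary (for example with a YAML library), and the function name validate_config and the specific checks are hypothetical, not part of any Google Cloud tooling.

```python
from __future__ import annotations

from typing import Any


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in a parsed config."""
    problems: list[str] = []
    seen_names: set[str] = set()

    for index, resource in enumerate(config.get("resources", [])):
        name = resource.get("name")
        if not name:
            problems.append(f"resource #{index} has no name")
        elif name in seen_names:
            # Two resources sharing a name would be ambiguous in a deployment.
            problems.append(f"duplicate resource name: {name}")
        else:
            seen_names.add(name)

        if not resource.get("type"):
            label = name or "#" + str(index)
            problems.append(f"resource {label} has no type")

    return problems
```

A pipeline could fail the build whenever the returned list is non-empty, so a malformed change never reaches the deploy step.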

Google Cloud Deployment Manager is an IaC service that automates the creation and management of Google Cloud resources using a declarative YAML-based configuration format. From dev/test to prod, you can build repeatable environments with compute, networks, storage, databases, users, and permissions using simple configuration files.

For more complex architectures that you plan to reuse, you can even break your configuration into templates. A template is a separate file that defines a set of resources. You can reuse templates across different deployments, which creates consistency across complex deployments.
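Deployment Manager templates can be written in Python or Jinja2. As a sketch of what a reusable template might look like, the Python function below builds a single Compute Engine VM from parameters instead of hard-coded values. The property names zone and machineType, the "-vm" name suffix, and the default machine type are assumptions chosen for illustration.

```python
def generate_config(context):
    """Build the resource list for one Compute Engine VM.

    Deployment Manager passes in `context`; `context.properties` holds the
    values supplied by the config that imports this template, and
    `context.env["name"]` is the name given to this template's resource.
    """
    zone = context.properties["zone"]
    # Assumed default for illustration; callers can override it.
    machine_type = context.properties.get("machineType", "n1-standard-1")

    vm = {
        "name": context.env["name"] + "-vm",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": f"zones/{zone}/machineTypes/{machine_type}",
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    "sourceImage": "projects/debian-cloud/global/images/family/debian-9",
                },
            }],
            "networkInterfaces": [{"network": "global/networks/default"}],
        },
    }
    return {"resources": [vm]}
```

Because the zone and machine type arrive as properties, the same file can back a small dev deployment and a larger production one without being copied and edited.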

How it looks

In this example, the configuration file specifies a Compute Engine VM, a BigQuery dataset, and a firewall rule that allows SSH access.

resources:
- name: vm-created-by-deployment-manager
  type: compute.v1.instance
  properties:
    zone: us-central1-a
    machineType: zones/us-central1-a/machineTypes/n1-standard-1
    disks:
    - deviceName: boot
      type: PERSISTENT
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: projects/debian-cloud/global/images/family/debian-9
    networkInterfaces:
    - network: global/networks/default
- name: big-query-dataset
  type: bigquery.v2.dataset
  properties:
    datasetReference:
      datasetId: example_id
- name: ssh-firewall-rule
  type: compute.v1.firewall
  properties:
    sourceRanges: ["0.0.0.0/0"]
    allowed:
    - IPProtocol: TCP
      ports: ["22"]

Now Critter Junction can use the gcloud SDK to create, update, and delete deployments.

gcloud deployment-manager deployments create
gcloud deployment-manager deployments update
gcloud deployment-manager deployments delete

Meanwhile, the Google Cloud Console shows them all of their deployed infrastructure in a hierarchical view.

Whether they need to do this one time, ten times, or a thousand times, this gives them granular control over the lifecycle of their resources. For example, they were able to reduce costs by running a script every morning that brings up hundreds of machines, then using the same script every evening to scale back down.
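A morning and evening script along these lines might look like the sketch below. It is only an illustration under stated assumptions: the deployment name critter-fleet and the two config files fleet-peak.yaml and fleet-offpeak.yaml are hypothetical, and the script simply points an existing deployment at one config or the other with gcloud deployment-manager deployments update and its --config flag.

```python
import subprocess

# Hypothetical names: substitute your own deployment and config files.
DEPLOYMENT = "critter-fleet"
CONFIGS = {
    "up": "fleet-peak.yaml",       # config listing the full daytime fleet
    "down": "fleet-offpeak.yaml",  # config listing the reduced overnight fleet
}


def build_update_command(mode):
    """Return the gcloud command that applies the config for `mode`."""
    if mode not in CONFIGS:
        raise ValueError(f"mode must be one of {sorted(CONFIGS)}, got {mode!r}")
    return [
        "gcloud", "deployment-manager", "deployments", "update",
        DEPLOYMENT, "--config", CONFIGS[mode],
    ]


def scale(mode, dry_run=True):
    """Print the command in dry-run mode, otherwise execute it."""
    command = build_update_command(mode)
    if dry_run:
        print(" ".join(command))
    else:
        # check=True raises if gcloud exits with a non-zero status.
        subprocess.run(command, check=True)
    return command
```

A scheduler could call scale("up", dry_run=False) in the morning and scale("down", dry_run=False) in the evening. Keeping both fleet sizes in version-controlled config files means the script itself never changes when the fleet does.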

Use existing tools

Because no one likes drastic changes to existing DevOps workflows, you can also use other IaC tools, such as Terraform, Chef, and Puppet, to deploy resources in Google Cloud and on-prem from one place.

Like CI/CD, IaC is one of the key practices of DevOps, allowing you to achieve agility in development while staying focused on product quality. By removing manual steps from their infrastructure provisioning, Critter Junction has been able to deploy consistently and quickly to their staging, QA, and production environments at scale. Stay tuned to follow their journey.

And remember, always be architecting.

Stephanie Wong

Google Cloud Developer Advocate and producer of awesome online content. Creator of the series, GCP Networking End-to-End; host of Google’s Next onAir. @swongful