Infrastructure as Code — the Good 😇, the Bad 😈 and the Ugly 👹

Stacy Goh
Government Digital Products, Singapore
7 min readJan 1, 2019

Imagine this scenario- One day, your team decides that you need another environment for demonstrating features to stakeholders. The stakeholders are turning to you to replicate the same environment you did for development a year ago.

Sure, how tough is it? Except that you’ll probably need to recall what you’ve done a year back. You look into your development environment, desperately trying to figure out what you’ve installed or implemented. You suddenly remember that some of your colleagues added some configuration throughout the year as well. You then gather what they’ve added and spend another one to two days digging into the different components to gather more clues. Finally you’ve pieced together an environment as closely replicated as possible.

Wait a minute, this doesn’t seem right- the application cannot be deployed on new environment. “Was anything missed out?” and there goes a few hours of debugging.

Sounds familiar?

What is Infrastructure as code (IAC)? 🔧

Infrastructure as code simply means to convert your infrastructure into code, where it is managed by some kind of version control system, e.g., Git, and stored into a repository where you can manage it similar to your application — e.g., with versioning for traceability, peer review checks, testing etc.

IAC is particularly useful in configuring new environments or multiple build agents for running CI pipelines etc. In fact, it can even be used for on-boarding new members to the team. For instance, the team can simply pass the new member an Ansible script to have the necessary tools like the team’s preferred IDE, Chrome, Xcode, Android SDK, or package managers set up on his/her machine to run the application code and get started asap.

Here’s an example of what a quick setup of a user’s Macbook using Ansible looks like.

Some of the most popular open-sourced IAC tools include Ansible, Terraform, Docker, Chef and Puppet.

In this article, I will introduce the good, the bad and the ugly side of IAC and leave the decision of implementing IAC in your next project to you.

The good 😇

Replicability

With the infrastructure configuration written as code it means that infrastructure documentation is coupled with its implementation. With that, as long as the code is committed to a repository on the cloud, you can use the code to spin up another new environment in a few minutes — something that would take you days or even weeks to do if done manually.

One example of how we currently make use of replicability is when we have to set up more than one build machine to run the CI pipelines. Properly configuring an Ansible script once allows us to simply apply it on multiple machines ensuring that everything, from the scripts added to the version of the tools installed, is the same.

Another is when creating environments that should mirror each other — e.g., staging and production — but how can we be sure that staging and production stay similar? IAC helps with this by allowing us to create two environments using the same set of parameters generated by the same tool.

Disposability

With resources that can be easily created, removed, modified, and moved from one account to another, this can serve as a playground for developers where they can ‘move fast and break things’ — which leads to learning and improved understanding of how things work. They can spin up an environment fast, do some experimentation, tear it down and roll back to the latest configuration change.

Some perspective by Randy Bias on cloud architectural patterns —

In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. Servers or server pairs that are treated as indispensable or unique systems that can never be down.

In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line.

Source: http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/

“Your server”

Accessibility

In the past, there was a clear separation between developers and “the ops engineer” where there was fear of taking risks because changes to infrastructure are not traceable and when things go wrong, it was difficult to revert the changes. With the rise of IAC, more developers can pick up the operations skillset and manage even the network or infrastructure.

Cloud Governance

Lastly, when infrastructure as code is implemented with “Policy as Code”, it can help with compliance to company policies which reduces the need for complex security audits. The infrastructure that is created from code already complies with governance processes.

An example of such “Policy as Code” framework is a tool released by Hashicorp called Sentinel. Before changes are applied to the infrastructure, those changes are first validated with Sentinel. Upon successful validation to ensure compliance, the changes are then applied to the infrastructure. Can you imagine the number of audits and checks this can save?

Read more about it here: https://www.hashicorp.com/blog/sentinel-announcement-policy-as-code-framework

The bad 😈

Discipline

With all the hype on Infrastructure as code, if not used properly, IAC can be pretty damaging. Consider a scenario where changes have been made to the infrastructure but the same changes were not applied to the code. Once applied again, the changes made manually will be overwritten.

Idempotency

Most tools are idempotent which means that it can be run multiple times with the same result. However, not all tools (Ansible, I’m looking at you) adopt this ideal. Here’s an example.

If I create an Ansible-playbook with the following code, running the code again will result in the line export ANDROID_HOME=~/development/android-sdk-macosx being appended more than once into the bash profile because the shell module in Ansible is not idempotent. For someone who is not aware, the result is a messy bash_profile with repeated lines of code.

- hosts: localhostgather_facts: notasks:- name: Append to bash_profileshell: echo "export ANDROID_HOME=~/development/android-sdk-macosx" >> /Users/stacygoh/.bash_profile

Instead, a better way to write it would be to use the lineinfile module where it checks if the regular expression (regexp) is present in the destination (dest), and if it isn’t, the line gets appended to the bash profile. By writing it this way, it becomes idempotent.

- hosts: localhostgather_facts: notasks:- name: Append to bash_profilelineinfile:dest: /Users/stacygoh/.bash_profileregexp: ^export ANDROID_HOME=line: export ANDROID_HOME=~/development/android-sdk-macosx

And…the Ugly 👹

Once, we modified an EC2 instance on AWS to add some monitoring configuration manually without communicating to some vendors (yes, my bad). The setting-up took me a sprint (2 weeks) to complete. Without knowing, the vendors reran the Terraform script and everything was wiped. We had to rebuild the whole thing from scratch and in the process, learnt how important regular backups are.

With IAC, there should be a common understanding among the team to always apply changes via code instead of manually.

Clearly, the strengths of IAC outweigh the bad and the ugly. However, IAC is not for every project. IAC is probably not for you if you want to do a quick prototype of your product because it is not worth the time and effort.

Yes, I’m in.. but which tool to use?

With a number of popular tools out there (e.g., Chef, AWS Cloud Formation, Google Deployment Manager, Azure Resource Manager), it can be tough making a choice on which to use for your project. I’ve used both Ansible and Terraform before and out of the two, I’ll strongly recommend Terraform simply because it allows you to define the end state aka it is a declarative tool.

Procedural tools like Ansible, on the other hand, defines the steps needed to get to the end state but it does not keep any record of the state. Hence, think of Ansible as a tool which keeps piling multiple steps each time it is ran and it is up to you to keep track of the end state to keep it immutable.

Terraform, and other declarative tools, on the other hand always allow you to declare the end state, which makes it truly immutable because it will always reach the end state regardless of the number of times ran.

Check out this really useful link for a detailed comparison of declarative vs procedural tools.

That’s all folks.

Feel free to leave any comments if you have any clarifications. We are also hiring UX designers/developers at Government Digital Services Singapore, so do drop me a message at https://imstacy.com or ping me at stacy_goh@tech.gov.sg if you think you have what it takes to join the happy club! 🎈

--

--

Stacy Goh
Government Digital Products, Singapore

Software Engineer at GDS, Govtech Singapore. I don't believe in deep down. I kinda think that all you are is just the things that you do.