Infrastructure as Code at Pocket Gems

Rishit
Pocket Gems Tech Blog
Feb 16, 2021

During my initial days on the Data Infrastructure team, I was assigned various features and bug fixes. Over the next two months, I faced three major issues:

  1. Absence of a testing environment: We didn’t have a local or staging environment in which to test our changes, so it was hard to be confident in our code when it was deployed to production.
  2. Lack of immutable infrastructure: We ran multiple web server instances for the tasking system in production, so any configuration-level change required us to manually SSH into each instance, change the configuration there, and reload the services. Repeating this for every instance left a lot of room for human error.
  3. Complicated onboarding process: Setting up the local environment was time-consuming because a series of commands had to be run in a particular order.

Then, Adin, one of our engineers, suggested we use Infrastructure as Code (IaC) to solve these problems. At the time, it was a new term and concept for the whole team.

IaC is a method of provisioning and administering IT infrastructure through source code rather than through conventional operating procedures and manual processes. Essentially, you describe the storage, compute, and network requirements, along with other details, in a file stored in version control (for example, a Git repository hosted on GitHub), and the IaC tool spins up the infrastructure according to your code.
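To make the idea concrete, here is a minimal Python sketch of the declarative model behind most IaC tools: you state the desired infrastructure, and the tool works out what must change to get there. This is an illustration only, not how any particular tool is implemented; the `Resource` type, the `plan` function, and the sample resource names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A hypothetical, simplified description of one piece of infrastructure."""
    kind: str  # e.g. "compute", "storage", "network"
    name: str
    size: str


def plan(desired, actual):
    """Return the (action, name) steps needed to make `actual` match `desired`."""
    desired_by_name = {r.name: r for r in desired}
    actual_by_name = {r.name: r for r in actual}
    actions = []
    for name, resource in desired_by_name.items():
        if name not in actual_by_name:
            actions.append(("create", name))
        elif actual_by_name[name] != resource:
            actions.append(("update", name))
    for name in actual_by_name:
        if name not in desired_by_name:
            actions.append(("delete", name))
    return actions


# What the version-controlled file says should exist (illustrative values).
DESIRED = [
    Resource("compute", "web-1", "small"),
    Resource("storage", "db-disk", "100GB"),
]

# What currently exists in the environment (illustrative values).
ACTUAL = [
    Resource("compute", "web-1", "medium"),
    Resource("network", "old-vpc", "n/a"),
]
```

Calling `plan(DESIRED, ACTUAL)` yields an update for `web-1`, a create for `db-disk`, and a delete for `old-vpc`. Because the desired state lives in a file, the same plan can be reviewed and reproduced by anyone on the team.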

Why Infrastructure as Code?

There’s no doubt that cloud computing has had a major influence on how companies build, scale, and manage technology products. The ability to provision servers, databases, and other infrastructure with the click of a few buttons has led to an unprecedented uptick in developer productivity. It’s a breeze to spin up simple cloud architectures, but it’s also much easier to make errors when provisioning any complex infrastructure.

Cloud computing is far from being a cure-all, though. It allows you to set up your infrastructure quickly and solves difficult problems such as high availability and scalability, but it does nothing to solve inconsistency. When more than one person performs the configuration, you’re bound to find discrepancies. The most reliable way to avoid these kinds of mistakes is automation, and Infrastructure as Code helps engineers launch cloud environments quickly and consistently.

With IaC, you can fulfill developers’ infrastructure requirements more dynamically than before and spin up an entire setup by running a single piece of code. This speeds up the overall process by reducing the manual effort and time needed to deliver the infrastructure. Similarly, you can spin down environments by running a script, which can save a lot of money on unused resources. IaC has other advantages as well:

  • IaC allows you to build environments rapidly without manual intervention.
  • IaC helps maintain consistent builds across all environments, such as dev, QA, staging, and prod.

What did we use for IaC?

Infrastructure Provisioning

The term ‘provisioning’ is normally used by DevOps engineers to refer to getting computers or virtual hosts ready to use and installing any necessary libraries or services on them. IaC provides a single source of truth for infrastructure provisioning, and this source can be used to configure a different role for each instance.

Several different tools are available on the market now:

[Figure: popularity of provisioning tools over time. Source: Google Trends]

There isn’t a lot of difference between them in terms of functionality, so we chose Ansible as our automation tool because it is:

  • Easy to adopt (rising interest on Google Trends, a large number of community modules, YAML syntax).
  • Free of extra dependencies (it only needs SSH and Python).
  • Self-contained: We don’t have enough servers to justify a standalone master for configuration management. If we need to scale up, Ansible Tower will allow us to do so.

These tools let us provision the infrastructure, but they didn’t help us orchestrate the underlying infrastructure itself.

Infrastructure Orchestration

Orchestration means arranging or coordinating multiple systems. It’s also used to mean “running the same tasks on a bunch of servers at once, but not necessarily all of them.”
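The second meaning, running the same task on a chosen subset of servers, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the inventory, the host names, and the `reload_service` task are hypothetical, and the task only returns a string instead of contacting a real machine.

```python
from fnmatch import fnmatch

# Hypothetical inventory: host name -> metadata.
INVENTORY = {
    "web-1": {"role": "web"},
    "web-2": {"role": "web"},
    "db-1": {"role": "db"},
}


def select_hosts(inventory, role=None, pattern="*"):
    """Pick the hosts matching an optional role and a shell-style name pattern."""
    return sorted(
        host
        for host, meta in inventory.items()
        if (role is None or meta["role"] == role) and fnmatch(host, pattern)
    )


def orchestrate(inventory, task, role=None, pattern="*"):
    """Run the same task on every selected host and collect the results."""
    return {host: task(host) for host in select_hosts(inventory, role, pattern)}


def reload_service(host):
    """Stand-in for a real remote action such as reloading a service."""
    return f"reloaded service on {host}"
```

For example, `orchestrate(INVENTORY, reload_service, role="web")` touches only `web-1` and `web-2` and leaves `db-1` alone.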

To orchestrate the environment, we decided to use Terraform because it can orchestrate environments across different cloud providers and frameworks such as Docker and Vagrant. Terraform also allowed us to provision a subset of the infrastructure so that we could have a specific configuration. Alongside it, Packer allowed us to easily create a “snapshot” of a machine as an Amazon Machine Image (AMI), a Google Compute Engine (GCE) image, and so on.

How we used IaC: Combining orchestration and provisioning

Ansible-managed Packer to build a base image

As the diagram above shows, we used a Packer build configuration file with the Vagrant builder/template to create a base image file, and used ansible-playbook to provision that image. The final output of this Packer process can be an Amazon Web Services (AWS) AMI, a Vagrant box image, or a Docker image, depending on whether the environment is production or local. We called this a base box image because it was only partially provisioned: other components, such as Postgres, Redis, and DynamoDB, still need to be provisioned after the VM/instance is up and running.

Spinning up a VM/instance using Ansible and Terraform

To keep things simple, assume we are working in a local environment. We used Vagrant with VirtualBox to set up the virtual machine locally, and the Vagrantfile invoked Ansible through Vagrant’s Ansible provisioner. As mentioned earlier, we still needed to create multiple database instances, virtual private networks, and so on, for which we used Terraform. From ansible-playbook, we invoked Terraform through Ansible’s Terraform module to build the Postgres, Redis, DynamoDB, and other instances from Docker images inside the virtual machine, and then performed the rest of the provisioning tasks.

Once the Docker containers were ready and Terraform had finished running, Ansible ran the rest of the provisioning tasks, such as reloading uWSGI and starting the tasking system. Then, voila! We had our local infrastructure up and running.
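The order of the local flow described above can be summarized as a small dependency graph. The Python sketch below (Python 3.9+ for `graphlib`) is only a model of that ordering, not our actual tooling; the step keys and descriptions are hypothetical labels for the stages named in this section.

```python
from graphlib import TopologicalSorter

# Hypothetical labels for the stages described in the text.
STEPS = {
    "packer_build": "Packer builds the base box image, provisioned by ansible-playbook",
    "vagrant_up": "Vagrant boots the VM in VirtualBox from the base box",
    "ansible_start": "The Vagrantfile's Ansible provisioner starts the playbook",
    "terraform_apply": "The playbook invokes Terraform to create the Docker-based instances",
    "ansible_finish": "Ansible runs the remaining tasks (reload uWSGI, start the tasking system)",
}

# Each step maps to the set of steps that must finish before it can start.
DEPENDS_ON = {
    "packer_build": set(),
    "vagrant_up": {"packer_build"},
    "ansible_start": {"vagrant_up"},
    "terraform_apply": {"ansible_start"},
    "ansible_finish": {"terraform_apply"},
}


def provisioning_order(depends_on):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


def describe(depends_on, steps):
    """Render the ordered steps as numbered, human-readable lines."""
    return [
        f"{i}. {steps[name]}"
        for i, name in enumerate(provisioning_order(depends_on), start=1)
    ]
```

Calling `describe(DEPENDS_ON, STEPS)` lists the five stages from the Packer build through the final Ansible tasks. Modelling the flow as dependencies rather than a fixed list makes it easier to see which stages could later run independently.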

Conclusion

Looking back on the problems I ran into when I first joined, it’s clear that this IaC solution addressed each of them:

  1. Improved testing: The local infrastructure is almost a replica of prod/staging infrastructure. This gives developers more confidence in testing and deploying changes to production.
  2. Immutable infrastructure: All the environments are provisioned using automated manifests, which leaves far less room for human error. This keeps systems uniform throughout the whole delivery process and reduces the risk of configuration drift. All configuration changes to cloud infrastructure now go through code review in GitHub instead of being made by SSHing into an instance or running AWS CLI commands directly. We always know what changes were made recently, which also helps a lot in debugging.
  3. Simpler onboarding process: We built custom command-line interface (CLI) commands that internally use Packer, Ansible, Vagrant, and Terraform to set up virtual machines. Now, the user doesn’t have to dwell on any of the details.
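As a rough illustration of what such a wrapper can look like, here is a minimal Python sketch. It is not our actual tool: the program name `devenv`, the subcommand names, and the default template file `base-box.json` are hypothetical. The underlying commands (`packer build`, `vagrant up`, `vagrant destroy --force`) are standard ones, and in the local flow described earlier, `vagrant up` is what triggers Ansible and, through it, Terraform.

```python
import argparse
import shlex
import subprocess


def build_parser():
    """Define a small CLI with one subcommand per onboarding task."""
    parser = argparse.ArgumentParser(prog="devenv")  # hypothetical name
    parser.add_argument(
        "--execute",
        action="store_true",
        help="run the commands instead of only printing them",
    )
    sub = parser.add_subparsers(dest="command", required=True)
    image = sub.add_parser("build-image", help="build the base box image")
    image.add_argument("--template", default="base-box.json")  # hypothetical file
    sub.add_parser("up", help="create and provision the local VM")
    sub.add_parser("down", help="destroy the local VM")
    return parser


def commands_for(args):
    """Translate a parsed subcommand into the underlying tool invocations."""
    if args.command == "build-image":
        return [["packer", "build", args.template]]
    if args.command == "up":
        return [["vagrant", "up"]]
    if args.command == "down":
        return [["vagrant", "destroy", "--force"]]
    raise ValueError(f"unknown command: {args.command}")


def main(argv=None):
    """Print each command, and run it only when --execute is given."""
    args = build_parser().parse_args(argv)
    for cmd in commands_for(args):
        print(shlex.join(cmd))
        if args.execute:
            subprocess.run(cmd, check=True)
```

Calling `main(["up"])` prints the command it would run, and `main(["--execute", "up"])` actually runs it. Defaulting to a dry run is a deliberate choice in this sketch so that a new team member can see what a command does before it touches their machine.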
