How Darwinbox Manages Infrastructure at Scale with Terraform

Tech@Darwinbox · Published in Darwinbox · 6 min read · Dec 17, 2019

A brief introduction…

What is Darwinbox?

Darwinbox is an end-to-end integrated Human Capital Management (HCM) platform for large enterprises. It spans the core HR process modules (Leave, Attendance, Documents), Payroll, Recruitment, and more, helping streamline activities across the employee lifecycle while keeping employees engaged and inspired through new-age Employee Engagement and Performance modules.

In this blog, we take a deep-dive into how Darwinbox is managing AWS cloud infrastructure at scale with Terraform.

First things first…

What is AWS?

Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform, offering over 165 fully-featured services from data centres globally. Millions of customers — including the fastest-growing startups, largest enterprises, and leading government agencies — trust AWS to power their infrastructure, become more agile, and lower costs.

What is Terraform?

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.

At Darwinbox, we run a containerized multi-tenant HRMS solution deployed as a collection of microservices on AWS.

Our architecture is designed to expand to support both multi-cloud and hybrid-cloud deployments, and we already run a smaller portion of our workload on both models. Through this expanded portfolio of environments and cloud-provided features, we aim to improve security and performance.

Note that multi-cloud architecture is different from multi-tenant architecture. Multi-cloud refers to more than one cloud deployment of the same type, public or private, sourced from different cloud providers. Multi-tenant, on the other hand, refers to a software architecture in which a single instance of the software runs on a server and serves multiple tenants.

What were we trying to solve?

Security, Performance, Expertise, Portability, and Cost were some of our main concerns when looking for a platform such as Terraform. But how did we end up making Terraform our IaC (Infrastructure as Code) tool of choice? Let’s knock the other options out one at a time:

There are multiple steps involved in setting up an environment for an app on the cloud. Unless you write them all down as detailed checklists and follow them closely, all the time, you are bound to make mistakes.

Let’s consider UIs as an alternative: there are several out there, the AWS Management Console being a prime example. However, these tools have a lot going on under the hood. One click on the UI can invoke a cascade of changes that are extremely hard to control. What’s worse, there is usually no way to undo what you do in the UI. Considering what’s at stake here, the usual “Are you sure?” prompts are simply not enough.

Ok, but what about CLIs? CLIs would have been better than UI tools for our use cases. However, you are still prone to making changes by hand or writing bash scripts that can easily slip out of hand.

This is where Terraform fits in. Terraform can be used to orchestrate an entire piece of IT infrastructure. What does this mean? You declare what you want deployed, and Terraform figures out the dependencies between resources and performs all the necessary API calls.
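As a minimal sketch of what that declarative style looks like (the resource names and AMI ID below are hypothetical, not our actual configuration):

```hcl
# Configure the AWS provider for the target region.
provider "aws" {
  region = "ap-south-1"
}

# Declare a single EC2 instance. Terraform compares this desired
# state with reality and issues the create/update/delete API calls
# needed to reconcile the two.
resource "aws_instance" "app_server" {
  ami           = "ami-0abcdef1234567890" # hypothetical AMI ID
  instance_type = "t3.micro"

  tags = {
    Name        = "app-server"
    Environment = "dev"
  }
}
```

Running `terraform plan` shows exactly which API calls would be made before anything is changed.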

The next question we needed to answer was: Do we need one cloud account per environment?

There are three general approaches used for environment separation on the cloud eco-system:

  • Separation by Naming Convention and Tags
  • Separation by VPC and IAM roles
  • Separate accounts

We chose to use multiple accounts — one account per environment. Here’s why:

Darwinbox manages separate Production, Staging, QA, and Development accounts on AWS. If everything were in one account, it would be practically impossible to completely restrict a high-permission user to a certain set of resources (e.g., just the dev environment). The simplest way to get the desired isolation was with multiple accounts. This helps prevent accidental damage to production resources and also limits the damage an attacker could do if high-permission keys were to fall into their hands.
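One common way to target per-environment accounts from a single configuration is provider aliases combined with `assume_role` (the account ID, role, and bucket names below are placeholders, not Darwinbox’s actual setup):

```hcl
# Default provider: the development account the pipeline runs in.
provider "aws" {
  region = "ap-south-1"
}

# Aliased provider that assumes a role in the production account,
# so production access is explicit and auditable.
provider "aws" {
  alias  = "production"
  region = "ap-south-1"

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/TerraformDeploy" # placeholder
  }
}

# Resources opt into the production account via the provider alias.
resource "aws_s3_bucket" "prod_artifacts" {
  provider = "aws.production"
  bucket   = "example-prod-artifacts" # placeholder bucket name
}
```

Resources without an explicit `provider` argument stay in the default (development) account.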

Infrastructure at Darwinbox is treated like any other application code: every change is versioned, reviewed, merged, and run through a pipeline.

Process flow for change in Infrastructure:

1. The Developer/SRE engineer changes the Terraform configuration files on the local machine and commits the code to GitLab.

2. A GitLab webhook triggers a continuous integration job on Jenkins.

3. Jenkins pulls the latest code, which contains the Terraform files, from the configured repository into its workspace.

4. Once the Terraform configuration is read, Terraform initializes the remote state backend.

5. Terraform locks the state in DynamoDB.

6. Terraform generates a plan of the changes that have to be applied to the infrastructure.

7. Jenkins sends a notification to a Slack channel about the changes, requesting manual approval.

8. Here, the user can approve or reject the Terraform plan.

9. The user input is sent back to the Jenkins server before any further action is taken.

10. Once the changes are approved by an operator, Jenkins executes `terraform apply`, the command that applies the changes to the infrastructure.

11. Terraform creates a report of the resources and their dependencies created while executing the plan.

12. Terraform provisions the resources in the provider environment.

13. Jenkins sends another notification to the Slack channel with the status of the infrastructure after the changes are applied. Once the job has run, the Jenkins pipeline is configured to clean up the workspace created by the job.
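Steps 4 and 5 of this flow depend on a remote state backend with locking. A minimal sketch of such a backend block (bucket, key, and table names are placeholders):

```hcl
terraform {
  backend "s3" {
    # Remote state lives in a versioned, encrypted S3 bucket.
    bucket  = "example-terraform-state" # placeholder state bucket
    key     = "production/terraform.tfstate"
    region  = "ap-south-1"
    encrypt = true

    # DynamoDB table used for state locking, so two pipeline runs
    # can never mutate the same state concurrently.
    dynamodb_table = "terraform-state-lock" # placeholder lock table
  }
}
```

The DynamoDB table needs a string hash key named `LockID`; `terraform init` then wires the backend up automatically.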

Some Terraform Best Practices Followed at Darwinbox:

1. Avoid hard coding:

Sometimes developers create resources manually, directly in the console. You need to identify these resources and use `terraform import` to bring them under Terraform’s management.

A sample:

account_number = "123456789012"
account_alias  = "mycompany"

Instead of hard coding them, the current AWS account ID or account alias can be fetched via data sources.

# The attribute `${data.aws_caller_identity.current.account_id}` will be the current account number.
data "aws_caller_identity" "current" {}

# The attribute `${data.aws_iam_account_alias.current.account_alias}` will be the current account alias.
data "aws_iam_account_alias" "current" {}

# Set as local values (https://www.terraform.io/docs/configuration/locals.html)
locals {
  account_id    = "${data.aws_caller_identity.current.account_id}"
  account_alias = "${data.aws_iam_account_alias.current.account_alias}"
}

2. Run Terraform from a docker container:

HashiCorp publishes an official Terraform Docker image, which makes it easy to control exactly which version you run.

We recommend running Terraform through this Docker container when you set up your build job in the CI/CD pipeline.

TERRAFORM_IMAGE=hashicorp/terraform:0.11.7
TERRAFORM_CMD="docker run -ti --rm -w /app -v ${HOME}/.aws:/root/.aws -v ${HOME}/.ssh:/root/.ssh -v `pwd`:/app $TERRAFORM_IMAGE"

3. Update the Terraform version

Terraform’s release process has historically not followed semantic versioning rules, so even minor upgrades can introduce breaking changes.

For example, `terraform init` isn’t compatible between 0.8 and 0.9, and in the upcoming 0.10 release providers are being split out of core and installed as plugins via `init`.

That’s why we recommend deliberately keeping up with the latest Terraform version, rather than letting upgrades happen by accident.
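One way to make the Terraform version an explicit, reviewed decision is to pin it in the configuration itself (the constraint below is only an example):

```hcl
terraform {
  # Fail fast if someone runs this configuration with an
  # incompatible Terraform binary; bump this constraint
  # deliberately when upgrading.
  required_version = ">= 0.11.7, < 0.12.0"
}
```

With this in place, `terraform init` refuses to run under a binary outside the allowed range.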

4. Use Kitchen-Terraform to test your code

5. Enable version control on Terraform state files bucket

Always set the backend to S3 and enable versioning on the state bucket, so every revision of the state file is retained and can be restored.

If you’d like to manage the Terraform state bucket itself with Terraform as well, we recommend using the repository we wrote, tf_aws_tfstate_bucket, to create the bucket and replicate it to other regions automatically.
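As a sketch of what such a state bucket definition looks like (the bucket name is a placeholder, and the actual tf_aws_tfstate_bucket module does more, including cross-region replication):

```hcl
resource "aws_s3_bucket" "terraform_state" {
  bucket = "example-terraform-state" # placeholder bucket name
  acl    = "private"

  # Keep every revision of the state file so any change can be
  # inspected or rolled back.
  versioning {
    enabled = true
  }

  # Encrypt state at rest; it can contain sensitive values.
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "aws:kms"
      }
    }
  }
}
```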

For insights on Darwinbox’s breakthrough Tech innovations, follow us and watch this space for more!

-Written by Prithvi Raju Alluri
