How Earnin Transforms Infrastructure with Terraform Enterprise

Joe Brinkman
Tech @ Earnin
Apr 16, 2021

Overview

Earnin serves our community members by providing early access to earned wages and working to create a financial system that is fair for people. Like any financial technology start-up, Earnin must frequently adapt to changing market conditions to provide a competitive product. Such changes must go from drawing board to mobile application release in a very short amount of time, and many of these initiatives require infrastructure in AWS. HashiCorp’s Terraform Enterprise allows us to rapidly create cloud infrastructure that is secure and cost-effective.

Architecture overview

Our community members interact with a mobile application which runs on iOS and Android devices. Our main products include Cash Out¹ (early access to earned wages) and WeWin² (Game & Year End Sweepstakes). The mobile application interacts with a monolith API gateway hosted on EC2 and dozens of microservices hosted in Amazon Elastic Kubernetes Service (EKS). Most of the microservices interact with other AWS resources, including Aurora RDS clusters, S3 buckets, DynamoDB tables, SQS queues, SNS topics, and Redis clusters. IAM roles and policies are heavily utilized for restricting access to resources.

While we encourage engineers to explore new AWS services and create innovative solutions, most of our infrastructure falls into common patterns, including the following (the first is sketched below):

  • EC2 instances in an Auto Scaling group with an Application Load Balancer for REST services
  • EC2 instances in an Auto Scaling group for message processing
  • Microservices and cron jobs hosted in EKS which require an IAM role and policy
  • Aurora RDS (MySQL) clusters
  • DynamoDB tables, with optional DynamoDB Accelerator (DAX)
  • Elasticache clusters (Redis or memcached)
  • Apache Kafka event streaming
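
As a rough illustration of what the first pattern looks like when consumed through a shared module, here is a minimal, hypothetical module call. The registry path, module name, and arguments are invented for illustration; they are not Earnin's actual modules.

```hcl
# Hypothetical call to a shared module implementing the ALB + Auto Scaling
# group pattern for a REST service. All names and arguments are illustrative.
module "rest_service" {
  source = "app.terraform.io/example-org/rest-service/aws" # assumed registry path

  service_name      = "payments-api"
  instance_type     = "m5.large"
  min_size          = 2
  max_size          = 10
  health_check_path = "/healthz"
}
```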

Using Terraform Open Source version 0.11

Before the introduction of Terraform at Earnin, AWS resources were provisioned ad hoc using the console. This approach worked well for a small startup that needed to move very quickly, but occasionally led to disorganization and made it difficult to track a large volume of changes.

Terraform Open Source was introduced at Earnin in 2018, along with a core set of Terraform modules managed by a nascent Cloud Operations (CloudOps) team. The company quickly benefited from this in a few key ways, namely:

  • Change tracking and versioning in a GitHub repository
  • Improved disaster recovery ability
  • Reusability through core Terraform modules
  • Centralized environment configuration

The process for creating new infrastructure (or updating existing resources) in a test AWS environment involved:

  • Cloning the GitHub repository
  • Editing Terraform configuration files in a directory
  • Applying the changes in the test environment
  • Submitting a pull request for review by the CloudOps team
  • Waiting for approval and merge

Promoting the infrastructure to a production environment involved copying the configuration to another directory, submitting a second pull request, and waiting for the CloudOps team to review, merge, and apply it using the Terraform CLI. While we gained some key advantages from this process, the amount of manual effort required often made the CloudOps team a bottleneck.

The benefits and challenges of Terraform Open Source

The primary benefit of Terraform Open Source was having infrastructure configuration in GitHub. All changes could be tracked, and proposed changes could be reviewed beforehand to ensure compliance with security policies and best practices. Together, we built a repository managing 8,000 AWS resources across 90 workspaces using a core set of 50 Terraform modules.

One disadvantage of how we used Terraform Open Source was how we structured the configuration code. We would start with a configuration for our development environment, test and validate it, and then make a copy of it for our production environment. While there were a few differences between the files given the variance between our AWS accounts, most of the code was simply duplicated. Any update had to be made to both sets of files, and the configuration would often drift between environments.
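
The shape of the problem, with illustrative directory names, looked roughly like this; each change had to be hand-copied from one tree to the other:

```
terraform/
├── dev/
│   ├── main.tf        # edited and validated first
│   └── variables.tf
└── prod/
    ├── main.tf        # near-verbatim copy of dev/main.tf
    └── variables.tf
```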

The single largest disadvantage of our approach to Terraform Open Source was the growing reliance upon the CloudOps team as the size of the company grew. Every pull request was reviewed by one or more members of the CloudOps team, and changes to AWS were applied manually using the Terraform CLI. The best efforts of the CloudOps team simply could not keep pace with the infrastructure needs of the engineering teams, in terms of reviewing pull requests, providing guidance on architecture, and applying changes in AWS.

What we needed was a dramatic change in our level of automation, one that Terraform Enterprise could provide.

Terraform Enterprise proof-of-concept

Our evaluation of Terraform Enterprise began with the creation of a proof-of-concept instance in one of our AWS accounts. The CloudOps team installed the product and configured it with a GitHub repository where we could experiment.

Our HashiCorp account representatives have been great at answering our questions and helping us troubleshoot issues. One of the best suggestions they provided was to manage some components of Terraform Enterprise using Terraform Enterprise itself. The Terraform Cloud/Enterprise provider lets us manage our workspaces, teams, and Sentinel policy sets within the tool. This ensures that changes to Terraform Enterprise are versioned in GitHub and follow the same pull request review process required for all other resources.
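
For a sense of what this looks like, here is a minimal sketch using the tfe provider; the hostname, organization, and repository identifiers below are placeholders rather than our real configuration:

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

provider "tfe" {
  hostname = "tfe.example.com" # placeholder for a private Terraform Enterprise instance
}

variable "github_oauth_token_id" {
  description = "OAuth token ID for the GitHub VCS connection"
  type        = string
}

# A Terraform Enterprise workspace, itself managed through Terraform Enterprise
resource "tfe_workspace" "payments_dev" {
  name         = "payments-dev" # illustrative workspace name
  organization = "example-org"  # placeholder organization

  vcs_repo {
    identifier     = "example-org/terraform-config" # placeholder GitHub repository
    oauth_token_id = var.github_oauth_token_id
  }
}
```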

Migration of core modules to version 0.12 and Terraform Enterprise

Our migration to Terraform version 0.12 began with the conversion of our core modules. Our CloudOps team maintains these modules with the assistance of our product engineering teams. Tackling the modules first gave us a foundation for the rest of the configuration to be migrated and allowed the CloudOps team to get hands-on experience with the features introduced in version 0.12.

If you haven’t checked out the features introduced in Terraform version 0.12 (first-class expressions, for expressions, rich variable types, and dynamic blocks), I highly encourage you to do so.
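
As a quick, contrived sample of those features, not taken from our actual configuration:

```hcl
# Rich variable types: lists and maps of specific element types
variable "allowed_ports" {
  type    = list(number)
  default = [443, 8080]
}

# First-class expressions and for expressions, without "${...}" wrapping
locals {
  port_labels = [for p in var.allowed_ports : "tcp-${p}"]
}

# Dynamic blocks generate repeated nested blocks from a collection
resource "aws_security_group" "service" {
  name = "service-sg"

  dynamic "ingress" {
    for_each = var.allowed_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["10.0.0.0/8"]
    }
  }
}
```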

Migration of product workspaces to Terraform Enterprise

Migrating our workspaces from Terraform Open Source to Terraform Enterprise is more art than science. It involves merging two sets of configuration files into one, importing the existing AWS resources into a Terraform Enterprise workspace, merging a pull request to update the new GitHub repository, and merging a second pull request to remove the old code from the legacy GitHub repository. Some creativity is also required to handle variance between environments that can’t easily be changed due to legacy application code (e.g. SQS queue names). For the most part, the import process must preserve existing resource names and must not impact the resources as they are imported.
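
As a simplified, invented example: the configuration is written to match the live resource exactly, then the resource is imported into the workspace state so that the next plan shows no changes.

```hcl
# Configuration must match the existing queue, including its legacy name,
# so that the post-import plan is empty
resource "aws_sqs_queue" "payments" {
  name = "legacy-payments-queue" # must match the live resource exactly
}

# The resource is then brought under management without being modified, e.g.:
#   terraform import aws_sqs_queue.payments \
#     https://sqs.us-east-1.amazonaws.com/123456789012/legacy-payments-queue
```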

Another significant part of the migration to Terraform Enterprise was simply identifying ownership of resources and determining where they should now reside within our GitHub repository. It is common for a small start-up to shift priorities and change team structure, but the infrastructure created by those teams can remain in place for years. Finding new owners for these old resources was a slow but important process.

Targeting multiple environments using a single configuration

One of our primary goals in moving to Terraform Enterprise was to provision infrastructure in multiple AWS accounts using the same Terraform configuration and avoid the drift that occurred with our Open Source approach (due to duplicated configuration files). Migrating to Terraform Enterprise gave us the opportunity to rewrite all of our core Terraform modules, and each new module takes an argument indicating the target environments where its resources should be provisioned. For example, we can choose to provision an SQS queue and its dead-letter queue in our development environment using our queue Terraform module and passing target environments as ["dev"]. If we later choose to provision the resources in our staging environment, we can update the target environments to ["dev", "stage"]. We can then ensure that the provisioned resources will be identical in all environments.

A single Terraform configuration file can be applied to multiple AWS accounts using workspaces with different variables.
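
Here is a minimal sketch of how a module can gate resource creation on those target environments. The variable and resource names are invented, and our real queue module is considerably more involved:

```hcl
variable "environment" {
  description = "Environment of the current workspace (set per workspace)"
  type        = string
}

variable "target_environments" {
  description = "Environments in which these resources should exist"
  type        = list(string)
  default     = ["dev"]
}

locals {
  enabled = contains(var.target_environments, var.environment)
}

# The queue and its dead-letter queue are created only in workspaces whose
# environment appears in target_environments
resource "aws_sqs_queue" "dlq" {
  count = local.enabled ? 1 : 0
  name  = "payments-dlq" # hypothetical name
}

resource "aws_sqs_queue" "main" {
  count = local.enabled ? 1 : 0
  name  = "payments" # hypothetical name

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq[0].arn
    maxReceiveCount     = 5
  })
}
```

Promoting the queue to staging is then a one-line change to the shared configuration: adding "stage" to target_environments.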

Process improvements and self-service infrastructure provisioning

With Terraform Enterprise, the CloudOps team still provides crucial oversight in ensuring that our infrastructure is secure and follows our code style and AWS best practices. Every pull request triggers a series of status checks, which ensure that the pull request is tied to a Jira issue, the configuration is formatted correctly, the affected workspaces plan without errors, and the configuration adheres to our Sentinel governance policies. We have also taken advantage of the more granular workspace permissions added to Terraform Enterprise, which allow us to increase self-service by granting specific elevated access without compromising security. This has helped the CloudOps team focus more on reviewing pull requests and less on the actual process of applying changes.

Security and governance using Sentinel policies

The best thing about Terraform Enterprise is that the CloudOps team no longer has to manually plan and apply changes using the CLI on our workstations (Sentinel policies are a close second). Much of the time the CloudOps team spent reviewing pull requests with Terraform Open Source was focused on checking for security issues, code style, proper use of Terraform modules, and general AWS best practices.

Sentinel policies allow us to automate much of the pull request review process. For example, we have a policy which checks for open SSH ingress on security groups (0.0.0.0/0 on port 22). Another policy ensures that common AWS resources have an owner tag, and that the tag value matches a list of recognized owners.
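
A trimmed-down sketch of the open-SSH check, written against the tfplan/v2 import, might look like the following; our production policies cover more cases (updates as well as creates, rules embedded in aws_security_group resources, and so on):

```sentinel
import "tfplan/v2" as tfplan

# Security group rules being created in this plan (simplified: ignores updates)
sg_rules = filter tfplan.resource_changes as _, rc {
	rc.type is "aws_security_group_rule" and
	rc.change.actions contains "create"
}

# No ingress rule may open SSH (port 22) to the world
no_open_ssh = rule {
	all sg_rules as _, rc {
		not (rc.change.after.type is "ingress" and
			rc.change.after.from_port <= 22 and
			rc.change.after.to_port >= 22 and
			rc.change.after.cidr_blocks contains "0.0.0.0/0")
	}
}

main = rule { no_open_ssh }
```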

Provisioning and configuring EKS clusters using Terraform Enterprise

One of our biggest challenges has been creating and managing EKS clusters using Terraform. This is primarily due to the complex nature of Kubernetes and the significant configuration involved in setting up a cluster. Even so, we have developed a proven, highly automated method for provisioning and configuring EKS clusters.

Our Terraform Enterprise workspaces for EKS are divided into three concerns:

  • Infrastructure, including clusters, nodes, and security groups
  • Core services such as logging, APM, and other plug-ins
  • Microservice IAM roles and policies

The team-based access in Terraform Enterprise allows us to split resources into different workspaces and assign appropriate access to different teams. The CloudOps team can manage the cluster security groups and cluster/node IAM roles, the Kubernetes team can manage the core services and plugins, and product engineering teams can add their service IAM roles and policies as they build out new product features.
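
As an illustrative sketch of the third concern, a product team's workspace might define an IAM role for a microservice using IAM Roles for Service Accounts (IRSA). The namespace, service account, and variable names below are invented; the OIDC provider itself would be owned by the CloudOps workspaces:

```hcl
variable "oidc_provider_arn" {
  description = "ARN of the EKS cluster's OIDC provider (managed elsewhere)"
  type        = string
}

variable "oidc_provider_url" {
  description = "OIDC issuer URL without the https:// prefix"
  type        = string
}

# Trust policy allowing a single Kubernetes service account to assume the role
data "aws_iam_policy_document" "assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${var.oidc_provider_url}:sub"
      values   = ["system:serviceaccount:payments:payments-api"] # hypothetical namespace/SA
    }
  }
}

resource "aws_iam_role" "payments_api" {
  name               = "payments-api" # hypothetical service name
  assume_role_policy = data.aws_iam_policy_document.assume.json
}
```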

What the future holds

We’ve come a long way in how we provision and manage infrastructure, and we have realized significant improvements in efficiency thanks to Terraform Enterprise and our relationship with HashiCorp.

In March 2021, we updated all of our Terraform Enterprise workspaces to Terraform version 0.14.5. We have several key goals around Terraform Enterprise that we will be working on in the near term.

  • Continue to allow engineering teams to manage their own infrastructure and minimize the time spent on oversight
  • Enhance our Sentinel policies to automate checks that are still performed manually
  • Distribute oversight of infrastructure changes to Terraform “champions” within engineering teams

Come join us

If you’re interested in joining our team and creating enterprise-scale infrastructure, please check us out.

[1]: Restrictions and/or third party fees may apply, see Earnin.com/TOS for details

[2]: https://www2.earnin.com/wewin/game-rules/

Joe Brinkman
Manager of cloud and data infrastructure at Earnin