AWS Multi-Account Architecture with Terraform, Yeoman, and Jenkins

Paul Bourdel, Slalom Build
Nov 26, 2018

with Chris Mortensen (Discover) and Scott Peterson (Discover)

Overview

Initially, having only one AWS account simplifies setup and management because everything is done in one place. In the long term, however, most organizations find this approach does not scale with their needs, and decide to look into ways to further isolate their AWS workloads. This post will show you our approach for implementing an AWS multi-account architecture in a highly automated and sustainable way. Having successfully implemented this approach at Discover, we are confident in its ability to operate at scale.

Concepts

Before we dive into the details, it will be helpful to review core concepts and terminology related to AWS account management. This is a high-level review to get started — these concepts will all be explained further below.

AWS Organizations

This is an AWS account management service that enables you to consolidate and centrally manage multiple AWS accounts.

Types of Accounts

Core Accounts: Supporting accounts that each serve a specific purpose in support of the team accounts.

  • Master Account: the account where the AWS Organizations feature is enabled. From this account we create all the other accounts.
  • Identity Account: the one account that everyone logs in to. This can be done by federating an existing directory such as Active Directory via SAML.
  • Logging Account: the account that stores all the logs from the team accounts, such as CloudTrail and VPC Flow Logs.
  • Core Services Account: this account is used for running standard services available to all accounts such as Jenkins or Terraform Enterprise.

Team Accounts: Consumer accounts that are handed over to project teams for use in deploying their applications. There are as many of these accounts as there are teams, but from an administration point of view they should all be homogeneous. Keeping team accounts consistent allows them to be created and maintained with the same account creation pipeline.

Role Management

IAM roles are how users and administrators of team accounts gain access to their respective accounts, by assuming a role in the account. This avoids the need to create users in each team account, eliminating the security risk of long-lived access keys. By attaching policies to the roles, we also enforce access in the accounts at the following levels of granularity:

  • Actions allowed to be performed by the role: useful for restricting which actions can be performed, such as denying the right to create IAM users.
  • Resources that are allowed to be modified by the role: this can be used to prevent administrative resources from being modified, such as CloudTrail trails, VPCs, and the Lambdas used for monitoring the account.
  • Conditions that allow further granularity of control: such as locking down which folders in an S3 bucket are writable (a short policy sketch follows this list).
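As a rough illustration, a single policy document can combine all three levels. This is only a minimal sketch, not our actual policies; the resource names and the shared_bucket/team_name variables are assumptions:

data "aws_iam_policy_document" "team_restrictions" {
  # Action-level control: deny creation of IAM users in the team account.
  statement {
    effect    = "Deny"
    actions   = ["iam:CreateUser"]
    resources = ["*"]
  }

  # Resource-level control: only allow writes under the team's own prefix
  # in a shared bucket.
  statement {
    effect    = "Allow"
    actions   = ["s3:PutObject"]
    resources = ["arn:aws:s3:::${var.shared_bucket}/${var.team_name}/*"]
  }

  # Condition-level control: only allow listing objects under that prefix.
  statement {
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::${var.shared_bucket}"]

    condition {
      test     = "StringLike"
      variable = "s3:prefix"
      values   = ["${var.team_name}/*"]
    }
  }
}

resource "aws_iam_policy" "team_restrictions" {
  name   = "team-restrictions"
  policy = "${data.aws_iam_policy_document.team_restrictions.json}"
}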

VPC Creation

With team account specific VPCs we have three options:

  1. Leave the default VPC automatically created with new accounts by AWS.
  2. Allow teams to create their own VPCs.
  3. Create a custom VPC in each team account.

We prefer the 3rd option because it allows us to:

  • Decide the CIDR range of the VPC so there is no overlap of the CIDR ranges of the VPCs in the team accounts.
  • Automatically connect the VPC to the corporate network via a virtual gateway paired to a Direct Connect connection.
  • Peer the VPC to the Core Services account to give teams access to the company Jenkins, GitHub, or other shared services.
  • Subdivide the VPC into a standard set of subnets.

Account Metadata

We maintain a DynamoDB table that keeps track of the metadata for each account. This table is populated automatically by the account creation pipeline and gives scripts and Lambdas programmatic access to the data.

Account Creation Pipeline

A Yeoman generator is used to create a project per account. This project contains:

  • All the Terraform scripts used to create any standard resources in the account such as VPCs, IAM roles, CloudTrail configurations, etc.
  • A Jenkinsfile which is executed by the Jenkins server. This Jenkinsfile runs the Python and Terraform scripts responsible for calling the AWS API to create the actual account and any standard infrastructure and other resources in it.

AWS Organizations and Account Creation

AWS Organizations is an account management service that enables you to consolidate and centrally manage multiple AWS accounts.

Features of AWS Organizations

The following features help manage multiple accounts:

  • Organization Tree: allows you to organize all your accounts as a tree. Accounts will live under nodes in the tree. SCPs (described below) can be applied across all accounts in a node.
  • SCPs (Service Control Policies): allow you to enable and disable specific AWS services across all accounts in an organization. This allows you to enable all AWS services in a sandbox organization while locking down the production organization to only the services approved in your company (a short SCP sketch follows this list).
  • IAM Integration: new accounts automatically have a role created that trusts the master account. This allows you to assume role into the new account without being required to set up alternative login methods.
  • Consolidated Billing: allows centralized billing and reporting of all accounts in your organization.
  • More Info: https://docs.aws.amazon.com/organizations/latest/userguide/orgs_introduction.html
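As a rough sketch of what an SCP might look like in Terraform (the policy name, the allowed services, and the production OU variable are all assumptions, and this requires an AWS provider version that includes the Organizations policy resources):

resource "aws_organizations_policy" "production_scp" {
  name = "production-approved-services"

  # Only the listed services are allowed in accounts this SCP is attached to.
  content = <<CONTENT
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:*", "s3:*", "cloudtrail:*"],
      "Resource": "*"
    }
  ]
}
CONTENT
}

resource "aws_organizations_policy_attachment" "production_scp" {
  policy_id = "${aws_organizations_policy.production_scp.id}"
  target_id = "${var.production_ou_id}"
}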

Account Creation Automation

The initial step in our account creation and configuration pipeline is creating the account itself. This can be done with either Terraform or the Python AWS SDK. Our preference is to automate with Terraform whenever there is native Terraform support for the underlying resource, because this lets us manage the resource in an idempotent and declarative way. If native Terraform support does not exist, writing a Python script that uses the AWS SDK to create the resource in an idempotent fashion works as well.

Creating an AWS Account with Terraform

We will create a Terraform project which will leverage the corresponding Terraform resource to create and manage the new AWS account. The project can be configured with any backend or other standard configuration, as long as it leverages the aws_organizations_account resource to create the account.

Terraform aws_organizations_account resource: https://www.terraform.io/docs/providers/aws/r/organizations_account.html

resource "aws_organizations_account" "account" {
name = "my_new_account"
email = "john@doe.org"
}

Role Management

Roles are managed across all AWS accounts using Terraform. This is useful because Terraform allows us to:

  • Define modules so that we can have one module for each type of account (team, identity, logging) and then use that module multiple times for each team account with a different provider. This helps keep things DRY.
  • Assume role into each account so we can run the project with just the AWS keys from the Identity account.
  • Use HCL to define policies. This allows us to easily use Terraform variables when defining policies further helping to keep things DRY.

Terraform Modules

In the example below, we use submodules within our roles project to define the different types of accounts: a team-account-roles module and an identity-account-roles module.

├── README.md
├── team-one-account.tf
├── team-two-account.tf
├── identity-account.tf
├── main.tf
├── variables.tf
├── outputs.tf
├── ...
├── modules/
│ ├── team-account-roles/
│ │ ├── administrator_role.tf
│ │ ├── developer_role.tf
│ │ ├── ec2_role.tf
│ │ ├── variables.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ ├── identity-account-roles/

Terraform Assume Role

We will instantiate the team-account-roles module once per team account. In order to run the full project with only the credentials from the Identity account, we use the assume_role block of Terraform's AWS provider. The contents of the team-one-account.tf file from the example above are:

provider "aws" {
alias = "team-one-account"
assume_role {
role_arn = "arn:aws:iam::TEAM_ONE_ACCOUNT_ID:role/Administrator"
}
}
module "team-one-module" {
source = "./modules/team-account-roles"
providers = {
aws = "aws.team-one-account"
}
}

For each additional team account, we can create a file like the one above and only change TEAM_ONE_ACCOUNT_ID to the ID of the new account and the alias team-one-account to a new unique value.

Terraform IAM Policy Document

Terraform has a built-in data source for defining AWS policies. This is easier than writing raw JSON because the data source enforces a standard structure and makes it easier to reuse variables in the policy. This ensures things like account IDs or team names can be defined once and reused across multiple policies.

The file ec2_role.tf can look something like the below example, which has a role with a trust policy, policy, and role/policy attachment defined:

resource "aws_iam_role" "test_role" {
name = "test_role"
assume_role_policy = "${data.aws_iam_policy_document.example_assume_role.json}"
}
data "aws_iam_policy_document" "example_assume_role" {
statement {
actions = [
"sts:AssumeRole",
]
principals = {
type = "Service"
identifiers = "ec2.amazonaws.com"
}
}
}
data "aws_iam_policy_document" "example" {
statement {
actions = [
"s3:ListAllMyBuckets",
"s3:GetBucketLocation",
]
resources = [
"arn:aws:s3:::*",
]
}
}
resource "aws_iam_policy" "example" {
name = "example_policy"
policy = "${data.aws_iam_policy_document.example.json}"
}
resource "aws_iam_role_policy_attachment" "test-attach" {
role = "${aws_iam_role.test_role.name}"
policy_arn = "${aws_iam_policy.example.arn}"
}


VPC Creation

As mentioned above, there are a few options for implementing VPCs in a multi-account architecture. We chose to create a custom-tailored Terraform VPC module and incorporate it into our new account creation pipeline. There are endless combinations of components to use in your VPC, but at the very least the following are needed for a working VPC (a minimal sketch follows the list):

  1. The VPC
  2. Subnets
  3. Route table with routes to the subnets
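A minimal sketch of those bare essentials (the CIDR layout, names, and variables are assumptions, not our production module):

resource "aws_vpc" "vpc" {
  cidr_block = "${var.vpc_cidr_block}"

  tags {
    Name = "${var.vpc_name}"
  }
}

resource "aws_subnet" "subnet" {
  vpc_id     = "${aws_vpc.vpc.id}"
  cidr_block = "${cidrsubnet(var.vpc_cidr_block, 2, 0)}"
}

resource "aws_route_table" "route_table" {
  vpc_id = "${aws_vpc.vpc.id}"
}

resource "aws_route_table_association" "subnet_route" {
  subnet_id      = "${aws_subnet.subnet.id}"
  route_table_id = "${aws_route_table.route_table.id}"
}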

In addition to the bare essentials, we have added multiple other resources to make this a highly versatile VPC module. Things like VPC peering, DHCP option sets, endpoints, and VPN attachments can all be added to or removed from the module as needed. It’s worth pointing out that we have successfully implemented this pattern in a hybrid environment where on-premise resources need to be in direct communication with our cloud environment.

Below is an example module structure. Note that each VPC component is separated out into its own file or private sub-module for better organization.

├── README.md
├── vpc-dhcp.tf
├── vpc-flow-logs.tf
├── vpc-peering.tf
...Any additional TF files..
├── main.tf
├── variables.tf
├── outputs.tf
├── modules/
│ ├── flow-log/
│ │ ├── variables.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ ├── subnets/
│ │ ├── variables.tf
│ │ ├── main.tf
│ │ ├── outputs.tf

For the most part, creating a VPC module can be easily done with a solid understanding of both Terraform and AWS VPCs. However, there are a few specific resources that are worth going over in more detail.

Subnets

One of the most challenging aspects of designing any network, be it physical or in the cloud, is the subnetting. Thankfully, Terraform provides a useful function that can take in a VPC CIDR block and slice it up into the desired subnets. Here is a quick rundown of the cidrsubnet function (http://blog.itsjustcode.net/blog/2017/11/18/terraform-cidrsubnet-deconstructed/):

cidrsubnet(iprange, newbits, netnum)
  • iprange is the CIDR block of your virtual network
  • newbits is the number of additional mask bits for the subnet within the virtual network
  • netnum is the zero-based index of the subnet when the network is masked with the new bits
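For example, with a VPC CIDR block of 10.0.0.0/16, cidrsubnet("10.0.0.0/16", 2, 1) adds two bits to the mask and returns 10.0.64.0/18, the second of the four resulting /18 subnets.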

With this powerful function, you can divide your block into as many or as few subnets as you wish, even if they need to be divided unevenly. Using a few Terraform tricks, we can loop through our newly created list of subnets and assign an availability zone, as well as create routes for each of them.

locals {
  # Generate the CIDRs for each subnet
  sub1 = "${cidrsubnet(var.vpc_cidr_block, 2, 0)}"
  sub2 = "${cidrsubnet(var.vpc_cidr_block, 2, 1)}"
  # ...divide into as many subnets as necessary...

  # Add all subnets to lists
  subnets = "${list(local.sub1, local.sub2)}" # ...include each additional subnet here...
}

###################
# PRIVATE SUBNETS #
###################

# Private Subnets - 2 AZ
resource "aws_subnet" "private_subnets" {
  count                   = "${length(local.subnets)}"
  vpc_id                  = "${var.vpc_id}"
  cidr_block              = "${element(local.subnets, count.index)}"
  availability_zone       = "${element(var.availability_zones, count.index)}"
  map_public_ip_on_launch = false

  tags {
    Name    = "${var.private_subnet_name}-${count.index}"
    Purpose = "${element(var.subnet_purpose_tags, count.index)}"
  }
}

#################
# Subnet Routes #
#################

# Private Routes - 2 AZ
resource "aws_route_table_association" "private_route" {
  count          = "${length(local.subnets)}"
  route_table_id = "${var.route_table_id}"
  subnet_id      = "${element(aws_subnet.private_subnets.*.id, count.index)}"
}

Using count, we can create both the subnet and route table association resources as many times as there are subnets in the list. One other thing to note is that by using count.index to iterate through a list of availability zones, as above, the subnets will alternate across availability zones.


Routing

A quick tip for route tables: rather than going in and deleting the default route table that gets created with the VPC, Terraform has a special resource (aws_default_route_table) that essentially lets you adopt the default route table so it is configurable by Terraform. From there you can add the necessary routes for peering, subnets, endpoints, etc.
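A minimal sketch of this (the peering route and variable names are assumptions):

resource "aws_default_route_table" "default" {
  default_route_table_id = "${aws_vpc.vpc.default_route_table_id}"

  # Example: send traffic destined for the peered VPC over the peering connection.
  route {
    cidr_block                = "${var.peer_destination_cidr}"
    vpc_peering_connection_id = "${aws_vpc_peering_connection.example_peer.id}"
  }

  tags {
    Name = "${var.vpc_name}-default-route-table"
  }
}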

Peering

In our case we also found it necessary to peer with another VPC in a separate account. Terraform makes this a relatively simple thing to accomplish.

Essentially, in the requester account we can start the handshake by requesting the peer, as shown below.

# Requester's side of the connection.
resource "aws_vpc_peering_connection" "example_peer" {
  provider      = "aws"
  vpc_id        = "${aws_vpc.vpc.id}"
  peer_vpc_id   = "${var.accepter_vpc_id}"
  peer_owner_id = "${data.aws_caller_identity.accepter.account_id}"
  peer_region   = "${data.aws_region.current.name}"
  auto_accept   = false

  tags {
    Side = "Requester"
    Name = "VPC_${var.vpc_name}_to_VPC_Accepter"
  }
}

Then we will use Terraform to assume a role into the accepter account and accept the peering request.

# Accepter's side of the connection.
resource "aws_vpc_peering_connection_accepter" "example_peer" {
  provider                  = "aws.accepter_account"
  vpc_peering_connection_id = "${aws_vpc_peering_connection.example_peer.id}"
  auto_accept               = true

  tags {
    Side = "Accepter"
    Name = "VPC_Accepter_to_VPC_${var.vpc_name}"
  }
}

In order for all of the role assumption to work as shown, two separate providers will need to be defined in main.tf when calling this module: one for the requester account and one for the accepter account (a sketch follows). Please refer to the Terraform Assume Role section found earlier in the post.
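As a rough sketch (the aliases, role ARN, and module source are assumptions, and the providers map follows the Terraform 0.11 style used in the earlier examples):

provider "aws" {
  alias = "requester_account"
}

provider "aws" {
  alias = "accepter_account"

  assume_role {
    role_arn = "arn:aws:iam::ACCEPTER_ACCOUNT_ID:role/Administrator"
  }
}

module "team_vpc" {
  source = "./modules/vpc"

  # The module must declare a matching aliased provider (alias = "accepter_account") internally.
  providers = {
    aws                    = "aws.requester_account"
    "aws.accepter_account" = "aws.accepter_account"
  }
}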

Direct Connect

Direct Connect with a VPC uses a number of AWS components. A Virtual Gateway (VGW) will need to be created inside the VPC and a Virtual Interface (VIF) will need to be allocated and attached to the VGW. The VIF will act as the gate to the Direct Connect connection. This can be visually represented as such:

VPC -> VGW -> VIF -> Direct Connect

If using this pattern in a situation where AWS Direct Connect is being used, there are a few considerations to keep in mind.

  1. Create the VGW separately from your VPC. This ensures that if you ever need to destroy and recreate the VPC, the VGW, and in turn the VIF, stays intact; you need only re-associate the VGW with the VPC (see the sketch after this list).
  2. At the time of writing, Virtual Interfaces (VIFs) cannot be created via Terraform. Since a VIF is needed to start using Direct Connect, it will need to be created manually or automated separately. We chose to automate it with a Python script that is hooked into the main account creation pipeline.
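A minimal sketch of the first consideration (resource and variable names are assumptions): the VGW is created on its own and attached to the VPC with a separate attachment resource, so the VPC can be destroyed and recreated without touching the VGW or the VIF behind it.

resource "aws_vpn_gateway" "vgw" {
  # Created without a vpc_id so the gateway's lifecycle is independent of the VPC.
  tags {
    Name = "${var.vpc_name}-vgw"
  }
}

resource "aws_vpn_gateway_attachment" "vgw_attachment" {
  vpc_id         = "${var.vpc_id}"
  vpn_gateway_id = "${aws_vpn_gateway.vgw.id}"
}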

Flow Logs

If creating a VPC module for a corporate setting, it is almost certain that some form of logging will be required. We implement VPC Flow Logs in a centralized fashion, with all logs being sent to a single logging account.
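A rough sketch of what the flow-log submodule might contain (the variable names and the central bucket ARN are assumptions, and this requires a provider version that supports S3 flow-log destinations):

resource "aws_flow_log" "vpc_flow_log" {
  # ARN of a central S3 bucket owned by the logging account (assumed variable).
  log_destination      = "${var.central_flow_log_bucket_arn}"
  log_destination_type = "s3"
  vpc_id               = "${var.vpc_id}"
  traffic_type         = "ALL"
}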

Account Metadata

For each team account we create, it can be useful to store metadata about the account in a programmatically accessible way. This might either be data that is external to AWS and can’t be queried via the AWS API, or a collection of data that would be useful to have in one place.

Having the data in one place makes account information easier to audit. It also makes the data easier to query and use in other Terraform projects via a custom data source (https://www.terraform.io/docs/providers/external/data_source.html), as sketched below.
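A minimal sketch of such a data source (the script path and query key are hypothetical): the external data source runs a program that looks up the account's record in the DynamoDB table and prints it as a flat JSON object.

data "external" "account_metadata" {
  program = ["python", "${path.module}/scripts/get_account_metadata.py"]

  query = {
    account_id = "${var.account_id}"
  }
}

The record is then available to the rest of the project as data.external.account_metadata.result.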

To implement account metadata storage, we created a DynamoDB table in our Core Services account. Then during each account creation pipeline run, we add a record to the table keyed by the account ID with information about the account.
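A minimal sketch of the table itself (the table name and capacity values are assumptions, chosen to match the item example below):

resource "aws_dynamodb_table" "example" {
  name           = "account-metadata"
  read_capacity  = 5
  write_capacity = 5
  hash_key       = "accountId"

  attribute {
    name = "accountId"
    type = "S"
  }
}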

The account-specific records can be added via the Terraform aws_dynamodb_table_item resource. An example would look something like:

resource "aws_dynamodb_table_item" "example" {
table_name = "${aws_dynamodb_table.example.name}"
hash_key = "${aws_dynamodb_table.example.hash_key}"
item = <<ITEM
{
"accountId": {"S": "123456789012"},
"accountName": {"S": "team1account"},
"accountCreationDate": {"S": "1982-7-10"},
"mainVpcId": {"S": "vpc-aaaaaaaa"}
}
ITEM
}

The above example would let Terraform add the row in the table tied to the account, as well as update the row during subsequent runs of the account creation pipeline in case the data changes.

Account Creation Pipeline

The Account Creation Pipeline is what ties all of the previous concepts together and executes them to create the account and corresponding infrastructure. There are two main pieces to the pipeline:

  • Custom Yeoman generator that creates an account specific project containing a Jenkinsfile
  • Jenkins job that is created based on the previous Jenkinsfile and is executed to create the account and anything else that we want to tie in with account creation

Yeoman Generator

Yeoman is a scaffolding tool that originated as a way to quickly create web apps. It is very versatile and suitable for a variety of other scaffolding tasks as well. In this case we will use it to create our Jenkinsfile and corresponding Terraform projects. The main value of the generator comes from several points:

  • Allows us to ask for input from the user for data that is necessary in creating our account, but not programmatically available
  • Enforces a standard folder and file structure for all accounts
  • Allows us to release new Terraform modules and roll them out easily by updating the generator and rerunning it in all the account projects. This makes it easy to roll out new features to existing accounts. Combined with Terraform’s declarative nature, making sure all our accounts have a consistent support infrastructure has never been easier.

Example Yeoman Run to Create a New Project

/projects/multi-account-infrastructure/team-2-dev-account 
$ yo team-account-project
? Account Name (ex. SLM-TEAM-DEV): team-2-dev
? Account Email (ex. consultant@slalom.com): pb@slalom.com
Creating Sandbox Jenkinsfile.
Creating Cloudtrail project.
create Jenkinsfile
create cloudtrail\outputs.tf
create cloudtrail\vars.tf
create cloudtrail\main.tf
create cloudtrail\backend.tfvars
create cloudtrail\dev.tfvars
/projects/multi-account-infrastructure/team-2-dev-account
$ ll
total 12
drwxr-xr-x 1 paul 1049089 0 Jun 7 15:04 cloudtrail/
-rw-r--r-- 1 paul 1049089 4153 Jun 7 15:04 Jenkinsfile
/projects/multi-account-infrastructure/team-2-dev-account
$ ll cloudtrail/
total 11
-rw-r--r-- 1 paul 1049089 264 Jun 7 15:04 backend.tfvars
-rw-r--r-- 1 paul 1049089 436 Jun 7 15:04 dev.tfvars
-rw-r--r-- 1 paul 1049089 1106 Jun 7 15:04 main.tf
-rw-r--r-- 1 paul 1049089 178 Jun 7 15:04 outputs.tf
-rw-r--r-- 1 paul 1049089 1866 Jun 7 15:04 vars.tf

Jenkinsfile

The Jenkinsfile describes, in a programmatic fashion, the job that will be executed by Jenkins.

The file is a collection of calls to Terraform to create infrastructure, user input steps to approve the Terraform plans, and calls to Python scripts for AWS features not supported by Terraform.

HashiCorp provides a guide for running Terraform in automation.

A sample Jenkinsfile can look like the following (pseudocode):

pipeline {
  agent any

  parameters {
    password(name: 'AWS_ACCESS_KEY_ID')
    password(name: 'AWS_SECRET_ACCESS_KEY')
    // ...add more parameters...
  }

  environment {
    AWS_ACCESS_KEY_ID     = "${params.AWS_ACCESS_KEY_ID}"
    AWS_SECRET_ACCESS_KEY = "${params.AWS_SECRET_ACCESS_KEY}"
    // ...add more environment variables...
  }

  stages {
    stage('Create Member Account') {
      steps {
        sh "python create-account.py -a ${params.ACCOUNT_NAME} -e ${params.EMAIL_ADDRESS} | tee account.json"
      }
    }

    stage('Enable CloudTrail') {
      steps {
        dir(path: "<%= teamname %>-account/cloudtrail") {
          // Remove any stale local Terraform state before re-initializing.
          dir(path: '.terraform') {
            deleteDir()
          }
          sh "terraform init -input=false -backend-config=backend.tfvars"
          sh "terraform plan -out=plan.out -input=false -var-file=dev.tfvars -var-file=backend.tfvars -var 'team_account_role_to_assume=$teamAccountRoleToAssume'"
          input 'Do you want to apply the plan?'
          sh "terraform apply -input=false plan.out"
        }
      }
    }

    // ...more stages can be added for additional infrastructure...
  }
}

Summary

I hope you enjoyed reading this writeup. While the code examples are not fully complete, they should be enough to convey the general approach and give an idea of what is possible.

With this architecture it is possible to scale and maintain a large number of homogeneous team accounts, allowing you to quickly roll out a datacenter per team that includes any custom infrastructure or logic your company or industry demands.

Feel free to ask any questions in the comments for any specifics on how certain features were implemented.
