Why I built my own tool to deploy landing zones on AWS

Nicolas Malaval
10 min read · Sep 16, 2023


TLDR: Since I couldn’t find a suitable solution, I built a tool called AWS Orga Deployer to deploy and manage infrastructure-as-code at the scale of an AWS organization. It lets you deploy Terraform and CloudFormation templates and execute Python scripts across multiple AWS accounts and regions, making it particularly suitable for building AWS landing zones.

Building landing zones on AWS generally consists of creating a multi-account structure and deploying foundational resources to provide an easy-to-use AWS environment to your application teams, while maintaining sufficient control and compliance with your security requirements.

For example, it may involve enabling and configuring AWS security services (CloudTrail, Config, Access Analyzer, GuardDuty, Security Hub…) or creating security resources (KMS keys, IAM roles…) or network resources (VPCs, VPC endpoints, cross-VPC connectivity…) so that the setup follows approved patterns and application teams don’t have to build it themselves.

Infrastructure-as-code (Terraform, CloudFormation…) helps save time and ensure predictability. However, because of the isolated nature of AWS accounts and regions, you still need to instantiate templates or scripts across many AWS accounts and regions.

When we faced this challenge in my current position, I explored how it could be solved with some of the existing services and tools, and finally decided to create my own.

Exploring existing tools and services

This section describes the tools and services I have explored and found insufficient to meet the need. Feel free to comment or share other options by responding to this story.

AWS Control Tower

When AWS released Control Tower, I heard people (even at AWS) say that THIS is now the service to build landing zones.

Wait… By default, AWS Control Tower configures a few elementary services: IAM Identity Center, CloudTrail, Config, some Service Control Policies (SCPs) in Organizations, some Config rules, and optionally a standalone VPC. This baseline offers very few customization options and represents, in my opinion, only a fraction of what should constitute an enterprise-grade landing zone.

However, Control Tower now offers Account Factory Customizations (AFC) and Account Factory for Terraform (AFT) to customize AWS accounts. Customizations are deployed either as CloudFormation templates via Service Catalog for AFC, or as Terraform templates or Python scripts for AFT by committing changes to a file in a code repository.

I haven’t used Control Tower enough but, reading the documentation, I expect four major limitations:

  1. Both AFC and AFT require you to set up AWS Control Tower first, which means you have to deal with the baseline discussed above. Moreover, I wonder what would happen if we deployed resources using AFC or AFT that are not part of the baseline today, such as GuardDuty, but could be added to Control Tower later.
  2. AFT is not a managed service and must be deployed and maintained. The architecture is complex (based on CodeBuild, CodePipeline, Step Functions and more) and I anticipate that troubleshooting deployment issues may be a challenge.
  3. I have not seen an easy way to deploy customizations in all accounts or all accounts of a given OU. I wonder, for example, how to easily deploy a new Config Rule in all accounts of the organization.
  4. There seem to be challenging limitations: GitLab is not supported as a code repository in AFT, only one blueprint may be deployed per account with AFC according to the documentation, customizations may be deployed in the Control Tower home region or in all supported regions but not in a selection, etc.

Terraform and Terragrunt

Deploying a landing zone requires iterating over many AWS accounts and regions. In Terraform, it seems that the AWS provider, which specifies the region and the credentials to use, still cannot be created dynamically.

This means that you must declare one provider for each account and region, and duplicate resources as many times as there are providers. That is a LOT of redundancy, prone to errors.

Of course, you may develop a custom script to generate Terraform code, or use Terragrunt, which helps reduce Terraform code redundancy. For example, Terragrunt can automatically generate a provider.tf file for each account in a list.
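
For illustration, here is a minimal sketch of such a generator script; the account list, region list, role name and aliasing scheme are my own assumptions:

from pathlib import Path

# Hypothetical inputs: the accounts and regions to generate providers for.
ACCOUNTS = ["123456789013", "123456789014", "123456789015"]
REGIONS = ["eu-west-1", "eu-west-2"]

PROVIDER_TEMPLATE = """provider "aws" {{
  alias  = "account{account}_{region_alias}"
  region = "{region}"
  assume_role {{
    role_arn = "arn:aws:iam::{account}:role/SecurityRole"
  }}
}}
"""

# Generate one aliased provider block per account and region.
blocks = [
    PROVIDER_TEMPLATE.format(
        account=account,
        region=region,
        region_alias=region.replace("-", "_"),
    )
    for account in ACCOUNTS
    for region in REGIONS
]
Path("provider.tf").write_text("\n".join(blocks))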

You still need to specify which modules to deploy in which regions, accounts or OUs. Even with Terragrunt, this requires a custom script, as I couldn’t find a way to query AWS Organizations and generate this configuration natively and dynamically.
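
Such a query is straightforward with boto3; here is a minimal sketch (the grouping into a Terragrunt configuration is left out):

import boto3

org = boto3.client("organizations")

# Retrieve all accounts in the organization (the API is paginated).
accounts = []
for page in org.get_paginator("list_accounts").paginate():
    accounts.extend(page["Accounts"])

# Retrieve the OUs directly under the organization root
# (first page only, for brevity).
root_id = org.list_roots()["Roots"][0]["Id"]
ous = org.list_organizational_units_for_parent(ParentId=root_id)[
    "OrganizationalUnits"
]

for account in accounts:
    print(account["Id"], account["Name"], account["Status"])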

This was our initial approach: execute a Python script that queries AWS Organizations, generates a hierarchy of folders and files for Terragrunt based on a configuration file specifying which modules to deploy in which accounts and regions, and then runs Terragrunt. However, we faced some challenges:

  1. Terragrunt may be very slow to start if it needs to browse the hierarchy of folders with thousands of modules to deploy.
  2. Terragrunt executes the Terraform plan and apply commands on everything, not only on accounts and regions with pending changes. You may use the --terragrunt-include-dir option to restrict the scope of execution, but this is manual.
  3. It’s difficult to navigate the logs when Terragrunt runs several Terraform instances in parallel. I couldn’t find an easy way to identify planned changes when terragrunt run-all plan prints tens of thousands of messages to the screen.

CloudFormation StackSets

If you use CloudFormation, StackSets allows you to create CloudFormation stacks in multiple AWS accounts and regions. I have used it in the past and noticed several limitations:

  1. StackSets doesn’t deploy stacks in the management account (previously called the master account). This may be an issue if you create stack sets in another account, such as a security account.
  2. StackSets allows you to override parameter values for existing accounts, but new accounts will use the default values. Therefore, you must either divide stack sets (e.g. one for the production OU and one for the development OU if they don’t use the same parameter values) or automate the override of parameter values after account creation, as shown in the sketch after this list.
  3. It is not possible to define dependencies between stack sets, or even between stacks deployed by the same stack set. If you choose to automatically deploy stack sets to new accounts, resources with dependencies must be created by the same stack.
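
Regarding point 2, overriding parameter values for a new account can be automated with the StackSets API in boto3; here is a minimal sketch where the stack set name, account ID and parameter values are hypothetical:

import boto3

cfn = boto3.client("cloudformation")

# Override a parameter for a newly created account in an existing stack set.
# The stack set name, account ID and parameter values are hypothetical.
cfn.update_stack_instances(
    StackSetName="baseline-vpc",
    Accounts=["123456789016"],
    Regions=["eu-west-1"],
    ParameterOverrides=[
        {"ParameterKey": "VpcCidr", "ParameterValue": "10.16.0.0/16"}
    ],
)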

IaC Management Platforms

IaC management platforms (Terraform Cloud, env0, Scalr…) help manage infrastructure-as-code throughout the software development lifecycle and adopt a GitOps approach (version control, code review, automation, drift detection…).

I haven’t used these platforms, but I wondered whether they could be used to deploy landing zones. Looking at their configuration guides and their pricing models, which depend on the number of “stacks”, I believe they may be well suited for application deployments, but not for foundations spanning many AWS accounts and regions.

Introducing AWS Orga Deployer

Given that I couldn’t find an ideal ready-to-use solution, I ended up creating my own tool, AWS Orga Deployer. This section gives an example of how to use it.

Example context

Let’s assume that we have an AWS organization with 4 accounts:

  • Security account (123456789012). Landing zone resources are configured from the security account, which assumes a role named SecurityRole in each account.
  • Production account (123456789013) in an OU Production (ou-123).
  • Development account A (123456789014) and development account B (123456789015) in an OU Development (ou-124).

And we need to:

  • Execute a Python script in all regions of all AWS accounts, except the security account, to remove the default VPC.
  • Create a VPC in production and development accounts in Ireland using CloudFormation.
  • Create an EC2 Instance Connect endpoint in each VPC using Terraform.

Of course, this example doesn’t really make sense in real life, but I needed a simple example that uses most of the features of AWS Orga Deployer.

Step 1: Install AWS Orga Deployer

AWS Orga Deployer is a Python package with a CLI executable. To install it, just execute the following command:

pip install aws-orga-deployer

Step 2: Create a package and three modules

A module consists of a Terraform template, a CloudFormation template or a Python script. A package is a collection of modules to deploy in your AWS accounts and, in this example, consists of a folder structured as follows:

package-folder/
  package.yaml
  python/
    delete-default-vpc/
      main.py
  cloudformation/
    vpc/
      cfn-template.yaml
  terraform/
    ec2-instance-connect/
      endpoint.tf
      variables.tf

The file python/delete-default-vpc/main.py is the Python script used to delete default VPCs. It must contain a function main whose inputs and outputs are detailed in the documentation:

import boto3

def main(module, account_id, region, command, action, variables,
         module_config, module_dir, deployment_cache_dir, engine_cache_dir):
    # Returns a boolean indicating whether there are changes, a summary
    # message, and two optional values (see the documentation for details).
    ec2_client = boto3.client("ec2", region_name=region)
    # [...] Retrieve the default VPC if it exists
    if default_vpc_exists:
        if command == "apply":
            # [...] Delete dependent resources (subnets...)
            ec2_client.delete_vpc(VpcId=default_vpc_id)
        return True, "Deleted default VPC", None, None
    else:
        return False, "No default VPC", None, None

The file cloudformation/vpc/cfn-template.yaml is the CloudFormation template that creates the VPC. In this example, it has a parameter VpcCidr and returns the ID of the VPC:

AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  VpcCidr:
    Type: String
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      # [...]
Outputs:
  VpcId:
    Value: !Ref Vpc

The folder terraform/ec2-instance-connect/ contains the Terraform files that create the EC2 Instance Connect endpoint. The file variables.tf declares one variable, vpc_id, which is the ID of the VPC created above.

Finally, the file package.yaml is the package definition file; it lists the accounts and regions where each module must be deployed, and with which parameters:

# An S3 bucket is required to store package persistent files
PackageConfiguration:
  S3Bucket: "<S3 bucket name>"
  S3Region: "<S3 bucket region>"

# Specify the IAM role to assume by default for all module deployments.
DefaultModuleConfiguration:
  All:
    AssumeRole: "arn:aws:iam::${CURRENT_ACCOUNT_ID}:role/SecurityRole"

Modules:
  # delete-default-vpc is deployed in all regions, and all accounts
  # except the account named "security".
  delete-default-vpc:
    Deployments:
      - Exclude:
          AccountNames: ["security"]
  # vpc is deployed in Ireland, in 3 accounts only.
  # A different value for "VpcCidr" is defined for each account.
  vpc:
    Configuration:
      StackName: vpc
      TemplateFilename: cfn-template.yaml
    Deployments:
      - Include:
          AccountIds: ["123456789013"]
          Regions: ["eu-west-1"]
        Variables:
          VpcCidr: "10.13.0.0/16"
      - Include:
          AccountIds: ["123456789014"]
          Regions: ["eu-west-1"]
        Variables:
          VpcCidr: "10.14.0.0/16"
      - Include:
          AccountIds: ["123456789015"]
          Regions: ["eu-west-1"]
        Variables:
          VpcCidr: "10.15.0.0/16"
  # ec2-instance-connect is deployed in all accounts of the
  # OUs Production and Development, in Ireland only.
  ec2-instance-connect:
    Deployments:
      - Include:
          OUIds: ["ou-123", "ou-124"]
          Regions: ["eu-west-1"]
        # The variable "vpc_id" takes the value of the output "VpcId"
        # of the module vpc. This creates a dependency (ec2-instance-connect
        # is deployed after vpc).
        VariablesFromOutputs:
          vpc_id:
            Module: vpc
            AccountId: "${CURRENT_ACCOUNT_ID}"
            Region: "${CURRENT_REGION}"
            OutputName: VpcId

Step 3: Deploy the modules

Go to the package folder and execute the command aws-orga-deployer list to list the changes to be made:

> aws-orga-deployer list
INFO Found 3 modules in this package (1 terraform, 1 cloudformation, 1 python)
INFO Querying AWS Organizations for information on accounts and organizational units
INFO Found 4 accounts and 3 organizational units
INFO Deployments to create: 57 (0 skipped due to CLI filters)
INFO Exporting the list of deployed modules and changes to be made to output.json

At this stage, AWS Orga Deployer has only queried AWS Organizations to retrieve the list of accounts and OUs. It compares the list of expected deployments with the current state stored in S3 to evaluate the changes to be made.
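
Conceptually, this comparison can be pictured as a set difference over (module, account, region) tuples. Here is an illustrative sketch, not the tool’s actual code:

# Illustrative sketch of the "list" diff; the actual implementation of
# AWS Orga Deployer may differ. Each deployment is identified by a
# (module, account_id, region) tuple.
expected = {
    ("vpc", "123456789013", "eu-west-1"),
    ("vpc", "123456789014", "eu-west-1"),
    ("vpc", "123456789015", "eu-west-1"),
}
# Current deployments recorded in the S3 state (empty on a first run;
# hardcoded here for illustration).
deployed = set()

to_create = expected - deployed
to_delete = deployed - expected
print(f"Deployments to create: {len(to_create)}")
print(f"Deployments to delete: {len(to_delete)}")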

Execute the command aws-orga-deployer preview to evaluate pending resource changes for each module to deploy:

> aws-orga-deployer preview
INFO Found 3 modules in this package (1 terraform, 1 cloudformation, 1 python)
INFO Loading information on accounts and organizational units from the cache in S3
INFO Found 4 accounts and 3 organizational units
INFO Deployments to create: 57 (0 skipped due to CLI filters)
INFO "preview" will determine which resources to add, update or delete if the pending deployments are applied
Enter "yes" to confirm the deployment scope (use the command "list" for details): yes
INFO [delete-default-vpc,123456789013,eu-west-1] Starting to create (Attempt 1/1)
INFO [delete-default-vpc,123456789013,eu-west-2] Starting to create (Attempt 1/1)
INFO [vpc,123456789013,eu-west-1] Starting to create (Attempt 1/1)
INFO [vpc,123456789014,eu-west-1] Starting to create (Attempt 1/1)
[...]
INFO [delete-default-vpc,123456789013,eu-west-1] Completed - Deleted default VPC
INFO [delete-default-vpc,123456789013,eu-west-2] Completed - Deleted default VPC
INFO [vpc,123456789013,eu-west-1] Completed - 35 resources to add, 0 to change, 0 to delete
[...]

For each pending deployment, the command returns the list of resource changes to be made. Note that the module ec2-instance-connect cannot be previewed because it depends on outputs that don’t exist yet.

Note that you could apply the pending changes for delete-default-vpc and vpc using aws-orga-deployer apply --include-modules delete-default-vpc vpc, and then run preview again.

Finally, execute the command aws-orga-deployer apply to apply the pending resource changes (including for the module ec2-instance-connect):

> aws-orga-deployer apply
INFO Found 3 modules in this package (1 terraform, 1 cloudformation, 1 python)
INFO Loading information on accounts and organizational units from the cache in S3
INFO Found 4 accounts and 3 organizational units
INFO Deployments to create: 57 (0 skipped due to CLI filters)
INFO "apply" will apply pending deployments, resulting in the creation, update or deletion of resources
Enter "yes" to confirm the deployment scope (use the command "list" for details): yes
INFO [delete-default-vpc,123456789013,eu-west-1] Starting to create (Attempt 1/1)
INFO [delete-default-vpc,123456789013,eu-west-2] Starting to create (Attempt 1/1)
[...]

If you execute the command aws-orga-deployer list again, AWS Orga Deployer compares the list of expected deployments with the list of current deployments stored in S3, and reports that there are no changes to be made.

What if we update the Terraform module?

Imagine that we change the Terraform templates of the module ec2-instance-connect to modify the tags assigned to AWS resources.

AWS Orga Deployer would detect this change because it compares the hash of the current version of the module with the hash of the deployed version. The command aws-orga-deployer list would then report 3 deployments to update:

> aws-orga-deployer list
INFO Found 3 modules in this package (1 terraform, 1 cloudformation, 1 python)
INFO Loading information on accounts and organizational units from the cache in S3
INFO Found 4 accounts and 3 organizational units
INFO Deployments to update: 3 (0 skipped due to CLI filters)
INFO Exporting the list of deployed modules and changes to be made to output.json
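
As an illustration of this mechanism, hashing a module folder could look like the following sketch; this is not the tool’s actual code and the stored hash is a placeholder:

import hashlib
from pathlib import Path

def module_hash(module_dir: str) -> str:
    """Hash the content of all files in a module folder."""
    digest = hashlib.sha256()
    for path in sorted(Path(module_dir).rglob("*")):
        if path.is_file():
            digest.update(path.read_bytes())
    return digest.hexdigest()

# Hypothetical hash recorded in the S3 state when the module was deployed.
deployed_hash = "0" * 64

if module_hash("terraform/ec2-instance-connect") != deployed_hash:
    print("Deployment to update")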

What if we add a new AWS account?

Let’s imagine we create a new development account: all you need to do is add a block to the vpc module’s Deployments list to define its CIDR, and execute the apply command again. Given the inclusion criteria, AWS Orga Deployer will automatically detect that delete-default-vpc and ec2-instance-connect also need to be deployed in this new account.
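
For illustration, the block added under the vpc module’s Deployments could look like this; the account ID and CIDR below are hypothetical:

      - Include:
          AccountIds: ["123456789016"] # hypothetical new account ID
          Regions: ["eu-west-1"]
        Variables:
          VpcCidr: "10.16.0.0/16" # hypothetical CIDR for the new account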

Conclusion

In this story, I have explained the reasons that led me to create AWS Orga Deployer, a tool to deploy infrastructure-as-code across many AWS accounts and regions. Feel free to use it, share feedback and contribute on GitHub.

