Over the past few months, I’ve been using Terraform and CodePipeline to automate deployment of Lambda functions across multiple AWS accounts for a platform we’re building at Work & Co. I’ve created an example on GitHub here. I used Clojure in the example because it’s my preferred language, but the approach will work for any language Lambda supports.
We have four AWS accounts under our client’s master account using AWS Organizations: ops, dev, stage, and prod. The ops account is where we manage access control: all users are created in ops and granted access to the appropriate roles in the dev, stage, and prod accounts. One slight hassle with cross-account IAM permissions is that they have to be specified in both accounts, so every user has to be added to the role’s trusted entities in each account. (Edit: I was mistaken on this point. You can simply use the origin account root as the principal. See this tutorial.) Once a user is set up, they can easily access resources across accounts using named profiles for the CLI, assume_role in Terraform, and Switch Role in the AWS Console.
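Here is a minimal Terraform sketch of the root-principal approach from the edit above, run with ops credentials. It assumes the dev account was created through AWS Organizations (which provisions an OrganizationAccountAccessRole by default) and AWS provider v4 or later; the account ID, region, group name, and role name are hypothetical placeholders, not our actual setup.

```hcl
# Hypothetical dev account ID; replace with your own.
variable "dev_account_id" {
  type    = string
  default = "222222222222"
}

# Default provider: credentials for the ops account.
provider "aws" {
  region = "us-east-1" # assumed region
}

# Aliased provider that assumes the Organizations-created admin role in dev.
provider "aws" {
  alias  = "dev"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::${var.dev_account_id}:role/OrganizationAccountAccessRole"
  }
}

# Resolves to the ops account, since it uses the default provider.
data "aws_caller_identity" "ops" {}

# Trust policy for the role in dev: trusting the ops account root delegates
# the choice of *which* ops users may assume it to IAM policies in ops.
data "aws_iam_policy_document" "trust_ops" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.ops.account_id}:root"]
    }
  }
}

resource "aws_iam_role" "developer" {
  provider           = aws.dev
  name               = "developer" # hypothetical role name
  assume_role_policy = data.aws_iam_policy_document.trust_ops.json
}

# Ops side: a hypothetical group whose members may assume the dev role.
resource "aws_iam_group" "developers" {
  name = "developers"
}

data "aws_iam_policy_document" "assume_dev_developer" {
  statement {
    actions   = ["sts:AssumeRole"]
    resources = [aws_iam_role.developer.arn]
  }
}

resource "aws_iam_group_policy" "assume_dev_developer" {
  name   = "assume-dev-developer"
  group  = aws_iam_group.developers.name
  policy = data.aws_iam_policy_document.assume_dev_developer.json
}
```

Permissions still live in both accounts, but adding a user now only means adding them to the group in ops. From there, a CLI named profile with role_arn and source_profile, or Switch Role in the console, gets them into dev.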
We use Terraform to manage all infrastructure. We store “global” remote state in S3 on the ops account, and we encrypt everything at rest using KMS. We store each environment’s “local” remote state in S3 on that environment’s account. “Local” state is limited to the current deployment of a Lambda function within that environment. (To keep environments isolated, I did not want any resource in dev, stage, or prod to be able to reach up into the ops account.)
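A sketch of what the two kinds of S3 backend might look like, assuming the state buckets and KMS keys already exist in each account. These are two separate Terraform configurations shown side by side, not one file; the bucket names, state keys, and key ARNs are hypothetical.

```hcl
# Configuration applied in ops: "global" state, stored in the ops account.
terraform {
  backend "s3" {
    bucket     = "example-ops-terraform-state" # hypothetical
    key        = "global/terraform.tfstate"
    region     = "us-east-1"
    encrypt    = true
    kms_key_id = "arn:aws:kms:us-east-1:111111111111:key/00000000-0000-0000-0000-000000000000" # hypothetical
  }
}

# Configuration applied in dev: "local" state for one Lambda deployment,
# stored in the dev account so nothing in dev depends on ops.
terraform {
  backend "s3" {
    bucket     = "example-dev-terraform-state" # hypothetical
    key        = "example-function/terraform.tfstate"
    region     = "us-east-1"
    encrypt    = true
    kms_key_id = "arn:aws:kms:us-east-1:222222222222:key/00000000-0000-0000-0000-000000000000" # hypothetical
  }
}
```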
dev is our sandbox environment: it’s where we build the Lambda deployment package and test during development. Because Lambda functions are tightly coupled to other AWS services, keeping environments in sync is essential for QA across the platform. To that end, we keep stage and prod essentially identical (apart from the number and size of EC2 instances and DynamoDB throughput). Promotion from one environment to the next is handled by a Lambda function that puts the deployment package in a known location in the destination account’s build artifacts bucket.
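Promotion only works if the destination account accepts writes from the source account. Below is a minimal sketch of the stage account’s artifacts bucket, assuming AWS provider v4 or later. The bucket name, the promoted/ prefix, and the lambda-promoter role in dev are hypothetical, and that role must already exist or S3 will reject the policy.

```hcl
# Hypothetical dev account ID.
variable "dev_account_id" {
  type    = string
  default = "222222222222"
}

resource "aws_s3_bucket" "artifacts" {
  bucket = "example-stage-build-artifacts" # hypothetical
}

# CodePipeline's S3 source action requires a versioned bucket.
resource "aws_s3_bucket_versioning" "artifacts" {
  bucket = aws_s3_bucket.artifacts.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Make the stage account own objects written by dev, so stage's pipeline
# can read them without relying on object ACLs.
resource "aws_s3_bucket_ownership_controls" "artifacts" {
  bucket = aws_s3_bucket.artifacts.id

  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}

# Allow only the (hypothetical) promoter role in dev to write under the
# known promotion prefix.
data "aws_iam_policy_document" "allow_promotion" {
  statement {
    sid       = "AllowPromotionFromDev"
    actions   = ["s3:PutObject"]
    resources = ["${aws_s3_bucket.artifacts.arn}/promoted/*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.dev_account_id}:role/lambda-promoter"]
    }
  }
}

resource "aws_s3_bucket_policy" "artifacts" {
  bucket = aws_s3_bucket.artifacts.id
  policy = data.aws_iam_policy_document.allow_promotion.json
}
```

The promoter role in dev also needs an identity policy granting s3:PutObject on the same ARN; cross-account access requires both sides to allow it.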
We keep each Lambda function in its own repo on GitHub. Besides the function’s code (and any necessary build infrastructure), each repo includes two buildspec.yml files: one in the root for building the Lambda function and a second in deployment/ for deploying it. The deployment configuration is applied “locally” within each account, so we use CodeBuild environment variables to inject the Terraform backend configuration.
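One way to wire this up is Terraform’s partial backend configuration: the backend block in deployment/ is left mostly empty, and the per-account values arrive as CodeBuild environment variables at terraform init time. This is a sketch; the project name, variable names, and image are assumptions, and the image does not ship with Terraform, so the buildspec would have to install it.

```hcl
variable "codebuild_role_arn" {
  type        = string
  description = "Service role for CodeBuild (hypothetical; created elsewhere)"
}

resource "aws_codebuild_project" "deploy" {
  name         = "example-function-deploy" # hypothetical
  service_role = var.codebuild_role_arn

  artifacts {
    type = "CODEPIPELINE"
  }

  environment {
    compute_type = "BUILD_GENERAL1_SMALL"
    image        = "aws/codebuild/standard:7.0" # assumed image
    type         = "LINUX_CONTAINER"

    # Read by deployment/buildspec.yml, e.g.:
    #   terraform init -backend-config="bucket=$TF_STATE_BUCKET" \
    #                  -backend-config="key=$TF_STATE_KEY" \
    #                  -backend-config="region=$AWS_REGION"
    environment_variable {
      name  = "TF_STATE_BUCKET"
      value = "example-dev-terraform-state" # hypothetical
    }

    environment_variable {
      name  = "TF_STATE_KEY"
      value = "example-function/terraform.tfstate" # hypothetical
    }
  }

  source {
    type      = "CODEPIPELINE"
    buildspec = "deployment/buildspec.yml"
  }
}

# Separate file in deployment/: a partial backend whose remaining
# settings are supplied by the -backend-config flags above.
terraform {
  backend "s3" {
    encrypt = true
  }
}
```

The same deployment code then runs unchanged in dev, stage, and prod; only the CodeBuild project in each account carries account-specific values.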
We store the CodePipeline configuration in pipeline/ and use a main.tf in the root to make applying the Terraform configuration easier. One irritation with using multiple accounts is that we have to duplicate a fair number of variable and output declarations.
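A sketch of where that duplication comes from, assuming the root main.tf simply wraps pipeline/ as a module. The variable names and the aws_codepipeline.pipeline resource are hypothetical; the pipeline/ files are shown as comments so the block reads as a single root configuration.

```hcl
# main.tf (repository root): a thin wrapper around the pipeline/ module.
variable "function_name" {
  type = string
}

variable "github_repo" {
  type = string
}

module "pipeline" {
  source        = "./pipeline"
  function_name = var.function_name
  github_repo   = var.github_repo
}

# Each output of interest has to be re-exported from the root.
output "pipeline_name" {
  value = module.pipeline.pipeline_name
}

# pipeline/variables.tf repeats the same inputs:
#   variable "function_name" { type = string }
#   variable "github_repo"   { type = string }
#
# pipeline/outputs.tf declares the output the root re-exports:
#   output "pipeline_name" { value = aws_codepipeline.pipeline.name }
```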
Terraform configuration is fairly repetitive, and I feel we need some sort of higher-level abstraction to manage it effectively. I’ve found myself doing a fair bit of copy-paste throughout this project, and I’m of two minds about it. I’ll save that discussion for another post. (Update: I’ve written more about my experiences with Terraform here.)
We are still experimenting with developer workflow. CodePipeline deployments are rather slow, and because Lambda functions are tightly coupled to AWS services, they are difficult to test locally. My current thinking is that we should rely on CodePipeline to keep environments consistent while developers use either command-line tools or the AWS Console to upload code changes. I’d also like to make it easy for developers to spin up their own AWS resources for interactive development. I suspect a lein template is in order.
Another annoyance is that after an artifact is promoted to the next account, we lose any Git revision information in the CodePipeline interface. It would be nice to be able to annotate artifacts with metadata so that the CodePipeline UI remains useful.
Overall, this approach has been effective for us. Comments are welcome.
I stand by most of what I’ve written above, but I’ve made some updates to my approach over the past couple of years. The biggest difference is that I now keep one repo per service, with all of that service’s Lambda functions in one place and one pipeline per service. I intend to do a full write-up on my current approach; for now, check out my deployment-pipeline and pipeline-example repositories.
Added 24 December 2019