CloudGenesis: GitOps for CloudFormation

Ryan Means
9 min read · Jun 25, 2018

CloudFormation is a pretty awesome service — you get all the benefits of defining your AWS infrastructure in a concise declarative format and the ability to store those CloudFormation templates in version control as source. The problem that AWS left for you to solve is the process around taking those templates from source and launching them as stacks in the various accounts and regions that you need them.

In this post, we are going to explore some methodologies to address these issues in a secure manner and talk about our tool that we are open sourcing at LifeWay that we believe can greatly help the process of managing CloudFormation stacks at scale across many teams, accounts, and regions.

Under the principle of least privilege, systems (EC2 instances, containers, Lambda functions, etc.) should have access only to the things they need. The same applies to engineers: they should be able to administer only the systems they are responsible for and see only the data they need to see. This can present a challenge when deciding how to let engineers manage their cloud resources without handing out too many permissions, particularly when they need to manage the full lifecycle of those resources. This is a big topic, and since this post is about CloudFormation, we’re going to focus our attention there.

The security for CloudFormation is governed in one of two ways:

  • By default, a CloudFormation stack is launched using the permissions of the caller. For instance, if you are logged in with a role that has the arn:aws:iam::aws:policy/AdministratorAccess managed policy attached, you can launch a stack that creates any kind of cloud resource. However, if you are logged in with something that can only create S3 buckets, the templates you launch can only contain S3 bucket resources; any other resource type would fail, because your user doesn’t have access to create those resources directly with the corresponding AWS service.
  • For more governed security, you can pass an IAM role to CloudFormation itself. This is called a Service Role: the CloudFormation service assumes the role that you pass and uses it to manage the resources created by the stack. Each stack can use a different service role. The primary win here is that the engineer only needs permission to create, update, and delete stacks, and this permission can be further limited to certain stacks or to stacks matching some naming convention. The engineer’s own role doesn’t need access to manage cloud resources; it just needs permission to pass a role, namely a Service Role scoped to CloudFormation. Because of that scoping, the role the engineer is allowed to pass can only be assumed by the CloudFormation service itself, so engineers can’t assume it for their own use.
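To make the second option a little more concrete, here is a minimal sketch in Python of the two policy documents involved. The function names, the role ARN, and the stack pattern in the usage lines are hypothetical and only there for illustration; the CloudFormation service principal, sts:AssumeRole, iam:PassRole, and the iam:PassedToService condition key are standard IAM.

```python
import json

# The service principal that CloudFormation uses when it assumes a role.
CLOUDFORMATION_PRINCIPAL = "cloudformation.amazonaws.com"


def service_role_trust_policy() -> dict:
    """Trust policy for the Service Role: only CloudFormation may assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": CLOUDFORMATION_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def engineer_policy(service_role_arn: str, stack_arn_pattern: str) -> dict:
    """Policy for the engineer: manage matching stacks and pass one role.

    Note that it grants no permissions on the cloud resources themselves.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "cloudformation:CreateStack",
                    "cloudformation:UpdateStack",
                    "cloudformation:DeleteStack",
                ],
                "Resource": stack_arn_pattern,
            },
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": service_role_arn,
                # The role may only be handed to the CloudFormation service.
                "Condition": {
                    "StringEquals": {"iam:PassedToService": CLOUDFORMATION_PRINCIPAL}
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and role name, for illustration only.
    role_arn = "arn:aws:iam::123456789012:role/team-cfn-service-role"
    pattern = "arn:aws:cloudformation:*:*:stack/team-stack-prefix-*"
    print(json.dumps(service_role_trust_policy(), indent=2))
    print(json.dumps(engineer_policy(role_arn, pattern), indent=2))
```

The permissions that actually create buckets, topics, and so on would live in a separate policy attached to the Service Role, which is why the engineer’s policy above can stay so small.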

With these basic building blocks of security around CloudFormation, you can develop a system of least privilege for the engineers who manage CloudFormation stacks. One way to do this without any custom tooling is to let teams manage their stacks by granting them the ability to pass a Service Role that has permissions for the types of cloud resources they are allowed to manage. Then, to further ensure that stacks are created from a trusted source, you can use IAM condition operators like this:

{
  "Effect": "Allow",
  "Action": "cloudformation:CreateStack",
  "Resource": "arn:aws:cloudformation:*:*:stack/team-stack-prefix-*",
  "Condition": {
    "ForAnyValue:StringLike": {
      "cloudformation:TemplateUrl": [
        "https://s3.amazonaws.com/my-cf-bucket/*"
      ]
    }
  }
}
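To see what this statement lets through, here is a deliberately simplified model of it in Python. It only mimics the wildcard matching of the Resource pattern and the StringLike condition from the statement above; real IAM evaluation also considers explicit denies, other statements, and other policies, so treat this as an illustration rather than a policy simulator. The helper names, the account ID, and the ARNs in the usage lines are made up.

```python
import re

# Patterns copied from the policy statement above.
STACK_RESOURCE = "arn:aws:cloudformation:*:*:stack/team-stack-prefix-*"
TEMPLATE_URL_PATTERNS = ["https://s3.amazonaws.com/my-cf-bucket/*"]


def wildcard_match(pattern: str, value: str) -> bool:
    """IAM-style match: '*' is any run of characters, '?' is exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def create_stack_allowed(stack_arn: str, template_url: str) -> bool:
    """True when both the stack ARN and the template URL match the statement."""
    resource_ok = wildcard_match(STACK_RESOURCE, stack_arn)
    # The condition holds if the URL matches any one of the listed patterns.
    url_ok = any(wildcard_match(p, template_url) for p in TEMPLATE_URL_PATTERNS)
    return resource_ok and url_ok


if __name__ == "__main__":
    # Hypothetical stack ARN, for illustration only.
    arn = "arn:aws:cloudformation:us-east-1:123456789012:stack/team-stack-prefix-api/abc"
    print(create_stack_allowed(arn, "https://s3.amazonaws.com/my-cf-bucket/sns.yaml"))
    print(create_stack_allowed(arn, "https://s3.amazonaws.com/other-bucket/sns.yaml"))
```

In this model, a template served from any other bucket, or a stack name outside the team prefix, falls outside the statement and the create request would not be allowed by it.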

So with a little bit of IAM finesse you can create a least privilege system for your engineers to only be able to create and manage stacks that:

  • match a certain naming pattern (scoped to their team / product).
  • come from an approved source (e.g. a specific S3 bucket where the contents of the bucket may be automatically synced there from a Git repo).
  • can only create cloud resource types they are approved to create on their own (their service role permissions).
  • can be managed without the engineer’s own permissions affecting anything but CloudFormation itself.

Teams committing to GitHub and getting their templates peer reviewed and merged is a great thing, but after practicing the controls above for some time, you begin to notice that the process is begging for automation. Once a template has synced to S3, a human has to take it and, using their own credentials (which have access to pass the CloudFormation Service Role approved for their team), manage the stacks by hand with the AWS CLI or the console. The human must do the following correctly:

  • pick the correct stack to update (if updating)
  • use the correct template on the correct stack (if updating)
  • pass the correct parameters if the stack requires parameters
  • remember the previous parameters for the stack prior to updating (if needing to roll back to the previous template)

All of this is begging for automation, since we know humans are prone to error when doing things by hand! At LifeWay, we set out to solve these problems using the principles of GitOps.

Speaker from above slide: https://twitter.com/luisfaceira

The (above) slide covers it well:

  • GIT as the SINGLE source of truth of a system
  • GIT as the SINGLE place where we operate (create, change, and destroy) ALL environments
  • All changes are observable / verifiable.

GitOps is perhaps not so much something new as it is about the tooling that makes it happen. Infrastructure-as-code has been around for a while now and was perhaps discussed most famously by Jez Humble and David Farley in the book Continuous Delivery. Many of the common tools from that era, such as Puppet, Chef, and Ansible, could be defended as GitOps if they were operated only from Git. As we mentioned earlier, CloudFormation enables infrastructure-as-code on AWS, but what it doesn’t do for you is automate the process of applying the changes defined in code to the infrastructure itself. You must either do that yourself as described above, or create some automation to do it for you.

Announcing CloudGenesis 🎉

CloudGenesis (repo) is LifeWay’s take on providing the automation necessary to fulfill the GitOps contract with the CloudFormation service.

CloudGenesis creates, updates, and deletes CloudFormation stacks as they exist in Git and posts all notifications and stack status updates to Slack, keeping everything observable as events happen in CloudFormation.

CloudGenesis operates as a Serverless Application launched via a single SAM stack where you define the roles you want CloudGenesis to have on your AWS accounts outside of the SAM stack itself. CloudGenesis does not assume anything about your security posture — you must provide the roles you want it to use with CloudFormation based on your own best practices.

The following is a brief summary of the things CloudGenesis currently supports (see the GitHub repo for more information):

  • GitHub Repositories (both Public and Private) — however, any Git source that CodePipeline & CodeBuild support may be added as a trivial enhancement to CloudGenesis
  • deploying stacks to multiple accounts and regions from a single Git Repo
  • clean separation between Templates and Stacks, which allows Templates to be re-used, even across accounts
  • sourcing templates from approved buckets outside of the Git Repo. This is useful for shared or common templates that might be used across many teams
  • all notifications, errors, and status updates piped to a named Slack channel
  • support for secret stack parameters via SSM
  • external SNS Notification Hook (optional) — useful for notifying external systems that a change has happened
  • flexible security model (more on this later)

Demo

Let’s presume I have a Git repo containing the following files, which have not yet been merged to master:

Where the contents of my-sns.dev.yaml stack files look like this:

And the sns.yaml template file contains this CloudFormation template:

Following a PR where the PR build on CodeBuild checks out:

Then the following would happen in Slack after a Git merge (should these stack files not have existed prior to the merge):

If the SNS external notifications were turned on, SNS events for each stack modified would have been published to that topic. Here is a demo payload:

🎉 Stacks launched! The entire process moves quickly once a merge happens, because it is event driven, with Lambda functions operating on those events.

Setting up a flexible security model

The entire process from Git to CloudFormation is done with least-privilege automation: the engineer operates on the infrastructure only within Git and sees all the events in Slack. Using permission controls within the Git environment, only certain members are allowed to merge to the repos tracked by CloudGenesis. Each of these repos has its own deployer, set up with least-privilege permissions just for that repo. Furthermore, when using GitHub you can use features like branch protection to require pull request reviews from a given number of team members before a merge can happen, and to prevent anyone from pushing directly to master. All of these Git-based controls feel native to most engineering teams, with nothing new to learn. Additionally, since each deployer (and therefore each Git repo) can have its own permissions to CloudFormation, you can scope this down as finely as you want. Because this is a serverless stack, the cost of running many CloudGenesis deployments is low, so it’s best to use as many distinct repos and deployments of CloudGenesis as you need for least-privilege separation across your engineering team. CodePipeline and CodeBuild should be the most expensive parts of this process, and even those are fairly low cost.

How does CloudGenesis work?

CloudGenesis has two main components: a SAM Stack and a Git repo that is watched by CodePipeline & CodeBuild (which are resources created by the SAM stack). To turn changes happening with Git into events within CloudGenesis’ system, we synchronize the changes happening in Git with a versioned S3 bucket where we use S3 event notifications to turn those file changes into S3 events that we can then operate on.
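As a rough illustration of that last step, here is a sketch in Python of how an S3 event notification could be turned into a "what changed" record. This is not CloudGenesis’ actual code: the function names and the returned dictionary shape are assumptions, as is the idea of classifying by key prefix. The only things taken as given are the standard S3 notification format and the templates/ and stacks/ prefixes used in the bucket.

```python
from typing import List, Optional
from urllib.parse import unquote_plus


def classify_record(record: dict) -> Optional[dict]:
    """Turn one S3 notification record into a change description, or None."""
    # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    event_name = record["eventName"]

    if key.startswith("stacks/"):
        kind = "stack"
    elif key.startswith("templates/"):
        kind = "template"
    else:
        return None  # Not a file this sketch cares about.

    if event_name.startswith("ObjectCreated:"):
        change = "created_or_updated"
    elif event_name.startswith("ObjectRemoved:"):
        # On a versioned bucket a delete shows up as a delete marker event.
        change = "removed"
    else:
        return None

    return {
        "kind": kind,
        "change": change,
        "bucket": record["s3"]["bucket"]["name"],
        "key": key,
        # Present on versioned buckets; lets a consumer read the exact version.
        "version_id": record["s3"]["object"].get("versionId"),
    }


def handler(event: dict, context: object = None) -> List[dict]:
    """Lambda-style entry point: classify every record in the event."""
    changes = (classify_record(r) for r in event.get("Records", []))
    return [c for c in changes if c is not None]
```

Under these assumptions, a stack file landing in the bucket yields a "stack" change that a downstream function could act on, while files outside the two prefixes are ignored.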

A picture can be worth a thousand words here. The diagram below will give you an overview of all of the components involved.

Git Repo Setup

The Git repo must contain the CodeBuild files for both PR building (CodeBuild) and merges to master (CodePipeline). You may tune the CodeBuild jobs however you see fit, but in the end, the primary function of the CodeBuild job is to keep Git in sync with the S3 bucket created by the SAM stack. The SAM stack expects your Git repo to contain a buildspec-pr.yaml file and a buildspec-sync.yaml file. Outside of that constraint, you are free to do what you want within your Git repos, so long as the templates in the repo get synced to the S3 bucket under the templates directory and the stacks get synced to the S3 bucket under the stacks directory. We have included a demo repo that you can clone and use for yourself that contains the minimum scripts needed for this process to work.
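To illustrate what "keeping Git in sync with the S3 bucket" has to accomplish, here is a small Python sketch that plans a sync from two listings. It is not the script from the demo repo. It assumes the repo uses top-level templates/ and stacks/ directories that map one-to-one onto the bucket prefixes, and that files removed from Git should also be removed from the bucket; the function name and the use of content hashes are hypothetical.

```python
from typing import Dict, List

# Only files under these prefixes are mirrored into the bucket.
SYNCED_PREFIXES = ("templates/", "stacks/")


def plan_sync(repo_files: Dict[str, str],
              bucket_objects: Dict[str, str]) -> Dict[str, List[str]]:
    """Compute which keys to upload and which to delete.

    Both arguments map a relative path (or S3 key) to a content hash.
    """
    wanted = {p: h for p, h in repo_files.items()
              if p.startswith(SYNCED_PREFIXES)}
    existing = {k: h for k, h in bucket_objects.items()
                if k.startswith(SYNCED_PREFIXES)}

    # Upload anything new or whose content differs from what is in the bucket.
    upload = sorted(p for p, h in wanted.items() if existing.get(p) != h)
    # Delete anything in the bucket that no longer exists in the repo.
    delete = sorted(k for k in existing if k not in wanted)
    return {"upload": upload, "delete": delete}


if __name__ == "__main__":
    repo = {"templates/sns.yaml": "a1", "stacks/my-sns.dev.yaml": "b2", "README.md": "c3"}
    bucket = {"templates/sns.yaml": "a1", "stacks/old.yaml": "z9"}
    print(plan_sync(repo, bucket))
```

In practice a buildspec would more likely shell out to an existing sync tool than reimplement this, but the plan above is the behavior such a step needs: unchanged files produce no events, and deletions in Git propagate to the bucket.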

CloudGenesis Deployer Setup

CloudGenesis itself is a simple launch — only one SAM stack to package and deploy. However, it does take a lot of parameters as this product leaves the security of what it can operate on up to you! Let’s cover those parameters in some detail:

Below is a sample deployer role and service role combined in a single CloudFormation stack. It shows how the deployer role follows least privilege: it can only work with stacks matching certain naming conventions, and it has only the CloudFormation permissions that CloudGenesis actually needs. The service role, on the other hand, has expanded permissions to manage the full lifecycle of certain cloud resources that the deployer role cannot manage on its own.

Enjoy!

CloudGenesis is rapidly changing the way we think about and manage our AWS infrastructure at LifeWay. Let us know if you find this useful by giving us a shout-out on our Twitter account https://twitter.com/lifewaytech! We’re interested to see how others might use it and what ideas you may have to make it even better!
