Automating cloud governance at scale

Introduction

Skyscanner engineering squads deploy thousands of production changes every day, and those changes interact with hundreds of services hosted on AWS. In total, we have over 6,000 CloudFormation stacks across 60 different AWS accounts! To deploy these changes with zero clicks, we have an automated pipeline for building, testing, and deploying code, and for provisioning new infrastructure. This last stage is often referred to as Infrastructure as Code (IaC).

Figure 1 — Above — standard IaC flow for CloudFormation templates. The Continuous Deployment step is responsible for invoking CFRipper and gating the provision of the resources (shown by green and red arrows).
Figure 2 — Below — usage of CFRipper outside of IaC. If a developer uses the AWS Console directly to provision CloudFormation, we can still detect this event. We download the template from S3 and retrospectively delete the infrastructure if the template fails.

Statement Condition Evaluator

Initially, resources were predominantly deployed in a single AWS account. IAM roles were created next to the resources and typically did not require any advanced configuration. However, as our production environment matured, condition blocks inside policy statements became increasingly common. Conditions specify the circumstances under which a policy grants permission, and there are many valid patterns in which conditions control access to resources across several AWS accounts, for example by AWS Organizations membership, VPC, or IP address.

Figure 3 — example of two condition evaluators and policy enforcement.
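
To make the idea of a condition block concrete, here is a minimal Python sketch of how a condition might be evaluated against a request context. It is not CFRipper's implementation. The `condition_holds` helper, the placeholder organisation ID, the bucket name, and the reduced operator set (only `StringEquals` and `StringLike`) are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict

# Hypothetical policy statement; the organisation ID and bucket are placeholders.
STATEMENT = {
    "Effect": "Allow",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::example-bucket/*",
    "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}},
}


def condition_holds(condition: Dict[str, Dict[str, Any]], context: Dict[str, str]) -> bool:
    """Return True if every operator/key pair in the condition block matches.

    Simplified on purpose: only two operators are handled, keys are compared
    case-sensitively, and fnmatch wildcards stand in for IAM's `*` and `?`.
    """
    for operator, checks in condition.items():
        for key, expected in checks.items():
            actual = context.get(key)
            # A key may list several values; any one of them may match.
            allowed = expected if isinstance(expected, list) else [expected]
            if operator == "StringEquals":
                matched = actual in allowed
            elif operator == "StringLike":
                matched = actual is not None and any(
                    fnmatchcase(actual, pattern) for pattern in allowed
                )
            else:
                raise NotImplementedError(f"unsupported operator: {operator}")
            if not matched:
                return False  # all operator/key pairs must hold
    return True
```

With this sketch, a request carrying `{"aws:PrincipalOrgID": "o-exampleorgid"}` satisfies the statement's condition, while a request from any other organisation, or one that lacks the key, does not. A rule can use that result to decide whether a cross-account statement is acceptably constrained.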

Action Expander

Figure 4 — example of expanding an action pattern.
Figure 5 — Above — example of rule using the expanded actions.
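
As a rough illustration of what expanding an action pattern involves, the sketch below matches a wildcard pattern against a catalogue of known actions and then checks the expanded set in a rule. It assumes a tiny hand-written catalogue (`KNOWN_ACTIONS`) and hypothetical helper names (`expand_action`, `grants_any`); it is not CFRipper's own code or its full action list.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Illustrative subset only; a real catalogue would cover every AWS action.
KNOWN_ACTIONS = [
    "iam:CreateRole",
    "iam:PassRole",
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:GetObjectAcl",
    "s3:ListBucket",
    "s3:PutObject",
]


def expand_action(pattern: str, known_actions: Iterable[str] = KNOWN_ACTIONS) -> List[str]:
    """Return every known action matched by a pattern such as 's3:Get*'.

    Matching is case-insensitive, because IAM action names are.
    """
    lowered = pattern.lower()
    return sorted(a for a in known_actions if fnmatchcase(a.lower(), lowered))


def grants_any(patterns: Iterable[str], forbidden: Iterable[str]) -> bool:
    """Hypothetical rule helper: does any pattern expand to a forbidden action?"""
    forbidden_set = set(forbidden)
    return any(forbidden_set.intersection(expand_action(p)) for p in patterns)
```

Here `expand_action("s3:Get*")` returns `["s3:GetObject", "s3:GetObjectAcl"]`, so a rule written against concrete action names can still catch statements that only mention a wildcard. For instance, `grants_any(["iam:*"], ["iam:PassRole"])` is true even though `iam:PassRole` never appears literally in the statement.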

Plugin system

Plugins are a common feature of libraries and frameworks, and for good reason: they allow developers to add functionality in a safe, scalable way. We knew we needed to design this plugin functionality carefully, so we based the design on real-life use cases in Skyscanner. However, we didn’t forget the open-source community! For the implementation, we decided to go with the most battle-tested plugin system in Python, Pluggy. Pluggy is the framework that powers pytest’s own plugin system.

Figure 6 — example of setup.py in CFRipper.

Filter System

Figure 7 — example filter for enabling a rule only for critical services, based on the stack metadata.
  • lack of context in the mechanism (for example, the tags on a stack or the configuration of CFRipper at the time it is being invoked),
  • lack of measures we could control in the result (we could only allow a stack to bypass the failure it was triggering).
  • contain the string ‘gdpr’ in the name of the stack,
  • have the ‘service_type’ tag equal to ‘critical’ or ‘public’.
Figure 8 — Above — we can easily make filters dynamic, for example, time-constrained.
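
The filter idea can be sketched in plain Python as follows. This is an illustration rather than CFRipper's filter API: `StackContext`, `RuleFilter`, and `is_critical_gdpr_stack` are hypothetical names, and combining the two stack criteria listed above with a logical AND is an assumption. The optional expiry date shows one way a filter could be made time-constrained.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, Optional


@dataclass
class StackContext:
    """Hypothetical view of the stack metadata available to a filter."""
    name: str
    tags: Dict[str, str] = field(default_factory=dict)


def is_critical_gdpr_stack(ctx: StackContext) -> bool:
    # Assumption: both criteria must hold for the filter to apply.
    return "gdpr" in ctx.name and ctx.tags.get("service_type") in {"critical", "public"}


@dataclass
class RuleFilter:
    """Hypothetical filter: a reason, a predicate, and an optional expiry."""
    reason: str
    predicate: Callable[[StackContext], bool]
    expires: Optional[date] = None

    def applies(self, ctx: StackContext, today: date) -> bool:
        # A time-constrained filter stops applying after its expiry date.
        if self.expires is not None and today > self.expires:
            return False
        return self.predicate(ctx)
```

A filter built as `RuleFilter("Enforce rule on critical GDPR stacks", is_critical_gdpr_stack, expires=date(2030, 1, 1))` would apply to a stack named `payments-gdpr-export` tagged `service_type=critical` until the expiry date passes. It would never apply to a stack that lacks the tag or whose name does not contain `gdpr`. Passing `today` in explicitly keeps the check easy to test.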

Conclusion

With 300+ stars on GitHub, 30+ built-in rules, and continuous healthy maintenance from the community, CFRipper is ready for anyone to test in their environment. We also love external contributions — if you have any rules in mind, feel free to read our contribution guide and submit a pull request or issue on GitHub.

Join Skyscanner

From flights to hotels and car hire, Skyscanner works side-by-side with the biggest names in travel to bring over 100 million users all the options they need to plan and book their perfect trip.

About the authors

Oliver Crawford

Oscar Blanco Castan

We are the engineers at Skyscanner, the company changing how the world travels. Visit skyscanner.net to see how we walk the talk!
