Automating cloud governance at scale

Introduction

Skyscanner engineering squads deploy thousands of production changes every day that interact with hundreds of services hosted on AWS. In total, we have over 6,000 CloudFormation stacks across 60 different AWS accounts! To deploy these changes with zero clicks, we have an automated pipeline for building, testing, deploying code, and provisioning new infrastructure — the latter stage often being referred to as Infrastructure as Code (IaC).

But where does Security fit in? Our goal in Platform Security and Automation is to detect and remediate issues early in the development pipeline, minimizing the risk of security misconfigurations in production environments that become harder to track and fix over time. For CloudFormation and IaC, this involves checking templates that define the infrastructure we have in AWS, utilizing our open-source tool CFRipper for the job (we introduced this tool in a 2018 blog post and talked about cloud governance challenges in a second blog post in 2020). Figure 1 and Figure 2 highlight CFRipper’s basic operation.

In this blog post, we will introduce some recent significant improvements to CFRipper that have enabled us to detect issues more accurately, allow for increasing levels of customization, and facilitate dynamic stack exemptions for engineering squads at Skyscanner.

Statement Condition Evaluator

Initially, resources were predominantly deployed in a single AWS account. IAM roles were created next to the resources and typically did not require any advanced configuration. However, as our production environment matured, condition blocks inside policy statements became increasingly common. Conditions specify the circumstances under which a policy grants permission, and there are many valid patterns in which conditions control access to resources across several AWS accounts, for example by AWS Organizations membership, VPC, or IP address.

During the adoption of this new approach, developers started creating many cross-account relationships granting access to AWS accounts, VPCs, and organizational units. Our main objective for keeping things distributed was isolation, but allowing developers to create cross-account relationships was against that principle. We needed to put some governance on these conditions and the relationships allowed.

We re-implemented the condition model from scratch. We added support for all fields inside a condition and a new evaluator function. This new evaluator supports the seven data types (string, numeric, date and time, boolean, binary, IP address, Amazon Resource Name (ARN)), all their operators (equals, not equals, less than, greater than etc.), the standalone null check and all modifiers (if exists, for all values, for any value). As a result, CFRipper handles more than one hundred different conditions and supports two new categories of rules: Policy Enforcement and Dynamic Policy Evaluation. The former are rules that enforce a specific condition, whereas the latter simulate an authorization request to check whether we would approve it or not.

This update has allowed us to provide more intelligent rules — Figure 3 shows a simple test for checking if a VPC would have access to an S3 resource or not.
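To make the idea of simulating an authorization request more concrete, here is a minimal sketch of a condition evaluator. It is not CFRipper's implementation: the function names are hypothetical, only three operators plus Null and the IfExists modifier are covered, and StringLike is approximated with Python's fnmatch.

```python
import fnmatch
import ipaddress

# Only a handful of operators are sketched here; the real evaluator covers far more.
OPERATORS = {
    "StringEquals": lambda actual, expected: actual == expected,
    "StringLike": lambda actual, pattern: fnmatch.fnmatchcase(actual, pattern),
    "IpAddress": lambda actual, cidr: ipaddress.ip_address(actual) in ipaddress.ip_network(cidr),
}


def evaluate_condition(condition, context):
    """Return True when every operator block in `condition` matches `context`."""
    for operator, block in condition.items():
        if_exists = operator.endswith("IfExists")
        name = operator[: -len("IfExists")] if if_exists else operator
        for key, expected in block.items():
            present = key in context
            if name == "Null":
                # Null with "true" requires the key to be absent, "false" requires it to be present.
                if (str(expected).lower() == "true") == present:
                    return False
                continue
            if not present:
                if if_exists:
                    continue  # the IfExists modifier passes when the key is missing
                return False
            values = expected if isinstance(expected, list) else [expected]
            # Several values under one key are OR-ed together.
            if not any(OPERATORS[name](context[key], value) for value in values):
                return False
    return True


# Hypothetical request: would a call coming from this VPC satisfy the condition?
vpc_condition = {"StringEquals": {"aws:SourceVpc": "vpc-111bbb22"}}
print(evaluate_condition(vpc_condition, {"aws:SourceVpc": "vpc-111bbb22"}))
```

A rule built on something like this could feed in a request context it considers untrusted and fail the stack if the condition still evaluates to true.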

Action Expander

AWS is an extensive ecosystem that evolves quickly. Keeping up to date with all AWS services and APIs is a titanic task; each week can bring multiple API changes. Currently, there are nearly 11,000 actions on the platform. When working on IAM permissions, developers may sidestep this problem by granting overly broad permissions. This can be achieved with wildcards: an asterisk (*) matches any combination of characters, and a question mark (?) matches a single character. For example, the pattern s3:* implies 124 actions in total. If this is not handled correctly, we might be creating roles with very broad permissions.
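The wildcard expansion described above can be sketched as follows. This is an illustration only: the catalogue is a hypothetical, heavily truncated stand-in for the real list of actions, and the function names are made up for this example.

```python
import re

# Hypothetical, heavily truncated catalogue; the real one has thousands of entries.
KNOWN_ACTIONS = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "iam:PassRole",
    "iam:PutRolePolicy",
    "ec2:TerminateInstances",
]


def pattern_to_regex(pattern):
    """Translate an IAM-style pattern into a case-insensitive regular expression."""
    escaped = re.escape(pattern)
    # Only '*' and '?' are wildcards; everything else stays literal.
    escaped = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(escaped, re.IGNORECASE)


def expand_action(pattern, catalogue=KNOWN_ACTIONS):
    """Return every catalogue action matched by `pattern`, sorted alphabetically."""
    regex = pattern_to_regex(pattern)
    return sorted(action for action in catalogue if regex.fullmatch(action))


print(expand_action("s3:*"))
print(expand_action("s3:?etObject"))
```

Expanding patterns first means a rule can reason about concrete actions instead of guessing what a wildcard might cover.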

Some permissions shouldn’t be available to any service or user in the account, for instance permissions to delete the EC2 instances running the production website or to edit IAM policies. To tackle these problems and create granular rules that prohibit using some actions, we implemented new methods to get the permissions. One of them is ‘get_allowed_permissions’, which returns all allowed actions considering every Effect, Action, and NotAction inside the statements. At Skyscanner, this change has enabled us to create more specific rules, such as in Figure 5, where we prohibit a subset of IAM actions that would allow an attacker to escalate privileges.
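As a rough illustration of how Effect, Action, and NotAction can be combined, the sketch below computes the allowed actions for a list of statements and checks them against a forbidden set. It is an assumption-laden simplification rather than the code behind ‘get_allowed_permissions’: the catalogue and function names are hypothetical, and Resource and Condition elements are ignored.

```python
from fnmatch import fnmatchcase

# Hypothetical, truncated action catalogue used only for this example.
CATALOGUE = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "iam:PassRole",
    "iam:PutRolePolicy",
    "ec2:TerminateInstances",
]


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(action, patterns):
    # Action names are compared case-insensitively.
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in patterns)


def allowed_actions(statements, catalogue=CATALOGUE):
    """Actions granted by Allow statements and not removed by any Deny statement."""
    allowed, denied = set(), set()
    for statement in statements:
        if "Action" in statement:
            selected = {a for a in catalogue if _matches(a, _as_list(statement["Action"]))}
        else:
            # NotAction selects everything except the listed patterns.
            excluded = _as_list(statement.get("NotAction", []))
            selected = {a for a in catalogue if not _matches(a, excluded)}
        (allowed if statement["Effect"] == "Allow" else denied).update(selected)
    return sorted(allowed - denied)


# Hypothetical rule: flag policies that grant privilege-escalation actions.
FORBIDDEN = {"iam:PassRole", "iam:PutRolePolicy"}
policy = [
    {"Effect": "Allow", "NotAction": "iam:*"},
    {"Effect": "Deny", "Action": "s3:Delete*"},
]
granted = allowed_actions(policy)
print(granted, FORBIDDEN.intersection(granted))
```

Here the policy grants nothing from the forbidden set, so a rule written this way would pass it, whereas a statement allowing iam:P* would be flagged.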

Plugin system

Plugins are a common feature of libraries and frameworks, and for a good reason: they allow developers to add functionality in a safe, scalable way. We knew we needed to design this plugin functionality carefully, so we based the design on real-life use cases in Skyscanner. However, we didn’t forget the open-source community! For the implementation, we decided to go with the most battle-tested plugin system in Python, Pluggy. Pluggy is the base framework that is used to enhance pytest capabilities.

On each execution, CFRipper now automatically loads all rules from any installed Python package that declares a cfripper entry point (Figure 6). This approach is more manageable because each rule’s requirements are isolated from the core, which improves the development experience and allows plugins to be versioned and tested in isolation. At Skyscanner, we designed this feature to execute a different set of rules in different environments. We are looking forward to more open-source rulesets being created that can be “plugged” into any CFRipper execution an organization might use, without the need for endless configuration changes in CFRipper itself.
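The entry point mechanism that this kind of plugin loading relies on can be sketched with the standard library alone. CFRipper itself uses Pluggy, so the snippet below is not its loading code; the function names are hypothetical, and what object a plugin's entry point should reference is deliberately left unspecified.

```python
from importlib.metadata import entry_points


def discover_rule_plugins(group="cfripper"):
    """Map entry point names to the entry points registered under `group`."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10+ returns a selectable collection.
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = all_entry_points.get(group, [])
    return {entry_point.name: entry_point for entry_point in selected}


def load_rule_plugins(group="cfripper"):
    """Import each discovered plugin; load() returns the object the entry point names."""
    return {name: ep.load() for name, ep in discover_rule_plugins(group).items()}


# Lists whatever is installed in the current environment (possibly nothing).
print(sorted(discover_rule_plugins()))
```

Because discovery happens through installed package metadata, adding or removing a ruleset becomes a matter of installing or uninstalling a package rather than editing the tool's configuration.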

Filter System

It is common for specific CloudFormation stacks to require additional privileges in AWS or not be validated against specific rules — for example, admin account roles. In the past, we had an “allowlist” mechanism to create exemptions based on the name of the stack only. This had inherent issues:

  • lack of granularity in the exemption process (allowlist was too permissive at times),
  • lack of context in the mechanism (for example, the tags on a stack or the configuration of CFRipper at the time it is being invoked),
  • lack of measures we could control in the result (we could only allow a stack to bypass the failure it was triggering).

We redesigned this mechanism to use a process we call “filtering”. This allows us to be much more granular and to use more context from CFRipper when deciding whether to apply an exemption. When adding a failure or warning, CFRipper checks whether there is a filter that matches the current context and, if so, sets the new risk or mode of the rule. The available context depends on each rule and is described in each rule’s documentation.

A filter can be applied to multiple rules, and they follow a strict precedence structure, allowing filters to override each other if desired. Figure 7 shows how we can enable a new S3 rule in blocking mode for stack templates which either:

  • are being deployed to an AWS account called ‘gdpr-account’,
  • contain the string ‘gdpr’ in the name of the stack,
  • have the ‘service_type’ tag equal to ‘critical’ or ‘public’.
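A filter matching any of the three criteria above could be modelled roughly as follows. This is a self-contained sketch, not CFRipper's filter API: the class names, the rule name, the mode strings, and the "later filters win" precedence are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class StackContext:
    account_name: str
    stack_name: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class RuleFilter:
    reason: str
    rules: Set[str]
    mode: str
    matches: Callable[[StackContext], bool]


def resolve_mode(rule, default_mode, context, filters):
    """Return the mode for `rule`; assumes later matching filters override earlier ones."""
    mode = default_mode
    for rule_filter in filters:
        if rule in rule_filter.rules and rule_filter.matches(context):
            mode = rule_filter.mode
    return mode


gdpr_filter = RuleFilter(
    reason="Enable the new S3 rule in blocking mode for GDPR-related or exposed stacks",
    rules={"HypotheticalS3Rule"},
    mode="BLOCKING",
    matches=lambda ctx: (
        ctx.account_name == "gdpr-account"
        or "gdpr" in ctx.stack_name
        or ctx.tags.get("service_type") in {"critical", "public"}
    ),
)

context = StackContext("dev-account", "my-gdpr-exporter", {"service_type": "internal"})
print(resolve_mode("HypotheticalS3Rule", "MONITOR", context, [gdpr_filter]))
```

Keeping the match logic as data attached to the filter, rather than inside the rule, is what lets one filter be reused across several rules.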

At Skyscanner, this change has enabled us to grant more granular exemptions to stacks more quickly. Filters can also be dynamic: in Figure 8 we set a time limit for a fix to be applied. They are helpful for our own team too, since we can configure a new rule to block stacks only when they are being created, as opposed to updated.
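The two dynamic behaviours mentioned here, a time-limited exemption and blocking only on stack creation, can be sketched as a small decision function. The function name, parameters, and mode strings are hypothetical and only illustrate the logic, not how CFRipper expresses it.

```python
from datetime import datetime, timezone


def effective_mode(default_mode, *, now, exempt_until=None,
                   is_update=False, block_only_on_create=False):
    """Downgrade a rule to monitoring while an exemption is active or on stack updates."""
    if exempt_until is not None and now < exempt_until:
        # The squad still has time to apply the fix.
        return "MONITOR"
    if block_only_on_create and is_update:
        # Existing stacks are not blocked while a new rule is rolled out.
        return "MONITOR"
    return default_mode


deadline = datetime(2021, 6, 1, tzinfo=timezone.utc)
before = datetime(2021, 5, 1, tzinfo=timezone.utc)
after = datetime(2021, 7, 1, tzinfo=timezone.utc)

print(effective_mode("BLOCKING", now=before, exempt_until=deadline))
print(effective_mode("BLOCKING", now=after, exempt_until=deadline))
```

Passing the current time in as an argument, instead of reading the clock inside the function, keeps the exemption logic easy to test.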

Conclusion

With 300+ stars on GitHub, 30+ built-in rules, and continuous healthy maintenance from the community, CFRipper is ready for anyone to test in their environment. We also love external contributions — if you have any rules in mind, feel free to read our contribution guide and submit a pull request or issue on GitHub.

Join Skyscanner

From flights to hotels and car hire, Skyscanner works side-by-side with the biggest names in travel to bring over 100 million users all the options they need to plan and book their perfect trip.

We’re already a market leader, and we’re just getting started. Next stop: Leading the global transformation to modern and sustainable travel.

Join us for the adventure of a lifetime. Together we can change how the world travels. https://www.skyscanner.net/jobs/current-jobs/

About the authors

Oliver Crawford

Oliver is a Senior Software Engineer with four years of experience at Skyscanner in Platform Security and Automation, focusing on tooling in the SDLC and building the best developer experiences possible!

Oscar Blanco Castan

Oscar is a Software Engineer 2 in Skyscanner’s Security Tribe. He’s passionate about security and currently works in the Platform Security and Automation team, developing services to automate the security processes in Skyscanner.
