Continuous Compliance in the Cloud — Part 1

Published in Airwalk Reply · 7 min read · Sep 14, 2021

This post is the first in a series on the importance of cloud security posture management and the need to automate and codify controls in cloud security.

“Through 2025, more than 99% of cloud breaches will have a root cause of preventable misconfigurations or mistakes by end users” — Gartner

We review the security problem areas that fall under the customer’s responsibility in a cloud setting, and we’ll show you what good looks like.

Cloud Security

Let’s start with cloud security, and a concept that I think has been the cornerstone of reasoning about security in and security of the cloud — the shared responsibility model. It doesn’t always correctly reflect the division of responsibility, but it splits the responsibilities of the customer and the Cloud Service Provider, in this case AWS, for any given system.

AWS Shared Responsibility Model — Platform, Application and IAM is our focus for this post.

It’s a useful starting point to frame the dialog on who is responsible for what. The bottom half of this diagram, showing items like hardware, is where the AWS responsibilities lie. The top part is the responsibility of the customer. In this post, we’ll focus on the platform, application and IAM configuration and management.

Every service that you can modify in any way is part of the customer’s responsibility: for example, the resulting configuration of your Security Groups, the access control on your storage buckets, and the permissions on your federated roles.

Although these are otherwise perceived as managed services, their security and configuration are the responsibility of the customer. We know from experience that this leads to problems.
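To make the customer side of this concrete, here is a minimal sketch of a check for Security Groups that expose sensitive ports to any address. It is not part of the original talk. It assumes the input has the same shape as the `SecurityGroups` list returned by the EC2 `DescribeSecurityGroups` API, and the choice of "sensitive" ports (SSH and RDP) and the sample group IDs are ours, for illustration only.

```python
from __future__ import annotations

from typing import Any

# Assumptions for illustration: which CIDRs count as "the world" and which
# ports are considered sensitive (SSH and RDP here).
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}
SENSITIVE_PORTS = (22, 3389)


def find_open_ingress(security_groups: list[dict[str, Any]]) -> list[tuple[str, int]]:
    """Return (group_id, port) pairs where a sensitive port is open to any address."""
    findings: list[tuple[str, int]] = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            cidrs = {r.get("CidrIp") for r in rule.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])}
            if not cidrs & OPEN_CIDRS:
                continue  # this rule is not open to the world
            protocol = rule.get("IpProtocol")
            if protocol == "-1":
                exposed = list(SENSITIVE_PORTS)  # "-1" means every protocol and port
            elif protocol in ("tcp", "udp"):
                low, high = rule.get("FromPort", 0), rule.get("ToPort", 65535)
                exposed = [p for p in SENSITIVE_PORTS if low <= p <= high]
            else:
                continue  # e.g. ICMP, where the port fields mean something else
            findings.extend((group["GroupId"], port) for port in exposed)
    return findings


# Made-up sample data in the DescribeSecurityGroups response shape.
sample_groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-admin", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
]
open_findings = find_open_ingress(sample_groups)
```

With the sample data, only the SSH rule on `sg-admin` is reported; the public HTTPS rule is left alone because port 443 is not in our list of sensitive ports.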

Problem Areas

We can break these problems into three core areas: Complexity, Point in Time, and Stagnation. Let’s start with complexity. We’re always thinking about the configuration of VMs and storage buckets: are they open to the public by default, and what do the security groups look like?

But Cloud Service Providers have hundreds of services. Do we know if these services are compliant and secure by default? Often there is a trade-off between the convenience of building something quickly and the effort of making sure that it has the most secure configuration. Equally important is being able to configure and build on top of these services consistently across our workloads.

Our next problem area is Point in Time. When we build a service, we treat its existing configuration as compliant, without taking into account any future changes, either from our side or from the Cloud Service Provider’s. But we know that point-in-time visibility and compliance are not good enough. We know that exploitation of certain misconfigurations is largely automated, for example with tools such as bucket-stream.

And finally, stagnation. Your product delivery will need to innovate, try new things, and have quick feedback cycles about what is compliant and secure and what isn’t. Security shouldn’t get in the way of your deployment frequency or lead time for changes. If you are already using DORA metrics, you can measure this in terms of both the stability and velocity of your delivery. In summary, our problem areas are:

1. Lack of consistency of configuration of services.

2. Point-in-time checks and a lack of continuous visibility of risks and compliance.

3. Stagnation of business enablement and product delivery.

Complexity

First core problem area: complexity.

Figure 6 from the SANS 2021 Cloud Security Survey

Almost 50% of successful attacks are (at least in part) the result of a misconfiguration of cloud services and/or resources. Arguably this is due to the inherent complexity of certain cloud services. We know that misconfigurations have been the main reason behind some well-publicised incidents across the industry. Building secure services consistently across hundreds of accounts and workloads is still proving to be a challenge. You might say that a large number of cloud-native services come with secure-by-default configurations, and that might be true to an extent. But can you make sure that a secure configuration is going to be in place throughout the life of the service, and not just when it was first built?
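One way to reason about "throughout the life of the service" is to compare later configuration snapshots against the baseline that was approved at build time. The sketch below is a simplified illustration, not a description of any particular product: the function name `first_drift`, the snapshot format and the sample timestamps are all assumptions. The setting names are borrowed from the S3 Block Public Access configuration.

```python
from __future__ import annotations


def first_drift(baseline: dict, snapshots: list[tuple[str, dict]]):
    """Return (timestamp, changes) for the earliest snapshot that deviates
    from the baseline, or None if no snapshot does.

    `changes` maps each drifted setting to an (expected, actual) pair.
    Timestamps are assumed to be ISO 8601 strings, which sort chronologically.
    """
    for taken_at, config in sorted(snapshots, key=lambda snap: snap[0]):
        changes = {
            key: (expected, config.get(key))
            for key, expected in baseline.items()
            if config.get(key) != expected
        }
        if changes:
            return taken_at, changes
    return None


# Made-up example: a bucket that was compliant when built and drifted later.
baseline = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}
snapshots = [
    ("2021-03-01T09:00:00Z", dict(baseline)),
    ("2021-06-12T14:30:00Z", {**baseline, "BlockPublicPolicy": False}),
    ("2021-06-13T08:00:00Z", {**baseline, "BlockPublicPolicy": False, "BlockPublicAcls": False}),
]
drift = first_drift(baseline, snapshots)
```

Here the build-time snapshot passes, and the first deviation is reported for the second snapshot, together with the setting that changed.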

Point In Time

Second problem area — Point in Time

This figure is from the SANS 2020 Cloud Incident Response Survey

Compromise to detection still tends to be weighted toward days and months. Detection to containment is on average significantly faster: 50% of the efforts take place in less than 5 hours.

However, exploitation of public-facing, misconfigured cloud service provider services (with known vulnerabilities) is largely automated nowadays. A new S3 bucket will be scanned for misconfigurations, and attackers will attempt to exploit it, almost immediately after its creation. The same concept applies to networking, IAM, and cloud-service-specific controls. As an example, performing your own scheduled scan of public buckets every hour, or checking every hour whether the security groups and NACLs are still in place, means that you might be up to an hour late in detecting the misconfiguration or breach.

The same applies to many cases of lateral movement from within the environment as well.
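The hourly scan example above can be put into numbers. The sketch below assumes, for simplicity, that scans run at fixed times (minute 0, 60, 120 and so on) and that each scan completes instantly; the function name is ours.

```python
import math


def detection_delay_minutes(scan_interval_min: int, misconfigured_at_min: int) -> int:
    """Minutes between a misconfiguration and the next scheduled scan.

    Assumes scans run at t = 0, interval, 2 * interval, ... and take no time.
    """
    next_scan = math.ceil(misconfigured_at_min / scan_interval_min) * scan_interval_min
    return next_scan - misconfigured_at_min


# A bucket made public one minute after an hourly scan waits 59 minutes.
worst_case = detection_delay_minutes(60, 1)

# Averaged over every minute of the hour, the exposure window is about half the interval.
average = sum(detection_delay_minutes(60, t) for t in range(1, 61)) / 60
```

Under these assumptions the worst case is 59 minutes and the average is roughly half an hour, which is a long time when the scanning on the attacker's side is automated.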

Stagnation

And finally, the third problem area — stagnation of your product delivery. Let’s assume that this is what your development lifecycle looks like. You have a security review, with a signoff — a CAB meeting or a review stage by a security team — that will allow you to then deploy to production.

You might not think much of it, but if the review doesn’t go well, you have to go two steps back to development and hope that this time your service is going to be good enough to pass the review. We know that quick, iterative processes provide the best value, and this isn’t one of them.

This does not help your innovation or your product delivery.

Cloud Security Posture Management

Cloud Security Posture Management, or CSPM, is a continuous process of cloud security improvement and adaptation to reduce the likelihood of a successful attack.

Imagine now that we have an automated process that can detect, notify and even act on misconfigurations and events. Cloud service providers are built on their APIs, so we can query configuration and receive events based on changes: for example, a storage bucket configuration has allowed objects to become public, or a user has added an overly permissive policy. We can either scan the configuration of each resource or, even better, get the API call that caused the event. Because we have the logs from the APIs, we can get a near real-time notification of the event and make informed decisions about whether to notify, raise a ticket or automatically remediate. In other words, we can create guardrails and configure them accordingly.
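As a rough illustration of such a guardrail, the sketch below takes an event and decides whether to ignore, notify or remediate. The event shape is modelled on CloudTrail records delivered through an event bus (`detail.eventName`, `detail.requestParameters`); the exact field names should be verified against real events in your own account. The `Decision` type, the function name and the mapping from event to action are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str  # one of "ignore", "notify", "remediate"
    reason: str


# Canned ACLs that make a bucket readable (or writable) by anyone.
PUBLIC_ACLS = {"public-read", "public-read-write"}


def evaluate_event(event: dict) -> Decision:
    """Map a single API-call event to a guardrail decision (illustrative policy)."""
    detail = event.get("detail", {})
    name = detail.get("eventName")
    params = detail.get("requestParameters") or {}

    # A bucket was given a public canned ACL: remediate straight away.
    if name == "PutBucketAcl" and params.get("x-amz-acl") in PUBLIC_ACLS:
        return Decision("remediate", f"bucket {params.get('bucketName')} was given a public ACL")

    # A user was given the AWS-managed administrator policy: let a human decide.
    if name == "AttachUserPolicy" and str(params.get("policyArn", "")).endswith("/AdministratorAccess"):
        return Decision("notify", f"user {params.get('userName')} was given administrator access")

    return Decision("ignore", "no guardrail matched")


# Made-up sample event.
sample_event = {
    "source": "aws.s3",
    "detail": {
        "eventName": "PutBucketAcl",
        "requestParameters": {"bucketName": "example-bucket", "x-amz-acl": "public-read"},
    },
}
decision = evaluate_event(sample_event)
```

Keeping the decision separate from the side effect (sending the notification, raising the ticket, reverting the ACL) makes each guardrail easy to test and to tune per environment.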

As a side note, I’m using this type of example here for two reasons. One, it makes it easier to communicate to all levels of cloud practitioners, and two, these are the textbook examples that have been associated with real-life compromises in the past. This doesn’t imply that CSPM is limited to these textbook capabilities. If anything, the limitations of this approach are closer to what we can detect on a cloud-native basis.

Cloud Security Posture Management to the rescue

This is how a CSPM would address these problems:

CSPM turns the complexity of hundreds of services and configurations into a consistent set of either a good configuration or a known bad detectable state.

And it does so not at a point in time, but as it happens or as often as we can process. Exploitation is automated, and so should remediation be, where possible and appropriate.

Tighter feedback loops allow for a far more agile and “no big surprises” development and innovation capability. Be upfront about what can and cannot be done with immediate actionable feedback.
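The idea of reducing many services to "a good configuration or a known bad detectable state" can be sketched as a small rule registry. This is a simplified illustration under our own assumptions: the resource types, the normalised config keys (`PublicAccessBlock`, `Encrypted`, `Statement`) and the rule names are hypothetical, not a real CSPM schema.

```python
from __future__ import annotations

from typing import Callable

PUBLIC_ACCESS_FLAGS = (
    "BlockPublicAcls", "IgnorePublicAcls", "BlockPublicPolicy", "RestrictPublicBuckets",
)


def public_access_blocked(config: dict) -> bool:
    """True only if all four public access block settings are switched on."""
    block = config.get("PublicAccessBlock", {})
    return all(block.get(flag) is True for flag in PUBLIC_ACCESS_FLAGS)


def encrypted_at_rest(config: dict) -> bool:
    return config.get("Encrypted") is True


def no_wildcard_admin(config: dict) -> bool:
    """False if any statement allows every action on every resource."""
    for statement in config.get("Statement", []):
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if statement.get("Effect") == "Allow" and "*" in actions and "*" in resources:
            return False
    return True


# One consistent rule set per resource type, applied to every workload.
RULES: dict[str, list[Callable[[dict], bool]]] = {
    "storage_bucket": [public_access_blocked, encrypted_at_rest],
    "iam_policy": [no_wildcard_admin],
}


def evaluate(resource_type: str, config: dict) -> tuple[str, list[str]]:
    """Return (state, failed_rule_names) for one resource."""
    rules = RULES.get(resource_type)
    if rules is None:
        return "NOT_EVALUATED", []
    failed = [rule.__name__ for rule in rules if not rule(config)]
    return ("NON_COMPLIANT" if failed else "COMPLIANT"), failed


good_bucket = {"PublicAccessBlock": dict.fromkeys(PUBLIC_ACCESS_FLAGS, True), "Encrypted": True}
admin_policy = {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
bucket_state = evaluate("storage_bucket", good_bucket)
policy_state = evaluate("iam_policy", admin_policy)
```

Returning the names of the failed rules, rather than a bare pass or fail, is what makes the feedback actionable for the team that owns the resource.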

Here’s a side-by-side comparison of what our delivery would look like without and with CSPM.

Instead of having a single point to close our feedback loop, we have multiple points throughout our delivery process. We prevent known bad configs, we detect risky events, we make risk-based exemptions on known cases, and we outright respond and correct. At each one of these points, we can close the feedback loop, and provide actionable feedback to the development and security teams.
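The risk-based exemptions mentioned above can be modelled explicitly, so that a finding is either actionable or covered by a recorded, time-boxed decision. The sketch below is one possible shape under our own assumptions: the `Exemption` type, the `triage` function and the sample identifiers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Exemption:
    resource_id: str
    rule: str
    expires: date  # exemptions are time-boxed so that the risk is re-assessed


def triage(
    findings: list[tuple[str, str]],
    exemptions: list[Exemption],
    today: date,
) -> dict[str, list[tuple[str, str]]]:
    """Split (resource_id, rule) findings into exempt and actionable ones."""
    active = {(e.resource_id, e.rule) for e in exemptions if e.expires >= today}
    result: dict[str, list[tuple[str, str]]] = {"exempt": [], "actionable": []}
    for finding in findings:
        result["exempt" if finding in active else "actionable"].append(finding)
    return result


# Made-up example: a public website bucket is exempt, but only until the exemption expires.
findings = [
    ("bucket-public-website", "public_access_blocked"),
    ("bucket-customer-data", "public_access_blocked"),
]
exemptions = [Exemption("bucket-public-website", "public_access_blocked", date(2021, 12, 31))]
before_expiry = triage(findings, exemptions, today=date(2021, 9, 14))
after_expiry = triage(findings, exemptions, today=date(2022, 1, 1))
```

The same triage step can run at each feedback point (in the pipeline before deployment, and again on live events), so developers see the same answer wherever the check happens.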

To summarise, a CSPM aims to provide consistency and continuous automation and, overall, to aid your product delivery.

At AirWalk we’ve used these principles to build our continuous compliance framework pillars. Our approach and framework are at the core of our Financial Services AWS solution.

The Continuous Compliance Framework (CCF) is a comprehensive, automated approach to codifying policies and automatically remediating violations in the cloud. For those familiar with the terminology, Continuous Compliance is generally positioned in the CSPM area.

In the next post in our continuous compliance series, we’re going to take a deeper dive into what CCF is, and how it can help your organisation’s cloud security posture.

Note: This post was derived from a Reply xchange conference talk that Jim Lamb and I gave in spring 2021.