Everything’s a Supply Chain — Securing the Delivery of Infrastructure in the Cloud
There has been a lot of discussion about “supply chain attacks” recently, especially after the SolarWinds incident thrust the topic to the forefront. When “supply chains” are discussed, most analysis focuses on the software supply chain: build systems, dependencies, libraries, and other components of the software package that can lead to unintended code execution.
In fact, this is believed to have been part of what happened at SolarWinds: an unexpected piece of code was added to the software early enough in the build process that the final binary was still signed by SolarWinds itself.
But in cloud-based software delivery models, the supply chain encompasses not only the delivery of software, but delivery of the surrounding infrastructure components as well. Consider a modern cloud-based SaaS application. It may have tens, or even hundreds of moving pieces that are each responsible for delivering the complete infrastructure solution: the software build components, shared or imported instance images, infrastructure as code templates, storage buckets, and scores of other proprietary cloud services that combine to deliver the application and its underlying infrastructure to end users.
Much has been written about popular software supply chain attacks, so I will not cover them here. Instead, I’d like to focus on some less frequently discussed, but still very important, pieces of the supply chain at the cloud infrastructure level.
Infrastructure as Code
Before we even begin to consider how our application code will be deployed, the infrastructure itself must be configured. Some organizations do this using the “point and click” model inside the cloud provider dashboard, but the oft-recommended process is to use infrastructure as code (IaC) — CloudFormation, Terraform, ARM templates, Pulumi, or other tools that express the configuration of underlying compute, storage, and other infrastructure resources in a consistent manner via code.
Consider how critical this step is to the security of your application environment and how many potential places there are for an attacker to make modifications to how this infrastructure is deployed.
In some cases, IaC templates are developed from scratch. More likely, a “quick start” or example template is copied from StackOverflow, GitHub, cloud provider documentation, or somewhere else and then modified by importing pieces from various sources until a complete template is developed. This presents numerous areas of concern around how these templates are vetted and reviewed prior to being deployed.
An attacker could simply publish a popular IaC template for a common use case (e.g. deploying WordPress on AWS EC2), add an inconspicuous change, and wait for someone to use it. While some changes, such as embedding new resources, or trusting third-party accounts to use IAM roles, may be more easily detected, others, such as including a small step in the user data section of an EC2 instance that downloads malware, may not be.
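To make the user data risk concrete, here is a minimal sketch of an automated check that could run before a copied template is deployed. It assumes the CloudFormation template has already been parsed into a Python dictionary. The function names and the two patterns are my own hypothetical choices for illustration, not an exhaustive detection list.

```python
import re

# Hypothetical patterns for illustration only; a real review needs a broader list.
SUSPICIOUS_PATTERNS = [
    # A download piped straight into a shell, e.g. "curl ... | sh"
    re.compile(r"(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba)?sh\b"),
    # Decoding an embedded base64 payload
    re.compile(r"base64\s+(-d|--decode)"),
]


def flatten_strings(value):
    """Yield every string nested anywhere inside a parsed template value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from flatten_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from flatten_strings(item)


def find_suspicious_user_data(template):
    """Return (logical_id, matched_text) pairs for EC2 instances with risky UserData."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::Instance":
            continue
        user_data = resource.get("Properties", {}).get("UserData", "")
        # UserData is usually wrapped in intrinsics such as Fn::Base64 or Fn::Join,
        # so collect every string fragment before matching.
        text = "\n".join(flatten_strings(user_data))
        for pattern in SUSPICIOUS_PATTERNS:
            match = pattern.search(text)
            if match:
                findings.append((logical_id, match.group(0)))
    return findings
```

A check like this only catches the obvious cases. It does not look at launch templates or commands split across several joined fragments, so it supplements a human review rather than replacing one.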
These templates then need to be saved, uploaded, and (ideally) version controlled. This introduces a whole new category of risks around the security of that solution, who has access to it, how modifications are monitored, and the lifecycle of templates as they move from the developers’ laptops to the shared repository to a build system and eventually to the cloud provider.
Once the template reaches the cloud provider, yet another storage location is introduced. In AWS, S3 buckets are commonly used to store CloudFormation templates because S3 links are the only ones that can be used directly via CloudFormation (aside from uploading the JSON or YAML files directly).
Properly securing S3 is beyond the scope of this post, but suffice it to say, there are a considerable number of things that should be done to secure the bucket including: enabling access logging and versioning, ensuring the bucket isn’t public, and, depending on your risk tolerance, enabling CloudTrail data events for the bucket to track changes to the files within it. Ultimately, the security of your entire infrastructure is only as strong as this bucket’s security since anyone who is able to modify the templates can inject unexpected changes into the final deployment.
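One way to notice tampering between review and deployment is to record a digest of the template when it is approved and compare it again just before it is handed to CloudFormation. The sketch below assumes you have somewhere trustworthy to keep the expected digest (for example, alongside the reviewed commit). The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the template bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def template_is_unmodified(template_bytes: bytes, expected_digest: str) -> bool:
    """Compare the current template against the digest recorded at review time."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sha256_hex(template_bytes), expected_digest.lower())
```

In a pipeline, the deploy step would download the object from the bucket, call `template_is_unmodified` with the recorded digest, and refuse to continue on a mismatch. The check is only as trustworthy as the place the expected digest is stored.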
For those keeping score, that’s at least three potential areas of weakness, and we haven’t even deployed the infrastructure yet! Below are some questions to consider when evaluating the security of your IaC process.
- How are infrastructure as code templates reviewed for malicious (or accidental) misconfigurations before being deployed to production?
- How are IaC snippets managed at your organization? Are they copy/pasted from unvetted sources, or is there a secure repository approved for internal use?
- If IaC templates are stored in a repository, how is that repository monitored for unintended access? Which users have access to push changes?
- If a change was made to an IaC repository, would any alerts be sent, especially if that change was made by a new user or one outside the organization?
- How are the IaC templates copied to the cloud provider during the deployment phase?
- Is TLS used to upload all templates during the provisioning phase? If the templates are stored in S3, is a bucket policy configured to enforce TLS uploads?
- What monitoring is in place to detect unauthorized changes to cloud provider storage buckets used to store templates?
- If an attacker modified a CloudFormation template after it was uploaded to S3 but before it was imported by CloudFormation, how would you know?
- Are processes in place to detect changes to CloudFormation parameters or config after a stack has been deployed?
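For the TLS question above, S3 bucket policies support the `aws:SecureTransport` condition key, which can be used to deny any request made over plain HTTP. Below is a small sketch that builds such a policy document as a Python dictionary. The function name and the `Sid` value are my own, and the bucket name is whatever you pass in.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies all S3 requests not made over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",  # illustrative label
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def policy_as_json(bucket_name: str) -> str:
    """Serialize the policy so it can be attached to the bucket."""
    return json.dumps(deny_insecure_transport_policy(bucket_name), indent=2)
```

Because this is an explicit deny, it applies regardless of what other statements in the policy allow.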
I once spoke to a company that went to great lengths to describe the security of their cloud deployment environments, explaining how development, staging, and production were each provisioned in isolated “clean room” AWS accounts with their own access controls, monitoring, and security tooling. Everything was deployed using infrastructure as code templates and only certain approved developers could access production.
As I started to dig into how this infrastructure was deployed, the line of questioning went a bit like this:
Me: “This sounds great, but what deploys those infrastructure as code templates?”
Them: “It’s all hooked up to Jenkins! Everything is automated.”
Me: “Hm. And how does Jenkins do the deployment across environments exactly?”
Them: “Well, it makes calls to the CloudFormation APIs in each account.”
Me: “Yes, but what separates this development, staging, and production access in Jenkins?”
Them: “Each pipeline step uses its own access key, one for each environment. And those are stored as Jenkins secrets.”
Me: “And where does this Jenkins machine live?”
Them: “…in the development account.”
There are many versions of this implementation, but ultimately if you have one build machine deploying to multiple environments, the security of those environments is only as strong as the security of the build machine’s environment. In other words, all the great efforts this company took to isolate their production AWS account were largely voided by sharing access across accounts from the development environment. And unfortunately, due to their responsibilities, build systems are some of the most privileged components of a cloud environment.
If you are using a managed build system, such as AWS CodeBuild, there are additional security challenges to keep in mind. For example, CodeBuild allows you to define a container image in which the build will be run. This image, like any build system environment, must also be securely built, stored, and delivered to CodeBuild (see “Container Repositories” below). Talk about inception!
Some questions to consider when evaluating the supply chain impact of your build systems:
- Where are build and deployment machines hosted (e.g. cloud vs. on-premises) and, if in the cloud, in which cloud accounts?
- How is access to these build machines controlled? Is there an approval process before deployments can be triggered?
- Do build machines span environments? Is the same machine deploying to development, staging, and production? If so, how can the blast radius be limited?
- How are secrets managed within the build system? Can a developer submit a build that runs “echo $AWS_ACCESS_KEY_ID” in the build script?
- Many build systems are given admin access. Can this be scoped further to just approved services?
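As a starting point for the last question, the IAM policies attached to a build role can be checked for statements that effectively grant admin access. The sketch below assumes the policy document has been loaded as a Python dictionary in the standard IAM JSON shape. The function names are hypothetical, and the check deliberately looks only for the most blatant case of a wildcard action on a wildcard resource.

```python
def _as_list(value):
    """IAM allows a single value or a list in most fields; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_admin_statements(policy: dict) -> list:
    """Return the Allow statements that grant every action on every resource."""
    findings = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if any(action in ("*", "*:*") for action in actions) and "*" in resources:
            findings.append(statement)
    return findings
```

A scoped policy that lists specific services, such as `cloudformation:*` on named stacks, would pass this check, although it may still be broader than the pipeline needs.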
If a company ever says “we’re serverless so we don’t need to think about the security of that part of the application,” run far away and buy puts on their stock.
To be fair, serverless technologies like AWS Lambda do abstract away many common security concerns: operating system security, patching, logging configuration, isolated runtimes, and other operational processes. However, from a supply chain perspective, there are still many risks of malicious exposure.
Although this article is focused more on the infrastructure aspects of the supply chain, I would be remiss if I didn’t mention that serverless functions, like any other application, still need to be audited for software vulnerabilities and injections. AWS Lambda doesn’t protect you if someone compromises an upstream NPM module and your build system diligently packages it into the final function.
In AWS Lambda, a new feature called “Layers” introduces yet another risk for supply chain attacks. With Layers, a function can import and access code from other locations, including other AWS accounts, that is not packaged into the function itself at build time. In other words, the code seen by your build system (where security automation is typically run) may not be the final package available at runtime. When using layers from your own account, it’s important to ensure the build process for the layers is following the same security auditing process as the core codebase. If your team audits every third-party dependency, but you trust a layer from another team that doesn’t have the same security posture, you are at risk.
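Since a layer version ARN includes the ID of the account that owns it, one simple audit is to list the layers attached to each function and flag any that come from accounts you have not explicitly approved. A minimal sketch, assuming you have already collected the ARNs as strings; the function name and the idea of a trusted account list are my own:

```python
def layers_from_untrusted_accounts(layer_arns, trusted_account_ids):
    """Return the layer ARNs whose owning account is not in the trusted set."""
    trusted = set(trusted_account_ids)
    flagged = []
    for arn in layer_arns:
        # Expected shape: arn:partition:lambda:region:account-id:layer:name:version
        parts = arn.split(":")
        if len(parts) < 7 or parts[2] != "lambda" or parts[5] != "layer":
            raise ValueError(f"not a Lambda layer ARN: {arn}")
        if parts[4] not in trusted:
            flagged.append(arn)
    return flagged
```

This only tells you where a layer comes from. It says nothing about how the layer was built, so layers from your own accounts still need the same auditing as the core codebase.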
If your function package moves from a build system (e.g., Jenkins) to a file store (e.g., S3) before being passed to the final runtime location (e.g., Lambda), it is crucial to ensure that it is not modified at any point. This deployment process raises many of the same concerns discussed above in the “Infrastructure as Code” section — how the bucket is secured and how access is monitored, for example.
Fortunately, AWS has recently added a new feature for Lambda called “Code Signing” that ensures only trusted, unmodified code is run from Lambda. Enabling this feature is part of a comprehensive strategy to prevent tampering with the source code of serverless packages.
It’s important to remember that serverless technologies are still software. The supply chain attacks may utilize different attack models, but the end results are still the same. Consider the questions below when evaluating your serverless environments:
- How are your serverless functions built? Does a pipeline run “npm install” to pull dependencies from a public location? Do you trust these locations?
- Do your functions load any external libraries or code at runtime?
- What happens to your code between the time it is packaged and the time it is uploaded to a serverless environment? For example: does your build system package Lambda functions as a ZIP file and save them in an S3 bucket before calling “aws lambda update-function-code --s3-bucket=…”? Who has access to that bucket?
- Have you enabled code signing to ensure the integrity of the packaged codebase and the final package used by Lambda?
Shared Instance Images
The security of public images, such as AWS AMIs, has been a source of growing concern, prompting quite a bit of research into the types of malware and malicious files that permeate the public image marketplace. Recently, it was discovered that a number of publicly-available AMIs contained cryptomining malware. The fact that AWS does not actively moderate community AMIs makes them a risky bet for anyone deploying instances based on them.
What makes these AMIs an even greater risk is the process of chaining and creating new AMIs. For example, if a developer launches an instance based on a malicious community AMI, customizes it a bit, and then saves a new snapshot based on the modified instance, that snapshot can easily be used in the future with little warning that its parent image contained malicious code.
I won’t discuss this issue in as much detail because it is fairly well-known. That being said, consider the following when evaluating the security of shared images in your accounts:
- How are AMIs and base images created and shared across your accounts? What security controls are in place to avoid using insecure community images?
- Are malware and similar security scans run on all host images prior to their use in live environments?
- Are controls in place to prevent “chaining” of malicious AMIs?
- If a “build factory” process is used to create trusted AMIs, how is this process secured? Who has access to the build environment or the ability to add dependencies at this stage?
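A basic control for the first question is an allowlist of image owners. The sketch below assumes a list of image records shaped like the entries returned by the EC2 DescribeImages API, using only the `ImageId` and `OwnerId` fields. The function name and the allowlist itself are hypothetical.

```python
def images_from_untrusted_owners(images, trusted_owner_ids):
    """Return the IDs of images whose owning account is not on the allowlist."""
    trusted = set(trusted_owner_ids)
    return [
        image["ImageId"]
        for image in images
        # A record with no OwnerId is treated as untrusted.
        if image.get("OwnerId") not in trusted
    ]
```

Note that an owner check alone does not catch chaining: an AMI saved in your own account from a malicious community image will carry your own account ID as its owner. Tracking the parent image of every internally built AMI still has to happen in the build process itself.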
Container Repositories
Container image dependencies are really just another take on the AMI problem described above. Cloud native container services like AWS ECR, Azure ACR, and GCP GCR introduce additional complexity into the supply chain, but ultimately the same principles apply.
From a security perspective, the goal should be to ensure that all software deployed into live container environments has passed through a security audit for malicious code. Implementing that requirement from an infrastructure level can be challenging in practice due to the sheer number of moving pieces.
Starting with the repository itself, the ability to push, overwrite, or modify images should be restricted to highly-trusted systems. It doesn’t matter how secure your build pipeline is, or how many security scans you run along the way, if a compromised user account can simply run “docker push” and overwrite an image used in production.
Some questions to consider:
- Are permissions and policies configured to prevent untrusted users or systems from pushing container images into registries? What process is in place to ensure only trusted systems can push?
- Is security auditing enabled both in the build pipeline and in the registry to ensure that malicious or untrusted code is not deployed? Are services like AWS ECR’s security scanning enabled?
- What prevents an image from being overwritten? Is tag immutability enabled where possible?
- Do all task definitions across self-managed (e.g., Kubernetes clusters) and managed services (e.g., Fargate, ECS) specify specific tags and avoid the use of the “latest” tag?
- Which users or systems can update these task definitions? Are processes in place to prevent unintended modification outside of the build pipeline?
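The question about the “latest” tag is easy to automate. Here is a minimal sketch that classifies an image reference string taken from a task definition or pod spec. The function name and the three labels it returns are my own.

```python
def classify_image_reference(image: str) -> str:
    """Classify an image reference as 'digest', 'tag', or 'mutable'."""
    if "@sha256:" in image:
        # Pinned to an exact image digest.
        return "digest"
    # Only look at the final path segment so that a registry port
    # (e.g. localhost:5000/app) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        # No tag at all means "latest" is implied.
        return "mutable"
    tag = last_segment.rsplit(":", 1)[1]
    return "mutable" if tag == "latest" else "tag"
```

A reference classified as `tag` is only as stable as the registry makes it, which is why tag immutability matters. A digest reference always points at the same image contents.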
Strictly speaking, the “supply chain” refers to everything that sits between developers writing code and end users consuming it. In on-premises software delivery models, this chain tended to end once the software was delivered; the code was shipped, the user installed it, and the chain was complete. But in SaaS and cloud-based delivery models, the supply chain is never-ending; the end user is constantly consuming software that is updated on a continuous basis.
Because of this, the cloud “supply chain” is more of a “supply cycle,” and the security model must adapt to reflect this. Consider a SaaS application that delivers static assets via an S3 bucket and CloudFront distribution to an end user accessing a site via a web browser. Now consider all of the potential points of compromise along that route:
- The assets — how they are built, compiled, and uploaded.
- The S3 bucket — how it is deployed (see “Infrastructure as Code” above), updated, and configured.
- The CloudFront distribution — how it is deployed, updated, and configured, how new origins are added, whether Lambda@Edge is used (and if so, how those functions are deployed, updated, and configured), and so on.
- The Route53 records — how they are created, updated, and configured.
This is a simple example, but it highlights how the supply cycle grows with each new cloud service that is introduced. The cycle is no longer confined to the development, build, and delivery processes; it is an ongoing cycle of deployments, configuration, and updates across multiple supporting services.
Hopefully this post makes it clear how interconnected cloud environments are, and how the security of each component is crucial to the security of the overall delivery model. As cloud environments continue to grow in complexity, I suspect that we will see even more sophisticated supply chain attacks that chip away at the weakest links, resulting in the smallest vulnerabilities escalating to complete compromises.