Secure Software Supply Chain (S3C) in a Serverless World

Published in Google Cloud - Community, Nov 8, 2021

Binary Authorization with Cloud Run

tl;dr (before jumping to code)

Building a secure software supply chain is a relatively new topic. New compliance and regulatory requirements will challenge organizations to think differently about software security. Recent attacks have shown time and again that the traditional methods (keeping code in private repositories, running regular repository scans and runtime scans) will not be sufficient in the future.

This post defines what a Secure Software Supply Chain is, explains why it is important and what the key challenges in building one are, and demonstrates a no-ops method to build an S3C using Cloud Run, Cloud Build and Binary Authorization in Google Cloud.

Source Code Repository

Let’s Start…

Security is a major pillar of any Cloud Adoption Framework and Operating Model, whichever cloud provider you pick. It is definitely an important pillar at Google Cloud. Adopting cloud at scale with all security measures and compliance controls in place can be a daunting task for an organization. One of the primary reasons is that modern threats hit many layers of the “stack”. By stack, I mean all the components involved in serving your users and/or applications.

Modern software architecture and software delivery are also getting very complex, with several tools added to the develop, build and release chain. Security tools and policies are often introduced as external entities in this chain.

Security is a very deep and wide topic, and we cannot possibly address every aspect of it in a single article (or even a book), so let’s focus on the Secure Software Supply Chain, aka S3C, and explore it in detail.

What is Secure Software Supply Chain aka S3C

First, let’s define the software supply chain and some of the vulnerabilities that can be introduced through it.

At a high level, most of us know how a food supply chain works. Take eggs as an example. Eggs are produced in large poultry farms, and they go through a journey before reaching your breakfast plate. The picture below shows that journey.

Code has a similar journey. It has to go through several steps before it starts serving traffic in production. While everyone’s build-release pipelines are different, this picture shows some of the common steps used in software release pipelines.

Now, there is a fundamental security gap here. What makes sure that only “approved” and “intended” software can be deployed in the target environment? In the egg supply chain example, how would you make sure that a malicious actor is not putting “bad eggs” on the grocery store shelf for you to buy? As you can guess, one bad egg on the shelf can have a severe health impact on the grocery chain’s customers. In the same way, what prevents a malicious actor (internal or external) from injecting a bad dependency into the software supply chain and causing major security issues for an organization?

Such security gaps surface frequently these days in major software platforms, and they impact organizations of every size. The impact of broken software supply chains is such that the U.S. federal government devoted a specific section (Section 4, “Enhancing Software Supply Chain Security”) of the recent “Executive Order on Improving the Nation’s Cybersecurity” to this topic.

Section 4(e)(iii) specifically calls for employing automated tools, or comparable processes, to maintain trusted source code supply chains, thereby ensuring the integrity of the code.

This is a fundamental change from how we traditionally thought about software security, which was mostly as an afterthought in software delivery.

Another major challenge follows from this: many tools get introduced into the software supply chain, and they all need to be operated and integrated. For a software developer and for the business this is added overhead, since developers just want to write code and the business wants to use it as quickly as possible. So it’s challenging to keep the right balance.

So the fundamental questions for an organization here are:

1. How do we build an S3C, with a policy (as code) driving the decision to deploy or not deploy code to the runtime?
2. How do we make sure that software is built and shipped as quickly as possible, without too much friction?
3. Do the new tools needed to build such an S3C introduce new operational overhead? How do we stay as close to Serverless as possible?
4. How does an S3C work when the platform hosting your software/code is Serverless (like Cloud Run)?

A few ways to build an S3C

One answer may be to put manual checks and validations in place before deployment. Fair enough, but with rapid development and frequent deployments it is nearly impossible to do such validations manually without impacting velocity. This is not a good answer to question #2.
Another option may be to introduce third-party tools that validate and attest your software before it goes to the runtime. Generally these tools bring a lot of operational overhead: you need to install them somewhere, upgrade them to new versions, monitor them to make sure they are up and running, and so on. This is not a good answer to question #3 above.

Is there a better solution? Here comes Binary Authorization to the rescue.

What is Binary Authorization and Architecture

So, now let’s explore Binary Authorization in detail. Binary Authorization is a managed service in Google Cloud that provides deploy-time security controls to ensure that only trusted/approved container images can be deployed to runtime environments such as Google Kubernetes Engine (GKE) or Cloud Run.

Because it is a fully managed service, you do not have to stand up any infrastructure or build an operations team to adopt Binary Authorization. Just enable the Binary Authorization API in your project and you are ready to go. This answers question #3.

There are 4 major components to an S3C using Binary Authorization:

Policy: A set of rules, defined as code, that governs the deployment of container images. Rules in a policy provide specific criteria that an image must satisfy before it can be deployed. If the policy requires attestations before deployment, you must also set up attestors that can verify attestations before allowing the associated images to deploy.
KMS: Used to generate the public and private keys that are used for signing and validating images.
Signer: Signs the unique image descriptor (image digest) with a private key generated through KMS. This produces a “signature”, which Binary Authorization verifies later (at deployment time) to make sure that only a signed image is deployed.
Attestor: Holds the public key that corresponds to the private key used by the signer to sign the container image digest and create the attestation.

Binary Authorization uses the Policy, Signer and Attestor to validate the image and decide whether to allow its deployment to the target runtime or reject it.
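To make the signer/attestor relationship concrete, here is a minimal Python sketch of the flow. It is not the actual Binary Authorization implementation. The names `DEMO_KEY`, `image_digest`, `sign` and `verify` are hypothetical, and HMAC from the standard library stands in for the asymmetric Cloud KMS key pair (in the real setup the signer holds the private key and the attestor only needs the public key).

```python
import hashlib
import hmac

# ASSUMPTION: a symmetric demo key stands in for the Cloud KMS key pair.
# This only models "sign the digest, verify at deploy time", not real PKI.
DEMO_KEY = b"hypothetical-demo-key"


def image_digest(manifest_bytes: bytes) -> str:
    """Return a content digest in the familiar sha256:<hex> form."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def sign(digest: str, key: bytes) -> str:
    """Signer: produce a signature over the image digest."""
    return hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()


def verify(digest: str, signature: str, key: bytes) -> bool:
    """Attestor: check that the signature belongs to this exact digest."""
    return hmac.compare_digest(sign(digest, key), signature)


# Placeholder byte strings stand in for two different image manifests.
approved = image_digest(b"approved image manifest")
tampered = image_digest(b"tampered image manifest")
attestation = sign(approved, DEMO_KEY)

print(verify(approved, attestation, DEMO_KEY))  # the signed image verifies
print(verify(tampered, attestation, DEMO_KEY))  # a different digest does not
```

The point of the sketch is that the signature is bound to one digest, so swapping the image after signing invalidates the attestation.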

Here is an architecture that shows the working of Binary Authorization.

This architecture shows how Binary Authorization enforces a policy so that an image with any critical vulnerability does not get deployed to Cloud Run. To accomplish this, the image is signed after vulnerability scanning (a built-in feature of Google Container Registry), and only when no critical vulnerability is found.

There might be other scenarios (mentioned above) where you want to make sure that the image is deployed only when it passes QA checks (it is up to you to define what a QA check really means). In the demo below, QA means no critical vulnerabilities in the image.

In any case, if a malicious actor tries to deploy an image to Cloud Run that is not signed by the defined supply chain, Binary Authorization will block the deployment.

One of the biggest benefits of this architecture is the clear separation of duties, for example:

1. Security Team: writes the Binary Authorization policy in YAML, and grants exceptions, if needed at all, through a change to the policy YAML.
2. Foundation/Infrastructure Team: uses the GCP project creation process to enforce the Binary Authorization policy. This is done mostly in Terraform and is explained in detail below.
3. Build & Release Team: creates the build and release scripts, for example in Cloud Build using cloudbuild.yaml, to enforce standards such as image vulnerability scanning and other QA checks.
4. Developers: can’t change #1, #2 or #3, which lets them focus completely on shipping code.

Really quick on Serverless

Serverless is a large topic in itself, but for this post we are going to focus on 3 major components of the Serverless model:

No Ops (consumer is not maintaining any back-end components to use a platform, i.e. no infrastructure provisioning, patching, updating, etc.)
Pay-as-you-go (pay only for what you use; do not pay for idle time)
Elasticity on demand (the platform scales up/down as demand grows or shrinks)

Though Serverless is generally associated with a code hosting platform like Cloud Run, fundamentally it goes well beyond that. The architecture and all services used in the demo are Serverless: Cloud Run, Cloud Build, Cloud KMS, Binary Authorization, Google Container Registry and the Container Scanning API all fall under the Serverless umbrella (“No Ops”, “Pay-as-you-go” and “Elasticity on demand”). So, how many servers will you provision, maintain and support if you decide to implement this demo in your environment? ZERO!!!

Demonstration

Phase I — The Setup

Personas — Security and Foundation/Infrastructure Team

Repository

Prerequisites:

1. Enable the following APIs in your Google Cloud project:

Binary Authorization API
Cloud Build API
Cloud Key Management Service (KMS) API
Container Analysis API
Cloud Source Repositories API (if you are keeping your source code in there)

2. Set up the “Allowed Binary Authorization Policies (Cloud Run)” organization policy in the project, as described at https://cloud.google.com/binary-authorization/docs/run/requiring-binauthz-cloud-run

3. Let’s go through the variables we’re going to use in the code:

Now, let’s walk through the Terraform Code

Step 1. Create a keyring and a Key using Cloud KMS

Step 2. Create a Container Analysis note, which is metadata that describes a piece of software. The Container Analysis API (an implementation of the Grafeas API) stores notes and enables querying and retrieving them. Assign the Cloud Build service account a role that lets it retrieve/view the note.

Step 3. Create an attestor that uses the note created in Step 2 to attest to the container image artifact. We name it “No Vulnerability Attestor” and use the KMS public key (created in Step 1) to verify attestations signed by this attestor.

Step 4. Create a policy to be enforced in the project. You can customize this policy in whatever way you want by using the Policy Reference guide.
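As a rough illustration of what such a policy expresses, the sketch below mirrors a policy as a Python dictionary and evaluates it. The field names (`defaultAdmissionRule`, `evaluationMode`, `enforcementMode`, `requireAttestationsBy`) follow the Binary Authorization policy YAML. Everything else is an assumption made for this example: the project and attestor names, the `is_deploy_allowed` function, and the simplified evaluation, which ignores exemptions and dry-run enforcement. The real decision is made by the service, not by your code.

```python
# ASSUMPTION: project and attestor names are hypothetical placeholders.
POLICY = {
    "defaultAdmissionRule": {
        "evaluationMode": "REQUIRE_ATTESTATION",
        "enforcementMode": "ENFORCED_BLOCK_AND_AUDIT_LOG",
        "requireAttestationsBy": [
            "projects/example-project/attestors/no-vulnerability-attestor",
        ],
    },
}


def is_deploy_allowed(policy: dict, verified_attestors: set) -> bool:
    """Simplified evaluation of the default admission rule.

    verified_attestors is the set of attestors that have a valid
    attestation for the image being deployed.
    """
    rule = policy["defaultAdmissionRule"]
    mode = rule["evaluationMode"]
    if mode == "ALWAYS_ALLOW":
        return True
    if mode == "ALWAYS_DENY":
        return False
    if mode == "REQUIRE_ATTESTATION":
        required = rule.get("requireAttestationsBy", [])
        # Every listed attestor must have verified the image. This sketch
        # denies when the list is empty, which is a choice made here.
        return bool(required) and all(a in verified_attestors for a in required)
    raise ValueError(f"unknown evaluationMode: {mode}")


ATTESTOR = POLICY["defaultAdmissionRule"]["requireAttestationsBy"][0]
print(is_deploy_allowed(POLICY, {ATTESTOR}))  # image attested by the attestor
print(is_deploy_allowed(POLICY, set()))       # image with no attestation
```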

Step 5. Grant the Cloud Build service account viewer permission on Binary Authorization attestors, and permission to sign and verify.

Step 6. Grant the Cloud Build service account permission to view and attach the note (created above) to container images. I did not find a Terraform module for this, so I called the API directly. The script that calls the API is stored in notePermission.sh, which is invoked by Terraform (just to keep the Terraform state of the whole process in one place).

notePermission.sh

This is all the code you need to set up all the components in the project to use Binary Authorization. You can now init, plan, show and apply this Terraform using a Cloud Build trigger. Here is a sample cloudbuild.yaml to perform these steps.

Once terraform apply completes successfully, you can see all the components in the Cloud Console.

Phase II — The Validation

Persona — Build & Release Team

Repository

Once the setup is done, let’s go to the image verification, attestation and deployment phase.

The goals of this step are:

List tags and digests for the specified image.
List the Container Analysis data for the given image and check whether any critical vulnerabilities are identified in it:

If no critical vulnerabilities are identified, use the attestor created in Phase I to sign the image.
If critical vulnerabilities are identified, do not sign the image, and display a GCR link to the place where these vulnerabilities are listed.
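The decision logic of this phase can be sketched in a few lines of Python. This is a hedged illustration, not the script from the repository: it assumes the scan results were already fetched and flattened into dictionaries with hypothetical `id` and `severity` keys, and `critical_findings` and `attestation_decision` are made-up function names. The actual signing call and the GCR link are left out.

```python
def critical_findings(findings):
    """Return only the findings whose severity is CRITICAL."""
    return [f for f in findings if str(f.get("severity", "")).upper() == "CRITICAL"]


def attestation_decision(image_ref, findings):
    """Return (should_sign, message) for the given image and scan results."""
    critical = critical_findings(findings)
    if critical:
        ids = ", ".join(str(f.get("id", "unknown")) for f in critical)
        return False, f"NOT signing {image_ref}: critical vulnerabilities found ({ids})"
    return True, f"Signing {image_ref}: no critical vulnerabilities found"


# ASSUMPTION: hypothetical image reference and scan results, for illustration only.
IMAGE = "gcr.io/example-project/app@sha256:<digest>"
clean_scan = [{"id": "EXAMPLE-1", "severity": "LOW"}]
bad_scan = clean_scan + [{"id": "EXAMPLE-2", "severity": "CRITICAL"}]

for scan in (clean_scan, bad_scan):
    should_sign, message = attestation_decision(IMAGE, scan)
    print(message)
```

Keeping the gate as a pure function of the scan results makes the "what does QA mean" rule easy to change later, for example to also block on high-severity findings.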

Phase III — The Deployment

Persona — Developers

If you try to deploy an image that is not signed by the process above then you’ll get an error:

Service update rejected by Binary Authorization policy: Container image '<image path>' is not authorized by policy. Image '<image path>' denied by attestor projects/<project name>/attestors/<attestor name>: No attestations found that were valid and signed by a key trusted by the attestor

Signed images deploy to Cloud Run without any issue.

Conclusion

Google has championed S3C internally for a long time and has built several tools for Google Cloud customers, based on open source, that not only help build an S3C but also make sure customers do not take on unnecessary operational overhead. Parts of these tools are designed to address separation-of-duties concerns. S3C is, and will remain, an important aspect of how software is delivered and secured, and the technology discussed in this post will help with its adoption.