Securing your Software Supply Chain with in-toto

Paul Jones
Nov 19 · 6 min read

Introducing the Supply Chain

Last year’s SolarWinds hack caused the industry to think more deeply about the security of its software supply chains.

Of late, there has been some movement in the space:

  • In May 2021, the CNCF published a great whitepaper on Software Supply Chain Best Practices
  • In September 2021, the first release of SLSA (Supply-chain Levels for Software Artifacts) was published. SLSA describes 4 increasingly stringent sets of requirements (“levels”) to achieve a secure supply chain

Both SLSA and the whitepaper recommend in-toto which, in its own words, is a framework to secure the integrity of software supply chains.

That sounds useful, but how does it help?

Why do we need in-toto?

Without effort to secure it, a supply chain has a large attack surface. This image, from SLSA.dev, highlights some of the potential vectors:

See slsa.dev for more information

in-toto aims to provide a mechanism for preventing an attacker from tampering with the outputs of stages in a supply chain — typically a CI/CD pipeline. When used in conjunction with other techniques described in the CNCF whitepaper and SLSA levels, it can significantly reduce an attacker’s opportunities.

Here, we will demonstrate a solution to vector C or perhaps F. How would we notice if an attacker modified code after review, whilst it was being packaged?

An example project

To demonstrate in-toto we have:

  • Bootstrapped a trivial Spring Boot project (with Spring Initializr)
  • Chosen to use the Maven build system
  • Installed the in-toto SDK and CLI with pip install in-toto
  • Created public/private key pairs to sign our in-toto data (we can do this with in-toto-keygen or with existing GPG keys used for e.g. code signing)

We need very little code to demonstrate the value of in-toto. Consider a very simple REST controller that we will soon “hack” and see if we can get away undetected:

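import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MyController {

    @GetMapping(value = "/")
    public String getRootResponse() {
        return "hello world";
    }
}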

Let’s also keep our supply chain simple for now. Imagine we are:

  1. Building an executable jar with mvn package
  2. Validating the jar’s authenticity before allowing the workflow to continue (e.g. before running integration tests or creating an OCI image)

The in-toto Layout

The framework has to understand our intentions. It needs to know:

  1. The Steps in our workflow, including the commands that must be executed
  2. The inputs to each Step, known as Materials
  3. The outputs from each Step, known as Products
  4. The actors authorized to perform each of these steps

This information is encoded in a JSON file called a layout. It can either be crafted by hand or generated using Python or one of the other language bindings in-toto supports.

The layout for the above (abridged for now) could look like this:

root.layout:

"steps": [
  {
    "_type": "step",
    "name": "package",
    "expected_command": [
      "mvnw", "package"
    ],
    "expected_materials": [],
    "expected_products": [
      [ "CREATE", "demo-0.0.1-SNAPSHOT.jar" ],
      [ "DISALLOW", "*" ]
    ],
    "pubkeys": [
      "776a00e29f3559e0141..."
    ],
    "threshold": 1
  }
]
  • With expected_command, we’re stating that mvnw package (exactly) must be used
  • With expected_products, we’re stating (using in-toto’s rule language) that we’re expecting the creation of demo-0.0.1-SNAPSHOT.jar and nothing else
  • With pubkeys, we’re defining the identity of the authorised actors (in-toto calls them “functionaries”) that are allowed to perform this step
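
To make the rule language more concrete, here is a minimal Python sketch of how rules such as CREATE and DISALLOW could be evaluated. It is a simplified model for illustration, not in-toto's actual implementation: the function name check_product_rules is hypothetical, only three rule keywords are handled, and the file backdoor.sh is an invented example. The idea it shows is that rules are applied in order, each rule "consumes" the artifacts it matches, and a trailing DISALLOW fails if anything is left over.

```python
from fnmatch import fnmatch


def check_product_rules(rules, materials, products):
    """Apply CREATE / ALLOW / DISALLOW rules to a step's products.

    Simplified model: rules run in order, each rule consumes the artifacts
    it matches, and DISALLOW fails if it matches anything still unconsumed.
    """
    remaining = set(products)
    for rule in rules:
        keyword, pattern = rule[0], rule[1]
        matched = {name for name in remaining if fnmatch(name, pattern)}
        if keyword == "CREATE":
            # Only artifacts that did not exist before the step count as created.
            remaining -= {name for name in matched if name not in materials}
        elif keyword == "ALLOW":
            remaining -= matched
        elif keyword == "DISALLOW":
            if matched:
                raise ValueError(f"'DISALLOW {pattern}' matched {sorted(matched)}")
        else:
            raise ValueError(f"unsupported rule keyword: {keyword}")
    return True


rules = [["CREATE", "demo-0.0.1-SNAPSHOT.jar"], ["DISALLOW", "*"]]

# The expected jar on its own satisfies the rules.
print(check_product_rules(rules, [], ["demo-0.0.1-SNAPSHOT.jar"]))

# An unexpected extra file (hypothetical) is rejected by the trailing DISALLOW.
try:
    check_product_rules(rules, [], ["demo-0.0.1-SNAPSHOT.jar", "backdoor.sh"])
except ValueError as err:
    print(err)
```

Ending a rule list with DISALLOW * is what turns the list into an allow-list: anything not explicitly accounted for by an earlier rule causes a failure.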

Critically, the layout is signed by someone trusted. The signer should be whoever is considered to “own” the pipeline, which is not necessarily the functionary that runs it (likely to be a CI/CD system).
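
For pipelines with many steps, generating the layout may be less error-prone than hand-editing JSON. The sketch below builds the same "steps" fragment using only Python's standard json module. It deliberately avoids in-toto's own Python API, so the helper make_step is hypothetical, and the output is unsigned: it would still need to be signed by the pipeline owner with in-toto's tooling. The key id is truncated in the listing above and is left truncated here.

```python
import json

# Key id copied from the abridged layout; it is truncated in the article
# and intentionally left truncated here.
FUNCTIONARY_KEYID = "776a00e29f3559e0141..."


def make_step(name, command, product_rules, keyids, threshold=1):
    """Return a dict shaped like one entry of the layout's "steps" list."""
    return {
        "_type": "step",
        "name": name,
        "expected_command": command,
        "expected_materials": [],
        "expected_products": product_rules,
        "pubkeys": keyids,
        "threshold": threshold,
    }


layout_fragment = {
    "steps": [
        make_step(
            name="package",
            command=["mvnw", "package"],
            product_rules=[
                ["CREATE", "demo-0.0.1-SNAPSHOT.jar"],
                ["DISALLOW", "*"],
            ],
            keyids=[FUNCTIONARY_KEYID],
        )
    ]
}

# Unsigned fragment only; a real layout also needs a signature.
print(json.dumps(layout_fragment, indent=2))
```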

Creating the layout is, of course, not enough. We need a way of capturing who has performed the step, how they did it, and what the output was. To do this, in-toto requires us to wrap all of our commands with in-toto-run, like this:

$ in-toto-run \
    --step-name package \
    --products demo-0.0.1-SNAPSHOT.jar \
    --key mykey \
    -- mvnw package

Here, too, we provide a key so that we can subsequently verify that the step was run by an authorized functionary.

Running this wrapped command generates both the jar, as we’d expect, and some additional in-toto metadata in the form of a .link file - which is more JSON (abridged below for clarity). The filename indicates the name of the step (package) and the functionary that performed it (identified by their key - 776a00e2 here):

package.776a00e2.link:

"signed": {
  "_type": "link",
  "command": [ "mvnw", "package" ],
  "environment": {},
  "materials": {},
  "name": "package",
  "products": {
    "demo-0.0.1-SNAPSHOT.jar": {
      "sha256": "e201e6c6eb05b54a0c9325a62114f93faa8158f9325331d1f80e126f8841d985"
    }
  }
}

We now have cryptographically signed metadata that:

  • Confirms the command we used to generate our products
  • Gives us a cryptographic hash of each product generated by the stage, so that later tampering can be detected

Each step in our pipeline will generate a .link metadata file.
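
Conceptually, the wrapper runs the command and then records a hash of every declared product. The Python sketch below imitates that idea with the standard library only. It is an illustration rather than in-toto's implementation: record_step and sha256_of are hypothetical helpers, the result is not signed, and a tiny Python one-liner that writes a file called demo.jar stands in for mvnw package.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_step(name, command, product_names, cwd):
    """Run a command, then record the hashes of the products it produced."""
    subprocess.run(command, cwd=cwd, check=True)
    products = {
        product: {"sha256": sha256_of(Path(cwd) / product)}
        for product in product_names
    }
    # Same field names as the abridged .link file, but without a signature.
    return {
        "_type": "link",
        "name": name,
        "command": list(command),
        "environment": {},
        "materials": {},
        "products": products,
    }


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for "mvnw package": a command that writes one output file.
    command = [sys.executable, "-c", "open('demo.jar', 'wb').write(b'fake jar')"]
    link = record_step("package", command, ["demo.jar"], workdir)
    print(link["products"])
```

Because the hash is taken immediately after the command finishes, any later change to the file will produce a different digest from the one recorded.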

We will want to use this metadata to verify the integrity of our pipeline (see the next section) — but we may also wish to publish it for audit or other later use. Grafeas — an API designed to host supply chain metadata — supports in-toto attestations. Grafeas can be used in conjunction with — for example — a Kubernetes admission controller to ensure that containers entering a cluster have trusted provenance.

Verifying our Supply Chain

We have our jar, and we have a single signed .link attestation of its provenance (given we had just one pipeline step).

To verify it, we use in-toto-verify. What we want to verify is described by an additional section in our layout:

root.layout:

"inspect": [{
  "_type": "inspection",
  "name": "inspect",
  "expected_materials": [
    [ "MATCH", "demo-0.0.1-SNAPSHOT.jar", "WITH", "PRODUCTS", "FROM", "package" ],
    [ "ALLOW", "mykey.pub" ],
    [ "ALLOW", "root.layout" ],
    [ "DISALLOW", "*" ]
  ],
  "expected_products": [
    [ "ALLOW", "*" ]
  ]
}]

In our verification stage, we:

  • Expect to have exactly the same jar (MATCH) that we generated in our package stage (again, this uses in-toto’s rules language)
  • Don’t mind (ALLOW) if we also have mykey.pub and root.layout files
  • Don’t want anything else (DISALLOW *) to be accidentally or maliciously carried forward
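
The interplay between MATCH and DISALLOW can be sketched in Python as follows. This is a simplified model, not in-toto's implementation: verify_inspection_materials and digest are hypothetical helpers, the hashes are computed from placeholder byte strings rather than real files, and only the rule forms used above are handled. A jar whose hash equals the one recorded by the package step is consumed by MATCH; a jar with any other hash is left over and then caught by DISALLOW *.

```python
import hashlib
from fnmatch import fnmatch


def digest(data):
    return hashlib.sha256(data).hexdigest()


def verify_inspection_materials(rules, materials, links):
    """materials: name -> sha256; links: step name -> {"products": name -> sha256}."""
    remaining = dict(materials)
    for rule in rules:
        keyword, pattern = rule[0], rule[1]
        matched = [name for name in remaining if fnmatch(name, pattern)]
        if keyword == "MATCH":
            # Rule shape: ["MATCH", pattern, "WITH", "PRODUCTS", "FROM", step]
            recorded = links[rule[5]]["products"]
            for name in matched:
                # Consume the artifact only if its hash equals the recorded one.
                if recorded.get(name) == remaining[name]:
                    del remaining[name]
        elif keyword == "ALLOW":
            for name in matched:
                del remaining[name]
        elif keyword == "DISALLOW":
            if matched:
                raise ValueError(f"'DISALLOW {pattern}' matched {sorted(matched)}")
        else:
            raise ValueError(f"unsupported rule keyword: {keyword}")
    return True


JAR = "demo-0.0.1-SNAPSHOT.jar"
rules = [
    ["MATCH", JAR, "WITH", "PRODUCTS", "FROM", "package"],
    ["ALLOW", "mykey.pub"],
    ["ALLOW", "root.layout"],
    ["DISALLOW", "*"],
]
# Placeholder contents stand in for the real files.
links = {"package": {"products": {JAR: digest(b"reviewed build")}}}
good = {
    JAR: digest(b"reviewed build"),
    "mykey.pub": digest(b"key"),
    "root.layout": digest(b"layout"),
}
print(verify_inspection_materials(rules, good, links))

# A jar rebuilt from modified source has a different hash, so MATCH does
# not consume it and DISALLOW * rejects it.
tampered = {**good, JAR: digest(b"tampered build")}
try:
    verify_inspection_materials(rules, tampered, links)
except ValueError as err:
    print(err)
```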

To check that these rules match, we use in-toto-verify:

$ in-toto-verify \
    --layout root.layout \
    --layout-key mykey.pub
The software product passed all verification.

Attacking our Supply Chain

To check that we’re not getting a false positive, let’s alter our source code and try to circumvent in-toto.

Imagine that an attacker has been able to compromise our build system and is able to mutate source code after it has been reviewed:

@RestController
public class MyController {

    @GetMapping(value = "/")
    public String getRootResponse() {
        return System.getenv("MYSQL_ROOT_PASSWORD");
    }
}

As part of the attack, they try to build the jar, hoping it will be picked up by the rest of our pipeline:

$ mvnw package

In our verification stage, when we run in-toto-verify, in-toto prevents us from proceeding:

$ in-toto-verify \
    --layout root.layout \
    --layout-key mykey.pub

(in-toto-verify) RuleVerificationError: 'DISALLOW *' matched the following artifacts: ['"demo-0.0.1-SNAPSHOT.jar"']

An End-To-End Solution

In practice, a software supply chain is likely to be made up of multiple pipelines, each with multiple steps that generate multiple .link files. You may wish to verify integrity at the end (before publication of a finished product) or after individual steps, depending on your threat assessment.

In “Secure Publication of Datadog Agent Integrations with TUF and in-toto”, Datadog describe how newer versions of their agent transparently verify in-toto metadata before installation.

As part of a complete solution (to ensure the authenticity of commits, to validate the provenance of any dependencies, to scan for malicious code, to check for vulnerable packages prior to deployment, and more), in-toto is a valuable tool.

Citihub Digital, a Synechron Company

Recording the digital DNA of financial services.