Securing your Software Supply Chain with in-toto
Introducing the Supply Chain
Last year’s SolarWinds hack caused the industry to think more deeply about the security of its software supply chains.
Of late, there has been some movement in the space:
- In May 2021, the CNCF published a great whitepaper on Software Supply Chain Best Practices
- In September 2021, the first release of SLSA — Supply-chain Levels for Software Artifacts — was published. SLSA describes 4 increasingly stringent sets of requirements (“levels”) to achieve a secure supply chain
Both SLSA and the whitepaper recommend in-toto, which, in its own words, is "a framework to secure the integrity of software supply chains."
That sounds useful, but how does it help?
Why do we need in-toto?
Without effort to secure it, a supply chain has a large attack surface. This image, from SLSA.dev, highlights some of the potential vectors:
in-toto aims to provide a mechanism for preventing an attacker from tampering with the outputs of stages in a supply chain — typically a CI/CD pipeline. When used in conjunction with other techniques described in the CNCF whitepaper and SLSA levels, it can significantly reduce an attacker’s opportunities.
Here, we will demonstrate a solution to vector C or perhaps F. How would we notice if an attacker modified code after review, whilst it was being packaged?
An example project
To demonstrate in-toto we have:
- Bootstrapped a trivial Spring Boot project (with Spring Initializr)
- Chosen to use the Maven build system
- Installed the in-toto SDK and CLI with pip install in-toto
- Created public/private key pairs to sign our in-toto data (we can do this with in-toto-keygen, or with existing GPG keys used for e.g. code signing)
We need very little code to demonstrate the value of in-toto. Consider a very simple REST controller that we will soon “hack” and see if we can get away undetected:
@RestController
public class MyController {
    @GetMapping(value = "/")
    public String getRootResponse() {
        return "hello world";
    }
}
Let’s also keep our supply chain simple for now. Imagine we are:
- Building an executable jar with mvnw package
- Validating the jar’s authenticity before allowing the workflow to continue (e.g. before running integration tests or creating an OCI image)
The in-toto Layout
The framework has to understand our intentions. It needs to know:
- The Steps in our workflow, including the commands that must be executed
- The inputs to each Step, known as Materials
- The outputs from each Step, known as Products
- The actors authorized to perform each of these steps
This information is encoded in a JSON file called a layout. It can either be crafted by hand or generated using Python or one of the other language bindings in-toto supports.
The layout for the above (abridged for now) could look like this:
root.layout:
"steps": [
{
"_type": "step",
"name": "package",
"expected_command": [
"mvnw", "package"
],
"expected_materials": [],
"expected_products": [
[ "CREATE", "demo-0.0.1-SNAPSHOT.jar"],
[ "DISALLOW", "*" ]
],
"pubkeys": [
"776a00e29f3559e0141..."
],
"threshold": 1
}
]
- With expected_command, we're stating that mvnw package (exactly) must be used
- With expected_products, we're stating (using in-toto's rule language) that we're expecting the creation of demo-0.0.1-SNAPSHOT.jar and nothing else
- With pubkeys, we're defining the identity of the authorised actors (in-toto calls them "functionaries") that are allowed to perform this step
Critically, the layout is signed by someone trusted. This signer should be someone who is considered to "own" the pipeline — not necessarily the same functionary that runs it (which is likely to be a CI/CD system).
Creating the layout is, of course, not enough. We need a way of capturing who has performed the step, how they did it, and what the output was. To do this, in-toto requires us to wrap all of our commands with in-toto-run, like this:
$ in-toto-run \
--step-name package \
--products demo-0.0.1-SNAPSHOT.jar \
--key mykey \
-- mvnw package
Here, too, we provide a key so that we can subsequently verify that the step was run by an authorized functionary.
Running this wrapped command generates both the jar, as we'd expect, and some additional in-toto metadata in the form of a .link file — which is more JSON (abridged below for clarity). The filename indicates the name of the step (package) and the functionary that performed it (identified by their key — 776a00e2 here):
package.776a00e2.link:
"signed": {
"_type": "link",
"command": [ "mvnw", "package" ],
"environment": {},
"materials": {},
"name": "package",
"products": {
"demo-0.0.1-SNAPSHOT.jar": {
"sha256": "e201e6c6eb05b54a0c9325a62114f93faa8158f9325331d1f80e126f8841d985"
}
}
}
We now have cryptographically signed metadata that:
- Confirms the command we used to generate our products
- Gives us a cryptographic hash of the products generated by the stage, helping to prevent tampering
Each step in our pipeline will generate a .link metadata file.
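To build some intuition for what this metadata buys us, here is a simplified, hypothetical sketch (not in-toto's actual implementation, which also checks signatures, rules, and functionary authorization) that recomputes an artifact's SHA-256 and compares it with the hash recorded in a .link file:

```python
import hashlib
import json

# Illustrative only: check that an artifact on disk still matches the
# sha256 recorded in a step's .link metadata. Real in-toto verification
# does far more (signature checks, rule processing, key authorization).
def hash_matches_link(artifact_path, link_path):
    with open(artifact_path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    with open(link_path) as f:
        link = json.load(f)
    recorded = link["signed"]["products"][artifact_path]["sha256"]
    return actual == recorded
```

If the artifact is modified after the step ran, its recomputed hash no longer matches the recorded one, and the check fails.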
We will want to use this metadata to verify the integrity of our pipeline (see the next section) — but we may also wish to publish it for audit or other later use. Grafeas — an API designed to host supply chain metadata — supports in-toto attestations. Grafeas can be used in conjunction with — for example — a Kubernetes admission controller to ensure that containers entering a cluster have trusted provenance.
Verifying our Supply Chain
We have our jar, and we have a single signed .link attestation of its provenance (given we had just one pipeline step). To verify it, we use in-toto-verify. What we want to verify is described by an additional section in our layout:
root.layout:
"inspect": [{
"_type": "inspection",
"name": "inspect",
"expected_materials": [
[ "MATCH", "demo-0.0.1-SNAPSHOT.jar", "WITH", "PRODUCTS", "FROM", "package" ],
[ "ALLOW", "mykey.pub" ],
[ "ALLOW", "root.layout" ],
[ "DISALLOW", "*" ]
],
"expected_products": [
[ "ALLOW", "*" ]
]
}]
In our verification stage, we:
- Expect to have exactly the same jar (MATCH) that we generated in our package stage (again, this uses in-toto's rules language)
- Don't mind (ALLOW) if we also have mykey.pub and root.layout files
- Don't want anything else (DISALLOW *) to be accidentally or maliciously carried forward
To check that these rules match, we use in-toto-verify:
$ in-toto-verify \
--layout root.layout \
  --layout-key mykey.pub
The software product passed all verification.
Attacking our Supply Chain
To check that we’re not getting a false positive, let’s alter our source code and try to circumvent in-toto.
Imagine that an attacker has been able to compromise our build system and is able to mutate source code after it has been reviewed:
@RestController
public class MyController {
    @GetMapping(value = "/")
    public String getRootResponse() {
        return System.getenv("MYSQL_ROOT_PASSWORD");
    }
}
As part of the attack, they try to build the jar, hoping it will be picked up by the rest of our pipeline:
mvnw package
In our verification stage, when we run in-toto-verify, in-toto prevents us from proceeding:
$ in-toto-verify \
--layout root.layout \
--layout-key mykey.pub
(in-toto-verify) RuleVerificationError: 'DISALLOW *' matched the following artifacts: ['"demo-0.0.1-SNAPSHOT.jar"']
An End-To-End Solution
In practice, a software supply chain is likely to be made up of multiple pipelines, each with multiple steps that generate multiple .link files. You may wish to verify integrity at the end (before publication of a finished product) or after individual steps, depending on your threat assessment.
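One idea behind multi-step verification is chaining: the materials a step consumes should match, byte for byte, the products of the step before it. Below is a hypothetical sketch of that check over two parsed .link files (field names follow the example earlier in this article; real in-toto expresses this with MATCH rules in the layout rather than ad-hoc code):

```python
# Illustrative sketch: every material consumed by the next step must
# appear among the previous step's products with an identical sha256.
def chain_intact(prev_link, next_link):
    products = prev_link["signed"]["products"]
    materials = next_link["signed"]["materials"]
    return all(
        name in products and products[name]["sha256"] == digest["sha256"]
        for name, digest in materials.items()
    )
```

If an attacker swaps an artifact between steps, the hashes diverge and the chain check fails.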
In Secure Publication of Datadog Agent Integrations with TUF and in-toto, Datadog describes how newer versions of their agent transparently verify in-toto metadata before installation.
As part of a complete solution (to ensure the authenticity of commits, to validate the provenance of any dependencies, to scan for malicious code, to check for vulnerable packages prior to deployment, and more), in-toto is a valuable tool.