Matthew Finifter, Tony Ngo, and Debo Ray, Application Security
Note: This project was originally presented at the 2019 LocoMoco Security Conference and a recording is forthcoming.
The term “provenance” refers to a place of origin or history of ownership. For Uber’s application security team, code provenance is a strategy for ensuring we have a verifiable attestation of the origin of all code running in production. This provides a root of trust as we move forward in defining and enforcing a collection of policies throughout each stage of the software development process.
Traditional Code Checks
Checking for vulnerable and unreliable or buggy code is a standard best practice in application security, and it covers only a small subset of the reasons why code needs to go through a proper review process before making it into production.
Code reviews present one of the earliest opportunities to check for errors, bugs, and vulnerabilities. This process includes both human reviews and automated diff analysis.
CI testing looks for code regression as well as stable logic and reliability of the diff. Security unit tests can also be added here to provide some additional checks for security vulnerabilities. When code lands in a repo, broader analysis can be done, like checking for CVEs and conducting advanced analysis.
At build time, analysis is run on the built artifacts (e.g. container) to weed out vulnerabilities introduced by insecure configurations or system frameworks. Integration tests give us yet another round of stability and reliability testing. When it’s time to deploy the code, we do a final check of authenticity, integrity, and security attestation of the built release.
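Security unit tests of the kind mentioned above can be plain assertions that run in CI alongside functional tests. Below is a minimal sketch; the `safe_join` helper and the paths are hypothetical examples, not Uber's actual code:

```python
import os.path

def safe_join(base: str, user_path: str) -> str:
    """Hypothetical helper: join a user-supplied path onto a base
    directory, rejecting path-traversal attempts."""
    candidate = os.path.normpath(os.path.join(base, user_path))
    if not candidate.startswith(os.path.normpath(base) + os.sep):
        raise ValueError(f"path traversal rejected: {user_path!r}")
    return candidate

# Security unit tests: the happy path works, the attack is rejected.
assert safe_join("/srv/app", "img/logo.png") == "/srv/app/img/logo.png"
try:
    safe_join("/srv/app", "../../etc/passwd")
    raise AssertionError("traversal should have been rejected")
except ValueError:
    pass
```

Tests like these act as regression guards: once a vulnerability class is fixed, CI ensures it cannot silently reappear in a later diff.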
As any application security team can tell you, even the most robust review process is not a catch-all for every potential issue, but it helps reduce the likelihood that new code will introduce known vulnerabilities into our production environment.
We use code provenance as an additional layer of defense against scenarios where either malicious or non-malicious code is pushed past traditional checks. Let’s look at some of these potential scenarios.
Well-Intentioned Engineer
This scenario involves an engineer who means well but lacks experience: they may not check whether their code gets reviewed, is free of lint warnings, or passes tests. Under pressure to get their code out quickly, they might accidentally skip a check.
Malicious Insider
This refers to someone who has legitimate engineering access but wants to sneak code into a production build. They may attempt to make it look like someone else submitted the code, or try to get around code review requirements. Their goal is to get malicious code into production without raising any red flags.
Malicious Outsider
An example of this type of threat is an outsider who has gained control of an engineer’s laptop. Similar to the malicious insider, they want to use this access to put malicious code into production, ideally without being detected. Alternatively, they may attempt to attack the build and deploy infrastructure directly.
The key benefits of this approach include:
- “Chain of custody” for all code landing in production releases: the ability to sample any running code and know that it came from a specific authorized build that was constructed from a specific set of commits, each of which has a known and verifiable author with the correct level of access to affect that service or application.
- Enabling response in case anything goes awry: if we find anomalous or malicious code in a production application, our security response team needs to be able to trust and follow the metadata we’ve put in place to understand exactly what happened and how it got there.
- Flexible, enforced policies for what code is allowed to land in production releases: if a policy is in place that prohibits the use of a specific third-party dependency, there should be no way for a developer to get a commit through this process that uses that dependency. We should be able to discover, and trust, a set of attestations of policy adherence.
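As an illustration of the last point, a policy gate against a prohibited third-party dependency can be as simple as a set intersection run before a commit is accepted. This is only a sketch; the package names and policy are invented for the example:

```python
# Illustrative policy: dependencies no developer may introduce.
BANNED_DEPENDENCIES = {"leftpad-clone", "unmaintained-ssl"}

def check_dependency_policy(declared_deps):
    """Return the declared dependencies that violate policy, sorted.
    An empty result serves as the attestation of policy adherence."""
    return sorted(set(declared_deps) & BANNED_DEPENDENCIES)

violations = check_dependency_policy(["requests", "unmaintained-ssl"])
assert violations == ["unmaintained-ssl"]  # this commit would be rejected
assert check_dependency_policy(["requests"]) == []  # this one passes
```

The key property is that the check is enforced by the pipeline, not by convention, so there is no path for a developer to land a commit that skips it.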
Software Development at Uber
Each distinct application and service is a special snowflake with its own repository, development tooling, process, and norms.
These special snowflakes make up Uber’s engineering environment, which includes more than 60 mobile apps and thousands of microservices.
The point is that our security team has to be strategic and thoughtful about:
- How do we build something general enough that it works for all of these different services, applications, and teams?
- How do we proceed carefully enough that we don’t break production systems or disrupt engineering processes?
In our experience, off-the-shelf solutions often do not work in our environment: any proposed solution that begins with “why don’t you just…” is rarely viable.
One example of this is commit signing, which enables us to verify that the author listed in the commit message is actually the author of the commit.
Git has built-in support for commit signing, using GPG. This is great!
Except, our employees don’t have GPG keys. We use short-lived SSH certificates, which get renewed almost daily.
So our choice was either to solve the key distribution problem for GPG keys, even though we had already solved that problem with a different (and better, because short-lived) set of keys, or to build a custom commit-signing solution around the SSH certificates our users already have.
We opted for the latter.
Uber developers have a verifiable cryptographic identity, via their uSSH key, and our repositories use git. So we built a commit signing solution that all our developers can use, despite all the differences in their special snowflake workflows.
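Conceptually, the signing flow binds an authenticated identity to the exact bytes of a commit object, and any later verifier can detect tampering. The sketch below shows the sign/verify round trip, but it substitutes an HMAC over a shared secret for the real uSSH certificate signature purely to stay self-contained; the actual system uses asymmetric SSH signatures tied to each engineer's short-lived certificate:

```python
import hashlib
import hmac

# Stand-in for an engineer's short-lived uSSH credential. In the real
# system this is an SSH certificate and an asymmetric signature; HMAC
# with a shared key is used here only to keep the sketch runnable.
ENGINEER_KEY = b"short-lived-credential"

def sign_commit(commit_bytes: bytes, key: bytes = ENGINEER_KEY) -> str:
    """Sign the canonical commit object, binding an identity to it."""
    return hmac.new(key, commit_bytes, hashlib.sha256).hexdigest()

def verify_commit(commit_bytes: bytes, signature: str,
                  key: bytes = ENGINEER_KEY) -> bool:
    """Check that the signature matches the commit bytes exactly."""
    return hmac.compare_digest(sign_commit(commit_bytes, key), signature)

commit = b"tree abc123\nauthor alice <alice@example.com>\n\nFix login bug\n"
sig = sign_commit(commit)
assert verify_commit(commit, sig)                    # authentic commit
assert not verify_commit(commit + b"tampered", sig)  # tampering detected
```

Because the signature covers the whole commit object, including the author field, an attacker with repository access cannot attribute a commit to someone else without that person's credential.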
Securing Third-Party Libraries
Using third-party libraries allows us to leverage existing work from the open-source community and enables our teams to focus on building core business logic, instead of trying to create Uber-specific solutions to problems that the industry has already solved.
However, third-party libraries need to be vetted before they are introduced to our infrastructure. From the repository perspective, ownership of a third-party package is assigned to the author introducing it. The global inventory, as well as the more granular repository-local ownership, is maintained in a standalone microservice to help with the remediation process if an issue appears. This addresses all newly introduced third-party packages from a future-facing perspective.
A greater challenge lies in dealing with third-party libraries already running in our production infrastructure. Given the large number of microservices, it is critical for the process of surfacing true vulnerabilities to have a high confidence level and not overburden other engineering teams. Therefore, we need to know exactly where a vulnerable function in third-party software exists, and whether it’s being invoked in a vulnerable way, e.g. user input getting passed to it.
While we’re aware of multiple vendors working on this problem space, none of the current solutions in the market were feasible for our environment. Instead, we developed a heuristic to focus on higher severity issues that manifest shallower in the dependency graph. This helps keep the number of false positives reasonably low, as well as keeping the triage process manageable for technical teams.
Chain Enforcements & Beyond
So far we’ve discussed software development up to committing code, but many more steps are needed to fully deploy an app (e.g., building, testing, and deploying the application). These in-between steps open up additional surface area to inject or modify code, so additional defense layers are required:
- Set up enforcement points along the development process
- Chain the results of these enforcements, starting with the approved commits of code and third-party dependencies
We must create a chain of enforcements that roots back to the original SSH key we used to sign commits, where every layer inherits the results (and therefore the checks) of the layer before it.
Another way to look at this is to compare it to the OSI model and its layers: each layer has its own scoped responsibilities and only really communicates with the layers directly above and below it, yet still benefits from the others because of the encapsulation and chaining taking place.
- Commit layer — Approved authors and identity.
Valid ssh keys are ingested and used to sign commits to connect an authenticated identity to submitted code.
- Repo layer — Application code security.
This takes commits as input — along with the identity to verify access based on permissions — and performs source code analysis on submitted code.
- Container layer — Built artifact security.
This is where OCI scanners are used to check for vulnerabilities in operating system packages, system frameworks, and configurations. The signature associated with this container can also be used to verify these checks and authenticate where it was built.
- Host layer — Secure deployment.
Container signatures are verified here before the app is deployed, inheriting everything that has been encapsulated during the previous stages.
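The chaining described above can be sketched as each layer hashing its own check result together with the attestation of the layer below it, so the final digest verified at the host layer transitively covers every earlier check. The layer payloads here are illustrative:

```python
import hashlib

def attest(previous: str, layer: str, result: bytes) -> str:
    """A layer's attestation covers its own check result *and* the
    attestation of the layer below it, forming the chain."""
    return hashlib.sha256(
        previous.encode() + layer.encode() + result
    ).hexdigest()

def build_chain(root: str, steps) -> str:
    """Fold the per-layer attestations into one final digest."""
    digest = root
    for layer, result in steps:
        digest = attest(digest, layer, result)
    return digest

STEPS = [
    ("repo",      b"source-analysis-passed"),        # application code security
    ("container", b"oci-scan-passed"),               # built artifact security
    ("host",      b"container-signature-verified"),  # secure deployment
]

sealed = build_chain("signed-commit-root", STEPS)

# Replaying the same chain reproduces the digest; weakening any earlier
# layer (e.g., skipping the container scan) changes everything above it.
assert build_chain("signed-commit-root", STEPS) == sealed
tampered = [("repo",      b"source-analysis-passed"),
            ("container", b"oci-scan-SKIPPED"),
            ("host",      b"container-signature-verified")]
assert build_chain("signed-commit-root", tampered) != sealed
```

This is why a single verification at deploy time can stand in for re-running every earlier check: the digest only matches if every layer below it held.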
After all this, we can identify some similarities and patterns that bring a degree of homogeneity to our inherently heterogeneous environment, especially when we walk backwards through our software development process.
Deployed applications and services are simply containers of some sort with signatures that need to be verified. By confirming that these containers are built by our build systems, we gain a level of confidence in their security. In turn, these containers consist of repositories made of secure code added through commits — commits that we know come from identified, authenticated, and authorized Uber engineers. This layering approach helps us protect against direct and indirect or transitive attacks while also helping us meet compliance goals around code reviews and access controls. Altogether, this allows us to create code provenance for all the code in Uber’s ecosystem.