Looking for Bad Apples in Rust Dependency Trees Using GraphQL and Trustfall

Emil Jonathan Eriksson
Volvo Cars Engineering
5 min read · Aug 14, 2023

Five months ago, I got the opportunity to join Volvo Cars (VCC) to write the Master’s thesis for my degree in electrical engineering. At VCC I joined Rust evangelists Nikolaos Korkakakis and Julius Gustavsson with a mission: to figure out how to find scary code in Rust dependency trees.

Rust markets itself as a safer alternative to C, and it provides an ecosystem that makes it very tempting to just add any useful dependency using cargo add. But can you be sure that a crate deep down in your dependency tree is not just a wolf in serde clothing? I set out to find a way for Rust developers to ask questions about their dependency tree.

A few weeks into my thesis, I found a blog post by Predrag Gruevski. His project, Trustfall, was exactly what I needed: it provided a way of running queries across multiple data sources, in Rust! It even came with GraphQL out of the box, a query language I had previously used to build website backends. All I had to do was find the data sources, connect them to the Trustfall engine, and then somehow explain why my tool was better than just randomly auditing dependencies. Five months later, I have both a Master’s thesis and an open-source tool, cargo-indicate, available on GitHub and via crates.io. I must say that both Trustfall and Rust have been a true pleasure to work with!

At the core of cargo-indicate is a GraphQL schema. Anyone who has ever interacted with a GraphQL API should feel right at home: write what you want to know and you will get it! Trustfall adds some useful query directives, like filters and recursion, so you get a lot of functionality built in. In this schema, Package nodes are the backbone. Each Package node comes with properties aggregated from several tools, and edges leading to other nodes with properties about code stats, Rust unsafe usage, repository information, etc.

The cargo-indicate v0.2.0 schema
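To give a feel for what a recursion directive saves you from writing, here is a minimal plain-Rust sketch of a depth-limited walk over dependency edges. It is not how Trustfall implements recursion: the `DependencyGraph` alias and the `reachable_within` function are hypothetical names made up for this illustration, and the sketch simply treats the starting package as depth 0.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical adjacency list for illustration only:
/// package name -> names of its direct dependencies.
type DependencyGraph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Collect every package reachable from `root` by following at most
/// `max_depth` dependency edges. The root itself counts as depth 0.
fn reachable_within<'a>(
    graph: &DependencyGraph<'a>,
    root: &'a str,
    max_depth: usize,
) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    seen.insert(root);

    // Packages discovered in the previous round.
    let mut frontier = vec![root];

    for _ in 0..max_depth {
        let mut next = Vec::new();
        for pkg in &frontier {
            // Packages without an entry are treated as having no dependencies.
            if let Some(deps) = graph.get(pkg) {
                for &dep in deps {
                    // `insert` returns true only the first time we see `dep`.
                    if seen.insert(dep) {
                        next.push(dep);
                    }
                }
            }
        }
        frontier = next;
    }
    seen
}
```

With a recursion directive, this kind of bookkeeping is handled by the query engine instead of by hand.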

But where do these properties come from? Well, since Trustfall accepts any data source that you can access via Rust, it can be anything! For example, unsafe Rust usage comes from parsing the output of cargo-geiger, repository information comes from the GitHub API, and code stats come from running the tokei crate against the source code. For many data sources there are already excellent crates for accessing that information, so as a developer I only have to wrap it all up according to the (very few) requirements of Trustfall. You can see an overview of the architecture of the cargo-indicate project below.

An overview of the internals of cargo-indicate
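The idea of wrapping several data sources behind one query interface can be sketched in a few lines of Rust. This is a simplified illustration and deliberately does not reproduce Trustfall’s real adapter interface or cargo-indicate’s internals: the `DataSource` trait, the `Engine` struct, and the property names `"unsafeExpressions"` and `"stars"` are all assumptions invented for the example.

```rust
use std::collections::HashMap;

/// A property value a query could ask for. Simplified for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Null,
}

/// Hypothetical trait, NOT Trustfall's actual adapter interface.
trait DataSource {
    /// Return the value of `property` for `package`, or `None` if this
    /// source does not know about that property or package.
    fn resolve(&self, package: &str, property: &str) -> Option<Value>;
}

/// Stand-in for data parsed from cargo-geiger output.
struct UnsafeUsage {
    unsafe_expressions: HashMap<String, i64>,
}

impl DataSource for UnsafeUsage {
    fn resolve(&self, package: &str, property: &str) -> Option<Value> {
        if property != "unsafeExpressions" {
            return None;
        }
        self.unsafe_expressions.get(package).map(|&n| Value::Int(n))
    }
}

/// Stand-in for data fetched from the GitHub API.
struct RepoInfo {
    stars: HashMap<String, i64>,
}

impl DataSource for RepoInfo {
    fn resolve(&self, package: &str, property: &str) -> Option<Value> {
        if property != "stars" {
            return None;
        }
        self.stars.get(package).map(|&n| Value::Int(n))
    }
}

/// Asks each source in turn; the first one that answers wins.
struct Engine {
    sources: Vec<Box<dyn DataSource>>,
}

impl Engine {
    fn property(&self, package: &str, property: &str) -> Value {
        self.sources
            .iter()
            .find_map(|source| source.resolve(package, property))
            .unwrap_or(Value::Null)
    }
}
```

The point of the design is that the caller never needs to know which tool a property came from; adding a new source only means adding another implementation of the trait.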

So what can you do with cargo-indicate? Well, the most “Hello, World” example I can think of is to print the name of a package, like so:

But where’s the fun in that? What about finding out which of our dependencies use the semver trick, i.e. which dependencies depend on a future version of themselves? This can be done like so:

Note: here we are limited a bit by Trustfall. It is currently not possible to add custom operations to the filter query directive, so for now we have to settle for not comparing the versions.
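Outside of a query, the full check, including the version comparison, is easy to express in plain Rust. The following is a minimal sketch under stated assumptions: `Package` is a hypothetical in-memory type (not cargo-indicate’s own), versions are plain `(major, minor, patch)` tuples, and pre-release tags are ignored.

```rust
/// Hypothetical in-memory package node; not cargo-indicate's actual type.
#[derive(Debug, Clone)]
struct Package {
    name: String,
    /// (major, minor, patch); pre-release tags are ignored in this sketch.
    version: (u64, u64, u64),
    dependencies: Vec<Package>,
}

impl Package {
    fn new(name: &str, version: (u64, u64, u64), dependencies: Vec<Package>) -> Self {
        Package {
            name: name.to_string(),
            version,
            dependencies,
        }
    }

    /// True if a direct dependency has the same name and a newer version.
    /// Tuples compare lexicographically, so (0, 2, 0) > (0, 1, 9).
    fn uses_semver_trick(&self) -> bool {
        self.dependencies
            .iter()
            .any(|dep| dep.name == self.name && dep.version > self.version)
    }
}

/// Walk the dependency tree depth-first and collect every package
/// that depends on a future version of itself.
fn find_semver_trick(root: &Package) -> Vec<&Package> {
    let mut found = Vec::new();
    let mut stack = vec![root];
    while let Some(pkg) = stack.pop() {
        if pkg.uses_semver_trick() {
            found.push(pkg);
        }
        stack.extend(pkg.dependencies.iter());
    }
    found
}
```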

Ok, that’s all nice, but let’s combine some data sources. What about all dependencies with fewer than 100 GitHub stars, more than 1000 lines of code in the src/ directory, and an advisory mentioning Windows?

Arguments are prefixed with `$` in the query

Neat!
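For readers who think more easily in iterators than in query directives, the three conditions above can be sketched as a chain of filters in plain Rust. This is only an illustration of the logic, not what cargo-indicate runs internally: the `PackageInfo` struct and its field names are hypothetical, and the thresholds are passed in as parameters in the same spirit as query arguments.

```rust
/// Hypothetical flattened view of one dependency, for illustration only.
struct PackageInfo {
    name: String,
    github_stars: u32,
    src_lines_of_code: u32,
    advisory_descriptions: Vec<String>,
}

/// Return the names of packages that have fewer than `max_stars` stars,
/// more than `min_loc` lines of code, and at least one advisory whose
/// description mentions `keyword` (case-insensitive).
fn filter_packages<'a>(
    packages: &'a [PackageInfo],
    max_stars: u32,
    min_loc: u32,
    keyword: &str,
) -> Vec<&'a str> {
    let keyword = keyword.to_lowercase();
    packages
        .iter()
        .filter(|p| p.github_stars < max_stars)
        .filter(|p| p.src_lines_of_code > min_loc)
        .filter(|p| {
            p.advisory_descriptions
                .iter()
                .any(|d| d.to_lowercase().contains(keyword.as_str()))
        })
        .map(|p| p.name.as_str())
        .collect()
}
```

The difference is that the hand-written version needs all the data collected up front, whereas a query only has to name the properties it wants.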

But what does this data actually tell us? Well, that’s harder to say. To figure it out, I let a cloud instance go ham on over 2000 popular Rust packages to see what my tool could tell us. With around 50 different scalar values to look at, it quickly becomes a bit overwhelming. All features are created equal, some are just more equal than others… Or perhaps not. So let’s look at some interesting findings!

The distribution of the number of GitHub stars for ~2000 popular Rust projects

For one, GitHub stars are very interesting, because they show a (log-)bimodal distribution. Normally, when you see this type of distribution, you can expect that two random processes are happening at the same time. There seems to be support for a natural explanation of this: GitHub stars are simply subject to marketing! Perhaps we should be on the lookout for future disinformation campaigns, with malicious actors marketing projects filled with backdoors? Do you give your developers media training? Perhaps you should!
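The “log” part matters because star counts span several orders of magnitude, so the two humps only become visible once values are grouped by magnitude rather than on a linear axis. A minimal sketch of that bucketing in Rust follows; the function names are made up for this illustration and this is not the analysis code used in the thesis.

```rust
/// Order of magnitude of `n`: 0 for 0..=9, 1 for 10..=99, 2 for 100..=999, ...
fn decade(mut n: u64) -> usize {
    let mut d = 0;
    while n >= 10 {
        n /= 10;
        d += 1;
    }
    d
}

/// Count how many star values fall into each power-of-ten bucket.
/// Index i of the result covers values whose order of magnitude is i.
fn log10_histogram(stars: &[u64]) -> Vec<usize> {
    let mut buckets: Vec<usize> = Vec::new();
    for &s in stars {
        let idx = decade(s);
        // Grow the histogram on demand so empty high buckets are not stored.
        if idx >= buckets.len() {
            buckets.resize(idx + 1, 0);
        }
        buckets[idx] += 1;
    }
    buckets
}
```

A bimodal distribution would show up here as two separate runs of high counts with a dip in between.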

The distribution (or lack thereof) of advisories for the same projects in the RustSec Advisory Database

Another interesting aspect is the lack of advisories for the vast majority of popular packages. I looked exclusively at the RustSec Advisory Database, and most packages had none. It could be the case that most Rust projects are simply excellent. It could also be the case that the only developers who feel the need to report advisories are the ones who maintain very large projects or who care very deeply about their projects. In fact, when @shnatsel tested Rust HTTP clients, he found that most maintainers did not report issues to advisory databases, even when the issues warranted an advisory! You can find his excellent article below.

So, what should you do with this information? Well, I recommend you try cargo-indicate out! It is available now on crates.io, and the only external dependency is cargo-geiger, which you need to install if you want to look at unsafe Rust. Contributions welcome!

Hopefully I have managed to summarize my work well in this short blog post, but for those who are interested I would recommend reading the thesis, available in LUP, the Lund University database for student papers, here. It provides background information on Rust, but also several nice graphs and descriptions of the internal structure of cargo-indicate for those who want to get into the nitty-gritty details. In it, I go into detail on how I managed to group Rust packages into three distinct groups, and found that Windows API bindings are the true outliers. But that’s a story for another post…

If you want to see my other work, check out my GitHub, and feel free to contact me on LinkedIn here!

Emil Jonathan Eriksson
Volvo Cars Engineering

M.Sc. in Electrical Engineering, with a specialization in software