Auditing popular Rust crates: how a one-line unsafe has nearly ruined everything

Following the actix-web incident (which is fixed now, at least mostly) I decided to poke other popular Rust libraries and see what comes of it.

The good news is I’ve poked at 6 popular crates now, and I’ve got not a single actually exploitable vulnerability. I am impressed. When I poked popular C libraries a few years ago it quickly ended in tears. The bad news is I’ve found one instance that was not a security vulnerability by sheer luck, plus a whole slew of denial-of-service bugs. And I can’t fix all of them by myself. Read on to find out how I did it, and how you can help!

My workflow was roughly like this:

  1. See if the crate has been fuzzed yet to identify low-hanging fruit.
  2. If it has been fuzzed, check sanity of fuzzing harness.
  3. If something is amiss, fuzz the crate.
  4. In case fuzzing turns up no bugs, eyeball the unsafes and try to check them for memory errors.
  5. If no horrific memory errors turn up, try to replace whatever’s under unsafe with safe code without sacrificing performance.

Turns out Rust community is awesome and not only has excellent integration for all three practical fuzzers along with a quick start guide for each, but also a huge collection of fuzz targets that covers a great deal of popular crates. Ack! Getting low-hanging fruit at step 1 is foiled!

So I’ve started checking whether fuzzing targets were written properly. Specifically, I’ve started looking for stuff that could block fuzzing — like checksums. A lot of formats have them internally, and PNG has not one but two — crc32 in png format and adler32 in deflate. And lo and behold, none of the crates were actually disabling checksums when fuzzing! This means that random input from fuzzer was rejected early (random data does not have a valid checksum in it, duh) and never actually reached the interesting decoding bits. So I’ve opened PRs for disabling checksums during fuzzing in miniz_oxide, png, lodepng-rust, and ogg, and then fuzzed them with checksums disabled. This got me:

inflate crate was the first where fuzzing has turned up nothing at all, so I've started eyeballing its unsafes and trying to rewrite them into safe code. I've added a benchmarking harness and started measuring whether reverting back to safe code hurts performance. cargo bench was too noisy, but I've quickly discovered criterion which got me the precision I needed (did I mention Rust tooling is awesome?). I got lucky - there were two unsafes with two-line safe equivalent commented out, and reverting back to safe code created no measurable performance difference. Apparently the compiler got smarter since that code was written, so I've just reverted back to safe code.

This left just one unsafe with a single line in it. Spot the security vulnerability. I would have missed it if the crate maintainer hadn’t pointed it out. If you can’t, there are hints at the end of this post.

By sheer luck the rest of the crate just so happens to be structured in a way that never passes input parameters that trigger the vulnerability, so it is not really exploitable. Probably. I could not find a way to exploit it, and the crate maintainer assures me it’s fine. Perhaps we just haven’t figured out how to do it yet. After all, almost everything is exploitable if you try hard enough.

Sadly, simply replacing the unsafe .set_len() with .resize() regressed the decompression performance by 10%, so instead I've added an extra check preventing this particular exploit from happening, and then liberally sprinkled the function with asserts that panic on every other way this unsafe could go wrong that I could think of.

Is the function secure now? Well, maybe. Maybe not. Unless we either rewrite it in safe rust (or prove its correctness, which is a lot harder) we will never know.

The thing is, I’m pretty sure it’s possible to rewrite this in safe Rust without performance penalty. I’ve tried some local optimizations briefly, to no avail. Just like with high-level languages, writing fast safe Rust requires staying on the optimizer’s happy paths, and I have not found any documentation or tooling for doing that. https://godbolt.org/ that lets you inspect the LLVM IR as well as assembler and shows what line of Rust turned into what line of assembly, but you can’t feed your entire project to it. (As pointed out in comments, cargo-asm can do that to an entire project). And you also need tools to understand why a certain optimization was not applied by rustc. LLVM flags -Rpass-missed and -Rpass-analysis seem to be capable of doing that, but there is literally no documentation on them in conjunction with Rust.

Discussing the vulnerability further would be spoilerrific (seriously, try to locate it yourself), so I’ll leave further technical discussion until the end of the post. I want to say that I was very satisfied with how the crate maintainer reacted to the potential vulnerability — he seemed to take it seriously and investigated it promptly. Coming from C ecosystem it is refreshing to be taken seriously when you point out those things.

By contrast, nobody seems to care about denial of service vulnerabilities. In the 3 crates I’ve reported such vulnerabilities for, after 3 weeks not a single one was investigated or fixed by maintainers of those crates, or anyone else really. And the DoS bugs are not limited to panics that you can just isolate into another thread and forget about.

After not getting any reaction from crate maintainers for a while I tried fixing those bugs myself, starting with the png crate. In stark contrast to C, it is surprisingly easy to jump into an existing Rust codebase and start hacking on it, even if it does rather involved things like PNG parsing. I've fixed all the panics that fuzzers discovered based on nothing but debug mode backtraces, and I don't even know Rust all that well. Also, this is why there are 4 distinct panics listed for PNG crate: I've fixed one and kept fuzzing until I discovered the next one. lewton probably has many more panics in it, I just didn't got beyond the first one. Sadly, three weeks later my PR is still not merged, reinforcing the theme of "nobody cares about denial of service". And png still has a much nastier DoS bug that cannot be isolated in a thread.

(To be clear, this is not meant as bashing any particular person or team; there may be perfectly valid reasons for why it is so. But this does seem to be the trend throughout the ecosystem, and I needed some examples to illustrate it).

Also, shoutout to tungstenite — it was the only crate that did not exhibit any kinds of bugs when being fuzzed for the first time. Kudos.

Conclusions:

  • Unlike C libraries, Rust crates do not dispense security vulnerabilities when you poke them with a fuzzer for the first time (or sometimes even the third time). Humans make all the same mistakes, but Rust prevents them from turning into exploits. Mostly.
  • Rust tooling is diverse, high-quality and accessible. afl.rs, cargo-fuzz, honggfuzz-rs, sanitizers, criterion, proptest and clippy not only exist, but also come with quickstart guides that makes deploying any of them take less than 15 minutes.
  • Cargo and docs.rs combined with Rust language features that allow expressively encoding application logic make an existing complex codebase surprisingly easy to understand and hack on, making drive-by contributions a breeze. And I don’t even know Rust all that well.
  • Hardly anyone uses #![forbid(unsafe_code)]. Rust offers to rid you of paranoia and arbitrary code execution exploits, but people don't seem to take up on the offer. (Shoutout to lewton developers who did).
  • Safe Rust code can be as fast as one with unsafe (shoutout to serde-json that is the fastest JSON parser in the world, written in fully safe Rust), but squeezing out those last 20% requires you to adjust your code in arcane ways to hit the optimizer happy paths, kinda like with high-level languages. There is no documentation or tooling for doing such a thing, although the building blocks are there. Until such documentation and tooling is created, the only viable option is trial and error.
  • A lot of crates contain 2–3 unsafe blocks that can probably be refactored into safe code without losing performance. This is probably related to the lack of tooling. Rust isolates unsafe code and that makes auditing code easier, but in practice it is not actually audited. We need a libs-blitz-like effort to get rid of such unsafes, I can't process the entire ecosystem alone. (If you also put #![forbid(unsafe_code)] on the cleansed crate, I will love you forever).
  • Fuzzing would not have discovered this vulnerability at all, unless you had a very specific fuzzing setup looking specifically for this kind of thing. Even then, the chances of ever hitting it were pretty darn low. Fuzzing is a very easy way to prove presence of bugs, but it cannot prove their absence.
  • Symbolic execution tools like KLEE or SAW that can be used to prove correctness do not have Rust integration, even though both operate on LLVM IR. KLEE used to have it, but sadly the LLMV version used in KLEE is now grossly outdated. Update: SMACK has been adapted to work with Rust!
  • If you want to write DoS-critical code in Rust and use some existing libraries, you’re out of luck. Nobody cares about denial of service attacks. You can poke popular crates with a fuzzer and get lots of those. When you report them, they do not get fixed. There is a linter to detect potential panics, but if a linter for stuff like stack overflows or unbounded memory allocations exists, I am not aware of it.
  • Rust has no mechanism for propagating security updates through the ecosystem. I was surprised to find that Cargo does not alert you when you’re using an outdated library version with a security vulnerability, and crates.io does not reject uploads of new crates that depend on vulnerable library versions, and does not alert maintainers of existing crates that their dependencies are vulnerable. A third-party tool to check for security vulnerabilities exists, but you’ve never heard of it and you have better things to do than run that on all of your crates every day anyway.

Originally I thought this would be a fun exercise for a few weekends, but the scope of the work quickly grew way beyond what I can hope to achieve alone. This is where you come in, though! Here’s a list of things you can try, in addition to the hard tooling tasks listed above:

  1. Fuzz all the things! It takes 15 minutes to set up per crate, there is no reason not to. Also, there is a trophy case.
  2. Fix bugs already discovered. For example: panic in lewton (easy), unbounded memory consumption in png (intermediate), lodepng memory leak (C-hard). You can also fuzz lewton afterwards to get more panics, just don’t forget to use ogg dependency from git. You can reuse my fuzz harnesses if you wish.
  3. Refactor unsafes in popular crates into safe code, ideally without sacrificing performance. For example, inflate crate has just one unsafe block remaining, png has two. There are many more crates like that out there.
  4. There are easy tasks on docs and tooling too: AFL.rs documentation is outdated and describes only version 0.3. Version 0.4 has added in-process fuzzing that’s ~10x faster, it needs to be mentioned. Also, AFL could use more Rusty integration with Cargo, closer to what cargo-fuzz does. Also, disabling checksums is a common pitfall that needs to be mentioned.

I’d love to keep fixing all the things, but at least in the coming month I will not able to dedicate any time to the project. I hope I’ve managed to at least lead by example.


And now, details on that vulnerability! If you haven’t found it yourself, here’s a hint: similar bugs in C libraries.

If you still haven’t found it, see the fix.

Spoilerrific discussion of the vulnerability below.

Vulnerable code from git history for reference

The function run_len_dist() does a fairly trivial thing: resizes a vector to fit a specified amount of data and copies data from element i to element i+dist until i+dist hits the end of the vector. For performance, contents of the vector are not initialized to zeroes when resizing, as it would have been done by vec.resize(); instead, vec.set_len() is used, creating a vector with a number of elements set to uninitialized memory at the end.

The function never checks that dist is not zero. Indeed, if you call it with dist set to 0, it will simply read uninitialized memory and write it right back, exposing memory contents in the output.

If this vulnerability were actually exploitable from the external API (which it isn’t, probably), inflate would have output contents of uninitialized memory in the decompressed output. inflate crate is used in png crate to decompress PNGs. So if png crate was used in a web browser (e.g. servo) to decode images, an attacker could pass a crafted PNG to the client, then read the decoded image using javascript. This lets the attacker read memory contents from the browser - cookies, passwords, you name it. This is not quite as bad as Heartbleed or Meltdown, but it's up there.

Sadly, regular fuzzing would not have discovered this vulnerability. If it were actually exploitable, at least one way to trigger it would involve setting several distinct bytes in the input to very specific values. And even the best current generation fuzzers cannot trigger any behavior that requires changing more than one byte simultaneously, except in rare cases or if you explicitly tell what consecutive byte strings it should try. And there is nothing in the code that would guide the fuzzers to these specific values.

Even if fuzzers did discover such an input by random chance, they would not have recognized it as a vulnerability, unless you do either of these things:

  • Fuzz your code under memory sanitizer (not to be confused with address sanitizer), which is impossible for any crate that links to C code and is compatible with only one fuzzer — AFL, and only in its slower stdin mode (possibly honggfuzz too in its slower binary-only instrumentation mode, but I haven’t checked).
  • Create a fuzz harness that decodes the same input twice and verifies that the output matched, and somehow ensure that the memory allocation was not reused. AFAIK Rust’s default jemalloc allocator can reuse allocated memory, so you're probably again limited to AFL in stdin mode.

This just goes to show that fuzzing unsafe code does not actually guarantee absence of bugs.

Safe Rust, however, does guarantee absence of memory errors that lead to arbitrary code execution exploits and other unspeakable horrors. So let’s use it.