I previously wrote about how we can use Rust’s “fearless concurrency” to build a faster unzip tool called ripunzip. (Here are some performance results.)

How can we be sure that it behaves exactly like a regular, non-parallel unzip, even in unforeseen circumstances?

The answer: comparative fuzzing. Fuzzing is normally used to find security bugs (for instance, in Rust unsafe code), but you can also use it to check that two implementations behave the same way. That is comparative fuzzing.

It’s a good fit when we’re trying to make a faster version of an existing tool, and fuzzing happens to be quite fun, especially in Rust, where there’s the marvelous arbitrary crate.
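The core idea doesn't depend on any fuzzing framework. Here’s a minimal, std-only sketch of comparative testing with hypothetical stand-ins: checksum_sequential plays the trusted tool, checksum_parallel plays the faster threaded rewrite, and a tiny xorshift PRNG replaces libfuzzer (so, unlike cargo fuzz, there is no coverage guidance here). A deliberately buggy variant shows what a caught difference looks like.

```rust
use std::thread;

/// Trusted reference implementation: a plain sequential byte sum.
fn checksum_sequential(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

/// "Fast" rewrite: sum chunks on scoped threads, then combine partial sums.
fn checksum_parallel(data: &[u8], threads: usize) -> u64 {
    let chunk_len = (data.len() / threads.max(1)).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || checksum_sequential(chunk)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

/// Deliberately buggy rewrite: silently drops the last byte of odd-length input.
fn checksum_parallel_buggy(data: &[u8], threads: usize) -> u64 {
    checksum_parallel(&data[..data.len() / 2 * 2], threads)
}

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
fn next_random(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Comparative check: run both implementations on the same random inputs
/// and return the first input on which they disagree, if any.
fn find_mismatch(fast: fn(&[u8], usize) -> u64, iterations: usize, seed: u64) -> Option<Vec<u8>> {
    let mut state = seed | 1; // xorshift must never be seeded with zero
    for _ in 0..iterations {
        let len = (next_random(&mut state) % 1024) as usize;
        let data: Vec<u8> = (0..len).map(|_| next_random(&mut state) as u8).collect();
        let threads = 1 + (next_random(&mut state) % 8) as usize;
        if checksum_sequential(&data) != fast(&data, threads) {
            return Some(data);
        }
    }
    None
}
```

Replacing the random bytes with coverage-guided bytes from libfuzzer, and the checksums with ripunzip and zip-rs, is what turns this toy into the real thing.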

Here’s the plan:

  • Describe all the inputs to ripunzip, including the zip file, using the arbitrary crate
  • Run ripunzip based on those inputs
  • Also unzip the same file using regular, single-threaded zip-rs
  • Check that they both succeeded or both failed
  • Compare the unzipped directories. If they differ, crash!

Then we simply set cargo fuzz to work. It should explore the space of all possible inputs to ripunzip and, if any of them results in behavior that differs from zip-rs, tell us.
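The “both succeeded or both failed” rule is the heart of the check. Here is a minimal sketch of that comparison as a standalone function, assuming both implementations produce the same output type; compare_outcomes is a hypothetical helper, not part of ripunzip or zip-rs.

```rust
use std::fmt::Debug;

/// Compare the outcome of a new implementation against a reference one.
/// Both failing is acceptable; both succeeding requires equal outputs;
/// anything else is a behavioral difference, reported as Err.
fn compare_outcomes<T, E1, E2>(new: &Result<T, E1>, reference: &Result<T, E2>) -> Result<(), String>
where
    T: PartialEq + Debug,
    E1: Debug,
    E2: Debug,
{
    match (new, reference) {
        (Ok(a), Ok(b)) if a == b => Ok(()),
        (Ok(a), Ok(b)) => Err(format!("outputs differ: {a:?} vs {b:?}")),
        (Err(_), Err(_)) => Ok(()),
        (Ok(_), Err(e)) => Err(format!("new succeeded; reference failed with {e:?}")),
        (Err(e), Ok(_)) => Err(format!("reference succeeded; new failed with {e:?}")),
    }
}
```

Inside a fuzz target you would panic on an Err here, since libfuzzer treats a panic as a crash and saves the input that caused it.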

Here’s the fuzzer we ended up with. First, let’s look at the input data:

```rust
#![no_main] // cargo fuzz targets supply their own entry point
use std::collections::HashMap;

#[derive(arbitrary::Arbitrary, Eq, PartialEq, Hash, Debug, Clone)]
struct ZipMemberFilename(/* details omitted... described later */);

#[derive(Eq, PartialEq, Hash, Debug, Clone, Copy, arbitrary::Arbitrary)]
enum CompressionMethod {
    // variants omitted in this excerpt
}

#[derive(arbitrary::Arbitrary, Debug, Clone)]
struct Inputs {
    // HashMap to ensure unique filenames in zip
    zip_members: HashMap<ZipMemberFilename, Vec<u8>>,
    compression_method: CompressionMethod,
    single_threaded: bool,
}
```

The most important part is the Inputs struct. It includes a zip file (as a set of member names and contents), a compression method, and any parameters to be passed to ripunzip about how we want to unzip it.

It derives Arbitrary, which means code is generated to construct this structure from an opaque binary blob. Those binary blobs are generated by libfuzzer and, crucially, they are mutated again and again, guided by coverage, to gradually explore as much of the program as possible.
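To get a feel for what “constructing a structure from an opaque binary blob” means, here is a hand-rolled, std-only sketch of the idea. It is not how the arbitrary crate is implemented, and ByteCursor, SketchInputs and decode_inputs are hypothetical names. The property it tries to share with derived Arbitrary impls is that almost any byte string decodes to some valid input, so each mutation libfuzzer makes to the bytes becomes a mutation of the structure.

```rust
use std::collections::HashMap;

/// Cursor over raw fuzzer bytes. Running out of data yields zeros and
/// empty values rather than errors.
struct ByteCursor<'a> {
    data: &'a [u8],
}

impl<'a> ByteCursor<'a> {
    /// Take one byte, or 0 once the data is exhausted.
    fn byte(&mut self) -> u8 {
        let data = self.data;
        match data.split_first() {
            Some((&b, rest)) => {
                self.data = rest;
                b
            }
            None => 0,
        }
    }

    /// Take a length-prefixed byte string of at most `max_len` bytes.
    fn bytes(&mut self, max_len: usize) -> Vec<u8> {
        let wanted = usize::from(self.byte()).min(max_len);
        let data = self.data;
        let (head, rest) = data.split_at(wanted.min(data.len()));
        self.data = rest;
        head.to_vec()
    }
}

/// Hypothetical, simplified stand-in for the fuzzer's Inputs struct.
#[derive(Debug)]
struct SketchInputs {
    zip_members: HashMap<String, Vec<u8>>,
    single_threaded: bool,
}

/// Decode raw bytes into structured input; even an empty slice works.
fn decode_inputs(raw: &[u8]) -> SketchInputs {
    let mut cursor = ByteCursor { data: raw };
    let single_threaded = (cursor.byte() & 1) == 1;
    let member_count = cursor.byte() % 4;
    let mut zip_members = HashMap::new();
    for _ in 0..member_count {
        // One byte picks a name from a small fixed set.
        let name = format!("file{}", cursor.byte() % 3);
        let contents = cursor.bytes(16);
        // A repeated name overwrites the earlier entry, so names stay unique:
        // the same reason the real Inputs struct uses a HashMap.
        zip_members.insert(name, contents);
    }
    SketchInputs { zip_members, single_threaded }
}
```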

We’ve got some fairly complex input there: a HashMap, vectors, and custom enums. Arbitrary is smart enough to generate all of them.

Now, the fuzzer itself is essentially trivial: we literally just create the zip, unzip it twice, and compare the outputs.

```rust
use libfuzzer_sys::fuzz_target;
use std::collections::HashSet;
use std::io::Write;
use std::path::Path;

fuzz_target!(|input: Inputs| {
    let progress_reporter = ripunzip::NullProgressReporter;
    let tempdir = tempfile::tempdir().unwrap();
    let output_directory = tempdir.path().join("out_ripunzip");
    let output_directory_unzip = tempdir.path().join("out_unzip");
    let options = ripunzip::UnzipOptions {
        single_threaded: input.single_threaded,
        output_directory: Some(output_directory.clone()),
    };
    // Any file name inside the temporary directory works here.
    let zipfile = tempdir.path().join("input.zip");
    let mut zip_data = Vec::new();
    create_zip(&mut zip_data, &input.zip_members, input.compression_method);
    let mut file = std::fs::File::create(&zipfile).unwrap();
    file.write_all(&zip_data).unwrap();
    drop(file); // close the file before reopening it for reading
    let file = std::fs::File::open(&zipfile).unwrap();
    let ripunzip_result: Result<(), anyhow::Error> = (|| {
        let ripunzip = ripunzip::UnzipEngine::for_file(file, options, progress_reporter)?;
        ripunzip.unzip()
    })();
    let unziprs_result = unzip_with_zip_rs(&zipfile, &output_directory_unzip);
    match unziprs_result {
        Err(err) => {
            if ripunzip_result.is_ok() {
                panic!("ripunzip succeeded; plain unzip gave {:?}", err)
            }
        }
        Ok(_) => {
            if let Err(err) = ripunzip_result {
                panic!("plain unzip succeeded; ripunzip gave {:?}", err)
            }
            let ripunzip_paths = recursive_lsdir(&output_directory);
            let unzip_paths = recursive_lsdir(&output_directory_unzip);
            // We do not currently compare the actual zip file contents.
            // It seems unlikely that this would be a failure mode where
            // ripunzip would differ from zip-rs.
            assert_eq!(ripunzip_paths, unzip_paths);
        }
    }
});

fn recursive_lsdir(dir: &Path) -> HashSet<std::path::PathBuf> {
    // Walk every entry under `dir` (walkdir crate assumed here).
    walkdir::WalkDir::new(dir)
        .into_iter()
        .filter_map(|e| e.ok())
        .map(|e| e.path().strip_prefix(dir).unwrap().to_path_buf())
        .collect()
}

// Create a zip file from zip_members into output,
// using the given compression mode. Details omitted.
fn create_zip(
    output: &mut Vec<u8>,
    zip_members: &HashMap<ZipMemberFilename, Vec<u8>>,
    compression_method: CompressionMethod,
) {
    // ...
}

/// Errors that can occur with a regular zip-rs unzip, details omitted
#[derive(Debug)]
enum ZipRsError {
    // ...
}

/// Unzip the content with standard zip-rs.
fn unzip_with_zip_rs(zipfile_path: &Path, dest_path: &Path) -> Result<(), ZipRsError> {
    // ...
}
```
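recursive_lsdir only needs some way to walk a directory tree. If you’d rather avoid a dependency, here is a std-only sketch of the same path comparison; relative_entries and diff_trees are hypothetical names. Like the fuzzer, it compares paths only, not file contents.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Every file and directory under `dir`, as paths relative to `dir`.
/// A missing `dir` yields an empty set (e.g. when extraction failed early).
fn relative_entries(dir: &Path) -> io::Result<HashSet<PathBuf>> {
    let mut found = HashSet::new();
    if !dir.exists() {
        return Ok(found);
    }
    let mut pending = vec![dir.to_path_buf()];
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let path = entry.path();
            // file_type() does not follow symlinks, so a link loop cannot recurse forever.
            if entry.file_type()?.is_dir() {
                pending.push(path.clone());
            }
            found.insert(path.strip_prefix(dir).unwrap().to_path_buf());
        }
    }
    Ok(found)
}

/// Relative paths present in one tree but not the other, sorted.
fn diff_trees(a: &Path, b: &Path) -> io::Result<Vec<PathBuf>> {
    let left = relative_entries(a)?;
    let right = relative_entries(b)?;
    let mut diff: Vec<PathBuf> = left.symmetric_difference(&right).cloned().collect();
    diff.sort();
    Ok(diff)
}
```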

One final detail: the member filenames of the zip file. Originally I used arbitrary strings for filenames, but it turns out that both zip-rs and ripunzip fail non-deterministically if you make a sufficiently wacky filename. I therefore restricted each filename to a sequence of a few fixed strings:

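```rust
use itertools::Itertools;

// Variant names are illustrative; only the serialized strings matter.
#[derive(arbitrary::Arbitrary, Debug, Clone, strum::Display, Hash, Eq, PartialEq)]
enum FilenameSegment {
    #[strum(serialize = "b")]
    B,
    #[strum(serialize = "31")]
    ThirtyOne,
    #[strum(serialize = "_c")]
    UnderscoreC,
    #[strum(serialize = "e.txt")]
    ETxt,
}

#[derive(arbitrary::Arbitrary, Eq, PartialEq, Hash, Debug, Clone)]
struct ZipMemberFilename(Vec<FilenameSegment>);

// Implementing From also provides Into<String> for &ZipMemberFilename.
impl From<&ZipMemberFilename> for String {
    fn from(name: &ZipMemberFilename) -> String {
        // Concatenate each segment's Display form (itertools' join).
        name.0.iter().join("")
    }
}
```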
(Note the use of some lovely crates there: strum and itertools.)
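The segment trick itself doesn’t depend on strum or itertools. Here is a std-only sketch of the same idea: each fuzzer byte selects one segment from a fixed alphabet, and the segments are concatenated. The variant names, and the extra "/" segment that lets nested paths and trailing slashes appear, are assumptions for illustration.

```rust
use std::fmt;

/// A small fixed alphabet of path pieces (names illustrative; "/" is an
/// assumption added here so nested paths can be generated).
#[derive(Debug, Clone, Copy)]
enum Segment {
    B,
    ThirtyOne,
    UnderscoreC,
    ETxt,
    Slash,
}

impl fmt::Display for Segment {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            Segment::B => "b",
            Segment::ThirtyOne => "31",
            Segment::UnderscoreC => "_c",
            Segment::ETxt => "e.txt",
            Segment::Slash => "/",
        };
        f.write_str(s)
    }
}

const ALPHABET: [Segment; 5] = [
    Segment::B,
    Segment::ThirtyOne,
    Segment::UnderscoreC,
    Segment::ETxt,
    Segment::Slash,
];

/// Map each fuzzer byte onto one segment and concatenate the results,
/// so every byte string yields a well-behaved filename.
fn filename_from_bytes(raw: &[u8]) -> String {
    raw.iter()
        .map(|&b| ALPHABET[b as usize % ALPHABET.len()].to_string())
        .collect()
}
```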

What happens when we run this? We see delightful status output as the fuzzer ferrets around our codebase.

That cov figure is the total number of code blocks or edges explored so far. libfuzzer mutates the input to explore more and more. Every so often, it finds a way to craft the input to reach new code, and tells us of its success, like a proud little squirrel unearthing a nut.
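A toy model of what that coverage-guided search does: mutate an input, run the target, and keep the mutant only if it reached code no earlier input reached. This is a heavily simplified sketch, not libfuzzer’s actual algorithm (which relies on compiler-inserted coverage instrumentation and many mutation strategies); toy_target and fuzz are hypothetical.

```rust
use std::collections::HashSet;

/// Toy target that records which branches ("edges") an input reaches.
/// The deepest branch needs the input to start with b"ZIP".
fn toy_target(input: &[u8], edges: &mut HashSet<u32>) {
    edges.insert(0);
    if input.first() == Some(&b'Z') {
        edges.insert(1);
        if input.get(1) == Some(&b'I') {
            edges.insert(2);
            if input.get(2) == Some(&b'P') {
                edges.insert(3);
            }
        }
    }
}

/// Minimal coverage-guided loop: mutate one byte of a corpus entry, run the
/// target, and keep the mutant only if it reached an edge not seen before.
/// Returns the final corpus and the number of distinct edges reached.
fn fuzz(iterations: u32, seed: u64) -> (Vec<Vec<u8>>, usize) {
    let mut state = seed | 1; // xorshift state must be non-zero
    let mut corpus: Vec<Vec<u8>> = vec![vec![0; 3]];
    let mut seen = HashSet::new();
    toy_target(&corpus[0], &mut seen);
    for _ in 0..iterations {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let mut mutant = corpus[(state % corpus.len() as u64) as usize].clone();
        let pos = ((state >> 8) % mutant.len() as u64) as usize;
        mutant[pos] = (state >> 16) as u8;
        let mut edges = HashSet::new();
        toy_target(&mutant, &mut edges);
        if !edges.is_subset(&seen) {
            seen.extend(edges);
            corpus.push(mutant);
        }
    }
    (corpus, seen.len())
}
```

Guessing "ZIP" blindly would take on the order of 256³ (about 16.7 million) tries; keeping each partial success in the corpus lets the search climb one byte at a time, which is why the cov figure keeps creeping upward.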

After typing the above paragraph, it’s now explored 3795 code blocks.

Eventually it plateaus. It’s weirdly fascinating.

So, did it work? Did it find any problems?

Yes! Immediately. For example:

I’d only tested with larger zip files. As soon as I tried a tiny zip file, seeks went into negative territory, and we panicked instead of returning an error code. Oops. Note the ridiculously good output, with not just a stack trace, but the precise input that generated the problem.
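The general remedy for this class of bug is checked arithmetic, so that a file too small to hold the structure being sought produces an error instead of an impossible seek. Here is a sketch of the pattern, not ripunzip’s actual fix: seek_to_eocd is hypothetical, and it relies only on the fact that a zip end-of-central-directory (EOCD) record is at least 22 bytes long.

```rust
use std::io::{self, Seek, SeekFrom};

/// A zip end-of-central-directory record is at least 22 bytes long.
const EOCD_MIN_LEN: u64 = 22;

/// Seek to the latest offset at which an EOCD record could begin.
/// checked_sub turns a too-small input into an error instead of an
/// underflowing offset or an invalid seek.
fn seek_to_eocd<R: Seek>(reader: &mut R) -> io::Result<u64> {
    let len = reader.seek(SeekFrom::End(0))?;
    let start = len.checked_sub(EOCD_MIN_LEN).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "file too small to be a zip archive")
    })?;
    reader.seek(SeekFrom::Start(start))
}
```

A real reader would then scan backwards from that offset for the EOCD signature, since a trailing archive comment can push the record earlier.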

It also spotted:

  • Sometimes multiple threads raced to create the parent directories of the members they were unzipping.
  • We made different decisions about filenames ending in /. I aligned with zip-rs behavior here.
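Both issues live in the extraction step. Here is a std-only sketch of two relevant conventions, not ripunzip’s actual code (extract_member and extract_all_concurrently are hypothetical): std::fs::create_dir_all is documented not to fail when another thread or process creates the same directory concurrently, whereas a “check, then create” sequence can race; and, by zip convention, a member name ending in / denotes a directory.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;

/// Zip convention: a member whose name ends in '/' is a directory entry.
fn is_directory_entry(name: &str) -> bool {
    name.ends_with('/')
}

/// Extract one member. Assumes trusted names: real code must also reject
/// absolute paths and ".." components.
fn extract_member(root: &Path, name: &str, contents: &[u8]) -> io::Result<()> {
    let target = root.join(name.trim_end_matches('/'));
    if is_directory_entry(name) {
        return fs::create_dir_all(&target);
    }
    if let Some(parent) = target.parent() {
        // Safe to call from many threads at once for the same parent.
        fs::create_dir_all(parent)?;
    }
    fs::write(&target, contents)
}

/// Extract every member on its own thread; members may share parents.
fn extract_all_concurrently(root: &Path, members: &[(&str, &[u8])]) -> io::Result<()> {
    thread::scope(|s| {
        let handles: Vec<_> = members
            .iter()
            .map(|&(name, contents)| s.spawn(move || extract_member(root, name, contents)))
            .collect();
        handles.into_iter().try_for_each(|h| h.join().unwrap())
    })
}
```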

These were literally cases I hadn’t thought of, so they weren’t in my test suite. It’s rather cool to use a technology that can uncover your Unknown Assumptions. Comparative fuzzing FTW!

PS: it’s now explored 4359 code blocks.



