Does parallel unzipping work?

Adrian Taylor
Dec 23, 2022

In a previous post, I had a crack at writing an unzip utility that unzips files in parallel, on the basis that Rust makes this safe to attempt.
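To illustrate what "safe to attempt" means in practice, here is a minimal std-only sketch. It is not ripunzip's actual code. It assumes the entries are "stored" (uncompressed), so copying bytes stands in for real decompression, and `Entry` and `extract_parallel` are hypothetical names. Scoped threads (`std::thread::scope`, Rust 1.63 or later) let every worker borrow the archive buffer read-only. If a worker tried to mutate shared state without synchronization, the program would fail to compile rather than race at runtime.

```rust
use std::thread;

/// Hypothetical, simplified zip entry: where its bytes live in the archive.
/// Real entries also carry a compression method, CRC-32 and so on; this
/// sketch only handles "stored" (uncompressed) data.
struct Entry {
    name: String,
    offset: usize,
    len: usize,
}

/// Extract every entry on its own scoped thread. The archive is only ever
/// borrowed immutably and each thread returns its own output buffer, so the
/// compiler can check there is no shared mutable state.
fn extract_parallel(archive: &[u8], entries: &[Entry]) -> Vec<(String, Vec<u8>)> {
    thread::scope(|s| {
        let handles: Vec<_> = entries
            .iter()
            .map(|e| {
                s.spawn(move || {
                    // Stand-in for real decompression: copy the stored bytes.
                    let data = archive[e.offset..e.offset + e.len].to_vec();
                    (e.name.clone(), data)
                })
            })
            .collect();
        // Join in the original order so results line up with `entries`.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let archive: &[u8] = b"helloworld!";
    let entries = vec![
        Entry { name: "a.txt".to_string(), offset: 0, len: 5 },
        Entry { name: "b.txt".to_string(), offset: 5, len: 6 },
    ];
    for (name, data) in extract_parallel(archive, &entries) {
        println!("{name}: {}", String::from_utf8_lossy(&data));
    }
}
```

Spawning one thread per entry keeps the sketch short; a tool extracting thousands of files would more sensibly hand entries to a bounded thread pool.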

Here are some performance results.

The use case I care about is unzipping the Chromium ASAN builds obtainable from here. For these tests I’ve picked one particular build, which happens to be 3,845,117,901 bytes (about 3.8 GB).

ripunzip now supports two modes: ripunzip file <filename> and ripunzip uri <URI>. The former just unzips a local file. The latter uses multiple HTTP range requests to perform the download and unzip in parallel. (Range requests are necessary because of the structure of a zip file: its central directory, stored at the end of the file, describes entries spread all over the file, so the tool needs to read it before it can start to actually decompress the contents.)
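As a rough illustration of that first step, the std-only Rust sketch below (written for this explanation, not taken from ripunzip) locates a zip file's End of Central Directory (EOCD) record in the bytes returned by an HTTP suffix range request such as Range: bytes=-65557. The fixed part of the EOCD record is 22 bytes and may be followed by a comment of up to 65,535 bytes, so that suffix always contains it for an ordinary (non-Zip64) archive. It then builds the Range header for the central directory itself. `Eocd`, `find_eocd`, `tail_range_header` and `range_header` are hypothetical names, and Zip64 archives are not handled.

```rust
/// The few fields of a zip "End of Central Directory" (EOCD) record needed
/// to fetch the central directory. Zip64 is not handled in this sketch.
#[derive(Debug, PartialEq)]
struct Eocd {
    total_entries: u16, // number of files in the archive
    cd_size: u32,       // size of the central directory in bytes
    cd_offset: u32,     // where the central directory starts in the file
}

const EOCD_SIG: [u8; 4] = [0x50, 0x4b, 0x05, 0x06]; // "PK\x05\x06"
const EOCD_MIN_LEN: usize = 22; // fixed-size part of the record
const MAX_COMMENT_LEN: usize = 65_535;

fn u16_le(b: &[u8]) -> u16 {
    u16::from_le_bytes([b[0], b[1]])
}

fn u32_le(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Value for a suffix Range header that is guaranteed to include the EOCD.
fn tail_range_header() -> String {
    format!("bytes=-{}", EOCD_MIN_LEN + MAX_COMMENT_LEN)
}

/// Search `tail` (the last bytes of the file) backwards for the EOCD record.
fn find_eocd(tail: &[u8]) -> Option<Eocd> {
    if tail.len() < EOCD_MIN_LEN {
        return None;
    }
    for start in (0..=tail.len() - EOCD_MIN_LEN).rev() {
        let rec = &tail[start..];
        if rec[..4] != EOCD_SIG {
            continue;
        }
        // The record plus its comment must run exactly to the end of the
        // file; this rejects most stray signature bytes inside a comment.
        let comment_len = u16_le(&rec[20..22]) as usize;
        if EOCD_MIN_LEN + comment_len == rec.len() {
            return Some(Eocd {
                total_entries: u16_le(&rec[10..12]),
                cd_size: u32_le(&rec[12..16]),
                cd_offset: u32_le(&rec[16..20]),
            });
        }
    }
    None
}

/// Range header value for `len` bytes starting at `start` (end is inclusive).
fn range_header(start: u64, len: u64) -> String {
    assert!(len > 0, "an HTTP byte range cannot be empty");
    format!("bytes={}-{}", start, start + len - 1)
}

fn main() {
    println!("1. Fetch the tail with Range: {}", tail_range_header());
    // Fake tail: some padding, then an EOCD record with no comment.
    let mut tail = vec![0u8; 10];
    tail.extend_from_slice(&EOCD_SIG);
    tail.extend_from_slice(&[0, 0, 0, 0]); // disk numbers
    tail.extend_from_slice(&3u16.to_le_bytes()); // entries on this disk
    tail.extend_from_slice(&3u16.to_le_bytes()); // total entries
    tail.extend_from_slice(&150u32.to_le_bytes()); // central directory size
    tail.extend_from_slice(&1000u32.to_le_bytes()); // central directory offset
    tail.extend_from_slice(&0u16.to_le_bytes()); // comment length
    let eocd = find_eocd(&tail).expect("EOCD not found");
    println!(
        "2. Fetch the central directory with Range: {}",
        range_header(eocd.cd_offset as u64, eocd.cd_size as u64)
    );
}
```

Each central directory record holds the offset of its entry's local header, which is what lets a client issue further range requests for different entries in parallel.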

Chart of speed results for ripunzip vs traditional download & unzip tools (times in seconds):
Fast Linux VM: ripunzip file 9; curl 110 + unzip 94; ripunzip URI 119
Slow Linux VM: ripunzip file 173; curl 88 + unzip 147; ripunzip URI 115
Mac: ripunzip file 39; curl 48 + unzip 76; ripunzip URI 51
Windows VM: ripunzip file 52; Chrome fetch 41 + 7z 165; ripunzip URI 92

(These results use ripunzip at commit ca71fa12e7510b9d48b55ccd6a9dfe051291ca42. I haven’t repeated the runs enough to make them statistically valid, because I don’t want to put too much load on the source HTTP server. The project does include some formal cargo criterion benchmarks, but they don’t adequately represent real-world network conditions.)

There are some interesting results here!

First, the good news about unzipping local files: on a fast Linux VM, ripunzip unzips this file in 9 seconds, versus 94 seconds for unzip (roughly ten times faster). On a Windows VM, it takes 52 seconds versus 165 seconds for 7z (about three times faster).

However, this improvement isn’t universal. On a slow Linux VM, ripunzip is actually slower: 173 seconds against 147 for unzip. I can’t fully explain this. It seems to be something to do with the VM’s storage backend struggling to cope with lots of parallel requests; perhaps it’s actual physical hard disk heads seeking somewhere in the cloud? (Wow. I didn’t anticipate having to make this tool work efficiently on chunks of old-fashioned metal.) I’m not sure this is the explanation, but slow write speeds do seem to be involved.

When unzipping directly from a URI, ripunzip generally seems to be bandwidth-limited. In a good way! We complete the unzip in just a few seconds more than it would have taken to download the zip file in the first place using curl, so effectively we get the unzipping “for free”. This doesn’t apply on Windows, possibly because I used Chrome to download the zip file for comparison (curl wasn’t available). But ripunzip’s unzip speed is so much quicker on this Windows machine that it’s still advantageous.
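To make “for free” concrete, here is a small Rust sketch that simply redoes the arithmetic from the results chart: for each machine, the total time for downloading and then unzipping with traditional tools, and how many seconds ripunzip uri needed beyond the download alone. The `Run` struct and function names are illustrative only, and the figures are the single, unrepeated measurements reported in this post, so treat the output as indicative.

```rust
/// One row of the results chart (all times in seconds).
struct Run {
    machine: &'static str,
    download: u32,     // curl (Chrome on the Windows VM)
    unzip: u32,        // unzip (7z on the Windows VM)
    ripunzip_uri: u32, // download and unzip in one go
}

const RUNS: [Run; 4] = [
    Run { machine: "Fast Linux VM", download: 110, unzip: 94, ripunzip_uri: 119 },
    Run { machine: "Slow Linux VM", download: 88, unzip: 147, ripunzip_uri: 115 },
    Run { machine: "Mac", download: 48, unzip: 76, ripunzip_uri: 51 },
    Run { machine: "Windows VM", download: 41, unzip: 165, ripunzip_uri: 92 },
];

/// Time for the traditional "download, then unzip" sequence.
fn sequential_secs(r: &Run) -> u32 {
    r.download + r.unzip
}

/// How much longer `ripunzip uri` took than the download alone.
/// A small value means the combined operation is bandwidth-limited.
fn overhead_secs(r: &Run) -> u32 {
    r.ripunzip_uri.saturating_sub(r.download)
}

fn main() {
    for r in &RUNS {
        println!(
            "{}: download then unzip {}s, ripunzip uri {}s ({}s beyond the download alone)",
            r.machine,
            sequential_secs(r),
            r.ripunzip_uri,
            overhead_secs(r)
        );
    }
}
```

On the fast Linux VM and the Mac, ripunzip uri finishes within 9 and 3 seconds of the download alone; on the slow Linux VM the gap is 27 seconds, and on the Windows VM 51 seconds, which is the Windows exception mentioned above.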

One particularly interesting point is that ripunzip uri is quicker than ripunzip file on the slow Linux VM (115 seconds versus 173). I guess seeking disk heads between the zip file and the unzipped files really is super slow, whereas reading from a remote HTTP server is actually quicker (!)

What does this mean for Chromies reproducing security bugs? Well, I need to package the tool and make it available on all of our testing environments, but overall it means a minute or two saved per Chromium build we have to download… (so long as you’re using an SSD?) That actually matters when we have to download several builds, several times a day. It will be interesting to see if these results scale up to the 36GB UBSAN builds, and whether this tool can make some of our automated systems quicker too. (I think I’ll find out quite quickly whether those systems are backed by SSDs or hard disks…)

You can try all this with cargo install ripunzip.

Adrian Taylor

Ade works on Chrome at Google, and likes mountain biking, climbing, snowboarding, and usually his kids. All opinions are my own.