Why zip rules the (compression) world

Pascal Riesinger
Slashkeys Engineering
5 min read · Apr 10, 2017

Let’s face it: everyone knows what a .zip file is, even my granny (and she is 75). It is the compression format of choice for lots of services. Download something from Google Drive? It’ll be a zip archive. Use ownCloud or even Nextcloud? Download dem zips! Right-click a folder in Windows and choose “send via email”, and you guessed right: the folder will be zip-compressed.

But why the hell does everyone use zip?

I needed to investigate! So I dug around and collected some of the better-known compression utilities for an apples-to-apples comparison. The list now includes:

  • Zip (I mean… Obviously)
  • Gzip
  • Bzip2
  • XZ
  • LZMA (actually XZ’s predecessor)
  • 7Zip

Next, I prepared three folders containing different file types. The first folder, my small little archive, consists of some random DOCXs, some PDFs, images and some binary files, so it is a rather good representation of something a random guy in an office would compress and send by email. My second folder consists of over 300 MP3s, ~20 FLACs and one OGG. As MP3 is already compressed and all of those are binary formats, I don’t expect the folder to compress well. Last but not least, the third folder is some of my school work: mostly DOCXs, some XLSXs, 2–3 ODTs and 2 MP4s.

Without further ado, let’s get the bit-crunching madness started!

The archive folder has an uncompressed size of 275MB (all sizes were read using du -h) and here are the results of the first compression run:

[Chart: archive size in MB per tool for the archive folder (lower is better, obviously)]

So as you can see in the chart, XZ and 7Zip clearly take the lead, ripping zip apart! The chart also makes it obvious that XZ is the successor to LZMA, as it compresses slightly better (and is slightly faster while doing so). But if 7Zip and XZ are so much better than zip, shrinking the archive from zip’s 135MB to 88MB and 87MB respectively, why are we still using zip? The simple answer is: it is fast! While zip took 13.7 seconds to compress the folder, XZ took a whopping 75.7 seconds to do the same thing. That is a huge tradeoff when you just want to quickly send a few files to your colleague. (I will talk about 7Zip’s speed later.)
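
If you want to run this kind of size-versus-time measurement yourself, a minimal sketch could look like the one below. It is not the script used for the numbers above. The helper name `bench` is made up for illustration, and timing via `date +%s` only resolves whole seconds. It covers the stream compressors (gzip, bzip2, xz, lzma), which all read from stdin when called with `-c`. zip and 7z archive folders directly and would need their own invocations, such as `zip -r out.zip dir` or `7z a out.7z dir`.

```shell
#!/bin/sh
# Hypothetical helper: tar a folder, pipe it through one stream compressor,
# and report the archive size in bytes plus the elapsed whole seconds.
bench() {
    dir=$1
    tool=$2
    # Skip tools that are not installed instead of failing.
    command -v "$tool" >/dev/null 2>&1 || { echo "$tool: not installed"; return 0; }
    out=$(mktemp)
    start=$(date +%s)
    tar cf - "$dir" | "$tool" -c > "$out"
    end=$(date +%s)
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$out") ))
    rm -f "$out"
    echo "$tool: $size bytes in $((end - start)) s"
}

# Example (assumes a folder named archive/ exists):
#   for t in gzip bzip2 xz lzma; do bench archive/ "$t"; done
```

All tools run at their default compression level here, which is an assumption on my side. Raising or lowering the level changes both numbers.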

Now on to the music folder, which has an uncompressed size of 6844MB. I don’t expect much savings, as MP3 is a highly optimized file format. So let’s see:

[Chart: archive size per tool for the music folder. Again, sizes are in MB]

My hypothesis was confirmed: MP3s don’t compress well. Actually, I think the savings came from the few FLACs and the OGG file (I don’t know what those formats look like). Still, 7Zip took the lead, saving about 378MB, while zip saved 334MB. XZ lies right between 7Zip and zip with about 357MB of savings. However, when I looked at the compression time, my eyes almost popped out: zip took 4.35 minutes to compress the files, but XZ took over 44 minutes! Geez, XZ is slow! (Or at least I thought so.)
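
The effect is easy to reproduce in isolation. The sketch below compresses two 1MB files with gzip: one full of repeated text, one full of random bytes. The random file is only a stand-in for already-compressed audio (that is my assumption, not a real MP3), and `ratio` is a made-up helper name.

```shell
#!/bin/sh
# Print the gzip-compressed size of a file as a percentage of its original size.
ratio() {
    orig=$(wc -c < "$1")
    comp=$(gzip -c "$1" | wc -c)
    awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.1f\n", 100 * c / o }'
}

tmp=$(mktemp -d)
# 50000 lines of 20 bytes each: highly repetitive input.
awk 'BEGIN { for (i = 0; i < 50000; i++) print "the quick brown fox" }' > "$tmp/text.bin"
# Random bytes: a stand-in for data that has already been compressed.
head -c 1000000 /dev/urandom > "$tmp/random.bin"

echo "repetitive: $(ratio "$tmp/text.bin")% of original size"
echo "random:     $(ratio "$tmp/random.bin")% of original size"
rm -rf "$tmp"
```

The repetitive file should shrink to a tiny fraction of its size, while the random one stays at roughly 100%, because there are no patterns left for the compressor to exploit.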

Now on to the last test: My school documents. The folder has a size of 35 MB uncompressed.

[Chart: archive size in MB per tool for the school folder. As always, XZ and 7Zip compressed the best]

Again, Zip, Gzip and BZip2 were the least efficient compression formats with 32MB each (so only saving 3MB). XZ, LZMA and 7Zip each saved 13MB! I know reading this is repetitive, but Zip was the fastest tool, taking 1.1 seconds, while XZ took around 8.7 seconds.
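
To compare the three runs side by side, it helps to turn the raw figures into percentages saved and throughput. The sketch below only does arithmetic on numbers quoted in this post. The helper names `pct_saved` and `mb_per_s` are made up, and the compressed music size of 6510MB is derived from the quoted 334MB of savings.

```shell
#!/bin/sh
# Percentage of the original size that compression saved.
pct_saved() {  # usage: pct_saved ORIGINAL_MB COMPRESSED_MB
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.0f\n", 100 * (o - c) / o }'
}

# Input processed per second of compression time.
mb_per_s() {   # usage: mb_per_s ORIGINAL_MB SECONDS
    awk -v o="$1" -v s="$2" 'BEGIN { printf "%.1f\n", o / s }'
}

echo "archive, zip: $(pct_saved 275 135)% saved at $(mb_per_s 275 13.7) MB/s"
echo "archive, xz:  $(pct_saved 275 87)% saved at $(mb_per_s 275 75.7) MB/s"
echo "music,   zip: $(pct_saved 6844 6510)% saved"
echo "school,  zip: $(pct_saved 35 32)% saved at $(mb_per_s 35 1.1) MB/s"
echo "school,  xz:  $(pct_saved 35 22)% saved at $(mb_per_s 35 8.7) MB/s"
```

Seen this way, on the archive folder XZ saves about 68% against zip's 51%, but it processes roughly 3.6 MB/s against zip's 20.1 MB/s.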

Why is 7Zip so much faster than everything else?

As promised, I want to say something about compression speed now, as 7Zip was always faster than XZ or even BZip2, which surprised me. While monitoring my CPU usage, I found the answer to my question: the p7zip tool is multi-threaded by default, while none of the others are! As I did not want to test everything again, I just ran XZ over my music folder again, this time using all 8 cores of my FX-8320. I first needed to tar the directory, as XZ can only compress single files (like Gzip, Bzip2 and LZMA). So I ran:

time tar cf - music/ | xz -zcT 8 > music.tar.xz

And there we go! We are down to 7.1 minutes, which is just one minute slower than 7Zip with 6 minutes. However, the single-threaded zip still pulls ahead with 4.35 minutes.
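
If you do not want to hard-code the thread count, a small wrapper can look it up for you. This is only a sketch: `cores` and `tar_xz` are made-up helper names, and `getconf _NPROCESSORS_ONLN` is assumed to be available (it is on common Linux and macOS systems). xz 5.2 and later also accept `-T 0`, which the man page describes as using as many threads as there are CPU cores.

```shell
#!/bin/sh
# Count online CPU cores; fall back to 1 if the system cannot report it.
cores() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    echo "${n:-1}"
}

# Hypothetical wrapper: tar a folder and compress it with xz on N threads.
# usage: tar_xz DIR OUTPUT [THREADS]   (THREADS defaults to all cores)
tar_xz() {
    tar cf - "$1" | xz -z -c -T "${3:-$(cores)}" > "$2"
}

# Examples (assume a folder named music/ exists):
#   tar_xz music/ music.tar.xz        # all cores
#   tar_xz music/ music-1t.tar.xz 1   # single thread, for comparison
```

Running both variants over the same folder shows how much of the speedup comes from threading alone. The two archives may differ slightly in size, since threaded xz splits the input into independently compressed blocks.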

Summary time

The TL;DR version is: Keep using zip if you want fast but reasonably good compression, use 7Zip if you actually care about archive size.

The longer version is: if you do not really care all that much about the actual file size, but want to send the archive to someone, keep using zip, as it is widely available and very fast. It does not produce the most efficient archives, but whatever. However, if you want minimal archive size and do not care how long it takes to compress everything, you should definitely consider using 7Zip. It is rapidly gaining popularity, so every half-tech-savvy person will have it installed. You might also use XZ, but don’t forget to use all threads, as it gets painfully slow otherwise! I managed to squeeze my music folder into even fewer bits with XZ like so:

tar cf - music/ | xz -zcT8 -9 --extreme > music.tar.xz

This beat the 7Zip archive by 12MB. However, it needed almost 12GB of RAM and took 11 minutes, which, in my opinion, is not worth your time and processing power.

Thanks for reading my post! If you enjoyed it, leave a 💚! Also, ping me on Twitter 🙂
