Zipping and unzipping files with NodeJS

Harriet Ryder
Jul 10, 2018

A sack being unzipped to reveal… potatoes

Imagine you have a directory of zipped files and you want to unzip them all. You could do this manually by clicking on each one, but you can also do it with a simple NodeJS script. The other day I had to do this with dozens of files downloaded from AWS S3 and, surprisingly, couldn’t find a clear example online of exactly what I wanted, so I decided to write this article.

Setup

If you want to create some zipped files to practice on, follow the instructions below to create a new Node project and the practice files:

$ mkdir zipping-practice

$ cd zipping-practice

$ touch index.js

$ mkdir data

$ echo 'whatever text you want' > data/file1.txt   # this will be one of your practice files… make however many you want

$ gzip -r data/*.txt   # this zips all the files ending in .txt

You will now see that your data directory is full of files ending in .gz, which is a compressed format (gzip). This format is commonly used when compressing data to be sent via HTTP. You can read more about it here but it’s quite boring 💤

The code

Open up index.js in your editor.

We’re going to use a module that comes with Node called zlib, which has a bunch of methods for compressing and decompressing things. We’ll also use the filesystem module (fs) to read and write data from the filesystem (because we need to read the zipped files and write new, unzipped files).

First of all, let’s just unzip one file before working out how to do it for ALL the files:

We bring in the two modules we’ll need and then open our first file using the createReadStream method, which hands the file back to us as a stream rather than reading its whole contents in one go.
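The snippet described here isn’t shown, so what follows is only a minimal sketch of what it might look like, assuming the practice file from the setup step lives at ./data/file1.txt.gz. The error handler is my own addition so that a missing file gets reported instead of crashing the script.

```javascript
const zlib = require('zlib'); // not used yet; it does the unzipping in the next step
const fs = require('fs');

// Assumed path: the practice file created in the setup step.
const fileContents = fs.createReadStream('./data/file1.txt.gz');

// Added for safety: report a missing or unreadable file instead of crashing.
fileContents.on('error', (err) => {
  console.error(`Could not read file: ${err.message}`);
});
```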

If you log fileContents now you will see something like this:

ReadStream {
_readableState:
ReadableState {
objectMode: false,
highWaterMark: 65536,
buffer: BufferList { head: null, tail: null, length: 0 },
length: 0,
pipes: null,
pipesCount: 0,
flowing: null,
ended: false,
...etc

That doesn’t look like the contents of your file though! What is it? Is that what zipped data looks like?

Nope, it’s a “Readable stream”, which is an object (or interface) allowing you to read a stream of binary data. What does that mean? It means that this object will give you the data (i.e. the contents of the file) in chunks, so you can process the file bit by bit and not have to hold the entire file in memory. This is great for big files, but unless you piped loads of text into the file in the steps above, we aren’t going to need our file delivered to us in chunks of binary data.
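If you want to see the chunks for yourself, you can listen for the stream’s 'data' event. This is only an illustrative sketch: countChunks is a made-up helper name, not something that comes with Node, and the path assumes the practice file from the setup step.

```javascript
const fs = require('fs');

// Hypothetical helper: reads a file as a stream and reports how many
// chunks arrived and how many bytes they added up to.
function countChunks(filePath) {
  return new Promise((resolve, reject) => {
    let chunks = 0;
    let bytes = 0;

    fs.createReadStream(filePath)
      .on('data', (chunk) => {
        chunks += 1; // each chunk is a Buffer of binary data
        bytes += chunk.length;
      })
      .on('end', () => resolve({ chunks, bytes }))
      .on('error', reject);
  });
}

// Only runs if the practice file from the setup step exists.
if (fs.existsSync('./data/file1.txt.gz')) {
  countChunks('./data/file1.txt.gz')
    .then(({ chunks, bytes }) => console.log(`${chunks} chunk(s), ${bytes} bytes`))
    .catch((err) => console.error(err));
}
```

A tiny practice file will arrive in a single chunk; a file larger than the highWaterMark shown in the log above will arrive in several.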

Too bad though, because createReadStream gives it to us in chunks (well, one chunk) whether we need that or not. 😖 It works out nicely anyway because, as we’ll see in a minute, the unzipping method we’re going to use works on streams.

BTW this is a pretty great article on streams if you want to know more 🙌

Next up we create yet another stream. Two, in fact: a writeStream (which will allow us to pipe the unzipped data piece by piece into a file) and a gunzip stream, which will actually do the unzipping for us once we give it a stream of data.

So we pipe our file contents like so:

original file → unzip stream → new file
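Put together, the three streams might be wired up something like this. It’s a sketch rather than a definitive implementation: unzipFile is a hypothetical helper name, and the paths assume the practice file from the setup step.

```javascript
const zlib = require('zlib');
const fs = require('fs');

// Hypothetical helper: unzips one .gz file into a new file.
function unzipFile(source, destination) {
  const fileContents = fs.createReadStream(source);
  const writeStream = fs.createWriteStream(destination);
  const unzip = zlib.createGunzip();

  // original file → unzip stream → new file
  // pipe() returns the stream it was given, so this returns writeStream.
  return fileContents.pipe(unzip).pipe(writeStream);
}

// Only runs if the practice file from the setup step exists.
if (fs.existsSync('./data/file1.txt.gz')) {
  unzipFile('./data/file1.txt.gz', './data/file1.txt');
}
```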

If you open file1.txt you should see it contains the same text you put in it earlier.

All the unzipping for all the files

We can do the same as we did above, but for each file in our ./data directory. NB it might be an idea to write your unzipped files to a fresh directory to keep them separate.

Note how we slice off the final .gz of the filename when we create the name of the new file. file1.txt.gz becomes file1.txt
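Here’s a rough sketch of that loop. The unzipAll helper name is hypothetical, and the .gz filter is an extra precaution I’ve assumed so that files which are already unzipped get skipped.

```javascript
const zlib = require('zlib');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: starts unzipping every .gz file in a directory and
// returns the names of the new files being written.
function unzipAll(directory) {
  return fs.readdirSync(directory)
    .filter((filename) => filename.endsWith('.gz')) // skip anything not zipped
    .map((filename) => {
      const newFilename = filename.slice(0, -3); // file1.txt.gz → file1.txt
      const fileContents = fs.createReadStream(path.join(directory, filename));
      const writeStream = fs.createWriteStream(path.join(directory, newFilename));
      const unzip = zlib.createGunzip();

      fileContents.pipe(unzip).pipe(writeStream);
      return newFilename;
    });
}

// Only runs if the practice directory from the setup step exists.
if (fs.existsSync('./data')) {
  unzipAll('./data');
}
```

To write the unzipped files somewhere else, you could pass a second directory into the helper and use it when building the writeStream path.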

This is fine, but if you want to work programmatically with your unzipped files afterwards, you need to know when the process of unzipping has finished. Since writing to the filesystem with our writeStream is asynchronous, we’ll need to listen for an event that tells us when it’s finished, and we’ll need a way of knowing when all the files have been unzipped.

By mapping over the filenames and creating a promise for each one, we can safely know when all of our files have been unzipped. We resolve each promise when we receive the ‘finish’ event from the writeStream, telling us it’s finished writing to the new file.
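A sketch of the promise version, under the same assumptions as before (unzipAll is a hypothetical name, the files live in ./data). The 'error' handlers are my own addition so that a failed stream rejects the promise instead of leaving it hanging.

```javascript
const zlib = require('zlib');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: resolves with the new file paths once every
// .gz file in the directory has been unzipped.
function unzipAll(directory) {
  const filenames = fs.readdirSync(directory)
    .filter((filename) => filename.endsWith('.gz'));

  const promises = filenames.map((filename) => new Promise((resolve, reject) => {
    const newPath = path.join(directory, filename.slice(0, -3));
    const fileContents = fs.createReadStream(path.join(directory, filename));
    const writeStream = fs.createWriteStream(newPath);
    const unzip = zlib.createGunzip();

    // Added for safety: reject if any stream in the chain fails.
    fileContents.on('error', reject);
    unzip.on('error', reject);
    writeStream.on('error', reject);

    // 'finish' fires once all the data has been written to the new file.
    writeStream.on('finish', () => resolve(newPath));

    fileContents.pipe(unzip).pipe(writeStream);
  }));

  return Promise.all(promises);
}

// Only runs if the practice directory from the setup step exists.
if (fs.existsSync('./data')) {
  unzipAll('./data')
    .then((newPaths) => {
      console.log(`Unzipped ${newPaths.length} file(s)`);
    })
    .catch((err) => console.error(err));
}
```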

Then you can continue to do whatever you want in the next .then block 🙂

Zipping it all back up again

Okay, you changed your mind, you want to zip everything back up again.

Luckily, you only need to change a few characters around!
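Roughly speaking, createGunzip becomes createGzip, and instead of slicing .gz off the filename we add it on. A sketch, assuming the directory contains only plain files (zipAll is a hypothetical helper name):

```javascript
const zlib = require('zlib');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: gzips every file in a directory that isn't already
// a .gz file, resolving with the new paths once all have been written.
function zipAll(directory) {
  const filenames = fs.readdirSync(directory)
    .filter((filename) => !filename.endsWith('.gz'));

  const promises = filenames.map((filename) => new Promise((resolve, reject) => {
    const newPath = path.join(directory, `${filename}.gz`); // file1.txt → file1.txt.gz
    const fileContents = fs.createReadStream(path.join(directory, filename));
    const writeStream = fs.createWriteStream(newPath);
    const zip = zlib.createGzip(); // createGzip instead of createGunzip

    // Added for safety: reject if any stream in the chain fails.
    fileContents.on('error', reject);
    zip.on('error', reject);
    writeStream.on('error', reject);
    writeStream.on('finish', () => resolve(newPath));

    fileContents.pipe(zip).pipe(writeStream);
  }));

  return Promise.all(promises);
}

// Only runs if the practice directory from the setup step exists.
if (fs.existsSync('./data')) {
  zipAll('./data')
    .then((newPaths) => console.log(`Zipped ${newPaths.length} file(s)`))
    .catch((err) => console.error(err));
}
```

Unlike the gzip command used in the setup, this sketch leaves the original unzipped files in place.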

So there you have it — zipping and unzipping with NodeJS.

Thanks for reading! Hope you learned something and don’t forget to follow me for regular programming posts 👋

x
