How Can You Increase I/O Performance With Streams?

Sitvanit Meltzer
Published in Autodesk TLV
Jun 26, 2019

As backend developers, we face read/write data challenges all the time. More data means more challenges.

In this blog post I will introduce a method that I've found to be the most efficient way to read and write large amounts of data, and that will help you increase your apps' I/O performance.

I will explain the stream concept in general, and how you should use it in Node.js in particular.

The Offline Challenge From The Backend Side — A True Story

Any app that respects itself must provide offline support to its users, especially if those users might be using the app somewhere without Wi-Fi, such as construction engineers on a construction site.

When users are offline, they cannot fetch their relevant data from the server. Therefore, all of that data has to already be on their device (cell phone, tablet, etc.).

The app developer's challenge is to deliver all of that data, in advance, as efficiently as possible.

The most trivial solution for that challenge is to provide a simple GET collection endpoint. The client calls that endpoint every now and then so that all of the data ends up on the user's device. The endpoint returns all of the data with pagination, so the more data the user has, the more requests the client sends to fetch all the pages.

The downside of that solution is that the client sends as many requests as there are pages, which may drain the user's battery or data plan, and can overload our servers.

That solution may fit small services with a small amount of data, but users' data grows every day. Hence, if you would like to preserve resiliency and scalability, you should not use it.

So, you might be asking, what should we do then?

There are many possible solutions to that challenge, and I'm going to share one of them with you, using streams.

The challenge was to send a large amount of data to the clients without using a simple CRUD endpoint (GET collection with pagination).

Each time the user logs in, the client sends a POST request. That request triggers an async operation in the backend, which retrieves the user's data from the database and creates compressed files from it (with Brotli, a data format specification for compressed data streams). Each compressed file is a chunk of data that is sent in a request (a request is a stream) to a file server.
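
To make that concrete, here is a minimal sketch of such a backend pipeline in Node.js. The getUserDataStream helper, the file server host, and the URL path are assumptions for illustration, not the actual implementation; the point is that the database read, the Brotli compression, and the upload request are all streams connected into one pipeline.

const zlib = require('zlib');
const https = require('https');
const { pipeline } = require('stream');

function uploadUserChunk(userId, chunkId) {
  // Hypothetical helper that returns a readable stream of one chunk of the user's records.
  const source = getUserDataStream(userId, chunkId);
  const compress = zlib.createBrotliCompress();
  const upload = https.request({
    method: 'PUT',
    host: 'fileserver.example.com', // placeholder file server
    path: `/files/${userId}/${chunkId}.br`,
    headers: { 'Content-Encoding': 'br' },
  });

  // Database -> Brotli compression -> file server, chunk by chunk,
  // without ever holding the whole dataset in memory.
  pipeline(source, compress, upload, (err) => {
    if (err) console.error('upload failed', err);
    else console.log('chunk uploaded', chunkId);
  });
}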

Now the client can poll the server with GET requests. The polling response contains the status of the file upload (IN_PROCESS/COMPLETED) and the identifiers of the files that have already been uploaded.

The client doesn't need to wait until the server finishes uploading; it can start downloading the files using the identifiers it got from the file server.
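
For illustration, a minimal client-side sketch of that polling loop. The endpoint path, the response fields (status, uploadedFileIds), the poll interval, and the downloadFile helper are assumptions, not the actual API:

async function syncOfflineData(userId) {
  const downloaded = new Set();
  let status = 'IN_PROCESS';

  while (status !== 'COMPLETED') {
    const res = await fetch(`/api/users/${userId}/export-status`);
    const body = await res.json(); // e.g. { status: 'IN_PROCESS', uploadedFileIds: [...] }
    status = body.status;

    // Start downloading files as soon as their identifiers appear,
    // without waiting for the whole upload to finish.
    for (const fileId of body.uploadedFileIds) {
      if (!downloaded.has(fileId)) {
        downloaded.add(fileId);
        downloadFile(fileId); // hypothetical download helper
      }
    }

    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}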

This way, we are not wasting the client's or the server's resources.

Now, let’s understand the stream magic!

A Brief History

Streams were introduced in Unix, decades ago.

I believe you have all used this kind of command in the terminal:

history | grep something

And that's without even knowing that you were actually using streams. The pipe takes the output (STDOUT) of one process and transfers it as the input (STDIN) of the next process. STDOUT and STDIN are both streams, and the pipe connects them.
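
The same idea exists in Node.js: process.stdin and process.stdout are streams, and pipe() connects them. This one-liner simply echoes whatever it receives on its standard input:

// usage: cat some-file | node echo.js
process.stdin.pipe(process.stdout);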

File I/O Challenges

If you want to read and write a file, you have a few ways of doing it in Node.js.

We can do it synchronously:

const fs = require('fs');

const data = fs.readFileSync('file');
fs.writeFileSync('file-copy', data);

But if you don't want to wait for those operations to finish before moving on with the script, you will choose the asynchronous way:

fs.readFile('file', (err, data) => {
  fs.writeFile('file-copy', data, () => {
    console.log('file saved');
  });
});

Seemingly, everything works now, and performs well too.

But have you ever asked yourself where the file is kept until it's written? In both cases, synchronous and asynchronous, the whole file is held in your RAM. And even before you hit your RAM limit, you hit Node's default memory limit, which is around ~1.4 GB. That means that if you try reading a very big file, you might degrade your system's performance, or even disrupt your program and crash it.
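
If you want to see it for yourself, here is a small sketch (the file path is a placeholder) that measures how much the process memory grows when a whole file is buffered at once:

const fs = require('fs');

const before = process.memoryUsage().rss;
const data = fs.readFileSync('big-file'); // the entire file is buffered in RAM
const after = process.memoryUsage().rss;

console.log(`buffered ${data.length} bytes, process memory grew by ${after - before} bytes`);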

Streams To The Rescue

Luckily, Node has a way to read and write that file in chunks, rather than as a whole unit.

fs.createReadStream('file')
.pipe(fs.createWriteStream('file-copy'));

OK, so we found a way. You can copy-paste it, we can finish this post, and you can continue with your life. But if you don't understand what happens behind the scenes, you might miss some important things.
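
One such thing, for example: pipe() on its own doesn't propagate errors between the streams, so a failed read can leave the destination open. Here is a sketch of the same copy using stream.pipeline (available since Node 10), which forwards errors from both streams into one callback and cleans up after them:

const fs = require('fs');
const { pipeline } = require('stream');

pipeline(
  fs.createReadStream('file'),
  fs.createWriteStream('file-copy'),
  (err) => {
    if (err) console.error('copy failed', err);
    else console.log('copy finished');
  }
);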

TL;DR

Streams are a very powerful tool for transferring data, if you know how to use them and understand how they work.

You will find everything you should know about streams in Node.js in my blog post:

https://medium.com/autodesk-tlv/streams-in-depth-in-node-js-c8cc7f1eb0d6
