Writing memory-efficient software applications in Node.js

A building designed for avoiding air streams (https://pixelz.cc)

A software application runs in the computer’s primary memory, which we call Random Access Memory (RAM). JavaScript, especially Node.js (server-side JS), allows us to write anything from small to very large software projects for end users. Dealing with a program’s memory is always tricky because a poor implementation can starve all other applications running on the same server or system. C and C++ programmers take care of memory management because of the devilish memory leaks that lurk in every corner of the code. But what about JS developers? Do you bother about it?

Since JS developers usually do web server programming on a dedicated server with high capacity, they may not notice the lag in multitasking. Even in web server development, we run multiple applications such as a database server (MySQL), a cache server (Redis), and other software our application requires. We need to be aware that they too consume the available primary memory. If we write applications carelessly, they can degrade the performance of other processes or deny them memory allocation altogether. In this article, we look at Node.js constructs such as streams, buffers, and piping by solving a problem, and see how they allow us to write memory-efficient applications.

We use Node.js v8.12.0 to run the programs. All the code samples we are going to present are available here.

Problem: Huge file copy

If anyone is asked to write a file copy program in Node.js, they quickly jump in and write one like this.

This program creates handles for reading one file and writing another from the given file names, and writes the data to the write handle once it has been read. It works on small files.
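A minimal sketch of such a naive copy program, assuming it is built on fs.readFile and fs.writeFile (which matches the readFileAfterStat frame in the stack trace shown further down). The function name naiveCopy and the temporary demo paths are our own placeholders, not the original listing.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Naive copy: the entire source file is loaded into ONE Buffer before
// anything is written, so memory use grows with the file size.
function naiveCopy(source, destination, done) {
  fs.readFile(source, (err, data) => {
    if (err) return done(err); // a file over the Buffer limit fails here
    fs.writeFile(destination, data, done);
  });
}

// Demo on a small temporary file (hypothetical paths, for illustration only).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'naive-copy-'));
const src = path.join(dir, 'small.txt');
const dest = path.join(dir, 'small-copy.txt');
fs.writeFileSync(src, 'hello streams');

naiveCopy(src, dest, (err) => {
  if (err) throw err;
  console.log('Copied:', fs.readFileSync(dest, 'utf8'));
});
```

For a command-line version, the two paths would come from process.argv[2] and process.argv[3] instead of the temporary files.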

Let us say our application copies a huge file (> 4 GB) as part of a backup process. I have an Ultra HD 4K movie file that is 7.4 GB in size. I try to run the above program to copy this big file from my current directory to Documents.

$ node basic_copy.js cartoonMovie.mkv ~/Documents/bigMovie.mkv

I get this nice buffer error on Ubuntu (Linux).

/home/shobarani/Workspace/basic_copy.js:7
if (err) throw err;
^
RangeError: File size is greater than possible Buffer: 0x7fffffff bytes
at FSReqWrap.readFileAfterStat [as oncomplete] (fs.js:453:11)

As you can see, the read operation fails because Node.js allows at most 0x7fffffff bytes (about 2 GB) to be read into a single buffer and no more. How do we overcome that? When you are doing I/O-intensive operations (copy, process, zip), it is better to take system memory into account.
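The limit can also be checked before attempting a whole-file read. The following is a small sketch, assuming Node.js v8.2 or later, where the buffer module exposes constants.MAX_LENGTH; the exact value differs across Node.js versions and architectures, and newer versions may apply a separate limit to fs.readFile. The helper name exceedsBufferLimit is hypothetical.

```javascript
const fs = require('fs');
const { constants } = require('buffer');

// Returns true when a file is too large to fit into one Buffer,
// which means it has to be processed in chunks instead.
function exceedsBufferLimit(filePath) {
  return fs.statSync(filePath).size > constants.MAX_LENGTH;
}

console.log('Largest single Buffer:', constants.MAX_LENGTH, 'bytes');

// The Node.js binary itself is used here only as a file that surely exists.
console.log('Too large to read at once:', exceedsBufferLimit(process.execPath));
```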

Streams and Buffers in Node JS

To overcome the above problem, we need a mechanism for breaking the large data into multiple chunks and a data structure to hold those chunks. A buffer is a data structure that stores binary data. Next, we need a way to read and write chunks systematically. Streams provide that functionality.

Buffers

We can easily create a buffer by initializing the Buffer object.

let buffer = new Buffer(10); // 10 is the size of the buffer in bytes
console.log(buffer); // prints <Buffer 00 00 00 00 00 00 00 00 00 00>

The Buffer constructor is deprecated in newer versions of Node.js. Buffer.alloc (available since v5.10) is the recommended way to do the same thing.

let buffer = Buffer.alloc(10);
console.log(buffer); // prints <Buffer 00 00 00 00 00 00 00 00 00 00>

If we already have some data, such as a string or an array, we can create a buffer from it like this.

let name = 'Node JS DEV';
let buffer = Buffer.from(name);
console.log(buffer); // prints <Buffer 4e 6f 64 65 20 4a 53 20 44 45 56>

Buffers have a few important methods, such as buffer.toString() and buffer.toJSON(), for looking into the data stored in them.
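As a quick illustration of those two methods, here is a short sketch that reuses the 'Node JS DEV' string from above; the variable names are our own.

```javascript
const buffer = Buffer.from('Node JS DEV');

// toString() decodes the bytes back into text (UTF-8 by default).
const text = buffer.toString();
console.log(text); // prints Node JS DEV

// An encoding argument changes the representation, e.g. hexadecimal.
const hex = buffer.toString('hex');
console.log(hex); // prints 4e6f6465204a5320444556

// toJSON() exposes the raw byte values as a plain object.
const json = buffer.toJSON();
console.log(json.type, json.data.length); // prints Buffer 11
```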

We don’t create raw buffers ourselves in our journey to optimize code. Node.js and the V8 engine do that for us by creating internal buffers (queues) while working with streams or network sockets.

Streams

In simple terms, a stream is like a sci-fi portal on a Node.js object. In computer networking, ingress means incoming traffic and egress means outgoing traffic. We use these terms from here on.

There are four types of streams available:

  • Readable streams (you can read data from them)
  • Writable streams (you can feed data into them)
  • Duplex streams (they are open to both read and write)
  • Transform streams (custom duplex streams that process data, for example by compressing or validating it, between ingress and egress)
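To see how these types fit together, here is a minimal sketch built on the core stream module. The upper-casing transform and the names source, upperCase and sink are purely illustrative assumptions.

```javascript
const { Readable, Writable, Transform } = require('stream');

// Readable: the source we read chunks from. Data is pushed in manually below.
const source = new Readable({ read() {} });

// Transform: a duplex stream that changes data between ingress and egress.
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  }
});

// Writable: the sink we feed chunks into.
const received = [];
const sink = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    callback();
  }
});

sink.on('finish', () => console.log(received.join(''))); // prints NODE STREAMS

source.pipe(upperCase).pipe(sink);
source.push('node ');
source.push('streams');
source.push(null); // null signals the end of the data
```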

This single line can tell precisely why one should use streams.

A vital goal of the stream API, particularly the stream.pipe() method, is to limit the buffering of data to acceptable levels such that sources and destinations of differing speeds don’t choke the available memory.

You need some way to do the operation without overwhelming the system. That is what we talked about at the start of this article.

Courtesy: Node JS Docs

In the above diagram, we have two types of streams: readable and writable. The .pipe() method is a basic primitive for attaching a readable stream to a writable stream. If you don’t understand the diagram yet, that is fine. After seeing our examples, you can come back here and everything will make sense. Piping is a compelling mechanism, and below we illustrate it with two examples.

Solution 1 (Naive file copy with streams)

Let us devise a solution to overcome the huge file copy problem that we discussed earlier. To make it possible, we can create two streams and implement this procedure.

  1. Listen for a data chunk on the readable stream
  2. Write that chunk to the writable stream
  3. Track the progress of the copy operation

Let us name the program streams_copy_basic.js

Streams without piping

In this program, we ask the user to input two files (source and destination) and create two streams to copy the chunks from the readable source to the writable destination. We declare a few more variables to keep track of progress and print it to the standard output (the console here). We subscribe to a few events:

‘data’: fires when a data chunk is read

‘end’: fires when all chunks from the readable stream have been read

‘error’: fires if there is any problem in the reading process
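A minimal sketch of a program along these lines, assuming fs.createReadStream and fs.createWriteStream as the two streams. The function name copyWithoutBackpressure, the progress format and the temporary demo file are our own placeholders rather than the original listing.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Copies chunk by chunk but ignores back pressure: write() is called for
// every chunk, even when the writer has fallen behind the reader.
function copyWithoutBackpressure(source, destination, done) {
  const readable = fs.createReadStream(source);
  const writable = fs.createWriteStream(destination);
  const total = fs.statSync(source).size;
  let copied = 0;

  readable.on('data', (chunk) => {
    writable.write(chunk); // return value ignored on purpose
    copied += chunk.length;
    const percent = total === 0 ? 100 : (copied / total) * 100;
    process.stdout.write(`\rCopied ${percent.toFixed(1)}%`);
  });
  readable.on('end', () => writable.end());
  readable.on('error', done);
  writable.on('error', done);
  writable.on('finish', () => done(null, copied));
}

// Demo on a 1 MiB temporary file (hypothetical paths).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'basic-copy-'));
const src = path.join(dir, 'input.bin');
const dest = path.join(dir, 'output.bin');
fs.writeFileSync(src, Buffer.alloc(1024 * 1024, 'a'));

copyWithoutBackpressure(src, dest, (err, copied) => {
  if (err) throw err;
  console.log(`\nFinished, ${copied} bytes copied`);
});
```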

Run this program, and we can successfully copy a big file (7.4 GB in my case).

$ time node streams_copy_basic.js cartoonMovie.mkv ~/Documents/4kdemo.mkv

However, there is a problem. Observe the memory used by the Node.js process in the activity/process monitor on your machine.

See the memory used by Node process alone at 88% copy

4.6 GB? The RAM usage of our file copy program, in this case, is insane and can potentially block other applications.

Why is it happening?

If you observe the read and write rates of the disk in the above picture, something catches your eye.

Disk Read: 53.4 MiB/s

Disk Write: 14.8 MiB/s

It means the producer is producing data faster than the consumer can keep up with. To hold on to the chunks that have been read but not yet written, the process stores the excess data in the machine’s RAM. That is the reason for the spike in RAM usage.
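This mismatch can be observed directly in code. The following is a minimal sketch, assuming a deliberately slow writable stream (slowWriter is a hypothetical stand-in for the disk) with a tiny 16-byte highWaterMark. write() returns false once the internal buffer holds at least highWaterMark bytes, yet nothing stops us from calling it again, and every extra chunk stays queued in memory.

```javascript
const { Writable } = require('stream');

// A consumer that needs 10 ms per chunk, with a 16-byte internal buffer limit.
const slowWriter = new Writable({
  highWaterMark: 16,
  write(chunk, encoding, callback) {
    setTimeout(callback, 10);
  }
});

// 8 bytes queued: still below the 16-byte mark, so write() returns true.
const first = slowWriter.write(Buffer.alloc(8));

// 24 bytes queued: at or above the mark, so write() returns false.
// The chunk is accepted anyway and waits in RAM until the writer catches up.
const second = slowWriter.write(Buffer.alloc(16));

console.log(first, second); // prints true false
slowWriter.end();
```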

This program ran for 3 minutes 16 seconds on my machine.

17.16s user 25.06s system 21% cpu 3:16.61 total

Solution 2 (File copy with streams and automatic back pressure)

To overcome the above problem, we can modify our program so that the reading and writing speeds are matched automatically. This mechanism is called back pressure. We don’t need to do much: just pipe the readable stream into a writable stream, and Node.js takes care of applying back pressure.

Let us name the program streams_copy_efficient.js

Here, we replaced the chunk writing operation with a single statement.

readable.pipe(writable); // Auto pilot ON!

The pipe is the reason for all the magic that is about to happen. It balances the read and write speeds, so the data waiting to be written does not choke the memory (RAM).
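A minimal sketch of such a pipe-based copy, assuming fs.createReadStream and fs.createWriteStream as the two streams. The function name copyWithPipe and the temporary demo file are our own placeholders.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Copies a file by piping: pipe() pauses the reader whenever the writer's
// internal buffer is full and resumes it once the buffer has drained.
function copyWithPipe(source, destination, done) {
  const readable = fs.createReadStream(source);
  const writable = fs.createWriteStream(destination);

  readable.on('error', done);
  writable.on('error', done);
  writable.on('finish', () => done(null));

  readable.pipe(writable);
}

// Demo on a 1 MiB temporary file (hypothetical paths).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pipe-copy-'));
const src = path.join(dir, 'input.bin');
const dest = path.join(dir, 'output.bin');
fs.writeFileSync(src, Buffer.alloc(1024 * 1024, 'b'));

copyWithPipe(src, dest, (err) => {
  if (err) throw err;
  console.log('Copied', fs.statSync(dest).size, 'bytes');
});
```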

Now run the program:

$ time node streams_copy_efficient.js cartoonMovie.mkv ~/Documents/4kdemo.mkv

We are copying the same big file (7.4 GB) this time too. Let us see how the memory trends are.

pipe is a magic wand in Node

Wow! Now the Node process is consuming only 61.9 MiB of RAM. If you observe the read and write rates to the disk:

Disk Read: 35.5 MiB/s

Disk Write: 35.5 MiB/s

At any given time, the read and write speeds are the same because of back pressure. As a bonus, this optimized program ran 13 seconds faster than the previous one.

12.13s user 28.50s system 22% cpu 3:03.35 total

That is a 98.68% decrease in memory load, and with a shorter execution time too, thanks to Node.js streams and piping. That is why we said the pipe is a powerful construct.
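To make the idea of back pressure concrete, here is a rough sketch of the kind of bookkeeping pipe does on our behalf: stop reading when write() reports a full buffer, and continue on the 'drain' event. It is a simplification under our own assumptions, not the actual implementation inside Node.js, and copyWithManualBackpressure is a hypothetical name.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

function copyWithManualBackpressure(source, destination, done) {
  const readable = fs.createReadStream(source);
  const writable = fs.createWriteStream(destination);

  readable.on('data', (chunk) => {
    // write() returns false when the writer's internal buffer is full.
    if (!writable.write(chunk)) {
      readable.pause(); // stop producing
      writable.once('drain', () => readable.resume()); // continue when flushed
    }
  });
  readable.on('end', () => writable.end());
  readable.on('error', done);
  writable.on('error', done);
  writable.on('finish', () => done(null));
}

// Demo on a 1 MiB temporary file (hypothetical paths).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'manual-bp-'));
const src = path.join(dir, 'input.bin');
const dest = path.join(dir, 'output.bin');
fs.writeFileSync(src, Buffer.alloc(1024 * 1024, 'c'));

copyWithManualBackpressure(src, dest, (err) => {
  if (err) throw err;
  console.log('Copied', fs.statSync(dest).size, 'bytes');
});
```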

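Note that 61.9 MiB is the memory footprint of the whole Node.js process, not the size of a single chunk; the internal buffer of a file read stream is much smaller (its highWaterMark defaults to 64 KiB). We can choose a custom chunk size with the highWaterMark option, or pull a specific number of bytes in paused mode with the read method on the readable stream.

const readable = fs.createReadStream(fileName, { highWaterMark: no_of_bytes_size });
readable.read(no_of_bytes_size); // paused mode: returns null until enough bytes are buffered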
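The chunk size a file read stream works with can be checked with a small experiment. This is a sketch, assuming the highWaterMark option of fs.createReadStream; the 10 KiB temporary file and the 4 KiB limit are arbitrary values chosen for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a 10 KiB temporary file to read from (hypothetical path).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'chunk-size-'));
const file = path.join(dir, 'data.bin');
fs.writeFileSync(file, Buffer.alloc(10 * 1024));

// Ask the stream to buffer at most 4 KiB at a time.
const limit = 4 * 1024;
const chunkSizes = [];
const readable = fs.createReadStream(file, { highWaterMark: limit });

readable.on('data', (chunk) => chunkSizes.push(chunk.length));
readable.on('end', () => {
  // No chunk is larger than the limit, and together they add up to 10 KiB.
  console.log('Chunk sizes:', chunkSizes);
});
```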

Beyond copying files locally, this technique can be used to optimize many things that deal with I/O:

  • A data stream that is coming from Kafka and going into a database.
  • A data stream coming from a file system, compressed on the fly and written to a disk.
  • Many more…
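The compression case from the list above can be sketched with the core zlib module, whose gzip stream is a transform stream that sits between the reader and the writer. The function name compressFile and the temporary demo paths are our own assumptions.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const zlib = require('zlib');

// Reads a file, gzips it on the fly and writes the result to disk.
// Only a few chunks are in memory at any time, whatever the file size.
function compressFile(source, destination, done) {
  const readable = fs.createReadStream(source);
  const gzip = zlib.createGzip();
  const writable = fs.createWriteStream(destination);

  readable.on('error', done);
  gzip.on('error', done);
  writable.on('error', done);
  writable.on('finish', () => done(null));

  readable.pipe(gzip).pipe(writable);
}

// Demo on a small, highly repetitive temporary file (hypothetical paths).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'gzip-copy-'));
const src = path.join(dir, 'log.txt');
const dest = path.join(dir, 'log.txt.gz');
const text = 'memory-efficient streams\n'.repeat(1000);
fs.writeFileSync(src, text);

compressFile(src, dest, (err) => {
  if (err) throw err;
  console.log(fs.statSync(src).size, 'bytes in,', fs.statSync(dest).size, 'bytes out');
});
```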

Source Code (Git)

Want to test this whole thing on your machine? You can find all the above code samples in my git repository.

Conclusion:

My primary motivation for writing this article is to show how quickly we can write poorly performing programs even though Node.js provides us with a great API. If we pay more attention to the built-in tools, we can change the way our software runs.

You can find more about back-pressure here.

Hope you enjoyed this article. If you have any queries, please do comment either here or on my twitter wall. https://twitter.com/@Narenarya3

Have a nice day :)

References: