NodeJs Stream Interface Overview

David E Lares S
Published in The Startup
Feb 3, 2021

The main foundation of software is data handling. NodeJs, in my opinion, was designed to build any type of software that works extensively with data streams and buffers. Of course, this is not a new concept in software development; it has been around for a long time in almost every programming language, but NodeJs presents an amazingly fluent API for handling streams.

The NodeJs hype has changed the way people see software, giving direct and detailed control over behavior and portability.

The main goal of this post is to review Writable, Readable, and Duplex streams, the use of the .pipe() method, and how the backpressure mechanism works.

The most common stream interface analogy is “water transportation”: how do you transport water from a source to a destination without losing any? The most logical approach is the use of pipes. In this analogy, the water is our data and the pipe is our stream interface, which handles data transportation efficiently from A to B.

Buffer vs Stream strategy

Both terms are tightly coupled, but the main difference lies in how things are sent. The buffering approach reads the entire source into memory and sends the file all at once, whereas the streaming approach creates a read stream, reads the content bit by bit, and sends each piece to the client as soon as it is received.

Here’s a snippet for the buffering approach.
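
A minimal sketch, assuming an HTTP server that serves a hypothetical video.mp4 file, read entirely into memory:

const fs = require('fs');
const http = require('http');

http.createServer((req, res) => {
  // Read the whole file into memory before sending anything
  fs.readFile('./video.mp4', (err, data) => {
    if (err) {
      res.statusCode = 500;
      return res.end('Error reading file');
    }
    res.end(data); // the entire buffer is sent at once
  });
}).listen(3000);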

Here’s a snippet for the streaming approach.
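
And a matching sketch for the streaming version, under the same assumptions:

const fs = require('fs');
const http = require('http');

http.createServer((req, res) => {
  // Read the file bit by bit and forward each chunk as it arrives
  const readStream = fs.createReadStream('./video.mp4');
  readStream.on('data', (chunk) => res.write(chunk));
  readStream.on('end', () => res.end());
  readStream.on('error', () => {
    res.statusCode = 500;
    res.end('Error reading file');
  });
}).listen(3000);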

The syntax of both files doesn't change much, but in terms of performance it is quite notable how much more memory-efficient the streaming approach is.

With that difference explained, let's go through the stream types available natively in NodeJs.

The Readable Stream

This element is in charge of reading data from a source: it can be a file on a server or an online video, but I prefer to simply call it data.

NodeJs has its own createReadStream method in the fs module to do so.

There are two reading modes here: the non-flowing mode, which asks for data manually or requires some kind of user interaction, and the flowing mode, which sends data without being asked.

The read stream API also emits events you can listen to in order to detect when the readable stream is passing data, when it hits an error, or even when it has ended.

Check the example below.
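
A minimal sketch, again assuming a hypothetical video.mp4 file:

const fs = require('fs');

const readStream = fs.createReadStream('./video.mp4');

// Attaching a 'data' listener switches the stream into flowing mode
readStream.on('data', (chunk) => {
  console.log(`Received ${chunk.length} bytes`);
});

readStream.on('error', (err) => {
  console.error('Something went wrong:', err.message);
});

readStream.on('end', () => {
  console.log('Finished reading the file');
});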

The previous code is a flowing-mode stream: it starts immediately after the script is launched, without any user interaction or data control.

To gain control and interaction during the reading process, you can implement pause() control and ask for input through the process.stdin mechanism, letting you manually start the streaming process.
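
A minimal sketch, assuming the stream resumes when the user types a hypothetical 'resume' keyword:

const fs = require('fs');

const readStream = fs.createReadStream('./video.mp4');

// Pause explicitly, so attaching the 'data' listener below
// does not switch the stream into flowing mode on its own
readStream.pause();

readStream.on('data', (chunk) => {
  console.log(`Received ${chunk.length} bytes`);
});

readStream.on('end', () => {
  console.log('Finished reading the file');
  process.exit();
});

console.log("Type 'resume' to start streaming");

// Compare the user's input and resume the read stream on a match
process.stdin.on('data', (input) => {
  if (input.toString().trim() === 'resume') {
    readStream.resume();
  }
});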

This is self-explanatory, but pay attention to the condition set inside the process.stdin.on handler for the 'data' event: it evaluates a string comparison to decide whether to resume the read stream. The execution is paused immediately after the script starts.

This is a simple way to perform streaming in a controlled manner.

Readable streams can be used in HTTP servers, for unzipping data, in TCP sockets, and in almost every npm package that implements streams.

Good for now. Let's move on to Writable Streams.

Writable Streams

Writable Streams represent a destination for incoming data; they are exactly the opposite of readable streams. A writable stream (WS, in short) captures data and does something with it, and they are everywhere!

For simplicity, the following script will use createReadStream to read a .mp4 file and then use createWriteStream to generate an exact copy of the file that was read.
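
A minimal sketch, assuming video.mp4 as the source and a hypothetical video-copy.mp4 as the destination:

const fs = require('fs');

const readStream = fs.createReadStream('./video.mp4');
const writeStream = fs.createWriteStream('./video-copy.mp4');

// Push every chunk we read straight into the write stream
readStream.on('data', (chunk) => {
  writeStream.write(chunk);
});

readStream.on('end', () => {
  writeStream.end();
  console.log('Copy finished');
});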

We are working with two streams simultaneously: one for reading and the other for writing the new file. But sometimes the reading process can be faster than the writing, and this can overwhelm the writable side and ruin the result of the process. And that's where the backpressure concept comes in.

Backpressure

This is a control mechanism for digesting chunks: writing pauses until the internal buffer is empty and ready for more. It uses the highWaterMark option to define how much information can be processed effectively; in other words, this property on the stream instance is a number that denotes the size of the internal buffer in bytes.

The following is the same writable example but with the backpressure technique applied.
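
A minimal sketch, assuming a hypothetical 16 KB highWaterMark:

const fs = require('fs');

const readStream = fs.createReadStream('./video.mp4');
const writeStream = fs.createWriteStream('./video-copy.mp4', {
  highWaterMark: 16 * 1024, // internal buffer size in bytes
});

readStream.on('data', (chunk) => {
  // write() returns false once the internal buffer is full
  const canContinue = writeStream.write(chunk);
  if (!canContinue) {
    readStream.pause(); // stop reading until the buffer drains
  }
});

// 'drain' fires when the writable stream can take more chunks
writeStream.on('drain', () => {
  readStream.resume();
});

readStream.on('end', () => {
  writeStream.end();
});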

This is a controlled way to handle the digesting capabilities of the writable stream. The drain event is emitted when the writable stream is ready to take more chunks of data to process.

The pipe() method

A simple alternative to implementing backpressure yourself is the pipe() method, which handles backpressure automatically.

As a developer you will still need to handle errors on your own; pipe() does not forward them. If you are used to Unix pipes, this is the equivalent of the | character, where processes read and write through process.stdin and process.stdout.

Check this example and note how error handling is added around the pipe implementation.
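
A minimal sketch, copying the same hypothetical files as before:

const fs = require('fs');

const readStream = fs.createReadStream('./video.mp4');
const writeStream = fs.createWriteStream('./video-copy.mp4');

// pipe() moves the data along and applies backpressure automatically
readStream.pipe(writeStream);

// Errors are not forwarded by pipe(), so listen on both streams
readStream.on('error', (err) => console.error('Read failed:', err.message));
writeStream.on('error', (err) => console.error('Write failed:', err.message));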

There are two more types of streams available.

The first one is the Duplex Stream, which implements the readable and writable interfaces in a single component. However, it does not transform data, so it can be strategically placed between a readable and a writable stream to observe the data passing through.

Here's an example:
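
A minimal sketch using the built-in PassThrough stream with the same hypothetical files:

const fs = require('fs');
const { PassThrough } = require('stream');

const readStream = fs.createReadStream('./video.mp4');
const writeStream = fs.createWriteStream('./video-copy.mp4');

// PassThrough is the simplest duplex stream: data in, data out, untouched
const report = new PassThrough();

// Log the size of every chunk passing through
report.on('data', (chunk) => {
  console.log(`Passing ${chunk.length} bytes`);
});

readStream.pipe(report).pipe(writeStream);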

Here, the PassThrough mechanism sits between the read and write streams and logs to the console the number of bytes in the current chunk.

The last one is the Transform Stream, a derived duplex version that can transform the data before it goes to the Writable side of the stream.

Here’s an example:
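
A minimal sketch; the exact regex is an assumption, here replacing every alphanumeric character with the given letter:

const { Transform } = require('stream');

class ReplaceText extends Transform {
  constructor(char) {
    super();
    this.replaceChar = char; // the replacement character, 'X' below
  }

  // Grab the chunk and apply a regex transformation (assumed here:
  // swap every alphanumeric character for the replacement one)
  _transform(chunk, encoding, callback) {
    const transformed = chunk
      .toString()
      .replace(/[a-zA-Z0-9]/g, this.replaceChar);
    this.push(transformed);
    callback();
  }

  // Runs once every transformed chunk has been processed
  _flush(callback) {
    this.push('\nDone transforming\n');
    callback();
  }
}

const replaceStream = new ReplaceText('X');

// Whatever is typed into stdin comes back transformed on stdout
process.stdin.pipe(replaceStream).pipe(process.stdout);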

The ReplaceText class extends the native Transform class from the stream module; it contains the whole logic for transforming whatever string is sent via process.stdin.

The instance expects a letter, 'X' in this case, which is handled by the constructor. The _transform method grabs the chunk and applies a regex transformation, and _flush is executed once every transformed chunk has been processed.

Now, these are real superpowers: if you combine this with async tasks, you will get great efficiency in terms of data processing and performance.

Happy streaming :)
