Streams in Node

using minimal pipeable streams

Yoshua Wuyts
Mar 21, 2016

Streams are an asynchronous abstraction for dealing with data in small chunks, pushing bottlenecks into the IO layer. This usually leads to lower memory use and better performance, which is a very good thing.

In streams, data flows from a source, through a bunch of through streams, into a sink:

┌────────┐   ┌─────────┐   ┌────────┐
│ Source │──▶│ Through │──▶│  Sink  │
└────────┘   └─────────┘   └────────┘

Node has shipped streams as part of its standard library since its early days. However, with each release new features, APIs and concepts were added, making the current implementation very unwieldy. In my years as a Node developer I’ve only met a handful of developers who felt comfortable using the current version of Node streams. In an ideal world streams would be used as much in Node as pipes are in the shell.

Enter pull stream

pull-stream is an alternative model for streams created by Dominic Tarr. Like Node streams, it has a concept of back pressure. This means that instead of a source pushing out data as fast as it can, the consumer stream pulls data once it’s ready to handle more. This leads to a program never holding more data in memory than it strictly needs.
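To make the pull model concrete, here is a minimal sketch that does not use the pull-stream module itself. It assumes the convention where a source is a function `read(abort, cb)` and the sink calls it each time it wants one more value. The names `countTo` and `collectInto` are made up for this illustration.

```javascript
// A source: a function the consumer calls each time it wants one more value.
// cb(true) signals the end, cb(null, value) delivers a value.
function countTo (max) {
  let i = 0
  return function read (abort, cb) {
    if (abort) return cb(abort)
    if (i >= max) return cb(true)
    cb(null, ++i)
  }
}

// A sink: pulls one value, handles it, then asks for the next.
// The source never produces more than was asked for.
function collectInto (out, done) {
  return function sink (read) {
    read(null, function next (end, value) {
      if (end === true) return done(null)
      if (end) return done(end)
      out.push(value)
      read(null, next)
    })
  }
}

const seen = []
collectInto(seen, function (err) {
  if (err) throw err
  console.log(seen)
})(countTo(3))
```

Because the sink decides when to call `read`, a slow sink simply asks less often; there is no buffer to fill up in between.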

The Node streams source is well over 1200 lines, without even accounting for dependencies. The pull-stream source is just 28 lines, which comes to a whopping 0.4kb minified.

Don’t be fooled by the simple exterior though: pull-stream provides the same functionality as Node streams do. With fewer lines of code there’ll be fewer bugs, and room to optimize every last bit of code.

Pull stream types

In pull streams there are three types of streams: source, through and sink. In order to let data flow, a source and sink must be connected. Through streams are combinations of sources and sinks, making every connection in the pipeline a source and a sink that talk to each other. Conceptually it looks like this:

┌──────┐   ┌──────┐   ┌──────┐   ┌──────┐
│Source│──▶│ Sink │ ┌▶│ Sink │ ┌▶│ Sink │
└──────┘   ├──────┤ │ ├──────┤ │ └──────┘
           │Source│─┘ │Source│─┘
           └──────┘   └──────┘
           Through    Through

If a source and a through stream are connected they will not start emitting data until a sink is attached at the end. Likewise, if a through and sink are connected, they will not start flowing data until a source is attached at the start. This allows composition of arbitrary streams into pipelines similar to what pumpify provides for Node streams.
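The laziness described above can be sketched in plain JavaScript. This is an illustration under assumptions, not the module's code: `values`, `map` and `collect` are simplified stand-ins written for this example, and the `reads` counter exists only to show when the source is actually asked for data.

```javascript
let reads = 0

// Source that counts how often it is asked for data.
function values (items) {
  let i = 0
  return function read (abort, cb) {
    reads++
    if (abort) return cb(abort)
    if (i >= items.length) return cb(true)
    cb(null, items[i++])
  }
}

// Through: takes a source and returns a new source,
// so it acts as a sink on one side and a source on the other.
function map (fn) {
  return function (read) {
    return function (abort, cb) {
      read(abort, function (end, value) {
        if (end) return cb(end)
        cb(null, fn(value))
      })
    }
  }
}

// Sink: drains the source into an array.
function collect (done) {
  return function (read) {
    const out = []
    read(null, function next (end, value) {
      if (end === true) return done(null, out)
      if (end) return done(end)
      out.push(value)
      read(null, next)
    })
  }
}

// Source + through gives a new source; nothing has been read yet.
const doubled = map(function (n) { return n * 2 })(values([1, 2, 3]))
const readsBeforeSink = reads

// Data only starts flowing once a sink is attached.
let result
collect(function (err, out) {
  if (err) throw err
  result = out
})(doubled)

console.log(readsBeforeSink, result)
```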

Composition

pull-streams use the pull() function to combine sinks and sources. Because sinks connect to sources, any number of streams can be connected. It’s functional composition all the way.
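As a rough sketch of why this is plain function composition, the hypothetical `pipeline` helper below feeds each stream into the next one. It is not the module's actual `pull()` source: it assumes the first argument is always a source and does not handle duplex objects or partial pipelines. `values`, `map` and `sum` are likewise stand-ins written for this example.

```javascript
// Simplified stand-in for pull(): the first argument must be a source,
// every following argument is a through or a sink.
function pipeline (source, ...rest) {
  return rest.reduce(function (read, stream) {
    return stream(read)
  }, source)
}

function values (items) {
  let i = 0
  return function (abort, cb) {
    if (abort) return cb(abort)
    if (i >= items.length) return cb(true)
    cb(null, items[i++])
  }
}

function map (fn) {
  return function (read) {
    return function (abort, cb) {
      read(abort, function (end, value) {
        if (end) return cb(end)
        cb(null, fn(value))
      })
    }
  }
}

// Sink that adds up every value it pulls.
function sum (done) {
  return function (read) {
    let total = 0
    read(null, function next (end, value) {
      if (end === true) return done(null, total)
      if (end) return done(end)
      total += value
      read(null, next)
    })
  }
}

let total
pipeline(
  values([1, 2, 3]),
  map(function (n) { return n + 1 }),
  map(function (n) { return n * 10 }),
  sum(function (err, result) {
    if (err) throw err
    total = result
  })
)
console.log(total)
```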

Duplex streams are objects that have .source and .sink properties. The following are equivalent:

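// connect a and b in both directions
pull(a.source, b.sink)
pull(b.source, a.sink)

// which is the same as
b.sink(a.source)
a.sink(b.source)

// or let pull handle both directions
pull(a, b, a)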

Helper functions

pull-stream ships with helper functions such as asyncMap that make common interactions trivial. See the docs for available sources, sinks and throughs.

Let’s create a basic map-reduce pipeline using asyncMap where we asynchronously fs.stat an array of files, and gather the results in an array:
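A sketch of such a pipeline follows. To keep it runnable without installing anything, `pull`, `values`, `asyncMap` and `collect` are simplified stand-ins defined inline; with the module installed, `require('pull-stream')` and its `pull.values`, `pull.asyncMap` and `pull.collect` helpers would take their place. The `statAll` wrapper is a name made up for this example, and the stand-in `asyncMap` skips details such as aborting the upstream source on error.

```javascript
const fs = require('fs')

// Simplified stand-ins for pull(), pull.values(), pull.asyncMap() and pull.collect().
function pull (source, ...rest) {
  return rest.reduce(function (read, stream) { return stream(read) }, source)
}

function values (items) {
  let i = 0
  return function (abort, cb) {
    if (abort) return cb(abort)
    if (i >= items.length) return cb(true)
    cb(null, items[i++])
  }
}

// Through that runs an async function on every value, one at a time.
function asyncMap (fn) {
  return function (read) {
    return function (abort, cb) {
      read(abort, function (end, value) {
        if (end) return cb(end)
        fn(value, function (err, mapped) {
          if (err) return cb(err)
          cb(null, mapped)
        })
      })
    }
  }
}

function collect (done) {
  return function (read) {
    const out = []
    read(null, function next (end, value) {
      if (end === true) return done(null, out)
      if (end) return done(end)
      out.push(value)
      read(null, next)
    })
  }
}

// Map: fs.stat every path. Reduce: gather the results in an array.
function statAll (files, done) {
  pull(
    values(files),
    asyncMap(fs.stat),
    collect(done)
  )
}

statAll(['.', process.execPath], function (err, stats) {
  if (err) throw err
  console.log(stats.map(function (s) { return s.isDirectory() }))
})
```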

Because under the hood we’re just composing functions, the overhead of doing this is reduced to a bare minimum.

Error handling

In Node streams errors don’t propagate through .pipe() chains. It’s therefore common practice to either use helper libraries or attach a .on('error') listener to every stream. Getting errors wrong is not a great feeling.
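The following small sketch shows the Node behaviour using two `PassThrough` streams from the core `stream` module. The failure is simulated by emitting an `'error'` event by hand, which is an assumption made to keep the example short; the variable names are illustrative.

```javascript
const { PassThrough } = require('stream')

const first = new PassThrough()
const second = new PassThrough()
first.pipe(second)

const seen = []
first.on('error', function (err) { seen.push('first: ' + err.message) })
second.on('error', function (err) { seen.push('second: ' + err.message) })

// Simulate a failure in the first stream of the chain.
first.emit('error', new Error('boom'))

// Only the listener on the failing stream fires; the error is not forwarded.
console.log(seen)
```

Without the listener on `first`, the same error would be thrown as an uncaught exception, which is why every stream in a `.pipe()` chain needs its own handler.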

In pull-streams errors are passed into the callback, which brings the whole stream pipeline to a halt. It’s again the familiar API of cb(err) for an error and cb(null, value) for success.

Here’s a source stream that returns a single fs.stat value:
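A minimal sketch of such a source, written without the module so it runs on its own. `statSource` and the small `drain` sink are names made up for this example; the only convention assumed is the `read(abort, cb)` signature with `cb(err)` for an error, `cb(true)` for the end and `cb(null, value)` for a value.

```javascript
const fs = require('fs')

// Source that yields one fs.Stats object and then ends.
function statSource (file) {
  let done = false
  return function read (abort, cb) {
    if (abort) return cb(abort)
    if (done) return cb(true)
    done = true
    fs.stat(file, function (err, stat) {
      if (err) return cb(err) // an error stops the pipeline
      cb(null, stat)
    })
  }
}

// Minimal sink: reads until the source ends or fails.
function drain (onValue, onEnd) {
  return function (read) {
    read(null, function next (end, value) {
      if (end === true) return onEnd(null)
      if (end) return onEnd(end)
      onValue(value)
      read(null, next)
    })
  }
}

drain(
  function (stat) { console.log('is directory:', stat.isDirectory()) },
  function (err) { console.log(err ? 'failed: ' + err.code : 'done') }
)(statSource('.'))
```

If `fs.stat` fails, the sink receives the error in the same callback it uses for data, so there is a single place to handle it.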

Wrapping it up

And that’s it. I could keep whipping out pull-stream examples, but I think I’ve made my point: streams are a cool idea; pull-stream is a neat implementation. If you’re keen to learn more, take a look at the links below. I hope this was useful; give pull-stream a try and let me know how you go! -Yosh

See Also

Originally published at github.com on March 21, 2016.
