A Brief History of Node Streams pt.2

Jessica Quynh · Node.js Collection · Apr 7, 2017

Introduction

In part one, we gave a primer on the Stream class in Node.js. This article can be read completely as a standalone. However, if you’re unfamiliar with Node streams, it would be immensely helpful to read part one first to improve your understanding of this piece and of streams in general :)

At the time of this article’s publication, there have been a total of four distinct design implementations of Node.js streams. Each iteration introduced new features, improving both efficiency and the API for the developers using Node.js.

[Figure: timeline of stream implementations plotted against the relevance of Node.js on Google Trends]

To begin to understand the history of streams, it’s important to recall the UNIX philosophy Node.js has inherited, which emphasizes reusability and singular functionality.

While investigating each iteration, let’s frame the evolutionary process from the lens of this core philosophy:

  1. Each version strives to become easier for developers to use, with less overhead or maintenance.
  2. Each major release should be as close to backwards-compatible as possible.
  3. No fuss, no muss: an “it just works” approach.

Streams0

Dist: https://nodejs.org/dist/v0.1.100/
Doc: https://nodejs.org/docs/v0.1.100/api.html
Git: https://github.com/nodejs/node-v0.x-archive/tree/v0.2

Node.js was officially released in 2009. Since then, it has continuously improved through the feedback and contributions of the community. Like any growing repository, there have been radical changes as a result of a thoughtful, democratic decision-making process.

Streams0 is the very first implementation of streams, and while it is highly unlikely you will encounter a package in the wild that still uses this version of Node.js, it is nonetheless a natural starting point to unfold the evolution of streams.

In the earliest versions of Node.js, Stream was described with the same definition as the current documentation:

A stream is an abstract interface implemented by various objects in Node. For example a request to an HTTP server is a stream, as is stdout. Streams are readable, writable, or both. All streams are instances of EventEmitter.

In these legacy versions of Node.js, “streams” were already found in the common domains (http, fs, tty, etc.) they occupy now, but they were built independently and inherited directly from the prototypical EventEmitter.

// Streams0: fs module
sys.inherits(ReadStream, events.EventEmitter);

// Streams3: fs module
util.inherits(ReadStream, Readable);

Stream as a class had yet to be abstracted. With the vantage point of retrospect, it may be obvious to us now how useful such a class is, with its own standard functions and automation, and how it elevates the development of Node.js modules.

Yet it was only through the community’s experimentation, the frustrations that arose, and the inconsistent modules and libraries proliferating (and mismatching) across the economy of Node packages that the enveloping model we now know as Stream was really inspired.

Nonetheless, there were still some upsides to Streams0, and some relics that exist in the current codebase.

One question to ask is: if there was no common base stream, did streams just exist as an abstract concept for a Node developer to implement, as one does with Queues or Linked Lists in their computer science courses?

The short answer is yes and no. While streams in the internal codebase were built within their respective domains and varied from one to another, they followed a general design principle of having a read/write function, and they were connected by the .pump() tool found in util.js, which attached a system of event listeners and executed the same method of data transfer.

In the .pump() function used by the Streams0 implementations, we find the familiar shape of today’s .pipe(). But instead of the UNIX-like pipe used to feed data from one source to another, the now-deprecated .pump() function utilized a different modality of transferring data.

// pump
pump(ReadableSource, WritableDestination)
// pipe
ReadableSource.pipe(WritableDestination)

Pump vs. pipe:

The flow of data in util.pump() is handled entirely by the function. It behaves as a parent or authoritative structure that distributes data from one stream to another, whereas Stream.pipe() is a method on the source stream itself, and data cascades downstream.
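To make that concrete, below is a minimal, hypothetical sketch of what a pump-style helper does; this is an illustration of the pattern, not the actual Streams0 source. Notice that the helper itself owns the transfer, attaching event listeners and running a simple backpressure loop:

// a simplified, hypothetical pump-style helper (not the Streams0 source):
// the function, rather than the source stream, drives the data flow
function pump(readable, writable) {
  readable.on('data', function(chunk) {
    // write() returns false when the destination's buffer is full,
    // so pause the source until the destination drains
    if (writable.write(chunk) === false) readable.pause();
  });
  writable.on('drain', function() {
    readable.resume();
  });
  readable.on('end', function() {
    writable.end();
  });
}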

The introduction of .pipe() also paved the way for the common look of stream implementations as chained functions:

FoodStream.pipe(get)
.pipe(into)
.pipe(my)
.pipe(belly);
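A more realistic chain, sketched with core modules (the file names are illustrative), reads a file, compresses it, and writes the result:

var fs = require('fs');
var zlib = require('zlib');

// read, gzip, and write, all through chained pipes
fs.createReadStream('readme.txt')
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream('readme.txt.gz'));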

While pipelines are great and elegant, as with all things in our dear universe, there are, nonetheless, still a few flaws.

There exists an open issue with the way a sequence of pumps or pipes handles early exits in a pipeline.

In stream pipelines, meta-information does not propagate. This is important to note, as it means destination streams do not have access to their source stream’s information, namely errors or failures.

A popular pump library exists outside of Node.js core; it is intended for constructing complex pipelines that terminate gracefully in case of an error.
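A minimal use of that package (the third-party pump module on npm, not the old util.pump) looks like this; if any stream in the chain errors or closes early, every stream is torn down and the callback is invoked:

var pump = require('pump'); // npm install pump
var fs = require('fs');

pump(
  fs.createReadStream('source.txt'),
  fs.createWriteStream('dest.txt'),
  function(err) {
    // called once the whole pipeline has finished or failed
    console.log('pipeline done', err ? err.message : 'ok');
  }
);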

As you can see, there are many aspects of Streams0 consistent with current streams. Though there is less code defining the legacy streams, don’t be fooled by their austere appearance. It was still a very useful implementation and adhered to the UNIX philosophy by way of automation, providing a data-transfer method that attached event handlers and enabled a backpressure system, all with a light and elegant interface.

The greatest measured difference, as we’ll soon see, is that with each upgrade streams became more optimized, able to handle more complex edge cases, and less ambiguous to build custom streams upon, all the while addressing unique issues found by the community of developers.

Streams1 (Classic Streams)

Dist: https://nodejs.org/dist/v0.4.0/
Doc: https://nodejs.org/docs/v0.4.0/

The release of Node v0.4.0 in 2011 introduced a major upgrade to streams. The changes in this release were so auspicious and necessary that this version is now referred to as Classic Streams, partly due to its place as the first in the line of Stream iterations, but also for its distinguishable way of transferring data, known as the push method, which we will discuss later in the article.

In this version, Stream became its own object class. This change solved the lack of a standard model found in Streams0, and made it so streams were easily reusable across modules. This upgrade also introduced .pipe().

Classic Streams was an important upgrade, and while a lot changed, the upgrade from Streams0 was virtually seamless:

[Figure: upgrading from Streams0 to Streams1]

The unified base class and methods created a much more consistent API, automatic data flow, and improved the way Stream was implemented and connected to other modules, even if their data streams were unrelated.

Streams1, though, was far from perfect and came with its own flaws. Here is an overview of the most common issues:

  • .pause() did not do what the function name suggested.
  • The source stream would begin open and emit data regardless of whether the consumer was prepared, or whether there was any data at all (this malfunction is where the moniker spew streams originated).
  • There was no way to tell the data buffer to stop and delegate half a source buffer to a later part of the program.

From the moment a stream opened, these ambiguities caused problems. Since a stream started emitting on instantiation, regardless of whether a buffer was prepared for the incoming data, chunks could go missing. The workaround was to start your stream in paused mode and then wait a short amount of time to ensure the integrity of the data:

var fs = require('fs');

var readable = fs.createReadStream('readme.txt');
readable.pause(); // pause immediately, on initialization
// wait a beat before resuming, giving consumers time to attach
setTimeout(readable.resume.bind(readable), 500);

The .pause() function was also misleading. It was, in the documentation’s words, advisory-only.

In short, the pause function was really only meant to be used internally. Thus, if a developer’s implementation attempted to use it, there was no guarantee that the source would immediately pause, and it was highly recommended to ensure your data events were prepared to catch emitted chunks after pause was called.

Additionally, even though the data would eventually pause, that did not mean backpressure was enforced. Pausing only stopped the pulling of data and the emission of the data event. It kept the source stream open, so any implementation that used pause incorrectly could exhaust a machine’s memory.

You can probably understand why all these ambiguities were really confusing and difficult for developers.

Streams2

Dist: https://nodejs.org/dist/v0.10.0/
Doc: https://nodejs.org/docs/v0.10.0/

This original post, written in 2012 by isaacs (Isaac Z. Schlueter), outlines both the changes and the thought process that led up to the release of Streams2. I highly recommend reading it for an authoritative, fuller understanding of the differences between Streams1 and Streams2.

There were significant changes in the newer version: it improved backpressure, and the entire design became more modular, emphasizing different mechanics for the streams pattern.

As you know, all streams must have a source and a consumer; in that regard, streams are a wonderful way to connect packages. But big problems could arise if your package and someone else’s were not built using the same pattern.

What happened if their package did not implement data control in the same manner as yours? Or if it did not respect the automated backpressure system in Node streams, resulting in a performance impact? The answer is: nothing, really.

A structural solution was devised by Node.js.

Streams2 introduced abstracted subclasses that inherited from the Stream base class to fix these inconsistencies. As opposed to a single Stream handling every type of functioning, it was broken up into four distinct variants, each modularized and made efficient in its single responsibility.

Each variant defined its own ruleset for what it could and could not do. The design itself was embedded with best-practice methodology, coupled with suggested construction patterns in the improved documentation.

[Figure: building streams in earlier versions of Node.js: “Just connect a thing to another thing using these!”]

In Streams2, the API interface gave clearer borders and pathways for constructing a stream, with the help of the new subclasses:

Readable and Writable streams were abstracted into separate subclasses, so more of a stream’s construction shipped with the core code. And the Transform and Duplex streams provided a bidirectional pathway for data to be manipulated between the source of the data and the final consumer.

[Figure: the Streams2 interface provided a clearer roadmap and rules for developers to follow]
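For example, here is a sketch of a custom Transform in the Streams2 idiom; ShoutStream is a hypothetical name, not from the article’s sources. It sits between a source and a consumer and rewrites chunks in flight:

var Transform = require('stream').Transform;
var util = require('util');

// hypothetical example: a Transform that upper-cases whatever flows through
function ShoutStream() {
  Transform.call(this);
}
util.inherits(ShoutStream, Transform);

ShoutStream.prototype._transform = function(chunk, encoding, done) {
  this.push(chunk.toString().toUpperCase());
  done();
};

process.stdin.pipe(new ShoutStream()).pipe(process.stdout);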

Now, the main detail most developers needed to pay attention to when writing stream modules was the methods Writable.prototype._write(), Readable.prototype._read(), and Transform.prototype._transform().

From Node v0.10.0 documentation:

This method is prefixed with an underscore because it is internal to the class that defines it, and should not be called directly by user programs. However, you are expected to override this method in your own extension classes.

So long as they respected the return values of these functions, the backpressure system and the data flow in your stream would be guided without error, capitalizing on the inherent optimization that Stream provided.
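To illustrate, here is a hedged sketch of a custom Readable in the Streams2 style; CounterStream is a hypothetical example. You subclass Readable, override _read(), and let push()’s return value drive the backpressure loop:

var Readable = require('stream').Readable;
var util = require('util');

// hypothetical example: a Readable that emits the numbers 0..max-1
function CounterStream(max) {
  Readable.call(this);
  this.max = max;
  this.count = 0;
}
util.inherits(CounterStream, Readable);

// the class calls _read() whenever the consumer wants more data
CounterStream.prototype._read = function() {
  if (this.count >= this.max) {
    this.push(null); // signal end-of-stream
  } else {
    // push() returns false when the internal buffer is full;
    // stop here, and _read() will be called again after it drains
    this.push(this.count++ + '\n');
  }
};

new CounterStream(3).pipe(process.stdout); // prints 0, 1, 2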

Streams2 provided guiding principles for new developers and took greater care of backpressure, edge-cases, and data optimization.

Nonetheless, there were still a few tricks one needed to be cognizant about. Remember, Streams2 was designed to be completely backwards compatible with Streams1, which was considered an enormous success in spite of their dissimilar internal workings and interface. However, there was one small caveat.

The pull versus the push interface, choose wisely.

In Streams1, the way data was shuttled from a source was as if it were being pushed towards its destination. In the next iteration, Streams2, data is natively called for, or pulled, by the consumer stream.

Another way to break it down is:

  • A pull stream is an asynchronous function that is repeatedly called until it says “stop!”.
  • A push stream is an asynchronous function that repeatedly sends data (without ever being asked) until it is closed.

Both methods of data transfer were made possible in Streams2. However, there was one problem. If you reverted to push streams at any point in your data transfer (re-enabling Classic Streams behaviour, also called entering classic or old mode), you were not able to switch back to the newer pull method.

In short, you were stuck in “old mode”. This required careful attention from a developer, and it often caused confusion because it was not always exactly clear which mode you were in.

An example of the Streams2 pull interface:

var Readable = require('stream').Readable;

var CassetteStream = new Readable();
var walkman = '';
var tracklist = 0;

// assume CassetteStream is fed 24 tracks' worth of data
CassetteStream.on('readable', function() {
  tracklist++;

  var track;
  while ((track = CassetteStream.read()) !== null) {
    walkman += track;
  }
});
// tracklist: 1

Whereas here is Classic Streams’ push interface:

var Readable = require('stream').Readable;

var CassetteStream = new Readable();
var walkman = '';
var tracklist = 0;

// one 'data' event fires per track emitted by the source
CassetteStream.on('data', function(track) {
  tracklist++;
  if (track) walkman += track;
});
// tracklist: 24

This fixed mode was a problem if you ever wanted to debug your application and observe the state of your data while streaming:

// WARNING: Not possible in versions 0.4 - 0.11.0 (Streams2)
var fs = require('fs');
var ClassifiedStream = fs.createReadStream('secrets.docx');
var data = '';

ClassifiedStream.on('readable', function() {
  var sensitiveData;
  while ((sensitiveData = ClassifiedStream.read()) !== null) {
    data += blackout(sensitiveData); // blackout: an illustrative helper
  }
});

ClassifiedStream.on('data', function(chunk) {
  console.log('Just one last peek ... ', chunk);
});

ClassifiedStream.pipe(ShredderStream); // ShredderStream: illustrative

Streams3

Dist: https://nodejs.org/dist/v0.11.5
Doc: https://nodejs.org/docs/v0.11.5/api/stream.html

Streams3 was proposed as a solution to the permanent consequences of entering old mode. It allowed the 'data' and 'readable' events to exist simultaneously, giving you the option to observe your data while testing and building an application.

Streams3 solved this by introducing a passive data listener. The passive data listener was part of the push interface (from Classic Streams); but instead of having to rely on it outright as the means of obtaining data, you could now use it in any scenario.

With the existence of a passive listener, the developer no longer had to decide which mode to enter. It effectively removed the notion of old and current modes completely.
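As a rough sketch of the idea (the exact semantics have shifted between Node versions), both listeners can now be attached to the same stream, with 'data' acting as a passive observer alongside the pull-style 'readable' handler:

var fs = require('fs');
var stream = fs.createReadStream('readme.txt');

// passive observer: watch chunks go by without owning the flow
stream.on('data', function(chunk) {
  console.log('observed %d bytes', chunk.length);
});

// active consumer: pull whatever is buffered
stream.on('readable', function() {
  var chunk;
  while ((chunk = stream.read()) !== null) {
    // process chunk
  }
});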

Streams3 is the current design implementation (LTS version v6). As far as I know, any thought of a Streams4 is far out of mind. There are many reasons for that, but mostly it’s because changing an integral part of Node.js is extremely difficult without lots and lots of forethought and planning.

Remember, thousands of packages depend on Stream to work in an expected manner, and one of the biggest hurdles of upgrading any module is ensuring it does not break older versions of the code.

If you are worried about backwards compatibility and browser usage, an essential library is readable-stream. There is an excellent writeup on it here by Rod Vagg.

In short, the package provides a type of graceful degradation for Readable streams, allowing your readable stream to work across all versions of Node and providing support for different browser versions.
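Using it is essentially a one-line swap, since the package mirrors the core stream classes; a small sketch:

// npm install readable-stream
// a drop-in mirror of the core stream classes
var Readable = require('readable-stream').Readable;

var source = new Readable();
source._read = function() {}; // no-op; data is pushed in manually below
source.push('works the same across Node versions\n');
source.push(null); // end the stream
source.pipe(process.stdout);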

While that may be it for the history of streams in Node core to the present date, if you’re fascinated by the reimagining of technology, the next section explores pull-streams, a callback-based streams library developed by Dominic Tarr (who has written a wonderful, technically minded, front-lines perspective on the history of streams).

Pull-Streams: Callback streams

Pull-streams are a reinvention of streams. They do not exist in Node core, but they can be used as an external package with Node.js and connected to the standard Stream. The library linked above is a collection of patterns and method abstractions for streaming. It’s important to note that pull-streams do not rely on the EventEmitter object at all and are entirely callback-based.

There have been critiques from developers that Node’s core streams have become too complex through their abstractions, requiring great technical knowledge to discern the role of each function, with one function hinging on another. It can be easy to become lost in the codebase of streams.

If you share the same viewpoint, pull-streams might be a good, minimalistic solution for your Node.js package.

As opposed to emitting events every time something new happens, pull-streams are lightweight and hone in even further on the concept of singular functionality. As soon as the data begins its journey from one source to another, a pull-stream prunes all the excess and delivers the data in a minimal manner.

Thus, there is no such thing as flowing or paused mode, and data doesn’t necessarily buffer. The way data is read is entirely through callbacks.
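To give a flavor of the pattern, here is a minimal sketch using the pull-stream package from npm (the values are illustrative): a source, a through, and a sink, wired together by a single pull() call:

var pull = require('pull-stream'); // npm install pull-stream

pull(
  pull.values(['side', 'a', 'tracks']),              // source
  pull.map(function(x) { return x.toUpperCase(); }), // through
  pull.collect(function(err, tracks) {               // sink
    if (err) throw err;
    console.log(tracks); // [ 'SIDE', 'A', 'TRACKS' ]
  })
);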

Here is a great workshop that helps to demonstrate and learn the pattern paradigm.

Pull-streams suggest that we reimagine streams not as a module or a function for data shuttling, but as something that transcends spatial linearity and is situated, rather, in the dimension of time.

Eloquently, this was written in the pull-streams documentation:

There is a deeper, platonic abstraction, where a stream is just an array in time, instead of in space. And all the various streaming “abstractions” are just crude implementations of this abstract idea.

Basically, if we thought about how object A reaches its destination, we might say: I will move object A to destination Z.

In pull-streams, the agent that shuttles object A to destination Z is not a kinetic force, but rather the progression of time itself. The agent, in this case, is what we can think of as a stream.

Pull-streams interpret this model of streaming with JavaScript’s native callback methodology. This provides an ecosystem that is both minimalistic and compatible back to the oldest versions of JavaScript.

Callbacks are not dissimilar to the Promise object, which is described as a way of communicating based in the dimension of time:

The Promise object is used for asynchronous computations. A Promise represents a value which may be available now, or in the future, or never.

This article won’t go any further into callbacks or promises, but here is an excellent talk on the history of promises that dives into the origin and theory of how promises were developed.

Conclusion

Node.js provides an interface for developers to build connection links between different libraries and modules. As you have learned, from version to version, there are specific tricks and corners to watch out for when building your own streams.

We have seen how streams have evolved over the years to become more efficient and easier to implement. And while most of the ways we think about streams are mechanical and event-based, we know there is a different possibility out there in pull-streams.

I hope this series helped to formulate a broadened understanding of streams — from a practical, historical, and finally, metaphysical point of view.

Thanks for reading, and keep on streaming on!

Finally, one ginormous and warm shoutout to Myles Borins and Matteo Collina for fact-checking and going over this piece with me. You guys rock.

Jessica Quynh
Node.js Collection

Along with being a student at McGill & a software developer, Jessica holds a deep love for literature and poetry. She believes in elegant code and prose alike.