Node.js Streams by Example
Streams are a fundamental construct in Node.js, given its focus on event-driven processing of data through I/O channels. Understanding how to use them to read, write and transform data is key to writing memory- and CPU-efficient data processing applications in Node.js.
Streams in Node.js come in four flavours:
- Readable streams
- Writable streams
- Duplex streams (both readable and writable)
- Transform streams (duplex streams that modify data as it passes through)
Streams are implemented using a standard set of interfaces made up of functions and events. Many objects in the Node.js standard library implement the stream interfaces. For example, the “fs” module provides objects for reading from and writing to files using the stream interfaces. On an HTTP server, every incoming request is a readable stream and every response is a writable stream.
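As a rough illustration of the four flavours, the sketch below classifies a few standard library objects by the stream base class they inherit from. The helper name describeStream and the choice of sample objects are assumptions made for this illustration, and os.devNull needs a reasonably recent Node.js release.

```javascript
const fs = require('fs');
const net = require('net');
const os = require('os');
const zlib = require('zlib');
const { Readable, Writable, Duplex, Transform } = require('stream');

// Report the most specific stream base class an object inherits from.
// Order matters: Transform extends Duplex, which extends Readable.
function describeStream(candidate) {
  if (candidate instanceof Transform) return 'transform';
  if (candidate instanceof Duplex) return 'duplex';
  if (candidate instanceof Readable) return 'readable';
  if (candidate instanceof Writable) return 'writable';
  return 'not a stream';
}

// Sample objects chosen for illustration only.
const samples = {
  fileReader: fs.createReadStream(process.execPath), // the node binary itself
  fileWriter: fs.createWriteStream(os.devNull),      // discard sink, nothing is written
  socket: new net.Socket(),                          // unconnected TCP socket
  gzip: zlib.createGzip(),                           // compressing transform
};

for (const [name, stream] of Object.entries(samples)) {
  console.log(`${name}: ${describeStream(stream)}`);
  stream.destroy(); // release any underlying resources
}
```

Run as a script, this should report the four samples as readable, writable, duplex and transform respectively.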
Reading from a stream
Getting data out of a stream is a matter of listening for the “data” event, whose listener is passed chunks of data for as long as there is data to read from the stream. Once all the data has been read, the stream emits the “end” event.
The following example reads from a file stream, counts the number of words in the stream and then outputs the result once there is no more data left in the stream. The example assumes the input data is “utf8” encoded but this will not always be the case.
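A minimal sketch of such a word counter is shown here. It is not the original listing: the function names countWordsInChunk and countWordsInFile, the wordcount.js file name and the command-line argument handling are all assumptions made for illustration.

```javascript
const fs = require('fs');

// Count the whitespace-separated words in one chunk of text.
// filter(Boolean) drops the empty strings that split() produces when the
// chunk starts or ends with whitespace.
function countWordsInChunk(chunk) {
  return chunk.split(/\s+/).filter(Boolean).length;
}

// Stream a file and report the total word count through a Node-style callback.
function countWordsInFile(path, done) {
  let total = 0;
  // Passing an encoding makes every "data" chunk a string rather than a Buffer.
  const input = fs.createReadStream(path, { encoding: 'utf8' });

  input.on('data', (chunk) => {
    // Caveat: a word that straddles two chunks is counted twice.
    total += countWordsInChunk(chunk);
  });
  input.on('end', () => done(null, total));
  input.on('error', (err) => done(err));
}

// Usage (hypothetical file name): node wordcount.js <file>
if (process.argv[2]) {
  countWordsInFile(process.argv[2], (err, total) => {
    if (err) throw err;
    console.log(`${total} words`);
  });
}
```

Because the file is consumed chunk by chunk, memory use stays roughly constant however large the input is; only the running total is kept between chunks.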
The callback function invoked when the “data” event is emitted splits each data chunk on runs of one or more whitespace characters to extract the words, ignoring the whitespace itself. Because the example specifies an encoding of “utf8”, each data chunk will be a string. If no encoding is specified, each chunk will be a Buffer instance and the data must be converted manually.
Transforming data from one stream to another
The “read” example is all well and good but what if we wanted to do something more advanced like modify the data flowing between two streams? This is possible using the Transform interface.
The following example shows a custom implementation of stream.Transform that takes input and base64-encodes it. The Base64Encoder is inserted into a pipeline using the pipe() function. Stdin is the start of the pipeline (the source), to which the Base64Encoder is appended. Lastly, Stdout is added as the final piece of the pipeline. Any input read from Stdin therefore flows through the Base64Encoder, and the transformed output passes to Stdout and out to the console.
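A minimal sketch of such an encoder follows; it is not the original listing. It uses ES2015 class syntax, and it adds a _flush() method plus a small carry-over buffer (pendingBytes, a name chosen here) because base64 works on 3-byte groups and chunk boundaries rarely line up with them. The --stdin flag and the base64.js file name are likewise illustrative assumptions.

```javascript
const { Transform } = require('stream');

class Base64Encoder extends Transform {
  constructor(options) {
    super(options); // invoke the Transform constructor
    // Bytes carried over from the previous chunk (always fewer than 3).
    this.pendingBytes = Buffer.alloc(0);
  }

  // Called once per incoming chunk; chunk is a Buffer by default.
  _transform(chunk, encoding, callback) {
    const data = Buffer.concat([this.pendingBytes, chunk]);
    // Encode only whole 3-byte groups so no padding appears mid-stream.
    const usable = data.length - (data.length % 3);
    this.pendingBytes = data.subarray(usable);
    if (usable > 0) {
      this.push(data.subarray(0, usable).toString('base64'));
    }
    callback();
  }

  // Called once after the last chunk; emits the padded tail, if any.
  _flush(callback) {
    if (this.pendingBytes.length > 0) {
      this.push(this.pendingBytes.toString('base64'));
    }
    callback();
  }
}

// Wire up the pipeline from the text: stdin -> Base64Encoder -> stdout.
// Guarded by an illustrative flag so the script only reads stdin on request:
//   printf 'hello' | node base64.js --stdin     (prints aGVsbG8=)
if (process.argv.includes('--stdin')) {
  process.stdin.pipe(new Base64Encoder()).pipe(process.stdout);
}
```

Encoding each chunk on its own with chunk.toString('base64') would be shorter, but it can insert "=" padding in the middle of the output whenever a chunk length is not a multiple of three, which is why this sketch carries the leftover bytes forward.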
Base64Encoder must do three things to achieve its goal:
- Inherit from stream.Transform.
- Invoke the Transform constructor.
- Provide an implementation of the _transform() function.
This has been a brief but, I hope, useful introduction to streams in Node.js. I hope you agree that they are hugely powerful and valuable constructs that should be a regular part of your day-to-day Node.js programming. Please try to use streams where you can, especially when there is a possibility of reading or writing large amounts of data. Nobody wants to be called at 4am on a Saturday because your Node.js app received a gigabyte-sized upload that it tried to process in one go! ☺