The Node.js file system module — Reading from a file

Chaya Danzinger
6 min read · Sep 18, 2018


Background

The fs (file system) module acts as an API that allows us to interact with the file system in our code. All the fs operations have synchronous and asynchronous versions, giving us flexibility and control over the order and way in which we want the operations to be performed. Because the synchronous versions block everything else in the process until they are done, the asynchronous versions are recommended wherever you don't want to block incoming connections or other work.

Asynchronous versions of the fs operations will always take in a 'completion callback (function)' as the last argument. The arguments passed into the callback will depend on the original operation (and the values it returns upon completion), with the first argument always being reserved for a value representing an error or exception.

const fs = require('fs');
const path = require('path');

fs.open(path.join(__dirname, 'some_file.txt'), 'r', (err, fd) => {
  if (err) return console.log('encountered error...\n' + err);
  // ... do something
  fs.close(fd, (err) => {
    if (err) return console.log('encountered error...\n' + err);
    console.log('file successfully opened and closed!');
  });
});

As explained above, the open operation takes an anonymous callback function as its last argument. The first argument to that callback is a value representing a possible error or exception (or null if there was none), and the second argument is an (optional) value returned by the original operation. In this case, the original operation was open, which returns a file descriptor; because we want to use that file descriptor within our callback (for the close operation), it is what gets passed in as the second argument.

note: we use the path module (loaded with require('path')) instead of manually typing / or \ when specifying the path to the file, in order to avoid problems related to platform differences in how path names get interpreted.

Usage

To use the Node.js fs module, type const fs = require('fs'); before your code to obtain access to its methods and classes through the fs (or whatever name given) object variable.

Reading from a file

The file system module provides a number of operations you can choose from to read from a file. The two seemingly most common are fs.readFile(path[, options], callback) and fs.read(fd, buffer, offset, length, position, callback).

fs.readFile(path[, options], callback)

fs.readFile() is an extremely useful operation when you are working with files that are not too large, and when you need to retrieve the entire contents of the file at once. The documentation states that the options argument can be any file system flag, or a string specifying the encoding, and that if neither is specified, a raw buffer is returned:

fs.readFile(path.join(__dirname, 'file.txt'), (err, data) => {
if (err) throw err;
console.log(data);
});

returns

<Buffer 23 20 54 68 65 20 4e 6f 64 65 2e 6a 73 20 66 69 6c 65 20 73 79 73 74 65 6d 20 6d 6f 64 75 6c 65 20 2d 20 52 65 61 64 69 6e 67 20 66 72 6f 6d 20 61 20 ... >

You can see how I did not specify any encoding or system flag, so it’s just displaying the raw buffer. In this case, typeof data returns object.

Note that doing console.log(data.toString()) will then print the raw buffer object as a string.

If I pass in the encoding as a string, then typeof data will return string. That means doing this:

fs.readFile(path.join(__dirname, 'file.txt'), 'utf-8', (err, data) => {
if (err) throw err;
console.log(data);
});

will print the actual string data, which was confirmed when I ran the code.

I ran into a problem when testing the operation by passing in a file system flag for options, as opposed to a string specifying the encoding. The documentation states that the default file system flag is 'r', but when I tried passing in a different file system flag (via a string like 'r+', or with O_RDWR via the file open constants), the program exited with Error: Unknown encoding: r+. I did not do much digging on the readFile function, so it could very well be that I am misunderstanding the documentation for the options parameter and need some clarification (please comment and explain if that's the case), or maybe the documentation could use a slight edit.

EDIT: to pass in a flag option, you must pass it in as an object, like so:

fs.readFile(path.join(__dirname, 'file.txt'), {encoding: "utf-8", flag: "r+"}, (err, data) => {...});

The callback is passed two arguments, (err, data), where data is the contents of the file.

There is also a synchronous version of readFile(), namely fs.readFileSync(path[, options]), but it blocks everything else in the process until it finishes reading, and should only be used where that blocking behavior is acceptable.

fs.read(fd, buffer, offset, length, position, callback)

fs.read() takes in fd, a file descriptor, requiring you to open the file before using this method and close it when reading is complete. In addition, after reading the contents into memory, it writes the data directly into buffer, requiring that you allocate a buffer as big as whatever amount of data you want to read: const buf = Buffer.alloc(1024); (the older new Buffer(size) constructor is deprecated).

The callback is given (err, bytesRead, buffer), although if you declared your buffer globally then you can access it without the third buffer argument.

Although fs.read() and fs.readFile() are 'asynchronous', they both operate in such a way that the requested contents of the file are read into memory before you process them in whatever way you specify.

While you can indicate a start and end point with fs.read(), the contents are still read into memory, as with fs.readFile(). With large files this will consume a lot of your computer's memory and resources, drastically slowing down the process or causing it to fail altogether.

To avoid this problem and minimize memory costs, you can instead read a file through ‘streaming’ via fs.createReadStream().

Readable Stream

Streams allow us to keep the data held in memory to a minimum, enhancing performance. Streams operate in such a way where they read the file in specific chunk sizes (see here for more details), so long as there exists a mechanism of consuming or ignoring the data. A great demonstration on the difference in the amount of time it takes to ‘stream’ a file chunk by chunk (without the contents getting buffered into memory at all), vs using readFile() can be found here, along with a great explanation on Node.js streams.

Duncan Grant also provides a good demonstration on how we can use streams to control which parts of the file we want to process, further enhancing performance.

Writing data to or consuming data from a stream does not generally require implementing the stream interface, so there is usually no reason to call require('stream').

fs.createReadStream(path[, options])

createReadStream() can take in a number of optional arguments:

  • flags <string> Default: 'r'
  • encoding <string> Default: null
  • fd <integer> Default: null
  • mode <integer> Default: 0o666
  • autoClose <boolean> Default: true
  • start <integer>
  • end <integer> Default: Infinity
  • highWaterMark <integer> Default: 64 * 1024

start and end are both inclusive and start counting at 0. If fd is specified, ReadStream will ignore the path argument and use the file descriptor (and no 'open' event will be emitted). If you are passing in fd, it should be opened in blocking fashion (i.e. with fs.openSync(); I'm not sure why, but haven't dug into this).

Readable streams will buffer an amount of data based on the highWaterMark option, which usually specifies a number of bytes. When the highWaterMark threshold is reached for a read, the stream temporarily stops reading until the data is somehow consumed.

There are two main reading modes, but the easiest and most recommended way to stream (unless you require more control over the data operations) is through the readable.pipe(destination[, options]) method, which takes in a destination stream (stream.Writable) and 'pipes' the data from the readable stream through to the writable stream. The pipe method returns the stream.Writable destination, allowing for chaining of pipes.

const readable = fs.createReadStream(path.join(__dirname, 'some_file.txt'));
const writable = fs.createWriteStream('output_file.txt');
// All the data from readable, read and written in chunks of the
// default highWaterMark size, gets written into 'output_file.txt'
readable.pipe(writable);

Note: fs.createReadStream() streams an amount of data into the internal buffer. pipe() pulls all the data from the internal buffer and passes it through to a writable destination. Another way to pull data out of the internal buffer is the readable.read() method, which takes in an optional size argument specifying the number of bytes to pull from the internal buffer, and returns the data requested. If no size is specified, then all the data contained in the buffer gets returned.

If you want to know more, check out the detailed Node.js documentation on Readable Streams and the File System module.

Originally published at gist.github.com.
