The Node.js file system module — Reading from a file
Background
The fs
(file system) module acts as an API that allows us to interact with the file system from our code. All the fs
operations have synchronous and asynchronous versions, giving us flexibility and control over the order and manner in which operations are performed. Because the synchronous versions block everything else until they complete, the asynchronous versions are recommended wherever you don't want to block incoming connections or other operations.
Asynchronous versions of the fs
operations will always take in a 'completion callback (function)' as the last argument. The arguments passed into the callback will depend on the original operation (and the values it returns upon completion), with the first argument always being reserved for a value representing an error or exception.
fs.open(path.join(__dirname, 'some_file.txt'), 'r', (err, fd) => {
  if (err) return console.log('encountered error...\n' + err);
  // ... do something
  fs.close(fd, (err) => {
    if (err) return console.log('encountered error...\n' + err);
    console.log('file successfully opened and closed!');
  });
});
As explained above, you can see how the open
operation takes in an anonymous callback function as its last argument. The first argument to the callback is a value representing a possible error or exception (or null
if no error), and the second is the (optional) return value from the original operation. In this case, the original operation was open
, which produces a file descriptor; since we want to use the file descriptor within our callback function (with the close
operation), that is what the callback receives as its second argument.
Note: we use the
path
module instead of manually typing '/' or '\'
when specifying the path to the file, in order to avoid problems that might come up related to platform differences in how path names get interpreted.
Usage
To use the Node.js fs
module, add const fs = require('fs');
at the top of your code to obtain access to its methods and classes through the fs
(or whatever name you give it) object variable.
Reading from a file
The file system module provides a number of operations you can choose from to read from a file. The two seemingly most common are fs.readFile(path[, options], callback)
and fs.read(fd, buffer, offset, length, position, callback)
.
fs.readFile(path[, options], callback)
fs.readFile()
is an extremely useful operation when you are working with files that are not too large, and when you need to retrieve the entire contents of the file at once. The documentation's wording suggests that the options
argument can be a file system flag, or a string specifying the encoding, and that if no encoding is specified, a raw buffer is returned:
fs.readFile(path.join(__dirname, 'file.txt'), (err, data) => {
  if (err) throw err;
  console.log(data);
});
prints
<Buffer 23 20 54 68 65 20 4e 6f 64 65 2e 6a 73 20 66 69 6c 65 20 73 79 73 74 65 6d 20 6d 6f 64 75 6c 65 20 2d 20 52 65 61 64 69 6e 67 20 66 72 6f 6d 20 61 20 ... >
You can see how I did not specify any encoding or system flag, so it’s just displaying the raw buffer. In this case, typeof data
returns object
.
Note that doing
console.log(data.toString())
will then decode the raw buffer and print its contents as a string.
If I pass in the encoding as a string, then typeof data
will return string
. That means doing this:
fs.readFile(path.join(__dirname, 'file.txt'), 'utf-8', (err, data) => {
  if (err) throw err;
  console.log(data);
});
will print the actual string data, which was confirmed when I ran the code.
I ran into a problem when testing the operation by passing in a file system flag for
options
as opposed to a string specifying the encoding. The documentation states that the default
file system flag is 'r', but when I tried passing in a different file system flag (via a string like 'r+'
, or with O_RDWR
via the file open constants), the program exited with the error throw new Error(`Unknown encoding: ${encoding}`);
. I did not do much digging on the readFile
function, so it could very well be that I am misunderstanding the documentation for the options
parameter and need some clarification (please comment and explain if that's the case), or maybe the documentation could use a slight edit. EDIT: to pass in a flag option, you must pass it in as an object, like so:
fs.readFile(path.join(__dirname, 'file.txt'), { encoding: 'utf-8', flag: 'r+' }, (err, data) => { ... });
The callback is passed two arguments (err, data)
, where data
is the contents of the file.
There is also a synchronous version of readFile()
, namely fs.readFileSync(path[, options])
, but it blocks everything else in the program until it completes reading, and should only be used when that blocking behavior is what you want.
fs.read(fd, buffer, offset, length, position, callback)
fs.read()
takes in fd
, a file descriptor, requiring you to open
the file before using this method, and close
the file when reading is complete. In addition, after reading the contents into memory, it writes the data directly into buffer
, requiring that you allocate a buffer as big as whatever amount of data you want to read: const buf = Buffer.alloc(1024);
(note that new Buffer(size) is deprecated).
The callback is given (err, bytesRead, buffer)
, although if you declared your buffer globally then you can access it without the third buffer
argument.
Although fs.read()
and fs.readFile()
are 'asynchronous', they both appear to operate by reading the contents of the file into memory before processing it in whatever way you specify.
While you can indicate a start
and end
point with fs.read()
, the contents are still read into memory as with fs.readFile()
. With large files this will consume a lot of your computer's memory and resources, drastically slowing down the process or causing it to fail altogether.
To avoid this problem and minimize memory costs, you can instead read a file through ‘streaming’ via fs.createReadStream()
.
Readable Stream
Streams allow us to keep the data held in memory to a minimum, enhancing performance. Streams read the file in specific chunk sizes (see here for more details), so long as there exists a mechanism for consuming or ignoring the data. A great demonstration of the difference in the time it takes to 'stream' a file chunk by chunk (without the contents getting buffered into memory at all), vs using readFile()
can be found here, along with a great explanation of Node.js streams.
Duncan Grant also provides a good demonstration on how we can use streams to control which parts of the file we want to process, further enhancing performance.
Writing data to or consuming data from a stream does not usually require implementing the stream interface, so there is generally no reason to call
require('stream')
.
fs.createReadStream(path[, options])
createReadStream()
can take in a number of optional arguments:
flags <string> Default: 'r'
encoding <string> Default: null
fd <integer> Default: null
mode <integer> Default: 0o666
autoClose <boolean> Default: true
start <integer>
end <integer> Default: Infinity
highWaterMark <integer> Default: 64 * 1024
start
and end
are both inclusive and start counting at 0. If fd
is specified, ReadStream
will ignore the path
argument and use the file descriptor (no open
event will be emitted). If you are passing in fd
, it should be opened in a blocking fashion (i.e. use fs.openSync()
; I'm not sure why, but haven't dug into this).
Readable streams will buffer an amount of data based on the highWaterMark
option, which usually specifies a number of bytes. When the highWaterMark
threshold is reached for a read, the stream temporarily stops reading until the data is somehow consumed.
There are two main reading modes, but the easiest and most recommended way to stream (unless you require more control over the data operations) is through the readable.pipe(destination[, options])
method, which takes in a destination stream (stream.Writable
) and 'pipes' the data from the readable stream through to the writable stream. The pipe
method returns the stream.Writable
destination, allowing for chaining of pipes.
const readable = fs.createReadStream(path.join(__dirname, 'some_file.txt'));
const writable = fs.createWriteStream('output_file.txt');
// All the data from readable, read and written in chunks of the
// default highWaterMark size, gets written into 'output_file.txt'
readable.pipe(writable);
Note:
fs.createReadStream()
streams an amount of data into the internal buffer. pipe()
pulls all the data from the internal buffer and passes it through to a writable destination. Another way to pull data out of the internal buffer is to use the readable.read()
method, which takes in an optional size
argument specifying the number of bytes to pull from the internal buffer, and returns the data requested. If no size
is specified, then all the data contained in the buffer gets returned.
If you want to know more, check out the detailed Node.js documentation on Readable Streams and the File System module.
Originally published at gist.github.com.