MongoDB GridFS, Made Simple

How does GridFS work? When should you use GridFS over the ordinary file system? How do you use it with Node.js? And what are GridFS’s pros and cons?

Mohamed Mayallo

Published in ILLUMINATION · 4 min read · Aug 1, 2022

Introduction

When choosing how to handle file uploads, you have many options. One of them is saving your files as binary data in the database, and MongoDB GridFS applies this pattern. It is a file system abstraction on top of MongoDB in which an uploaded file is divided into chunks during the upload and reassembled during retrieval.

How GridFS Works

Here is how GridFS works, in simple steps:

During the first file upload, a new bucket named fs (unless you specify another name) is created if it doesn’t already exist. The bucket consists of two collections: fs.chunks and fs.files.
An index is created on each collection (if it doesn’t already exist) for fast retrieval: one on filename and uploadDate in fs.files, and a unique one on files_id and n in fs.chunks.
The uploaded file is divided into chunks (255 KB each by default, unless you specify another chunk size; the last chunk is only as large as needed) and stored in the fs.chunks collection. To keep track of their order, each chunk document has a field n that holds its position in the file.
A metadata document is created for the uploaded file in the fs.files collection, containing its length, chunkSize, uploadDate, filename, and contentType.
During retrieval, GridFS reads the file’s metadata from the fs.files collection and uses it to reassemble the chunks from the fs.chunks collection, returning the file to the client as a stream or in memory.
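
To make the chunking and reassembly steps above concrete, here is a minimal in-memory sketch. It does not talk to MongoDB at all: the function names splitIntoChunks and reassemble are hypothetical, and the objects only mimic the files_id, n, and data fields of the documents stored in fs.chunks.

```javascript
// Default GridFS chunk size: 255 KB.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Split a Buffer into chunk documents shaped like those in fs.chunks.
// `filesId` stands in for the _id of the matching fs.files document.
function splitIntoChunks(buffer, filesId, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: filesId,
      n, // position of this chunk within the file
      data: buffer.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// Rebuild the original bytes by ordering the chunks on `n`.
function reassemble(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}
```

Because every chunk carries its own n, the chunks can be fetched in any order and still be put back together correctly.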

When to Use GridFS over Ordinary Filesystem Storage

You can go with GridFS in these cases:

If your file is larger than 16 MB (the maximum size of a MongoDB BSON document).
If you frequently need to access specific portions of a file without loading the entire file into memory.
If your file system limits the number of files in a directory, you can use GridFS to store as many files as you need.
If you want to track the metadata of your files, which GridFS provides as a built-in feature.
If you want your files to benefit from MongoDB’s built-in replication, backup, and sharding features, since they are part of your database, instead of handling these manually in the file system.
If you want simple deletion: deleting a file in GridFS is as easy as deleting an object in the database, whereas deleting from the file system takes a bit more work.

GridFS Limitations

There is no one-size-fits-all solution, so bear these limitations in mind:

Continuously serving big files from the database as many chunks can affect your working set (a 16 MB file is retrieved as 65 chunks of up to 255 KB each), especially if you deal with gigabytes or terabytes of data.
Serving a file from the database is a bit slower than serving it from the file system.
GridFS doesn’t natively provide a way to update an entire file atomically. So if your system frequently updates entire files, either avoid GridFS or use a workaround as discussed below.
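
The chunk count quoted above follows from simple arithmetic. As a small worked example (the helper name chunkCount is hypothetical), this computes how many chunks a file needs for a given chunk size:

```javascript
const KB = 1024;
const MB = 1024 * KB;

// Number of chunks GridFS needs for a file of `fileBytes` bytes.
function chunkCount(fileBytes, chunkBytes = 255 * KB) {
  return Math.ceil(fileBytes / chunkBytes);
}

console.log(chunkCount(16 * MB)); // 65 (64 full chunks plus one partial chunk)
console.log(chunkCount(16 * MB, 1 * MB)); // 16
```

A larger chunk size means fewer chunk documents per file, at the cost of more data read per chunk.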

How to Mitigate GridFS Limitations

These are some best practices that mitigate the limitations of GridFS:

To reduce the pressure on your working set, you can serve your files from a separate MongoDB server dedicated to GridFS storage.
Also to reduce the pressure on your working set, you can increase the chunk size beyond the default 255 KB.
Regarding atomic updates, if your system tends to update entire files frequently, or many users access the files concurrently, you can use a versioning approach to track file updates. Based on your needs, you can then retrieve only the latest version of a file and either delete the other versions or keep them as the file’s history.
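
One possible shape for the versioning approach is sketched below. It assumes that bucket is a GridFSBucket from the official mongodb driver and that each update is uploaded as a new file under the same filename. The function names openLatestVersion and pruneOldVersions are hypothetical.

```javascript
// `bucket` is assumed to be a GridFSBucket from the official `mongodb` driver.

// Stream the newest upload stored under `filename`.
// revision: -1 selects the most recent version (0 would be the original).
function openLatestVersion(bucket, filename) {
  return bucket.openDownloadStreamByName(filename, { revision: -1 });
}

// Delete every version of `filename` except the newest `keep` ones.
// Resolves with the number of versions that were removed.
async function pruneOldVersions(bucket, filename, keep = 1) {
  const versions = await bucket
    .find({ filename })
    .sort({ uploadDate: -1 }) // newest first
    .toArray();
  const stale = versions.slice(keep);
  for (const file of stale) {
    await bucket.delete(file._id); // removes the fs.files document and its chunks
  }
  return stale.length;
}
```

Since a new version only becomes the latest once its upload has finished, readers keep getting the previous complete version in the meantime. Skip the pruning step if you want to keep older versions as history.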

Hands-on Example Using Node.js

In this example, we will see how to upload files to a bucket, list and retrieve their metadata, and download them using GridFS.

I assume you are familiar with Node.js.

First of all, let’s create our bucket (if it doesn’t exist) or retrieve it:
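
A minimal sketch of this step, assuming the official mongodb driver for Node.js. The helper name openBucket is hypothetical, and the driver module is passed in as a parameter so the sketch has no hard dependency on an installed package. The connection string, database name, and bucket name in the usage comment are placeholders.

```javascript
// Connect to MongoDB and get a reference to a GridFS bucket.
// `driver` is the object exported by the official `mongodb` package,
// i.e. the result of require("mongodb").
async function openBucket(driver, uri, dbName, bucketName = "fs") {
  const client = new driver.MongoClient(uri);
  await client.connect();
  const db = client.db(dbName);
  // Constructing the bucket does not write anything yet: the
  // <bucketName>.files and <bucketName>.chunks collections appear on first upload.
  const bucket = new driver.GridFSBucket(db, { bucketName });
  return { client, bucket };
}

// Usage (assumes `npm install mongodb` and a reachable server):
// const { client, bucket } = await openBucket(
//   require("mongodb"), "mongodb://localhost:27017", "myDb", "uploads");
```

Keep the returned client around so you can call client.close() when your application shuts down.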

Let’s upload a file using GridFS:
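
A minimal sketch of an upload, assuming bucket is a GridFSBucket from the official mongodb driver. The helper name uploadFile and its parameters are hypothetical.

```javascript
const { createReadStream } = require("node:fs");
const { pipeline } = require("node:stream/promises");

// Stream a local file into GridFS and resolve with the new file's _id.
// `bucket` is assumed to be a GridFSBucket from the official `mongodb` driver.
async function uploadFile(bucket, filePath, filename, metadata = {}) {
  const uploadStream = bucket.openUploadStream(filename, { metadata });
  // pipeline() resolves once the upload stream has finished
  // and rejects if either stream emits an error.
  await pipeline(createReadStream(filePath), uploadStream);
  return uploadStream.id;
}
```

Streaming the file this way means it is never held in memory as a whole, which matters for the large files GridFS is meant for.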

Bear in mind that you could skip the first step and rely on the first upload to create the bucket. Creating it explicitly, though, guarantees that the bucket is set up right after the database connection is established and gives you a reference to it.

Let’s list our files metadata:
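
A minimal sketch of listing, again assuming bucket is a GridFSBucket from the official mongodb driver. The helper name listFiles is hypothetical.

```javascript
// Resolve with the metadata documents (from fs.files) of the files in the bucket.
// An empty filter matches every file; pass e.g. { filename: "report.pdf" }
// to narrow the result.
async function listFiles(bucket, filter = {}) {
  return bucket.find(filter).toArray();
}
```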

The find method returns a FindCursor, which you can iterate over to get your results. Calling toArray on the cursor returns a promise that resolves to an array of all matching documents.

To retrieve specific file metadata:
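
A minimal sketch of a single-file lookup under the same assumption about bucket. The helper name findFile is hypothetical.

```javascript
// Resolve with the first fs.files document matching `filter`, or null if none exists.
// Example filters: { filename: "report.pdf" } or { _id: someObjectId }.
async function findFile(bucket, filter) {
  return bucket.find(filter).limit(1).next();
}
```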

Finally, let’s download a file:
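
A minimal sketch of a download to disk, assuming bucket is a GridFSBucket from the official mongodb driver. The helper name downloadFile and its parameters are hypothetical.

```javascript
const { createWriteStream } = require("node:fs");
const { pipeline } = require("node:stream/promises");

// Stream a GridFS file to disk without buffering it fully in memory.
// `fileId` is the _id of the fs.files document (an ObjectId in practice).
async function downloadFile(bucket, fileId, destinationPath) {
  await pipeline(
    bucket.openDownloadStream(fileId),
    createWriteStream(destinationPath)
  );
  return destinationPath;
}
```

In a web server, the same download stream could be piped to the HTTP response instead of a file.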

That’s it. You can find this code in this repo.

Conclusion

As we saw, there is no one-size-fits-all solution. Whether GridFS is the right storage option is your decision, and it depends on your needs and on your understanding of the pros and cons of the available options.

Thanks a lot for staying with me up to this point. I hope you enjoyed reading this article.

Originally published at https://mayallo.com.