An Introduction to GridFS

Geethu Suresh
Published in Version 1
3 min read · May 27, 2022

MongoDB is a leading document-oriented NoSQL database platform. A single BSON document, however, cannot exceed 16 megabytes. What do you do with files that are larger than that? That’s where GridFS comes into the picture.

What exactly is GridFS?

GridFS is a MongoDB file system abstraction used for storing and retrieving large files such as images, audio, or video. It isn’t a feature of the MongoDB server itself but a specification that drivers implement, and each driver exposes an API for storing and retrieving files. What sets GridFS apart from BinData storage in MongoDB is that it can store files larger than the 16 MB document size limit.

How is data stored in GridFS?

GridFS takes a file and splits it into sections called chunks. By default, each chunk is 255 KB (the chunk size is configurable); only the last chunk may be smaller, since it holds whatever data remains.
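The chunk arithmetic can be checked with a few lines of Python. This is only an illustrative sketch, not driver code: the function name `chunk_layout` is made up for this example, and it assumes the 255 KB default means 255 × 1024 bytes.

```python
# Assumption: the default chunk size of 255 KB is 255 * 1024 = 261,120 bytes.
DEFAULT_CHUNK_SIZE = 255 * 1024


def chunk_layout(file_length: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> tuple[int, int]:
    """Return (number_of_chunks, size_of_last_chunk) for a file of file_length bytes.

    Hypothetical helper for illustration; a real driver does this internally.
    """
    if file_length < 0 or chunk_size <= 0:
        raise ValueError("file_length must be >= 0 and chunk_size must be > 0")
    if file_length == 0:
        # An empty file needs no chunks at all.
        return 0, 0
    # Integer ceiling division: every chunk is full except possibly the last.
    count = (file_length + chunk_size - 1) // chunk_size
    last = file_length - (count - 1) * chunk_size
    return count, last


# A 1,000,000-byte file: three full chunks plus one partial chunk.
print(chunk_layout(1_000_000))  # (4, 216640)
```

A file that is an exact multiple of the chunk size ends with a full chunk rather than an empty one.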

GridFS uses two collections, a chunks collection and a files collection, and groups them into a bucket by prefixing each with the bucket name. The default bucket name is fs, which gives fs.chunks and fs.files. If the bucket does not exist, the driver creates it on the first write operation.

Each chunk is stored as a separate document in the chunks collection, while the file's metadata (such as its length, chunk size, upload date, and filename) is saved as a single document in the files collection.
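To make the two-collection layout concrete, here is a minimal in-memory sketch of what the documents look like. It does not talk to MongoDB: `to_gridfs_documents` is a hypothetical helper, a UUID string stands in for the ObjectId a driver would generate, and the per-chunk `_id` field is omitted for brevity. The field names (`length`, `chunkSize`, `uploadDate`, `filename`, `metadata`, `files_id`, `n`, `data`) follow the GridFS specification.

```python
import uuid
from datetime import datetime, timezone


def to_gridfs_documents(data: bytes, filename: str, chunk_size: int = 255 * 1024):
    """Build one files-collection document and its chunks-collection documents.

    Illustration only: a real driver generates ObjectIds and writes to MongoDB.
    """
    file_id = uuid.uuid4().hex  # stand-in for the ObjectId a driver would use

    # One document per file goes into <bucket>.files.
    files_doc = {
        "_id": file_id,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
        "filename": filename,
        "metadata": {},  # free-form, application-defined
    }

    # One document per chunk goes into <bucket>.chunks; n is the 0-based sequence number.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


# Tiny chunk size so the split is easy to see.
files_doc, chunk_docs = to_gridfs_documents(b"0123456789", "demo.bin", chunk_size=4)
for chunk in chunk_docs:
    print(chunk["n"], chunk["data"])
```

Every chunk points back to its file through `files_id`, which is how the two collections stay linked.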

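For efficiency, the driver automatically creates indexes on both collections when data is first uploaded: a unique compound index on files_id and n in the chunks collection, and an index on filename and uploadDate in the files collection. Additional indexes can be created as needed.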

Figure: How GridFS stores data

How is data fetched?

When you read a file stored in GridFS, the driver fetches its metadata from the files collection, then locates the matching chunks and reassembles them in order. The file can be read into memory or written to a stream. You can also read an arbitrary section of the file without fetching the rest.
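The partial-read idea can be sketched in plain Python. This is an assumption-laden illustration rather than driver code: `read_range` is a hypothetical function, and the chunk documents are simple dictionaries held in memory. A real driver would query the chunks collection by file id and sequence number instead of filtering a list; PyMongo, for instance, exposes the same capability through `seek` and `read` on the stream returned by `open_download_stream`.

```python
def read_range(chunk_docs, chunk_size: int, start: int, end: int) -> bytes:
    """Return bytes [start, end) of a file, touching only the chunks that overlap the range.

    Hypothetical helper; chunk_docs is a list of {"n": int, "data": bytes} dictionaries.
    """
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    if start == end:
        return b""

    # Sequence numbers of the first and last chunk that overlap the range.
    first_n = start // chunk_size
    last_n = (end - 1) // chunk_size

    # Select only the needed chunks and put them back in order.
    needed = sorted(
        (c for c in chunk_docs if first_n <= c["n"] <= last_n),
        key=lambda c: c["n"],
    )
    joined = b"".join(c["data"] for c in needed)

    # Trim the leading bytes of the first chunk and anything past `end`.
    offset = start - first_n * chunk_size
    return joined[offset:offset + (end - start)]


# Chunks deliberately out of order, as a query without a sort might return them.
chunks = [
    {"n": 2, "data": b"89"},
    {"n": 0, "data": b"0123"},
    {"n": 1, "data": b"4567"},
]
print(read_range(chunks, 4, 5, 9))  # b'5678'
```

Only chunks 1 and 2 are touched for this range; chunk 0 is never read.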

Advantages

  1. Because files are chunked, GridFS can retrieve sections of a file without loading the entire file into memory.
  2. File metadata is kept alongside the file itself.
  3. Geo-replication and availability: with GridFS on a MongoDB replica set or sharded cluster, files and their metadata are replicated and distributed automatically.
  4. Files can be stored alongside other database content, which simplifies the architecture.
  5. Files are covered by MongoDB’s built-in authentication and security mechanisms.
  6. File system limitations, such as a cap on the number of files in a directory, no longer apply.

Disadvantages

  1. Performance is slower than reading from the file system or serving the file directly from a web server.
  2. A file’s content cannot be updated atomically. An alternative is to store multiple versions of the file, record which one is current, and delete the versions that are no longer needed.
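The multiple-versions workaround from the last point can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: `pick_revision` and `stale_ids` are hypothetical helpers, versions are distinguished by sharing a filename and differing in `uploadDate`, and the documents are plain dictionaries rather than real files-collection entries. Drivers offer a comparable notion of revisions when downloading by name (PyMongo's `open_download_stream_by_name` accepts a `revision` argument, for example).

```python
from datetime import datetime, timezone


def _versions(files_docs, filename):
    """All documents for `filename`, oldest first."""
    return sorted(
        (doc for doc in files_docs if doc["filename"] == filename),
        key=lambda doc: doc["uploadDate"],
    )


def pick_revision(files_docs, filename, revision=-1):
    """Pick one version of `filename`: 0 is the oldest, -1 the newest."""
    versions = _versions(files_docs, filename)
    try:
        return versions[revision]
    except IndexError:
        raise LookupError(f"no revision {revision} of {filename!r}") from None


def stale_ids(files_docs, filename, keep=1):
    """Return the _ids of every version except the newest `keep` (deletion candidates)."""
    versions = _versions(files_docs, filename)
    return [doc["_id"] for doc in versions[:max(len(versions) - keep, 0)]]


def _day(d):
    return datetime(2022, 5, d, tzinfo=timezone.utc)


files = [
    {"_id": "v1", "filename": "report.pdf", "uploadDate": _day(1)},
    {"_id": "v3", "filename": "report.pdf", "uploadDate": _day(20)},
    {"_id": "v2", "filename": "report.pdf", "uploadDate": _day(10)},
    {"_id": "logo", "filename": "logo.png", "uploadDate": _day(15)},
]
print(pick_revision(files, "report.pdf")["_id"])  # v3
print(stale_ids(files, "report.pdf"))             # ['v1', 'v2']
```

Readers always see a complete version because a new upload only becomes "current" once it is fully written; the old versions are removed afterwards.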

Head over to the next post to learn how to store and retrieve files using GridFS!

About the Author:
Geethu Suresh is a Microsoft .NET Consultant here at Version 1.
