Uploading and retrieving a file from GridFS using Multer
When dealing with large files, such as videos or large images, storage can be a challenge. MongoDB provides a useful specification, GridFS, which allows you to store files larger than 16 MB directly in the database. It’s important to understand how GridFS actually stores data, especially if you plan to use it to manage your files.
GridFS uses two collections to store file data; the default names are fs.files and fs.chunks. The first collection holds the metadata of the file, including its name, size, and content type. The second collection is where the magic happens. The fs.chunks collection holds the actual file, broken into chunks with a default size of 255 kB. The chunks are stored as separate, numbered documents, and the MongoDB driver you are using assembles them in order when you retrieve the file.
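To make the chunking concrete, here is a small sketch (plain Node, no driver needed) of how many chunk documents a file of a given size produces under the default 255 kB chunk size. The function name is my own, for illustration only:

```javascript
// Hypothetical sketch: how many fs.chunks documents a file needs,
// given GridFS's default chunk size of 255 kB (255 * 1024 bytes).
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261,120 bytes

function chunkCount(fileSizeBytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  // Each chunk holds up to chunkSize bytes; the last chunk may be smaller.
  return Math.ceil(fileSizeBytes / chunkSize);
}

// A 1 MB file (1,048,576 bytes) needs 5 chunks.
console.log(chunkCount(1024 * 1024));
```

So even a modest image ends up spread across several documents, which is why the ordering field described below matters.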
There are many advantages to using GridFS, but only if your application requires the unique features it offers. For small files, you can simply store them on disk with an ordinary file system read and write. You can read more about the advantages of GridFS in the MongoDB documentation.
GRIDFS COLLECTIONS
Let’s take a look at the information stored within the two collections, fs.files and fs.chunks.
{
  "fs.files": {
    "_id" : "<ObjectId>",
    "length" : "<num>",
    "chunkSize" : "<num>",
    "uploadDate" : "<timestamp>",
    "md5" : "<hash>",
    "filename" : "<string>",
    "contentType" : "<string>",
    "aliases" : "<string array>",
    "metadata" : "<any>"
  },
  "fs.chunks" : {
    "_id" : "<ObjectId>",
    "files_id" : "<ObjectId>",
    "n" : "<num>",
    "data" : "<binary>"
  }
}
fs.files contains information such as the length of the file, its filename, its content type, and even an MD5 hash of its contents.
fs.chunks, on the other hand, contains an important property, files_id, which links each chunk to its file in the fs.files collection. n is another important field, as it holds the chunk’s position in the sequence.
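The role of n can be shown with a small sketch (plain Node, no driver): given chunk documents shaped like those in fs.chunks, sorting on n and concatenating the binary data recovers the original file. The field names match the GridFS collections above; the sample data is made up for illustration:

```javascript
// Sketch: reassemble a file from its chunk documents by sorting on "n"
// and concatenating each chunk's binary payload.
function assembleChunks(chunkDocs) {
  return Buffer.concat(
    chunkDocs
      .slice()                     // don't mutate the caller's array
      .sort((a, b) => a.n - b.n)   // restore chunk order
      .map((doc) => doc.data)      // binary payload of each chunk
  );
}

// Chunks arriving out of order still reassemble correctly:
const sampleChunks = [
  { files_id: 'abc', n: 1, data: Buffer.from('world') },
  { files_id: 'abc', n: 0, data: Buffer.from('hello ') },
];
console.log(assembleChunks(sampleChunks).toString());
```

This is essentially what the driver does for you behind the scenes when you read a file back out of GridFS.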
OUR APPLICATION
I’m using a basic Node/Express application with a Pug front end to upload and retrieve a single file. To achieve this I’m using the multer-gridfs-storage package, which lets me use multer to upload a file directly into GridFS storage. You can view the entire project at https://github.com/kvnam/multer-gridfs-storage-demo.
const { MongoClient } = require('mongodb');
const multer = require('multer');
const GridFsStorage = require('multer-gridfs-storage');

//I used an mLab Sandbox DB. Substitute the details with your own
const url = "mongodb://<dbuser>:<dbpwd>@mlaburl:12435/your_db_name";
const dbName = "your_db_name";

let storage = new GridFsStorage({
  url: url,
  file: (req, file) => {
    return {
      //Setting collection name, default name is fs
      bucketName: 'test',
      //Setting file name to original name of file
      filename: file.originalname
    };
  }
});

let upload = null;

storage.on('connection', (db) => {
  //Setting up upload for a single file
  upload = multer({ storage: storage }).single('file1');
});

module.exports.uploadFile = (req, res) => {
  upload(req, res, (err) => {
    if (err) {
      return res.render('index', {
        title: 'Upload Error',
        message: 'File could not be uploaded',
        error: err
      });
    }
    res.render('index', {
      title: 'Uploaded',
      message: `File ${req.file.filename} has been uploaded!`
    });
  });
};
To upload the file, I first set up my multer upload function with the GridFS storage. This is where the multer-gridfs-storage package comes in. It not only lets me set the MongoDB database I want to use as my storage, it also allows
- Naming the collection — I set it to ‘test’
- Naming the file — I set the file name to the original name of the file
- Enable caching
- Listen for events — I listen for the ‘connection’ event to set up my upload multer function
After the GridFS storage is set up, the upload process is exactly the same as any file upload with the multer package. This code stores the file in the test.files and test.chunks collections.
Now let’s move on to retrieving the file. For my application I am accepting the filename from the user, and then using it to search the test.files collection for the right ID to retrieve the chunks from test.chunks.
module.exports.getFile = (req, res) => {
  //Accepting user input directly is very insecure and should
  //never be allowed in a production app.
  //Sanitize the input before accepting it.
  //This is for demonstration purposes only.
  let fileName = req.body.text1;
  //Connect to the MongoDB client
  MongoClient.connect(url, function(err, client) {
    if (err) {
      return res.render('index', {
        title: 'Download Error',
        message: 'MongoClient connection error',
        error: err.message
      });
    }
    const db = client.db(dbName);
    const collection = db.collection('test.files');
    const collectionChunks = db.collection('test.chunks');
    collection.find({ filename: fileName }).toArray(function(err, docs) {
      if (err) {
        return res.render('index', {
          title: 'File Error',
          message: 'Error finding file',
          error: err.message
        });
      }
      if (!docs || docs.length === 0) {
        return res.render('index', {
          title: 'Download Error',
          message: 'No file found'
        });
      } else {
        //Retrieving the chunks from the db
        collectionChunks.find({ files_id: docs[0]._id })
          .sort({ n: 1 }).toArray(function(err, chunks) {
            if (err) {
              return res.render('index', {
                title: 'Download Error',
                message: 'Error retrieving chunks',
                error: err.message
              });
            }
            if (!chunks || chunks.length === 0) {
              //No data found
              return res.render('index', {
                title: 'Download Error',
                message: 'No data found'
              });
            }
            let fileData = [];
            for (let i = 0; i < chunks.length; i++) {
              //chunks[i].data holds BSON binary data; store it in the
              //fileData array as a base64-encoded string
              fileData.push(chunks[i].data.toString('base64'));
            }
            //Display the file using the data URI format
            let finalFile = 'data:' + docs[0].contentType + ';base64,'
              + fileData.join('');
            res.render('imageView', {
              title: 'Image File',
              message: 'Image loaded from MongoDB GridFS',
              imgurl: finalFile
            });
          });
      }
    });
  });
};
To connect to the Mongo database, I’m using the MongoDB driver for Node.js, which lets me query the database without defining schemas. The main thing to note here is that MongoDB stores the chunks of data in BSON, or Binary JSON, format. So to display the file as an image, I have to take the additional step of converting the data into base64-encoded strings, which are then displayed using the data URI format.
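That conversion step can be isolated into a small sketch (plain Node, no driver): assuming the chunk payloads are already available as Node Buffers, concatenating the raw bytes before encoding produces the data URI in one pass. The function name and sample values here are mine, for illustration:

```javascript
// Sketch: build a data URI from GridFS chunk payloads.
// Concatenating the raw bytes before base64-encoding avoids any
// padding issues that per-chunk encoding could introduce with
// non-default chunk sizes.
function toDataUri(contentType, chunkBuffers) {
  const base64 = Buffer.concat(chunkBuffers).toString('base64');
  return `data:${contentType};base64,${base64}`;
}

console.log(toDataUri('text/plain', [Buffer.from('hi')]));
// data:text/plain;base64,aGk=
```

The article's code instead base64-encodes each chunk and joins the strings; that happens to be safe with the default 255 kB chunk size, since 261,120 bytes is a multiple of 3 and so each chunk encodes with no padding.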
I’m using the file metadata retrieved from test.files to identify the content type and ensure my data URI is set correctly. Fields such as length are also useful when you plan to send partial content.
You can build on this easily to create a much more complex and secure application that allows the easy upload and download of large files. The entire application, complete with a Pug front end, can be found at https://github.com/kvnam/multer-gridfs-storage-demo.
Join me next week as we move on to streaming video using the gridfs-stream package. If you like this article, then give me a follow on Twitter or LinkedIn for updates. If you have any more suggestions or contributions drop me an email at kavitanambissan@gmail.com, I’m always looking to learn more!