Estimating average document size in a MongoDB collection

Shivam Shrivastava
Jul 11

For one of my side gigs, I was curious to estimate the average size of a document in a MongoDB collection. I have worked on numerous hobby projects, some with a MySQL database and some with NoSQL. But all of them started with 0 entries and never saw the limelight of handling a scale of millions and billions.

I always had the intention of building an ambitious project that would handle huge databases. And, like everyone, I wanted to invest the minimum possible amount of money, because that's what you do when you are an engineer: optimise for cost. Especially when you come from a Civil Engineering background, where Estimation and Costing is a dedicated subject! So I wanted to know how much a collection with a million documents would cost.

MongoDB Atlas provides a managed service on Azure in 26 regions (as of this writing), priced at $0.28/hr for the westindia (Mumbai) region. Reference — https://www.mongodb.com/cloud/atlas/azure-mongodb
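At that rate, a cluster running round the clock works out to roughly $0.28 × 24 × 30 ≈ $202 a month.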

This would have helped in calculating the cost, had I known how much storage a million documents would require. And that led me to the question — “How much storage would a collection with a million documents require?”

I did what developers generally do: googled it. But I found nothing satisfying. Since I had never worked on a project at that scale, I didn’t even have a rough estimate. But I wanted an answer. So I did one of my favourite things — an experiment!

The Experiment

Objective

The objective was to calculate the average size of a document in a collection with a million documents.

The Setup

I installed MongoDB locally on my Mac. Since I already had some hobby Clojure projects, I used one of them to set up a service that would add a million entries to the DB. The code looked roughly like the sketch below.
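A minimal reconstruction of that service, assuming the monger client library; the schema and field names here are illustrative rather than the original ones:

(ns dummy-service.core
  (:require [monger.core :as mg]
            [monger.collection :as mc]))

;; Connect to the local MongoDB instance (localhost:27017 by default).
(def conn (mg/connect))
(def db   (mg/get-db conn "experiment"))

(defn make-dummy
  "Builds one illustrative seven-field document."
  [i]
  {:index      i
   :name       (str "user-" i)
   :email      (str "user-" i "@example.com")
   :age        (rand-int 100)
   :active     (even? i)
   :score      (rand)
   :created-at (java.util.Date.)})

(defn insert-dummies!
  "Inserts n documents into the 'dummies' collection in batches of 1000."
  [n]
  (doseq [batch (partition-all 1000 (map make-dummy (range n)))]
    (mc/insert-batch db "dummies" (vec batch))))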

I connected the service to my local MongoDB instance and added entries into the ‘dummies’ collection by calling the function from the REPL.
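With the sketch above, populating the collection from the REPL is a single call:

(insert-dummies! 1000000)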

The Findings

I ran the experiment for collections with 10, 100, 1,000, 10,000, 100,000 and 1,000,000 documents.

The findings of the experiment were as follows:

Sample object created —
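An illustrative document of the seven-field shape used here (the field values are made up; MongoDB adds the _id itself):

{
  "_id" : ObjectId("..."),
  "index" : 0,
  "name" : "user-0",
  "email" : "user-0@example.com",
  "age" : 42,
  "active" : true,
  "score" : 0.7316,
  "created-at" : ISODate("...")
}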

The sizes of the documents and collections were found by running the command db.dummies.stats() in the mongo shell, as follows —
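The fields of interest in its output look like this (a truncated sketch; the values shown correspond to the million-document run, and the database name is assumed):

> db.dummies.stats()
{
    "ns" : "experiment.dummies",
    "count" : 1000000,
    "size" : 525000000,      // total data size in bytes, ~0.5 GB
    "avgObjSize" : 525,      // average document size in bytes
    ...
}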

Note — To avoid including stale data in the calculation, the collection was dropped (db.dummies.drop()) before every iteration.

The Conclusion

So, for a collection with a million documents and the document schema mentioned above, it can be concluded that the average document size comes out to about 525 bytes. Over a million documents that adds up: 525 bytes × 1,000,000 ≈ 525 MB, i.e. an overall collection size of ~0.5 GB.

It should be noted that the document schema used in this sample was small (only 7 fields). Documents in real databases usually contain more keys, including nested documents, so their average size would be larger than this. Also, for simplicity, the index size is not included in the collection sizes quoted in this article. Indexes are an important consideration for databases serving millions of documents, and they too contribute to storage size.
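If you want to account for them, the shell reports index sizes as well:

> db.dummies.totalIndexSize()    // total size of all indexes, in bytes
> db.dummies.stats().indexSizes  // size of each index individually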

I hope this helps you estimate the size of the documents your application will need, and enables you to pick an appropriate configuration for your system.