Mission Possible: Resize MongoDB Capped Collections Without Downtime

Why our MongoDB server had almost no available disk space left and how we averted the imminent disaster without downtime.

Kay Agahd
idealo Tech Blog
Aug 6, 2020 · 8 min read


TL;DR MongoDB supports capped collections with a specified maximal size, but once created, they can’t be resized without downtime. Here’s how we got around it.

Recap: Capped Collections

As written in MongoDB’s documentation:

Capped collections are fixed-size collections that support high-throughput operations that insert and retrieve documents based on insertion order. Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.

Capped collections are (nearly) perfect for storing documents for a certain period or a well-defined quantity. Neither you nor MongoDB needs to actively delete the oldest documents: they are simply overwritten by the newest ones once the capped collection reaches its max size.
In this regard, they are superior to TimeToLive (TTL) indexes because capped collections don’t require an index in order to find and eventually delete the oldest documents. Capped collections also don’t require scanning the whole collection for “outdated” documents, as TTL indexes do once per minute. Thus, the database needs less RAM and fewer CPU cycles, which increases overall database performance.
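For comparison, here is a minimal sketch of both approaches in the mongo shell; the collection names, the field name and the sizes are purely illustrative:

// TTL approach: needs an index and a background scan once per minute
db.createCollection("log_ttl")
db.log_ttl.createIndex({ createdAt: 1 }, { expireAfterSeconds: 90 * 24 * 60 * 60 })

// capped approach: no index needed, the oldest documents are simply overwritten
db.createCollection("log_capped", { capped: true, size: 10 * 1024 * 1024 * 1024 })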

Speaking of performance, if you need to visually spot and analyze slow MongoDB operations, I suggest you check out idealo’s MongoDB slow-operations-profiler, which has been open-sourced on github.com. I’ve also written about it in another story.

Our Capped Collection

We need to store log data for at least 90 days; the longer the retention time, the better. We use three dedicated bare-metal servers in a replica set for this purpose. Each server has 376 GB of RAM, 56 CPU cores, and 5.8 TB of SSD storage in RAID 10.

The Max Size Problem

Since we wanted to store the maximum quantity of log data, we wanted to cap the collection at 80% of our disk. Naively, one would use the following command to create the capped collection, given that 5101733952992 bytes is 80% of 5.8 TB.

db.createCollection("offerMapping", {capped:true, size:5101733952992})

The problem is that size defines the maximum size of the uncompressed data, whereas MongoDB has been storing data compressed on disk for many years.

Historically Grown

MongoDB’s first and, for a long time, only storage engine, MMAP, stored data uncompressed on disk, so the size parameter made sense at that time.
However, the current storage engine, WiredTiger, stores data compressed on disk. WiredTiger was introduced in MongoDB 3.0 and became the default storage engine in version 3.2, released in 2015.
The MMAP storage engine has been deprecated since version 4.0, released in 2018.

Today’s Problem

In order to create the capped collection with the correct size, you need to estimate how well WiredTiger can compress your data.
Around one year ago, we ran such tests and calculated that MongoDB could compress up to 26 TB of uncompressed data to fill 80% of our 5.8 TB hard disk. Naively, one would use the following command to create the capped collection, given that 28587302322800 bytes is 26 TB.

db.createCollection("offerMapping",{capped:true, size:28587302322800})

What’s wrong with this?
Well, even if this capped collection is the only collection on the server, you still have to add the sizes of the oplog and the indexes, both of which WiredTiger also stores compressed on disk.
Even though we calculated all these parameters (almost) correctly, our server almost ran out of available disk space.
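These additional on-disk sizes can be read directly from the server; a minimal sketch in the mongo shell (the oplog lives in the local database as the capped collection oplog.rs):

var coll = db.offerMapping.stats()
var oplog = db.getSiblingDB("local").oplog.rs.stats()
print("collection on disk: " + coll.storageSize)
print("indexes on disk: " + coll.totalIndexSize)
print("oplog on disk: " + oplog.storageSize)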

How could this happen?
It happened because the content of the documents stored in the collection became less compressible over time. For the same reason, the index sizes increased.
Together, these meant the disk would soon have had almost no space left.

That’s why it makes no sense to specify a capped collection’s maximum size in terms of uncompressed data while the default MongoDB storage engine, WiredTiger, stores data compressed on disk.

I’ve created a feature request at mongodb.org with a high priority. A short time later, it was downgraded to Minor priority by MongoDB staff, which, judging by all the “minor” feature requests that are already more than 10 years old, means it will probably never be implemented.
I’d be glad if you could vote for it!

It Could Be So Easy

A workaround would be to simply resize the capped collection. That has been possible since MongoDB version 3.6, but only for the oplog. The oplog is itself just a capped collection, so I wonder why MongoDB’s engineers did not implement this feature for all capped collections in general. Do they think it’s good enough to guess the max size of the compressed data, i.e. to know how well WiredTiger will compress the documents now and in the future?
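For the oplog, the resize really is a one-liner since version 3.6; the new size is given in megabytes (the value here is just an example):

// resize the oplog of the current replica set member to 16000 MB
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })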

As of today, there is no documented way to resize capped collections without downtime, even though such unresolved feature requests have existed for 10 years already (e.g. SERVER-1864, which also has a priority of only Minor).
The standard way to resize a capped collection is to create a new capped collection with the right max size, stop all writing database clients, copy all documents from the old collection to the new one, restart the database clients and finally delete the old collection.
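In mongo shell terms, that standard approach boils down to something like the following sketch, which has to run while all writing clients are stopped; the temporary collection name and the new size are just placeholders:

// 1. create a new capped collection with the desired max size (compressed-data guess)
db.createCollection("offerMapping_resized", { capped: true, size: NumberLong("14293651161400") })
// 2. copy all documents in their insertion order
db.offerMapping.find().sort({ $natural: 1 }).forEach(function (doc) {
  db.offerMapping_resized.insertOne(doc)
})
// 3. drop the old collection and rename the new one
db.offerMapping.drop()
db.offerMapping_resized.renameCollection("offerMapping")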

In our case, we would have to copy more than 25 TB of uncompressed data! Do the engineers at MongoDB really want us to be offline for that long?

Our Workaround

Initially, we naively tried to create a smaller capped collection with the same name on one of the replica set members, which we had restarted in maintenance mode as a standalone server. Then we wanted to mongodump the last 90 days of log data from the replica set and pipe it to mongorestore into the standalone server. The last step would have been to add the standalone server back to the replica set.
This did not work because, since version 3.6, MongoDB uses immutable UUIDs to identify its collections internally. The collection UUID remains the same across all members of a replica set.
Unfortunately, the createCollection command does not support passing a UUID, so the collection is created with a random UUID which differs from the collection UUID of the other replica set members. Once the standalone member rejoins the replica set, MongoDB refuses to replicate due to the wrong collection UUID and shuts down the newly added server.
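You can look up this UUID yourself in the mongo shell; it has to be identical on every member of the replica set:

// show the collection metadata including its immutable UUID
db.getCollectionInfos({ name: "offerMapping" })[0].info.uuid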
Thus, mongodump and mongorestore to the rescue:

When mongodump writes data to disk, it creates a collectionName.metadata.json file for each collection named collectionName. Such a file looks like this:

{"options":{"capped":true,"size":{"$numberLong":"28587302322800"}},"indexes":[{"v":2,"key":{"_id":1},"name":"_id_","ns":"changelog.offerMapping"},{"v":2,"key":{"offerId":1.0},"name":"offerId_1","ns":"changelog.offerMapping"}],"uuid":"fbb83cda241d45779dc88983351d5447"}

As you can see, besides the indexes, which will be recreated after the collection has been restored, the file also contains the size and the UUID of the collection. We can simply change the size to the new value we want.
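In our case we halved the maximum size, so the edited metadata.json looked roughly like this; only the size value changes, the indexes and the UUID stay untouched:

{"options":{"capped":true,"size":{"$numberLong":"14293651161400"}},"indexes":[{"v":2,"key":{"_id":1},"name":"_id_","ns":"changelog.offerMapping"},{"v":2,"key":{"offerId":1.0},"name":"offerId_1","ns":"changelog.offerMapping"}],"uuid":"fbb83cda241d45779dc88983351d5447"}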

Since I was only interested in getting this metadata.json file, I added the -q parameter to mongodump so that it matches no documents:

mongodump -h ${HOST}:${PORT} -d ${DB} -c ${COLLECTION} -u ${USER} -p ${PASS} -q '{ "_id" : "foo" }' --out /data/backup/

After modifying the size in the metadata.json file, I could restore the (empty) collection with mongorestore and its --preserveUUID parameter:

mongorestore -h ${HOST2}:${PORT2} -d ${DB} -c ${COLLECTION} -u ${USER} -p ${PASS} --drop --preserveUUID /data/backup/${DB}/${COLLECTION}.bson

Now the new capped collection, with the same UUID but a different size, exists on the standalone server. We can now dump our last 90 days of log data and restore it onto the standalone server on the fly:

mongodump -h ${HOST}:${PORT} -d ${DB} -c ${COLLECTION} -u ${USER} -p ${PASS} --archive -q '{_id: {$gt: new ObjectId("5eaab2e00e55cebb702911e1")}}' | mongorestore -h ${HOST2}:${PORT2} -u ${USER} -p ${PASS} --maintainInsertionOrder --archive

In order to pipe mongodump into mongorestore, we have to use the --archive flag, which in turn makes the -d and -c parameters of the mongorestore command obsolete.
The --maintainInsertionOrder parameter is very important: if you omit it, mongorestore inserts the documents in an arbitrary order, which is dangerous for a capped collection, where the first inserted documents are the first ones to be overwritten once the collection gets full (FIFO). If the insertion order is not kept during the restore, the documents that get overwritten will not be the oldest ones.

Once mongodump and mongorestore have finished, you end the maintenance period and put the standalone server back into the replica set. It will then automatically replay the oplog and eventually come back in sync.
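You can follow the catch-up in the mongo shell; once the member reports SECONDARY and its optime is close to the primary’s, it is in sync again (a quick check, not a full monitoring setup):

// print the state and last applied optime of each replica set member
rs.status().members.forEach(function (m) {
  print(m.name + " " + m.stateStr + " " + m.optimeDate)
})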

As for us, we reduced the capped collection to 50% of its previous size, which on the one hand is large enough to store at least the last 90 days of log data, and on the other hand leaves enough headroom for a growing amount of log data per day or an even lower compression factor.
As you can see in the screenshot, only slightly more than 30% of the disk space was occupied once the log data of the last 90 days had been restored. From here, the capped collection will continue to grow slowly until it reaches its maximum size, which, given the current compression factor, should be about 50% of the total disk capacity.

Screenshot: how the disk usage looks today

Final Steps

For the remaining replica set members you could, of course, proceed as you did with the first one, but there is an even easier way: since one replica set member (here ${HOST2}) already has the correctly sized capped collection, you can simply use it as a donor and restore the data onto the next standalone member (here ${HOST3}):

mongodump -h ${HOST2}:${PORT2} -d ${DB} -c ${COLLECTION} -u ${USER} -p ${PASS} --archive | mongorestore -h ${HOST3}:${PORT3} -u ${USER} -p ${PASS} --maintainInsertionOrder --archive --drop --preserveUUID

If you do it this way, it’s important that mongorestore uses the --drop flag to drop the still existing old capped collection. This time you also need to add the --preserveUUID flag so that mongorestore preserves the collection UUID when it creates the new capped collection on the standalone server.

Did You Say ObjectId?

You may have wondered why I used an ObjectId to retrieve the last 90 days of documents. Our documents have a timestamp field, but it’s not indexed, so querying it would take very long.
However, the primary key _id of the capped collection is an ObjectId, which contains the timestamp in its first 4 bytes. Knowing this, we can calculate in the mongo shell an ObjectId that corresponds to a date 90 days ago:

> var now = new Date()
> var d = new Date(now.getTime()-(90*24*60*60*1000));
> var dhex = (Math.round(d.getTime()/1000)).toString(16)
> var oid = new ObjectId(dhex + "0000000000000000")
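You can quickly verify that the constructed ObjectId really points to the intended date, since the mongo shell can extract the timestamp again:

> oid.getTimestamp()   // should print a date roughly 90 days in the past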

Then we need a second ObjectId that differs a bit, say by one hour, because it’s unlikely that the first calculated ObjectId exists, due to the added padding of zeros.

> var d2 = new Date(d.getTime()+(60*60*1000));
> var d2hex = (Math.round(d2.getTime()/1000)).toString(16)
> var oid2 = new ObjectId(d2hex + "0000000000000000")

With these two ObjectIds we can get a document that was created within the time span between them:

> db.offerMapping.find({_id:{$gt:oid, $lt:oid2}}, {_id:1}).limit(1)
{"_id" : ObjectId("5eaab2e00e55cebb702911e1")}

And that’s the _id I used in my query to get the last 90 days of documents.

And that’s it! I hope you enjoyed reading and gained some new helpful insights!
If you found this article useful, give me some high fives 👏🏻 and share it with your friends so others can find it too. Follow me here on Medium (Kay Agahd) to stay up-to-date with my work. Thanks for reading!
Btw. idealo is hiring: Check out our vacancies here.
