Clean MongoDB indexes


Storify is more than four years old now. That means our code base has been growing a lot. Unfortunately, we don’t spend enough time cleaning the code whenever we decide to update a sunset a feature.

It’s the same problem with database indexes. Over the past four years, we had to use more MongoDB indexes to respond to the new features built.

And we ended up with a lot of them.

Too many indexes is bad

There are many drawbacks having too many indexes:

  • MongoDB needs more memory to keep these indexes easily accessible.
  • MongoDB might choose the wrong index for a query if there are many competing indexes.
  • Writes get slower.

Cleaning

Two weeks ago, we started the process of evaluating the existing indexes.

New instance as playground

The strategy was to add a new instance to the MongoDB replica as a playground for our experiments (Fig 2.).

A new MongoDB instance as playground

When we wanted to modify the indexes, we isolated that instance from the replica set (Fig 3.), built/dropped an index, added it back to the replica set (Fig 4.), and then checked the faults and slow queries.

Here are some details about this process:

  • Add a new instance.
rs.add({_id: X, host: “new_ip:27017", hidden: true, priority: 0})
  • After some time, the new instance backs up the master data.
  • This procedure is then used to isolate and create/drop an index.
  • After the database is added back to the replica, we remove the `hidden` and `priority` properties so it’s been used in the replica set.
var cfg = rs.conf()
delete cfg.members[3].priority
delete cfg.members[3].hidden
rs.reconfig(cfg)
  • To check the faults:
mongostat
  • To check the slow queries, the logs are parsed:
tail -f /var/log/mongodb/mongodb.log | grep -E ‘[0–9]{3,10}ms’

Queries evaluation

Now that this procedure to test new index is clear, let me explain how we choose which indexes to add and drop.

The goal was to log any query made to MongoDB for 24 hours, then sort them by index used.

  • First, we modified the Node.js MongoDB driver to log all the queries (find and update).
  • Then we parsed the logs to extract all these queries.
  • We used our playground instance to run all the queries on, using explain (only the read queries).
  • We classified all the indexes used by counting the number of uses:
{ propA_propB_propC_-1: 123456,
propD: 78912,
propE_propB_propF_1: 12, … }
  • We also kept track of the queries slower than X seconds.

Comparing the list of the indexes used to the actual indexes of a collection helped us find which indexes were not used and thus should be dropped. Also, some indexes were used too rarely, so we evaluated the cost of dropping them by testing these queries without them.

The slowest queries helped us understand if an index was missing or if MongoDB was using a bad one.

One by one, we dropped and created indexes, and then tested the entire queries list (using the Fig 3 configuration). The comparison was done by measuring the mean/max/stdev time of the queries.

When we were happy with the changes, we used the Fig 4 configuration to test the changes with live queries.

Indexes propagation

After evaluating and cleaning all the indexes, there were two ways of propagating these new indexes:

  • Remove each mongo instance one by one and rebuild the indexes on each box
  • Create new MongoDB instances based on this new image

The second option was way faster and cleaner.

Conclusions

We ended up with half the size of the initial indexes, which translates to faster writes and less faults.

One bad thing happened when we removed the old MongoDB instances and replaced them with the new ones. One dropped index was written in our code.

This means that whenever the new primary took the lead, the application asked MongoDB to build that index. It was happening in the background, so it was mostly of ok for the primary. But as soon as the primary finished, all the secondaries started building it in the foreground, which meant that our application couldn’t reach MongoDB anymore.

You should always follow this procedure to build a new index in production.

Thanks to @jaklaassen who helped a lot figuring all of this out.

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.