What Slowed Down the Queries? … MongoDB Chose the Wrong Index!

Why queries might become slower even though more hardware had been added to speed them up

Kay Agahd
idealo Tech Blog
9 min read · Oct 2, 2020


TL;DR We suddenly experienced a much longer query execution time even though we had added a new shard to our cluster to gain more performance. Here’s how we found the reason and solved it.

The Problem

One of our MongoDB queries took almost twice as long as usual. The execution plan revealed that 1 of the 5 shards had picked a different index. Was this the reason the query response time went up? And why did the 5th shard choose a different index than shards 1 to 4?

The primary key _id is also our shard key. Its value is a kind of md5 hash. However, our query did not use the shard key. So could it be that the documents on shard 5 were better suited to a different index than those on the other shards? Let’s find out!

Show Me Your Query and Indexes!

Our query is as follows:

db.offer.find({shopId:9701, offerStatus:1})

Only 2 fields are queried without any projection, sort, or hint. The query planner evaluated these 3 indexes to find the best-suited one:

(A) {shopId:1, missingSince:1, merchantId:1}
(B) {shopId:1, offerStatus:1, merchantId:1, _id:1}
(C) {offerStatus:1, missingSince:1, offerId:1, _id:1}

Shards 1 to 4 were using index A, while shard 5 decided to take index B.

Recap: Query Plans

As written in the documentation for MongoDB v4.0 (the version we are using):

For each query, the query planner searches the query plan cache for an entry that fits the query shape. If there are no matching entries, the query planner generates candidate plans for evaluation over a trial period. The query planner chooses a winning plan, creates a cache entry containing the winning plan, and uses it to generate the result documents.

But what are the selection criteria to determine the winning plan? I found this in MongoDB’s v4.4 (current version as of Sept. 2020) documentation:

The evaluation of the most efficient query plan is based on the number of “work units” (works) performed by the query execution plan when the query planner evaluates candidate plans.

This is only half of the truth: besides work units, other criteria are taken into account to calculate the query plan score, such as productivity and bonus points (e.g. noFetchBonus, noSortBonus, and noIxisectBonus). As I found in MongoDB’s source code, the bonus points are used to break ties when query plans have the same number of work units and the same productivity. This knowledge becomes relevant later in this article.
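Putting that together, the scoring can be sketched roughly as follows. This is my own simplified model based on reading plan_ranker.cpp; the base score, the exact bonus conditions, and the eps value are illustrative assumptions, not MongoDB’s literal constants:

```python
# Rough sketch of MongoDB's plan scoring (simplified; constants are illustrative).
def plan_score(works, advanced, no_fetch=False, no_sort=False, no_ixisect=False):
    base_score = 1.0                 # every candidate plan starts from the same base
    productivity = advanced / works  # fraction of work units that yielded a result
    eps = 0.0001                     # tiny tie-breaker bonus per earned property
    bonus = eps * (no_fetch + no_sort + no_ixisect)
    return base_score + productivity + bonus

# Shard 3's trial period: both plans did 112 works,
# but plan A advanced 101 times versus 42 for plan B.
print(plan_score(112, 101))  # plan A scores higher ...
print(plan_score(112, 42))   # ... so plan A wins, despite being slower
```

With equal works, productivity alone decides; the eps bonuses only matter when productivity ties as well.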

Execution Statistics

We compared the execution statistics of the candidate plans for both indexes A and B on shard 3. I removed some insignificant lines to make them fit better here. These are the stats for candidate A (using index A):

"nReturned" : 101,
"executionTimeMillisEstimate" : 23,
"totalKeysExamined" : 112,
"totalDocsExamined" : 112,
"executionStages" : {
"stage" : "SHARDING_FILTER",
"nReturned" : 101,
"executionTimeMillisEstimate" : 23,
"works" : 112,
"advanced" : 101,
"needTime" : 11,
...
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"offerStatus" : {
"$eq" : 1
}
},
"nReturned" : 112,
"executionTimeMillisEstimate" : 23,
"works" : 112,
"advanced" : 112,
"needTime" : 0,
...
"docsExamined" : 112,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 112,
"executionTimeMillisEstimate" : 0,
"works" : 112,
"advanced" : 112,
"needTime" : 0,
...
"indexName" : "shopId_1_missingSince_1_merchantId_1",

Versus candidate B (using index B):

"nReturned" : 42,
"executionTimeMillisEstimate" : 6,
"totalKeysExamined" : 112,
"totalDocsExamined" : 42,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 42,
"executionTimeMillisEstimate" : 6,
"works" : 112,
"advanced" : 42,
"needTime" : 70,
...
"inputStage" : {
"stage" : "SHARDING_FILTER",
"nReturned" : 42,
"executionTimeMillisEstimate" : 3,
"works" : 112,
"advanced" : 42,
"needTime" : 70,
...
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 112,
"executionTimeMillisEstimate" : 1,
"works" : 112,
"advanced" : 112,
...
"indexName" : "shopId_1_offerStatus_1_merchantId_1__id_1",

At first sight, that looks odd: even though plan B was nearly 3 times faster than plan A (6 ms versus 23 ms), plan A was chosen. However, execution time is not what MongoDB uses to choose its index. As I wrote earlier, the number of work units lets the query planner decide which index is more suitable. In this case, though, we have a draw, since the number of work units is identical (112) for both plans A and B.

Which Plan To Choose?

Unfortunately, it’s not documented which query plan wins if the number of work units is identical. However, MongoDB’s source code reveals that productivity, which is the number of advanced divided by the number of works, is also taken into account to calculate the query plan score. Since works is identical for both plans, the plan with the higher number of advanced, here plan A, gets a higher score and wins.
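Plugging the shard 3 numbers from the statistics above into this ratio makes the planner’s choice explicit (a worked calculation only; everything beyond the productivity ratio is left out):

```python
# Productivity = advanced / works, using the shard 3 execution statistics.
works = 112                      # identical for both plans, so works alone is a draw
productivity_a = 101 / works     # plan A advanced 101 times
productivity_b = 42 / works      # plan B advanced only 42 times (70 orphans skipped)

print(round(productivity_a, 3))  # 0.902
print(round(productivity_b, 3))  # 0.375
# Plan A's far higher productivity gives it the higher score, so plan A wins.
```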

Why Is “advanced” Different in Both Plans?

As you can see, the number of advanced is decremented only in the SHARDING_FILTER stage. It dropped from 112 to 101 in plan A and from 112 to 42 in plan B. As MongoDB’s documentation states, SHARDING_FILTER filters orphaned documents out of a shard’s result.
Orphaned documents are documents that live on a shard they don’t belong to. This can happen due to unclean shutdowns of mongod, or if the balancer was not able to delete chunks from the source shard after they had already been copied to the destination shard.
So shard 3 had orphaned documents that the SHARDING_FILTER stage had to remove from the query result. Evidently, of the 112 index entries examined during the trial period, those matching shopId in index A yielded far fewer orphans (11) than those matching both shopId and offerStatus in index B (70).
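A toy model (my own illustration, not MongoDB code) shows how the SHARDING_FILTER stage turns each orphan into a needTime tick instead of an advanced tick:

```python
# Toy model of the SHARDING_FILTER stage: each work unit either advances
# (the document belongs to this shard) or merely consumes time (an orphan).
def sharding_filter(docs_from_input_stage):
    advanced, need_time = 0, 0
    for doc in docs_from_input_stage:
        if doc["orphan"]:
            need_time += 1  # orphan filtered out: no result produced this work unit
        else:
            advanced += 1   # document passed on to the next stage
    return advanced, need_time

# Shard 3, plan A: 11 of the 112 examined documents were orphans.
plan_a_input = [{"orphan": i < 11} for i in range(112)]
# Shard 3, plan B: 70 of the 112 examined documents were orphans.
plan_b_input = [{"orphan": i < 70} for i in range(112)]

print(sharding_filter(plan_a_input))  # (101, 11) -> matches plan A's stats
print(sharding_filter(plan_b_input))  # (42, 70)  -> matches plan B's stats
```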

Why I Think MongoDB Is Wrong

Not only does index B filter out documents matching both shopId and offerStatus in one go, it is also used in the next stage to remove orphans, because the shard key _id is part of the compound index.
The fetch stage is needed in plan B only to return the matched documents, since no projection was used.

Plan A, however, can use its index only to match the shopId field. The next stage already has to fetch all matching documents from disk in order to discard those that don’t match offerStatus.
Orphans are removed in plan A only at the last stage, just before the already fetched documents are returned.

Knowing that fetching documents (most probably from disk) is much more resource-intensive, and as such much slower, than reading the same information from the index (most probably from RAM), it’s surprising that MongoDB does not honor plan B more.
Instead, plan B was penalized merely for removing more orphans than plan A, even though plan B could use the index for that instead of fetching the documents from disk, as plan A has to.

In other words, MongoDB’s query planner rewards an index that minimizes the skipping of index entries while matching documents. Each skip decreases the advanced counter and thereby the score of the query plan that uses this index.

This is kind of “selectivity of an index” that Charlie Swanson, Software Engineer at MongoDB, defined in his presentation “Reading the .explain() Output”, see video recording at 24:55, as follows:

“The ratio of number of advanced or returned to number of examined”

For Charlie Swanson, the best query plan is (see video recording at 26:50):

“It’ll take the one with the highest number of advances as the most efficient plan”

For most use cases this approach works fine, but in cases where orphaned documents have to be removed, even with the aid of an index, it may fail to choose the best-suited index.

Why Did Shard 5 Get It Right?

Yes, you guessed it already: because shard 5 doesn’t have orphaned documents!
We had added shard 5 only a few days earlier, and it seems that the balancer was not able to remove any of the copied chunks from the donor shards 1 to 4!
We even executed the cleanupOrphaned command several times. It returned after a few hours without any failure message, but we could still find orphaned documents (this issue might be worth another story).
We could see in our Grafana dashboard that the number of documents was quite different between shard 5 and shards 1 to 4, even though the number of chunks was almost identical on all shards.

number of documents per shard

We added shard 5 on the 8th of September. As you can see, the balancer copied documents to shard 5 until the 13th of September, but the number of documents did not decrease on the donor shards 1 to 4. The difference in the number of documents didn’t start to shrink until September 22nd, when we started our own script to remove orphaned documents. 5 days later, the number of documents was almost identical on all shards (we throttled our script to avoid performance issues while deleting orphaned documents).
Btw. the dashboard is based on estimatedDocumentCount(), aka count(), and not on countDocuments(). That’s why we also see orphaned documents in the graph.
Hint 1: Never call countDocuments() frequently, since it’s a very resource-intensive command: every single document has to be read from disk into RAM in order to filter out orphaned documents and determine a correct count. Better clean up orphaned documents and use estimatedDocumentCount(), which returns the number of documents very quickly and efficiently.
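The difference can be illustrated with a tiny in-memory stand-in for a shard (my own illustration with made-up numbers, not actual MongoDB API calls):

```python
# Hypothetical stand-in for one shard: a metadata counter plus the raw documents.
collection_metadata = {"document_count": 112}            # maintained incrementally
documents_on_disk = [{"orphan": i < 11} for i in range(112)]

def estimated_document_count():
    # Cheap: just reads a counter from collection metadata; may include orphans.
    return collection_metadata["document_count"]

def count_documents():
    # Expensive: must examine every single document to filter out orphans.
    return sum(1 for doc in documents_on_disk if not doc["orphan"])

print(estimated_document_count())  # 112 (fast, but counts the 11 orphans too)
print(count_documents())           # 101 (exact, but touches every document)
```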
Hint 2: If you want a quick visual overview of slow queries, index usage statistics, and more, check out our open-sourced mongodb-slow-operations-profiler. I have written a two-part article (one, two) about it.

Tiebreaker

It’s very interesting to see the candidate plans on shard 5 (which never had orphaned documents) because the above-mentioned bonus points come into play to find the best-suited query plan.
As for the other shards 1 to 4, the query planner had to choose one of the above-mentioned plans A, B, or C.
Plan C had the lowest score and lost the race for the best plan.
However, plans A and B had the same number of both works and advanced. The execution statistics of the candidate plans don’t make it obvious why plan B won, but the bonus points that I mentioned earlier in this post must have been used to break the tie.
I’m rather surprised that both metrics, works and advanced, which largely determine the plan score, are identical. In my view, index B is much better suited to this type of query than index A, for the reasons given above.
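In numbers, the situation on shard 5 looks roughly like this. Which bonus plan B actually earned and its exact value are assumptions for illustration; the point is only that a tiny bonus is all that can separate otherwise identical scores:

```python
# On shard 5, both plans did 101 works and advanced 101 times,
# so base score plus productivity is identical: 1.0 + 101/101 = 2.0.
eps = 0.0001                       # illustrative tie-breaker bonus value
score_a = 1.0 + 101 / 101          # plan A: suppose no bonus applies
score_b = 1.0 + 101 / 101 + eps    # plan B: suppose it earns one bonus

assert abs(score_b - score_a - eps) < 1e-12  # identical up to the bonus
print(score_b > score_a)           # the tiny bonus alone decides the winner
```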

This is candidate A (using index A), executed on a shard without orphans:

"nReturned" : 101,
"executionTimeMillisEstimate" : 106,
"totalKeysExamined" : 101,
"totalDocsExamined" : 101,
"executionStages" : {
"stage" : "SHARDING_FILTER",
"nReturned" : 101,
"executionTimeMillisEstimate" : 106,
"works" : 101,
"advanced" : 101,
"needTime" : 0,
...
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"offerStatus" : {
"$eq" : 1
}
},
"nReturned" : 101,
"executionTimeMillisEstimate" : 105,
"works" : 101,
"advanced" : 101,
"needTime" : 0,
...
"docsExamined" : 101,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 101,
"executionTimeMillisEstimate" : 9,
"works" : 101,
"advanced" : 101,
"needTime" : 0,
...
"indexName" : "shopId_1_missingSince_1_merchantId_1",

Versus candidate B (using index B), executed on a shard without orphans:

"nReturned" : 101,
"executionTimeMillisEstimate" : 93,
"totalKeysExamined" : 101,
"totalDocsExamined" : 101,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 101,
"executionTimeMillisEstimate" : 93,
"works" : 101,
"advanced" : 101,
"needTime" : 0,
...
"docsExamined" : 101,
"inputStage" : {
"stage" : "SHARDING_FILTER",
"nReturned" : 101,
"executionTimeMillisEstimate" : 6,
"works" : 101,
"advanced" : 101,
"needTime" : 0,
...
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 101,
"executionTimeMillisEstimate" : 5,
"works" : 101,
"advanced" : 101,
"needTime" : 0,
...
"indexName" : "shopId_1_offerStatus_1_merchantId_1__id_1",

As you can see, they differ from the candidate plans on shard 3 (which has orphans) only in that no orphaned documents had to be removed, so advanced has not been decremented.

As soon as shards 1 to 4 had been cleaned of orphaned documents, the query planner chose the right query plan B on all shards. From then on, the query execution time was even better than before we added shard 5 to the cluster.

Conclusion

Even if you add a new shard to increase the performance of your MongoDB cluster, you may encounter slower queries because of orphaned documents that make the query planner choose a suboptimal query plan.
Decent monitoring helps to detect and eventually solve the issue.

And that’s it! I hope you enjoyed reading and gained some new helpful insights!
If you found this article useful, give me some high fives 👏🏻 and share it with your friends so others can find it too. Follow me here on Medium (Kay Agahd) to stay up-to-date with my work. Thanks for reading!
Btw. idealo is hiring: Check out our vacancies here.
