MongoDB 4.2 wildcard indexes
There are lots of cool new features in MongoDB 4.2: Lucene integration, querying data in S3 buckets, the $merge aggregation step, embedded aggregation pipelines in updates, etc.
In this post, I want to focus on a minor feature that none-the-less stands to improve the performance of a pretty wide range of use cases: wildcard indexes.
A wildcard index is an index that is created on every attribute in a subdocument. So, for instance, let’s say I had some data that looks like this:
{
"_id" : 1,
"data" : {
"a" : 1728,
"b" : 6740,
"c" : 6481,
"d" : 2066,
"e" : 3173,
"f" : 1796,
"g" : 8112
}
}
Queries might be issued against any one of the attributes in the data
document. Furthermore, there may be new attributes added by the application that I can’t anticipate. To optimize performance, I need to create a separate index on each attribute:
db.manyIndexes.createIndex({"data.a":1});
db.manyIndexes.createIndex({"data.b":1});
db.manyIndexes.createIndex({"data.c":1});
db.manyIndexes.createIndex({"data.d":1});
db.manyIndexes.createIndex({"data.e":1});
db.manyIndexes.createIndex({"data.f":1});
db.manyIndexes.createIndex({"data.g":1});
But even this won’t work unless I know for sure what the attribute will be. Wildcard indexes to the rescue:
db.wildcardIndex.createIndex({"data.$**":1});
This statement creates an index on every attribute in the data document: even if new attributes are created by an application after I create the index.
That’s great! But obviously, there’s a cost. Let’s see how wildcard indexes perform for insert, find, update and delete statements when compared to:
- No indexes at all
- A single index on a single attribute
- Separate indexes on all the attributes.
Below we see the time taken to insert, update and delete 10,000 documents in each indexing scenario, and the time to retrieve 10% of the documents in each scenario.
First overhead on modifications:
Interestingly, I found the wildcard index overhead to be substantially higher than creating an index individually on each element — more than twice the elapsed time for individual indexes. Clearly, if you have completely predictable attributes you will want to create specific indexes on those attributes rather than a wildcard index.
More importantly:
Don’t create a wildcard index out of laziness. The overhead of wildcard indexes is high, and they should only be used when an alternative strategy is not available.
If some of the attributes are never searched, then the wildcard index will add an overhead that will not be worthwhile. As always, only create indexes that are necessary: all indexes impact on performance and the wildcard index even more so.
As expected, wildcard indexes accelerate find() operations just like any other indexing option:
Conclusion
Wildcard indexes are a really useful addition to your indexing arsenal. However, don’t use them just as a programming short-cut: they impose significant overhead on insert, update and delete performance and should only be used when the attributes to be indexed are unpredictable.
PS: The test script I used to generate these results can be found here: