MongoDB 4.2 wildcard indexes

Guy Harrison
Guys Database Blog
Published in
3 min readJul 29, 2019

There are lots of cool new features in MongoDB 4.2: Lucene integration, querying data in S3 buckets, the $merge aggregation step, embedded aggregation pipelines in updates, etc.

In this post, I want to focus on a minor feature that none-the-less stands to improve the performance of a pretty wide range of use cases: wildcard indexes.

A wildcard index is an index that is created on every attribute in a subdocument. So, for instance, let’s say I had some data that looks like this:

{
"_id" : 1,
"data" : {
"a" : 1728,
"b" : 6740,
"c" : 6481,
"d" : 2066,
"e" : 3173,
"f" : 1796,
"g" : 8112
}
}

Queries might be issued against any one of the attributes in the data document. Furthermore, there may be new attributes added by the application that I can’t anticipate. To optimize performance, I need to create a separate index on each attribute:

db.manyIndexes.createIndex({"data.a":1});
db.manyIndexes.createIndex({"data.b":1});
db.manyIndexes.createIndex({"data.c":1});
db.manyIndexes.createIndex({"data.d":1});
db.manyIndexes.createIndex({"data.e":1});
db.manyIndexes.createIndex({"data.f":1});
db.manyIndexes.createIndex({"data.g":1});

But even this won’t work unless I know for sure what the attribute will be. Wildcard indexes to the rescue:

db.wildcardIndex.createIndex({"data.$**":1});

This statement creates an index on every attribute in the data document: even if new attributes are created by an application after I create the index.

That’s great! But obviously, there’s a cost. Let’s see how wildcard indexes perform for insert, find, update and delete statements when compared to:

  1. No indexes at all
  2. A single index on a single attribute
  3. Separate indexes on all the attributes.

Below we see the time taken to insert, update and delete 10,000 documents in each indexing scenario, and the time to retrieve 10% of the documents in each scenario.

First overhead on modifications:

Overhead of wildcard indexes compared with traditional indexes (time to process 100,000 documents)

Interestingly, I found the wildcard index overhead to be substantially higher than creating an index individually on each element — more than twice the elapsed time for individual indexes. Clearly, if you have completely predictable attributes you will want to create specific indexes on those attributes rather than a wildcard index.

More importantly:

Don’t create a wildcard index out of laziness. The overhead of wildcard indexes is high, and they should only be used when an alternative strategy is not available.

If some of the attributes are never searched, then the wildcard index will add an overhead that will not be worthwhile. As always, only create indexes that are necessary: all indexes impact on performance and the wildcard index even more so.

As expected, wildcard indexes accelerate find() operations just like any other indexing option:

Wildcard index find() improvements

Conclusion

Wildcard indexes are a really useful addition to your indexing arsenal. However, don’t use them just as a programming short-cut: they impose significant overhead on insert, update and delete performance and should only be used when the attributes to be indexed are unpredictable.

PS: The test script I used to generate these results can be found here:

--

--