How moving to MongoDB Text Search solved a performance bottleneck in Shogun

Erol Fornoles
Shogun Team
Published Sep 30, 2020 · 5 min read

At Shogun, we utilize a mix of different technologies behind the scenes. One of the core technologies we use is MongoDB, a popular NoSQL database that helps power some of the cache objects we store from our supported platforms, such as products on a customer’s shop or product collections.

The Challenge

Recently, we began encountering a severe bottleneck in our MongoDB cluster, with CPU spikes of ~600% from time to time.

Our production system relies on MongoDB Atlas, which means we had access to Atlas Query Profiler to discover the source of the spikes. If you’re not on Atlas, you can enable the MongoDB Database Profiler and use it to find bottlenecks.
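
If you go the self-hosted route, the profiler is controlled by the `profile` database command. Here is a minimal sketch of the command document using plain Ruby (the 100 ms threshold is our own example value; tune `slowms` to your workload):

```ruby
# Build the `profile` command document: level 1 logs only operations
# slower than `slowms` milliseconds; level 2 logs every operation.
def profile_command(level: 1, slowms: 100)
  { profile: level, slowms: slowms }
end

# With an existing Mongo::Client it would be sent like this (not executed here):
#   client.database.command(profile_command)
```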

Going through the Query Profiler, it became evident what was consuming a huge chunk of those CPU cycles: the profiler was littered with slow queries with execution times above 1,000 ms.

Here’s a sample query log that was particularly worrisome:

That’s 169 seconds for a query to finish, and it had to yield control to other queries more than 7,000 times before it completed. Furthermore, this was not an isolated case; it was happening repeatedly in bursts.

So, why were we getting 600% CPU spikes?

With this sample query log, it was easy to pinpoint the exact cause.

Some years ago, we introduced the ability for our users to perform text searches inside the Shogun Dashboard and Editor. This was a big convenience for users, and the initial implementation used MongoDB’s $regex operator with code similar to this line of Ruby/Mongoid:

parent.items.or(title: regex).or(name: regex)
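
The post doesn’t show how `regex` itself was built, but it would have looked roughly like this sketch (the helper name is ours; escaping the raw query is the important part):

```ruby
# Build a case-insensitive regex from raw user input. Regexp.escape
# prevents characters like '*' or '(' from being treated as regex syntax.
def search_regex(query)
  /#{Regexp.escape(query)}/i
end

search_regex("blue").match?("Navy Blue Shirt") # matches anywhere in the string
```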

This was relatively easy to implement and served our needs well in the early stages. But we have grown year over year, and as we continued to onboard shops with large MongoDB collections, it became apparent that this implementation could no longer serve the needs of our present stage and scale.

One problem with the initial approach is that case-insensitive regular expression queries generally cannot use an index effectively. Going back to the sample query log above, it seemed like we were using the index, by virtue of the IXSCAN stage.

Was that really the case?

Based on the value of docsExamined, it sure looked like each document was being fetched so that the regular expression could be matched against each of the fields relevant to the search. In effect, we were using the index in some capacity, but the index alone was not sufficient to satisfy the query.

Documents had to be fetched from disk and scanned, which might not be very problematic for small data sets. And while documents can be cached in memory for successive calls, the regular expression still has to be matched against each one. That work is CPU-bound, hence the spikes, and it does not scale as data sets grow in size and more concurrent users run searches.
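
The cost profile can be illustrated in plain Ruby, independent of MongoDB internals: an unanchored, case-insensitive match has to run against every candidate document, so the work grows linearly with collection size (the documents below are made up for illustration):

```ruby
# Every document is examined; the regex engine runs once per field, per
# document. This linear, CPU-bound work is what caused our spikes.
docs = [
  { title: 'Blue Shirt',    name: 'shirt-01' },
  { title: 'Red Hat',       name: 'hat-02' },
  { title: 'Blueberry Mug', name: 'mug-03' }
]
regex = /blue/i
matches = docs.select { |d| d[:title] =~ regex || d[:name] =~ regex }
matches.length # 2 ("Blue Shirt" and "Blueberry Mug")
```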

The Solution

We decided to move away from $regex and adopt a more performant implementation until we could build a full search solution using something like Elasticsearch. After exploring our options, one promising candidate was MongoDB’s Text Search functionality.

The first step was to create a text index. This is an absolute requirement to perform text search queries on string content.

We had the option of indexing all string fields using a wildcard text index. But since there were specific fields we wanted included in search, we opted for a compound index with multiple text fields defined.
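
Expressed as plain index-key hashes, the two options look like this (`$**` is MongoDB’s wildcard text-index specifier; the hashes are shown standalone for illustration, before wiring anything into Mongoid):

```ruby
# Wildcard text index: every string field in the document gets indexed.
wildcard_spec = { '$**' => 'text' }

# Compound index: a regular ascending key plus only the fields we search.
compound_spec = { parent_id: 1, title: 'text', name: 'text' }
```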

Here’s how we did it on our Mongoid document:

class Item
  include Mongoid::Document

  index({ parent_id: 1, title: 'text', name: 'text' }, ...)
end

Note that this index has to be deployed before we can proceed to the next step (with Mongoid, indexes can be built via the db:mongoid:create_indexes Rake task). It is important to ensure the index has been fully created: a text search query requires an existing text index and will throw an error if one is not present. Our collection was large, so we had to wait a few hours for the build to complete. If you’re trying this in a local environment, building the index on a collection of around a thousand documents should take less than a minute.

The next step was to update our previous search code to use $text:

parent.items.where('$text': { '$search': query })

You might be wondering: How does MongoDB know which fields to search since we haven’t specified the fields in the search criteria?

MongoDB accomplishes this by performing an implicit text search on all fields with a text index on the given collection. That means it will perform a search on title and name for the code above.
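
One detail worth knowing: by default, $search OR-combines the terms in the query string, while wrapping the string in escaped double quotes asks MongoDB to match them as a phrase. A small builder sketch (the helper name is ours, not part of Mongoid):

```ruby
# Build the criteria hash passed to .where. With phrase: true, the query
# string is wrapped in double quotes, which tells MongoDB to match the
# terms as a phrase rather than OR-ing them individually.
def text_search_criteria(query, phrase: false)
  term = phrase ? %("#{query}") : query
  { '$text' => { '$search' => term } }
end

text_search_criteria('blue shirt')
# => { '$text' => { '$search' => 'blue shirt' } }
text_search_criteria('blue shirt', phrase: true)
# => { '$text' => { '$search' => '"blue shirt"' } }
```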

The Result

We benchmarked the revised search against a copy of production data from a shop with a relatively large MongoDB collection, and it already showed a huge improvement.

A quick look at New Relic also confirmed that the new search had a huge impact on response times. We took a snapshot three hours after the release and compared it to metrics from the previous day: a staggering 10x improvement in average response times, not to mention the disappearance of the response-time spikes present with the old search.

As of this writing, we haven’t encountered any big spikes. Our users should also notice the improvement, since search now responds much faster than before.

Some Caveats

There are some constraints that need to be mentioned when using MongoDB Text Search:

  • A collection can have at most one text index. Every field with an index type of text is used in a text search, which means you won’t have granular control over which subset of fields is included when you perform a text search on a collection.
  • It is not possible to specify index hints when performing a text search, so plan your collections/documents accordingly.
  • Only exact matches are supported at the moment, and there is no support for fuzzy/partial matches. For case-insensitive searches, matches are done on the complete stemmed word. This means that a search for “blue” would not match a field containing “blueberry” — but searching for “blueberry” or “blueberries” would match it.
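
The last caveat can be illustrated without MongoDB: text search matches whole (stemmed) tokens rather than substrings, roughly like this toy tokenizer (it does not stem, so unlike real MongoDB it would not equate “blueberries” with “blueberry”):

```ruby
# Whole-token matching: split the field into lowercase words and look for
# the search term among them. 'blue' is not a token of 'blueberry pie'.
def token_match?(field, term)
  field.downcase.split.include?(term.downcase)
end

token_match?('blueberry pie', 'blue')      # false: no whole token 'blue'
token_match?('blueberry pie', 'blueberry') # true
```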

Given those constraints, we felt they were acceptable in the short term. Over the long term, however, we will be exploring full-text search engines, such as Elasticsearch or Atlas Search, to address these limitations.
