The Four Horsemen of Autocomplete

Weber Matias
Building Inventa
Feb 14, 2023 · 9 min read

Autocomplete is not only meant to suggest the correct spelling of queries. It is also a very powerful way of suggesting our best-performing queries to users, and steering users toward best-performing queries translates into a better search experience. Autocomplete is also used to retrieve queries that have meaningful and trustworthy results. So we want to leverage our prior knowledge of query performance to build our own query index.

There are many ways to build autocomplete functionality with OpenSearch. It has built-in tools that can be used for different use cases. But knowing which is best for your use case isn’t always clear. Each choice comes with its own tradeoffs.

The challengers

Completion suggester

According to the docs, completion suggesters are backed by a data structure optimized for speed. It is costly to build and is stored in memory.

Completion suggesters allow lookup using a prefix or a regex expression. Prefix queries only match terms at the beginning of the string, while regex queries have more flexibility to look for matching terms within a phrase or word. Keep in mind, though, that complex regex expressions may consume a lot of resources.
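As a sketch of the regex variant, here is what such a request could look like against the 'query_suggestion' completion field we build later in this post. The regex is evaluated against the suggestion inputs, so a pattern like ".*maq" can reach terms past the first word:

GET query_index/_search
{
  "suggest": {
    "query_suggestion": {
      "regex": ".*maq",
      "completion": {
        "field": "query_suggestion"
      }
    }
  }
}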

Prefix queries support fuzziness for dealing with misspellings in the search bar. Completion suggesters also provide a weight parameter, which is useful to rank suggestions by a criterion of our choosing.

Therefore, they are a very efficient choice when queries have a well-known order.

Context suggester

There are use cases where you would like to suggest products of only one category. One problem with completion suggesters is that they lack the ability to filter the results. Context suggesters are very similar to completion suggesters from a speed point of view, but provide the ability to filter and/or boost suggestions based on a category.

This type of suggester lets us attach a context mapping to each suggestion. Two types of context mappings are allowed: category and geo-location. According to the documentation, it's mandatory to provide a context both when indexing and when querying, so it's important to consider how this fits your use case.

Adding a context to every suggestion also results in more memory consumption compared with plain completion suggesters.
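To make this concrete, here is a minimal sketch of a context-aware suggestion. The 'category_suggest_index' name and the 'department' context with its values are hypothetical, purely for illustration:

PUT category_suggest_index
{
  "mappings": {
    "properties": {
      "query_suggestion": {
        "type": "completion",
        "contexts": [
          {
            "name": "department",
            "type": "category"
          }
        ]
      }
    }
  }
}

POST category_suggest_index/_doc
{
  "query_suggestion": {
    "input": "maquiagem natural",
    "contexts": {
      "department": ["beauty"]
    }
  }
}

GET category_suggest_index/_search
{
  "suggest": {
    "query_suggestion": {
      "prefix": "maq",
      "completion": {
        "field": "query_suggestion",
        "contexts": {
          "department": [
            { "context": "beauty", "boost": 2 }
          ]
        }
      }
    }
  }
}

Contexts passed at query time filter the returned suggestions; the optional boost additionally multiplies the score of suggestions matching that context, which is useful when querying with multiple contexts at once.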

Edge N-gram tokenizer

The edge N-gram tokenizer, as its name implies, creates n-grams for every word in the field, starting with the first letter of each word. For example, “chocolate bar” is broken down into [c, ch, cho, choc, choco, chocol, chocola, chocolat, chocolate, b, ba, bar]. Edge n-grams perform better when the user queries don’t follow a clear order. As these n-grams are stored in the index, they can be queried directly with term matching. This also means that you can sort, use function scores, aggregations and all of the other operations that can be done within the queries. But remember that every new layer of complexity added will also add delay to the autocomplete system.
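You can see exactly which tokens the tokenizer emits with the _analyze API. This is a sketch with an inline tokenizer definition (a min_gram of 1 and max_gram of 9, just to mirror the example above):

POST _analyze
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 1,
    "max_gram": 9,
    "token_chars": ["letter", "digit"]
  },
  "text": "chocolate bar"
}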

Because the field contains all of the n-grams of every word, it can match any term. From a search experience standpoint, this means the user can type any word from the field, not just its prefix, and still retrieve a suggestion containing that word. Speed is the most limiting factor of an edge n-gram field mapping: unlike the completion and context suggesters, it is not optimized for fast lookups.

Search-as-you-type mapping

This mapping creates edge n-grams from shingles of the field. The power of this type resides in quick, ready-to-use autocomplete functionality.

It first creates shingles from the field; for example, “white chocolate bar with almonds” becomes [“white chocolate”, “chocolate bar”, “bar with”, “with almonds”]. These are shingles of size 2. It also creates shingles of size 3: [“white chocolate bar”, “chocolate bar with”, “bar with almonds”]. Then it creates edge n-grams over these shingles.

The quality of the shingles created depends on the content of the field. For example, “white chocolate bar” seems like a good suggestion, but “chocolate bar with” doesn’t really do a good job of suggesting a quality result. That’s why it’s important to have more control over the suggestions that are surfaced to the users.
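For completeness, here is a minimal sketch of how this mapping could be declared and queried (the 'sayt_index' name is just for illustration). The ._2gram and ._3gram sub-fields are generated automatically from the root field:

PUT sayt_index
{
  "mappings": {
    "properties": {
      "query": {
        "type": "search_as_you_type"
      }
    }
  }
}

GET sayt_index/_search
{
  "query": {
    "multi_match": {
      "query": "chocolate ba",
      "type": "bool_prefix",
      "fields": [
        "query",
        "query._2gram",
        "query._3gram"
      ]
    }
  }
}

The bool_prefix multi_match treats the last term as a prefix and matches the remaining terms against the field and its shingle sub-fields.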

Challengers comparison

In short: the completion suggester is the fastest option and supports fuzziness and weighted ranking, but cannot filter results. The context suggester adds category and geo-location filtering on top of it, at the cost of extra memory. The edge n-gram tokenizer can match any word in the phrase and supports the full query DSL (sorting, function scores, aggregations), but is slower. Search-as-you-type is the quickest to set up, but offers the least control over which suggestions surface.

Selecting the best fit

As stated at the start: autocomplete is not only about suggesting correct spellings. It is a powerful way to surface our best-performing queries, the ones with meaningful and trustworthy results, and we want to leverage our prior knowledge of query performance to build our own query index.

With this in mind, it's easy to understand the importance of developing this index internally. So, how do we choose? Since we need to control the shingles and, more broadly, the suggestions we surface to users, we can rule out the search_as_you_type mapping. We can also rule out the context suggester for now: filtering suggestions by category or location is not a use case we have today, nor is it on our product roadmap. That leaves our two main candidates: the completion suggester and the edge n-gram tokenizer.

The prefix completion suggester is optimized for speed and usually works better when the order of the words is well-known. Today, around 80% of our users' queries have 1 or 2 terms, and over 90% have 3 or fewer. But a mapping with the edge n-gram tokenizer gives a better experience by matching words in any order. The regex completion suggester can do this too, but it doesn't provide fuzziness.

Should we choose the speed of the completion suggester or the flexibility of the edge n-gram?

Implementation

We are going to test the performance of our two main candidates: the completion suggester and the edge n-gram tokenizer. We will create an index containing our curated list of queries, then query it using prefixes extracted from our users' queries. We will index fields for both suggesters, plus a 'total_clicks' field as a proxy for the popularity of each query.

Index mappings:

PUT query_index
{
  "settings": {
    "query": {
      "default_field": ["query"]
    },
    "analysis": {
      "analyzer": {
        "autocomplete_edge_ngram": {
          "tokenizer": "autocomplete",
          "filter": ["lowercase"]
        },
        "autocomplete_edge_ngram_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 14,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "query": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          },
          "edge_ngram": {
            "type": "text",
            "analyzer": "autocomplete_edge_ngram",
            "search_analyzer": "autocomplete_edge_ngram_search"
          },
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      },
      "total_clicks": {
        "type": "float"
      },
      "query_suggestion": {
        "type": "completion"
      }
    }
  }
}

To follow along and test the index, here’s a code block with example documents:

POST query_index/_doc
{
  "query": "maquiagem",
  "total_clicks": 10,
  "query_suggestion": {
    "input": "maquiagem",
    "weight": 10
  }
}

POST query_index/_doc
{
  "query": "maquiagem natural",
  "total_clicks": 9,
  "query_suggestion": {
    "input": "maquiagem natural",
    "weight": 9
  }
}

POST query_index/_doc
{
  "query": "esponja de maquiagem",
  "total_clicks": 8,
  "query_suggestion": {
    "input": "esponja de maquiagem",
    "weight": 8
  }
}

For the completion suggester, we indexed the query as the input and the value of total clicks as the weight. We must bake the weight in at index time because the completion suggester cannot rank by the 'total_clicks' field at query time.

Querying

For the completion suggester, we can query using the prefix field of the suggestion. Notice that fuzziness is declared at query time.

GET query_index/_search
{
  "suggest": {
    "query_suggestion": {
      "prefix": "maq",
      "completion": {
        "field": "query_suggestion",
        "skip_duplicates": true,
        "size": 10,
        "fuzzy": {
          "prefix_length": 1,
          "min_length": 2
        }
      }
    }
  }
}

As expected, we are matching the prefix, so our hits are “maquiagem” and “maquiagem natural”.

And since we enabled fuzziness, we get the same results even if we misspell the q as a w.
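For instance, the same request with the misspelled prefix should still return the “maquiagem” suggestions:

GET query_index/_search
{
  "suggest": {
    "query_suggestion": {
      "prefix": "maw",
      "completion": {
        "field": "query_suggestion",
        "skip_duplicates": true,
        "size": 10,
        "fuzzy": {
          "prefix_length": 1,
          "min_length": 2
        }
      }
    }
  }
}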

Querying the field indexed using the edge n-gram tokenizer is the same as issuing a normal query to our index, so we can use the match query as follows:

GET query_index/_search
{
  "size": 10,
  "query": {
    "match": {
      "query.edge_ngram": {
        "query": "maq",
        "operator": "and",
        "fuzziness": 1
      }
    }
  },
  "sort": [
    {
      "total_clicks": {
        "order": "desc"
      }
    }
  ]
}

We are getting more results now, as we are matching the prefix anywhere within the query, not only at the beginning.

Fuzziness can be used for edge n-grams too. For example, the same misspelled prefix should still match the stored n-grams:
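GET query_index/_search
{
  "size": 10,
  "query": {
    "match": {
      "query.edge_ngram": {
        "query": "maw",
        "operator": "and",
        "fuzziness": 1
      }
    }
  },
  "sort": [
    {
      "total_clicks": {
        "order": "desc"
      }
    }
  ]
}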

Testing

Now that we have a feel for how the results look, let's analyze their performance. To do this, we took a sample of our queries and created prefixes of up to 4 characters, to recreate the most common scenarios. Then we logged the time each suggester took to complete the task.

To measure the time, we used the 'took' value returned in each OpenSearch response, which is the time OpenSearch spent processing the query on its side (excluding network latency).
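For reference, this value appears at the top level of every search response; an abridged, illustrative example (the numbers are made up):

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": { "total": { "value": 2, "relation": "eq" }, "hits": [] }
}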

The completion suggester's mean time was 1.68 ms versus 4.20 ms for the edge n-gram tokenizer. That's 2.5x! We were expecting this difference, since the completion suggester is optimized for speed.

So the completion suggester was much faster, but the absolute difference is less than 3 ms (assuming network latency and frontend times are the same for both). The edge n-gram tokenizer, on the other hand, delivered a broader range of suggestions.

Conclusions

In the end, it all comes down to the user experience. Suggestions should come up fast for a smooth experience. If the delay added by the edge n-gram tokenizer starts affecting the search experience, we should consider switching to the completion suggester.

Up to this point, we could only test the difference in a development environment, which doesn't yet have hundreds or thousands of queries being issued at the same time. So we need to test how this difference evolves in production under the cluster's load. At the same time, the index of query suggestions will keep growing, both with the new queries our users search for and with the growth of our catalogue.

We are going to answer this question with experimentation: an A/B test of which candidate suggester is the best fit for our user experience. Does the improvement in raw performance translate into a better experience? We will only find out after A/B testing…

Thanks to Pamela Peixinho and the search team for your help in writing this article!
