High Quality Autocomplete Search Part 2.

Scaling an autocomplete search service to millions of users.

Ismail Afiff
Aug 1, 2018 · 6 min read
User experience as a function of search latency.

This is the second part of a series of articles on how to create a high-quality autocomplete search. The first part is all about the relevancy algorithm: how Traveloka overcomes autocomplete search challenges in providing relevant results to users with the fewest keystrokes.

In this second part, we will look at another important metric of autocomplete search: latency. No matter how good an autocomplete is, the user experience won’t be smooth if it takes too long to fetch results. In fact, 100 ms is the response-time limit if you want users to feel that their actions are directly causing something to happen on the screen.

Search-as-you-type is particularly challenging to scale since the volume of requests is massive: every character typed means another query to the server.

Fortunately, there are several ways to improve the performance of autocomplete search.

1st, Move the computation-intensive part from query time to indexing time.

It is essential for queries to be fast and swift, whereas indexing time does not impact the user since it happens behind the scenes.

Indexing vs Query time.

Take the synonym analyzer as an example. In text search, we expand every token with its synonyms (big is expanded into large, huge, humongous, extensive, and so on) so that a document can also be found by searching for a synonym.

Synonym of Big

There are two places to implement synonyms: at indexing time or at query time.

If synonyms are applied during indexing, tokens in the index are supplemented with their synonyms. For example, the document “Spacious Hotel in Jakarta” becomes “Huge, Large, Humongous (..etc) Hotel in Jakarta” in the index. The apparent effect is that the index size swells.

If synonyms are instead applied at query time, the index size stays the same, but the number of queries increases, since we also query each synonym afterwards. For example, if the query is “Spacious Hotel in Jakarta”, we will also query “Big Hotel in Jakarta”, “Massive Hotel in Jakarta”, etc. Hence, the noticeable effect is an increase in search latency.

Unfortunately, choosing between implementing synonyms at index time or at query time is not that simple. Each option has its own pros and cons.

Synonym during indexing

  • (+) Heavy at indexing, light at querying. Less latency, users are happier 😉
  • (-) Index size swells. Many more tokens (the originals plus their synonyms) to store. Needs more memory.
  • (-) Synonym modification requires reindexing all documents, since modifying synonyms means modifying the text analyzer at index time.

Synonym during query time

  • (+) Index size stays small. No additional tokens to store. Requires less memory.
  • (+) Synonym modification can be done at any time, because it only modifies the query, keeping the index untouched.
  • (-) Light at indexing, heavy at querying. Users need to wait longer 😭.
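As a toy illustration, index-time synonym expansion can be sketched in a few lines of Python. The synonym table and function name here are hypothetical, not Traveloka's actual analyzer:

```python
# Hypothetical synonym table; a real analyzer would load a curated list.
SYNONYMS = {
    "spacious": ["big", "large", "huge", "humongous"],
}

def expand_with_synonyms(tokens):
    """Return the original tokens plus all of their synonyms."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(SYNONYMS.get(token, []))
    return expanded

# "Spacious Hotel in Jakarta" is indexed with the extra synonym tokens,
# so a query like "big hotel" matches with no query-time expansion.
print(expand_with_synonyms(["spacious", "hotel", "in", "jakarta"]))
```

The cost (larger index, reindexing on synonym changes) is paid once at indexing time instead of on every query.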

At Traveloka, we believe user experience is crucial; therefore, heavy computation at query time should be avoided at all costs. Hence, we move various heavy token-expansion operations such as

  • Synonym
  • Shingle (space removal and token combination, remember “Hong Kong” vs “HongKong”)
  • EdgeNgram (abc -> a, ab, abc)

to indexing time, even at the expense of extra computational resource.
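The shingle and edge n-gram expansions above can be sketched similarly. This is a simplified illustration; real analyzers also handle casing, token positions, and length limits:

```python
def edge_ngrams(token, min_len=1):
    """Generate every prefix of a token: abc -> a, ab, abc."""
    return [token[:i] for i in range(min_len, len(token) + 1)]

def shingles(tokens):
    """Combine adjacent tokens without the space: hong kong -> hongkong."""
    return [tokens[i] + tokens[i + 1] for i in range(len(tokens) - 1)]

print(edge_ngrams("abc"))          # ['a', 'ab', 'abc']
print(shingles(["hong", "kong"]))  # ['hongkong']
```

All of these expanded tokens are written to the index once, so the query pays none of the expansion cost.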

Furthermore, since the heavy lifting of token expansion is carried out at indexing time, we are able to use a simple match query. A match query works simply by checking whether a token exists in the index, making it blazingly fast. At best, this lookup requires only O(1) complexity, keeping query computation as light and as fast as possible.
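The reason a match query can be O(1) per token is that an inverted index is, at its core, a map from token to the documents containing it. A minimal sketch with hypothetical data:

```python
# Toy inverted index: token -> set of document ids containing it.
inverted_index = {
    "hotel":   {1, 2},
    "jakarta": {1},
    "bali":    {2},
}

def match(token):
    """Average-case O(1) hash lookup of the documents for one token."""
    return inverted_index.get(token, set())

print(match("jakarta"))  # {1}
```

Because all expansions (synonyms, shingles, edge n-grams) were materialized at indexing time, this single cheap lookup is all the query needs.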

2nd, Model the document for high performance.

One easy way to improve performance is to limit the number of searchable fields per document and combine equally treated fields.

Usually a document has several fields such as name, additional info, alias, etc. One purpose of keeping them separate is to differentiate boosts between fields, e.g. a match on the name field is more relevant than a match on the alias field.

Name, additional info, alias and type fields in a document

However, if no boost difference is applied (the fields are treated equally), it is much better to combine them into a single field. This is because search complexity increases linearly with the number of searchable fields, whereas the number of tokens in a field does not really affect query performance that much (especially if a simple match query is used).

In fact, we learned this trick by accident. In a previous implementation of the autocomplete document, there were 8 different fields for name, one for every locale. When the name fields were combined into a single name field, voilà: query latency immediately improved by 7x.
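A hypothetical sketch of that consolidation, merging per-locale name fields into one combined field before indexing (the field and locale names here are illustrative, not the actual schema):

```python
# Assumed subset of the per-locale name fields; the real document had 8.
LOCALES = ["en_US", "id_ID", "th_TH"]

def combine_names(doc):
    """Merge per-locale name fields into one searchable 'name' field."""
    names = [doc[f"name_{loc}"] for loc in LOCALES if f"name_{loc}" in doc]
    combined = dict(doc)
    combined["name"] = " ".join(names)
    return combined

doc = {"name_en_US": "Jakarta", "name_id_ID": "Jakarta", "name_th_TH": "จาการ์ตา"}
print(combine_names(doc)["name"])
```

The query then targets one field instead of eight, cutting per-query work roughly in proportion to the number of fields removed.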

What’s more, complex field relationships such as nested parent-child structures and fields that require joins tend to slow down queries. Keeping field relationships simple and avoiding joins yields higher query performance.

3rd, Utilize shards and replicas.

If the number of queries is huge, a single node might not be sufficient to handle all of the load. A simple way to scale is to use sharding and replication.

Replication means duplicating data to other nodes, whereas sharding means partitioning the data across multiple nodes. Adding replicas increases read throughput, since replicas also serve read requests, whereas sharding is useful when the index is too big for a single node to handle. Furthermore, adding replicas improves reliability by providing a failover mechanism.

Sharding and Replication in ElasticSearch
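A minimal sketch of how sharding routes documents. Real engines use their own routing hashes (ElasticSearch, for example, hashes a routing key, murmur3 by default); this toy version only illustrates the idea:

```python
import hashlib

NUM_SHARDS = 3

def shard_for(doc_id):
    """Route a document to a shard via a stable hash of its id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# A write goes to exactly one shard; a search fans out to every shard
# (and any replica of a shard can serve the read).
print(shard_for("hotel-123"))
```

Because the hash is stable, the same document id always lands on the same shard, which is what makes lookups and updates possible without scanning every node.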

Just a quick note on sharding: in several search engine libraries such as ElasticSearch and Solr, for performance reasons, some statistics such as inverse document frequency (IDF) are by default calculated per shard, not across all documents in the index. This causes issues when documents are heavily skewed and not well distributed across the shards. The issue can be mitigated by forcing these statistics to be calculated globally, which adds a round trip of latency. For further information on this topic, please have a look at the reference documentation of each library, ElasticSearch and Solr.
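The per-shard IDF issue can be demonstrated with a toy example, using one classic smoothed IDF formula (real engines use their own variants, so the exact numbers are illustrative):

```python
import math

# Two shards with a skewed distribution of the term "jakarta":
shard_docs = [
    ["jakarta hotel", "jakarta villa"],   # shard 0: term in 2 of 2 docs
    ["bali resort", "bali hotel"],        # shard 1: term in 0 of 2 docs
]

def idf(num_docs, doc_freq):
    """One classic smoothed IDF: log(N / (df + 1)) + 1."""
    return math.log(num_docs / (doc_freq + 1)) + 1

def doc_freq(docs, term):
    return sum(1 for d in docs if term in d.split())

# Per-shard IDF differs between shards, so the same match is scored
# differently depending on which shard a document happens to live on...
per_shard = [idf(len(d), doc_freq(d, "jakarta")) for d in shard_docs]

# ...while the global IDF is one consistent value across the whole index.
all_docs = [d for shard in shard_docs for d in shard]
global_idf = idf(len(all_docs), doc_freq(all_docs, "jakarta"))

print(per_shard, global_idf)
```

With well-distributed documents the per-shard values converge toward the global one, which is why the default is usually fine at scale.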

4th, Choose hardware that eases performance bottlenecks.

AWS EC2 instance types.

Search algorithms tend to be I/O bound, so choosing hardware with better I/O performance will also improve search performance. The improvement can come in the form of higher memory performance (faster RAM) or a faster disk (SSD vs. spinning disk). Additionally, it is better to opt for a local drive instead of an external one such as AWS EBS, since a local drive provides lower latency. Beware, though, that data on a local drive is lost when the instance stops, terminates, or suffers a hardware failure, which makes scaling a bit tricky.

Those are the four approaches to improving the performance of autocomplete search that I have thought of and implemented while building this feature at Traveloka. Each approach increases engineering complexity and the resources required. Nevertheless, for a better user experience, the benefits are usually worth the extra effort and resources.

I am lucky to face this fascinating, open-ended search challenge with my team at Traveloka, one of the largest online travel companies in Southeast Asia. If you’re a software engineer interested in developing state-of-the-art search systems, helping millions of users find their next adventure, have a look at the opportunities on Traveloka’s careers page!

Traveloka Engineering

Empowering Discovery Through Technology