Autosuggestion services in web search

Published in

Data Science at Microsoft

11 min readFeb 8, 2022

--

Autosuggestion, also known as query autocompletion (QAC), predictive search, and type-ahead, is a key search-assist feature that has become indispensable in modern information retrieval for enhancing search experience. The goal of autosuggestion is to predict relevant query completions for incomplete prefix inputs so that users reach their search intent with fewer keystrokes, saving their time and effort.

Being a low-latency (matching the user’s typing speed to predict new suggestions at every keystroke), high-throughput (catering to millions of requests made per second on the search engine) service, robustness and adherence to a strict online latency budget are a must during the design of QAC services. Tries and inverted indexes are some favorites for choosing efficient data structures for the storage and lookup of suggestions.

Illustration of a basic autosuggestion framework (adapted from *[1]).*

A brief history of autosuggestion services

The core idea behind autosuggestion dates to the 1960s [1] with methods describing ways to reduce the number of keystrokes that developers needed to complete keywords and identifiers. Through the 1980s and ’90s, there was some additional research around developing similar assistive technologies. In 2000, Amazon filed a patent describing a technology to suggest query autocompletions during the query entry process, based on specific attributes of the database access system being searched. Later in 2004, Google Labs piloted its QAC service named Google Suggest, which was formally integrated into Google’s homepage in 2008. In 2009 Bing was released, having the autosuggest feature implemented on its homepage. In recent times, almost every search box on the web has some form of autosuggest mechanism implemented to aid the user.

Initially, the suggested completions were based purely on popularity (i.e., frequency in the query logs). As time passed, plenty of augmentations were introduced beyond simple prefix-based lookup to enhance the quality of suggestions. This involved the use of n-grams [2], intent identification [3], and personalization [4], among others, to cater to specific requirements of different users. Later, with advanced NLP and Deep Learning, the context of a prefix could be captured more effectively. This enabled QAC to extend and include purely “generative” suggestions [14], which improved the coverage as well as the relevance of autosuggestions by predicting unseen suggestions.

As it is usually the first point of contact for a user with information needs on a search engine, providing a great QAC experience is imperative. There have been several approaches developed to advance QAC services with better and more suggestions [5] while staying within the latency bounds. These broadly fall under the umbrellas of relevance and coverage.

Improving the relevance of autosuggestion services

Conceptually, an ideal autosuggestion service could predict the exact query that a user has in mind as the user starts typing with a single character (“psychic” QAC). Because that may not be possible, QAC services strive to provide users with the most relevant suggestions for every keystroke.

Early QAC services suggested query completions based on the popularity of queries available in their logs. However, this did not work well because users come to a search engine with differing intent. Today, such services go far beyond simply providing users with the most popular suggestions. Literature in this domain describes several methods used by QAC frameworks to address relevance. Some of the broad classes of such methods are personalization, diversity, and freshness.

Personalization

Personalization has become an increasingly popular concept in web search. There are several levels at which personalization is introduced in QAC to improve users’ experience by providing relevant suggestions based on their preferences.

Language-specific: Global search engines like Bing and Google allow users to search in a variety of languages. Thus, it makes more sense to show suggestions related to the user’s specified language [6] (or probable location where the language is dominant). As an example, for the prefix “a” it might be useful to show “aadhar card” (specific to India) as a suggestion when Hindi is selected as the language, but not when Korean is selected. Automatic transliteration may also be performed if necessary for languages with completely different scripts.

Based on the language preference of the user, search engines like Google provide slightly different query completions. The suggestions marked with red boxes indicate those that have special relevance to the language set by the user.

Time-sensitive: Several events (sports, political, festivals, and others) occur at more or less fixed intervals of time. This periodicity is leveraged by QAC services to anticipate temporal trends and suggest appropriate completions when prefixes are queried during the optimal time window [7].

The plot shows relative query popularity for “christmas” (blue) and “fifa world cup” (red) over time. While the query “christmas” is issued yearly, “fifa world cup” presents peaks roughly once in four years. These temporal trends can be featurized and infused into the ranker function to improve overall suggestion quality.

Based on user behavior and search history: Although widely debated, insights drawn from users’ session information can result in a significantly better search experience. As an example, for a user whose search history contains “avengers”, “captain america”, and “iron man”, it may be better to display “mark ruffalo” as the topmost suggestion for the prefix “mark ” instead of “mark zuckerberg”. Extensive research is being carried out in this area to come up with end-to-end personalized neural language models [8] to encode user representations through long/short-term search history and other signals. Query reformulation patterns by users have also been explored to improve personalization models [9]. In 2017, Google filed a patent describing a technique where search systems can improve query completions by considering information about the user [18].

Diversity

The aim of QAC is to assist users in reaching their search goals quickly. During the initial few keystrokes, the search intent may seem ambiguous, but as the user inputs more characters, the intent becomes clearer and suggestions become more precise. Thus, it is reasonable to avoid redundancy in suggestion intents typically only for shorter prefixes. The goal of QAC diversification is to return the correct query early in the suggestions list and reduce redundancy among QAC candidates [10].

For the short prefix “aa”, the suggestions in (a) cover only three broad intents (as shown by the colored boxes), while the suggestions in (b) cover five broad intents. Hence, (b) is more diverse as it does a much better job of seeking to overcome ambiguity about the user’s actual information needs.

Most literature on QAC diversification proposes greedy methods of deduplication [10], while a few discuss algorithms based on dynamic programming and A* search [11] as well. Some also attempt to model intents based on users’ interaction with documents displayed as previous search results [12] and use these to avoid seemingly redundant suggestions.

Freshness

Earlier, there was some discussion about time-sensitive personalization for periodic trends. Keeping these aside, several search queries gain rapid popularity in certain parts of the world because of some trigger (such as news, forecasts, sentiments becoming viral, and so on). Such recent trends also need to be picked up consistently and displayed by QAC pipelines to allow users to keep up with the latest information. For example, when the prices of cryptocurrencies rallied upward at the start of 2021, most search engines displayed “bitcoin”, “cryptocurrency” and “dogecoin” as top suggestions for prefixes “b”, “c” and “d”. As the trends started fading away, these suggestions started appearing in their usual positions as before.

Coverage improvement in QAC services

Different search engines show a different number of maximum suggestions for any prefix: for Google it is ten, while for Bing it is eight. However, there are situations (especially for longer prefixes, or prefixes with spelling errors) where suggestion blocks actually contain fewer suggestions. This is primarily due to the unavailability of relevant completions to be displayed and may negatively affect the search experience. Search engines employ several clever techniques to overcome these coverage issues. Some of them are as follows.

Scalable hosting of more suggestions

The most intuitive approach to improve coverage is to start hosting more suggestions in the trie powering the autosuggestion service. But this may cause memory outage as well as latency increase. Hence, effective ways of partitioning the trie of logged queries [5] have been proposed to host more suggestions in a scalable way.

Non-prefix matching

Looking up suggestions from the trie based on the user-entered prefix, followed by their ranking, forms the majority portion of a web-scale QAC service. However, certain coverage issues arise due to misspellings or malformed prefixes. In such situations, QAC services may be better off suggesting completions that do not exactly match the user’s prefix. These are non-prefix matches and are triggered using language-specific spell correction mechanisms or some inverted index maintained by the QAC pipeline.

Approaches based on language modeling

In the past, natural language models have been proposed that utilize the context of prefixes to generate suffixes for QAC [15]. Such models can exploit sub-word–level information as well [16]. Interesting deep learning techniques have also been explored that use language models to predict suggestions beyond those present in the trie [14]. Results from such techniques can also populate the suggestion box in low coverage cases.

Blocking and filtering

Search engines like Bing and Google have autosuggestions that are used across the globe and must adhere to certain policies as mandated by the various governments [19]. Thus, although blocking and filtering of certain content may lead to loss of coverage, it is a crucial aspect of any QAC pipeline. Suggestions may be blocked based on one or more of the following reasons (non-exhaustive list):

Hate/violence-related suggestions
Adult/offensive content–related suggestions
Piracy-related suggestions
Suggestions related to sensitive or controversial information
Gibberish or defective suggestions

Apart from the filtering done by the pipeline, search engines offer provisions to request the blocking of certain suggestions subject to scrutinization.

Interesting features in modern QAC services

With time, there have been interesting modifications to the UI/UX of autosuggestion services to enhance the productivity of users. I list some of my personal favorites below.

Rich entity information

Today, just as SERP goes beyond text and simple multimedia to show information in entity panes and carousels, among others, modern autosuggestions have also gone beyond simple textual suggestions. One often finds thumbnails and some auxiliary information about entity-type suggestions (personalities, places, organizations, books, movies, and so on) for easier disambiguation and quicker satisfaction of the users’ information needs.

Factual answers in suggestion box

Another interesting feature that search engines like Google have implemented is for suggestions demanding an answer in the form of a small calculation or lookup. Instead of ghosting, the actual answer is displayed prominently in the suggestion box itself. These answers currently cover a small range of topics and may be extended to cover more scenarios in the future.

Ghosting

While ranking suggestions are displayed in the block for a given prefix, autosuggest algorithms also calculate the likelihood of the first suggestion being clicked by the user. If this likelihood is greater than some threshold, the search box automatically gets pre-populated with this likely suggestion. This is known as ghosting. Implemented in Bing, Google, and elsewhere, such a feature can significantly increase the speed at which users can express their intent [17].

Some of the interesting features of modern QAC services implemented to increase user engagement; (a) and (b) are from Google, while (c) is from Bing.

Zero-input scenario

This is the case when a user has just clicked the search box but has not typed anything yet. Instead of showing no suggestions, autosuggestion services use different techniques to engage users.

Evaluation of QAC Techniques

We have looked at several approaches developed to allow autosuggestion services to satisfy users’ query intent. To assess the effectiveness of such techniques, several metrics have been devised and tested, some of which include the following.

Mean reciprocal rank (MRR)

MRR [13] is calculated as follows:

where Q is the set of all user-issued queries (q) in the evaluation set and rank_q is the rank that query q appears in the suggestion box. It is one of the most popular metrics to evaluate QAC methods and gives more credit if q is placed at the top of the suggestion block.

Success rate at top K (SR@K)

SR@K [13] computes the average ratio of queries that satisfy the user’s search intent and are located within the top K positions of the suggestion list. It is usually used when there is only one ground truth among all suggestions in the block.

Minimum keystroke length (MKS)

This user-assist metric [13] records the number of actions that a user must take before issuing a target query. If, for a prefix of length i, the target query appears in the block at position j, then the number of actions is calculated as (i + j). Now, the smallest value of (i + j) is chosen as the MKS for that query.

e-Saved

“Effort saved” [13] is another user-assist metric that calculates the expected ratio of characters that a user can skip inputting until the query is submitted. It can be calculated as follows:

where |q| is the length of the user-issued query q, i refers to the keystroke until which the query is typed, j refers to the position of the query in the suggestion box and P(Sᵢⱼ = 1) is the probability that the user clicks on the suggestion when it appears at the iᵗʰ keystroke in the jᵗʰ position.

This formulation prefers the improvements in the query suggestions for longer queries in particular.

Mean Recoverable Length (MRL)

When all characters of a target query (of length L) are known, it can be readily suggested. If we delete characters from right to left one-by-one, at some point (prefix length l), the target query will disappear from the list. Recoverable length [16] for the query is defined as L - l + 1. MRL is an average of the recoverable lengths for all queries in the evaluation set, and is more useful for systems that suggest one word at a time compared to full query completion.

α-nDCG

This is an extension to the normalized discounted cumulative gain (nDCG), which also takes into account aspect-specific rankings in the form of α [10]. This is used to evaluate the ranking while applying diversification approaches to QAC, but also requires manual judgments to assign intent-specific scores (in this context, intents become the aspects) to each of the suggestions.

In this article, I first reviewed the fundamentals of autosuggestion services implemented in search engines. Then, I delved into some of the key aspects of modern QAC services, such as relevance and coverage, by discussing various techniques that have been used to improve the UX of these services. I concluded by discussing some of the performance metrics to evaluate the effectiveness of autosuggestions.

I hope you found this article useful. Feel free to leave your feedback or comments in the section below.

Tezan Sahu is on LinkedIn.