Measuring Autosuggest Quality

Posted on January 26, 2016 by Ben Torfs

Greetings from the Free Text Search squad.

We power most of the Skyscanner auto-suggest search boxes, such as the ones where you select an airport where you want to fly to, or a city in which you need to find a hotel. More generally though, you could say that our mission is to map user input to the user intention, using as few keystrokes as possible.

Autosuggest: speed, relevancy and the ‘zero-result rate’

These search results need to appear very fast (less than 200ms, preferably), but above all, they need to be relevant. This is especially true in the mobile market, where typing can be a bit of a hassle and screen real estate is too scarce to display long lists of results.

Our current service is working well, and we are proud of the speed and accuracy of our results (even when the user includes some challenging typos). As always though, there is room for improvement, particularly in markets using non-Latin scripts. Measuring the quality of our service is tremendously important in identifying areas of improvement as well as enabling better A/B testing in the future.

Today, the most important metric we use is the rate of queries returning no results at all (the ‘zero-result rate’). At first it seems like an overly simplistic metric, but it is actually quite useful for comparing performance across locales and tracking how each one evolves over time.
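As a rough illustration, the zero-result rate per locale could be computed from a query log along these lines. This is only a sketch: the log format (pairs of locale and number of suggestions returned), the function name `zero_result_rate` and the sample values are assumptions made for the example, not our production pipeline or real data.

```python
from collections import defaultdict


def zero_result_rate(query_log):
    """Return {locale: fraction of queries that returned no suggestions}.

    query_log is an iterable of (locale, result_count) pairs. This shape is
    an assumption for the sketch, not a real logging format.
    """
    totals = defaultdict(int)
    zeros = defaultdict(int)
    for locale, result_count in query_log:
        totals[locale] += 1
        if result_count == 0:
            zeros[locale] += 1
    return {locale: zeros[locale] / totals[locale] for locale in totals}


# Made-up sample entries: (locale, number of suggestions returned).
sample_log = [
    ("en-GB", 5), ("en-GB", 0), ("en-GB", 3), ("en-GB", 7),
    ("ja-JP", 0), ("ja-JP", 2),
]
rates = zero_result_rate(sample_log)
for locale, rate in sorted(rates.items()):
    print(f"{locale}: {rate:.0%}")
```

Grouping by locale (and, in practice, by day or week as well) is what makes the number comparable between markets and over time.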

For instance, let’s take a look at this measure for the past six months in the UK, our longest-supported market, where we’ve spent a lot of time optimizing the site. Our results are very strong, yet there is still a small number of queries that we cannot recover from. For example, a user may be searching for a location that doesn’t have an airport, or may attempt to search for a flight to ‘Frankfart’ rather than ‘Frankfurt’ (always amusing).

Auto-suggest and non-Latin scripts

It’s not quite so easy when optimizing for newer Skyscanner markets, where non-Latin scripts are used. There are some great tools out there that have really helped us make fantastic improvements; in Japan, we’ve used the wonderful Kuromoji library to convert queries into the various Japanese character types. We’ve made similar enhancements for other languages such as Korean, which again has resulted in real progress.
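To give a feel for what converting between character types means, here is a deliberately simplified sketch that only maps between the two Japanese kana scripts, using the fact that the Unicode katakana block sits 0x60 code points above the hiragana block. It does not use Kuromoji and is not how our service works: handling kanji requires a real morphological analyzer. The function names are made up for the example.

```python
# Unicode ranges for the basic kana letters.
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30F6
KANA_OFFSET = 0x60  # katakana code point = hiragana code point + 0x60


def hiragana_to_katakana(text):
    """Replace every hiragana letter with its katakana counterpart."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET)
        if HIRAGANA_START <= ord(ch) <= HIRAGANA_END else ch
        for ch in text
    )


def katakana_to_hiragana(text):
    """Replace every katakana letter with its hiragana counterpart."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END else ch
        for ch in text
    )


def query_variants(query):
    """Return the query as typed plus its all-hiragana and all-katakana forms."""
    return {query, hiragana_to_katakana(query), katakana_to_hiragana(query)}


print(sorted(query_variants("とうきょう")))
```

Looking up every variant against the index means a user who types a place name in hiragana can still match an entry stored in katakana, and vice versa.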

Alternative auto-suggest KPIs

The zero-result rate provides us with a good idea of where to steer our efforts, but it is pretty coarse and we are looking for new and better KPIs. Here are some of the ideas we came up with:
• How many characters did the user have to type before s/he was able to click on the result s/he was looking for? This metric has a direct relationship to the usability of the site. We could also count every backspace character, since those give an indication that we are not sufficiently resilient to typing errors.

• Whenever a result is selected, what was its position in the suggestion list? We should aim for the clicked result to always be the first one. Today, the search ranking is already dependent on the selected market. For instance, a user who searches for ‘san’ in the USA will be returned results such as San Francisco and San Diego first. The same query typed in Spain, however, will produce higher rankings for Santander and San Sebastián. Other improvements might include storing an individual’s search history and providing easier access to the queries that a user types most often.

• How many users started typing a query, but never actually selected a result (the ‘abandonment rate’)? In this case it is not only important to know how often that happens, but also why it happened. It might indicate that a street name was changed somewhere, and needs to be updated in our database.
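The three ideas above could be computed from per-session data roughly as follows. This is a sketch under stated assumptions: the `Session` record, its fields, the `summarize` function and the sample values are all hypothetical, and mean reciprocal rank is added here as one common way of summarizing click position, not as a metric we already track.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    """One use of the search box (hypothetical record shape)."""
    keystrokes: int                   # keys pressed, backspaces included
    backspaces: int                   # how many of those were backspaces
    selected_position: Optional[int]  # 1-based rank clicked, None if abandoned


def summarize(sessions: List[Session]) -> dict:
    """Compute the candidate KPIs over a list of sessions."""
    completed = [s for s in sessions if s.selected_position is not None]
    n = len(completed)
    return {
        # Sessions where the user typed but never selected a result.
        "abandonment_rate":
            (len(sessions) - n) / len(sessions) if sessions else 0.0,
        # Typing effort needed before a result was clicked.
        "mean_keystrokes":
            sum(s.keystrokes for s in completed) / n if n else 0.0,
        "mean_backspaces":
            sum(s.backspaces for s in completed) / n if n else 0.0,
        # How often the clicked result was already the first suggestion.
        "top1_rate":
            sum(1 for s in completed if s.selected_position == 1) / n
            if n else 0.0,
        # 1.0 when every click is on the first result, lower otherwise.
        "mean_reciprocal_rank":
            sum(1 / s.selected_position for s in completed) / n if n else 0.0,
    }


# Made-up sample sessions.
sample_sessions = [
    Session(keystrokes=3, backspaces=0, selected_position=1),
    Session(keystrokes=5, backspaces=1, selected_position=2),
    Session(keystrokes=4, backspaces=2, selected_position=None),
    Session(keystrokes=4, backspaces=0, selected_position=1),
]
kpis = summarize(sample_sessions)
for name, value in kpis.items():
    print(f"{name}: {value:.2f}")
```

Note that the keystroke and position metrics are averaged over completed sessions only. Abandoned sessions are counted separately, since they tell us that something went wrong but not why.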

Surely this list is not complete. Do you have thoughts on this, or other ideas on how to measure and improve our auto-suggest results? Please let us know in the comments, because we would love to hear them.

Learn with us

Please take a look at our current job roles available across our 10 global offices.

We’re hiring!