Algorithm Behind Search in Tiket.com

Part 1: Autocomplete

Elisafina Siswanto
tiket.com
4 min read · Nov 30, 2021


by Elisafina Siswanto, Data Scientist @ Tiket.com

Whenever we want to travel, the first thing we need is a destination: where we want to go.

As an OTA (online travel agency), the destination is also the first door to every travel product on Tiket.com. To book a hotel, for example, the user first needs to fill in the destination, the check-in and check-out dates, and the number of rooms and guests.

Since the destination is essential, we want to make sure we deliver the best experience possible for users when they type their destination, which raises two questions:

  • How do we surface the most relevant items in the destination autocomplete?
  • How can we reduce the amount of typing needed?

The answer is, of course, machine learning, specifically the Learning to Rank (LTR) technique. We will explain how we created a machine learning model that improves autocomplete ranking quality by roughly 4% to 7.5%, depending on query length, compared to not using a machine learning model. We will separate the process into two phases: the experiment phase and the implementation phase.

Experiment Phase

The experiment phase focuses on how we build our LTR model.

Our data source is the historical detail of user autocomplete search events, which includes: the user account, the date and time of the event, what the user typed, the list of items the user saw in the autocomplete suggestion box, and finally, which item the user chose.

Next, from our data source, we need to determine how to group our output for ranking; in our case, each group is what the user typed (the query). Then we create our relevance score, which is calculated based on how often an item is chosen compared to the other items shown for the same query.
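
As a rough illustration of how such relevance labels could be derived from the click logs (the column names and values here are hypothetical):

```python
# A rough illustration of deriving relevance labels from search-event logs.
# Column names (query, item_id, chosen) and the data are hypothetical.
import pandas as pd

events = pd.DataFrame({
    "query":   ["bali", "bali", "bali", "bali", "band", "band"],
    "item_id": ["bali", "balikpapan", "bali", "bali", "bandung", "banda aceh"],
    "chosen":  [1, 1, 1, 0, 1, 0],   # 1 = the user clicked this suggestion
})

# How often each item was chosen, relative to all clicks for the same query.
clicks = events.groupby(["query", "item_id"])["chosen"].sum()
totals = events.groupby("query")["chosen"].sum()
relevance = clicks.div(totals, level="query").rename("relevance").reset_index()
print(relevance)
```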

Features that we used in this experiment can be separated into three types:

  • Query features: extracted from the query text, for example, the query length and the number of words in the query
  • Item features: extracted from item details, mainly related to how popular the item is within a specific timeframe and the type of the item, whether it is a hotel, a location, or a point of interest (POI)
  • Similarity features: capture the similarity between the query and the item. For these features, we mainly use Elasticsearch functions like `match` (see the sketch after this list).
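
A rough sketch of how the query and similarity features could be computed, assuming the Elasticsearch 8.x Python client and a hypothetical index `autocomplete_items` with a `name` field:

```python
# A minimal sketch of two feature groups: query features computed from the
# text itself, and a similarity feature taken from the `match` query score.
# Index and field names ("autocomplete_items", "name") are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def query_features(query: str) -> dict:
    """Features extracted from the query text itself."""
    return {"query_length": len(query), "query_word_count": len(query.split())}

def similarity_feature(query: str, item_id: str) -> float:
    """Use the relevance score of a `match` query as a similarity feature."""
    resp = es.search(
        index="autocomplete_items",
        query={"match": {"name": query}},
        size=50,
    )
    scores = {hit["_id"]: hit["_score"] for hit in resp["hits"]["hits"]}
    return scores.get(item_id, 0.0)  # 0.0 if the item did not match at all
```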

The following steps are the usual modelling workflow: splitting the data, feature selection, and hyperparameter tuning. We use the XGBoost ranker (XGBRanker) as our machine learning algorithm and NDCG as our primary evaluation metric. Our best model reaches 94% on NDCG@3.
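
A minimal sketch of this training and evaluation step, using hypothetical toy data:

```python
# A minimal sketch of training a ranking model and checking NDCG@3.
# The feature matrix, labels, and group sizes below are toy placeholders.
import numpy as np
import xgboost as xgb
from sklearn.metrics import ndcg_score

# Two queries: the first with 3 candidate items, the second with 2.
X = np.array([[2, 1, 0.9], [2, 1, 0.4], [2, 1, 0.1],
              [4, 2, 0.7], [4, 2, 0.2]], dtype=float)
y = np.array([3, 1, 0, 2, 0])   # graded relevance per candidate item
group = [3, 2]                  # number of items per query, in order

model = xgb.XGBRanker(
    objective="rank:ndcg",      # listwise ranking objective
    n_estimators=100,
    learning_rate=0.1,
    max_depth=4,
)
model.fit(X, y, group=group)

# Evaluate NDCG@3 on the first query's candidates.
scores = model.predict(X[:3])
print(ndcg_score([y[:3]], [scores], k=3))
```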

Implementation Phase

Our hotel autocomplete system is built on Elasticsearch. Our Elasticsearch index contains millions of item documents, covering all hotels, locations, and POIs. We update the item documents daily to make sure we have the latest values of the popularity-related item features.

For model implementation, we serve our XGBoost ranker through the Learning to Rank (LTR) plugin available for Elasticsearch. Scoring millions of items with the model on every request is not feasible, since it would cause a considerable response time that we can't afford.
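
As a rough sketch, registering a trained XGBoost ranker with the LTR plugin's `_createmodel` endpoint could look like the following; the featureset name `autocomplete_features` and model name `autocomplete_xgb` are hypothetical, and the featureset is assumed to have been created beforehand:

```python
# A rough sketch of uploading a trained XGBoost ranker to the Elasticsearch
# LTR plugin. Featureset and model names are hypothetical placeholders.
import requests
import xgboost as xgb

def upload_model(model: xgb.XGBRanker, es_url: str = "http://localhost:9200") -> None:
    """Register a trained XGBoost ranker with the Elasticsearch LTR plugin."""
    # One common serialization: dump the booster as a JSON array of trees.
    trees = model.get_booster().get_dump(dump_format="json")
    definition = "[" + ",".join(trees) + "]"

    resp = requests.post(
        f"{es_url}/_ltr/_featureset/autocomplete_features/_createmodel",
        json={
            "model": {
                "name": "autocomplete_xgb",
                "model": {"type": "model/xgboost+json", "definition": definition},
            }
        },
    )
    resp.raise_for_status()
```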

As can be seen in the flowchart above, to make sure our response time stays within the SLA, we split the process into two parts:

1. Rule-based filtering

In this part, we use several rules to significantly cut down the number of candidate items relevant to the user's input query. For example, if a user searches for `ba`, we only take the top 50 items that are most popular and most similar to `ba` using similarity match rules.
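
A rough sketch of what such a rule-based filtering query might look like, again assuming the hypothetical `autocomplete_items` index, this time with a `popularity` field:

```python
# A rough sketch of the rule-based filtering step: retrieve the top 50 items
# most similar to the typed prefix, boosted by a stored popularity value.
# Index and field names ("autocomplete_items", "name", "popularity") are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def candidate_items(prefix: str, size: int = 50) -> list[dict]:
    resp = es.search(
        index="autocomplete_items",
        query={
            "function_score": {
                # Text similarity against the item name, tolerant of small typos.
                "query": {"match": {"name": {"query": prefix, "fuzziness": "AUTO"}}},
                # Boost by popularity so well-known items surface first.
                "field_value_factor": {"field": "popularity", "modifier": "log1p"},
            }
        },
        size=size,
    )
    return resp["hits"]["hits"]
```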

2. Model prediction

Then, we rescore the top 50 results from the first step using the LTR plugin's model prediction and return the final top 10 items to show to users.
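
A rough sketch of this rescoring step, using the LTR plugin's `sltr` query inside an Elasticsearch `rescore` block; the model name and parameters follow the earlier hypothetical sketches:

```python
# A rough sketch of the two-part query: rule-based retrieval feeds the top 50
# candidates into an LTR rescore, and only the final top 10 are returned.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def autocomplete(prefix: str) -> list[dict]:
    resp = es.search(
        index="autocomplete_items",
        query={"match": {"name": {"query": prefix, "fuzziness": "AUTO"}}},
        rescore={
            "window_size": 50,                     # rescore only the top 50 hits
            "query": {
                "rescore_query": {
                    "sltr": {
                        "model": "autocomplete_xgb",     # model uploaded earlier
                        "params": {"keywords": prefix},  # fed into feature templates
                    }
                },
                "query_weight": 0,                 # rely purely on the model score
            },
        },
        size=10,                                   # final top 10 shown to users
    )
    return resp["hits"]["hits"]
```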

With this XGBoost ranker implementation, we are able to improve our ranking results compared with using only rule-based filtering. The amount of typing before the user finally chooses an item also decreased, from three keystrokes to only two.

Now, what's next?

Of course, we are still working on improving our autocomplete model by adding more complex features and making the model more personalized for each user.

REFERENCES:

[1] https://en.wikipedia.org/wiki/Learning_to_rank

[2] https://elasticsearch-learning-to-rank.readthedocs.io/en/latest/

[3] https://en.wikipedia.org/wiki/Discounted_cumulative_gain
