Query Understanding Engine in Traveloka Universal Search

Part 2: Goal and Implementation

Ismail Afiff
Traveloka Engineering Blog


Fig 1. Product, Subproduct, and Action Classifier, Named Entity Recognition, User’s Intent and Search results for query: “Apartment near Cp”

Editor’s note: This is the second in a series of three posts by Ismail Afiff, a Principal Software Engineer with the Universal Search team, on the Query Understanding Engine that powers Traveloka Universal Search. This part focuses on the engine’s goal and implementation. Read the first part: “Introduction on why query understanding is a game changer for Traveloka Search Engine”.

The Universal Search team is responsible for delivering solutions that allow users to fulfill their travel and lifestyle needs through smart query understanding and recognition.

Query Understanding Engine’s Goal

Before going deeper, let’s discuss the expected outcome of the engine. As the name implies, its goal is to infer the user’s intent from the user’s query and context. By understanding the user’s intent, Traveloka can provide meaningful search results and recommendations.

Before heading to the implementation, we need to know the possible intents behind Traveloka users’ search queries. Luckily, our product manager, together with representatives of each product, has compiled the list. A user’s intent could take the form of “Searching for Transportation Options from an Origin to a Destination”, “Searching for Flights from an Origin to a Destination”, “Searching for Accommodations by Name with a certain AccommodationType in an Area”, “Searching for TravelokaEatsProduct by Name with a RestaurantType near a Place-of-Interest”, “Searching for Everything near Me”, or something as simple as “An Area, a Place-of-Interest, or a Hotel”, among many more.

Some words in the intents above, such as Origin, Destination, and AccommodationType, carry a special meaning: they describe the properties of an intent. For example, the “Searching for Transportation” intent has Origin and Destination properties. Similarly, the “Searching for Accommodations” intent has AccommodationName, AccommodationType, and Area properties. Intent properties are mostly unique to a specific intent; for example, AccommodationType is not present in “Searching for TravelokaEatsProduct”.
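To make the idea of intent properties concrete, here is a minimal sketch that models each intent as a name plus the set of property names it accepts. The identifiers (`IntentSchema`, `SearchTransportation`, `PlaceOfInterest`, and so on) are hypothetical labels modelled on the list above, not Traveloka’s actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentSchema:
    """Hypothetical description of one intent and the properties it accepts."""
    name: str
    properties: frozenset[str]


# Hypothetical schemas modelled on the intents listed above.
INTENT_SCHEMAS = [
    IntentSchema("SearchTransportation", frozenset({"Origin", "Destination"})),
    IntentSchema("SearchFlight", frozenset({"Origin", "Destination"})),
    IntentSchema("SearchAccommodation",
                 frozenset({"AccommodationName", "AccommodationType", "Area"})),
    IntentSchema("SearchTravelokaEatsProduct",
                 frozenset({"RestaurantName", "RestaurantType", "PlaceOfInterest"})),
]


def intents_with_property(prop: str) -> list[str]:
    """Return the names of all intents that declare `prop` as a property."""
    return [s.name for s in INTENT_SCHEMAS if prop in s.properties]


print(intents_with_property("AccommodationType"))  # ['SearchAccommodation']
print(intents_with_property("Origin"))  # ['SearchTransportation', 'SearchFlight']
```

In this sketch a property such as AccommodationType points to a single intent, while a shared property such as Origin still leaves more than one candidate, so one property on its own is not always enough to pick an intent.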

By understanding the user’s intent and its properties, Traveloka will comprehend the query better and thus be able to come up with better search and recommendation results over time.

Query Understanding Engine Implementation

For the MVP, we are taking two approaches to implement query understanding: the first is Named Entity Recognition lookup and the second is a machine-learned Product, Subproduct, and Action Classifier (PAX), each with its own strengths and weaknesses. We chose to pursue both because, in the early phase of the project, the scope was still somewhat vague and it was uncertain whether either algorithm would work.

Named Entity Recognition (NER) Lookup

Fig 2. Query to intent transformation for query: “Voucher Starbucks Kuncit”

Named Entity Recognition is a service that recognizes the entities present in a text (here, the query). For example, in the query “Voucher Starbucks Kuncit”, NER recognizes “Starbucks” as a RestaurantName and “Kuncit” as a Landmark (Kuningan City Mall). Similarly, for the query “Japanese Restaurant [in] Senopati”, NER recognizes “Japanese” as a CuisineType, “Restaurant” as a RestaurantType, and “Senopati” as an Area.
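The lookup step can be approximated with a dictionary (gazetteer) of known entity names and a greedy longest-match scan over the query tokens. This is a minimal sketch under that assumption: the actual NER lookup algorithm is covered in part three and may work quite differently, and the `GAZETTEER` entries and the `recognize` function are hypothetical.

```python
# Hypothetical gazetteer: lowercase token sequence -> (entity type, canonical name).
GAZETTEER: dict[tuple[str, ...], tuple[str, str]] = {
    ("starbucks",): ("RestaurantName", "Starbucks"),
    ("kuncit",): ("Landmark", "Kuningan City Mall"),
    ("kuningan", "city", "mall"): ("Landmark", "Kuningan City Mall"),
    ("japanese",): ("CuisineType", "Japanese"),
    ("restaurant",): ("RestaurantType", "Restaurant"),
    ("senopati",): ("Area", "Senopati"),
}
# Longest entry in tokens, so the scan never tries longer n-grams than needed.
MAX_NGRAM = max(len(key) for key in GAZETTEER)


def recognize(query: str) -> list[tuple[str, str, str]]:
    """Greedy longest-match lookup over whitespace tokens.

    Returns (matched text, entity type, canonical name) triples in query order.
    Tokens that start no gazetteer entry (e.g. "voucher", "in") are skipped.
    """
    tokens = query.lower().split()
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest possible n-gram first, then shorter ones.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                entity_type, canonical = GAZETTEER[key]
                entities.append((" ".join(key), entity_type, canonical))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return entities


print(recognize("Voucher Starbucks Kuncit"))
# [('starbucks', 'RestaurantName', 'Starbucks'), ('kuncit', 'Landmark', 'Kuningan City Mall')]
```

Because entries live in a plain dictionary in this sketch, a new entity becomes recognizable as soon as it is added, which mirrors the “dynamic products” advantage discussed later in this post.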

Afterwards, the recognized entities are transformed into possible user intents. Recall that an intent has properties that are mostly unique to it. For example, CuisineType is only present in “Search for TravelokaEatsProduct” and not in “Search for Accommodation”. This uniqueness makes property identification a powerful way to infer the user’s intent.
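Given the recognized entity types, one simple way to exploit the uniqueness of intent properties is to keep only the intents whose property set covers every recognized type. The sketch below assumes that rule and uses hypothetical intent and property names; it illustrates the idea and is not Traveloka’s actual transformation logic.

```python
# Hypothetical intent -> property names, modelled on the examples in this post.
INTENT_PROPERTIES: dict[str, set[str]] = {
    "SearchAccommodation": {"AccommodationName", "AccommodationType", "Area"},
    "SearchTravelokaEatsProduct": {
        "RestaurantName", "RestaurantType", "CuisineType", "Area", "PlaceOfInterest",
    },
    "SearchTransportation": {"Origin", "Destination"},
}


def infer_intents(entity_types: list[str]) -> list[str]:
    """Return every intent whose properties include all recognized entity types."""
    observed = set(entity_types)
    return [name for name, props in INTENT_PROPERTIES.items() if observed <= props]


print(infer_intents(["CuisineType", "RestaurantType", "Area"]))
# ['SearchTravelokaEatsProduct']
print(infer_intents(["Area"]))
# ['SearchAccommodation', 'SearchTravelokaEatsProduct']
```

Here CuisineType alone rules out accommodation, whereas a lone Area stays ambiguous between two intents.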

Machine-Learned Product, Subproduct, and Action Classifier (PAX)

PAX is a service that tries to determine the Product, Subproduct, and Action of a query. For example, a query such as “Cafe in central park” is classified as Product: Eats, Subproduct: “Eats Directory”, and Action: “Search”. This is achieved with a machine-learned model trained on an extensive set of manually annotated query samples.
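The post does not say which model PAX uses. Purely as a stand-in technique, the sketch below trains a multinomial naive Bayes classifier on bag-of-words features, treating each (Product, Subproduct, Action) triple as one class. Apart from the “Eats Directory” example, the labels and training queries are hypothetical, and a real classifier would need far more annotated data.

```python
import math
from collections import Counter, defaultdict

Label = tuple[str, str, str]  # (Product, Subproduct, Action)

# Hypothetical annotated samples; only the Eats/"Eats Directory"/Search label comes from the post.
TRAINING_DATA: list[tuple[str, Label]] = [
    ("cafe in central park", ("Eats", "Eats Directory", "Search")),
    ("japanese restaurant senopati", ("Eats", "Eats Directory", "Search")),
    ("cheap hotel near airport", ("Hotel", "Hotel Search", "Search")),
    ("apartment near kuningan", ("Hotel", "Hotel Search", "Search")),
    ("flight jakarta to bali", ("Flight", "Flight Search", "Search")),
    ("cancel my flight booking", ("Flight", "Flight Booking", "Cancel")),
]


class NaiveBayesPax:
    """Multinomial naive Bayes over bag-of-words; each Label is one class."""

    def __init__(self, samples: list[tuple[str, Label]]) -> None:
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts: defaultdict = defaultdict(Counter)
        for query, label in samples:
            self.word_counts[label].update(query.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(samples)

    def predict(self, query: str) -> Label:
        """Return the label with the highest log posterior for the query."""
        words = query.lower().split()
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            # Laplace (add-one) smoothing so unseen words do not zero out a class.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


pax = NaiveBayesPax(TRAINING_DATA)
print(pax.predict("cafe near central park"))  # ('Eats', 'Eats Directory', 'Search')
```

Unlike a gazetteer lookup, this kind of classifier still returns a label when some words are unknown, but it says nothing about intent properties such as which restaurant or which mall.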

Both NER and PAX have their own strengths and weaknesses; therefore, a thorough comparison is needed before deciding which to use when.

  1. Ability to obtain properties of an intent.
    In the query “Starbucks [in] kuncit”, it is not enough to classify only the intent, that is, the product, subproduct, or action. For meaningful results and recommendations, the engine needs to understand that Starbucks is a RestaurantName and Kuncit is Kuningan City Mall (a Place-of-Interest) in order to infer that the intent is “Searching for TravelokaEatsProduct by a Name (Starbucks) near a Place-of-Interest (Kuningan City Mall)”. NER is clearly the right tool for this job because of its ability to capture intent properties.
  2. Ability to understand intent beyond Traveloka’s Products.
    As Traveloka moves beyond being a simple Online Travel Agent and further into Enabling Discovery, users’ intents extend beyond individual Traveloka Products. These could take the form of “Searching for Transportation Options from an Origin to a Destination”, “Searching for Everything near Me”, “City guide of an Area”, or something as simple as “An Area or a Place-of-Interest”.
    It is much easier to comprehend these intents via NER because the detected EntityType can be used to infer the intent beyond Traveloka Product.
  3. Ability to understand dynamic products.
    Several products, such as movies, come and go quickly. NER lookup is superior in this regard, since a new entity becomes searchable as soon as it is indexed.
  4. Ability to understand linguistic context.
    In this aspect, PAX tends to be superior to its NER lookup counterpart: PAX may be able to infer the Product, Subproduct, and Action even when the query context is less clear, whereas NER lookup simply throws away words it doesn’t understand or, in most cases, refuses to propose an interpretation.
  5. Developer complexity.
    In this aspect, NER lookup is more complex to develop, since the burden of the algorithm falls on the developer, whereas machine-learned PAX improves its inference mainly through more annotated training data.
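The post leaves open how the two approaches are combined. Purely to illustrate how the trade-offs above could translate into a rule, the hypothetical policy below prefers NER when it produced an intent and understood enough of the query (points 1 to 3), and falls back to PAX otherwise (point 4). The `Interpretation` type, the `choose_interpretation` function, and the `min_coverage` threshold are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Interpretation:
    """Hypothetical result: which component answered, the intent, and its properties."""
    source: str  # "NER" or "PAX"
    intent: str
    properties: dict[str, str] = field(default_factory=dict)


def choose_interpretation(
    ner_intent: Optional[str],
    ner_properties: dict[str, str],
    ner_coverage: float,
    pax_label: Optional[tuple[str, str, str]],
    min_coverage: float = 0.5,
) -> Optional[Interpretation]:
    """Pick an interpretation from NER and PAX outputs using an assumed rule.

    ner_coverage is the fraction of query tokens NER recognized (0.0 to 1.0).
    """
    # Prefer NER when it yields an intent and understood enough of the query,
    # because only NER captures the intent's properties.
    if ner_intent is not None and ner_coverage >= min_coverage:
        return Interpretation("NER", ner_intent, dict(ner_properties))
    # Otherwise fall back to PAX, which copes better with vaguer wording
    # but only returns Product, Subproduct, and Action (no properties).
    if pax_label is not None:
        product, subproduct, action = pax_label
        return Interpretation("PAX", f"{action} {product} ({subproduct})")
    # Neither component proposed an interpretation.
    return None


eats = ("Eats", "Eats Directory", "Search")
clear = choose_interpretation(
    "SearchTravelokaEatsProduct",
    {"RestaurantName": "Starbucks", "PlaceOfInterest": "Kuningan City Mall"},
    2 / 3,  # "Voucher Starbucks Kuncit": 2 of 3 tokens recognized
    eats,
)
print(clear.source)  # NER
vague = choose_interpretation(None, {}, 0.0, eats)
print(vague.intent)  # Search Eats (Eats Directory)
```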

As you might expect, the development of NER lookup relies on a carefully crafted algorithm. Currently, it is an extension of the single-entity search that has been a staple of Traveloka Search Engine. I will explore the NER lookup implementation in detail in part three of the series. Stay tuned.

I am really fortunate to undertake this fascinating, open-ended search algorithm challenge with my team at Traveloka, one of the largest online travel companies in Southeast Asia. If you’re a software engineer interested in developing a state-of-the-art search system that helps millions of users find their next adventures, take a look at the opportunities on Traveloka’s careers page!


Software Engineer, working on Information Retrieval. Passionate about Photography and Classical Guitar.