PriceAggregator: An Intelligent System for Hotel Price Fetching (Part2)

Zhang Jiangwei
Agoda Engineering & Design
5 min readMay 19, 2020
Figure 5. PriceAggregator system flow.

In the previous blog, we explained how Agoda manages inventories and what are the challenges. In this blog, we explain scientifically how and why it works. Figure 5 presents PriceAggregator system flow.

Agoda Booking Flow

Figure 6. Agoda booking flow.

Figure 6 presents the major steps in the hotel booking process. In stage 1, a user requests for a hotel price. In stage 2, if the hotel price is already existing in the cache, then the user will be presented with the cached price. Otherwise, the user won’t be able to see the hotel price. In stage 3, if the user is happy with the hotel price, then the user clicks booking. In stage 4, Agoda confirms with the hotel whether the price is eligible to sell. If the price is eligible to sell, then Agoda confirms the booking in stage 5.

Hence, the expected number of bookings is the following

  • P_D(s_i) is the probability of a user search s_i that hits on the hotel prices in the cache.
  • P_B(s_i) is the probability of a user search s_i that ended up with booking attempt, given that the hotel price is in the cache.
  • P_A(s_i) is the probability of the hotel price is accurate after a user makes a booking attempt on search s_i.

Hence, we want to maximize K.

SmartTTL

In this section, we explain how we build a smart TTL service which assigns itinerary request specific TTL to optimize the bookings. There are three major steps: price-duration extraction, price-duration clustering and TTL assignment.

Price-Duration Extraction

Price-duration refers to how long each price stays unchanged. This is approximated by the time difference between two consecutive requests of the same itinerary that Agoda sends to the supplier. Figure 7 presents an example of extracting price-duration distribution from empirical data of hotel Hilton Amsterdam and search criteria <2019–10–01,2019–10–02,1,0,1> (check-in, check-out, adults, children, rooms).

Figure 7. Price duration extraction from empirical data

Agoda first sends a request to supplier at 13:00 to fetch for a price, and that’s the first time we fetch a price for such itinerary. So, there is no price change and no price-duration extracted. Later, at 13:31, Agoda sends the second request to the supplier to fetch for price, and observes that the price has changed. Hence, the price-duration for the previous price is 31 minutes (the time difference between 13:00 and 13:31). Therefore, for each search criteria, we can extract the empirical price-duration distributions.

Price-Duration Clustering

In Agoda, we have Billions of such user searches every day. It is practically intractable and unnecessary to store such volume of search criteria’s TTL into in-memory cache, e.g. Redis or Memcached. Therefore, we need to reduce the cardinality of the user searches.

We cluster these user searches into clusters to reduce the cardinality. In PriceAggregator, we used XGBoost to rank the features for clustering, and the significant features are check-in and price availability. We observe that the itinerary requests with same checkin and price availability (whether the hotel is sold out or not) have a similar price-duration distribution. Hence, for all supplier requests with same checkin and price availability, we group them into the same cluster, and use the aggregated price-duration distribution to represent the cluster. By doing this, we dramatically reduce the cardinality to 1000, which can be easily stored into any in-memory data structure.

TTL Assignment

In the above section, we finished clustering. Next, we need to assign a TTL for each cluster. Note that, we want to optimize the bookings as expressed in Equation 1, and the TTL will affect the cache hit (P_D in Equation 1) and booking price (P_A in Equation 1) accuracy. Hence, we want to assign a TTL for each cluster in which Equation 1 is optimized.

For cache hit, we can easily approximate the cache miss ratio curve using Cumulative Distribution Function (CDF) of the gap time (time difference between current request and previous request for the same itinerary search). For example, 80% of the requests are having gap time 120 minutes. Hence, by setting TTL at 120 minutes, we can achieve 80% cache hit. Therefore, the cache miss ratio curve related to TTL can be easily found, and we can know the approximated cache hit rate for each TTL we choose.

For booking accuracy of a cluster C, this can be approximated by

For example, in a specific cluster, if the empirical price-duration observed is 120 minutes and 100 minutes, and we assigned 150 minutes. Hence, we know that there are 120 and 100 minutes that we are using the accurate price. Hence, the accuracy is 11/15.

Hence, to optimize the bookings as expressed in Equation 1, we just need to enumerate the different TTL in each cluster to find such TTL. So far, we have completed the major steps in SmartTTL.

FixedTTL vs. SmartTTL

In section, we will present the experiments conducted in 2019.

As Agoda is a publicly listed company, we are sorry that we can’t reveal the exact number of bookings due to data sensitivity, but we will try to be as informative as possible.

We compare the performance between FixedTTL (A) and SmartTTL (B). Figure 8 presents the results on Supplier, and we can see that B variant wins A variant by a small margin.

Figure 8. A/B Experiment on Supplier A

Overall, B variant wins by 2-4% for cache hit, and 2% for bookings. This is expected as SmartTTL only address Challenge 1.

In the next blog, we will explain how to address Challenge 2, Challenge 3 and Challenge 4.

Authors

Zhang Jiangwei, Li Zhang, Vigneshwaran Raveendran, Ziv Ben-Zuk, and Leonard Lu

Acknowledgement

Thanks Lin Jingru, Nikhil Fulzele and Akshesh Doshi for reviewing.

--

--