PriceAggregator: An Intelligent System for Hotel Price Fetching (Part 3)

Zhang Jiangwei
Agoda Engineering & Design
May 19, 2020
Figure 5. PriceAggregator system flow.

In the previous blog, we explained how Agoda manages inventory and what the challenges are. In this blog, we explain in more detail how and why PriceAggregator works. Figure 5 presents the PriceAggregator system flow.

Moreover, we have explained how to use SmartTTL to optimize bookings in this blog series. However, SmartTTL only addresses Challenge 1, as mentioned. In this blog, we explain how to address Challenge 2, Challenge 3 and Challenge 4.

From Passive Model to Aggressive Model

For Challenge 2 and Challenge 3, we can resolve them by guaranteeing that each data center sends requests to the suppliers at a constant rate r_total. Whenever the passive model sends r_passive requests to the suppliers, where r_passive < r_total, we proactively send an extra r_total - r_passive requests to the suppliers. The question is: how do we generate these r_total - r_passive requests? Next, we present one way of generating such aggressive requests.
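As a tiny illustration of the budget arithmetic (the function name and the numbers below are hypothetical, not part of PriceAggregator):

    def aggressive_budget(r_total: int, r_passive: int) -> int:
        # Requests the aggressive model should send this second:
        # r_total is the constant per-second rate agreed with the supplier,
        # r_passive is what the passive model already sent this second.
        return max(0, r_total - r_passive)

    # e.g. with a 100 QPS agreement and 60 passive requests in this second,
    # the aggressive model proactively sends the remaining 40 requests.
    assert aggressive_budget(100, 60) == 40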

Aggressive Model with LRU Cache

In this section, we describe an aggressive model which aggressively sends requests to the supplier to fetch hotel prices. These requests are generated from an auxiliary cache C_LRU. There are two major steps:

  1. Cache building. The auxiliary cache C_LRU is built from historical user searches. Each user search s_i is always admitted into C_LRU. Once C_LRU reaches its specified maximum capacity, it evicts the user search s_i which is Least Recently Used (LRU).
  2. Request pulling. At every second t_i, the passive model needs to send r_passive requests to the supplier, and the supplier allows us to send r_total requests per second. Hence, the aggressive model sends r_total - r_passive requests to the supplier. To generate these aggressive requests, Agoda pulls r_total - r_passive requests from C_LRU that are about to expire, starting from the requests closest to expiry, until the aggressive quota is used up.

This approach clearly solves Challenge 2 and Challenge 3. Moreover, it also helps improve the cache hit rate by requesting hotel prices before a user searches for them.
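To make the two steps concrete, here is a minimal sketch of such an auxiliary cache. It is an illustration only, not the actual PriceAggregator code; the class and method names (LRUPriceCache, admit, pull_expiring) and the use of Python's OrderedDict are choices made for this post.

    import time
    from collections import OrderedDict

    class LRUPriceCache:
        """Minimal sketch of the auxiliary cache C_LRU (illustrative only).

        Keys are itineraries; values are the timestamps at which the cached
        prices expire. The OrderedDict tracks recency for LRU eviction.
        """

        def __init__(self, capacity):
            self.capacity = capacity
            self._items = OrderedDict()  # itinerary -> expiry timestamp

        def admit(self, itinerary, ttl_seconds):
            """Step 1 (cache building): admit every user search, evict the LRU entry when full."""
            if itinerary in self._items:
                self._items.move_to_end(itinerary)       # mark as recently used
            self._items[itinerary] = time.time() + ttl_seconds
            if len(self._items) > self.capacity:
                self._items.popitem(last=False)          # evict the least recently used search

        def pull_expiring(self, n):
            """Step 2 (request pulling): return the n itineraries closest to expiry."""
            by_expiry = sorted(self._items.items(), key=lambda kv: kv[1])
            return [itinerary for itinerary, _ in by_expiry[:n]]

    # Every second, the passive model uses r_passive of the r_total budget, so the
    # aggressive model refreshes the remaining slots: cache.pull_expiring(r_total - r_passive)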

However, this is not optimal. For example, a specific hotel could be very popular, but if the hotel is not price competitive on a given supplier, then Agoda does not need to waste QPS pulling that hotel's price from that supplier. In the next section, we introduce an aggressive model which optimizes bookings.

Aggressive model with SmartScheduler

As mentioned, the aggressive model with LRU cache is not optimal. Moreover, so far the passive model has always had the highest priority, meaning the aggressive model only sends requests to the supplier if there is extra QPS left. This, again, is not optimal. In this section, we present an aggressive model which optimizes bookings. It has 5 major steps.

Itinerary frequency calculation.

This describes how many times an itinerary needs to be requested to ensure it is always available in the database. Here, an itinerary is a tuple defined as <hotel_id, checkin, checkout, num_of_adults, num_of_children, num_of_rooms>. If we want a high cache hit rate, we want an itinerary r_i to always be available in the database, which means we need to make sure that such an itinerary r_i is re-fetched before it expires. Moreover, for each r_i, we have the generated TTL_{r_i}. Hence, to make sure an itinerary r_i is always available in database D for 24 hours (1440 minutes), we need to send f_{r_i} requests to the supplier, where f_{r_i} = ⌈1440 / TTL_{r_i}⌉ (Equation 2, rounding up so the itinerary never expires between fetches).
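As a worked example (the TTL value here is hypothetical), an itinerary whose generated TTL_{r_i} is 180 minutes needs to be fetched

    f_{r_i} = \left\lceil \frac{1440}{TTL_{r_i}} \right\rceil
            = \left\lceil \frac{1440}{180} \right\rceil
            = 8 \text{ requests per day.}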

Itinerary value evaluation.

This evaluates the value of an itinerary by the probability of a booking coming from it. With the itinerary frequency calculation above, we can assume an itinerary request is always a ‘hit’ in the database. Hence, in this step, we evaluate an itinerary's value given that the itinerary is always available in our Price DB. That is, every user search s_i on the same itinerary r_i, s_i in r_i, is always a cache hit, i.e. P_D(s_i) = 1. Substituting this into Equation 1 gives, for each itinerary request r_i, the expected number of bookings from that itinerary (Equation 3).
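For concreteness, the sketch below shows what this step computes under our assumption about the shape of Equation 1 (which appeared in the previous blog): that it sums, over all user searches s_i on the itinerary, the probability of serving s_i from the database times the probability that s_i leads to a booking. The symbols P_{book} and V are our own names, introduced only for this sketch.

    % Assumed shape of Equation 1: E[\text{bookings}] = \sum_{s_i} P_D(s_i)\, P_{book}(s_i)
    % With P_D(s_i) = 1 for every search on itinerary r_i, Equation 3 becomes
    V(r_i) = \sum_{s_i \in r_i} P_{book}(s_i)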

Request value evaluation.

This evaluates the value of a request by the probability of a booking coming from it. Combining Equation 3 and Equation 2, the expected number of bookings per supplier request for an itinerary r_i is its expected bookings (Equation 3) divided by its request frequency f_{r_i} (Equation 4).
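Written out with the same assumed notation as above, the per-request value is simply the itinerary value spread over the requests needed to keep it fresh for the day:

    % Equation 4: expected bookings per supplier request for itinerary r_i
    v(r_i) = \frac{V(r_i)}{f_{r_i}}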

Top request generation.

This generates the top requests we want to select according to their values. Within a day, for a specific supplier, we are allowed to send M = 60*60*24*r requests to the supplier. Therefore, using Equation 4, we can order the supplier requests by value and pick the M most valuable ones.
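A minimal sketch of this selection step (illustrative only; the data layout and names are ours): rank itineraries by their per-request value from Equation 4 and keep taking whole itineraries, f_{r_i} requests each, until the daily budget M is spent.

    def select_top_requests(itineraries, r, seconds_per_day=86400):
        """Greedy top-request generation sketch (not production code).

        itineraries: list of (itinerary, value_per_request, f) tuples, where f is
                     the daily request frequency from Equation 2.
        r:           requests per second allowed by this supplier.
        Returns (itinerary, f) pairs that fill the daily budget M = 60*60*24*r.
        """
        budget = seconds_per_day * r
        ranked = sorted(itineraries, key=lambda x: x[1], reverse=True)
        selected = []
        for itinerary, value_per_request, f in ranked:
            if f > budget:
                break                       # the daily budget is exhausted
            selected.append((itinerary, f))
            budget -= f
        return selected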

Top request scheduling.

This describes how to schedule the pulling of the top requests we selected; a simple scheduling sketch follows the list below. Given the M requests that need to be sent to the supplier, we need to make sure that

  1. each of these requests is sent to the supplier before its previous request (i.e. the previously fetched price for the same itinerary) expires.
  2. at every second, we send exactly r requests to the supplier.
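Below is a simplified greedy sketch of such a scheduler (ours, not the production SmartScheduler): each selected itinerary is planned at evenly spaced seconds, at most TTL apart so that every refresh lands before the previously fetched price expires, and a planned fetch is nudged to an earlier second whenever its target second already holds r requests.

    from collections import defaultdict

    def schedule(selected, r, seconds_per_day=86400):
        """Greedy scheduling sketch for the selected top requests (not production code).

        selected: (itinerary, f) pairs from the selection step, where f is the
                  daily request frequency from Equation 2.
        r:        requests allowed per second.
        Returns a mapping from second of the day to the itineraries fetched then.
        """
        plan = defaultdict(list)
        load = defaultdict(int)                 # requests already placed in each second
        for itinerary, f in selected:
            interval = seconds_per_day // f     # <= TTL, so each fetch lands before the previous price expires
            for k in range(f):
                slot = k * interval
                # If the target second is already full, move the fetch earlier;
                # fetching earlier never violates the expiry constraint.
                while load[slot] >= r and slot > 0:
                    slot -= 1
                plan[slot].append(itinerary)
                load[slot] += 1
        return plan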

SmartTTL vs. Aggressive Model with SmartScheduler

In this section, we present the experiments conducted in 2019.

As Agoda is a publicly listed company, we are sorry that we can’t reveal the exact number of bookings due to data sensitivity, but we will try to be as informative as possible.

Figure 9. A/B Experiment on Supplier C

We compare the performance between SmartTTL (A) and the Aggressive Model with SmartScheduler (B). Figure 9 presents the results on Supplier C: variant B wins over variant A significantly in terms of bookings and cache hit ratio, and it does so consistently for both metrics.

Aggressive Model with LRU Cache vs. Aggressive Model with SmartScheduler

In this section, we compare the performance between the aggressive model with LRU cache (A) and the aggressive model with SmartScheduler (B). Figure 10 presents the A/B experiment results on Supplier D.

Figure 10. A/B Experiment on Supplier D

We can easily see that variant B wins over variant A significantly in terms of bookings and cache hit ratio. For cache hit, variant B wins over variant A consistently. For bookings, variant B consistently wins over variant A by more than 10%, and on certain days, e.g. day 5, it wins by more than 50%.

Conclusion

In this series of blogs, we presented PriceAggregator, an intelligent hotel price fetching system which optimizes bookings. To the best of our knowledge, PriceAggregator is the first productionized system which addresses the 4 challenges mentioned. It differs from most existing OTA systems by having SmartTTL, which determines itinerary-specific TTLs. Moreover, instead of passively sending requests to suppliers, PriceAggregator aggressively fetches the most valuable hotel prices from suppliers, which optimizes bookings. Extensive online experiments show that PriceAggregator is not only effective in improving system metrics like cache hit rate, but also grows the company's revenue significantly. We believe that PriceAggregator is a rewarding direction for the application of data science in OTAs.

Authors

Zhang Jiangwei, Li Zhang, Vigneshwaran Raveendran, Ziv Ben-Zuk, and Leonard Lu

Acknowledgement

Thanks to Lin Jingru, Nikhil Fulzele and Akshesh Doshi for reviewing.
