How Agoda identifies 5 million business opportunities globally in less than 48 hours

Introduction

Opportunities at Agoda involve activities, actions, or tasks that help to increase bookings and enhance the company's competitive edge. Agoda leverages a platform with multiple Spark pipelines called the SEP (Supply Equity Points) platform to identify and quantify millions of opportunities daily.

Image 1: Agoda opportunity management flow

Our big data platform can process data science models on billions of records using Apache Spark and detect opportunities at various levels.

Once opportunities have been identified and valued, they are stored in a central repository called the Engage Opportunity Store. This store maintains the lifecycle of every opportunity and syncs it to the CRM (customer relationship management) tool so that sales agents can find it and take the necessary action.

With all these features, Agoda can stay ahead of the competition and ensure no opportunity goes unnoticed.

SEP: Spark Jobs and Quality assurance layers

Image 2: SEP Platform

Every day, several billion records from more than 100 data sources are consumed by a set of Spark jobs that together make up the SEP platform mentioned above. We allocate just 50 executors with 2 GB of memory per job and have tuned the system to run all tasks in parallel. SEP detects approximately 20 million opportunities in each run, as shown below.

Image 3: ~20 million opportunities

Data quality is of utmost importance in data-driven operations. To ensure the highest quality of data, we have established the following four quality pillars in our platform:

  • Functional tests (compile time).
  • Data quality validation (run time).
  • Functional anomaly detection in feature migration (release time).
  • Data anomaly detection on output data (run time).

Image 4: Quality assurance layers of SEP Platform

Functional Test (Compile Time)

Functional testing is the foundation of the quality of any system and includes both unit and integration tests.

Data Quality Validation (Run Time)

At this layer, we apply profiling and validation rules to detect incorrect data in input and output records at runtime. Such records are marked invalid and removed from the downstream pipelines. The two parts of this process are data profiling and validation, and data quarantining.

  • Data profiling & Data Validation

We maintain data accuracy and integrity with data profiling tools and validation rules. The profiling tools establish the expected range of values for each column in a data source, and the validation rules flag records whose values fall outside those ranges as outliers.

  • Data Quarantining

When we find an incorrect record, we mark it as invalid, exclude it from further processing, and send out notifications so that the issue can be investigated if needed. We refer to this approach as “data validation” and “data quarantining.”

Image 5: Quarantining ~3% of invalid input data in each SEP run
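
The profiling, validation, and quarantine steps can be sketched in plain Python. This is a minimal illustration, not the SEP implementation: the column names (`price`, `occupancy_rate`), the min/max profiling rule, and every function name here are assumptions made for the example. Ranges are learned from a trusted sample, and incoming records that fall outside them are set aside instead of flowing downstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRange:
    low: float
    high: float


def profile_ranges(sample, columns):
    """Learn an allowed [min, max] range per column from a trusted sample."""
    ranges = {}
    for col in columns:
        values = [row[col] for row in sample]
        ranges[col] = ColumnRange(min(values), max(values))
    return ranges


def validate(record, ranges):
    """Return the columns whose value is missing or outside its range."""
    failures = []
    for col, rng in ranges.items():
        value = record.get(col)
        if value is None or not (rng.low <= value <= rng.high):
            failures.append(col)
    return failures


def split_valid_and_quarantined(records, ranges):
    """Route each record to the valid set or to quarantine."""
    valid, quarantined = [], []
    for record in records:
        failures = validate(record, ranges)
        if failures:
            # Keep the reason with the record so it can be investigated later.
            quarantined.append({**record, "failed_columns": failures})
        else:
            valid.append(record)
    return valid, quarantined


# Hypothetical data, for illustration only.
trusted_sample = [
    {"price": 40.0, "occupancy_rate": 0.35},
    {"price": 120.0, "occupancy_rate": 0.80},
    {"price": 75.0, "occupancy_rate": 0.60},
]
ranges = profile_ranges(trusted_sample, ["price", "occupancy_rate"])

incoming = [
    {"hotel_id": 1, "price": 90.0, "occupancy_rate": 0.50},
    {"hotel_id": 2, "price": -5.0, "occupancy_rate": 0.70},
    {"hotel_id": 3, "price": 60.0, "occupancy_rate": None},
]
valid, quarantined = split_valid_and_quarantined(incoming, ranges)
```

Here only the first record stays in `valid`; the other two are quarantined along with the names of the columns that failed. A production pipeline would likely use more robust bounds than a raw minimum and maximum, for example percentiles.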

Functional Anomaly Detection in Feature Migration (Release Time)

Opportunities are detected in Spark jobs based on data science models, complex business rules, and configurations. We continuously optimize and make changes to gain better results. Whenever a merge request is created in GitLab, automated computations and testing jobs are launched for the main and feature branches. The results from both branches are then compared automatically and published on the GitLab merge request page. These SQL-like comparison scripts are called "Shadow Test Scripts."

The Shadow Test Scripts, which also allow us to compare configurations, are kept in the repository. Because any unexpected change in values is surfaced on the merge request, such merge requests can be rejected with confidence.
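
A branch-to-branch comparison of this kind can be sketched as follows. The article describes the real scripts as SQL-like, so this Python version is only an illustration of the idea. It assumes each branch's job produces a mapping from an opportunity key to a computed value; the key format, the `tolerance` parameter, and the name `compare_branch_outputs` are hypothetical.

```python
def compare_branch_outputs(main, feature, tolerance=0.0):
    """Diff two {opportunity_key: value} outputs and summarize the changes."""
    main_keys, feature_keys = set(main), set(feature)
    changed = {
        key: (main[key], feature[key])
        for key in main_keys & feature_keys
        if abs(main[key] - feature[key]) > tolerance
    }
    return {
        "only_in_main": sorted(main_keys - feature_keys),
        "only_in_feature": sorted(feature_keys - main_keys),
        "changed": changed,
    }


# Hypothetical outputs of the same job run on the two branches.
main_output = {
    "hotel-1:rate_parity": 120.0,
    "hotel-2:availability": 80.0,
    "hotel-3:content": 15.0,
}
feature_output = {
    "hotel-1:rate_parity": 120.0,
    "hotel-2:availability": 95.0,
    "hotel-4:content": 10.0,
}

report = compare_branch_outputs(main_output, feature_output)
```

The resulting `report` lists opportunities that disappeared, opportunities that are new, and values that moved, which is the kind of summary a reviewer would want to see on the merge request page.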

Data Anomaly detection on Output Data (Run Time)

Sanity test queries, implemented as simple scripts (e.g., SQL or Spark scripts), are important for ensuring reliability and confidence in the data produced. These queries are designed to detect errors and anomalies in aggregated values, such as the percentage increase or decrease in opportunity count or an unexpected number of quarantined records. Once the queries have been plugged in and run, the downstream process proceeds with the final output, provided no errors have been detected. Every query that is plugged in must meet the following requirements:

  • Each script returns only one record.
  • The following data should be available as select columns:
      • Executed time (timestamp, partition column)
      • Test case name
      • Test level (Info, Warning, Error)
      • Is Test Passed (true or false)
      • Source table partition id
      • Number of records (optional: if specified, its trend will be monitored in Grafana dashboards).
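
The contract above can be expressed as a small data structure plus two checks. This is a hedged sketch: the field names mirror the required columns, but the class `SanityResult`, the functions `check_contract` and `may_proceed`, and the partition id `p-001` are hypothetical. The gating rule (block only when an Error-level test fails) is an assumption about how "no errors detected" is evaluated.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ALLOWED_LEVELS = {"Info", "Warning", "Error"}


@dataclass
class SanityResult:
    executed_time: datetime
    test_case_name: str
    test_level: str
    is_test_passed: bool
    source_partition_id: str
    number_of_records: Optional[int] = None  # optional trend metric


def check_contract(rows):
    """Enforce the plug-in contract: exactly one well-formed record per script."""
    if len(rows) != 1:
        raise ValueError(f"expected exactly one record, got {len(rows)}")
    result = rows[0]
    if result.test_level not in ALLOWED_LEVELS:
        raise ValueError(f"unknown test level: {result.test_level}")
    return result


def may_proceed(results):
    """Assumed gate: block downstream only if an Error-level test failed."""
    return not any(
        r.test_level == "Error" and not r.is_test_passed for r in results
    )


now = datetime.now(timezone.utc)
count_drift = check_contract(
    [SanityResult(now, "opportunity_count_drift", "Error", True, "p-001", 20_000_000)]
)
quarantine_spike = check_contract(
    [SanityResult(now, "quarantine_ratio", "Warning", False, "p-001")]
)
proceed = may_proceed([count_drift, quarantine_spike])
```

In this example the failed check is only a Warning, so `proceed` is `True`; a failed Error-level check would stop the downstream process.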

Thanks to its quality control measures, SEP can detect up to 20 million opportunities in four hours. Based on certain business rules and schedules, roughly a quarter of these opportunities (about 5.5 million) are forwarded to the Engage system for storage and management.

Engage System: The Opportunity Store

Image 6: Engage opportunity store

Engage, the Opportunity Store system, was developed as the authoritative hub for all of Agoda's supply opportunities and alerts. SEP and other external systems can send opportunities to it asynchronously as alerts through Kafka.

Alert Center

The Alert Center service listens to alert signals and converts them into opportunity entities. It then passes these entities to the Engage API, which stores and manages them. As shown below, Engage consumes approximately 5.5 million opportunities and stores them in the Engage Opportunity Store.

Image 7: Creating ~5 million opportunities in Engage
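
The conversion from alert signal to opportunity entity might look like the sketch below. The article does not describe the alert schema, so the JSON payload shape, every field name, the id format, and the initial `"NEW"` status are assumptions made purely for illustration. Reading from Kafka and calling the Engage API are left out.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    opportunity_id: str
    source: str
    property_id: int
    opportunity_type: str
    estimated_value: float
    status: str = "NEW"  # assumed initial lifecycle state


def alert_to_opportunity(raw_message: bytes) -> Opportunity:
    """Parse one JSON alert payload and map it to an opportunity entity."""
    alert = json.loads(raw_message)
    return Opportunity(
        # A deterministic id lets a repeated alert map to the same entity.
        opportunity_id=f"{alert['source']}:{alert['property_id']}:{alert['type']}",
        source=alert["source"],
        property_id=int(alert["property_id"]),
        opportunity_type=alert["type"],
        estimated_value=float(alert["estimated_value"]),
    )


# Hypothetical alert as it might arrive on a topic.
message = json.dumps(
    {"source": "SEP", "property_id": 1001, "type": "rate_parity", "estimated_value": 250.0}
).encode("utf-8")
opportunity = alert_to_opportunity(message)
```

Deriving the id from the alert's own fields is one possible design choice: it makes the conversion idempotent, so the same alert delivered twice does not create two opportunities.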

Engage API & Lifecycle service communication

The Engage system consists of dozens of microservices. Each service handles different lifecycle steps and/or user operations on opportunities. The system also features opportunity validation, which evaluates the impact when an opportunity is taken or solved.

In a microservices architecture, service dependencies and communication strategies are key for scalability and customizability. In the Engage system, GraphQL HTTP requests deliver up-to-date information, while costly updates and lifecycle triggers are managed asynchronously through Kafka. This one-way service dependency and communication approach enables us to fine-tune the system for optimal resource utilization and throttling.
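
The split between synchronous reads and asynchronous updates can be illustrated with an in-memory stand-in. In this sketch a `queue.Queue` plays the role of a Kafka topic and a direct method call plays the role of a GraphQL query; the class and function names are hypothetical and none of this reflects Engage's actual code. The point is that writers return immediately, while the consumer decides how many updates to apply per cycle.

```python
import queue


class OpportunityStore:
    """Toy store: reads are synchronous, writes arrive as events."""

    def __init__(self):
        self._data = {}

    def get(self, opportunity_id):
        # Synchronous read path (a GraphQL HTTP request in the article).
        return self._data.get(opportunity_id)

    def apply(self, event):
        self._data.setdefault(event["id"], {}).update(event["changes"])


update_topic = queue.Queue()  # stand-in for a Kafka topic


def request_update(opportunity_id, changes):
    """Producer side: enqueue the update and return immediately."""
    update_topic.put({"id": opportunity_id, "changes": changes})


def consume(store, max_events):
    """Consumer side: apply at most max_events per call (a simple throttle)."""
    processed = 0
    while processed < max_events and not update_topic.empty():
        store.apply(update_topic.get())
        processed += 1
    return processed


store = OpportunityStore()
request_update("opp-1", {"status": "OPEN"})
request_update("opp-2", {"status": "OPEN"})
request_update("opp-3", {"status": "OPEN"})

first_batch = consume(store, max_events=2)   # opp-3 stays queued
pending_read = store.get("opp-3")            # not applied yet
second_batch = consume(store, max_events=2)  # drains the remaining event
```

Because the consumer controls `max_events`, the write load on the store can be tuned independently of how fast producers emit updates.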

CRM Gateway Service

The CRM Gateway Service synchronizes any changes to opportunities between Engage and CRM in close to real-time. Engage is an advanced and flexible platform that gives Agoda a competitive edge by collecting opportunities from diverse sources and delivering them to sales agents on time.

Image 8: Syncing ~5 million opportunities to CRM

In parallel with opportunity creation in Engage (as shown in Image 7), the CRM gateway synchronization pipeline can quickly and reliably sync up to 5.5 million opportunities with the CRM in just 24 hours (as shown in Image 8). The gateway service also monitors the CRM for noteworthy changes and automatically updates the corresponding opportunities stored in Engage. This ensures that all opportunity-update events are synced between Engage and the CRM in parallel, so that both systems always have the most up-to-date data.

Engage — CRM auto reconciliation system

Image 9: Engage — CRM auto reconciliation pattern.

Engage has an auto-healing/reconciliation job designed to ensure data integrity. It periodically checks for discrepancies between Engage and the CRM tool and re-synchronizes the affected opportunities. This helps detect and recover from issues in the synchronization process, such as infrastructure, network, or service failures. With this auto-healing system, the data and operations remain reliable.
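
One pass of such a reconciliation job can be sketched as below. The sketch assumes each system can report a version number per opportunity and, as a simplification, treats Engage as the source of truth; the article does not say how discrepancies are detected, so the snapshot format and the function names are hypothetical.

```python
def find_discrepancies(engage, crm):
    """Compare {opportunity_id: version} snapshots from both systems."""
    missing_in_crm = sorted(set(engage) - set(crm))
    stale_in_crm = sorted(
        key for key in engage.keys() & crm.keys() if crm[key] < engage[key]
    )
    return missing_in_crm, stale_in_crm


def reconcile(engage, crm, sync):
    """Re-sync every missing or stale opportunity; return how many were repaired."""
    missing, stale = find_discrepancies(engage, crm)
    for opportunity_id in missing + stale:
        sync(opportunity_id)
    return len(missing) + len(stale)


# Hypothetical snapshots: opp-1 is stale in the CRM and opp-3 never arrived.
engage_state = {"opp-1": 3, "opp-2": 1, "opp-3": 2}
crm_state = {"opp-1": 2, "opp-2": 1}


def resync(opportunity_id):
    # Stand-in for pushing the opportunity through the CRM gateway again.
    crm_state[opportunity_id] = engage_state[opportunity_id]


repaired_first_run = reconcile(engage_state, crm_state, resync)
repaired_second_run = reconcile(engage_state, crm_state, resync)
```

The first run repairs two opportunities and the second finds nothing to do, which is the property that makes a periodic job like this safe to run repeatedly.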

Conclusion

The Agoda opportunity detection system relies on the SEP system as its foundation, ingesting billions of records from over 100 sources. This system can detect up to 20 million opportunities, each with an associated monetary value, in less than 4 hours. Data profiling, validation, quarantine, and automated sanity testing are all utilized to guarantee the quality and accuracy of the result data.

The opportunities identified are then added to the Agoda Engage system, which acts as an opportunity store. This ensures the opportunities are synchronized with HubSpot within 24 hours, allowing agents to take advantage of them quickly.

In summary, by leveraging the power of the SEP and Engage systems, Agoda has created a powerful, efficient, and reliable opportunity detection system that can detect more than 5 million opportunities at any given time, store them, and send them to a CRM within 48 hours.
