Featureflow: Democratizing ML for Agoda
In today’s rapidly evolving travel industry, satisfying customers’ needs has become increasingly challenging. As a prominent player in the field, Agoda has committed to addressing these challenges with cutting-edge technology and innovation. At the heart of these efforts is Featureflow, a robust machine-learning pipeline designed to achieve the new S-curve of experimentation velocity. This article provides an in-depth exploration of how Agoda’s Featureflow is changing the game in the travel industry.
The Challenge
Before delving into Featureflow, it’s crucial to understand why velocity has become such a complex issue in the travel sector. The industry has undergone a remarkable transformation in recent years, driven by technological advancements and changing consumer preferences. Travelers today are more informed and discerning, with easy access to a wealth of information and options at their fingertips. We recognized the need to adapt quickly to this dynamic and competitive market.
The Quest for Velocity in Experimentation
One of Agoda’s core values is “Be a scientist, experiment and measure.” With this value at the core of every Agodan’s mindset, we run thousands of experiments every quarter.
Before the age of Featureflow, we faced several challenges in our feature discovery process. One significant challenge was the time-consuming nature of feature analysis, training, and validation. Data scientists could spend up to a week on these tasks for a single feature, severely limiting the pace at which new features could be onboarded.
Additionally, data scientists each maintained their own notebooks for feature processing, leading to inconsistencies in analysis and training approaches. The lack of continuity between feature development and deployment also created bottlenecks in the workflow.
Introducing Featureflow
In pursuing a more efficient and standardized approach, we introduced Featureflow, a machine-learning workflow to streamline the entire process from feature ideation to deployment and experimentation. Featureflow helps automate the work of data scientists, allowing them to focus their time on more value-added initiatives.
Key Benefits of Featureflow
1. Accessibility: Any user with a new feature idea can utilize Featureflow to test it. It requires only basic SQL knowledge, not a degree in Data Science.
2. Efficiency: Featureflow significantly accelerates development cycles by automating or semi-automating various steps. This efficiency enables anyone to iterate, deploy, and experiment with features more rapidly without a resource bottleneck.
3. Standardization: With Featureflow, models are developed and maintained consistently across different features, allowing for meaningful performance comparisons and prioritization.
4. Flexibility: Featureflow supports a wide range of features, providing the flexibility to explore various feature types as long as the data resides in Agoda’s data market.
5. Scalability: Featureflow scales to accommodate a growing number of ideas and data-intensive computation, and it is not limited by the capacity of a local notebook.
6. Interoperability: Featureflow integrates with other components like data visualization and backend services.
7. Experimentation: Featureflow facilitates experimentation with the ability to live-test features in a sandbox environment and analyze results before full deployment.
How Featureflow Works
Featureflow comprises two main components: the administrative UI and the data pipeline.
Featureflow UI: This component provides a user-friendly interface within Agoda’s internal back-office web application. Users create a feature set by writing a SQL query. A feature set groups features with similar characteristics, such as star ratings, review scores, and hotel social media followers. The interface runs validation checks on the query and collects a default value for each feature and the entity type the features belong to.
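To make the idea of a feature set concrete, here is a minimal sketch of what such a record and its validation might look like. The class, field names, and checks are assumptions made for illustration; the article does not describe the actual schema or the validation rules Featureflow applies. The checks are simple text checks, not a real SQL parser.

```python
import re
from dataclasses import dataclass, field

# Hypothetical record for one feature set; all field names are illustrative.
@dataclass
class FeatureSet:
    name: str
    entity_type: str   # what the features describe, e.g. "hotel"
    sql: str           # user-written query returning the entity key plus feature columns
    defaults: dict = field(default_factory=dict)  # feature name -> default value

# Keywords that would make the query something other than a read-only SELECT.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate)\b", re.IGNORECASE
)

def validate(fs: FeatureSet, entity_key: str) -> list:
    """Return a list of problems; an empty list means the feature set passes."""
    problems = []
    sql = fs.sql.strip().rstrip(";")
    if not sql.lower().startswith("select"):
        problems.append("query must be a single SELECT statement")
    if ";" in sql:
        problems.append("query must not contain multiple statements")
    if FORBIDDEN.search(sql):
        problems.append("query must be read-only")
    if entity_key.lower() not in sql.lower():
        problems.append(f"query must return the entity key column '{entity_key}'")
    if not fs.defaults:
        problems.append("at least one feature with a default value is required")
    return problems

good = FeatureSet(
    name="hotel_quality",
    entity_type="hotel",
    sql="SELECT hotel_id, star_rating, review_score FROM hotel_summary",
    defaults={"star_rating": 0.0, "review_score": 0.0},
)
bad = FeatureSet(name="oops", entity_type="hotel", sql="DROP TABLE hotel_summary")

print(validate(good, "hotel_id"))  # → []
print(validate(bad, "hotel_id"))   # several problems, including "query must be read-only"
```

Rejecting a query at submission time is cheaper than discovering the problem after a cluster job has been scheduled, which is presumably why the checks live in the UI.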
Data Pipeline: The pipeline begins with data preprocessing and labeling, which creates a base set that is reused across different feature sets. It then gathers all feature sets from the SQL database and applies a Smart Feature Selection algorithm that chooses which features to process based on their recent performance. Finally, it spawns Spark jobs, called Babyships, in the data processing cluster to process the selected features.
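The article does not say how Smart Feature Selection works internally. As one possible reading, the sketch below always schedules features that have never been evaluated and then fills the remaining budget with the features whose recent scores were best. The record type, the score window, the budget, and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FeatureRecord:
    name: str
    recent_scores: List[float]  # hypothetical performance metric per run, newest last

def recent_performance(rec: FeatureRecord, window: int = 3) -> Optional[float]:
    """Mean of the last `window` scores, or None if the feature has never run."""
    if not rec.recent_scores:
        return None
    tail = rec.recent_scores[-window:]
    return sum(tail) / len(tail)

def select_features(records, budget, min_score=0.0):
    """Pick up to `budget` features: unseen ones first, then the best recent performers."""
    unseen = [r for r in records if recent_performance(r) is None]
    seen = [r for r in records if recent_performance(r) is not None]
    promising = sorted(
        (r for r in seen if recent_performance(r) >= min_score),
        key=recent_performance,
        reverse=True,
    )
    return [r.name for r in (unseen + promising)[:budget]]

records = [
    FeatureRecord("star_rating", [0.02, 0.03, 0.04]),
    FeatureRecord("review_score", [0.01, -0.02, -0.03]),
    FeatureRecord("social_followers", []),  # new idea, never evaluated
    FeatureRecord("photo_count", [0.05, 0.01, 0.0]),
]
print(select_features(records, budget=3))
# → ['social_followers', 'star_rating', 'photo_count']
```

Whatever the real rule is, the effect is the same: cluster time goes to new ideas and to features that still look promising, instead of reprocessing every feature on every run.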
Each Babyship is responsible for a single feature set. It generates feature data by executing the SQL query that the user entered for that feature set, joins the output with historical data, and stores the results in a data lake.
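The shape of a Babyship's work can be sketched on a small scale. The example below uses an in-memory SQLite database as a stand-in for Spark and the data lake, so it illustrates the data flow only, not the real job. The table names (base_set, hotel_summary, feature_output), the hotel_id key, and the label column are assumptions; missing feature values are filled with the defaults collected in the UI.

```python
import sqlite3

def run_feature_set(conn, feature_sql, defaults, entity_key="hotel_id"):
    """Run the user query, left-join it onto the labeled base set, and store the rows."""
    names = list(defaults)
    # COALESCE fills in the per-feature default when the user query has no value.
    select_cols = ", ".join(f"COALESCE(f.{n}, ?) AS {n}" for n in names)
    rows = conn.execute(
        f"SELECT b.{entity_key}, b.label, {select_cols} "
        f"FROM base_set AS b "
        f"LEFT JOIN ({feature_sql}) AS f ON f.{entity_key} = b.{entity_key} "
        f"ORDER BY b.{entity_key}",
        [defaults[n] for n in names],
    ).fetchall()
    # Persist the joined rows; this table stands in for the data lake.
    out_cols = ", ".join([entity_key, "label", *names])
    conn.execute(f"CREATE TABLE feature_output ({out_cols})")
    placeholders = ", ".join("?" for _ in range(len(names) + 2))
    conn.executemany(f"INSERT INTO feature_output VALUES ({placeholders})", rows)
    return rows

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_set (hotel_id INTEGER, label INTEGER);
    INSERT INTO base_set VALUES (1, 1), (2, 0), (3, 1);
    CREATE TABLE hotel_summary (hotel_id INTEGER, star_rating REAL, review_score REAL);
    INSERT INTO hotel_summary VALUES (1, 4.0, 8.7), (2, 3.0, NULL);
""")

rows = run_feature_set(
    conn,
    "SELECT hotel_id, star_rating, review_score FROM hotel_summary",
    {"star_rating": 0.0, "review_score": 5.0},
)
for row in rows:
    print(row)
```

The left join keeps every row of the base set, so a hotel that the user query does not cover still appears in the output with default feature values.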
The training step runs in a separate Python container called Watermelon. Python was chosen for its flexibility in model choice and its integration with the scikit-learn library. A Babyship job launches a Watermelon container using an Oozie coordinator. Each Watermelon loads the processed data, fits a model, computes feature statistics and model performance, and submits the trained model to Agoda’s model storage.
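The load, fit, and evaluate loop of a Watermelon can be illustrated in a few lines. Note the substitution: the real container uses scikit-learn, while this sketch uses a hand-written single-feature logistic regression so that it runs with the standard library alone. The model type, the AUC metric, the toy data, and the report layout are all assumptions, not details from the pipeline.

```python
import math
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def train_and_report(xs, ys):
    """Fit the model on one feature and return model, feature statistics, and AUC."""
    mean, std = statistics.mean(xs), statistics.pstdev(xs)
    scaled = [(x - mean) / std for x in xs]  # standardize so one learning rate works
    w, b = fit_logistic(scaled, ys)
    scores = [sigmoid(w * x + b) for x in scaled]
    return {
        "model": {"w": w, "b": b, "mean": mean, "std": std},
        "stats": {"mean": mean, "std": std, "min": min(xs), "max": max(xs)},
        "auc": auc(scores, ys),
    }

# Toy data: star rating as the feature, a binary outcome as the label.
star_rating = [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 3.0]
label = [0, 0, 1, 0, 1, 1, 1, 0]
report = train_and_report(star_rating, label)
print(report["stats"], round(report["auc"], 3))
```

For simplicity the sketch scores the model on its own training data. A real evaluation would use held-out data, and the returned dictionary is only a placeholder for whatever is submitted to model storage.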
Featureflow Dashboard
A performance visualization dashboard complements the business user experience, aiding in identifying promising features. This dashboard offers insights into feature correlations, performance breakdown by segment, and historical performance trends.
Featureflow Monitoring
In addition to the performance dashboard, a monitoring dashboard is in place to track the engineering side of Featureflow. It collects performance metrics and tags them with feature set and feature information. This data helps ensure Featureflow’s health and allows for timely identification of issues, both internal and external.
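Tagging each metric with its feature set and feature is what makes a failure traceable to a specific idea. A minimal sketch, assuming the metric of interest is step duration and using an in-memory list in place of a real metrics backend; the metric name and tag keys are hypothetical:

```python
import time
from contextlib import contextmanager

METRICS = []  # stand-in for a real metrics backend

@contextmanager
def timed(step, feature_set, feature=None):
    """Record the wall-clock duration and outcome of a pipeline step with its tags."""
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise  # the failure still propagates; we only record it
    finally:
        METRICS.append({
            "metric": "step_duration_seconds",
            "value": time.perf_counter() - start,
            "tags": {
                "step": step,
                "feature_set": feature_set,
                "feature": feature,
                "status": status,
            },
        })

with timed("generate_features", feature_set="hotel_quality"):
    sum(range(1000))  # placeholder for real work

print(METRICS[-1]["tags"])
```

With tags like these, a dashboard can separate a slow or failing feature set (often a problem in the user's query) from a slowdown that affects every feature set (more likely an infrastructure issue).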
Deployment and Experimentation
Feature deployment and experimentation are seamless processes within Featureflow. Users can initiate experiments with just a few clicks. A trial run in a sandbox environment precedes full deployment, allowing one to assess performance and identify potential issues. Once everything is deemed satisfactory, the feature is promoted to the production environment. Agoda’s experiment platform supports viewing experiment results and analyses.
Future of Featureflow
While Featureflow already covers much of the machine-learning lifecycle, we are considering additional enhancements. One area of focus is model degradation monitoring, which currently requires human intervention. We also plan to expand Featureflow’s capabilities to support multi-feature models, which combine several features to achieve better predictions.
Conclusion
We developed Featureflow to address the challenges of efficiency and scalability in experimentation. Over 10,000 features have been processed through our pipeline, resulting in hundreds of experiments. Featureflow saves time and resources and reduces the risk of human error. It cuts the feature analysis period from a week to a day and integrates with Agoda’s dashboards, model storage, and experiment platform.
Moreover, Featureflow democratized ML-based optimization, allowing anyone to propose and implement their ideas. Previously, only a small group of 2–3 feature contributors could do so. Now, over 50 individuals have submitted ideas. As a result, we’ve increased our quarterly experiments from 6 to as many as 20, with a larger feature pool and a more robust feature screening process.
This reflects our commitment to continuous improvement and delivering value to our customers.