Expedia Group Technology — Platform

Powering ML Platform Orchestration and Experimentation

How the ML Platform powers machine learning pipelines and accelerates experimentation.

Andrew Campen
Expedia Group Technology

--

Photo by Sorasak on Unsplash

Machine Learning has become an increasingly prevalent aspect of developing new software systems that deliver intelligent, personalized customer experiences. As the number of these experiences and systems grows, so does the number of machine learning models. For a single model to remain accurate, multiple versions need to be trained, deployed, and tested quickly. Historically, this has put pressure on engineering teams to align on retraining and integration timelines for new models, and it has lengthened the online experimentation cycle due to the coordination overhead between experience and machine learning engineering teams.

The journey and advantages of building a unified Machine Learning Platform at Expedia Group™️ (EG) were covered in a previous article. In this article, we discuss how EG’s ML Platform Orchestrator aims to address these problems by lowering the engineering and integration effort for new ML models and accelerating online ML experimentation.

Why add an orchestration layer?

ML-driven experiences have become a common development pattern, whether it is a recommendation based on user context, a price forecast, or an image selected for display. In each of these cases, a request needs to be made to an ML model, and the result is displayed to the user. If more than one ML model is involved, multiple requests must be made, and complex logic must be written to coordinate the models. By adding an orchestration layer between the ML models and the ‘experience layer,’ the code or service responsible for user interaction can be simplified and the coordination logic removed from it. The ML Platform Orchestrator aims to provide that orchestration layer and encapsulate the needed interactions between the ‘experience layer’ and the ML Platform.

The orchestration layer sits between the experience services or pages and the machine learning models; several models can be connected behind a single orchestration layer.
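
To make the coordination problem concrete, the hypothetical Java sketch below shows the fan-out logic an experience service would otherwise have to own: calling two models in parallel and merging their responses. The endpoints, payloads, and merge rule are illustrative assumptions, not actual EG APIs.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class NaiveModelCoordination {

    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // Hypothetical model endpoints; real model container URLs are internal to EG.
    private static final String RANKER_URL = "https://models.example.com/ranker/v2";
    private static final String PRICER_URL = "https://models.example.com/price-forecast/v1";

    public static void main(String[] args) {
        String userContext = "{\"userId\": \"u-123\", \"market\": \"US\"}";

        // Without an orchestration layer, the experience service must fan out
        // to each model itself and merge the results.
        CompletableFuture<String> ranking = callModel(RANKER_URL, userContext);
        CompletableFuture<String> pricing = callModel(PRICER_URL, userContext);

        String merged = ranking.thenCombine(pricing,
                (r, p) -> "{\"ranking\": " + r + ", \"pricing\": " + p + "}").join();

        System.out.println(merged);
    }

    private static CompletableFuture<String> callModel(String url, String body) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return HTTP.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }
}
```

With the orchestrator in place, this fan-out and merge logic moves into configuration, and the experience service makes a single call.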

What is EG’s ML Platform Orchestrator?

ML Platform Orchestrator is a configuration-driven service that performs conditional request routing and pipelining across multiple ML models; it functions as a single endpoint for the experience services to integrate against. The Orchestrator is also scalable: a new instance is created per client space and configured to the requirements of that use case. No code is needed to create this service; a configuration is defined, and a service container is deployed using that configuration.

A single entry point is then created for an ML experience through the orchestrator. This allows ML models to be transparently swapped out without changes to the client consuming the ML experience. Additionally, the presence of the orchestrator reduces the amount of integration effort since only a single integration is needed, between the client consuming the ML experience and the orchestrator, regardless of the number of ML models or the frequency at which they change. This also provides seamless interoperability with other ML Platform components like feature stores, monitoring, and logging tools.

ML Platform Orchestrator as a configuration-driven microservice that interacts with downstream APIs, mainly ML model containers, via real-time API calls; the model containers can be deployed independently of the orchestrator service.

Orchestration configuration

The ML Platform Orchestrator configuration provides the ability to define ML pipelines for request processing. The resulting pipeline can be a simple model chain, where the output of one model is fed into the input of another, or a more complex branching scenario. Below is a visual representation of a sample configuration file. As pictured, a configuration begins with a single start node and can then contain additional nodes of different types, based on what has been defined as part of the configuration.

A node could be:

  • A model processor that invokes a specified ML model
  • An action step such as an aggregation, merge, or conditional function
  • A cached-value lookup
  • An output step that writes data to a stream or data bucket

Execution proceeds through the configuration based on the defined dependencies. In the example below, the configured Processor1 depends on the start node, and Processor2 and Processor3 depend on Processor1.

Visual representation of an example MLP Orchestrator configuration pipeline: two processors execute in parallel and their results are merged; the pipeline then splits into two parallel paths, one running a conditional sub-pipeline followed by a cache sub-pipeline, the other a single processor.

This execution flow inherently forms a directed execution graph that is followed each time a request is made. The configuration can also support conditional execution, as seen in the first sub-pipeline, based on a configured condition: if a==1, processor-cond-1 is executed before proceeding to the cache node. The input and output fields between nodes can also be configured. This directed graph approach gives users of the orchestrator full freedom to define a wide range of pipelines to suit their workflow needs.
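
The actual configuration format is internal to EG, but the execution model it describes can be sketched in plain Java: nodes declare their dependencies, execution follows the resulting directed graph, and a conditional node runs only when its predicate holds. The node names mirror the example above; everything else is an illustrative assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class PipelineSketch {

    // A node runs an action once all of its dependencies have produced output
    // and its condition evaluates to true against the request context.
    record Node(String name, List<String> dependsOn, Predicate<Map<String, Object>> condition) {}

    public static void main(String[] args) {
        // Node names mirror the example configuration above.
        List<Node> pipeline = List.of(
                new Node("start", List.of(), ctx -> true),
                new Node("Processor1", List.of("start"), ctx -> true),
                new Node("Processor2", List.of("Processor1"), ctx -> true),
                new Node("Processor3", List.of("Processor1"), ctx -> true),
                // Conditional node: runs only when a == 1, as in the first sub-pipeline.
                new Node("processor-cond-1", List.of("Processor2"),
                        ctx -> Integer.valueOf(1).equals(ctx.get("a"))),
                new Node("cache", List.of("processor-cond-1"), ctx -> true));

        Map<String, Object> context = new LinkedHashMap<>();
        context.put("a", 1); // request field that drives the conditional branch

        // Nodes are listed in dependency order here, so a single pass suffices;
        // a real engine would topologically sort and run independent nodes in parallel.
        for (Node node : pipeline) {
            boolean ready = node.dependsOn().stream().allMatch(context::containsKey);
            if (ready && node.condition().test(context)) {
                context.put(node.name(), "output-of-" + node.name());
                System.out.println("executed " + node.name());
            }
        }
    }
}
```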

A modular and extensible framework

ML Platform Orchestrator is built using the Production Inferencing Library Suite (PILS) framework. This is a modular, extensible framework, built by Expedia’s ML Platform team as a set of Java libraries, in which different functional actions can be written as individual ‘plug-ins’ or processors. The individual processors are combined with the provided configuration to form a functional orchestrator service. A processor can be as simple as forming and sending a request to an individual model, or more complex, such as hydrating a request with data provided by a feature store.

PILS provides a versioned Service Provider Interface (SPI) for new modules to implement. Any module that implements this interface can then be combined into an effective “prediction pipeline” and exposed either as a service, as in the case of the orchestrator, or imported into a Java service as a library. In the example below, all four processors used to create the configuration graph implement the same interface class. Each processor can implement a different action, but they all share the same input and output interface defined by the PILS framework, allowing the processors to be combined interchangeably.

Example of a PILS configuration graph in which all four processors (Processor A-D) implement the same service provider interface.
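
The PILS SPI itself is not public, but the core idea, namely that every processor implements one shared interface over a common input and output shape so that any processor can be chained with any other, can be sketched as follows. The interface and class names here are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpiSketch {

    // Hypothetical stand-in for the PILS SPI: one shared input/output shape.
    interface Processor {
        Map<String, Object> process(Map<String, Object> input);
    }

    // Each processor implements a different action behind the same interface.
    static class ModelInvoker implements Processor {
        @Override
        public Map<String, Object> process(Map<String, Object> input) {
            Map<String, Object> out = new HashMap<>(input);
            out.put("prediction", 0.87); // placeholder for a real model call
            return out;
        }
    }

    static class FeatureHydrator implements Processor {
        @Override
        public Map<String, Object> process(Map<String, Object> input) {
            Map<String, Object> out = new HashMap<>(input);
            out.put("userSegment", "frequent-traveler"); // placeholder feature lookup
            return out;
        }
    }

    public static void main(String[] args) {
        // Because both processors share the interface, they compose in any order.
        List<Processor> pipeline = List.of(new FeatureHydrator(), new ModelInvoker());

        Map<String, Object> request = Map.of("userId", "u-123");
        for (Processor p : pipeline) {
            request = p.process(request);
        }
        System.out.println(request);
    }
}
```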

When the PILS modules are imported into an existing service, ML capabilities become effectively “embedded” directly in that service. PILS modules can also be combined with ML serving containers to create a middleware service that enhances model serving by providing methods for tasks like data logging and feature hydration, without the need to write code into the ML model itself.

By providing a well-defined mechanism for extensibility, PILS greatly lowers the effort required to add new features and enhancements that can be leveraged by the ML Platform Orchestrator or other parts of the ML Platform. This aspect of the PILS design encourages engineers outside of EG’s ML Platform team to contribute, quickly building processors to power a variety of different use cases. Processors can also be accepted into the core platform offering after a review of the code quality and an evaluation of how broadly applicable the processor is across the platform.

Experimentation via ML orchestrator

Fast experimentation has become a key tenet of Machine Learning methodology. The ability to quickly launch a new experiment to determine whether a new ML model delivers a better experience enables accelerated, data-driven decision-making. Historically at Expedia, experimentation was primarily driven by user bucketing in the website pages and took upwards of 6 months to launch a single test. These buckets needed to be passed through the service stack before they could be acted on by ML models, requiring the coordination of multiple code changes per test launch.

The ML Platform Orchestrator is now integrated directly with Expedia’s experimentation platform, also known as Expedia Test and Learn (EG TnL). Configuring an entity ID provides a unique identifier for the object the experiment will measure on, such as the trip or user. With an experiment ID, a request can be bucketed directly within the orchestration layer, and each experiment exposure is logged there. Moving this integration closer to the ML models reduces the total engineering effort to less than 3 weeks, since multiple changes across various service layers are no longer required, and it reduces dependencies on multiple teams. It also puts control of the experiment configuration and readout directly in the hands of the ML teams by reducing the engineering engagement required.

Experimentation is enabled within the MLP Orchestrator via the EG TnL SDK: the orchestrator resolves a bucket for the incoming entity ID, logs the exposure to EG TnL, and routes bucket zero to Model A and bucket one to Model B.
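
The EG TnL SDK’s API is not public, but the routing behavior in the diagram reduces to deterministic bucketing: hash the entity ID into a bucket, log the exposure, and route the request to the model assigned to that bucket. The sketch below uses a salted SHA-256 hash for illustration; the actual assignment logic lives inside the EG TnL SDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BucketingSketch {

    // Deterministically assign an entity to one of `numBuckets` buckets.
    // Salting the hash with the experiment ID keeps assignments independent
    // across experiments while staying stable for a given entity.
    static int bucketFor(String entityId, String experimentId, int numBuckets) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest((experimentId + ":" + entityId)
                    .getBytes(StandardCharsets.UTF_8));
            int value = ((hash[0] & 0xFF) << 8) | (hash[1] & 0xFF);
            return value % numBuckets;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String experimentId = "ranker-v2-test"; // hypothetical experiment
        for (String entityId : new String[] {"trip-1001", "trip-1002", "user-42"}) {
            int bucket = bucketFor(entityId, experimentId, 2);
            String model = bucket == 0 ? "Model A" : "Model B";
            // In the orchestrator, the exposure would also be logged to EG TnL here.
            System.out.println(entityId + " -> bucket " + bucket + " -> " + model);
        }
    }
}
```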

Summary

In this overview, we have shared the motivations behind the ML Platform Orchestrator, its design, and some early successes in applying this approach to accelerate ML model experimentation. By offering an easy-to-use, extensible, and low-code approach, EG’s ML Platform Orchestrator:

  • Reduces development time and accelerates the development and experimentation of different ML experiences.
  • Enables, through its configuration-driven approach, the definition of flexible directed graphs that can fit the needs of even highly complex ML pipelines.
  • Allows ML models to be transparently changed without multiple, costly code changes.
  • Integrates with Expedia’s experimentation platform, enabling experimentation directly without the need for upstream code changes.

Finally, we aim to continue expanding the capabilities offered by the Orchestrator through configurable aggregations and data transformations, as well as model-response caching to improve scalability and performance. We also look forward to contributions to the PILS framework, which would further expand not just the capabilities of the Orchestrator but also those of EG’s ML Platform.

Please be on the lookout for more articles about other exciting ML Platform components in the future.

Learn about life at Expedia Group
