Serving machine learning models in real time with MLeap

Anca Sarb
Expedia Group Technology
4 min read · Sep 22, 2018

…and how MLeap makes our data science and engineering teams happy and productive

[Image: the MLeap logo, part of the MLeap project, which is GPL 2.0 licensed]

At Expedia® Partner Solutions, we support the hotel business of the world’s leading airlines, travel agencies and consumer and loyalty brands. We help them innovate in the travel industry through a versatile API at the core of our offering. Like our partners, we like to grow and innovate. One example is our work to leverage machine learning capabilities across hotel sorting, anomaly detection, hotel recommendation and cross-sell, image ranking and chatbots.

A proliferation of machine learning tools

Data scientists use a continually expanding range of machine learning technologies — scikit-learn, Spark ML and TensorFlow, to name a few — to solve different types of problems. MLeap makes our data science team happy because its common serialization format and execution engine make it simple to deploy models built with a wide range of ML frameworks. Data scientists can use the right tools for the job at hand without worrying about unnecessary delays caused by the need to build additional model scoring services.

MLeap makes our engineering team happy because it minimizes the effort of serving or scoring models in a production environment. To get the serving infrastructure right, we only need to focus on the maintainability, monitoring and scalability of a single model scoring service, removing the overhead of maintaining multiple model serving implementations.

With so many machine learning tools available, clients (our shopping or recommendations APIs, for example) would otherwise need to manage the complexity of integrating with the right model serving implementation for each model being used. MLeap hides the differences between the underlying model training implementations behind a unified scoring API. This makes switching between models, perhaps by means of an A/B test, straightforward.

Model representation to the rescue

Traditionally, the transition between model training (data science team) and model scoring (engineering team) runs the risk of causing bottlenecks and inefficiencies as each team works with its own toolsets and according to its own workflows. Our data scientists typically work with Python and notebooks, while our software engineers use Java or Scala most commonly.

MLeap makes our data science team happy because it eliminates the rewriting that would otherwise be needed to serve models in production. It achieves this by using an intermediate representation of the trained machine learning models for serving. The output of the training stage is the predictive model/pipeline, stored as a bundle of JSON or Protobuf files. This means that once a model has been built and trained, it is also ready to deploy in production.
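To make this concrete, here is a minimal Scala sketch of how a fitted Spark ML pipeline could be exported as an MLeap bundle. The pipeline, training DataFrame and file path are hypothetical placeholders, not our actual hotel sorting pipeline:

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._

// `pipelineModel` is a fitted org.apache.spark.ml.PipelineModel and
// `trainingDf` is the DataFrame it was trained on (both assumed to exist here)
val context = SparkBundleContext().withDataset(pipelineModel.transform(trainingDf))

// serialize the whole pipeline as an MLeap bundle (a zip of JSON/Protobuf files)
for (bundleFile <- managed(BundleFile("jar:file:/tmp/hotel-sort-pipeline.zip"))) {
  pipelineModel.writeBundle.save(bundleFile)(context).get
}
```

The resulting zip file is the artifact that gets handed over to the serving side, with no retraining or re-implementation needed.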

MLeap makes our engineering team happy since building and training models becomes decoupled from using them to make predictions in a production environment. Without having to re-implement the trained model/pipeline within the scoring environment, our data scientists can now launch new models in production with minimal engineering involvement.

Serving machine learning models in real time

MLeap makes our engineering team happy given the low latencies we’ve been able to achieve in our scoring service, even while scaling to high throughputs. At EPS, we’ve built a REST scoring service using the MLeap library, the serialized MLeap bundles and Spring Boot. Taking our hotel sorting use case (where our machine learning pipelines are relatively elaborate, consisting of 50+ stages) as an example, the results we’ve seen in production have been very encouraging: a 99th percentile latency of 30ms to 70ms under considerable load (averaging approximately 500 requests/second) when scoring an average of 250 hotels per scoring request.
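For illustration, this is roughly what the scoring side looks like with the MLeap runtime: the bundle is loaded once at start-up, and each request is turned into a LeapFrame and transformed, with no Spark dependency. The file path, schema and fields below are made up for the example:

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.core.types._
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}
import resource._

// load the serialized pipeline once when the service starts
val bundle = (for (file <- managed(BundleFile("jar:file:/tmp/hotel-sort-pipeline.zip"))) yield {
  file.loadMleapBundle().get
}).opt.get
val pipeline = bundle.root

// build a LeapFrame from an incoming request (schema and values are illustrative)
val schema = StructType(
  StructField("hotel_id", ScalarType.String),
  StructField("price", ScalarType.Double)).get
val frame = DefaultLeapFrame(schema, Seq(Row("hotel-123", 199.0)))

// score the request and read back the transformed rows
val scored = pipeline.transform(frame).get
```

In our service this sits behind a Spring Boot REST controller, but the MLeap-specific part is essentially the load-and-transform shown above.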

Feature engineering made easy

When building the model, data scientists convert the raw data into features that better represent the underlying problem to the predictive models. This helps improve model accuracy at prediction time.

MLeap makes our data science team happy because most gains come from good features and MLeap allows them to pull in as many features as needed and combine them in intuitive ways.

MLeap makes our engineering team happy given that implementing our custom feature transformers, together with the serialization code required, is a matter of just a few classes’ worth of code. We can follow the same engineering practices as with any other code and write tests to make sure we’ve eliminated any discrepancies between how we handle data in the training and serving pipelines, because debugging differences between online scoring and training in production can be hard.
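As a sketch of the kind of parity test we have in mind, the snippet below scores the same rows through the Spark pipeline and through its MLeap counterpart and compares the results. Note that `trainSparkPipeline`, `toMleapPipeline`, `sampleDf` and `toLeapFrame` are hypothetical helpers, and the `score` column name is made up:

```scala
import org.scalatest.funsuite.AnyFunSuite

class ScoringParitySpec extends AnyFunSuite {

  test("MLeap scoring matches Spark scoring on the same rows") {
    // `trainSparkPipeline`, `toMleapPipeline`, `sampleDf` and `toLeapFrame`
    // are hypothetical helpers for this sketch
    val sparkPipeline = trainSparkPipeline()            // fitted PipelineModel
    val mleapPipeline = toMleapPipeline(sparkPipeline)  // round-tripped through an MLeap bundle

    // score with Spark (offline/training path)
    val sparkScores = sparkPipeline.transform(sampleDf)
      .select("score").collect().map(_.getDouble(0))

    // score with the MLeap runtime (online/serving path)
    val mleapScores = mleapPipeline.transform(toLeapFrame(sampleDf)).get
      .select("score").get.dataset.map(_.getDouble(0))

    sparkScores.zip(mleapScores).foreach { case (spark, mleap) =>
      assert(math.abs(spark - mleap) < 1e-6)
    }
  }
}
```

Catching a mismatch here, before deployment, is far cheaper than debugging it in production.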

With an increased confidence that the model in our training environment will give the same score as the model in our serving environment, MLeap makes both our data science and engineering teams happy.

Check out the MLeap documentation for more information or join the community on Gitter! By the way, I also contribute to the MLeap open source project. Please reach out if you’d like to get involved; we always welcome new contributions.
