Uber’s Michelangelo — ML Platform
Michelangelo, Uber’s machine learning (ML) platform, supports the training and serving of thousands of models in production across Uber. It is designed to cover the end-to-end ML workflow, and it currently supports classical machine learning, time series forecasting, and deep learning models across a myriad of use cases: generating marketplace forecasts, responding to customer support tickets, calculating accurate estimated times of arrival (ETAs), and powering Uber’s One-Click Chat feature on the driver app with natural language processing (NLP) models.
Around 2015, Uber’s ML engineers noticed the hidden technical debt in machine learning systems (which we clarified in our technical debt series), or the ML equivalent of ‘But it works on my machine…’. Engineers at Uber could build custom, one-off systems that integrated with ML models, but these systems added to the technical debt and were not scalable in a large engineering organization. In their own words:
There were no systems in place to build reliable, uniform, and reproducible pipelines for creating and managing training and prediction data at scale.
That’s why they built Michelangelo. It relies on Uber’s data lake of transactional and logged data, and it supports both offline (batch) and online (streaming/real-time) predictions. For offline predictions, containerized Spark jobs generate batch predictions; for online deployments, the model is served in a prediction service cluster, typically consisting of hundreds of machines behind a load balancer, to which clients send individual or batched prediction requests as RPCs.
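The online serving path described above can be sketched in miniature: many identical model servers sit behind a load balancer, and clients send single or batched prediction requests. This is an illustrative sketch only; the class names, round-robin policy, and the stand-in linear model are assumptions, not Uber’s implementation.

```python
# Hedged sketch of a prediction service cluster: identical model replicas
# behind a simple round-robin load balancer. All names are illustrative.

class ModelServer:
    """One machine in the prediction service cluster."""
    def predict(self, features):
        # Stand-in model: a fixed linear scorer over three features.
        weights = [0.5, -0.2, 1.0]
        return sum(f * w for f, w in zip(features, weights))

class LoadBalancer:
    """Round-robin dispatch across many identical servers."""
    def __init__(self, servers):
        self.servers = servers
        self._next = 0

    def predict(self, batch):
        # A batched RPC lands on one server; single requests are a batch of one.
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return [server.predict(row) for row in batch]

lb = LoadBalancer([ModelServer() for _ in range(4)])
scores = lb.predict([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
```

In production the balancer would route over the network and servers would hold real fitted models; the shape of the interaction (client → balancer → replica → scores) is the point here.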
Metadata (data about models) relevant to model management (e.g., run-time statistics of the training job, model configuration, data lineage, distributions and relative importance of features, model evaluation rubrics, standard evaluation metrics, learned parameter values, and summary statistics) is stored for each experiment.
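To make the categories above concrete, here is a sketch of the kind of per-experiment metadata record such a system might store. The field names and values are hypothetical, chosen only to mirror the list in the paragraph; they are not Uber’s actual schema.

```python
# Illustrative per-experiment metadata record; every field name and value
# here is an assumption, not Michelangelo's real schema.
experiment_metadata = {
    "model_id": "eta_model_v42",
    "training_config": {"algorithm": "gbdt", "num_trees": 200, "max_depth": 8},
    "training_stats": {"wall_clock_seconds": 5400, "rows_trained_on": 12_000_000},
    "lineage": {"feature_snapshot": "2019-03-01", "label_table": "trips.eta_labels"},
    "feature_report": {
        "distributions": {"trip_distance_km": {"mean": 5.2, "p99": 38.0}},
        "relative_importance": {"trip_distance_km": 0.61, "hour_of_day": 0.17},
    },
    "evaluation": {"rmse_seconds": 112.4, "r2": 0.87},
    "learned_params_uri": "s3://models/eta_model_v42/params",
}
```

Keeping this record per experiment is what makes runs comparable and reproducible after the fact.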
Michelangelo can deploy multiple models in the same serving container, which allows for safe transitions from old to new model versions and side-by-side A/B testing of models.
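Hosting several model versions in one container makes gradual rollouts and A/B tests a matter of traffic configuration. A minimal sketch, assuming deterministic hash-based routing (the routing scheme, class names, and traffic fractions below are illustrative assumptions):

```python
import hashlib

# Hedged sketch: one serving container holding multiple model versions,
# splitting traffic between them for A/B testing. Names are illustrative.

class ServingContainer:
    def __init__(self):
        self.models = {}   # version -> callable model
        self.traffic = {}  # version -> fraction of traffic

    def deploy(self, version, model, fraction):
        self.models[version] = model
        self.traffic[version] = fraction

    def _route(self, request_id):
        # Deterministic hash so a given request id always hits the same version.
        bucket = int(hashlib.md5(request_id.encode()).hexdigest(), 16) % 100 / 100
        cumulative = 0.0
        for version, fraction in sorted(self.traffic.items()):
            cumulative += fraction
            if bucket < cumulative:
                return version
        return version  # fall through to the last version

    def predict(self, request_id, features):
        version = self._route(request_id)
        return version, self.models[version](features)

container = ServingContainer()
container.deploy("v1", lambda x: sum(x), 0.9)      # old model, 90% of traffic
container.deploy("v2", lambda x: 2 * sum(x), 0.1)  # candidate, 10% of traffic
version, score = container.predict("rider-123", [1.0, 2.0])
```

Because both versions live in the same container, shifting traffic from `v1` to `v2` is a config change rather than a redeploy, which is what makes the old-to-new transition safe.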
The current upgraded platform uses Spark’s ML pipeline serialization, but adds an interface for online serving: a single-example (online) scoring method that is lightweight and capable of meeting tight SLAs, for instance for fraud detection and prevention. It does so by bypassing the overhead of Spark SQL’s Catalyst optimizer. Spark’s ML pipelines have some limitations, some of which can also be addressed using Kafka Streams.
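The idea of a single-example scoring path can be sketched as follows: instead of wrapping one row in a DataFrame and paying query-planning overhead, each fitted pipeline stage is applied directly to the example as a plain function. This is a conceptual sketch, not Uber’s added interface; the `OnlinePipeline` class and the stand-in stages are assumptions.

```python
# Sketch of a lightweight single-example scoring path: fitted stages are
# applied in order to one example, with no DataFrame or query planner
# involved. Class and stage names are hypothetical.

class OnlinePipeline:
    """Applies fitted pipeline stages to a single example."""
    def __init__(self, stages):
        self.stages = stages  # list of per-example transform functions

    def score_one(self, example):
        for stage in self.stages:
            example = stage(example)
        return example

# Stand-in stages: feature scaling, then a linear model.
scale = lambda ex: {k: v / 10.0 for k, v in ex.items()}
linear = lambda ex: 0.3 * ex["distance"] + 0.7 * ex["duration"]

pipeline = OnlinePipeline([scale, linear])
score = pipeline.score_one({"distance": 50.0, "duration": 20.0})
```

The per-call cost here is just the stage functions themselves, which is why a path like this can meet millisecond-scale latency budgets that a batch-oriented execution engine cannot.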
The move towards native Spark serialization and deserialization enables flexibility and cross-environment compatibility at the pipeline-stage level for model persistence.
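Stage-level persistence means each fitted stage is saved and loaded as its own artifact, so a serving environment can reload stages individually rather than deserializing a monolithic model blob. A minimal sketch of that idea, with hypothetical class and file layouts (this is not Spark’s actual persistence format):

```python
import json
import os
import tempfile

# Hedged sketch of stage-level pipeline persistence: each fitted stage
# serializes itself independently. Names and the JSON layout are assumptions.

class Stage:
    def __init__(self, name, params):
        self.name, self.params = name, params

    def save(self, directory):
        # One file per stage, so stages can be reloaded independently.
        with open(os.path.join(directory, f"{self.name}.json"), "w") as f:
            json.dump({"name": self.name, "params": self.params}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            blob = json.load(f)
        return cls(blob["name"], blob["params"])

def save_pipeline(stages, directory):
    for stage in stages:
        stage.save(directory)

with tempfile.TemporaryDirectory() as d:
    save_pipeline([Stage("scaler", {"mean": 5.2}), Stage("gbdt", {"trees": 200})], d)
    reloaded = Stage.load(os.path.join(d, "scaler.json"))
```

The payoff of this layout is exactly the cross-environment flexibility the paragraph describes: a lightweight online server can load only the stages it needs, in whatever runtime it runs in.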
Thanks to this evolution and other updates to the Michelangelo platform, Uber’s ML stack can now support newer use cases, such as flexible experimentation and model training in Uber’s Data Science Workbench, a distributed Jupyter notebook environment whose models can be served in Michelangelo, as well as end-to-end deep learning using TFTransformers.
Subscribe to our Acing AI newsletter if you are interested.
Interested in learning how to crack machine learning interviews?