Case Study: Uber’s Machine Learning Platform “Michelangelo”
Michelangelo is an internal ML-as-a-service platform at Uber that democratizes machine learning and makes it convenient to scale AI to meet the needs of the business. It is designed to cover the end-to-end ML workflow: managing data, training, evaluating, and deploying models, making predictions, and monitoring those predictions, for traditional ML models as well as deep learning. Michelangelo has been serving production use cases at Uber for some time and is deployed across several of Uber’s data centers. In this article, we will explore why Michelangelo was developed in the first place and how its architecture works.
Before Michelangelo, Uber relied on open source tools such as scikit-learn and R, and the impact of ML was limited to what a few data scientists and engineers could build with them. There were no reliable, uniform, and reproducible pipelines for creating and managing training and prediction data at scale. Most importantly, there was no established path for comparing experiments or deploying a model into production: the engineering team had to create a custom serving container specific to the project at hand. Michelangelo was designed to address these problems by standardizing workflows across teams through an end-to-end system that lets engineers easily build and operate machine learning systems at scale. It primarily focuses on speeding up the path from idea to first production model, and on the fast iterations that follow. Let’s walk through the entire process with a use case from UberEATS.
UberEATS has several machine learning models running on Michelangelo, covering meal delivery time predictions, search ranking, and restaurant ranking. The delivery time models predict how much time a meal will take to prepare and deliver, before the order is even placed. But how does UberEATS work? When a customer places an order, it is sent to the restaurant for processing. The restaurant has to acknowledge the order and prepare the meal, which takes time depending on the complexity of the order and how busy the restaurant is. When the meal is close to being ready, an Uber delivery partner is dispatched to pick it up. The delivery partner then needs to get to the restaurant, collect the food, drive to the customer’s location (which depends on route, traffic, and other factors), and walk to the customer’s door to complete the delivery. The goal is to predict the total duration of this complex multi-stage process, and to recalculate the time-to-delivery prediction at every step along the way.
Data scientists at UberEATS use gradient boosted decision tree regression models to predict this end-to-end delivery time. Features include the time of day, the delivery location, the average meal preparation time, and so on. These models are deployed across Uber’s data centers to Michelangelo model serving containers and are invoked via network requests by the UberEATS microservices; the predictions are displayed to customers as their meal is being prepared and delivered. Michelangelo itself is built from a mix of open source systems and components built in-house at Uber. The open source components include XGBoost, TensorFlow, Spark, and more.
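To make this concrete, here is a minimal sketch of the kind of gradient boosted tree regression described above, using the open source XGBoost library that Michelangelo builds on. The feature names and data are invented placeholders, not Uber’s actual schema or model.

```python
# Minimal sketch of a gradient boosted tree regressor for delivery time.
# Feature names and data are hypothetical placeholders, not Uber's schema.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 10_000

# Illustrative features: hour of day, distance to the customer (km),
# and the restaurant's average meal preparation time (minutes).
X = np.column_stack([
    rng.integers(0, 24, n),        # order_hour_of_day
    rng.uniform(0.5, 15.0, n),     # delivery_distance_km
    rng.uniform(5.0, 45.0, n),     # avg_meal_prep_minutes
])
# Synthetic target: total delivery time in minutes.
y = 10 + 2.5 * X[:, 1] + 0.9 * X[:, 2] + rng.normal(0, 5, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    objective="reg:squarederror",
)
model.fit(X_train, y_train)

print("Predicted ETA (minutes):", model.predict(X_test[:3]))
```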
The data management components of Michelangelo are divided between online and offline pipelines. The offline pipelines feed batch model training and batch prediction jobs, while the online pipelines feed online, low-latency prediction. Michelangelo also includes a feature store that allows teams to share and discover sets of features for their machine learning problems. All of Uber’s data lands first in an HDFS data lake, which offline pipelines can access to compute features; models deployed online, however, cannot read from HDFS at request time. The features for online models are therefore precomputed and stored in a Cassandra feature store, where they can be read at low latency at prediction time.
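As a rough illustration of this offline-to-online flow, the sketch below computes a batch feature with Spark from data in the lake and publishes it to a Cassandra table for low-latency reads. The paths, keyspace, and table names are assumptions, and the Cassandra write relies on the spark-cassandra-connector package being on the classpath.

```python
# Hypothetical offline pipeline: compute a feature from the HDFS data lake
# with Spark and publish it to a Cassandra feature store for online reads.
# Paths, keyspace, and table names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("restaurant-feature-pipeline")
    # Requires the spark-cassandra-connector on the classpath.
    .config("spark.cassandra.connection.host", "cassandra.internal")
    .getOrCreate()
)

orders = spark.read.parquet("hdfs:///data_lake/eats/orders")  # hypothetical path

# Batch feature: each restaurant's average meal preparation time.
avg_prep = (
    orders.groupBy("restaurant_id")
    .agg(F.avg("prep_time_minutes").alias("avg_meal_prep_minutes"))
)

# Publish the precomputed feature for online, low-latency lookups.
(
    avg_prep.write
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="feature_store", table="restaurant_features")
    .mode("append")
    .save()
)
```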
As mentioned earlier, Uber places a high priority on a centralized feature store in which teams can create, manage, and share features: once features are in the feature store, they are easy to both add and consume. Michelangelo also provides a DSL (domain-specific language) that modelers use to select, transform, and combine the features sent to the model at training and prediction time. The DSL is part of the model configuration itself and is applied at both training time and prediction time, guaranteeing that the same final set of features is generated and sent to the model in both cases.
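The sketch below imitates the idea rather than Uber’s actual DSL: feature selection and transformation are declared once in the model configuration, and the same interpreter runs at training and at prediction time, so the two code paths cannot drift apart.

```python
# Toy imitation of a feature-transform DSL (not Uber's actual syntax):
# transforms live in the model configuration, and one interpreter is
# applied at both training time and prediction time.
import math

MODEL_CONFIG = {
    "features": [
        {"name": "delivery_distance_km", "transform": "log1p"},
        {"name": "order_hour_of_day", "transform": "identity"},
        {"name": "avg_meal_prep_minutes", "transform": "identity"},
    ]
}

TRANSFORMS = {
    "identity": lambda x: x,
    "log1p": lambda x: math.log1p(x),
}

def build_feature_vector(raw: dict, config: dict) -> list:
    """Select and transform raw fields exactly as the config declares.

    Used verbatim by both the training pipeline and the serving path,
    which is what guarantees train/predict consistency.
    """
    return [
        TRANSFORMS[f["transform"]](raw[f["name"]])
        for f in config["features"]
    ]

# The same call serves training (over the whole dataset) and serving
# (for a single request):
request = {
    "delivery_distance_km": 4.2,
    "order_hour_of_day": 19,
    "avg_meal_prep_minutes": 22.5,
}
print(build_feature_vector(request, MODEL_CONFIG))
```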
Uber uses a distributed model training system that scales up to billions of samples and down to small datasets for quick iteration, supporting algorithms such as decision trees, linear models, and neural networks. A model configuration specifies the model type, hyperparameters, data source reference, and compute resource requirements, and is used to configure the training job. After a model is trained, performance metrics are computed and assembled into a model evaluation report. At the end of training, the original configuration, the learned parameters, and the evaluation report are saved to the model repository for analysis and deployment. Michelangelo also supports hyperparameter search for all model types, and all training jobs are managed through APIs and workflow tools.
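Here is a minimal sketch of what such a declarative training job might look like; all field names and values are invented for illustration, since Uber’s actual configuration format is not public in this form.

```python
# Hypothetical declarative training job in the style described above.
# All field names, paths, and values are invented for illustration.
import json

training_job = {
    "model_type": "gradient_boosted_trees",
    "hyperparameters": {"n_estimators": 200, "max_depth": 6, "learning_rate": 0.1},
    "data_source": "hive://eats.delivery_training_set",   # hypothetical reference
    "compute": {"executors": 64, "memory_gb": 16},
}

def run_training_job(job: dict) -> dict:
    """Stand-in for the platform's training service: trains the model,
    computes metrics, and returns an evaluation report."""
    # ... dispatch to Spark or a deep learning cluster per job["compute"] ...
    metrics = {"rmse": 4.7, "r2": 0.83}  # placeholder numbers
    return {
        "config": job,
        "learned_parameters": "hdfs:///model-repo/run-123/model.bin",  # hypothetical
        "evaluation_report": metrics,
    }

record = run_training_job(training_job)
# The config, learned parameters, and report are versioned together so
# every experiment stays reproducible and comparable.
print(json.dumps(record, indent=2))
```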
Uber often trains hundreds of models before arriving at the ideal model for a given use case; these experiments guide engineers toward the configuration that yields the best performance. Keeping track of the trained models, evaluating them, and comparing them is therefore a central concern in Michelangelo. Every model trained in Michelangelo is stored as a versioned object in a model repository in Cassandra, with information such as who trained the model, the model configuration, accuracy metrics, and the learned parameters. This information is easily available through a web UI or an API for inspection and comparison. Michelangelo also provides visualization tools to help users understand why a model behaves as it does and to debug it when necessary: the feature report shows each feature in order of importance to the model, along with partial dependence plots and distribution histograms.
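As an illustration of the kind of feature report described above, the sketch below ranks features by importance and computes partial dependence with scikit-learn. It merely stands in for Michelangelo’s own visualization tools, which are internal; the model and data are synthetic.

```python
# Illustrative stand-in for the feature report: rank features by
# importance and compute partial dependence with scikit-learn.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 3))
y = 3 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, 2000)

feature_names = ["delivery_distance_km", "order_hour_of_day", "avg_meal_prep_minutes"]
model = GradientBoostingRegressor().fit(X, y)

# Feature importances, sorted as in the feature report described above.
for idx in np.argsort(model.feature_importances_)[::-1]:
    print(f"{feature_names[idx]}: {model.feature_importances_[idx]:.3f}")

# Partial dependence of the prediction on the first feature.
pd_result = partial_dependence(model, X, features=[0])
print(pd_result["average"].shape)  # averaged predictions over the grid
```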
For deployment, Michelangelo has end-to-end support for managing model deployment via the UI or the API. Offline models are deployed to a container that runs in a Spark job, either on demand or on a repeating schedule. Online models are deployed to a prediction service cluster that serves incoming requests. In both cases, the model artifacts are packaged in a ZIP archive and copied to the relevant hosts across Uber’s data centers; the prediction containers automatically load the new models from disk and start handling prediction requests. Many teams at Uber use automation scripts to schedule regular model retraining and deployment via Michelangelo’s API. Once deployed, models make predictions based on features loaded from a data pipeline or passed in from the client service. Online models return the prediction to the client service over the network, while offline models write their predictions back to Hive (the data warehouse), where they can be accessed directly through SQL-based query tools.
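A rough sketch of the offline half of this flow, assuming hypothetical Hive table and column names: a Spark job scores rows with a deployed model and writes the predictions back to the warehouse, where analysts can query them with SQL.

```python
# Hypothetical offline (batch) prediction job: score rows inside a Spark
# job and write the predictions back to Hive. Table and column names are
# invented; the UDF stands in for a model loaded from its ZIP artifact.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType

spark = (
    SparkSession.builder
    .appName("batch-delivery-eta")
    .enableHiveSupport()
    .getOrCreate()
)

features = spark.table("eats.delivery_features")  # hypothetical Hive table

# Stand-in for the trained model loaded from disk.
@F.udf(returnType=DoubleType())
def predict_eta(distance_km, prep_minutes):
    return 10.0 + 2.5 * distance_km + 0.9 * prep_minutes

scored = features.withColumn(
    "predicted_eta_minutes",
    predict_eta(F.col("delivery_distance_km"), F.col("avg_meal_prep_minutes")),
)

# Predictions land back in the warehouse for SQL-based access.
scored.write.mode("overwrite").saveAsTable("eats.delivery_eta_predictions")
```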
More than one model can also be deployed to a given serving container at the same time, which allows smooth transitions from old models to new ones as well as side-by-side A/B testing. At serving time, a model is identified by its ID or by a tag, and the prediction is made by the model most recently deployed to that tag. Online prediction services scale by adding more hosts to the cluster and letting the load balancer spread the load; offline predictions scale by adding more Spark executors and letting Spark manage the parallelism. To monitor predictions, Michelangelo automatically logs, and can optionally hold back, a percentage of the predictions it makes, and later compares those predictions against the labels generated by the data pipeline. In the future, Uber plans to strengthen the existing system by adding tools and services such as AutoML, distributed deep learning, and more.
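To illustrate the tag-based routing, here is a toy sketch (not Uber’s implementation) of a serving container in which several model versions live side by side and each tag resolves to the most recently deployed model.

```python
# Toy sketch of tag-based model routing in a serving container (not
# Uber's implementation): several model versions coexist, and each tag
# resolves to the model most recently deployed to it.
from dataclasses import dataclass, field

@dataclass
class ServingContainer:
    # tag -> list of deployed models; the last entry is the newest.
    models_by_tag: dict = field(default_factory=dict)

    def deploy(self, tag: str, model) -> None:
        """Add a new model version under a tag without removing old ones,
        enabling gradual transitions and side-by-side A/B tests."""
        self.models_by_tag.setdefault(tag, []).append(model)

    def predict(self, tag: str, features):
        # Requests are served by the most recently deployed model for the tag.
        return self.models_by_tag[tag][-1](features)

container = ServingContainer()
container.deploy("eta_model", lambda f: 10.0 + 2.5 * f["distance_km"])  # v1
container.deploy("eta_model", lambda f: 9.5 + 2.4 * f["distance_km"])   # v2

print(container.predict("eta_model", {"distance_km": 4.2}))  # served by v2
```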
References:
1. Uber Engineering Blog: “Meet Michelangelo: Uber’s Machine Learning Platform” (https://eng.uber.com/michelangelo/)