Deploying Models from Dev to Production with MLflow and Bridge

Josh Broomberg
domino-research

--

You run a new model training experiment in a notebook. The 20 (or 2000) trained variants of your model are logged with their input parameters and their training and test metrics.

Sorting by test metrics, you see that the best variant from this training run is an improvement over the current production version and should probably replace it. You mark it as the Staging version of your model, wait a few seconds, and then query the updated staging endpoint to make sure it works as expected. Things look good.

You share a notebook demonstrating different, improved output from the staging API with the model’s stakeholders. When you’re ready, you mark the selected version as the Production version. A few seconds later, the new model version is in the wild and being consumed by downstream apps and end users.

This post provides step-by-step instructions for using MLflow and Bridge to implement the workflow above. If you already use MLflow, you’ll have hosted (and always up-to-date) APIs for every model in your registry in one command. If you’re new to MLflow, this guide will get you from 0 to 1 in under 10 minutes.


We (the Domino R&D team) think this workflow is how DS/ML should be done. Models move from experimentation to consumption without any rewrites or costly coordination. We’re calling this pattern RegistryOps because the model registry is used as the source of truth for your model hosting/inference. Curious how this compares to a CI/CD-based approach to deploying models? See the article we wrote comparing CI/CD and RegistryOps here.

TL;DR, give me the code

When you run the command below, Bridge will watch MLflow for new and updated models and make these models available as REST APIs for inference. To query any model in your registry, send a POST request to localhost:3000/<ModelName>/<Stage>/invocations using one of the formats in the MLflow docs.

docker run -it \
-p 3000:3000 \
-e BRIDGE_DEPLOY_KIND=localhost \
-e BRIDGE_MLFLOW_URI=YOUR_MLFLOW_HOSTNAME \
-e AWS_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY \
-e AWS_DEFAULT_REGION=${AWS_REGION} \
quay.io/domino/bridge:latest

The model endpoints are always up-to-date. When you push a new model version, Bridge will update the Latest endpoint so you can test out your work. If you tag a new version for Staging or Production, Bridge will update the respective endpoint. Welcome to RegistryOps 🎉.

Continuously Deployed Models in 4 Steps

If you don’t have MLflow, or have MLflow but don’t have any models that use the MLmodel spec to support MLflow serving, then this guide is for you. It will get you to a continuously deployed model in MLflow in 4 steps. The code for this guide is in this repo. To follow along, clone the repo and change into the repo root:

git clone https://github.com/dominodatalab/domino-research.git && cd domino-research

(1/4) Set up MLflow

A complete MLflow installation consists of 3 components:

  • The MLflow tracking / model registry server
  • A Database backend — stores run and model metadata
  • A Storage backend — stores run and model artifacts

I’ve created a docker-compose specification that runs MLflow, MySQL (as the database backend), and Minio as an S3-compatible artifact backend. You'll need Docker and docker-compose installed on your machine.

cd guides/mlflow && docker-compose up -d 

10–15 seconds after running this command, MLflow will be available on localhost:5555, with a sample model in the MLflow registry. Feel free to adapt the docker-compose file for your own use on other hosting services. If you’re interested in a hosted and managed MLflow provided by my team — see here.

(2/4) Instrument your model training code

MLflow makes it incredibly easy to instrument your training code. First, in our training script, we configure the MLflow SDK to point at the MLflow server and storage backend from the previous step:

Next, we wrap our (scikit-learn-based) training code in an MLflow run. Within this run, we log the training parameters and metrics as well as the trained model. Each run corresponds to the training of a single model version. If the script trains multiple model versions (for example, during a hyperparameter optimization), they should be split into separate runs inside the script.
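
The full script is linked below; a rough sketch of a single run, using stand-in data instead of the wine quality dataset the real script trains on, looks like this:

import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Stand-in data with 11 features, matching the shape of the wine quality dataset.
X = np.random.rand(500, 11)
y = np.random.rand(500)
train_x, test_x, train_y, test_y = train_test_split(X, y)

alpha, l1_ratio = 0.5, 0.5

with mlflow.start_run():
    # Log the input parameters for this model version.
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)

    # Train the model and log its test metric.
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
    model.fit(train_x, train_y)
    rmse = float(np.sqrt(mean_squared_error(test_y, model.predict(test_x))))
    mlflow.log_metric("rmse", rmse)

    # Log the pickled model plus its MLmodel spec, and register a new version
    # under the model name used throughout this guide.
    mlflow.sklearn.log_model(
        model, "model", registered_model_name="ScikitElasticnetWineModel"
    )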

The last line of this script is very important: it logs the trained model as a pickle AND creates an appropriate MLmodel specification file. These files allow MLflow to serve an API for this model. MLflow has built-in logging support for most common ML frameworks. If you want, you can also manually serialize your models and author a custom MLmodel file, but that is beyond the scope of this guide.

The complete training script is in the repo here. As part of standing up the MLflow registry, I have configured this training script to run a few times automatically with different parameters, so you don’t need to run it yourself to proceed with this guide.

If you want to add your own models to the MLflow registry, follow the instructions here.

(3/4) Start Bridge in Localhost mode

Execute the command below to run Bridge.

docker run -it \
-p 3000:3000 \
--network mlflow \
-e BRIDGE_DEPLOY_KIND=localhost \
-e BRIDGE_MLFLOW_URI=http://mlflow:5555 \
-e MLFLOW_S3_ENDPOINT_URL=http://minio:9000 \
-e MLFLOW_S3_IGNORE_TLS=true \
-e AWS_ACCESS_KEY_ID=AKIAIfoobar \
-e AWS_SECRET_ACCESS_KEY=deadbeef \
-e AWS_DEFAULT_REGION=us-east-2 \
quay.io/domino/bridge:latest

Bridge will start and, within a few seconds, will deploy the latest, staging, and production model versions for all the models in the MLflow model registry. When each model is first deployed, MLflow will perform a Conda install to create an environment for the model. This will take about 60 seconds. Future deploys for each model will be faster.

Try this curl command to query the production version of the model we added in step 2:

curl \
localhost:3000/ScikitElasticnetWineModel/Production/invocations \
-H 'Content-Type: application/json' \
-d '{"data":[[0.1, 0.1, 0.5, 0.66, 2, 0.6, 0.17, 8, 1.1, 1.23, 11]]}'

(4/4) Update a model in the registry

  • Visit the MLflow model registry at localhost:5555/#/models and click into the ScikitElasticnetWineModel model. You should see 3 model versions.
  • Click into a version that isn’t in Production. Click the stage transition dropdown in the top right and select Production. Confirm the transition.
  • Within a few seconds, the new model version will be deployed to the Production API. Try the same curl command from the last step and observe the difference in the results.
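
If you would rather make the stage transition from code than from the UI, the MLflow client exposes the same operation; here is a rough sketch (the version number is illustrative):

from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://localhost:5555")

# Promote a version of the model to Production. Bridge picks up the change
# and updates the Production endpoint within a few seconds.
client.transition_model_version_stage(
    name="ScikitElasticnetWineModel",
    version="2",
    stage="Production",
)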

At this point, you may be asking whether it should be this easy to change which model version is in production. The answer is yes and no. Production should be gated, but by an intentionally designed process rather than by toil and days of friction. We built 🛂 Checkpoint to add a pull request experience to MLflow, allowing team members to compare and approve versions before they are promoted to Staging and Production (at which point Bridge deploys them automatically without further effort).

Where to now?

You can run Bridge wherever you like: on your laptop, on an EC2 instance, in a Kubernetes cluster, etc. As long as it has access to your MLflow instance, it will generate and update a set of latest, staging, and production API endpoints for all of the models in the MLflow registry.

Welcome to RegistryOps 🎉
