Ray & MLflow: Taking Distributed Machine Learning Applications to Production

Distributed Computing with Ray · Jan 13, 2021

By Amog Kamsetty and Archit Kulkarni

In this blog post, we’re announcing two new integrations between Ray and MLflow: Ray Tune+MLflow Tracking and Ray Serve+MLflow Models. Together they make it much easier to build machine learning (ML) models and take them to production.

These integrations are available in the latest Ray wheels. Follow the Ray installation instructions to pip install the nightly version of Ray, and take a look at the documentation to get started. They will also be in the next Ray release, version 1.2.

Our goal is to leverage the strengths of the two projects: Ray’s distributed libraries for scaling training and serving and MLflow’s end-to-end model lifecycle management.

What problem are these tools solving?

Let’s first take a brief look at what these libraries can do before diving into the new integrations.

Ray Tune scales Hyperparameter Tuning

With ML models increasing in size and training times, running large-scale ML experiments on a single machine is no longer feasible. It’s now a necessity to distribute your experiment across many machines.

Ray Tune is a library for executing hyperparameter tuning experiments at any scale and can save you tens of hours in training time.

With Ray Tune you can:

  • Launch a multi-node hyperparameter sweep in <10 lines of code
  • Use any ML framework such as PyTorch, TensorFlow, MXNet, or Keras
  • Leverage state-of-the-art hyperparameter optimization algorithms such as Population Based Training, HyperBand, or Asynchronous Successive Halving (ASHA)
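As a rough illustration of how compact such a sweep can be, here is a minimal sketch. It assumes the Ray 1.x-era function API (`tune.run` and `tune.report`); the toy objective `validation_loss`, the function names, and the search range are made up for this example and are not from the Ray docs.

```python
def validation_loss(lr, step):
    """Toy stand-in for a real validation loss (illustrative assumption):
    it shrinks as training proceeds and is lowest near lr = 0.1."""
    return (lr - 0.1) ** 2 + 1.0 / (step + 1)


def run_sweep(num_samples=20):
    # Imported lazily so the toy objective above can be used without Ray installed.
    from ray import tune
    from ray.tune.schedulers import ASHAScheduler

    def trainable(config):
        for step in range(10):
            # Report one result per step so ASHA can stop weak trials early.
            tune.report(loss=validation_loss(config["lr"], step))

    return tune.run(
        trainable,
        config={"lr": tune.loguniform(1e-4, 1e-1)},  # sampled once per trial
        num_samples=num_samples,
        scheduler=ASHAScheduler(metric="loss", mode="min"),
    )
```

Calling `run_sweep()` on a laptop uses local cores. To spread the same sweep over a running Ray cluster, you would typically call `ray.init(address="auto")` before `tune.run`.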

Ray Serve scales Model Serving

After developing your machine learning model, you often need to deploy it to actually serve prediction requests. However, ML models are often compute-intensive and require scaling out to distributed systems in real deployments.

Ray Serve is an easy-to-use scalable model serving library that:

  • Simplifies model serving using GPUs across many machines so you can meet production uptime and performance requirements.
  • Works with any ML framework, such as PyTorch, TensorFlow, MXNet, or Keras.
  • Provides a programmatic configuration interface (no more YAML or JSON!).

MLflow tames end-to-end Model Lifecycle Management

The components of MLflow — taming end-to-end ML lifecycle management.

Ray Tune and Ray Serve make it easy to distribute your ML development and deployment, but how do you manage this process? This is where MLflow comes in.

During experiment execution, you can leverage MLflow’s Tracking API to keep track of the hyperparameters, results, and model checkpoints of all your experiments, as well as easily visualize and share them with other team members. And when it comes to deployment, MLflow Models provides standardized packaging to support deployment in a variety of different environments.
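To make the Tracking API concrete, here is a minimal sketch of logging a single run by hand with MLflow's fluent API. The experiment name, the helper `summarize_run`, and the metric names are hypothetical choices for this example.

```python
def summarize_run(losses):
    """Collapse a per-step loss curve into summary metrics (illustrative helper)."""
    return {"final_loss": losses[-1], "best_loss": min(losses)}


def log_run(params, losses, experiment_name="ray-mlflow-demo"):
    # Imported lazily so the helper above works without MLflow installed.
    import mlflow

    # Creates the experiment if it does not exist yet, then makes it active.
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run():
        mlflow.log_params(params)  # hyperparameters, e.g. {"lr": 0.01}
        for step, loss in enumerate(losses):
            mlflow.log_metric("loss", loss, step=step)  # per-step curve
        mlflow.log_metrics(summarize_run(losses))  # summary values
```

With no tracking server configured, MLflow writes to a local directory by default, and `mlflow ui` lets you browse the result.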

Key Takeaways

Together, Ray Tune, Ray Serve, and MLflow remove the scaling and managing burden from ML engineers, allowing them to focus on the main task: building ML models and algorithms.

Let’s see how we can leverage these libraries together.

Ray Tune + MLflow Tracking

Ray Tune integrates with the MLflow Tracking API to easily record information from your distributed tuning run to an MLflow server.

There are two APIs for this integration: an MLflowLoggerCallback and an mlflow_mixin.

With the MLflowLoggerCallback, Ray Tune will automatically log the hyperparameter configuration, results, and model checkpoints from each run in your experiment to MLflow.
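A minimal sketch of the callback in use might look like the following. It assumes the Ray 1.2-era import path `ray.tune.integration.mlflow` (later Ray releases moved this module), and the search space, experiment name, and toy loss are hypothetical.

```python
def search_space():
    """Hypothetical search space, kept Ray-free so it is easy to inspect."""
    return {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64]}


def run_logged_sweep(tracking_uri=None, experiment_name="tune_mlflow_demo"):
    # Assumption: Ray 1.2-era module layout; imported lazily on purpose.
    from ray import tune
    from ray.tune.integration.mlflow import MLflowLoggerCallback

    def trainable(config):
        for step in range(5):
            # Toy loss; every reported result is forwarded to MLflow by the callback.
            tune.report(loss=(config["lr"] - 0.01) ** 2 + 1.0 / (step + 1))

    space = {name: tune.grid_search(values) for name, values in search_space().items()}
    return tune.run(
        trainable,
        config=space,
        callbacks=[
            MLflowLoggerCallback(
                tracking_uri=tracking_uri,  # None falls back to MLflow's default
                experiment_name=experiment_name,
                save_artifact=True,  # also upload each trial's output directory
            )
        ],
    )
```

Note that the training function itself contains no MLflow calls; the callback handles logging on the Tune side.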

You can see below that Ray Tune runs many different training runs, each with a different hyperparameter configuration, all in parallel. These runs all appear in the MLflow UI, where you can visualize any of your logged metrics. When the MLflow tracking server is remote, others can even access the results of your experiments and artifacts.

Ray Tune executes many different training runs all in parallel as shown in the console output. These runs are logged to MLflow and can be seen from the MLflow UI as well. On the MLflow UI, each run has all the metrics, hyperparameters, and artifacts logged.

If you want to manage what information gets logged yourself rather than letting Ray Tune handle it for you, you can use the mlflow_mixin API.

Add a decorator to your training function to call any MLflow methods inside the function:
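The snippet below is a reconstruction sketch rather than the original embedded example. It assumes the Ray 1.2-era `mlflow_mixin` decorator from `ray.tune.integration.mlflow`, configured through an `"mlflow"` entry in the Tune config; later Ray releases replaced this API. The toy loss and the experiment name are hypothetical.

```python
def toy_loss(lr, step):
    """Illustrative objective: lowest near lr = 0.01, improving with each step."""
    return (lr - 0.01) ** 2 + 1.0 / (step + 1)


def run_with_mixin(experiment_name="mixin_demo", num_samples=4):
    # Assumption: Ray 1.2-era API; imported lazily so toy_loss stays dependency-free.
    import mlflow
    from ray import tune
    from ray.tune.integration.mlflow import mlflow_mixin

    # The mixin is assumed to expect an existing experiment, so create it up front.
    mlflow.set_experiment(experiment_name)

    @mlflow_mixin
    def train_fn(config):
        for step in range(5):
            loss = toy_loss(config["lr"], step)
            mlflow.log_metric("loss", loss, step=step)  # explicit MLflow call
            tune.report(loss=loss)  # still report to Tune for scheduling

    return tune.run(
        train_fn,
        config={
            "lr": tune.loguniform(1e-4, 1e-1),
            # Read by the mixin to set up an MLflow run for each trial.
            "mlflow": {
                "experiment_name": experiment_name,
                "tracking_uri": mlflow.get_tracking_uri(),
            },
        },
        num_samples=num_samples,
    )
```

Here you decide exactly what gets logged, at the cost of writing the MLflow calls yourself.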

The mixin API also allows you to leverage MLflow automatic logging if you are using any supported framework such as XGBoost, PyTorch Lightning, Spark, Keras, or many more.

You can check out the documentation here for full runnable examples and more information.

Ray Serve + MLflow Models

MLflow models can be conveniently loaded as Python functions, which means that they can be served easily using Ray Serve. The desired version of your model can be loaded from a model checkpoint or from the MLflow Model Registry by specifying its Model URI. Here’s how this looks:
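What follows is a hedged sketch, not the original embedded example. It wraps `mlflow.pyfunc.load_model` in a callable class and registers it using the Ray 1.2-era Serve client API (`serve.start()`, `create_backend`, `create_endpoint`), which later Ray releases replaced. The class name, backend and endpoint names, route, and request format are all assumptions.

```python
class MLflowBackend:
    """Callable wrapper around an MLflow pyfunc model (name is illustrative)."""

    def __init__(self, model_uri, load_model=None):
        if load_model is None:
            # Lazy import: only needed when a real model is loaded.
            import mlflow.pyfunc
            load_model = mlflow.pyfunc.load_model
        self.model = load_model(model_uri)

    def predict(self, rows):
        return self.model.predict(rows)

    async def __call__(self, request):
        # Assumption: a Starlette request whose JSON body is a list of records,
        # and a model whose predict() returns a NumPy array or pandas Series.
        import pandas as pd
        rows = pd.DataFrame(await request.json())
        return {"predictions": self.predict(rows).tolist()}


def deploy(model_uri):
    # Assumption: Ray 1.2-era Serve client API.
    import ray
    from ray import serve

    ray.init()
    client = serve.start()
    # Extra positional arguments are passed to MLflowBackend.__init__.
    client.create_backend("mlflow_backend", MLflowBackend, model_uri)
    client.create_endpoint(
        "mlflow_endpoint", backend="mlflow_backend", route="/predict", methods=["POST"]
    )
    return client
```

The `model_uri` can be a local checkpoint path or a registry URI of the form `models:/<name>/<version-or-stage>`. The optional `load_model` argument exists only so the wrapper can be exercised with a stub instead of a real model.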

While the above strategy gives you the full feature set of Ray Serve for scaling and serving, an upcoming Ray Serve MLflow deployment plugin, which implements the MLflow deployments API (including a command-line interface), will make integrating Ray Serve into your MLflow workflow even more seamless. Stay tuned for this feature.

Conclusion and Outlook

Using Ray with MLflow makes it much easier to build distributed ML applications and take them to production. Ray Tune+MLflow Tracking deliver much faster and more manageable development and experimentation, while Ray Serve+MLflow Models simplify deploying your models at scale.

What’s Next

Give this integration a try by installing the latest Ray nightly wheels and running pip install mlflow. Also, stay tuned for a future deployment plugin that further integrates Ray Serve and MLflow Models.


Credits

Thanks to the respective Ray and MLflow team members from Anyscale and Databricks: Richard Liaw, Kai Fricke, Eric Liang, Simon Mo, Edward Oakes, Michael Galarnyk, Jules Damji, Sid Murching and Ankit Mathur.

