MLflow and PyTorch — Where Cutting Edge AI meets MLOps

Nov 12, 2020 · 6 min read


Authors: Geeta Chauhan, PyTorch Partner Engineering Lead and Joe Spisak, PyTorch Product Lead at Facebook

PyTorch has continued to evolve rapidly since the introduction of PyTorch 1.0, which brought an accelerated workflow from research to production. Judging by the momentum in research, the research community has embraced PyTorch as its tool of choice. At the same time, enterprise and production users such as Lyft Level 5, Microsoft, AstraZeneca and many others are realizing real product value by deploying PyTorch at scale. MLOps and end-to-end lifecycle management of machine learning form an important bridge between these communities, and the combination of state-of-the-art AI developed on PyTorch and MLOps is a powerful force that will bring the impact of cutting-edge research to products.

Today, we are announcing a number of technical contributions to enable end-to-end support for MLflow usage with PyTorch, including support for: autologging via PyTorch Lightning; TorchServe integration through a new deployment plug-in; and a sample end-to-end workflow targeting HuggingFace Transformers. This work, a collaboration between the Databricks technical team, the MLflow core maintainers and the PyTorch core development team at Facebook, is just the beginning. We expect more technical contributions in the coming months.

The Rise of MLOps

Per Wikipedia, MLOps is defined as:

A compound of “machine learning” and “operations”, refers to the practice for collaboration and communication between data scientists and operations professionals to help manage production ML (or deep learning) lifecycle.

MLOps covers how users manage models across the phases of the lifecycle, including model development, A/B testing, continuous integration/delivery, monitoring, etc. During the early stages of the AI research boom over the last decade, these were not generally areas of concern for the community unless you were running at some level of scale. With the proliferation of ML and the product impact that is now being felt, MLOps is front and center, and its communities are active and highly engaged. However, there are still challenges to address and work to be done to make the research-to-production paradigm more broadly available.

Some of the challenges we’ve observed:

  1. Toolchain diversity: The workflow for ML spans data preparation and analysis, model training, optimization, deployment and so much more. The space for tools is fragmented and ever evolving, much like ML itself.
  2. Tracking experiments is difficult: Inherent in developing ML models is the process of experimentation. By extension, developing a model that meets a particular set of product requirements can require a combinatorial explosion of parameters, code versions, data and models trained.
  3. Reproducibility is elusive: Given the number of variables and sources of randomness within ML systems, it is hard to reproduce results. Generally speaking, getting the same code to work again, debugging problems or just sharing reproducible workflows continues to be a challenge.
  4. Deployment still isn’t easy: The environment for production is fragmented and diverse, spanning batch inference, streaming inference and on-device inference, and there is generally no standard way to deploy in a low-risk manner.
  5. Continuous, iterative process: Unlike traditional software development, building ML models and managing them is a continuous, iterative process. Constant experimentation is required to optimize for metrics such as accuracy. Models can drift over time due to changes in data and user behaviour, so a continuous training/retraining/feedback loop is required for good results over time.
  6. Cloud independence: Managed services are great but also have a few downsides. These include a potential lack of customization and flexibility, as well as vendor lock-in. When a product starts to hit a level of scale, these factors grow in importance.

MLflow + PyTorch = 🧡

As a first step toward enabling an end-to-end exploration-to-production platform for PyTorch, we partnered with the MLflow community to integrate the projects more tightly. This included adding support for autologging (logging metrics, parameters, and models without the need for explicit log statements) and integrating a TorchServe model serving deployment plug-in to support performant model inference. To tie everything together, we are also providing an end-to-end NLP workflow based on Hugging Face Transformers that users can leverage to get started.

Additionally, support is available for TorchScript, a way to serialize and optimize models for deployment in a Python-free process, and for distributed training, which enables larger and faster model development. The next section will walk through some of the details of each of these areas.

Figure 1: MLflow + PyTorch

Autologging


Autolog enables ML model builders to automatically log and track parameters and metrics from PyTorch models in MLflow. We used PyTorch Lightning as the training loop to add support for autologging based on best practices for core model metrics logging and tracking of MLflow experiments. The Autolog feature automatically logs parameters such as optimizer names and learning rates; metrics such as training loss, validation loss and accuracies; and models in the form of artifacts and checkpoints. When you use early stopping, the model checkpoints, early stopping parameters and metrics are also logged. Users also have the flexibility to log their own custom metrics, such as F1 score.

Model Artifacts & TorchScript Support

You can now save and load PyTorch models in both eager and TorchScript modes, and save additional model artifacts, such as vocabulary files for NLP models, alongside them.

TorchScript is a statically typed subset of the Python language optimized for ML applications, and it is what we recommend for production model serving. As part of the optimization phase of getting your models ready for production, you convert them to TorchScript format and then deploy them. You can now save and load TorchScripted models using MLflow.

All the artifacts are saved as part of the MLflow MLmodel bundle, which makes it easy to manage the model as it moves through the Development, Staging and Production stages in the MLflow Model Registry.

Deployment via TorchServe

With the MLflow TorchServe plugin, users can now get the complete MLOps lifecycle down to the serving of models.

TorchServe is a PyTorch model serving library that accelerates the deployment of PyTorch models at scale, with support for multi-model serving, model versioning, A/B testing and model metrics. It comes with default handlers for common use cases like object detection and text classification, and a model zoo to help you get started quickly.

You have the option of using either the command line interface or the Python API to create and manage deployments and run inference against them. The plugin supports both local and remote installations of TorchServe, allowing you to have an optimal workflow for different stages of your development lifecycle.

MLprojects for reproducible runs

All the samples come with the familiar MLflow Projects packaging, making it easy for anyone to get started with reproducible runs.

Distributed Training

You can use the same familiar flow for large models that require distributed training. Full workflows using PyTorch Distributed Data Parallel (DDP) training have been provided to make it easy for your team to get started.
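The provided workflows are the place to look for a complete setup. As a reminder of the underlying mechanism, the sketch below runs a single Distributed Data Parallel training step on CPU with the `gloo` backend. It assumes only that `torch` is installed; `find_free_port` and `run_ddp_step` are hypothetical helper names, and the model and data are random placeholders.

```python
import socket

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def find_free_port():
    """Ask the OS for an unused TCP port for the rendezvous."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def run_ddp_step(rank, world_size, port):
    """Join the process group, run one optimizer step, and return the loss."""
    dist.init_process_group(
        backend="gloo",  # CPU-friendly backend; GPU jobs usually use "nccl"
        init_method=f"tcp://127.0.0.1:{port}",
        rank=rank,
        world_size=world_size,
    )
    try:
        torch.manual_seed(0)
        # Wrapping the module makes backward() average gradients across ranks.
        model = DDP(torch.nn.Linear(4, 2))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        inputs, targets = torch.randn(8, 4), torch.randn(8, 2)
        loss = torch.nn.functional.mse_loss(model(inputs), targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    # Single-process demo; real jobs launch one process per device.
    print(run_ddp_step(rank=0, world_size=1, port=find_free_port()))
```

In a multi-process job, each worker would call `run_ddp_step` with its own rank and a shared port. A common pattern is to log to MLflow from rank 0 only, to avoid duplicate entries.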

MLOps Workflow for PyTorch + MLflow + TorchServe

Putting all this together, here is what the full MLOps workflow looks like for building PyTorch models and deploying them through the TorchServe plugin, using MLflow's full MLOps lifecycle management.

Figure 2: Full MLOps workflow for building and serving PyTorch models using MLflow

What’s next?

In the near future, watch out for new features and examples related to:

  • End-to-end workflows for NLP, computer vision
  • Support for hyperparameter optimization
  • Programmatic interpretability using Captum
  • Support for a feature store

Check out the HF BERT News Classification example to see things in action.


We’d like to thank Sid Murching, Harutaka Kawamura, Paul Olgivie, Matei Zaharia, William Falcon, Sean Naren, Srikanth Suresh, Karthik Sundarman, Ankan Ghosh, Jules Damji and Hamid Shojanazeri, without whom this work would not have been possible.


Team PyTorch



