Interoperability is holding back machine learning infrastructure

The explosion of machine learning applications across industry use-cases has resulted in an equally explosive growth of deep learning tooling. At nearly every level of the technology stack, from hardware (AWS Inferentia, Google TPU, Habana Labs) to infrastructure platforms (Ray, Kubeflow, Determined AI), teams are competing to be at the forefront of the ML adoption curve. One major challenge of this fast-moving landscape is achieving framework interoperability: the ability to support any Python machine learning framework such as TensorFlow, PyTorch, MXNet, XGBoost, or scikit-learn.

The machine learning framework ecosystem is a confusing web, with all paths leading to Python.

In this post, I’ll use machine learning serving infrastructure as a lens to examine how this requirement can lead to frustrating user experiences, and why we’re taking a different approach with Model Zoo.

The evolution of deep learning frameworks

As an example of just how quickly libraries are moving, let’s consider the fast-moving landscape of deep learning application frameworks in the past decade:

(2011 ~ 2015): C++, CUDA, and customized frameworks.

AlexNet was one of the most influential works in computer vision in 2012, setting off a trend of applying CNNs and GPUs to computer vision problems that has lasted nearly a decade. Alex Krizhevsky wrote his code in C++ / CUDA, which was typical of ML practitioners at the time.

(2015 ~ present): Differentiable programming

In 2015, Google released the public version of TensorFlow, which it was already using to power internal deep learning applications in production. Google didn’t invent this category (Torch, Caffe, and Theano all already existed), but the backing of an industrial player with deep pockets quickly made TensorFlow a leader in the space. Nearly every major tech player followed with their own framework: Amazon with MXNet, Microsoft with CNTK, and Facebook with PyTorch. This category is still evolving: newer frameworks like JAX (2018) continue to gain popularity.

(2015 ~ present): Ease of Use and Inversion of Control

Around the same time as the second wave, Keras came onto the scene with a new approach: value simple APIs over expressiveness. Keras wasn’t trying to replace TensorFlow’s capabilities, just to put them behind a simpler API. Keras used inversion of control to reduce boilerplate: you don’t write a training loop, you just call model.fit() (see the sketch below). PyTorch Lightning is another great example of a framework taking an ease-of-use-centric approach: trim the boilerplate and let the framework handle the main control loop while the user focuses on the task-specific code.

Keras wraps layers of complexity to give an easy-to-use experience for users.
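
To make the inversion of control concrete, here is a minimal sketch of a Keras workflow on toy data: the user declares a model and calls fit(); the framework owns the loop over epochs, batches, and gradient updates.

import numpy as np
from tensorflow import keras

# Toy data: 100 examples with 8 features each, binary labels.
x = np.random.rand(100, 8).astype("float32")
y = np.random.randint(0, 2, size=(100, 1)).astype("float32")

# Declare the model; no training loop is written anywhere.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Inversion of control: fit() runs the batching, epochs, and backprop.
model.fit(x, y, epochs=3, batch_size=32)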

(2019 ~ present) Task-Specific Frameworks

In 2020, deep learning frameworks are becoming task-specific. There are now well-established tasks that power specific business use-cases, ranging from object detection to speech recognition to machine translation. As such, frameworks devoted to specific subsets of tasks have sprung up and are quickly gaining steam. Hugging Face Transformers has become a rising star for anyone experimenting with transformer-based architectures on NLP tasks. Its advantage is that it abstracts away the task-specific pieces that are commonly shared across NLP: tokenization, pre-trained weights, and common architectures such as BERT. Here is an example of using the library to do sentiment analysis in three lines of code:

from transformers import pipeline
sentiment_analyzer = pipeline("sentiment-analysis")
sentiment = sentiment_analyzer("This is the easiest framework to use")
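
The call returns a list with one dictionary per input, something like [{'label': 'POSITIVE', 'score': 0.9998}].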

It’s not just happening in NLP. If you’re working on a well-defined task for a business use-case, it is rarely a good idea to start writing TensorFlow or PyTorch code from scratch. Working on image segmentation? https://github.com/facebookresearch/detectron2 is probably a good place to start. Working with graph neural networks? Better take a look at https://github.com/rusty1s/pytorch_geometric.

Why is this a problem for machine learning platforms?

When designing for a fast-moving landscape such as this one, a key challenge becomes interoperability. Developers don’t want to be locked into only using TensorFlow, or PyTorch, or the next framework that comes along. Machine learning infrastructure platforms today (open-source and proprietary) tend to support this by defining their interface at the lowest common denominator: Python code. Want to train a model on AWS SageMaker? Your first task is to create a Python training script that conforms to their expected constraints. Want to deploy one? Wrap Python inference code in a Docker image that runs a web server at a pre-arranged port (a sketch follows below). Machine learning platforms like SageMaker have become glorified Python workflow management engines: they give you the flexibility to move between any framework you’d like, but at the cost of error-prone integrations into your framework of choice.
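
As a rough illustration, here is a minimal sketch of what such an inference wrapper can look like, assuming Flask and the container contract SageMaker uses (a GET /ping health check and a POST /invocations prediction route on port 8080). The model-loading and prediction logic are stand-ins, not a real framework API:

import json
from flask import Flask, Response, request

app = Flask(__name__)

def load_model():
    # Stand-in for framework-specific loading, e.g. torch.load() or
    # tf.keras.models.load_model(); returns a trivial callable here.
    return lambda rows: [sum(row) for row in rows]

model = load_model()

@app.route("/ping", methods=["GET"])
def ping():
    # Health check: the platform polls this route to decide the container is live.
    return Response(status=200)

@app.route("/invocations", methods=["POST"])
def invocations():
    # Deserialize the request, run the model, and serialize the response.
    features = json.loads(request.data)
    predictions = model(features)
    return Response(json.dumps({"predictions": predictions}),
                    mimetype="application/json")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)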

Let’s dive into an example by taking a closer look at a common use-case today: infrastructure for supporting server-side real-time predictions via network requests.

Interoperability in Machine Learning Serving Infrastructure

Framework-Agnostic System

Systems like SageMaker require a “productionizing” step for model serving.

Incumbent machine learning serving platforms require the user to “productionize” their model into a Docker image or an inference package of Python code. This level of interface can work with any model you choose, but it adds an expensive step to the machine learning development cycle, one that requires different tooling and a different skillset than model development. In large organizations where different people are responsible for model development and model production, the handoff process can lead to significant delays in getting the model into production. In one instance, I spoke to a model developer who spent weeks developing a model in PyTorch, only to find out during “productionizing” that their backend engineers only knew how to support TensorFlow.

Standards-Based Approach

With standards, the “productionizing” step is removed.

We decided to take a different approach when designing the interface for Model Zoo. Instead of a framework-agnostic layer that accepts arbitrary Python, we provide a streamlined deployment API: the platform takes on the complexity of “productionizing” for users. It does this by relying on a mixture of framework-agnostic standards (MLflow and Open Neural Network Exchange (ONNX)) and framework-specific formats (PyTorch TorchScript, TensorFlow SavedModel, and Hugging Face Pipelines) under the hood. The Model Zoo client library deploys models at the framework level, and the production step becomes an implementation detail:

import modelzoo.transformers
from transformers import pipeline
# Initialize a Hugging Face Transformers pipeline.
text_generator = pipeline("text-generation")
# Deploy with one function.
model_name = modelzoo.transformers.deploy(text_generator, "text-generator", version=1)
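
To make the standards half of this concrete, here is a small sketch of exporting a PyTorch model to ONNX, the kind of portable, framework-agnostic artifact a platform can serve without any user-supplied Python. The toy model is just an assumption for illustration:

import torch

# A toy model standing in for whatever the user trained.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 1),
)
model.eval()

# ONNX export traces the model on an example input and writes a
# framework-agnostic graph that standard runtimes can serve.
example_input = torch.randn(1, 4)
torch.onnx.export(model, example_input, "model.onnx")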

This deployment interface results in an experience that we hope will feel more native to model developers:

  1. Deploy from anywhere you run your Python, such as a Jupyter notebook or your favorite IDE.
  2. No need to write and test production-specific Python code. The model artifacts defined at model development time encapsulate all the logic required for production.
  3. No fumbled handoffs between the model development team and model production team. The interface is defined at the level of model development APIs.
  4. Easily integrate an automated deployment process into a training pipeline by simply adding a deploy() line at the end of your training script (sketched below).
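
For example, the tail of a training pipeline might look like the following sketch. Note that modelzoo.tensorflow.deploy() is an assumption here, made by analogy with the modelzoo.transformers.deploy() call above; see the quickstarts for the exact API.

import numpy as np
from tensorflow import keras
import modelzoo.tensorflow  # hypothetical module, by analogy with modelzoo.transformers

# Train a toy model (stands in for your real training code).
x = np.random.rand(256, 4).astype("float32")
y = (x.sum(axis=1) > 2.0).astype("float32")
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=5)

# The single production-specific line in the script.
model_name = modelzoo.tensorflow.deploy(model, "toy-classifier", version=1)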

Try out one of our quickstarts using TensorFlow or Hugging Face Transformers today; it’s free for up to three models! Sign up for our private beta to see if you qualify for unlimited access.

I work on machine learning infrastructure at Model Zoo. Previously at Determined AI and Google.