TorchServe and TorchElastic for Kubernetes, new PyTorch libraries for serving and training models at scale

Published in PyTorch · 4 min read · Apr 21, 2020

Authors: Joe Spisak (Facebook), Aditya Bindal (AWS), Kiuk Chung (Facebook), Mike Stefaniak (AWS)

As PyTorch is used more and more in production environments, we’ve continued to see the need to provide better tools and platforms for the community to scale up training and deploy models efficiently.

Today, we are excited to introduce TorchServe (Experimental), a new open-source model-serving library under the PyTorch project. TorchServe is the result of a collaboration between Facebook and AWS engineers aimed at providing a clean, well-supported, and industrial-grade path to deploying PyTorch models for inference at scale. This library is available for free as part of the PyTorch open-source project.

We are also announcing the availability of a new Kubernetes controller, co-developed by Facebook and AWS, with tight integration to TorchElastic (Experimental), a library for fault-tolerant and elastic training in PyTorch. With the TorchElastic Kubernetes controller, developers can create fault-tolerant distributed training jobs in PyTorch using their Kubernetes clusters, including Amazon EC2 Spot Instances on Amazon Elastic Kubernetes Service (EKS).

In the rest of this post, we describe these new PyTorch libraries in detail and provide resources on how to get started.

TorchServe

Deploying machine learning models for inference at scale is not easy. Developers must collect and package model artifacts, create a secure serving stack, install and configure software libraries for prediction, create and expose APIs and endpoints, generate logs and metrics for monitoring, and manage multiple model versions on potentially multiple servers. Each of these tasks adds time and complexity and can slow down model deployment by weeks, sometimes months. Further, optimizing a serving stack for low latency online applications is still more of an art than a science. Lastly, and until now, PyTorch developers lacked a canonical and officially supported way to deploy PyTorch models. That’s why we are releasing TorchServe, the PyTorch library for deploying trained models.

Below is a simple example of how to take a trained model from torchvision and deploy it using TorchServe:

# Download a trained PyTorch model
wget https://download.pytorch.org/models/densenet161-8d451a50.pth

# Package the model for TorchServe and create a model archive (.mar) file
torch-model-archiver \
    --model-name densenet161 \
    --version 1.0 \
    --model-file examples/image_classifier/densenet_161/model.py \
    --serialized-file densenet161-8d451a50.pth \
    --extra-files examples/image_classifier/index_to_name.json \
    --handler image_classifier

mkdir model_store
mv densenet161.mar model_store/

# Start the TorchServe model server and register the DenseNet-161 model
torchserve --start --model-store model_store --models densenet161=densenet161.mar
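
Once the server is running, predictions can be requested over HTTP. The commands below are a sketch using TorchServe's default ports (8080 for inference, 8081 for management); `kitten.jpg` is a placeholder for any local JPEG you want to classify:

```shell
# Send a local image (kitten.jpg is a placeholder file name) to the
# Inference API, which listens on port 8080 by default
curl http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg

# The Management API (port 8081 by default) lists the registered models
curl http://127.0.0.1:8081/models
```

The first call returns the model's predictions as JSON; the second is handy for confirming that the `densenet161` model registered successfully.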

The experimental release of TorchServe, available today, includes:

  • Clean APIs — Support for an Inference API for predictions and a Management API for managing the model server.
  • Secure Deployment — Includes HTTPS support for secure deployment.
  • Robust model management capabilities — Allows full configuration of models, versions, and individual worker threads via command line interface, config file, or run-time API.
  • Model archival — Provides tooling to create a ‘model archive’: a process of packaging a model, its parameters, and supporting files into a single, persistent artifact. Using a simple command-line interface, you can package and export everything you need for serving a PyTorch model in a single ‘.mar’ file. This ‘.mar’ file can be shared and reused. Learn more here.
  • Built-in model handlers — Support for model handlers covering the most common use-cases (image classification, object detection, text classification, image segmentation). TorchServe also supports custom handlers.
  • Logging and Metrics — Support for robust logging and real-time metrics to monitor inference service and endpoints, performance, resource utilization, and errors. You can also generate custom logs and define custom metrics.
  • Model Management — Support for management of multiple models or multiple versions of the same model at the same time. You can use model versions to roll back to earlier versions or route traffic to different versions for A/B testing.
  • Prebuilt Images — Ready-to-go Dockerfiles and Docker images for deploying TorchServe in CPU- and NVIDIA GPU-based environments. The latest Dockerfiles and images can be found here.
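
The custom handlers mentioned above follow a preprocess → inference → postprocess pipeline. The sketch below mimics that contract in plain Python so the flow is easy to see; the class, its methods' exact signatures, and the toy "model" are illustrative stand-ins, not TorchServe's actual BaseHandler API:

```python
# Plain-Python stand-in for the handler pipeline a TorchServe custom
# handler follows: preprocess -> inference -> postprocess.
# (Illustrative only; not the real torchserve BaseHandler API.)

class ToyHandler:
    def __init__(self, labels):
        self.labels = labels  # index -> class-name mapping

    def preprocess(self, raw):
        # e.g. decode/normalize the request payload into model input
        return [float(x) for x in raw]

    def inference(self, inputs):
        # stand-in "model": pick the index of the largest input value
        return max(range(len(inputs)), key=lambda i: inputs[i])

    def postprocess(self, prediction):
        # map the raw prediction to a human-readable label
        return {"class": self.labels[prediction]}

    def handle(self, raw):
        return self.postprocess(self.inference(self.preprocess(raw)))

handler = ToyHandler(labels=["cat", "dog", "fish"])
print(handler.handle(["0.1", "0.7", "0.2"]))  # → {'class': 'dog'}
```

In a real handler, `preprocess` would decode image bytes into tensors and `inference` would call the loaded model; the three-stage separation is what lets TorchServe reuse the same serving machinery across use cases.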

Getting Started with TorchServe

  • You can get started at pytorch.org/serve with installation instructions, tutorials and docs.
  • If you have questions, please post them in the PyTorch discussion forums using the ‘deployment’ tag, or file an issue on GitHub with steps to reproduce.

Kubernetes Controller with TorchElastic Integration

With larger and larger models being trained, such as RoBERTa and Turing-NLG, the need to scale out to a distributed cluster is increasingly important. The use of preemptible instances, such as Amazon EC2 Spot Instances, to satisfy this need is a common practice. However, these preemptible instances are unpredictable by their very nature. The integration of Kubernetes and TorchElastic allows PyTorch developers to train machine learning models on a cluster of compute nodes that can change dynamically without disrupting the model training process. The built-in fault tolerance of TorchElastic can pause training at the node level when a node goes down and resume once the node is healthy again.

Additionally, using the Kubernetes controller with TorchElastic, you can run mission-critical distributed training jobs on clusters whose nodes get replaced, whether due to hardware issues or node reclamation. Training jobs can launch with only part of the requested resources and dynamically scale as resources become available, without being stopped or restarted. To take advantage of these capabilities, users specify training parameters in a simple job definition, and the Kubernetes-TorchElastic package manages the job’s life cycle.
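
This pause-and-resume behavior relies on the training application periodically checkpointing its state, so a replacement worker can pick up where the last one left off. Here is a minimal sketch of that pattern in plain Python (the file name and state layout are illustrative, not a TorchElastic convention):

```python
import json
import os

CKPT = "checkpoint.json"  # illustrative path, not a TorchElastic convention

def load_state():
    # Resume from the last checkpoint if one exists, else start fresh
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {"epoch": 0, "loss": None}

def save_state(state):
    # Write to a temp file and rename, so a crash mid-write
    # cannot corrupt the checkpoint
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CKPT)

def train(max_epochs):
    state = load_state()
    while state["epoch"] < max_epochs:
        # stand-in for a real training step
        state["loss"] = 1.0 / (state["epoch"] + 1)
        state["epoch"] += 1
        save_state(state)  # if the node dies here, a new worker resumes from this point
    return state
```

If the process is killed after, say, epoch 3 and restarted, `train` reloads the checkpoint and continues from epoch 3 rather than epoch 0, which is exactly what lets workers come and go on preemptible instances.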

Below is a simple example of a TorchElastic configuration for an ImageNet training job:

apiVersion: elastic.pytorch.org/v1alpha1
kind: ElasticJob
metadata:
  name: imagenet
  namespace: elastic-job
spec:
  rdzvEndpoint: $ETCD_SERVER_ENDPOINT
  minReplicas: 1
  maxReplicas: 2
  replicaSpecs:
    Worker:
      replicas: 2
      restartPolicy: ExitCode
      template:
        apiVersion: v1
        kind: Pod
        spec:
          containers:
            - name: elasticjob-worker
              image: torchelastic/examples:0.2.0rc1
              imagePullPolicy: Always
              args:
                - "--nproc_per_node=1"
                - "/workspace/examples/imagenet/main.py"
                - "--arch=resnet18"
                - "--epochs=20"
                - "--batch-size=32"
                - "/workspace/data/tiny-imagenet-200"
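
Assuming the controller and the `elastic-job` namespace are already set up and an etcd endpoint is available for rendezvous, a job definition like this is submitted with standard kubectl commands; `imagenet.yaml` below is a placeholder file name for the manifest:

```shell
# Submit the ElasticJob manifest (imagenet.yaml is a placeholder file name)
kubectl apply -f imagenet.yaml

# Watch worker pods come and go as the cluster scales between
# minReplicas and maxReplicas
kubectl get pods -n elastic-job -w
```

The controller then creates worker pods within the `minReplicas`/`maxReplicas` bounds and restarts or replaces them according to the `restartPolicy`, without restarting the job as a whole.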

Getting Started with TorchElastic on Kubernetes

  • Learn more about the Kubernetes Controller design here.
  • Full docs and tutorials can be found here.

Cheers!

Joe, Aditya, Kiuk & Mike
