A Practical Guide to TorchServe

Sourav Verma
Jun 24, 2022 · 4 min read

Model deployment and management is one of the most important aspects of data science engineering: it serves our models, takes care of versioning and scaling, and makes inference faster.

In this article, I’ll be talking about TorchServe, which is an open-source model serving framework for PyTorch that makes it easy to deploy trained PyTorch models performantly at scale without having to write custom code. TorchServe delivers lightweight serving with low latency, so you can deploy your models for high-performance inference. (Source: torchserve-on-aws)

Note: You can find all the relevant code in this GitHub repo.

How it works

Source: torchserve-on-aws
  • TorchServe takes a PyTorch deep learning model and wraps it in a set of REST APIs. Currently, it comes with a built-in web server that you run from the command line.
  • This command-line call takes in one or more models you want to serve, along with additional optional parameters controlling the port, host, and logging.
  • TorchServe supports running custom services to handle specific inference logic. These are covered in more detail in the custom service documentation.

TorchServe Architecture

Source: torchserve-on-aws
  • Frontend: The request/response handling component of TorchServe. This part of the serving component handles requests and responses coming from clients and manages the lifecycle of the models.
  • Model Workers: These workers run the actual inference; they are the running instances of the models.
  • Model: A model can be a script_module (a JIT-saved model) or an eager_mode model. Models can provide custom pre- and post-processing of data along with any other artifacts such as state_dicts, and can be loaded from cloud storage or from local hosts.
  • Plugins: Custom endpoints, authn/authz handlers, or batching algorithms that can be dropped into TorchServe at startup time.
  • Model Store: A directory in which all the loadable models (.mar files) exist.

Steps to serve your PyTorch models with TorchServe

1. Torch-model-archiver

TorchServe requires the model and its dependent artifacts to be packaged into a single file to be ready to serve.

torch-model-archiver is a Python package that packages these artifacts into a .mar file, which we store in our model_store/ directory.

To do this packaging, we need to run the following command:

torch-model-archiver --model-name <model-name-for-the-.mar-file> \
    --version <model-version> \
    --serialized-file <path-to-the-trained-model> \
    --handler <handler.py file, explained next> \
    --export-path <path-to-model_store/> \
    --extra-files <any-extra-files, configs, etc.>

Example of a custom handler.py file
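Below is a minimal sketch of such a handler, built on TorchServe's BaseHandler. The class name and the assumption that the client sends a JSON payload like {"data": [<numbers>]} are illustrative only; adapt the pre- and post-processing to your own model.

# handler.py -- minimal custom handler sketch.
# Assumes the client POSTs JSON of the form {"data": [<numbers>]} and the
# model outputs class logits; adapt preprocess/postprocess to your model.
import torch
from ts.torch_handler.base_handler import BaseHandler


class ModelHandler(BaseHandler):
    # BaseHandler.initialize() already loads the serialized model and sets
    # self.model and self.device, so only the data path is customized here.

    def preprocess(self, data):
        batch = []
        for row in data:
            payload = row.get("data") or row.get("body")
            if isinstance(payload, dict):  # JSON bodies arrive pre-parsed
                payload = payload.get("data")
            batch.append(torch.as_tensor(payload, dtype=torch.float32))
        return torch.stack(batch).to(self.device)

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # Return one prediction per request in the batch.
        return outputs.argmax(dim=1).tolist()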

Note: In the same model_store/, you can also archive multiple models (if you need to serve more than one) by running another torch-model-archiver command with the required argument values and a different model_handler.py file.

2. Start TorchServe

Once the artifacts are packaged, we are ready to serve the model. The serving process starts with the deployment of the TorchServe REST APIs.

These APIs are the Inference API, Management API, and Metrics API, deployed by default on localhost on ports 8080, 8081, and 8082, respectively. However, you can configure these and other parameters in a config.properties file, passed to TorchServe with the --ts-config flag.

The command to start TorchServe:

torchserve --start \
    --ncs \
    --ts-config <path-to-config.properties-file> \
    --log-config <path-to-log4j.properties-file> \
    --model-store <path-to-model_store/> \
    --models <all to deploy every model, or specific .mar file names>

Example of a custom config.properties file:
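A minimal sketch is shown below; the non-default ports are an assumption chosen to match the curl examples later in this article (the defaults are 8080, 8081, and 8082), and the paths and worker count are illustrative.

# config.properties -- minimal sketch with assumed, non-default ports.
inference_address=http://0.0.0.0:6060
management_address=http://0.0.0.0:6061
metrics_address=http://0.0.0.0:6062
model_store=model_store
load_models=all
default_workers_per_model=1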

3. Check status and config of models deployed

To check the availability of the deployed TorchServe Inference API, you can send an HTTP GET request to its ping endpoint. (The examples below assume the Inference and Management APIs were configured on ports 6060 and 6061, as in the config.properties sketch above, rather than the default 8080 and 8081.)

curl http://localhost:6060/ping

If everything goes as expected, it should output the following response:

{
  "status": "Healthy"
}

Similarly, you can also check the configuration of the deployed models by calling the TorchServe Management API:

curl http://localhost:6061/models

If everything goes as expected, it should output the following response:

{
  "models": [
    {
      "modelName": "<.mar-file-name>",
      "modelUrl": "<model>.mar"
    },
    {} <additional-entries-in-case-of-multiple-models-deployed>
  ]
}

To get the details of a deployed model:

curl http://localhost:6061/models/<one-of-the-.mar-file-name>

It should output the following response:

[
  {
    "modelName": "<.mar-file-name>",
    "modelVersion": "<model-version>",
    "modelUrl": "<.mar-file>",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "loadedAtStartup": true,
    "workers": [
      {
        "id": "9000",
        "startTime": "2022-06-24T12:52:12.170Z",
        "status": "READY",
        "memoryUsage": 3148378112,
        "pid": 6903,
        "gpu": true,
        "gpuUsage": "gpuId::0 utilization.gpu [%]::0 % utilization.memory [%]::0 % memory.used [MiB]::1640 MiB"
      }
    ]
  }
]
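The Metrics API mentioned earlier can be checked in the same way. Assuming it was configured on port 6062 (as in the config.properties sketch above):

curl http://localhost:6062/metrics

This returns Prometheus-formatted metrics such as inference request counts and latencies.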

4. Stop TorchServe

Once you are done and you no longer need TorchServe, you can gracefully shut it down with the following command:

torchserve --stop

5. Sample Prediction

curl -X POST http://localhost:6060/predictions/pt_classifier \
    -H 'Content-Type: application/json' \
    -d '{"data": <file-data>}'
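The same prediction can also be requested from Python. Here is a minimal sketch, assuming the Inference API is on port 6060, the model was archived as pt_classifier (the name used in the curl call above), and the handler accepts a simple numeric JSON payload:

# Hypothetical client call to the TorchServe Inference API.
import requests

payload = {"data": [0.1, 0.2, 0.3]}  # replace with your model's actual input
response = requests.post(
    "http://localhost:6060/predictions/pt_classifier",
    json=payload,  # sent as Content-Type: application/json
)
print(response.json())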

That’s it!

You’ve now successfully configured and deployed your PyTorch model(s) with TorchServe. 💁

Conclusion

Kudos to you for reaching this far. I hope you found this article useful.

Hopefully, I’ve covered TorchServe well enough in this article to give you a basic understanding of this wonderful tool and help you get started with it.

Please share your views, or any suggestions for further articles, in the comments section. Cheers!

Sourav Verma

  • If you enjoyed this, follow me on Medium for more such articles.
  • Connect with me on LinkedIn.
