Accelerating PyTorch Inference with Torch-TensorRT on GPUs

Jay Rodge
Published in PyTorch
Jun 16, 2022

Today, we are pleased to announce that Torch-TensorRT has been brought to PyTorch. The official repository for Torch-TensorRT now sits under the PyTorch GitHub organization, and documentation is now hosted on pytorch.org/TensorRT.

Torch-TensorRT is now an official part of the PyTorch ecosystem. The PyTorch ecosystem includes projects, tools, models and libraries from a broad community of researchers in academia and industry, application developers, and ML engineers.

Torch-TensorRT aims to provide PyTorch users with the ability to accelerate inference on NVIDIA GPUs with a single line of code.

Torch-TensorRT

NVIDIA TensorRT is an SDK for high-performance deep learning inference that delivers low latency and high throughput for inference applications across GPU-accelerated platforms running in data centers, embedded and edge devices.

Torch-TensorRT is an integration for PyTorch that leverages the inference optimizations of TensorRT on NVIDIA GPUs. With just one line of code, its simple API can deliver up to a 4x inference speedup on NVIDIA GPUs.

This integration takes advantage of TensorRT optimizations, such as FP16 and INT8 reduced precision via post-training quantization and quantization-aware training, while falling back to native PyTorch for any subgraphs that TensorRT does not support.
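The trade-off behind reduced precision can be illustrated without TensorRT at all: IEEE 754 half precision (FP16) keeps only a 10-bit mantissa, so values are rounded relative to FP32. A stdlib-only sketch, purely to show the rounding effect and unrelated to the Torch-TensorRT API:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision ('e' format)."""
    return struct.unpack("e", struct.pack("e", x))[0]

# FP16 keeps roughly 3 decimal digits of precision
print(to_fp16(3.14159))  # -> 3.140625
```

For many networks this loss of precision has little effect on accuracy while roughly halving memory traffic, which is why FP16 inference is often a large win on GPUs with dedicated reduced-precision hardware.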

Getting Started

import torch
import torch_tensorrt

trt_module = torch_tensorrt.compile(model,
    inputs = [torch_tensorrt.Input((1, 3, 224, 224))], # input shape
    enabled_precisions = {torch_tensorrt.dtype.half} # Run with FP16
)
# Save the TensorRT-embedded TorchScript module
torch.jit.save(trt_module, "trt_torchscript_module.ts")
result = trt_module(input_data) # Run inference
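To check what speedup you get on your own model, a small timing harness helps. This stdlib-only sketch is a generic pattern, not part of the Torch-TensorRT API; `fn` stands in for either the original model or the compiled `trt_module`, and on a GPU you would also call `torch.cuda.synchronize()` before reading the clock so that queued kernels are counted:

```python
import time

def benchmark(fn, inputs, warmup=3, iters=20):
    """Return the average latency of fn(*inputs) in seconds."""
    for _ in range(warmup):       # warm-up runs (JIT compilation, caches)
        fn(*inputs)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*inputs)
    return (time.perf_counter() - start) / iters

# Placeholder callable standing in for model(input_data) or trt_module(input_data)
avg_s = benchmark(lambda x: x * 2, (21,))
print(f"avg latency: {avg_s * 1e6:.1f} us")
```

Running the harness on both the eager model and the compiled module gives a like-for-like latency comparison on your own input shapes.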

Learn more about Torch-TensorRT’s features with a detailed walkthrough example here.

Summary

PyTorch is a leading deep learning framework, with millions of users worldwide. Torch-TensorRT brings high inference performance on NVIDIA GPUs to PyTorch users while preserving the ease and flexibility of PyTorch, requiring only a single line of code to use TensorRT.

Download and try the samples from the GitHub repository; the full documentation is available on pytorch.org/TensorRT.

We would greatly appreciate feedback on Torch-TensorRT: please report any issues via GitHub or the TensorRT discussion forum.
