Yolact++ Pytorch to ONNX & TensorRT

Abhishek Agrawal · Aug 7, 2022

Yolact++ is a very efficient model for object detection and instance segmentation, but the model also has to go to production in the most efficient way.

Yolact Output Example

The general process of deploying a PyTorch model is to serve its TensorRT engine. There are two ways to get a TensorRT engine from PyTorch.

Method 1 — Torch to TensorRT

- Use tools like torch-tensorrt, which rely on Torch's scripting mode to capture the model.

Method 2 — Torch-ONNX-TensorRT

- Convert the Torch model to an ONNX model using Torch's tracing mode, then build the TensorRT engine from it.

Tracing a PyTorch model with trace mode is easier than scripting it: in script mode the compiler tries to capture dynamic ops as well, which demands a lot of changes to the way the code is written. So we decided to get the TensorRT model by taking the path of the ONNX representation.
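
To make the difference concrete, here is a toy sketch (not Yolact++ code) of the two capture modes: tracing records the ops executed for one example input, while scripting compiles the Python source, including data-dependent control flow.

import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        # data-dependent branch: tracing bakes in whichever path this
        # example input takes, scripting keeps the `if` in the graph
        if x.sum() > 0:
            return x * 2
        return x - 1

m = Toy()
example = torch.ones(2, 3)
traced = torch.jit.trace(m, example)   # tracing mode (emits a TracerWarning here)
scripted = torch.jit.script(m)         # scripting mode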

What made Yolact++ deployment challenging?

In general, getting the TensorRT model is not very straightforward, yet with a few hacks one can make it work. Yolact++ uses a custom layer called DCNv2 in its architecture, which made things difficult for us because it is not supported out of the box by Torch, ONNX, ONNX Runtime or TensorRT. So, in order to support DCNv2, one needs to add some .cpp and .h files to ONNX Runtime to enable inference of the ONNX model. To create and run the TensorRT engine, one also needs to add TensorRT plugin files to the TensorRT source and compile them.

Setup and Installations

So, here is the good news: MMCV supports DCNv2 as ModulatedDeformConv2dPack inside mmcv.ops. With this, DCNv2 did not create any further trouble in exporting to ONNX, but now we need some patience to support this custom op in ONNX Runtime and TensorRT.

Tricks to perform during the ONNX export:

  1. Set use_jit=False.
  2. Replace the DCNv2 import with ModulatedDeformConv2d.
  3. If you replace DCNv2 with ModulatedDeformConv2d after training, you will have to slightly modify the pretrained-weights loading part (a rough sketch follows below).
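
A rough sketch of tricks 2 and 3; the layer arguments, checkpoint filename and key rename here are assumptions, so check them against your own code and weights.

from mmcv.ops import ModulatedDeformConv2dPack

# Trick 2: in the Yolact++ backbone, construct mmcv's layer instead of DCNv2,
# along the lines of (arguments are illustrative):
#   self.conv = ModulatedDeformConv2dPack(in_ch, out_ch, kernel_size=3,
#                                         stride=1, padding=1, deform_groups=1)

# Trick 3: remap pretrained DCNv2 parameter names to the names
# ModulatedDeformConv2dPack expects (the exact renames depend on your checkpoint).
def remap_dcn_keys(state_dict):
    return {k.replace('conv_offset_mask', 'conv_offset'): v
            for k, v in state_dict.items()}

# state_dict = torch.load('yolact_plus_weights.pth', map_location='cpu')  # hypothetical file
# model.load_state_dict(remap_dcn_keys(state_dict), strict=False)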

Open-mmlab has created the MMDeploy library for deploying models and ops supported by the open-mmlab libraries. All the required .cpp, .cu, .h and .hpp files are in the MMDeploy source; we just need to carefully follow the process mentioned here.

Please do not blindly follow the link; first check which CUDA version your system currently has. MMDeploy supports both TensorRT 7 and TensorRT 8.

I did follow the above link with slight modifications. I will mention the exact commands I used.

My setup specification:
- Ubuntu 18.04
- CUDA 11.3
- TensorRT source: TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
- cuDNN source: cudnn-11.3-linux-x64-v8.2.1.32.tgz

Step 1 — Create conda environment and install torch and mmcv-full

conda create -n mmdeploy python=3.7 -y
conda activate mmdeploy
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
export cu_version=cu113
export torch_version=torch1.12  # this can change for you and in the future as well, so please check
pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/${cu_version}/${torch_version}/index.html

Get ONNX Runtime ready to compile with MMDeploy:

pip install onnxruntime==1.8.1
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
cd onnxruntime-linux-x64-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH

Get TensorRT ready to compile with MMDeploy. TensorRT requires cuDNN as well:

a. TensorRT

cd /the/path/of/tensorrt/tar/gz/file
tar -zxvf TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
pip install TensorRT-8.0.3.4/python/tensorrt-8.0.3.4-cp37-none-linux_x86_64.whl
export TENSORRT_DIR=$(pwd)/TensorRT-8.0.3.4
export LD_LIBRARY_PATH=$TENSORRT_DIR/lib:$LD_LIBRARY_PATH
pip install pycuda

b. CuDNN

cd /the/path/of/cudnn/tgz/file
tar -zxvf cudnn-11.3-linux-x64-v8.2.1.32.tgz
export CUDNN_DIR=$(pwd)/cuda
export LD_LIBRARY_PATH=$CUDNN_DIR/lib64:$LD_LIBRARY_PATH

Finally, build and install MMDeploy from source with ONNX Runtime and TensorRT support:

git clone https://github.com/open-mmlab/mmdeploy
cd mmdeploy
export MMDEPLOY_DIR=$(pwd)
# For ONNX inference
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=ort -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR} ..
make -j$(nproc) && make install
# For TensorRT inference
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc) && make install
# Install the model converter
cd ${MMDEPLOY_DIR}
pip install -e .
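
Before moving on, it is worth checking that the build actually produced the custom-op libraries and that the converter imports. A minimal sanity check, assuming the default build layout (the build/lib path and library names are assumptions; adjust to your build output):

import os

import mmdeploy

print('mmdeploy version:', mmdeploy.__version__)

mmdeploy_dir = os.environ['MMDEPLOY_DIR']  # exported in the shell above
for lib in ('libmmdeploy_onnxruntime_ops.so', 'libmmdeploy_tensorrt_ops.so'):
    path = os.path.join(mmdeploy_dir, 'build', 'lib', lib)
    print(path, '->', 'found' if os.path.exists(path) else 'MISSING')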

Code to export and run inference on the DCNv2 module

  1. Save the ONNX model and the TensorRT engine
import torch
from mmcv.ops import ModulatedDeformConv2dPack
from mmdeploy.apis.tensorrt import from_onnx
from mmdeploy.backend.tensorrt import save

model = ModulatedDeformConv2dPack(3, 8, 3, 1, 1).cuda()
x = torch.rand(1, 3, 32, 32).cuda()
model_name = 'tmp'

# export onnx
torch.onnx.export(
    model,
    x,
    model_name + ".onnx",
    input_names=['input'],
    output_names=['output'],
    opset_version=11)

# create tensorrt engine
engine = from_onnx(
    model_name + '.onnx',
    model_name,
    input_shapes=dict(
        input=dict(min_shape=x.shape, opt_shape=x.shape, max_shape=x.shape)),
    max_workspace_size=1 << 30)

# save engine
save(engine, model_name + '.engine')
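
For completeness, the exported ONNX model can also be run with ONNX Runtime on CPU. Because the graph contains the custom deformable-conv op, the MMDeploy ops library has to be registered with the session first. A hedged sketch; the library path is an assumption based on the build layout above, and the op/library naming may differ across versions:

import os

import numpy as np
import onnxruntime as ort

ops_lib = os.path.join(os.environ['MMDEPLOY_DIR'], 'build', 'lib',
                       'libmmdeploy_onnxruntime_ops.so')  # assumed build path

session_options = ort.SessionOptions()
session_options.register_custom_ops_library(ops_lib)
sess = ort.InferenceSession('tmp.onnx', session_options,
                            providers=['CPUExecutionProvider'])

ort_input = np.random.rand(1, 3, 32, 32).astype(np.float32)
ort_output = sess.run(['output'], {'input': ort_input})[0]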

  2. Inference with the TensorRT engine

import torch
from mmdeploy.backend.tensorrt import TRTWrapper, load

x = torch.rand(1, 3, 32, 32).cuda()
model_name = 'tmp'

engine = load(model_name + '.engine')

# create wrapper
wrapper = TRTWrapper(engine)

with torch.no_grad():
    trt_output = wrapper({'input': x})['output']

# There are ways to skip torch during inference; please check this (a sketch follows below).
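
Regarding the comment above about skipping torch: below is a hedged sketch of torch-free inference with numpy and pycuda. It assumes tmp.engine is a plain serialized TensorRT engine with a single input named 'input' and a single output named 'output', and that the plugin-library path matches your build; adjust both as needed.

import ctypes
import os

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

# the engine uses MMDeploy's custom DCN plugin, so load the plugin library
# before deserializing (assumed path; adjust to your build output)
ctypes.CDLL(os.path.join(os.environ['MMDEPLOY_DIR'], 'build', 'lib',
                         'libmmdeploy_tensorrt_ops.so'))

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, '')

with open('tmp.engine', 'rb') as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

x = np.ascontiguousarray(np.random.rand(1, 3, 32, 32).astype(np.float32))
in_idx = engine.get_binding_index('input')
out_idx = engine.get_binding_index('output')
context.set_binding_shape(in_idx, x.shape)
out = np.empty(tuple(context.get_binding_shape(out_idx)), dtype=np.float32)

# allocate device buffers, copy input, run, copy output back
d_in = cuda.mem_alloc(x.nbytes)
d_out = cuda.mem_alloc(out.nbytes)
bindings = [0] * engine.num_bindings
bindings[in_idx] = int(d_in)
bindings[out_idx] = int(d_out)

cuda.memcpy_htod(d_in, x)
context.execute_v2(bindings)
cuda.memcpy_dtoh(out, d_out)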

Notes:

  1. If you want to use your TensorRT engine with NVIDIA's Triton server, then follow this link.
  2. You can also generate the TensorRT engine with MMCV alone, without MMDeploy, but that path is going to be deprecated soon and it only supports TensorRT 7 with CUDA 10. I did create the TensorRT engine with only MMCV as well; you can follow this for the same.
  3. One does not need to add the model to mmdet or mmseg in order to use MMDeploy.
  4. ONNX inference is only supported on CPU, while TensorRT inference is supported on GPU.
  5. TensorRT engine creation didn't work for me with opset_version=13, so I used opset_version=11.
  6. FP16 support is there in the TensorRT plugins. I have not been able to work out INT8 support.
