How to optimize custom Detectron2 models with TensorRT

Frankvanpaassen · Sep 17, 2022


Recently I was tasked with creating a script that can automate the conversion of a custom Detectron2 Mask R-CNN into a TensorRT engine on a Jetson TX2. The motivation was that Detectron2 was accurate but very slow, so we wanted to optimize it to run faster without losing too much accuracy. At first the task seemed insurmountable, but after breaking it into pieces it was doable with some trial and error. The first step was getting a fresh flash of JetPack onto the Jetson TX2. This was the easy part; you can follow the instructions on NVIDIA’s website.

Once that was done, I had to get Detectron2 working in either a cloud or host instance. In this case I chose Google Colab for training the model and the preliminary conversions. The general approach for converting the Mask R-CNN model into a TensorRT engine is to first do a preliminary conversion from Detectron2 into ONNX format. Then, because TensorRT does not support all ONNX operations, we convert it again using TensorRT’s model converter to produce a final ONNX graph. Finally, we can turn it into an engine using TensorRT’s build_engine.py, which is found in the samples of their main GitHub repository. Now let’s get started with the conversions.

To begin, open up a Google Colab instance; you are going to need to change your runtime type to GPU. To do so, click on Runtime in the top bar, then click “Change runtime type”. This opens a popup that lets you select which runtime type you want to use; choose GPU for our purposes.

These commands will get you all the Python packages necessary to run Detectron2 and its converter. Note that I begin by uninstalling any versions of PyTorch that may be cached, because they will interfere with the conversion, and it is necessary to have the stated versions as per the TensorRT prerequisites.

!pip uninstall -y torch torchvision torchaudio
!pip install pyyaml==5.1
!pip install onnx==1.8.1
!pip install onnxruntime
!pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/torch_stable.html
!pip install fvcore
!pip install iopath
!python3 -m pip install pycuda

Then the following commands will get the newest version of TensorRT installed on Google Colab. While it might look like it is trying to install TensorRT 5.1.2.2, I have been testing this recently and that is not the case; it does in fact install the latest version.

%cd /content
!wget -O nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.2.2-rc-20190227_1-1_amd64.deb https://www.dropbox.com/s/45pz13r4e8ip4bl/nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.2.2-rc-20190227_1-1_amd64.deb?dl=0
!dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.2.2-rc-20190227_1-1_amd64.deb
!apt-key add /var/nv-tensorrt-repo-cuda10.0-trt5.1.2.2-rc-20190227/7fa2af80.pub
!apt-get update
!apt-get install -y --no-install-recommends libnvinfer5=5.1.2-1+cuda10.0
!apt-get install -y --no-install-recommends libnvinfer-dev=5.1.2-1+cuda10.0
!apt-get install tensorrt
!apt-get install python3-libnvinfer-dev
!apt-get install uff-converter-tf
!apt-get dist-upgrade
!pip install nvidia-pyindex

Next, the following commands will clone and install Detectron2 and TensorRT’s onnx-graphsurgeon. Detectron2 obviously needs to be installed because it is the framework whose model we will be using, and onnx-graphsurgeon is needed for the conversion into the ONNX model format.

%cd /content
!git clone https://github.com/facebookresearch/detectron2.git
%cd /content/detectron2
!git checkout 48b598b4f61fbb24182a69b521b2a0ba3252b842
!python3 setup.py install
%cd /content
!git clone https://github.com/NVIDIA/TensorRT.git /content/TensorRT
%cd /content/TensorRT
!git submodule update --init --recursive
%cd /content/TensorRT/tools/onnx-graphsurgeon
!python setup.py install

Next, this is the test dataset that I will be using. It consists of prelabelled pictures of balloons and is perfect for initial testing.

%cd /content
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
!unzip balloon_dataset.zip > /dev/null

Next, before you continue, you are going to want to edit one of Detectron2’s files. Namely, as per TensorRT’s GitHub, you need to change line 165 of detectron2/tools/deploy/export_model.py from

aug = T.ResizeShortestEdge( 
[cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MIN_SIZE_TEST], cfg.INPUT.MAX_SIZE_TEST
)

To

aug = T.ResizeShortestEdge( 
[1344, 1344], 1344
)

This is for two reasons:

  1. Detectron2’s Mask R-CNN supports dynamic input shapes ranging from 800x800 to 1333x1333, while TensorRT does not currently support dynamic input shapes for this model.
  2. The model’s input dimensions must be divisible by 32 (see the quick check after this list).
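
A quick way to see why 1344 is the fixed size used throughout this guide: padding Detectron2’s 1333-pixel maximum test edge up to the next multiple of 32 gives exactly 1344. The snippet below is only a sanity check of that arithmetic, not part of the conversion itself:

import math

MAX_SIZE_TEST = 1333   # Detectron2's default maximum test-time edge
STRIDE = 32            # the input must be divisible by 32

padded = math.ceil(MAX_SIZE_TEST / STRIDE) * STRIDE
print(padded)           # 1344
print(padded % STRIDE)  # 0, so a fixed 1344x1344 input satisfies both constraints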

Next, we will begin by importing some common libraries that we will need later on in the code.

import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import os
import json
import numpy as np
import cv2
import matplotlib.pyplot as plt

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.structures import BoxMode

Next, as per Detectron2’s official tutorial, we are going to need to register the dataset. The following code will do so.

def get_balloon_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}

        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width

        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

for d in ["train", "val"]:
    DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"])

balloon_metadata = MetadataCatalog.get("balloon_train")
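
Optionally, you can verify that the registration worked by drawing the ground-truth annotations on a couple of training images. This is not required for the conversion; it is the usual sanity check from the Detectron2 tutorial and reuses the imports from above:

import random

# pull a few registered records and draw their ground-truth masks
dataset_dicts = get_balloon_dicts("balloon/train")
for d in random.sample(dataset_dicts, 2):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    plt.figure(figsize=(10, 8))
    plt.imshow(cv2.cvtColor(out.get_image()[:, :, ::-1], cv2.COLOR_BGR2RGB))
    plt.show()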

Now that the dataset is registered, we can almost begin training. We first have to set up some configurations.

%cd /content/
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("balloon_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 250
cfg.SOLVER.STEPS = []
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128

These are the hyperparameters. You can change them as you wish, but for this simple dataset I don’t feel it is necessary. Next, we are going to set the number of classes. For this dataset we only need one class, but change the values of the following configurations as needed.

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.RETINANET.NUM_CLASSES = 1
cfg.MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS = 1
cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES = 1

Now we can finally begin training. This is the training script that is commonly supplied for Detectron2. Run the following commands to begin training, then come back in a few minutes.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

After you have finished training, you are going to want to test your model for accuracy. You can do this with the following commands.

from detectron2.utils.visualizer import ColorMode
import random
import time

cfg.MODEL.WEIGHTS = "/content/output/model_final.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7
cfg.DATASETS.TEST = ("balloon_val", )
predictor = DefaultPredictor(cfg)

dataset_dicts = get_balloon_dicts("/content/balloon/val")
for d in dataset_dicts:
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1], metadata=balloon_metadata, scale=0.8)
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.figure(figsize=(14, 10))
    plt.imshow(cv2.cvtColor(v.get_image()[:, :, ::-1], cv2.COLOR_BGR2RGB))
    plt.show()
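
If you would rather have a quantitative number than judging the overlays by eye, Detectron2 also ships a COCO-style evaluator. This is an optional addition that is not part of the original workflow, and depending on your Detectron2 version COCOEvaluator may also expect a cfg or tasks argument; a minimal sketch:

from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

# report box and mask AP on the registered validation split
evaluator = COCOEvaluator("balloon_val", output_dir="/content/output")
val_loader = build_detection_test_loader(cfg, "balloon_val")
print(inference_on_dataset(predictor.model, val_loader, evaluator))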

The visualization loop displays the images in Google Colab so you can judge the accuracy; you can train for longer if you feel it is not quite accurate enough. Next, we are going to want to save and download the configuration. This is important because when we create the ONNX model with Detectron2 and TensorRT we are going to need the configuration file. It includes the number of classes, the backbone used, the model weights, and so on. Running the following command will dump it into the output.yaml file. It might be worthwhile to read through this file to make sure it is properly written.

%cd /content
print(cfg.dump())
with open("output.yaml", "w") as f:
    f.write(cfg.dump())
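
A quick programmatic way to confirm the dump is sane is to load it back into a fresh config and print a few key fields. This is only a sanity check on top of reading the file:

check_cfg = get_cfg()
check_cfg.merge_from_file("/content/output.yaml")

# these should match what was set during training
print(check_cfg.MODEL.WEIGHTS)
print(check_cfg.MODEL.ROI_HEADS.NUM_CLASSES)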

Now we need a sample image resized to 1344x1344, which we will use for the conversion. You can run the following cell to do so.

import cv2

I1 = cv2.imread('/content/balloon/val/14898532020_ba6199dd22_k.jpg')
I2 = cv2.resize(I1, (1344, 1344))
cv2.imwrite('/content/new.jpg', I2)

Now we are going to do the preliminary conversion to ONNX using Detectron2’s converter. You can run the following command, which will convert the Detectron2 model with the custom weights and configuration into an ONNX model.

%cd /content/detectron2
!python /content/detectron2/tools/deploy/export_model.py --config-file /content/output.yaml --output /content/model.onnx --format onnx --sample-image /content/new.jpg --export-method caffe2_tracing MODEL.DEVICE cuda MODEL.WEIGHTS /content/output/model_final.pth

This will have created a new model called model.onnx. We are going to use it in the next step for the secondary ONNX conversion, which removes any operations that TensorRT does not currently support. Run the following cell to do so.

%cd /content/
!python /content/TensorRT/samples/python/detectron2/create_onnx.py --onnx /content/converted.onnx --exported_onnx /content/model.onnx/model.onnx --det2_config /content/output.yaml -s /content/new.jpg --det2_weights /content/output/model_final.pth
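
Before downloading anything, you can optionally inspect the converted graph’s inputs and outputs to make sure the conversion produced what you expect. I would not rely on onnx.checker here, since the graph now contains TensorRT plugin ops that standard ONNX validation does not know about; printing the I/O tensors is enough of a check:

import onnx

model = onnx.load("/content/converted.onnx")

# list the graph inputs and outputs; the input should be the fixed 1344x1344 image tensor
for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print("input:", inp.name, dims)
for out in model.graph.output:
    print("output:", out.name)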

From there you are going to want to download converted.onnx. You can do so by right-clicking it and clicking Download. You will also want to save your configuration file by doing the same thing; recall that the configuration file is called “output.yaml”. If anything was unclear, I have my Colab linked here for you to use.

The following steps are where you are going to need your Jetson device handy. To begin, in order to properly convert the ONNX model into a TensorRT engine, you are going to need to upgrade your TensorRT version as per the TensorRT GitHub. This is a relatively simple step; however, you should start by checking that you have CUDA and TensorRT installed on your Jetson device. You can do so by running the following commands in the terminal.

sudo dpkg -l | grep tensorrt

This should list the installed TensorRT packages. If nothing shows up, you do not currently have TensorRT installed and will need to debug that yourself. Now, to check whether CUDA is installed, run the following:

nvcc -V

If this returns an error of some sort, you are also going to need to debug it, because having TensorRT and CUDA installed on your Jetson device is required (they should come preinstalled after flashing the device). Now make sure you have Python 3.6.9 (or newer) installed.
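
You can check the Python version with a quick one-liner; anything reporting 3.6.9 or newer is fine:

import sys

# the conversion tooling expects Python 3.6.9 or newer
print(sys.version)
assert sys.version_info >= (3, 6, 9), "upgrade Python before continuing"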

Next, we are going to need to upgrade TensorRT OSS, following a combination of the steps from here and here.

In order to install the correct version of TensorRT OSS, we first need to upgrade the cmake version. As per the TensorRT requirements, we need cmake 3.19.4; however, the Jetson currently ships with cmake 3.10.2. You can run the following commands to build and install cmake 3.19.4.

wget https://github.com/Kitware/CMake/releases/download/v3.19.4/cmake-3.19.4.tar.gz 
tar xvf cmake-3.19.4.tar.gz
cd cmake-3.19.4/
mkdir $HOME/install
./configure --prefix=$HOME/install
make -j$(nproc)
sudo make install

Then, we are going to need to clone, build and install the newer version of TensorRT. We’ll start by cloning the repo.

git clone https://github.com/nvidia/TensorRT
cd TensorRT/

Now we are going to edit the CMakeLists.txt file. There are a few things you will need to change, and I will point them out. To begin, line 77 sets the architecture to “x86_64”, which is incorrect if you are using a Jetson device; you need to change this to “aarch64”. Next, line 87 sets the CUDA version. You will need to change it to match what you got when you ran nvcc -V. This will differ depending on which board and which JetPack you are using, but in my case it was CUDA 10.2.3. That should be all you need to edit. However, if in the next couple of steps you run into the error “No such file or directory: cub.cuh”, you need to add that directory to the include_directories function on line 114 of CMakeLists.txt.

git submodule update --init --recursive
export TRT_SOURCE=`pwd`
cd $TRT_SOURCE
mkdir -p build && cd build
$HOME/install/bin/cmake .. -DGPU_ARCHS="53 62 72" -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=`pwd`/out
make nvinfer_plugin -j$(nproc)

This will have built the inference plugin. Now we need to replace the one currently on your system. First, verify that the newly built files exist and note their directory. Then find the location of the current plugins on your device; they will most likely be in /usr/lib/aarch64-linux-gnu/, so check there. To replace the files, run the following commands. Note that these directories worked for me; before running them, make sure yours are the same.

sudo mv /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.x.y ${HOME}/libnvinfer_plugin.so.8.x.y.bak
sudo cp `pwd`/out/libnvinfer_plugin.so.8.m.n /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.x.y
sudo ldconfig

This first creates a backup copy of the file your system uses, in case anything goes wrong, and then copies the file you just built into the directory you moved the old file out of. This should have updated your TensorRT version; check by running the following command:

sudo dpkg -l | grep libnvinfer

This should produce a lot of output. Make sure that libnvinfer_plugin.so.8.x.y is now the newest version you were trying to install. For example, as of writing you can install up to 8.4.3; if that is the version you are using, then one of the entries should show 8.4.3.

Next, there are two ways to build the engine. The suggested way is using the installed trtexec binary, but you can also use the build_engine.py file from the TensorRT samples.

The suggested method (which I used) is using trtexec. The command should look something like this:

/usr/src/tensorrt/bin/trtexec --onnx=/path/to/converted.onnx --saveEngine=/path/to/save/dir/engine.trt --useCudaGraph --best --workspace=4000 --noDataTransfers

Note that not all of these arguments are necessary, but I found this combination gave the quickest conversion and the best engine.
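
If you would rather build the engine from Python instead of trtexec or build_engine.py, the TensorRT Python API can do the same job. The sketch below is my own rough outline, assuming TensorRT 8.x Python bindings; the paths are placeholders, and the FP16 flag only approximates part of what trtexec’s --best enables:

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")  # the converted graph uses TensorRT plugin ops

builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("/path/to/converted.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.max_workspace_size = 4 << 30          # ~4 GB; newer releases use set_memory_pool_limit
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)    # enable FP16 where the hardware supports it

engine_bytes = builder.build_serialized_network(network, config)
with open("/path/to/save/dir/engine.trt", "wb") as f:
    f.write(engine_bytes)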

Now you can run inference with your custom-class Detectron2 engine using the /samples/python/detectron2/infer.py file. A sample invocation looks something like this:

python3 infer.py \
    --engine /path/to/engine.trt \
    --input /path/to/images \
    --det2_config /path/to/output.yaml \
    --output /path/to/output
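
As a final sanity check before wiring the engine into your own application, you can deserialize it and list its bindings from Python. This is again a rough sketch under the same TensorRT 8.x assumption; the binding-inspection calls shown here were deprecated in later releases but are available on 8.x:

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")  # needed so the plugin layers can be deserialized

with open("/path/to/engine.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

# print every binding with its shape so you know what your inference code must feed and collect
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))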

There you go. You have now successfully converted a custom Detectron2 model into a TensorRT engine.
