[yolov8] Batch inference implementation using tensorrt#4 — NMS Post Processing implementation using only Numpy

DeeperAndCheaper
4 min read · Aug 23, 2023


Background

Using the Yolov8 repo, you can use the NMS (Non-Maximum Suppression) provided by torch and torchvision. However, on edge devices such as Jetson, packages like torch and torchvision are often hard to use because of dependency issues. Therefore, in this post, we implement the NMS algorithm using only Numpy, the most basic numerical package in Python.

Goal

As shown in the figure below, if you run inference without NMS, as in Batch inference #3, several bounding boxes can overlap the same object.

How can we keep only the valid bounding boxes among the overlapping ones? The post-processing algorithm that does this is called Non-Maximum Suppression (NMS).
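NMS decides which boxes overlap "too much" using the IoU (Intersection over Union) metric. As a minimal sketch, here is IoU for a single pair of boxes in xyxy format; note this sketch uses continuous coordinates, without the +1 pixel convention used in the implementation below:

```python
import numpy as np

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2). Intersection rectangle corners:
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping in a 5x5 region: IoU = 25 / 175.
print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 0.14285714285714285
```

A high IoU means two boxes cover essentially the same region, so only the one with the higher confidence needs to survive.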

The NMS process is introduced below.

Since the input is a batch, we first determine the batch size and prepare bbox, conf, and class_id outputs for each image in the batch.

Bounding boxes with confidence below the confidence threshold (conf_thres) do not need to be processed, so they are filtered out first.

import numpy as np

def nms(
    bbox,       # (batch, n, 4), cxcywh format
    conf,       # (batch, n) confidence scores
    class_id,   # (batch, n) class indices
    conf_thres,
    iou_thres,
    keep_topk,
):
    batch_size = bbox.shape[0]

    batched_bbox = []
    batched_conf = []
    batched_class_id = []

    for i in range(batch_size):
        conf_i_cand = conf[i] > conf_thres  # candidates above the threshold
        conf_i = conf[i][conf_i_cand]
        bbox_i = xywh2xyxy(bbox[i])[conf_i_cand]  # convert cxcywh -> xyxy
        class_id_i = class_id[i][conf_i_cand]

For all remaining bboxes, the area is calculated, and the conf indices are sorted in ascending order; that is, the one at the end has the largest conf value.

        areas = (bbox_i[:, 2] - bbox_i[:, 0] + 1) * (bbox_i[:, 3] - bbox_i[:, 1] + 1)
        order = np.argsort(conf_i)  # ascending: the last index has the highest conf

        picked_bbox = []
        picked_conf = []
        picked_class_id = []

The bbox with the largest conf value is picked, its IoU with the remaining bboxes is computed, and any bbox whose IoU exceeds the IoU threshold is removed. In other words, only bboxes with no overlap, or only a small overlap, with the picked box are kept.

        while order.size > 0:
            index = order[-1]  # the bbox with the highest conf
            picked_bbox.append(bbox_i[index])
            picked_conf.append(conf_i[index])
            picked_class_id.append(class_id_i[index])

            # intersection rectangle between the picked bbox and the rest
            x1 = np.maximum(bbox_i[:, 0][index], bbox_i[:, 0][order[:-1]])
            x2 = np.minimum(bbox_i[:, 2][index], bbox_i[:, 2][order[:-1]])
            y1 = np.maximum(bbox_i[:, 1][index], bbox_i[:, 1][order[:-1]])
            y2 = np.minimum(bbox_i[:, 3][index], bbox_i[:, 3][order[:-1]])

            w = np.maximum(0.0, x2 - x1 + 1)
            h = np.maximum(0.0, y2 - y1 + 1)
            intersection = w * h

            # IoU = intersection / union
            ratio = intersection / (areas[index] + areas[order[:-1]] - intersection)
            left = np.where(ratio < iou_thres)
            order = order[left]  # keep only boxes with IoU below the threshold

This process is repeated until no bboxes are left.

Then, only the first keep_topk of the selected bboxes are kept, and the final selection is appended for each image.

        batched_bbox.append(picked_bbox[:keep_topk])
        batched_conf.append(picked_conf[:keep_topk])
        batched_class_id.append(picked_class_id[:keep_topk])

Finally, you get the batched outputs of bbox, conf, and class_id.

    return np.array(batched_bbox), np.array(batched_conf), np.array(batched_class_id)
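To make the loop above concrete, here is a stripped-down, single-image version with the batch dimension, class ids, and topk slicing removed. The helper name nms_single is hypothetical, and the boxes are assumed to already be in xyxy format:

```python
import numpy as np

def nms_single(bbox, conf, iou_thres=0.5):
    # Same +1 pixel convention as the full implementation above.
    areas = (bbox[:, 2] - bbox[:, 0] + 1) * (bbox[:, 3] - bbox[:, 1] + 1)
    order = np.argsort(conf)          # ascending: highest conf is last
    keep = []
    while order.size > 0:
        index = order[-1]             # pick the highest-confidence box
        keep.append(int(index))
        # intersection rectangle between the picked box and the rest
        x1 = np.maximum(bbox[index, 0], bbox[order[:-1], 0])
        y1 = np.maximum(bbox[index, 1], bbox[order[:-1], 1])
        x2 = np.minimum(bbox[index, 2], bbox[order[:-1], 2])
        y2 = np.minimum(bbox[index, 3], bbox[order[:-1], 3])
        inter = np.maximum(0.0, x2 - x1 + 1) * np.maximum(0.0, y2 - y1 + 1)
        iou = inter / (areas[index] + areas[order[:-1]] - inter)
        order = order[:-1][iou < iou_thres]  # drop heavily overlapping boxes
    return keep

# Two near-duplicate boxes and one distant box: NMS keeps indices 0 and 2.
bbox = np.array([[0, 0, 100, 100], [5, 5, 105, 105], [200, 200, 300, 300]], dtype=float)
conf = np.array([0.9, 0.8, 0.7])
print(nms_single(bbox, conf))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the distant third box has an IoU of 0 and survives.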

Conclusion

Since it is implemented in Numpy, it is fast enough that there was no great difficulty in using it. If you want a more efficient NMS algorithm inside the TensorRT engine, the EfficientNMSDynamic_TRT plugin is recommended; refer to this post.

Appendix (from yolov8 repo)

def xywh2xyxy(x):
    """
    Convert bounding box coordinates from (x, y, width, height) format to
    (x1, y1, x2, y2) format, where (x1, y1) is the top-left corner and
    (x2, y2) is the bottom-right corner.

    Args:
        x (np.ndarray): The input bounding box coordinates in (x, y, width, height) format.

    Returns:
        y (np.ndarray): The bounding box coordinates in (x1, y1, x2, y2) format.
    """
    y = np.empty_like(x)
    dw = x[..., 2] / 2  # half-width
    dh = x[..., 3] / 2  # half-height
    y[..., 0] = x[..., 0] - dw  # top left x
    y[..., 1] = x[..., 1] - dh  # top left y
    y[..., 2] = x[..., 0] + dw  # bottom right x
    y[..., 3] = x[..., 1] + dh  # bottom right y

    return y
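As a quick sanity check, the conversion can be verified on a single box; the function is restated here (unchanged) so the snippet runs standalone:

```python
import numpy as np

def xywh2xyxy(x):
    # Center + size (cx, cy, w, h) -> corner coordinates (x1, y1, x2, y2).
    y = np.empty_like(x)
    dw = x[..., 2] / 2  # half-width
    dh = x[..., 3] / 2  # half-height
    y[..., 0] = x[..., 0] - dw
    y[..., 1] = x[..., 1] - dh
    y[..., 2] = x[..., 0] + dw
    y[..., 3] = x[..., 1] + dh
    return y

# A 4-wide, 6-tall box centered at (10, 10):
print(xywh2xyxy(np.array([[10.0, 10.0, 4.0, 6.0]])))  # [[ 8.  7. 12. 13.]]
```

Thanks to the `...` indexing, the same function works on a single box, a whole image's boxes, or a full batch.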


About Authors

Hello, I’m Deeper&Cheaper.

  • I am a developer and blogger with the goal of integrating AI technology into the lives of everyone, pursuing the mission of “Make More People Use AI.” As the founder of the startup Deeper&Cheaper, operating under the slogan “Go Deeper Make Cheaper,” I am dedicated to exploring AI technology more deeply and presenting ways to use it cost-effectively.
  • The name encapsulates the philosophy that “Cheaper” reflects a focus on affordability to make AI accessible to everyone. However, from my perspective, performance is equally crucial, and thus “Deeper” signifies a passion for delving deep with high performance. Under this philosophy, I have accumulated over three years of experience in various AI fields.
  • With expertise in Computer Vision and Software Development, I possess knowledge and skills in diverse computer vision technologies such as object detection, object tracking, pose estimation, object segmentation, and segment anything. Additionally, I have specialized knowledge in software development and embedded systems.
  • Please don’t hesitate to drop your questions in the comments section.
