YOLOv5 inference with ONNX Runtime and OpenCV DNN

Pooja Tambe
4 min read · Dec 26, 2022

Let’s explore inference with the YOLOv5 model.

While searching for a way to deploy an object detection model on a CPU, I encountered the ONNX format. ONNX (Open Neural Network Exchange) is a uniform model representation format. It enables a model trained in any framework to be deployed on any deployment target. The ONNX graph shows the step-by-step transformation of the input features into a prediction, and ONNX models can be optimized for a variety of deployment targets.

First, we need to export the YOLOv5 PyTorch model to ONNX. The Netron app can be used to visualize the ONNX model graph, its input and output nodes, and their names and sizes.
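
A minimal export sketch, assuming a local checkout of the Ultralytics YOLOv5 repository (export.py and its --include flag come from that repo):

python export.py --weights yolov5s.pt --include onnx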

Yolov5s ONNX model graph visualization in Netron app.

To load and run the ONNX model, the OpenCV DNN and ONNX Runtime modules are used.

ONNX Runtime and OpenCV DNN modules

  • ONNX Runtime is a cross-platform model accelerator. It performs provider-independent optimizations and partitions the model graph into subgraphs, which are executed on the target hardware by execution providers, libraries preinstalled in the execution environment. You can list the providers available in your build, as shown in the snippet after this list.
  • OpenCV DNN enables deep learning inference with OpenCV. It can load models from many different frameworks and is highly optimized for CPU inference.
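
A quick way to check which execution providers your ONNX Runtime build exposes (a small sketch; the output depends on your installation):

import onnxruntime as ort

print(ort.get_available_providers())  # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']
print(ort.get_device())               # 'GPU' or 'CPU'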

YOLOv5 Inference

To perform inference, the yolov5s model exported to ONNX is used.

With ONNX Runtime

An ONNX model consists of a computation graph of operators and is optimized for different hardware targets. The operators are executed by execution providers specific to the execution target (CPU, GPU, IoT, etc.). Execution providers are configured via the providers parameter.

  1. In the example below, if a kernel is available in the CUDA execution provider, ONNX Runtime executes it on the GPU; otherwise, the kernel is executed on the CPU.
import onnxruntime as ort

providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if ort.get_device() == 'GPU' else ['CPUExecutionProvider']
session = ort.InferenceSession('yolov5s.onnx', providers=providers)

2. The .get_inputs() and .get_outputs() methods return metadata for the model’s input and output nodes.

outname = [i.name for i in session.get_outputs()]  # output node names
inname = [i.name for i in session.get_inputs()]    # input node names
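
To double-check the names and shapes that Netron showed, you can also print the I/O metadata directly:

for node in session.get_inputs():
    print(node.name, node.shape, node.type)
for node in session.get_outputs():
    print(node.name, node.shape, node.type)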

3. To compute predictions, run the model with session.run(). It takes the list of output names and a dictionary mapping each input name to its value ({input_name: input_value}). The output is a list of results, where each result is an array or tensor.

inp = {inname[0]: im}  # im is the preprocessed input tensor, shape (1, 3, 640, 640)
outputs = session.run(outname, inp)
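
For reference, here is a minimal sketch of the preprocessing that produces im, together with ratio and dwdh, which are reused below to undo the letterbox padding. It follows the standard YOLOv5 letterbox convention; the helper itself is illustrative, not taken from the yolov5 repo:

import cv2
import numpy as np

def letterbox(image, new_shape=(640, 640), color=(114, 114, 114)):
    # Resize with preserved aspect ratio, then pad to new_shape.
    h, w = image.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)
    new_unpad = (int(round(w * r)), int(round(h * r)))
    dw = (new_shape[1] - new_unpad[0]) / 2   # horizontal padding (per side)
    dh = (new_shape[0] - new_unpad[1]) / 2   # vertical padding (per side)
    image = cv2.resize(image, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    image = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)
    return image, r, (dw, dh)

img = cv2.imread('image.jpg')                            # original image, kept for drawing
padded, ratio, dwdh = letterbox(img)
im = padded[:, :, ::-1].transpose(2, 0, 1)               # BGR -> RGB, HWC -> CHW
im = np.expand_dims(im, 0).astype(np.float32) / 255.0    # add batch dim, scale to [0, 1]
im = np.ascontiguousarray(im)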

4. To avoid multiple overlapping bounding boxes, non-maximum suppression (NMS) is applied. Bounding boxes are filtered based on the confidence and IoU threshold values.

import torch
from utils.general import non_max_suppression  # helper from the yolov5 repo
output = torch.from_numpy(outputs[0])          # raw predictions, shape (1, 25200, 85)
out = non_max_suppression(output, conf_thres=0.7, iou_thres=0.5)[0]  # detections for the first image

5. Draw bounding boxes on the image for the final detections (the names and colors used here are sketched after the snippet).

for x0, y0, x1, y1, score, cls_id in out.numpy():
    box = np.array([x0, y0, x1, y1])
    box -= np.array(dwdh * 2)    # remove the letterbox padding
    box /= ratio                 # undo the resize scaling
    box = box.round().astype(np.int32).tolist()
    cls_id = int(cls_id)
    score = round(float(score), 3)
    name = names[cls_id]
    color = colors[name]
    name += ' ' + str(score)
    cv2.rectangle(img, box[:2], box[2:], color, 2)
    cv2.putText(img, name, (box[0], box[1] - 2), cv2.FONT_HERSHEY_SIMPLEX, 0.75, [225, 255, 255], thickness=2)
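
The loop above relies on names (a list of class names) and colors (a per-class color map), which are not defined earlier. A minimal sketch, assuming the model was trained on the 80 COCO classes:

import random

names = ['person', 'bicycle', 'car', 'motorcycle']  # truncated; use the full 80-class COCO list
random.seed(0)
colors = {name: [random.randint(0, 255) for _ in range(3)] for name in names}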

With OpenCV DNN

  1. Load the ONNX model.
net = cv2.dnn.readNetFromONNX('yolov5s.onnx')
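
Since the focus here is CPU deployment, you can optionally pin the backend and target explicitly (these are the defaults in a standard OpenCV build, so this just makes the choice explicit):

net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)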

2. Convert the image to a blob and set it as the network’s input. getUnconnectedOutLayersNames() returns the names of the output layers; forward() propagates the blob through the network up to those layers and returns the raw detections.

blob = cv2.dnn.blobFromImage(img, 1/255, (640, 640), swapRB=True, mean=(0, 0, 0), crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())
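
Before decoding, it helps to sanity-check the output shape. For yolov5s with a 640×640 input, the single output has shape (1, 25200, 85): 25200 candidate boxes, each with 4 box values, an objectness confidence, and 80 class scores.

print(outputs[0].shape)  # expected: (1, 25200, 85)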

3. Loop through the detections and keep the good ones based on the confidence threshold and class score values. Each detection row contains x, y, w, h, an objectness confidence, and one score per class (85 values in total for the 80 COCO classes).

# Example thresholds and the scale factors from the 640x640 blob back to the original image.
conf_threshold, score_threshold = 0.4, 0.25
x_scale, y_scale = img.shape[1] / 640, img.shape[0] / 640
predictions = outputs[0][0]          # shape (25200, 85)
n_detections = predictions.shape[0]
boxes, score, class_ids = [], [], []

for i in range(n_detections):
    detect = predictions[i]
    confidence = detect[4]           # objectness confidence
    if confidence >= conf_threshold:
        class_score = detect[5:]
        class_id = np.argmax(class_score)
        if class_score[class_id] > score_threshold:
            score.append(confidence)
            class_ids.append(class_id)
            x, y, w, h = detect[0], detect[1], detect[2], detect[3]
            left = int((x - w / 2) * x_scale)
            top = int((y - h / 2) * y_scale)
            width = int(w * x_scale)
            height = int(h * y_scale)
            boxes.append(np.array([left, top, width, height]))

4. Apply non-maximum suppression to remove multiple overlapping detections.

nms_threshold = 0.45  # example IoU threshold for NMS
indices = cv2.dnn.NMSBoxes(boxes, np.array(score), conf_threshold, nms_threshold)

5. Draw bounding boxes on the image based on final detections.

for i in indices:
    box = boxes[i]
    left, top, width, height = box[0], box[1], box[2], box[3]
    cv2.rectangle(img, (left, top), (left + width, top + height), (0, 0, 255), 3)
    # classes is the list of class names the model was trained on.
    label = "{}:{:.2f}".format(classes[class_ids[i]], score[i])
    dim, baseline = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.7, 1)
    # Filled background behind the label for readability.
    cv2.rectangle(img, (left, top), (left + dim[0], top + dim[1] + baseline), (0, 0, 0), cv2.FILLED)
    cv2.putText(img, label, (left, top + dim[1]), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 0), 1, cv2.LINE_AA)

This should help you deploy your custom YOLOv5 models on a CPU. Check out the complete code on my GitHub.

References

  1. https://github.com/ultralytics/yolov5
  2. https://learnopencv.com/object-detection-using-yolov5-and-opencv-dnn-in-c-and-python/
  3. https://onnx.ai/
  4. https://onnxruntime.ai/

Previous Stories,

  1. Everything about Focal loss.
  2. Image classifier with Streamlit.
  3. Decision Tree Splitting: Entropy vs. Misclassification Error.

Happy Learning!!!
