Performance Benchmarking of YOLOv7 TensorRT from Cloud GPUs to Edge GPUs

Taka Wang
Published in Nirvana Rebirth
2 min read · Jul 25, 2022
YOLOv7 TensorRT Performance Benchmarking.

Object detection is one of the fundamental problems in computer vision. Two-stage detectors handle region proposal and object classification as separate steps, whereas one-stage detectors perform object classification and bounding-box regression directly, without pre-generated region proposals. YOLO (You Only Look Once) is one of the representative one-stage models. The YOLO family has continued to evolve since 2016, and this summer it received its latest update, version 7.

If you want to learn how to train your model on a custom dataset from scratch, there are already many tutorials, notebooks, and videos available online. At Nilvana, we care most about real-world performance on embedded devices, especially the NVIDIA Jetson family. So we ran a series of performance tests of the YOLOv7 model variants on different devices, from the A100 cloud GPU to the latest tiny powerhouse, AGX Orin.

The main reason YOLOv7 is more accurate, compare to other models with similar AP, YOLOv7 has only about half computational cost. — WongKinYiu

Input and output shapes of YOLOv7 (80 classes)
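To make the shapes concrete, here is a minimal sketch of how the raw output shape of an anchor-based YOLO head follows from the input size. It assumes a 640x640 input, three detection scales with strides 8, 16, and 32, and three anchors per scale; these defaults and the function name `yolo_raw_output_shape` are illustrative assumptions, not values taken from the figure. An engine exported with a built-in NMS step would expose different outputs.

```python
def yolo_raw_output_shape(img_size=640, num_classes=80,
                          strides=(8, 16, 32), anchors_per_scale=3, batch=1):
    """Return (input_shape, output_shape) for an anchor-based YOLO head.

    All defaults are assumptions for illustration: a square input,
    three detection scales, and three anchors per grid cell.
    """
    if any(img_size % s for s in strides):
        raise ValueError("img_size must be divisible by every stride")

    # Each scale contributes one grid of (img_size / stride)^2 cells.
    cells = sum((img_size // s) ** 2 for s in strides)
    num_predictions = cells * anchors_per_scale

    # Each prediction holds 4 box coordinates, 1 objectness score,
    # and one score per class.
    values_per_prediction = 4 + 1 + num_classes

    input_shape = (batch, 3, img_size, img_size)
    output_shape = (batch, num_predictions, values_per_prediction)
    return input_shape, output_shape


if __name__ == "__main__":
    inp, out = yolo_raw_output_shape()
    print(inp, out)  # (1, 3, 640, 640) (1, 25200, 85)
```

Under these assumptions the 80-class model yields 25200 candidate boxes with 85 values each, before any non-maximum suppression.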

According to the results table, Xavier NX runs the YOLOv7-tiny model quite well, and AGX Orin can even run the YOLOv7x model at more than 30 FPS, which is impressive!

End-to-End Performance on 1080P video, Batch Size=1
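For readers who want to take a similar measurement, end-to-end FPS at batch size 1 can be estimated with a simple timing loop. This is only a sketch, not the harness used for the results above: `measure_fps` and `process_frame` are hypothetical names, and `process_frame` stands in for whatever callable runs preprocessing, inference, and postprocessing on one frame and returns only once the results are available on the host.

```python
import time


def measure_fps(process_frame, frames, warmup=10):
    """Estimate frames per second for a per-frame pipeline (batch size 1).

    process_frame: hypothetical callable that fully processes one frame
                   and blocks until its results are ready.
    frames:        iterable of frames (any objects process_frame accepts).
    warmup:        number of leading frames run untimed, so one-time
                   costs such as lazy initialization are excluded.
    """
    frames = list(frames)

    # Warm-up pass: run but do not time the first few frames.
    for frame in frames[:warmup]:
        process_frame(frame)

    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warmup iterations")

    start = time.perf_counter()
    for frame in timed:
        process_frame(frame)
    elapsed = time.perf_counter() - start

    return len(timed) / elapsed


if __name__ == "__main__":
    # Stand-in workload: sleeping simulates a pipeline of about 5 ms per frame.
    fps = measure_fps(lambda frame: time.sleep(0.005), range(60))
    print(f"{fps:.1f} FPS")
```

Timing the whole per-frame pipeline, rather than the inference call alone, is what makes the number "end-to-end": video decoding and postprocessing can take a noticeable share of the frame time on embedded devices.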
Performance Benchmarking Playlist

