How to Get YOLOv8 Over 1000 FPS with Intel GPUs

Raymond Lo, PhD
Published in OpenVINO-toolkit
5 min read · Jan 17, 2023

=) If you have OpenVINO + an Intel Arc A770M, you are in!

Running AI inference on GPUs is not a new topic. We have seen many applications using GPUs for AI training and inferencing. However, with the new Intel Arc Graphics, can we do the same? How do we get there?

The short answer is OpenVINO. When I first got my hands on AI (prior to joining Intel), I was trying to run object detectors like YOLO with TensorFlow. One thing that really bothered me was the performance of CPUs without any optimization or hardware acceleration. It was painfully slow, and I could often see the engine running on a single core. At that time, an expensive graphics card was needed to get meaningful real-time performance. Today, YOLOv8 and OpenVINO have really changed the landscape: the CPU, iGPU, and dGPU can work seamlessly together with the same code base. That’s impressive.

Under the hood, one thing you may want to pick up from this post is the Neural Network Compression Framework (NNCF), which compresses and optimizes our model to INT8 so it is ready for underlying hardware accelerations like VNNI, AMX, and XMX.

With the conversion and optimization, the model is ‘lighter’ and faster, with very little accuracy impact.
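To build some intuition for why INT8 makes the model ‘lighter’ with so little accuracy impact, here is a toy NumPy sketch (this is an illustration of the idea, not the NNCF API): symmetric quantization shrinks a weight tensor to a quarter of its size, and the round-trip error stays bounded by half a quantization step.

```python
import numpy as np

# Toy illustration of symmetric per-tensor INT8 quantization
# (not the NNCF API): map the FP32 weight range onto [-127, 127].
rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 64)).astype(np.float32)

scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# What the runtime effectively computes after dequantization.
dequant = q.astype(np.float32) * scale
max_err = np.abs(weights - dequant).max()

print(f"storage: {weights.nbytes} -> {q.nbytes} bytes, max abs error: {max_err:.4f}")
```

In practice NNCF does this per-channel with a calibration dataset, but the storage and bandwidth win (4x smaller tensors) and the tiny per-weight error are the same effect.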

Single Stream Inference

On my 12th Gen Intel laptop, I was getting over 45 FPS (with a single 1080p webcam feed). Notice that my machine is not ‘fully’ utilized and I have lots of headroom for many other inference tasks. Awesome!

How did I run this on GPU? With OpenVINO, the magic is the GPU plugin, which lets you switch between devices (device = “GPU”).

run_object_detection(source=0, flip=True, use_popup=False, model=ov_model, device="GPU")
Running YOLOv8 on iGPU with OpenVINO. It’s pretty seamless.

However, if I switch to the A770M, we can see the GPU is barely utilized, at around 10% load.

The dGPU can run many inferences in parallel as we can see the utilization is relatively low.

How about 1000+ fps?

Here comes the million-dollar question: where is our promised 1000+ FPS? The trick is to use multiple streams and multiple inference requests (i.e., running many in parallel) with the throughput mode. With “benchmark_app”, for example, we can measure the throughput of the model. You will still need to fine-tune your pre- and post-processing pipeline to ensure there are no bottlenecks.
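To see why keeping many requests in flight raises throughput, here is a hardware-free toy in plain Python: a sleep stands in for one inference request, and a thread pool stands in for OpenVINO keeping several async requests in flight (on real hardware, AsyncInferQueue and benchmark_app’s -api async play this role).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_infer(_):
    # Stand-in for a single inference request: a fixed 10 ms "latency",
    # so the overlap effect is visible without any accelerator.
    time.sleep(0.01)

N = 100

# One request at a time ("latency" mode): total time ~ N * latency.
t0 = time.perf_counter()
for i in range(N):
    fake_infer(i)
serial = time.perf_counter() - t0

# Eight requests in flight ("throughput" mode): requests overlap.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(fake_infer, range(N)))
parallel = time.perf_counter() - t0

print(f"serial: {N / serial:.0f} fps, 8 in flight: {N / parallel:.0f} fps")
```

The per-request latency never improves, but frames per second scales with how many requests the device can genuinely overlap, which is exactly why the barely-utilized A770M above has so much throughput left on the table.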

Without further ado, here are the benchmark results from my little NUC from CPU -> iGPU -> dGPU.

NUC Setup =)
# Inference FP32 model (OpenVINO IR)
!benchmark_app -m $model_path -d CPU -api async -shape "[1,3,640,640]"
~75 FPS with the FP32 model on CPU
# Inference INT8 model (OpenVINO IR)
!benchmark_app -m $int8_model_path -d CPU -api async -shape "[1,3,640,640]"
~183 FPS with the INT8 model on CPU
!benchmark_app -m $int8_model_path -d GPU.0 -api async -shape "[1,3,640,640]"
~204 FPS with the iGPU
!benchmark_app -m $int8_model_path -d GPU.1 -api async -shape "[1,3,640,640]"

Are you ready?

Woah! A whopping 1073.97 FPS with the INT8 model on the Intel Arc A770M Graphics!

That’s over 1000 fps!
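For perspective, here is the plain arithmetic on the FPS figures reported above: INT8 alone is worth roughly 2.4x on the CPU, and the dGPU takes the end-to-end speedup to about 14x over the FP32 CPU baseline.

```python
# FPS figures reported in this post, with speedups vs. the CPU FP32 baseline.
results = {
    "CPU FP32":          75.0,
    "CPU INT8":          183.0,
    "iGPU INT8":         204.0,
    "dGPU INT8 (A770M)": 1073.97,
}
baseline = results["CPU FP32"]
for name, fps in results.items():
    print(f"{name:>18}: {fps:8.2f} FPS ({fps / baseline:4.1f}x)")
```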

If you want to learn more about how we get there, you can walk through the code in our OpenVINO Notebooks, and leave a comment here or there.

Run the optimization tool, preview the results, and learn how to code along the way. Win-win!

Give it a try and let us know what you get. Happy Coding!

#iamintel

Notices & Disclaimers

Intel technologies may require enabled hardware, software or service activation.

Performance varies by use, configuration and other factors.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details. No product or component can be absolutely secure.

Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.


Raymond Lo, PhD

@Intel - OpenVINO AI Software Evangelist. ex-Google, ex-Samsung, and ex-Meta (Augmented Reality) executive. Ph.D. in Computer Engineering, U of T.