How to Get YOLOv8 Over 1000 FPS with Intel GPUs

Raymond Lo, PhD
Published in OpenVINO-toolkit
5 min read · Jan 17, 2023

=) If you have OpenVINO + an Intel Arc A770M, you are in!

Running AI inference on GPUs is not a new topic. We have seen many applications using GPUs for AI training and inferencing. However, with the new Intel Arc Graphics, can we do the same? How do we get there?

The short answer is OpenVINO. When I first got my hands on AI (prior to joining Intel), I was trying to run object detectors like YOLO with TensorFlow. One thing that really bothered me was the performance of CPUs without any optimization or hardware acceleration. It was painfully slow, and I could often see the engine running on a single core. At that time, an expensive graphics card was needed to get meaningful real-time performance. Today, YOLOv8 and OpenVINO have really changed the landscape: the CPU, iGPU, and dGPU can work seamlessly together with the same code base. That’s impressive.

Under the hood, one thing you may want to pick up from this post is the Neural Network Compression Framework (NNCF), which compresses and optimizes our model to INT8 so it is ready for underlying hardware accelerations like VNNI, AMX, and XMX.

With the conversion and optimization, the model is ‘lighter’ and faster, with very little accuracy impact.
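To build some intuition for why INT8 makes the model ‘lighter’ with so little accuracy impact, here is a toy NumPy sketch (this is an illustration of the idea, not the NNCF API): symmetric quantization shrinks a weight tensor to a quarter of its size, and the round-trip error stays bounded by half a quantization step.

```python
import numpy as np

# Toy illustration of symmetric per-tensor INT8 quantization
# (not the NNCF API): map the FP32 weight range onto [-127, 127].
rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 64)).astype(np.float32)

scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# What the runtime effectively computes after dequantization.
dequant = q.astype(np.float32) * scale
max_err = np.abs(weights - dequant).max()

print(f"storage: {weights.nbytes} -> {q.nbytes} bytes, max abs error: {max_err:.4f}")
```

In practice NNCF does this per-channel with a calibration dataset, but the storage and bandwidth win (4x smaller tensors) and the tiny per-weight error are the same effect.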

Single Stream Inference

On my 12th Gen Intel laptop, I was getting over 45 FPS (with a single 1080p webcam feed). Notice that my machine is not ‘fully’ utilized and I have lots of headroom for many other inference tasks. Awesome!

How did I run this on GPU? With OpenVINO, the magic is the GPU plugin, which lets you switch between devices (device = “GPU”).

run_object_detection(source=0, flip=True, use_popup=False, model=ov_model, device="GPU")
Running YOLOv8 on iGPU with OpenVINO. It’s pretty seamless.

However, if I switch to the A770M, we can see the GPU is barely utilized, at around 10% load.

The dGPU can run many inferences in parallel as we can see the utilization is relatively low.

How about 1000+ fps?

Here comes the million-dollar question: where is our promised 1000+ FPS? The trick is to use multiple streams and multiple inference requests (i.e., running many in parallel) with the throughput mode. With “benchmark_app”, for example, we can measure the throughput of the model. You will still need to fine-tune your pre- and post-processing pipeline to ensure there are no bottlenecks.
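To see why keeping many requests in flight raises throughput, here is a hardware-free toy in plain Python: a sleep stands in for one inference request, and a thread pool stands in for OpenVINO keeping several async requests in flight (on real hardware, AsyncInferQueue and benchmark_app’s -api async play this role).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_infer(_):
    # Stand-in for a single inference request: a fixed 10 ms "latency",
    # so the overlap effect is visible without any accelerator.
    time.sleep(0.01)

N = 100

# One request at a time ("latency" mode): total time ~ N * latency.
t0 = time.perf_counter()
for i in range(N):
    fake_infer(i)
serial = time.perf_counter() - t0

# Eight requests in flight ("throughput" mode): requests overlap.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(fake_infer, range(N)))
parallel = time.perf_counter() - t0

print(f"serial: {N / serial:.0f} fps, 8 in flight: {N / parallel:.0f} fps")
```

The per-request latency never improves, but frames per second scales with how many requests the device can genuinely overlap, which is exactly why the barely-utilized A770M above has so much throughput left on the table.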

Without further ado, here are the benchmark results from my little NUC from CPU -> iGPU -> dGPU.

NUC Setup =)
# Inference FP32 model (OpenVINO IR)
!benchmark_app -m $model_path -d CPU -api async -shape "[1,3,640,640]"
~75 FPS with the FP32 model on CPU
# Inference INT8 model (OpenVINO IR)
!benchmark_app -m $int8_model_path -d CPU -api async -shape "[1,3,640,640]"
~183 FPS with the INT8 model on CPU
!benchmark_app -m $int8_model_path -d GPU.0 -api async -shape "[1,3,640,640]"
~204 FPS with the iGPU
!benchmark_app -m $int8_model_path -d GPU.1 -api async -shape "[1,3,640,640]"

Are you ready?

Woah! A whopping 1073.97 FPS with the INT8 model on the Intel Arc A770M Graphics!

That’s over 1000 fps!
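For perspective, here is the plain arithmetic on the FPS figures reported above: INT8 alone is worth roughly 2.4x on the CPU, and the dGPU takes the end-to-end speedup to about 14x over the FP32 CPU baseline.

```python
# FPS figures reported in this post, with speedups vs. the CPU FP32 baseline.
results = {
    "CPU FP32":          75.0,
    "CPU INT8":          183.0,
    "iGPU INT8":         204.0,
    "dGPU INT8 (A770M)": 1073.97,
}
baseline = results["CPU FP32"]
for name, fps in results.items():
    print(f"{name:>18}: {fps:8.2f} FPS ({fps / baseline:4.1f}x)")
```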

If you want to learn more about how we get there, you can walk through the code in our OpenVINO Notebooks, and leave a comment here or there.

Run the optimization tool, preview the results, and learn how to code along the way. Win-win!

Give it a try and let us know what you get. Happy Coding!

#iamintel

Notices & Disclaimers

Intel technologies may require enabled hardware, software or service activation.

Performance varies by use, configuration and other factors.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details. No product or component can be absolutely secure.

Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.


Raymond Lo, PhD

@Intel - OpenVINO AI Software Evangelist. ex-Google, ex-Samsung, and ex-Meta (Augmented Reality) executive. Ph.D. in Computer Engineering, U of T.