Intel OpenVINO with OpenCV


A guide to speeding up inference of trained Deep Learning models across Intel hardware.

OpenVINO, background photo by Kelly Lacy on Pexels

Introduction

Training deep learning models is compute-intensive and is usually performed either in the cloud or on a workstation, both of which are equipped with high-end GPUs and CPUs. The execution of these trained models, on the other hand, does not necessarily require such expensive hardware and quite often has to happen on mobile or edge devices, which might be equipped with low-end CPUs/GPUs/FPGAs or with external USB neural network accelerators like the Intel Movidius Neural Compute Stick 2 (NCS 2).

In this article, we will go through a toolkit provided by Intel called OpenVINO and see how it can be used to perform model inference efficiently across various Intel platforms, ranging from clouds/servers to edge devices. We shall also see how it can be used together with OpenCV, especially with its Deep Neural Network (DNN) module. Several examples of image classification, object detection and image segmentation are provided in this GitHub repository.

What is OpenVINO?

OpenVINO (Open Visual Inference and Neural Network Optimization) is a toolkit that allows you to run DL models across various Intel-specific hardware devices, such as Intel CPUs (Xeon, Core and Atom), Intel integrated GPUs (HD Graphics and Iris), VPUs (Movidius Neural Compute Stick 2) and Intel FPGAs (Vision Accelerator and Programmable Acceleration Card), with just a few lines of code.

It uses these Intel hardware devices efficiently for model inference, and it does so through optimized implementations (provided by Intel) of the libraries that communicate with the hardware and execute the models on it, while hiding all the details from the user.

Principles

OpenVINO focuses primarily on three things:

  1. Portability — it supports running trained models across different Intel platforms such as CPUs, integrated graphics GPUs, FPGAs, NCS 2 etc.
  2. Interoperability — models trained in different DL frameworks (such as Tensorflow, Caffe, PyTorch etc.) can be converted into an intermediate representation, which the inference engine can then use later on. In other words, different formats are converted into a single, unified representation.
  3. Model Optimization — models can be further optimized to run faster.

Specifically, OpenVINO is designed to optimize the execution of Convolutional Neural Networks (CNNs), thus, it supports the deployment of Computer Vision (CV) oriented solutions anywhere from clouds/servers to edge devices, as long as they are running on Intel devices.

Why use OpenVINO?

In simple terms, you might want to use OpenVINO if:

  • You want to get the best performance out of your model execution on any Intel hardware.
  • You wish to run several models on various devices at the same time (e.g., one model running on the CPU while another runs on the GPU).
  • You seek to run your model in a heterogeneous manner (e.g., a model is divided into parts and those parts are run across different devices).
  • You would like to run model inference in an asynchronous mode (e.g., video or camera stream processing).

Other advantages:

  • It enables you to run the model on the Intel Movidius Neural Compute Stick 2 (NCS 2), which is basically a USB neural-network accelerator (see Figure 2). Its primary use is in edge computing, when you do not have enough computational power in your main device and want to use an external device instead.
  • Your computer, laptop or standalone server is most likely equipped with an Intel GPU which you might not be using. OpenVINO enables it for model inference, thereby freeing up your CPU for other tasks. Thus, you can use your hardware more efficiently.

Figure 1 provides the overall workflow of model deployment with OpenVINO. The two most important components of OpenVINO are (i) the Model Optimizer (MO) and (ii) the Inference Engine (IE). Both of them are described in detail in the following subsections. The primary functionality of the MO is to convert and optimize existing models, while the IE is responsible for running the models on user-specified target devices.

Fig.1: Deployment of a DL trained model with OpenVINO. Source: Model Optimizer Developer Guide.
Fig. 2: Intel Movidius Neural Compute Stick 2. Source: Intel NCS 2.

OpenVINO Workflow

The overall workflow of OpenVINO can be summarized in the following steps:

  1. Configure the working environment for the model optimizer [link]: install the prerequisites of the DL framework (Tensorflow, PyTorch, Caffe etc.) that your model was trained on.
  2. Run the model optimizer to convert and optimize the trained model into the Intermediate Representation (IR) format [link]: this step is hardware-agnostic and up to this point we do not need to execute/run the model (offline phase). The IR format consists of two files:

.bin — weights and biases (i.e. parameters) of a trained model in binary format.

.xml — network architecture (i.e. topology) in XML format.

Note: OpenVINO does not work directly with Keras models (.h5); however, you can always convert a Keras model into a Tensorflow frozen graph first (if TF < 2.0). Check out this stackoverflow post for the conversion. For TF >= 2.0, load the .h5 model and save it again using the SavedModel API, as sketched below.
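A minimal sketch of that TF >= 2.0 conversion (the file names model.h5 and saved_model_dir are placeholders):

import tensorflow as tf

# load the trained Keras model from its HDF5 file
model = tf.keras.models.load_model("model.h5")
# export it as a TensorFlow SavedModel directory that the Model Optimizer can consume
model.save("saved_model_dir")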

3. Once we get the model in IR format, the Inference Engine (IE) will load and run the model inference on the target platform (online phase).

Model Optimizer (MO)

The necessity for the MO and IE comes from the fact that there are various DL frameworks out there (e.g., Tensorflow, PyTorch, Caffe, MXNet etc.) and a wide range of Intel-provided hardware. Directly mapping a model trained in any of these frameworks onto any of these devices is not straightforward.

In fact, it is a quite complex task because of the following reasons:

  • Representations (or, network topologies) of the models are totally different from one framework to another.
  • Each of these devices has a different instruction set and uses a different programming environment. For example, the implementation of the same layer operations differs between a CPU (which uses Intel MKL or OpenBLAS) and a GPU (which uses cuDNN/CUDA).

So, Intel has built a common API (which is OpenVINO) through which we are able to run a model across these devices without knowing the hardware details and the underlying libraries/plugins being used to access them.

As mentioned before, MO is used in the first offline phase and it is hardware agnostic. MO primarily performs the following three tasks:

  1. Conversion of a DL model (.pb, .onnx, .caffemodel etc.) to the Intermediate Representation (IR) format, which consists of an .xml file (the model's network topology) and a .bin file (the weights and biases, i.e., the model's parameters).
  2. Optimization of the model (e.g., layer fusion, which saves computation and memory). For example, a batch-normalization layer can easily be fused with the preceding convolutional/fully-connected layer into a single layer, which then executes both operations in a single pass and saves both computational cost and the storage of intermediate results. This optimization is turned on by default.
  3. Quantization of model parameters to various precision formats (FP32, FP16 and INT8). For example, the CPU works better with FP32, the GPU is fast with FP16, whereas NCS 2 does not operate on FP32 at all. It always depends on the device you would like to perform inference on, so it is important to check the device requirements before converting the weights. Keep in mind that not all layers are supported by every device; please refer to this link for more details, e.g., the Selu and Softplus activations are not supported by NCS 2. Table 1 lists the supported precisions, e.g., NCS 2 (the VPU plugin) works only with FP16 precision. A sample conversion command is sketched after Table 1.
Table 1: Model formats across various devices. Source: Supported devices.
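As an illustration, converting a Tensorflow frozen graph into IR format with FP16 precision might look like the following (a minimal sketch assuming the default 2020.x install path; frozen_graph.pb and the output directory are placeholders):

$ cd /opt/intel/openvino/deployment_tools/model_optimizer

$ python3 mo.py --input_model /path/to/frozen_graph.pb --data_type FP16 --output_dir /path/to/ir_model

The resulting .xml and .bin files are then what the Inference Engine (or OpenCV's DNN module) reads.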

Inference Engine (IE)

The Inference Engine (IE) is a set of C++ libraries providing a common, unified API that lets the user perform inference on the device of their choice, e.g., CPU, GPU, FPGA or VPU. It provides an API to read the IR files (.bin and .xml) generated by the MO, to set the inputs and outputs, and to execute the model on devices. In addition to its primary C++ implementation, Python bindings are also available.
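As a rough illustration, loading and running an IR model through the IE Python API might look like the sketch below (the file names and the dummy input are placeholders; attribute names vary slightly between OpenVINO releases, this follows the 2020.1 API used in this article):

from openvino.inference_engine import IECore
import numpy as np

ie = IECore()
# read the IR files produced by the Model Optimizer
net = ie.read_network(model="model.xml", weights="model.bin")
input_name = next(iter(net.inputs))
output_name = next(iter(net.outputs))
# load the network onto a target device ("CPU", "GPU", "MYRIAD", ...)
exec_net = ie.load_network(network=net, device_name="CPU")
# run inference on a dummy input with the network's expected NCHW shape
n, c, h, w = net.inputs[input_name].shape
result = exec_net.infer(inputs={input_name: np.zeros((n, c, h, w), dtype=np.float32)})
print(result[output_name].shape)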

Intel provides a wide range of computing devices, and in order to work with them we would ideally need to learn the required libraries beforehand. This can be quite a complex and overwhelming task, since there are many such libraries and they are quite different in nature. For example, in order to work on a GPU, one should know OpenCL or CUDA, and likewise for other devices. Thanks to OpenVINO, this complexity is taken away from us. The IE abstracts away all of these hardware and software details: it is built on top of all of these libraries and exposes to end-users a common API which is quite easy to work with.

Fig. 3: Inference Engine Architecture. Source: OpenVINO development guide.

As can be seen in figure 3, the IE is based on a plugin architecture: the IE chooses the right plugin for the selected device, and each plugin then uses the corresponding optimized libraries under the hood to communicate with and perform computations on that device. For example, the CPU plugin uses the Intel MKL-DNN library (Intel Math Kernel Library for Deep Neural Networks), the GPU plugin uses the clDNN library (Compute Library for Deep Neural Networks, based on OpenCL), and the VPU plugin uses the MYRIAD API (specifically for NCS 2).

In the end, all the user needs to do is select the target device as a backend where the model inference should be performed, and the rest is taken care of by the IE. The IE will choose the correct plugin and execute the model on it.

Note that there are two more topics worth mentioning related to the inference: heterogeneous plugin and asynchronous execution.

Heterogeneous plugin

The heterogeneous plugin is part of the IE (see figure 4). It sits on top of all the other plugins and enables us to utilize all the available hardware at the same time for a single inference. It allows us to break a network down into parts and assign each part to specific hardware. For example, a computationally expensive layer (say, a CONV layer) can be assigned to the GPU (or NCS 2 or any other accelerator), while other layers (say, non-linear ones or those not supported by the GPU/NCS 2) are executed on the CPU as a fallback. Another use case is simply to use all the available devices for a single inference.

Fig.4: Inference using a heterogeneous plugin. Source: Introduction to OpenVINO.

There are two independent ways of executing inference in heterogeneous mode:

  • Setting affinities per layer, i.e., assigning which layer should be executed on which device, e.g., network.getLayerByName("CONV")->affinity = "GPU"
  • Setting the whole network on several devices according to a priority list, e.g., dispatcher.getPluginByDevice("HETERO:FPGA,GPU,CPU"). This means a layer is first attempted on the FPGA; if that device does not support the layer's implementation (say, RELU6), the plugin automatically falls back to the GPU, and so on. In the Python API, this corresponds to the device string shown in the sketch below.
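A minimal sketch of the second approach with the IE Python API (reusing the ie and net objects from the earlier snippet): the HETERO device string lists the devices in priority order, and unsupported layers fall back to the next device in the list.

exec_net = ie.load_network(network=net, device_name="HETERO:GPU,CPU")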

Asynchronous execution

With the Async API (i.e., the asynchronous mode of execution), you can easily perform two tasks at the same time instead of waiting for one job to finish before starting the next (i.e., synchronous execution). It is especially useful in video processing: instead of waiting for inference on the current frame to be completed, let's say on the GPU or NCS 2, one can continue to do other things, e.g., preparing the next frame (preprocessing or decoding) on the CPU. Thus, it provides parallel processing capabilities in the application. Here is a link to a Python example, and a condensed sketch of the pattern follows below.
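A rough sketch of this pattern with the IE Python API (exec_net, input_name and output_name come from the earlier snippet; current_frame, grab_frame() and preprocess() are hypothetical placeholders for your own frame handling):

# submit inference of the current frame without blocking
exec_net.start_async(request_id=0, inputs={input_name: current_frame})
# meanwhile, prepare the next frame on the CPU
next_frame = preprocess(grab_frame())
# wait for the asynchronous request to finish (0 means success), then read its output
if exec_net.requests[0].wait(-1) == 0:
    output = exec_net.requests[0].outputs[output_name]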

It is important to note that the work done in parallel should not be scheduled on the same device that is running the asynchronous inference: that device is already busy with the current job, and assigning it another one will not improve parallel performance.

OpenVINO with OpenCV

OpenCV needs no introduction, as it is the most widely used library for computer vision. It recently marked its 20th anniversary (it was created in June 2000). OpenCV provides several very useful modules for working in the computer vision domain, and one of them is the Deep Neural Network (DNN) module.

The DNN module implements the forward pass (i.e., model inference) of deep networks and thus enables us to run predictions with OpenCV alone. It supports various popular deep learning frameworks, including Tensorflow, PyTorch and Caffe, so it can load and run models both in their native formats (.pb, .onnx or .caffemodel) and in the Intermediate Representation (IR) format produced by the model optimizer.

There are several advantages to performing inference with OpenCV alone:

  • No need to install any heavy DL training framework in the system for production use.
  • DNN module code is quite simple, clean and compact.
  • The provided functionality is highly optimized (accelerated with the AVX and NEON instruction sets).
  • It is widely supported across various devices including resource-constrained mobile (e.g., Android) and edge devices (e.g., Raspberry Pi).
  • Empirically, Satya Mallick's study found that CPU inference using OpenCV's DNN module is faster than with any of the other DL frameworks he compared.

In addition, OpenCV makes it possible to execute trained models on devices such as the CPU (default), an NVIDIA GPU or an Intel GPU. However, to use an NVIDIA or Intel GPU as a target backend, one needs to compile and build OpenCV from source with the CUDA or OpenCL options enabled, respectively.

OpenVINO eases this process: instead of you building OpenCV with OpenCL, its installation ships with an OpenCV build that already has OpenCL enabled. It also further extends OpenCV's capabilities by making it run on the Intel Movidius NCS 2.

Note: for using OpenCV’s DNN module exclusively on NVIDIA GPUs, follow this installation guide for compiling and building OpenCV with CUDA.

Some of the useful OpenCV DNN functions/methods that we are going to use in this article are listed below, followed by a short usage sketch. For full details, please refer to this link.

  • blobFromImage() — prepares/preprocesses an input image (e.g., resizing, mean subtraction, scaling, center cropping and swapping the R & B channels).
  • blobFromImages() — same as blobFromImage() but for a batch of images.
  • forward() — runs the forward/inference pass to compute the output for the input blob. The output format/array depends on the kind of task.
  • setInput() — sets the input blob of the network for inference.
  • setPreferableBackend() — sets the computation backend to use. If using OpenVINO or an OpenCV build compiled with the Inference Engine, the default backend is DNN_BACKEND_INFERENCE_ENGINE; otherwise it is DNN_BACKEND_OPENCV.
  • setPreferableTarget() — sets the target device to perform computations on, e.g., DNN_TARGET_CPU, DNN_TARGET_OPENCL (Intel GPU FP32), DNN_TARGET_OPENCL_FP16 (Intel GPU FP16, preferred), DNN_TARGET_MYRIAD (NCS 2), DNN_TARGET_FPGA, DNN_TARGET_CUDA (NVIDIA GPU FP32) and DNN_TARGET_CUDA_FP16 (NVIDIA GPU FP16, preferred).
  • readNet() — read a model in any of the supported formats.
  • readNetFromModelOptimizer() — reads a model stored in IR format (.bin and .xml).
  • readNetFromTensorflow() — reads a model stored in TF format (.pb and .pbtxt). Similarly, readNetFromCaffe(), readNetFromDarkNet(), readNetFromTorch() and readNetFromONNX().
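Putting these together, a minimal single-image sketch might look like the following (the model files, input size and preprocessing values are placeholders; always check the model's documentation for the correct preprocessing):

import cv2

# read an IR model produced by the Model Optimizer (.xml topology + .bin weights)
net = cv2.dnn.readNetFromModelOptimizer("model.xml", "model.bin")
# prefer the Inference Engine backend and run on the CPU
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

image = cv2.imread("input.jpg")
# resize to the network's expected input size and convert to a 4D blob
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(224, 224),
                             mean=(0, 0, 0), swapRB=True, crop=False)
net.setInput(blob)
output = net.forward()  # e.g., class probabilities for a classification model
print(output.shape)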

Installation

For this article, the sample code has been developed and tested on macOS Mojave (10.14.6), a Raspberry Pi 4 and NCS 2, using the Python (3.6) bindings of OpenVINO. OpenVINO version 2020.1.023 (which ships with OpenCV 4.2.0) is used, although a newer version (2021.1) exists now.

Please set up a Python virtual environment where all the required packages will be installed. For the OpenVINO installation, refer to the official documentation for guidelines. Here are the links:

Regarding the NCS 2 installation, also refer to the links given above; there are separate sub-sections describing its installation on both macOS and Raspberry Pi. The installation of OpenVINO includes the following packages: (i) Model Optimizer; (ii) Inference Engine; (iii) optimized OpenCV; (iv) OpenCL; (v) demo/sample code; (vi) Intel Media SDK (optional, alternatively one can use the FFmpeg library) and (vii) documentation.

Note: unfortunately, OpenVINO for macOS does not support inference on the Intel GPU! It only supports inference on Intel CPUs and the Intel NCS 2.

Also, please make sure to run a sample application provided in the installation guide in order to verify that the installation was done correctly.

Example

  • OpenVINO provides two kinds of model repositories: (i) Intel pre-trained models and (ii) public pre-trained models. Models in the Intel repository are already converted into IR format and are (mostly) available in FP32, FP16 and INT8 precisions, while the public repository contains models in their native formats (Tensorflow, Caffe etc.).
  • It is important to look into a model's documentation before downloading it, especially for the public models: all the preprocessing details (i.e., input image size, RGB/BGR order, mean value to subtract, scale to divide by etc.) will be required to prepare an input image. For the Intel models, which are already in IR format, these preprocessing operations are prepended to the network, which you can verify by opening the XML file.
  • Activate your virtual environment and initialize all the required environment variables by running this command:

source /opt/intel/openvino/bin/setupvars.sh

It should output this message: [setupvars.sh] OpenVINO environment initialized

  • Optionally, you can put the source command in your .bash_profile so that you do not need to run it every time.
  • To see all the models (Intel and publicly available ones) which can be downloaded, run the downloader script as:

$ cd /opt/intel/openvino/deployment_tools/tools/model_downloader

$ python3 downloader.py --print_all

  • Now, let's download a pre-trained face detection model ("face-detection-retail-0004") which is already converted into IR format with FP16 precision, so that we can run it on either the Intel GPU or NCS 2. If you want to run inference on the CPU, it is better to download it in FP32: although FP16 also works on the CPU, the inference engine will most likely have to cast the weights from FP16 to FP32 all the time.

$ python3 downloader.py --name face-detection-retail-0004 --precisions FP16 --output_dir PROVIDE_DIR_WHERE_TO_DOWNLOAD

  • Below is the relevant code snippet to run the face detection model on NCS 2 with a webcam. It has also been tested on a Raspberry Pi 4:
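The following is a condensed sketch of that snippet; the full version is in the GitHub repository, and the model file paths are placeholders for wherever the downloader placed them:

import cv2

# load the FP16 IR model downloaded above
net = cv2.dnn.readNetFromModelOptimizer("face-detection-retail-0004.xml",
                                        "face-detection-retail-0004.bin")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)  # run inference on NCS 2

cap = cv2.VideoCapture(0)  # open the default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    # the model expects a 300x300 BGR image
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300))
    net.setInput(blob)
    detections = net.forward()  # shape: [1, 1, N, 7]
    for det in detections[0, 0]:
        confidence = float(det[2])
        if confidence > 0.5:
            # box coordinates are normalized, so scale them to the frame size
            x1, y1, x2, y2 = (det[3:7] * [w, h, w, h]).astype(int)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("Face detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()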

There are also other examples (classification, detection and segmentation) included in the GitHub repository. The following is the list of scripts with a short description:

  1. Model downloader — to download models from OpenVINO model zoo.
  2. Model conversion — to convert the model from their native format to IR format.
  3. Image classification using MobileNetV2 model.
  4. Image classification using Inception ResNet V2 model.
  5. Face detection using a webcam and model inference performed on NCS 2.
  6. General object detection using MobileNetV2 and SSD based model.
  7. Road segmentation using semantic segmentation model.
  8. Custom pavement cracks segmentation model based on the UNet model.

Conclusion

Deep learning models can easily be trained on NVIDIA GPUs, since the popular frameworks support them well. However, executing these models in the most performant way, and moreover on resource-constrained devices, is not that straightforward. Since Intel provides a vast range of computing devices, there was an obvious need for something that covers all of them for inference purposes. Thus, Intel created OpenVINO, which provides a common, unified, high-level API to access all of these devices.

In this article, we have gone through OpenVINO and covered its two most important components, the Model Optimizer and the Inference Engine. One of the most convenient parts of OpenVINO is that it comes with a build of OpenCV that is already compiled to support the Intel GPU and the Intel NCS 2. We have also seen that in OpenCV it takes only one line of code to select the preferred target device to run model inference on.

The GitHub link for the various use cases is provided above; feel free to play around with it.

This article was written for Sclable’s blog on Medium.
If you liked it, give it a clap and share if you ❤️
