Tutorial on How to Run Inference with OpenVino in 2021

Daniel Merrick · Published in Analytics Vidhya · 5 min read · Nov 1, 2020

This article is intended to provide insight into how to run inference with an object detector using the Python API of the OpenVino Inference Engine. On my quest to learn about OpenVino and how to use it, I found some code examples and tutorials online that were helpful. However, most of the code I found uses deprecated methods, and I struggled to find up-to-date, minimal inference examples. That is the purpose of this article: to provide readers with minimal code using the most up-to-date OpenVino package.

This article is broken up into the following sections; feel free to skip ahead:

  • Brief Overview of OpenVino
  • Installation Instructions for Linux
  • Tutorial Code for Running Inference

Overview of OpenVino

Intel OpenVino Toolkit Overview | Image Credit: Intel

OpenVino is a toolkit developed by Intel for Intel hardware. The toolkit was originally developed as an API solution for Open Visual Inference and Neural Network Optimization (which gives it its name); however, it has evolved to include support for models such as BERT for NLP tasks. The goal of the toolkit is to act as an inference accelerator, applying a variety of optimization techniques, and to provide a common inference engine interface. The API solution Intel provides can be separated into two primary components:

  1. Model Optimizer
  2. Inference Engine

The Model Optimizer is the first step to running inference. It takes a set of model weights and a model graph from your native training framework (TensorFlow, PyTorch, MXNet, ONNX) and converts them into an .xml file, which describes the network topology, and a .bin file, which stores the network weights.

The Inference Engine is the second and final step to running inference. It is a highly usable interface for loading the .xml and .bin files created by the Model Optimizer and running inference on your data. The Inference Engine is what this article is about. If you are looking for information on the Model Optimizer, check out the hyperlinks or keep your eyes peeled for more content from this account.

Installation Instructions for Linux

We will be using the newest release of OpenVino, which was released on October 14th, 2020: the 2021.1 package. Something especially great about this release is that it can be installed through pip! Previous versions required registration and installing from source, which can add unnecessary weight to the initial learning curve. The installation requires a single pip command:

pip install openvino-python

For specifics on operating system compatibility, here is a link to the pip project.
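
If you want a quick, optional sanity check that the install worked, a couple of lines like the following will confirm the package imports and show which devices the Inference Engine can see on your machine (available_devices is a property of the IECore class; the exact output will depend on your hardware):

from openvino.inference_engine import IECore
print(IECore().available_devices)  # e.g. ['CPU'] on a typical laptop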

Tutorial Code for Running Inference

First, we have to download an example .xml and .bin file. If you have your own .xml and .bin files, feel free to skip this step. If not, let's download a pre-trained object detection model from Intel. We can do this using wget:

wget https://download.01.org/opencv/2021/openvinotoolkit/2021.1/open_model_zoo/models_bin/1/faster-rcnn-resnet101-coco-sparse-60-0001/FP32/faster-rcnn-resnet101-coco-sparse-60-0001.xml
wget https://download.01.org/opencv/2021/openvinotoolkit/2021.1/open_model_zoo/models_bin/1/faster-rcnn-resnet101-coco-sparse-60-0001/FP32/faster-rcnn-resnet101-coco-sparse-60-0001.bin

Now you should have two files downloaded. The .xml file is the network structure or topology. The .bin file is the model weights, which is why it took significantly longer to download. With the model structure and weights downloaded, we can get started on running inference.

First, let's import the libraries we will use.

from openvino.inference_engine import IECore, Blob, TensorDesc
import numpy as np

IECore is the class that handles all the important back-end functionality. Blob is the class used to hold input and output data. TensorDesc is used to describe input data features such as bit precision and tensor shape.

Now let's set some file paths.

XML_PATH = "PATH_TO_XML_FILE"
BIN_PATH = "PATH_TO_BIN_FILE"

Adjust their values to point to your .xml and .bin files; these will be used later. Next is loading the IECore class.

ie_core_handler = IECore()

The IECore is what handles all the back-end functionality. Check out the Python API here. Now, let's load the network.

network = ie_core_handler.read_network(model=XML_PATH, weights=BIN_PATH)

read_network loads the model structure (.xml file) and the model weights (.bin file) into the network variable. With model information loaded, we can build the executable network.

executable_network = ie_core_handler.load_network(network, device_name='CPU', num_requests=1)

load_network takes in the network information, builds an executable network on the specified device (in this case, the CPU), and allocates the requested number of inference requests. We set num_requests to 1 and use synchronous execution for simplicity. Keep your eyes open for future articles about using multiple inference requests with asynchronous execution.

With an executable_network built, we can access the inference requests. Since we set the num_requests to 1, we reference index 0 and get the inference request from the executable network.

inference_request = executable_network.requests[0]

Check out the Python API for inference requests here. Now, we can build some dummy input data and prepare it for input to the network.

random_input_data = np.random.randn(1, 3, 800, 1280).astype(np.float32)
tensor_description = TensorDesc(precision="FP32", dims=(1, 3, 800, 1280), layout='NCHW')
input_blob = Blob(tensor_description, random_input_data)

In these three lines, we set random_input_data to random data which will be used as the input to the inference_request. This can be replaced with an actual image (as sketched below); however, be sure to resize it to the proper input size! I use the input size as defined for the model here. Then we set the tensor_description, which specifies bit precision, dimensions, and channel layout. This is the Python API to the TensorDesc class. Lastly, we build an input_blob from the Blob class. The Blob class is what OpenVino uses as its input-layer and output-layer data type. Here is the Python API to the Blob class.
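
As a rough sketch of what feeding in a real image could look like, the lines below use OpenCV (cv2, installed separately) to load a hypothetical "my_image.jpg", resize it to the 800x1280 input shape used above, and rearrange it to NCHW. Any model-specific preprocessing (color order, scaling, mean subtraction) would still be up to you:

import cv2

image = cv2.imread("my_image.jpg")                      # hypothetical path; loads BGR, HxWxC
image = cv2.resize(image, (1280, 800))                  # cv2.resize expects (width, height)
image = image.transpose((2, 0, 1))                      # HWC -> CHW
real_input_data = np.expand_dims(image, 0).astype(np.float32)  # add batch dimension -> NCHW
input_blob = Blob(tensor_description, real_input_data)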

Now we need to place the input_blob in the input layer of the inference request. To do this, we use the inference_request.set_blob() method. However, we need the name of the input layer first. To find the name, we can print all input layer names.

print(inference_request.input_blobs)

input_blobs is a dictionary that maps input layer names to corresponding blobs. Instead of printing the dictionary, we can capture the input layer name by grabbing its first key like this:

input_blob_name = next(iter(inference_request.input_blobs))

With an inference_request available and the corresponding input_blob_name, we can set the input_blob in the inference request.

inference_request.set_blob(blob_name=input_blob_name, blob=input_blob)

With the input_blob set, we can finally run inference!

inference_request.infer()

To get the output_blob after running inference, we also need the name of the output_blob. Let's get it using the same method as above:

output_blob_name = next(iter(inference_request.output_blobs))

With the output_blob_name, we can get the output from the inference_request.

output = inference_request.output_blobs[output_blob_name].buffer

The .buffer attribute holds the data from the output blob named output_blob_name, as a numpy array.
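
What that array means depends on the model. For OpenVino's detection models, the output typically has shape [1, 1, N, 7], where each row is [image_id, label, confidence, x_min, y_min, x_max, y_max] with box coordinates normalized to [0, 1]. Assuming that layout holds for this model, a minimal sketch for pulling out confident detections might look like:

# Each detection row: [image_id, label, confidence, x_min, y_min, x_max, y_max]
for detection in output[0][0]:
    confidence = float(detection[2])
    if confidence > 0.5:                            # arbitrary confidence threshold
        x_min, y_min, x_max, y_max = detection[3:]  # coordinates normalized to [0, 1]
        print(int(detection[1]), confidence, x_min, y_min, x_max, y_max)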

That concludes the Tutorial for Running Inference with OpenVino v2021. Expect to see some more related content coming from this account, so please give a follow if you found it interesting!
