Object recognition with Intel® Distribution of OpenVINO™ toolkit

Avirup Basu
Intel Software Innovators
May 22, 2019 · 5 min read

Computer vision is computationally intensive, and at times we don’t have the resources to run it well. In most cases, we end up with a system that is either extremely laggy or inaccurate. With deep learning on the rise, we often reach for pre-trained models for our tasks, and the most challenging part is usually optimizing the system so that all available resources are utilized efficiently. In this article, we are going to do a complete walkthrough of how object recognition can be achieved using the Intel® Distribution of OpenVINO™ toolkit. I am not going to cover how to train the model or its theoretical aspects; what this article targets is the usage of a particular model for inferencing.

In this case, we approach the problem using the Single Shot MultiBox Detector (SSD) with MobileNets.

Prerequisites:

  1. Intel® Distribution of OpenVINO™ toolkit is installed on your PC. For details, visit here
  2. The initial model is optimized. In this case, you can refer to the first part of the smart video workshop’s repo.

Architecture:

With the use of Intel® Distribution of OpenVINO™ toolkit, the overall process can be divided into two broad categories.

  1. Using the model optimizer to generate the .xml and .bin files
  2. Using the inference engine to run inference with the model

Our first task is to get hold of the model file. The Intel® Distribution of OpenVINO™ toolkit ships with a model zoo, which essentially contains pre-trained models. Under the prerequisites section, refer to point (2) on how the model file can be optimized and downloaded.

Note: This article won’t cover how to optimize the model using the model optimizer. That part will be covered separately.

Intel® Distribution of OpenVINO™ toolkit architecture

Initially we have a model file, which can be based on TensorFlow, Caffe, or any other supported framework. Running the model optimizer on it produces two files (.xml and .bin), together known as the intermediate representation (IR). The inference engine, the main topic of this article, then takes the input frames, processes them, and gives us the output.

The overall architecture is shown below

Overall architecture

Now that we have all the necessary files ready, we will move forward with the creation of the Python script that will do the inferencing for us using the intermediate representation.

The Intel® Distribution of OpenVINO™ toolkit provides Python APIs for interacting with the IR. The output is normally a 2D array of detections, which we then use to superimpose boxes on the frames and ultimately visualize the result.

Inference Engine implementation

All the code mentioned here is available in this repo.

The inference engine is a set of classes to infer input data, which in our case are images. The classes provide an API to read the IR, set the input and output formats, and ultimately execute the IR to get the output. It is mainly written in C++, but it has Python wrappers, which allow us to use the IR in our application.

The standard flow is mentioned below.

  1. Read the IR
  2. Prepare the input and output formats
  3. Select the plugin for the chosen device
  4. Load the network onto the device and set any other essential configuration parameters
  5. Set the input data
  6. Execute the network, either synchronously or asynchronously
  7. Get the output
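To make those steps concrete before we split them across classes, here is a minimal synchronous sketch using the 2019-era openvino.inference_engine Python API (the file paths are placeholders, not the repo’s actual paths):

```python
import cv2
from openvino.inference_engine import IENetwork, IEPlugin

# 1. Read the IR (placeholder paths)
net = IENetwork(model="mobilenet-ssd.xml", weights="mobilenet-ssd.bin")

# 2. Prepare the input and output formats
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
n, c, h, w = net.inputs[input_blob].shape

# 3. Select the plugin for the chosen device
plugin = IEPlugin(device="CPU")

# 4. Load the network onto the device
exec_net = plugin.load(network=net)

# 5. Set the input data: resize and reorder to the NCHW layout the IR expects
frame = cv2.imread("input.jpg")
image = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape((n, c, h, w))

# 6. Execute the network (synchronously here; asynchronous execution comes later)
res = exec_net.infer(inputs={input_blob: image})

# 7. Get the output
detections = res[out_blob]
```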

We will go through the code step by step. I have two files for the inference. The first is detect.py, where we do the main computational work. Next is main.py, where we import detect.py and accept all the necessary parameters.

First, we will work on detect.py.
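The snippets that follow are an illustrative sketch of detect.py rather than a verbatim copy; the exact code is in the repo linked above. The file begins with its dependencies:

```python
import os
import cv2
from openvino.inference_engine import IENetwork, IEPlugin
```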

From the above code, note the OpenVINO dependencies, namely IENetwork and IEPlugin.

The IENetwork class holds the information about the network model read from the IR and lets you adjust certain model parameters, such as layer affinity and output layers, while IEPlugin is the main plugin interface that initializes and configures the plugin for a device.

More details about the API are available here

Our code consists of two user-defined classes. The first, named Detectors, is responsible for loading the network onto the plugin and performing the initial configuration. The second, named Processor, is responsible for executing the network and returning the output.
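Here is a sketch of what the Detectors class can look like with this API. The method name initialise_inference follows the prose below; the constructor arguments and the rest of the naming are illustrative assumptions:

```python
class Detectors:
    def __init__(self, device, model_xml, cpu_extension, plugin_dirs, is_async_mode):
        self.device = device                 # 1. device to infer on (CPU, GPU, MYRIAD, ...)
        self.model_xml = model_xml           # 2. path to the model's .xml file
        self.cpu_extension = cpu_extension   # 3. path to the CPU extension library
        self.plugin_dirs = plugin_dirs       # 4. plugin directory
        self.is_async_mode = is_async_mode   # 5. sync/async flag

    def initialise_inference(self):
        # Initialize the plugin for the chosen device
        plugin = IEPlugin(device=self.device, plugin_dirs=self.plugin_dirs)
        if self.cpu_extension and self.device == "CPU":
            plugin.add_cpu_extension(self.cpu_extension)

        # Initialize the network from the IR; the .bin file sits beside the .xml
        model_bin = os.path.splitext(self.model_xml)[0] + ".bin"
        net = IENetwork(model=self.model_xml, weights=model_bin)

        # Load the network onto the device; num_requests=2 allows two
        # inference requests to be in flight at the same time
        exec_net = plugin.load(network=net, num_requests=2)

        # Declare the input and output configuration, then hand over to Processor
        input_blob = next(iter(net.inputs))
        out_blob = next(iter(net.outputs))
        shape = net.inputs[input_blob].shape  # [n, c, h, w]
        return Processor(exec_net, input_blob, out_blob, shape, self.is_async_mode)
```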

The above code covers the Detectors class. In the constructor, we simply collect all the parameters. In this case, we require the following.

  1. The device on which inference will run
  2. The path to the model file (.xml)
  3. The path to the CPU extension
  4. The plugin directory
  5. A flag for synchronous or asynchronous execution

In the initialise_inference method, we first use IEPlugin to initialize the plugin for the device. Then we use the IENetwork class to initialize the network, passing the .xml and .bin files as parameters.

Finally, the network is loaded using plugin.load(), where the IENetwork object is passed along with num_requests, which is essentially the maximum number of requests that can be executed in parallel and depends on the hardware. In this case, we use 2. At the end, we declare the input and output configuration and initialize the Processor class.
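A matching sketch of the Processor class follows; the request handling is simplified to a single fixed request ID, which is an illustrative assumption:

```python
class Processor:
    def __init__(self, exec_net, input_blob, out_blob, input_shape, is_async_mode):
        self.exec_net = exec_net
        self.input_blob = input_blob
        self.out_blob = out_blob
        self.n, self.c, self.h, self.w = input_shape
        self.is_async_mode = is_async_mode
        self.cur_request_id = 0

    def process_frame(self, frame, prob_threshold=0.5):
        # Resize and reorder the frame into the NCHW layout the IR expects
        in_frame = cv2.resize(frame, (self.w, self.h))
        in_frame = in_frame.transpose((2, 0, 1)).reshape(
            (self.n, self.c, self.h, self.w))

        # Kick off the request asynchronously against the current request ID
        self.exec_net.start_async(request_id=self.cur_request_id,
                                  inputs={self.input_blob: in_frame})

        # Wait for the request to finish, then superimpose the detections
        if self.exec_net.requests[self.cur_request_id].wait(-1) == 0:
            res = self.exec_net.requests[self.cur_request_id].outputs[self.out_blob]
            frame = self.placeBoxes(res, frame, prob_threshold)
        return frame
```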

The Processor class does the main execution. In the Detectors class we simply initialize an object of the Processor class and pass it back to main.py. The main logic lives in the process_frame method. Here we use the executable network returned when the Detectors class loaded the network onto the plugin. The function we invoke is start_async, which executes a request asynchronously against a given request ID; we pass the current request ID along with the input configuration carrying the input frame. Next, we wait for the request to finish executing, after which we pass the result to the placeBoxes method, where we superimpose the detections on the original frame. The exact shape of the result you get when the network finishes executing depends on the model you run.
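For an SSD-based model like ours, the raw result has shape [1, 1, N, 7], where each of the N rows holds [image_id, label, confidence, x_min, y_min, x_max, y_max] with the box coordinates normalized to [0, 1]. A minimal placeBoxes can therefore look like this (the real method may also draw class labels and confidence values):

```python
    # (continuing the Processor class from above)
    def placeBoxes(self, res, frame, prob_threshold):
        initial_h, initial_w = frame.shape[:2]
        for detection in res[0][0]:
            confidence = float(detection[2])
            if confidence > prob_threshold:
                # Scale the normalized corners back to pixel coordinates
                xmin = int(detection[3] * initial_w)
                ymin = int(detection[4] * initial_h)
                xmax = int(detection[5] * initial_w)
                ymax = int(detection[6] * initial_h)
                cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
        return frame
```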

Finally, we return the processed frame, and everything fits into the overall flow shown earlier. Now, let’s have a look at main.py, which drives detect.py

Main.py

In main.py, we handle two main tasks, as mentioned below.

  1. Capture the frames using OpenCV
  2. Pass the frames to detect.py for processing

Below is the code for main.py
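Again as a sketch rather than a verbatim copy, with illustrative flag names:

```python
import argparse
import cv2
from detect import Detectors

def main():
    # Capture the command-line parameters (flag names here are illustrative)
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", required=True, help="Path to the .xml IR file")
    parser.add_argument("-d", "--device", default="CPU", help="CPU, GPU, MYRIAD, ...")
    parser.add_argument("-l", "--cpu_extension", default=None)
    parser.add_argument("-pd", "--plugin_dir", default=None)
    parser.add_argument("-i", "--input", default=0, help="Video file or camera index")
    args = parser.parse_args()

    # Load the network once, up front
    detector = Detectors(args.device, args.model, args.cpu_extension,
                         args.plugin_dir, is_async_mode=True)
    processor = detector.initialise_inference()

    # Capture frames with OpenCV and hand each one to the processor
    src = int(args.input) if str(args.input).isdigit() else args.input
    cap = cv2.VideoCapture(src)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame = processor.process_frame(frame)
        cv2.imshow("Detections", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
```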

The code is pretty straightforward.

First, we capture the command-line parameters. Then, in the main method, we capture frames using OpenCV, pass them to detect.py where they are processed, and finally display them.

The entire repo is available on GitHub for you to use and play with.

Conclusion and further work:

The Intel® Distribution of OpenVINO™ toolkit is an extremely useful framework with which we can optimize models and run deep-learning-based computer vision on edge systems. This enables us to deploy such systems at the edge, where computational resources are scarce. That said, I am currently working on another small project in which we fully utilize the hardware through truly asynchronous operation.

Further study:

Intel® Distribution of OpenVINO™ toolkit

OpenVINO pre-trained models

OpenVINO model zoo
