AI at the Edge (Part 3) — OpenVINO Inference Engine

Abdelrahman Mahmoud
Udacity Intel Edge AI Scholars
4 min read · Jan 27, 2020

What is the inference engine and how to use it?

This article is part of a series on deploying AI models on the edge using OpenVINO toolkit. Here are the previous parts:
Part 1
Part 2

In the previous article, I explained the model optimizer of the OpenVINO toolkit: how it optimizes deep learning models by applying different optimization techniques, and how it outputs the given model in an Intermediate Representation (IR).

Here I explain the next step in the flow, which is using the inference engine to further optimize the model and run inference.

The Inference Engine

The inference engine provides further optimization of the input models based on the target hardware. The model optimizer is hardware-agnostic, so it applies generic optimizations, whereas the inference engine optimizes the model for the specific hardware you are deploying it onto.

In addition to further optimization, it provides APIs to run inference on inputs and to integrate with your application.

The inference engine is built in C++ for high performance, but Python wrappers are also included so you can interact with it from Python.

Supported devices and plugins

All Intel hardware is supported, and the inference engine includes a plugin for each device type that contains the implementation of inference on it, for example the CPU, GPU, MYRIAD (VPU), FPGA, and GNA plugins.

Note: not all model, input, and output precisions are supported on every device. Check the documentation here.
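
As a quick sanity check, the IECore object exposes an available_devices attribute that lists the devices the engine can actually see on your machine; the printed names depend on your installed hardware and drivers:

from openvino.inference_engine import IECore

ie = IECore()
# For example ['CPU', 'GPU', 'MYRIAD'], depending on your setup
print(ie.available_devices)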

Usage (Python APIs)

1- Make an inference engine entity
The IECore class represents an Inference Engine entity. It allows you to work with the plugins, run inference on inputs, and get the outputs.
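
For example, a minimal sketch of creating one (the import path shown is the one used by the OpenVINO Python API):

from openvino.inference_engine import IECore

# The entity that manages plugins, networks, and inference requests
ie = IECore()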

2- Read and load the model
The IENetwork class is used to read the network. It takes the .xml and .bin files, stores the network, and lets you manipulate some parameters such as the output layers.

After reading the network, it is loaded into the inference engine using the “load_network()” method of the IECore object. This method returns an ExecutableNetwork object.
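
A short sketch of these two calls; the file paths below are placeholders for your own IR files:

from openvino.inference_engine import IECore, IENetwork

ie = IECore()
# The .xml file describes the topology, the .bin file holds the weights
net = IENetwork(model='model.xml', weights='model.bin')
# Load the network onto the target device; this returns an ExecutableNetwork
exec_net = ie.load_network(network=net, device_name='CPU')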

3- Prepare inputs and outputs

The inputs are given to the network in a dictionary, where the keys are the input layer names and the values are the data: {input_layer: image}. Given an IENetwork object, you can find the input layers using the “inputs” attribute.

The same applies to the output: it is stored as {output_layer: data}, and the output layers can be found using the “outputs” attribute.
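
For instance, assuming the IENetwork object from the previous step is called net and the preprocessed data is called image:

# Names of the first input and output layers
input_layer = next(iter(net.inputs))
output_layer = next(iter(net.outputs))

# Feed dictionary mapping the input layer name to its data
inputs = {input_layer: image}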

4- Do inference
There are two types of inference requests:

  • Synchronous: freezes the application until the inference request is finished.
    * Make the request using the “infer(inputs)” method of the ExecutableNetwork
    * Output: a dictionary with the output layer names as keys and their data as values
  • Asynchronous: does not freeze the application, so other functionality can keep executing while inference runs; a typical example is preprocessing the next frame (see the sketch after this list).
    * Make the request using the “start_async(request_id, inputs)” method of the ExecutableNetwork
    * Output: a handler of the InferRequest class. It has an ‘outputs’ attribute that is a dictionary with the output layer names and their outputs
    * Wait using the “wait(timeout)” method of the request. The timeout can be set to 0 to return the request status immediately, or to -1 to block until inference finishes; a returned status of 0 means success
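
A minimal sketch of an asynchronous request, assuming an ExecutableNetwork called exec_net and a feed dictionary called inputs:

# Kick off the request without blocking the application
request = exec_net.start_async(request_id=0, inputs=inputs)

# ... do other work here, for example preprocess the next frame ...

# Block until the result is ready (-1); a returned status of 0 means success
if request.wait(-1) == 0:
    outputs = request.outputs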

Neural networks can have multiple input layers. At inference time, we iterate over these layers and pass the appropriate input to each one, as sketched below.
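
Here is a small sketch of building the feed dictionary when the model has more than one input layer; the layer names and data variables below are hypothetical and depend on your model:

# Hypothetical mapping from input layer name to its preprocessed data
data_per_layer = {'image_input': image, 'sequence_input': sequence}

# Pass the appropriate data to each input layer of the network
inputs = {layer_name: data_per_layer[layer_name] for layer_name in net.inputs}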

Tutorial

You can see the full script and try it yourself here.

Now the first thing we want to do is download the model of interest using the downloader script, as described in Part 1. Here I chose the emotion recognition model from the Open Model Zoo here. The model is already in IR format, so there is no need to use the model optimizer.

Then we need to instantiate an inference engine object

from openvino.inference_engine import IECore, IENetwork
import cv2

core = IECore()

Then read and store the network

network = IENetwork(model=xml_file, weights=bin_file)

It is highly unlikely, but there may be some unsupported layers in the network that need special care, so let’s check for them:

supported_layers = core.query_network(network=network, device_name='CPU')
unsupported_layers = [layer for layer in network.layers.keys() if layer not in supported_layers]

if len(unsupported_layers) == 0:
    print('All network layers are supported!')
else:
    print('These layers are not supported, please add extensions for them:', unsupported_layers)

After that, we load the network into the engine and get the network’s input shape

exec_network = core.load_network(network, 'CPU') 

input_layer = next(iter(network.inputs))

input_shape = network.inputs[input_layer].shape

Using the input shape, we pre-process the input image before inferring it

im = cv2.resize(im, (input_shape[3], input_shape[2]))  # cv2.resize expects (width, height)

im = im.transpose((2, 0, 1))  # HWC -> CHW

im = im.reshape(1, *im.shape)  # add the batch dimension

And the last step is inferring the image

inputs = {input_layer: im}
outputs = exec_network.infer(inputs)
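
Finally, a small sketch of reading the result. I am assuming here that the output is a single probability vector and that the label order below matches the emotion recognition model's documentation in the Open Model Zoo, so double-check it against the model card:

import numpy as np

output_layer = next(iter(network.outputs))
# Assumed label order; verify against the model's documentation
emotions = ['neutral', 'happy', 'sad', 'surprise', 'anger']

probs = outputs[output_layer].flatten()
print('Predicted emotion:', emotions[int(np.argmax(probs))])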
