Intel’s Edge AI OpenVINO (Part 3) — Inference Engine

Ilias Papachristos
Udacity Intel Edge AI Scholars
2 min read · Jan 15, 2020

These are my notes from the Intel® Edge AI Scholarship Foundation Course Nanodegree Program at Udacity.

You can read Part 1 and Part 2 here.

The Inference Engine (IE) runs the actual inference on a model at the edge. The model can be either one of Intel’s Pre-trained Models in OpenVINO, which are already in Intermediate Representation (IR), or one of our own models converted by the Model Optimizer.

The Model Optimizer makes some improvements to the size and complexity of the model to improve memory use and computation time. The IE provides hardware-based optimizations for even further improvements. This helps the app run at the edge using as few device resources as possible.

The API of the IE allows easy integration with the app. The IE itself is built in C++ (at least the CPU version), and we can interact with it from Python.

Supported Devices

The IE supports all of Intel’s hardware: CPUs, GPUs, FPGAs, and VPUs (such as the Neural Compute Stick).

Feeding the IE with an IR

In order to feed an IR to the IE, we must use two classes:

  • IECore — a Python wrapper to work with the IE
  • IENetwork — holds the network, which it can load into the IECore

We initialize the IECore easily, without any arguments. For the IENetwork, though, we need the .xml & .bin files (the model architecture & weights). With the query_network function of the IECore, we check whether the model’s layers are supported on the target device. For layers that are not supported, we can add a CPU extension.
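Here is a minimal sketch of these steps, based on the 2020-era openvino.inference_engine Python API; the IR file names and the extension path are placeholders:

```python
from openvino.inference_engine import IECore, IENetwork

# Placeholder paths to the IR files produced by the Model Optimizer
model_xml = "model.xml"
model_bin = "model.bin"

ie = IECore()  # initialized without any arguments
net = IENetwork(model=model_xml, weights=model_bin)

# Check which layers the target device supports
supported = ie.query_network(network=net, device_name="CPU")
unsupported = [layer for layer in net.layers if layer not in supported]

if unsupported:
    # A CPU extension (path is a placeholder) can add missing layer implementations
    ie.add_extension("/path/to/cpu_extension.so", "CPU")
```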

Inference Request

When we load the IENetwork into the IECore, we get back an ExecutableNetwork. To this ExecutableNetwork we send our inference requests.
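Continuing the sketch above (ie and net come from the previous snippet), loading the network might look like this; num_requests is optional and sets how many inference requests the ExecutableNetwork keeps ready:

```python
# Loading the IENetwork into the IECore returns an ExecutableNetwork
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=2)
```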

We have Synchronous Requests, which block execution, waiting and doing nothing until the response from the inference returns, and Asynchronous Requests, where other tasks may continue while we wait for the response.

Both of them are InferRequest objects and hold the inputs and outputs of the request.
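As a sketch of the difference, again with the 2020-era API and exec_net from above (the input shape here is just a placeholder):

```python
import numpy as np

input_blob = next(iter(net.inputs))  # name of the network's input layer
frame = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input batch

# Synchronous request: blocks until the result is ready
result = exec_net.infer(inputs={input_blob: frame})

# Asynchronous request: returns immediately, so other work can happen here
exec_net.start_async(request_id=0, inputs={input_blob: frame})
status = exec_net.requests[0].wait(-1)  # -1 waits until this request completes
if status == 0:  # 0 means the request finished successfully
    output = exec_net.requests[0].outputs
```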

Handling Requests

Each InferRequest has inputs (e.g. image frames), outputs (the results) and latency (the inference time of the current request) as attributes.
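For example, after running the asynchronous request above, we could inspect these attributes like this (again assuming the 2020-era API):

```python
request = exec_net.requests[0]
print(request.inputs)   # the input blobs, e.g. the image frames we fed in
print(request.outputs)  # the results of the inference
print(request.latency)  # inference time of this request, in milliseconds
```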

(Image: Integration Steps, from the OpenVINO Toolkit documentation)

The last article is going to be about deploying an Edge app with Intel’s OpenVINO Toolkit.

Originally published at https://www.linkedin.com.

I hope you enjoyed reading this post. Feel free to clap 😀

You can follow me on Medium or Twitter.
