Intel’s Edge AI OpenVINO (Part 3) — Inference Engine

Ilias Papachristos
Udacity Intel Edge AI Scholars
2 min read · Jan 15, 2020

These are my notes from the Intel® Edge AI Scholarship Foundation Course Nanodegree Program at Udacity.

You can read Part 1 and Part 2 here.

The Inference Engine (IE) runs the actual inference on a model at the edge. The model can be either one of Intel’s Pre-trained Models in OpenVINO, which are already in Intermediate Representation (IR), or one of our own models converted by the Model Optimizer.

The Model Optimizer makes some improvements to the size and complexity of the model to improve memory use and computation time. The IE provides hardware-based optimizations for even further improvements. This helps the app run at the edge using as few device resources as possible.

The API of the IE allows easy integration with the app. The IE itself is built in C++ (at least the CPU version), and we can interact with it from Python.

Supported Devices

The IE supports all of Intel’s hardware: CPUs, GPUs, FPGAs, and VPUs (such as the Neural Compute Stick).

Feeding the IE with an IR

In order to feed an IR to the IE, we must use two classes:

  • IECore — a Python wrapper to work with the IE
  • IENetwork — holds the network, which it can load into the IECore

We initialize the IECore easily, without any arguments. For the IENetwork, though, we need the .xml & .bin files (the model architecture & weights). With the query_network function of the IECore, we check whether the model’s layers are supported on the target device. For layers that are not supported, we can add a CPU extension.
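Here is a minimal sketch of these steps, based on the 2020-era openvino.inference_engine Python API; the IR file names and the extension path are placeholders:

```python
from openvino.inference_engine import IECore, IENetwork

# Placeholder paths to the IR files produced by the Model Optimizer
model_xml = "model.xml"
model_bin = "model.bin"

ie = IECore()  # initialized without any arguments
net = IENetwork(model=model_xml, weights=model_bin)

# Check which layers the target device supports
supported = ie.query_network(network=net, device_name="CPU")
unsupported = [layer for layer in net.layers if layer not in supported]

if unsupported:
    # A CPU extension (path is a placeholder) can add missing layer implementations
    ie.add_extension("/path/to/cpu_extension.so", "CPU")
```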

Inference Request

When we load the IENetwork into the IECore, we get back an ExecutableNetwork. To this ExecutableNetwork we send our inference requests.
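Continuing the sketch above (ie and net come from the previous snippet), loading the network might look like this; num_requests is optional and sets how many inference requests the ExecutableNetwork keeps ready:

```python
# Loading the IENetwork into the IECore returns an ExecutableNetwork
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=2)
```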

We have Synchronous Requests, which block execution, waiting and doing nothing until the response from the inference returns, and Asynchronous Requests, where other tasks may continue while we wait for the response.

Both of them are InferRequest objects and hold the inputs and outputs of the request.
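As a sketch of the difference, again with the 2020-era API and exec_net from above (the input shape here is just a placeholder):

```python
import numpy as np

input_blob = next(iter(net.inputs))  # name of the network's input layer
frame = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input batch

# Synchronous request: blocks until the result is ready
result = exec_net.infer(inputs={input_blob: frame})

# Asynchronous request: returns immediately, so other work can happen here
exec_net.start_async(request_id=0, inputs={input_blob: frame})
status = exec_net.requests[0].wait(-1)  # -1 waits until this request completes
if status == 0:  # 0 means the request finished successfully
    output = exec_net.requests[0].outputs
```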

Handling Requests

Each InferRequest has inputs (e.g. image frames), outputs (the results) and latency (the inference time of the current request) as attributes.
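For example, after running the asynchronous request above, we could inspect these attributes like this (again assuming the 2020-era API):

```python
request = exec_net.requests[0]
print(request.inputs)   # the input blobs, e.g. the image frames we fed in
print(request.outputs)  # the results of the inference
print(request.latency)  # inference time of this request, in milliseconds
```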

(Image: Integration Steps, from the OpenVINO Toolkit documentation)

The last article is going to be about deploying an Edge app with Intel’s OpenVINO Toolkit.

Originally published at https://www.linkedin.com.

I hope you enjoyed reading this post. Feel free to clap 😀

You can follow me on Medium or Twitter.
