Intel’s Edge AI OpenVINO (Part 2)

Ilias Papachristos
Udacity Intel Edge AI Scholars
3 min read · Jan 13, 2020

These are my notes from Intel® Edge AI Scholarship Foundation Course Nanodegree Program at Udacity.

As promised in my previous article I’m going to write about Intel’s OpenVINO Model Optimizer.

a. Basics of the Model Optimizer

The Model Optimizer converts models from multiple different frameworks into an Intermediate Representation (IR), which is what the Inference Engine (IE) works with. This step is REQUIRED before moving on to the IE, unless the model is one of the Pre-Trained Models. The conversion brings improvements in size and speed, not in accuracy; there is some accuracy loss, but it is minimal.
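
To make this concrete, here is a minimal sketch of running the Model Optimizer from Python. The install path, model file and output directory are placeholders of mine, and the exact path depends on your OpenVINO version.

```python
# Minimal sketch: call the Model Optimizer to convert a trained model to IR.
# The paths and file names below are placeholders; adjust them to your setup.
import subprocess

MO_SCRIPT = "/opt/intel/openvino/deployment_tools/model_optimizer/mo.py"

subprocess.run([
    "python", MO_SCRIPT,
    "--input_model", "my_model.pb",   # model from one of the supported frameworks
    "--output_dir", "ir_output",      # where the IR files will be written
], check=True)                        # raise an error if the conversion fails
```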

b. Quantization, Freezing, Fusion

There are more optimization techniques, but I worked with these three. There is also Hardware Optimization, which is done by the IE (I’ll write about it in the next article).

Quantization
Reduces the precision of the weights and biases, which shrinks the model size and speeds up computation, at the cost of some accuracy.

The models in the OpenVINO Toolkit are FP32 (32-bit floating-point values) by default. FP16 and INT8 (8-bit integer values) are also available. INT8 is currently available only in the Pre-Trained Models; the Model Optimizer doesn’t currently support it.

FP16 and INT8 will lose some accuracy, BUT the model will be smaller and the compute times will be faster, which is what we want and need. 😉
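
To get a feel for the trade-off, here is a small NumPy illustration of mine (not part of the toolkit): casting FP32 “weights” to FP16 halves their size and introduces only a tiny rounding error.

```python
# Illustration only: what going from FP32 to FP16 does to size and precision.
import numpy as np

weights_fp32 = np.random.randn(1000, 1000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

print("FP32 size (MB):", weights_fp32.nbytes / 1e6)   # ~4.0 MB
print("FP16 size (MB):", weights_fp16.nbytes / 1e6)   # ~2.0 MB, half the size

# The small rounding error is where the accuracy loss comes from.
error = np.abs(weights_fp32 - weights_fp16.astype(np.float32))
print("Max rounding error:", error.max())
```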

Freezing
Is used when fine-tuning a Neural Network (NN). In TensorFlow (TF) it removes the metadata that is only needed for training, as well as certain operations such as those related to backpropagation. Freezing a TF model is a good idea before performing inference or converting it with the Model Optimizer.
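
Here is a rough TF 1.x sketch of freezing, under the assumption that you have a trained checkpoint; the checkpoint prefix and output node name are placeholders.

```python
# Sketch: freeze a TF 1.x graph before handing it to the Model Optimizer.
# "my_model" and "output" are placeholders for your checkpoint and output node.
import tensorflow as tf

with tf.Session() as sess:
    saver = tf.train.import_meta_graph("my_model.meta")  # load the trained graph
    saver.restore(sess, "my_model")                       # restore the weights

    # Turn variables into constants and drop training-only nodes.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["output"]
    )

    # The resulting .pb file is what the Model Optimizer expects.
    with tf.gfile.GFile("frozen_model.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())
```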

Fusion
It combines certain operations into one, so less computation is needed. For example: MyAwesomeLayer = Batch Normalization Layer + Activation Layer + Convolutional Layer. It’s awesome, isn’t it?! 😎

Fusion is useful for GPU inference because a fused operation runs in a single kernel, so there is less overhead from switching between kernels.
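
Here is a NumPy sketch of the idea behind fusing Batch Normalization into the convolution that precedes it: the BN parameters fold straight into the convolution’s weights and bias, so only one operation is left at inference time (the shapes and names are my own example).

```python
# Illustration: fold a Batch Normalization layer into the preceding convolution,
# so one fused set of weights/bias replaces two operations at inference time.
import numpy as np

# Convolution parameters (shape: out_channels x in_channels x kH x kW)
W = np.random.randn(16, 3, 3, 3).astype(np.float32)
b = np.zeros(16, dtype=np.float32)

# Batch Normalization parameters learned during training (per output channel)
gamma = np.random.rand(16).astype(np.float32)
beta = np.random.rand(16).astype(np.float32)
mean = np.random.rand(16).astype(np.float32)
var = np.random.rand(16).astype(np.float32)
eps = 1e-5

scale = gamma / np.sqrt(var + eps)

# Fused parameters: the BN layer disappears, the convolution absorbs it.
W_fused = W * scale[:, None, None, None]
b_fused = (b - mean) * scale + beta
```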

c. Supported Frameworks with OpenVINO Toolkit

Caffe, Kaldi, MXNet, ONNX & TensorFlow (in alphabetical order) are the supported frameworks. ONNX supports PyTorch and Apple ML models through some additional conversion steps.

There are differences in how to handle these frameworks BUT most of them are handled under the hood of the OpenVINO Toolkit.

d. Intermediate Representation (IR)

The IRs are a shared dialect among the NN layers: a standard structure and naming scheme for NN architectures. So Conv2D in TF, Convolution in Caffe and Conv in ONNX are all converted to the Convolution layer in the IR.

After that, they are loaded directly into the IE. But first, the Model Optimizer produces two files:

  • .xml — Describes the network topology, holds the architecture of the model and other important data.
  • .bin — Contains the weights and biases binary data.

Both files are needed to run inference. We generate them with the Model Optimizer, and the precision is set with the --data_type argument (for example --data_type FP16).
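
As a quick check that the two files travel together, this is roughly how they are handed to the Inference Engine from Python; the file names are placeholders and the API shown is the 2019/2020-era one (more on the IE in the next article).

```python
# Sketch: both IR files are passed to the Inference Engine together.
# "my_model.xml" / "my_model.bin" are placeholders; the API may differ
# between OpenVINO versions.
from openvino.inference_engine import IECore, IENetwork

ie = IECore()
net = IENetwork(model="my_model.xml", weights="my_model.bin")  # topology + weights
exec_net = ie.load_network(network=net, device_name="CPU")     # ready for inference
```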

From my experience, TF has more steps (like checking whether the model is in a Frozen or Unfrozen state), while Caffe is simpler and ONNX is even simpler!
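
For comparison, here is what the difference typically looks like on the command side; the flags and file names are examples of mine, and the extra arguments a TF model needs depend on the model.

```python
# Rough comparison: a frozen TF model usually needs extra hints,
# while an ONNX model often converts with just --input_model.
import subprocess

MO = "/opt/intel/openvino/deployment_tools/model_optimizer/mo.py"

# TensorFlow: frozen .pb plus shape / channel-order hints
subprocess.run([
    "python", MO,
    "--input_model", "frozen_model.pb",
    "--input_shape", "[1,224,224,3]",
    "--reverse_input_channels",        # if the model was trained on RGB images
], check=True)

# ONNX: usually just the model file
subprocess.run(["python", MO, "--input_model", "model.onnx"], check=True)
```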

e. Custom Layers

Things are very easy here. Every layer that isn’t in the list of supported layers is automatically classified as a Custom Layer by the Model Optimizer.

Next article is going to be about the Inference Engine of Intel’s OpenVINO Toolkit.

Originally published at https://www.linkedin.com.

I hope you enjoyed reading this post. Feel free to clap 😀

You can follow me on Medium or Twitter.
