AI at the Edge (Part 2) — OpenVINO model optimizer

Abdelrahman Mahmoud
4 min read · Jan 16, 2020


What is the model optimizer, and how do we use it?

In the previous part, we explained what the OpenVINO toolkit is and its three main components: the model zoo, the model optimizer, and the inference engine. We briefly explained the flow a pre-trained model follows to be optimized, and we covered the model zoo and how to use the pre-trained, optimized models it provides.

In this part, we cover the model optimizer in detail.

Model optimizer

The Model Optimizer

This component optimizes models from the supported frameworks by reducing their size (fewer parameters and layers), speeding them up, and converting them into an Intermediate Representation (IR).

The model optimizer performs generic, hardware-agnostic optimizations. The optimized models then go through target-specific optimization in the inference engine, based on the hardware the model will be deployed to.

The model optimizer supports various frameworks: Caffe, TensorFlow, MXNet, the ONNX format (which includes PyTorch models exported to ONNX), and Kaldi.

The optimizer’s configuration differs according to the model’s framework and architecture. It takes the model as input, along with some parameters, and outputs:
- an .xml file that contains the optimized model architecture and metadata.
- a .bin file that contains the weights and biases.
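For example, a typical invocation looks like the following (a sketch; the file names are placeholders, and --output_dir and --model_name are optional flags):
command: python3 mo.py --input_model <model_name>.onnx --output_dir <output_directory> --model_name my_model
This produces my_model.xml and my_model.bin in the output directory.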

Optimization techniques:

1- Quantization: reducing the numerical precision of the model without much loss of accuracy. Most DL models are trained in FP32, so quantization reduces the precision from FP32 to FP16 or INT8. This results in a smaller model and faster inference, in exchange for a small loss in accuracy.
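For example, a model can be converted to an FP16 IR with the model optimizer's --data_type flag (the model file name here is a placeholder; the script itself is described further below):
command: python3 mo.py --input_model <model_name> --data_type FP16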

2- Freezing: A typical TensorFlow model is saved in four files:
- model-ckpt.meta: contains the complete model graph.
- model-ckpt.data-00000-of-00001: contains the values of all variables (weights, biases, placeholders, gradients, hyper-parameters, etc.). Think about it: do we need gradients, for example, when doing inference?
- model-ckpt.index: a metadata table in which each key is the name of a tensor.
- checkpoint: records all the checkpoint information.

Freezing the model is the process of removing the operations and metadata that are only needed for training, such as gradients, optimization operations, and the learning rate, and saving all the remaining model information in one single file.
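Here is a minimal sketch of freezing a TensorFlow 1.x model, assuming the checkpoint file names from above and an output node named "output" (both are placeholders for illustration, not part of the toolkit):

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Session() as sess:
    # Restore the trained model from its checkpoint files (paths are placeholders)
    saver = tf.train.import_meta_graph("model-ckpt.meta")
    saver.restore(sess, "model-ckpt")
    # Convert all variables to constants, keeping only the operations needed
    # to compute the listed output node(s)
    frozen_graph_def = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["output"])

# Write the frozen graph to a single .pb file
with tf.gfile.GFile("frozen_model.pb", "wb") as f:
    f.write(frozen_graph_def.SerializeToString())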

3- Fusion: Many CNNs contain layers that can be combined into a single layer. For example, “BatchNormalization” and “ScaleShift” layers can be represented as a sequence of linear operations (additions and multiplications). After these layers are expressed as linear operations, consecutive operations are merged, and the result is fused with the preceding or following convolutional/fully-connected layer into one layer. This executes all three operations in one kernel, instead of switching between three kernels on a GPU.

Left: before fusion — Right: after fusion

There is another, framework-dependent type of fusion: “grouped convolution fusion” for TensorFlow models.
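To illustrate the idea behind fusing a BatchNormalization layer into the preceding convolution, here is a small NumPy sketch of “folding” the normalization parameters into the convolution’s weights and bias (variable names are illustrative; this is not the optimizer’s actual code):

import numpy as np

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    # W: conv weights of shape (out_channels, in_channels, kH, kW); b: bias of shape (out_channels,)
    # BatchNorm after the conv computes: y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta,
    # i.e. a per-channel scale and shift, so it can be merged into the conv itself.
    scale = gamma / np.sqrt(var + eps)
    W_folded = W * scale.reshape(-1, 1, 1, 1)   # rescale each output channel's filters
    b_folded = (b - mean) * scale + beta        # shift the bias accordingly
    return W_folded, b_folded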

Optimizer inputs:

As mentioned before, the optimizer configuration depends on the framework used to build the model, but there are also framework-agnostic parameters. To configure the model optimizer for all frameworks, go to the <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites directory and run ./install_prerequisites.sh
To optimize a model, first go to the <INSTALL_DIR>/deployment_tools/model_optimizer directory. The next step depends on the framework:

To optimize ONNX models: no framework-specific parameters are needed.
Use the mo.py script to convert the model.
command: python3 mo.py --input_model <model_name>

To optimize TensorFlow models, there are 3 methods and framework-specific parameters.
Method 1: if the model is stored as an inference graph plus checkpoint files, use --input_model and --input_checkpoint.
command: python3 mo_tf.py --input_model <inference_graph_file>.pb --input_checkpoint <checkpoint_file>

Method 2: if the model is unfrozen and stored in the three/four checkpoint files described above, use --input_meta_graph.
command: python3 mo_tf.py --input_meta_graph <meta_graph_file>

Method 3: if the model is saved in TensorFlow’s “SavedModel” format, use --saved_model_dir.
command: python3 mo_tf.py --saved_model_dir <saved_model_directory>

[Advanced] Custom layers:

Now, although it is unlikely, you might be using a model that contains layers that are not yet supported in OpenVINO, such as the “cosh” activation layer. This is the list of the supported layers.

To optimize these custom layers, we create custom layer extensions for the optimizer, and process the inputs to these layers during inference using the original framework.

Custom layer extensions:
Four extension templates are generated using the Model Extension Generator script: <INSTALL_DIR>/deployment_tools/tools/extension_generator/extgen.py. You can edit these files and then generate the model IR files using the model optimizer script. See the documentation for further info.
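Once the extension files are edited, the directory containing the custom extensions can be passed to the model optimizer with its --extensions flag; a sketch, with the paths as placeholders:
command: python3 mo.py --input_model <model_file> --extensions <custom_extensions_directory>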
