How to run Keras model inference x3 times faster with CPU and Intel OpenVINO

Chengwei Zhang · Published in The Startup · 6 min read · Jan 28, 2019

In this quick tutorial, you will learn how to set up OpenVINO and make your Keras model inference at least 3x faster without any added hardware.

There are multiple options to speed up your deep learning inference on edge devices. To name a few:

  1. Adding a low-end Nvidia GPU like GT1030
  • pros:

Easy to integrate, since it leverages Nvidia's CUDA and cuDNN toolkits to accelerate inference the same way as your development environment, so no significant model conversion is needed.

  • cons:

A PCI-E slot must exist on the target device’s motherboard to interface with the graphics card, which adds extra cost and space to the edge device.

2. Use ASIC chips geared towards accelerating neural network inference, such as the Movidius Neural Compute Stick or the Lightspeeur 2801 neural accelerator.

  • pros:

Just like a USB drive, they work on different host machines, whether it is a desktop computer with an Intel/AMD CPU or a Raspberry Pi single-board computer with an ARM Cortex-A.

Offloading neural network computation to those USB sticks allows the host machine's CPU to worry only about more general-purpose computation like image preprocessing.

Scaling can be as easy as plugging in more of those USB sticks as your throughput requirement increases on the edge device.

They generally offer higher performance per watt compared with CPUs or Nvidia GPUs.

  • cons:

Since they are ASICs (application-specific ICs), expect limited support for some TensorFlow layers/operations.

They also require special model conversion to create instructions understandable for the specific ASIC.

3. An embedded SoC that comes with an NPU (neural processing unit), like the Rockchip RK3399Pro.

  • An NPU is similar to an ASIC chip in that it requires special instructions and model conversion. The difference is that it sits on the same silicon die as the CPU, which makes the form factor smaller.

All the acceleration options mentioned above come with an additional cost. However, if an edge device already has an Intel CPU, you might as well accelerate its deep learning inference speed by roughly 3x for free with Intel's OpenVINO toolkit.

Intro to OpenVINO and setup

You might wonder where the extra speedup comes from without additional hardware.

First and foremost, since OpenVINO is an Intel product, it is optimized for Intel processors.

The OpenVINO inference engine can run models on either the CPU or Intel's integrated GPU, with different input precisions supported.

The CPU supports only FP32, while the GPU supports both FP16 and FP32.

The CPU plugin leverages the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) as well as OpenMP to parallelize calculations.

Second, there is the model optimization you will see later in this tutorial, during which extra steps are taken to make the model more compact for inference:

  • Merging of group convolutions.
  • Fusing Convolution with ReLU or ELU.
  • Fusing Convolution + Sum or Convolution + Sum + ReLU.
  • Removing the power layer.

Now, let's set up OpenVINO on your machine: choose your OS on the download page, then follow the instructions to download and install it.

System requirement

  • 6th-8th Generation Intel® Core™
  • Intel® Xeon® v5 family
  • Intel® Xeon® v6 family

Operating Systems

  • Ubuntu* 16.04.3 long-term support (LTS), 64-bit
  • CentOS* 7.4, 64-bit
  • Windows* 10, 64-bit

If you have already installed Python 3.5+, it is safe to ignore the notice to install Python 3.6+.

Once the installation is done, run either

C:/Intel/computer_vision_sdk/deployment_tools/model_optimizer/install_prerequisites/install_prerequisites_tf.bat

or

~/Intel/computer_vision_sdk/deployment_tools/model_optimizer/install_prerequisites/install_prerequisites_tf.sh

depending on your OS, to install the Python packages required for OpenVINO to work with TensorFlow.

InceptionV3 model inference in OpenVINO

You can download the full source code for this tutorial from my GitHub. It includes an all-in-one Jupyter notebook that walks you through converting a Keras model for OpenVINO, making predictions, and benchmarking inference speed in all three environments: Keras, TensorFlow, and OpenVINO.

On Windows, run setupvars.bat before launching jupyter notebook to set up the environment.

C:\Intel\computer_vision_sdk\bin\setupvars.bat

Or, on Linux, add the following line to ~/.bashrc

source ~/intel/computer_vision_sdk/bin/setupvars.sh

Here is an overview of the workflow to convert a Keras model to an OpenVINO model and make a prediction.

  1. Save the Keras model as a single .h5 file.
  2. Load the .h5 file and freeze the graph to a single TensorFlow .pb file.
  3. Run the OpenVINO mo_tf.py script to convert the .pb file to a model XML and bin file.
  4. Load the model XML and bin file with OpenVINO inference engine and make a prediction.

Save the Keras model as a single .h5 file

For this tutorial, we will load a pre-trained ImageNet classification InceptionV3 model from Keras and save it as a single .h5 file.
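
A minimal sketch of this step is shown below; the ./model output directory and file name are my own choices, not fixed by the tutorial.

import os
from keras.applications.inception_v3 import InceptionV3

# Create the output directory if it does not exist yet.
os.makedirs('./model', exist_ok=True)

# Download the pre-trained ImageNet weights and save the whole model
# (architecture + weights) as a single HDF5 file.
model = InceptionV3(weights='imagenet')
model.save('./model/model.h5')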

Freeze the graph to a single TensorFlow .pb file

This step removes any layers and operations not necessary for inference.
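
Below is a sketch of the freezing step with the TensorFlow 1.x API; the file names are assumptions that should match the previous step.

import tensorflow as tf
from tensorflow.python.framework import graph_util
from keras import backend as K
from keras.models import load_model

# Put Keras in inference mode so dropout/batch-norm behave deterministically.
K.set_learning_phase(0)
model = load_model('./model/model.h5')
sess = K.get_session()

# Convert variables to constants and strip training-only nodes.
output_names = [out.op.name for out in model.outputs]
frozen_graph = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_names)
frozen_graph = graph_util.remove_training_nodes(frozen_graph)

# Write the frozen graph to ./model/frozen_model.pb.
tf.train.write_graph(frozen_graph, './model', 'frozen_model.pb', as_text=False)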

OpenVINO model optimization

The following snippet runs in the Jupyter notebook. It locates the mo_tf.py script based on your OS (Windows or Linux); you can change img_height accordingly. The data_type can also be set to FP16 to gain an extra speedup when inferencing on the Intel integrated GPU, at the cost of slightly degraded precision.
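
Here is a sketch of that notebook cell, assuming the default install paths of the 2018 R5 toolkit and the frozen graph from the previous step.

import os
import platform

# Default install locations; adjust if you installed OpenVINO elsewhere.
if platform.system() == 'Windows':
    mo_tf_path = r'C:\Intel\computer_vision_sdk\deployment_tools\model_optimizer\mo_tf.py'
else:
    mo_tf_path = os.path.expanduser(
        '~/intel/computer_vision_sdk/deployment_tools/model_optimizer/mo_tf.py')

img_height = 299      # InceptionV3 input size; change it for other models
data_type = 'FP32'    # use 'FP16' to target the Intel integrated GPU

# Run the Model Optimizer on the frozen TensorFlow graph (Jupyter shell escape).
!python "{mo_tf_path}" --input_model ./model/frozen_model.pb --output_dir ./model --input_shape [1,{img_height},{img_height},3] --data_type {data_type}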

After running the script, you will find two new files generated under the ./model directory: frozen_model.xml and frozen_model.bin. They are the optimized Intermediate Representation (IR) of the model, based on the trained network topology, weights, and bias values.

Inference with the OpenVINO Inference Engine (IE)

If you have set up the environment correctly, a path like C:\Intel\computer_vision_sdk\python\python3.5 or ~/intel/computer_vision_sdk/python/python3.5 will exist in PYTHONPATH. This is necessary to load the openvino Python package at runtime.

The following snippet uses the CPU to run the inference engine; it is also possible to run on the Intel GPU if you opted for the FP16 data_type previously.
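
Here is a sketch of that snippet using the openvino.inference_engine Python API of that release (IENetwork/IEPlugin); the dummy input and file paths are assumptions, so substitute a real preprocessed image.

import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

# Load the optimized IR generated by mo_tf.py in the previous step.
model_xml = './model/frozen_model.xml'
model_bin = './model/frozen_model.bin'
net = IENetwork(model=model_xml, weights=model_bin)

# Use the CPU plugin; pass device='GPU' instead if the IR was generated with FP16.
plugin = IEPlugin(device='CPU')
exec_net = plugin.load(network=net)

input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))

# Dummy input in NCHW layout; replace with a real preprocessed 299x299 image.
img = np.random.rand(1, 3, 299, 299).astype(np.float32)

res = exec_net.infer(inputs={input_blob: img})
probs = res[out_blob]     # shape (1, 1000): ImageNet class scores
print(probs.argmax())     # index of the predicted class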

Speed Benchmark

Benchmark setup:

  • TensorFlow version: 1.12.0
  • OS: Windows 10, 64-bit
  • CPU: Intel Core i7-7700HQ
  • The number of inferences to calculate the average result: 20.
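
The timing itself is a simple averaging loop; below is a minimal sketch (the benchmark helper and the warm-up run are my own choices, not taken verbatim from the notebook).

import time

def benchmark(predict, n=20):
    predict()                      # warm-up run, excluded from the timing
    start = time.time()
    for _ in range(n):
        predict()
    avg = (time.time() - start) / n
    print('average(sec):%.3f, fps:%.1f' % (avg, 1.0 / avg))

# Example: benchmark(lambda: exec_net.infer(inputs={input_blob: img}))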

Benchmark results for all three environments (Keras, TensorFlow, and OpenVINO) are shown below.

Keras          average(sec):0.079, fps:12.5
TensorFlow     average(sec):0.069, fps:14.3
OpenVINO(CPU)  average(sec):0.024, fps:40.6

The result may vary with the Intel processor you are experimenting with, but expect a significant speedup compared to running inference with TensorFlow/Keras on the CPU backend.

Conclusion and further reading

In this tutorial, you have learned how to run model inference several times faster with your Intel processor and the OpenVINO toolkit compared to stock TensorFlow. And OpenVINO does not only accelerate inference on the CPU: the same workflow introduced in this tutorial can easily be adapted to a Movidius neural compute stick with a few changes.

OpenVINO documentation you might find helpful:

Install Intel® Distribution of OpenVINO™ toolkit for Windows* 10

Install the Intel® Distribution of OpenVINO™ toolkit for Linux*

OpenVINO — Advanced Topics — CPU Plugin where you can learn more about various model optimization techniques.

Download the full source code for this tutorial from my GitHub.


Originally published at www.dlology.com.


Chengwei Zhang
Programmer and maker. Love to write deep learning articles. | Website: https://www.DLology.com | GitHub: https://github.com/Tony607