Detect a Penguin at the EDGE

Aya Zaki
Udacity Intel Edge AI Scholars
10 min read · Jan 16, 2020

Last November, Intel announced the launch of the Edge AI DevCloud. Developers can use it for free, but they need to submit a form explaining their project proposal along with a company email; most requests are granted when this information is provided. However, if you have no company affiliation and/or are not actively working on an edge AI project, here is a great opportunity to get one-week access to explore the Edge AI DevCloud. This opportunity was a 4-hour, hands-on workshop where Intel® took the participants through a computer vision workflow using the OpenVINO™ toolkit, including its support for deep learning algorithms that help accelerate Smart Video applications. The goal of the workshop is to learn how to optimize and improve performance with and without external accelerators, and to use tools that help you identify the best hardware configuration for your needs.

The EDGE DevCloud offers a complete workflow for code development, job submission, and results viewing. Developers can create their run scripts through a Jupyter Notebook on the development server (powered by Intel® Xeon® Scalable processors), then submit these run scripts into a job queue to run inference on edge compute servers. Different hardware acceleration options are available, such as integrated GPUs, VPUs, and FPGAs.

https://software.intel.com/en-us/devcloud/edge

Using the Intel® Distribution of OpenVINO™ toolkit, already installed and configured on the cloud, you should be able to run inference on a specific edge compute node or on multiple edge compute nodes at once, i.e., simultaneously in parallel. The hardware heterogeneity offered by the deep learning toolkit is a big plus for inference acceleration at the edge (more detail on this later).

Starting your AI application on the EDGE DevCloud is a piece of cake with all the code samples and demos available. Once signed in to the DevCloud, I was able to access getting-started tutorials, smart video workshop notebooks, as well as more advanced reference samples for IoT applications.

Let’s get started with one of the tutorials available for object classification using the “squeezenet1.1” model. You can find the code details on my GitHub. The inference is run on CPU(s) using the Intel® Distribution of OpenVINO™ toolkit. So, how do you use the toolkit?

Main components of the Intel® DL Deployment Toolkit included within the OpenVINO™ toolkit

Step I. Build the model. Strictly speaking, this step is not part of the OpenVINO™ toolkit: you can use any supported DL framework to create your model. The currently supported frameworks are TensorFlow, Caffe, MXNet, ONNX, and Kaldi. The Intel® OpenVINO™ toolkit is focused on deep learning inference, not training, so the very first step is to have your own pre-trained model ready. However, more than 40 pre-trained models are provided with the toolkit, so if you don’t have your own model and want to try some ideas, you can use one of these. They cover age and gender recognition, face recognition, human detection, head pose, human pose estimation, etc. In the OpenVINO documentation, you can find each model’s description, inputs, and outputs.

In this tutorial, the model picked for object classification is the public squeezenet1.1 model. This model is capable of classifying and reporting the probability of 1000 different objects, including different species of cats, dogs, birds, insects, etc. The tutorial directly provides the model IR (Intermediate Representation) files required by the inference engine, without covering how to download the Caffe model and convert it to IR files. So, let me show you, very briefly, how to do this here; that is one thing I learned as a Udacity-Intel EDGE AI scholar :)
To download the model, you can use the downloader, a Python script available as part of the deployment tools in the OpenVINO™ toolkit.

source /opt/intel/openvino/bin/setupvars.sh

python3 /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name squeezenet1.1 -o <download_dir>

This downloads the Caffe model files: the “.prototxt” and the “.caffemodel”. These file formats belong to the Caffe DL framework and are not readable by the inference engine, so a conversion step must take place before model deployment. The model optimizer tool (more details about it in Step II) is responsible for this conversion. Here is how to run it:

python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model ../public/squeezenet1.1/squeezenet1.1.caffemodel --data_type FP16 --output_dir <output_dir>

Step II. Optimize the model. Once you have your pre-trained model from any supported DL framework, you will need the model optimizer, a Python-based tool, to translate and optimize your model. It generates the IR (Intermediate Representation) files: a .bin file, which holds the weights and biases, and an .xml file, which holds the model topology. The IR files are then fed to the inference engine, which runs the model on the hardware you choose for inference.

The model optimizer converts models from the different frameworks into a common intermediate format. This makes the model framework-agnostic: once converted to the IR format, it doesn’t matter which framework the model came from.

If you are using the Open Model Zoo pre-trained models, note that these models are already in the IR format, so you can use them directly.

So far, we have talked about the format-translation functionality of the Model Optimizer (MO), but does it perform any actual optimizations on the model, as its name suggests? The answer is yes. Below, I briefly explain some of the generic optimization techniques used by the MO:

a) Drop unused layers (dropout): some layers are important during training but are not required at all for inference. The dropout layer is a good example: during training, it randomly disables a fraction of the neurons in each pass, which helps the network generalize and avoid overfitting. Once the model is trained, this layer serves no purpose at inference time, so the MO simply removes it.

b) FP16/INT8 quantization: you can quantize your model for inference. The currently supported precisions are FP32, FP16, and INT8. The CPU supports all three, while the GPU (referring to the Intel integrated GPU) supports FP32 and FP16. If you use an FP32 model on a Movidius VPU, it will be internally converted to FP16, because that is the only precision the device supports.

c) Layer fusion: this technique is a bit more subtle. Let's consider a very simple example. Suppose a convolutional neural network (CNN) has 4 layers; real CNNs certainly have more layers, but this is a simplified case.

We start with these 4 layers: Convolution, Batch Normalization (BN), ReLU, and Pooling. When converting the model, the MO removes the BN layer and folds its functionality into the convolution layer. During training, the batch normalization statistics are learned; once training is done, they are constants. These constants can therefore be folded into the convolution layer as a per-channel scale and shift on its weights and biases. The ReLU layer, an activation layer, is just a simple element-wise operation on the trained model's outputs, so it can be attached to the convolution layer as well. This reduces the model to only 2 layers without any change in functionality.
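To make the batch-normalization folding concrete, here is a minimal NumPy sketch of the arithmetic involved; the variable names are my own and this is an illustration of the idea, not toolkit code:

import numpy as np

# Toy convolution parameters: 8 output channels, 3 input channels, 3x3 kernels
W = np.random.randn(8, 3, 3, 3).astype(np.float32)  # convolution weights
b = np.zeros(8, dtype=np.float32)                    # convolution biases

# Batch normalization parameters learned during training (constants at inference time)
gamma = np.random.rand(8).astype(np.float32)  # scale
beta = np.random.rand(8).astype(np.float32)   # shift
mean = np.random.rand(8).astype(np.float32)   # running mean
var = np.random.rand(8).astype(np.float32)    # running variance
eps = 1e-5

# Fold BN into the convolution: y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
scale = gamma / np.sqrt(var + eps)        # per-output-channel scale
W_fused = W * scale[:, None, None, None]  # scale each output channel's weights
b_fused = (b - mean) * scale + beta       # adjusted bias

# A single convolution with (W_fused, b_fused) now reproduces Convolution + BN exactly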

Should you expect any accuracy loss due to model optimization? Well, yes, but not a huge one, and it is mainly due to quantization because, bottom line, we are not changing any of the model's mathematics during conversion.
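As a tiny illustration of where that quantization loss comes from, here is a NumPy example of my own (not part of the toolkit) showing the rounding error introduced when a single FP32 weight is stored as FP16:

import numpy as np

w = np.float32(0.123456789)    # a weight in FP32
w_fp16 = np.float16(w)         # the same weight stored in FP16
print(w, w_fp16, abs(w - np.float32(w_fp16)))  # a tiny per-weight rounding error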

Step III. Deploy the model. Deployment means writing your own application source code. In this source code, the Inference Engine API takes the converted IR files, reads them, loads them onto the device, and creates memory space for reading the picture/video data; then it runs the inference using the model.

Inference can be of many types; for computer vision applications (video: the eye of IoT), it is usually classification, detection, or segmentation. As we move from classification to segmentation, the model complexity increases and so does the inference time, requiring more compute power. The complexity of the problem (data set) dictates the network structure: the more complex the problem, the more ‘features’ are required, and the deeper the network. The Inference Engine (IE) gives you the option of choosing which hardware to use. There are different device plugins: a CPU plugin, GPU plugin, Myriad plugin, and FPGA plugin; depending on the hardware you use, you call the corresponding plugin. In your application, you may need to use OpenCL to create custom kernels for running inference on non-CPU devices.

By specifying your target hardware to the IE, your model becomes hardware-agnostic: the hardware-specific optimizations are done by the IE, so you don't have to worry about them. There is also a hardware heterogeneity option in the OpenVINO™ toolkit, the “HETERO plugin”, where you can specify two hardware devices and the engine follows a fallback policy: if any layer is not supported by the first device, it falls back to the next device in the priority list. Say, for example, you have an FPGA and a CPU. The FPGA can run inference faster, but some of the model's layers may not be supported by the FPGA plugin. In this case, all layers would run on the FPGA except the unsupported ones, which would fall back to the CPU by default.
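Here is a minimal sketch of what that looks like with the same (older) IEPlugin API used later in this tutorial; the device string is the only thing that changes, and the model file names are placeholders:

from openvino.inference_engine import IENetwork, IEPlugin

# "HETERO:FPGA,CPU" means: run layers on the FPGA first, and fall back to the CPU
# for any layer the FPGA plugin does not support
plugin = IEPlugin(device="HETERO:FPGA,CPU")

net = IENetwork(model="squeezenet1.1.xml", weights="squeezenet1.1.bin")
exec_net = plugin.load(network=net)  # layers are assigned to FPGA or CPU automatically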

The DevCloud tutorials focus on the last step of user Application and using the Inference Engine to run the model on CPU. Using the Inference Engine API follows the basic workflow illustrated below.

0_ Set the stage
This includes importing the required Python modules:
os — Operating system specific module (used for file name parsing)
cv2 — OpenCV module
time — time tracking module (used for measuring execution time)
numpy — n-dimensional array manipulation
openvino.inference_engine — import the IENetwork and IEPlugin objects
matplotlib — import pyplot used for displaying output images
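Putting those together, the top of the notebook looks roughly like this (the exact import aliases are my choice):

import os                                     # file name parsing
import time                                   # measuring execution time
import cv2                                    # OpenCV for image loading and preprocessing
import numpy as np                            # n-dimensional array manipulation
from openvino.inference_engine import IENetwork, IEPlugin  # Inference Engine API
from matplotlib import pyplot as plt          # displaying output images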

1_ Create Plug-in for device
The device plugin is basically a library that supports running inference on the chosen hardware. It is created using the IEPlugin object. You might need to add a hardware extension in case the model has layers that are not supported by the plugin.

plugin = IEPlugin(device=device)
plugin.add_cpu_extension(cpu_extension_path)

2_ Create Network from Model IR files
There is a network object that needs to be created. This object is called “IENetwork” and is loaded with the model IR files when first created.

net = IENetwork(model=model_xml, weights=model_bin)

One thing to make sure of at this point is that all layers in the model are supported by the plugin and the library extension. To check this, you can simply get the layers supported by the plugin and the layers of the model, then compare them.

plugin.get_supported_layers(net)
net.layers.keys()
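A minimal way to do that comparison, reusing the plugin and net objects created above:

supported_layers = plugin.get_supported_layers(net)
unsupported_layers = [l for l in net.layers.keys() if l not in supported_layers]

if unsupported_layers:
    # Either add the CPU extension (as above) or choose another device
    print("Layers not supported by the plugin:", unsupported_layers)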

3_ Load the Model into the Device Plugin
In order to run the inference, you need to load the model object “net” into the device object “plugin” to generate an executable network object, which is used later for inference.

exec_net = plugin.load(network=net)

4_ Prepare Input Image
The input image is loaded using OpenCV (a how-to example: VideoCapture). Then, you get its width and height. Next, you need to pre-process the input image to match the dimensions required by the inference model, as well as its channels (i.e. colors) and batch size (the number of images fed at once). The basic steps performed using OpenCV are:

+ Resize from the image's original dimensions to the model's input W x H: frame = cv2.resize(image, (w, h))
+ Change the data layout from (H x W x C) to (C x H x W): frame = frame.transpose((2, 0, 1))
+ Reshape to match the model's input dimensions: frame = frame.reshape((n, c, h, w))

To get the squeezenet1.1 model's input dimensions, you can read them from the “net” object:
net.inputs  # {'data': <openvino.inference_engine.ie_api.InputInfo object at 0x7fe6148796c0>}
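Putting the pieces of this step together, a minimal preprocessing sketch might look like the following; it continues with the cv2 import and the net object from above, assumes the input blob is named 'data' as shown, and uses penguin.jpg as a placeholder file name:

# Read the model's expected input shape from the network object
n, c, h, w = net.inputs['data'].shape

# Load an image and keep its original size (useful when drawing results later)
image = cv2.imread('penguin.jpg')
initial_h, initial_w = image.shape[:2]

# Resize, change layout from H x W x C to C x H x W, and add the batch dimension
frame = cv2.resize(image, (w, h))
frame = frame.transpose((2, 0, 1))
frame = frame.reshape((n, c, h, w))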

5_ Run Inference
Now that the input is in the desired format, a single line runs the inference. The input is fed to the exec_net object as a dictionary, whose keys should always match the input names we got from the net object using “net.inputs”.

res = exec_net.infer(inputs={'data': frame})

6_ Process and display results
Developers are responsible for parsing the inference output. Many output formats exist, and unless a model is well documented, the output pattern may not be immediately obvious. Some examples include:
- Simple classification (e.g., AlexNet): an array of float confidence scores, with one element per class in the model
- SSD: many “boxes”, each with a confidence score, a label #, and xmin, ymin, xmax, ymax coordinates
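For the SSD case, here is a hedged sketch of what that parsing typically looks like; the 'detection_out' blob name and the [image_id, label, confidence, xmin, ymin, xmax, ymax] row layout are the common SSD convention rather than something from this tutorial, and initial_w/initial_h are the original image dimensions saved during preprocessing:

# res is the dict returned by exec_net.infer() for an SSD model
detections = res['detection_out'].reshape(-1, 7)  # rows of [image_id, label, conf, xmin, ymin, xmax, ymax]

for image_id, label, conf, xmin, ymin, xmax, ymax in detections:
    if conf > 0.5:  # keep confident boxes only; coordinates are normalized to [0, 1]
        box = (int(xmin * initial_w), int(ymin * initial_h),
               int(xmax * initial_w), int(ymax * initial_h))
        print("label:", int(label), "confidence:", float(conf), "box:", box)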

Finally, the processing of the squeezenet1.1 output in the object classification tutorial is straightforward. The model outputs the probabilities of all possible classes in a BLOB (Binary Large OBject) structure, so you only need to squeeze the output and then sort it. To get the top ten guesses for the input image, take the last 10 entries of the sorted list, reverse their order, and you're all done!
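In code, that post-processing boils down to a few NumPy calls (assuming the output blob is named 'prob', which is what the Caffe squeezenet1.1 model uses):

probs = np.squeeze(res['prob'])          # shape (1000,): one probability per class
top_ten = np.argsort(probs)[-10:][::-1]  # indices of the 10 highest probabilities

for class_id in top_ten:
    print(class_id, probs[class_id])     # map class_id to a name using the labels file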
Along with the tutorial, there are pictures available, such as a dog, a cat, and a bird, to validate the inference step. You can also go ahead and try your own pictures. I picked the picture of a penguin! (It is the favorite animal of my favorite friend.) Here is what the results look like:

So, I encourage you to have a look at the Jupyter Notebook, then start coding at the EDGE yourself!
— Happy fast Inference
