How to choose a Pre-Trained model!

Feb 21, 2020

Choosing one of the free pre-trained models from the OpenVINO Toolkit list for edge deployment.

This post is about choosing a free-version pre-trained model provided by the OpenVINO Toolkit for learning or demo purposes. Just so you know, the OpenVINO Toolkit provides TWO types of pre-trained models: the Free Version (these can be deployed directly to the edge with minimal model optimization steps; no further tweaking or improvement can be done!) and the Public Version (further changes can be made to improve the model).

There are a few questions you should ask yourself when selecting a good pre-trained model:

What are the desired OUTPUTS?
What kind of INPUTS do you expect?
Does the Pre-Trained Model support such input requirements?
What is the model's accuracy, and what are its other specifications?
Compare with the other available models and choose the best fit!

Sometimes a model can also serve purposes outside its intended tasks. For example:

Traffic Light Optimization Model: detect people, vehicles and bikes.
Assess Traffic Level in Retail Aisles Model: pedestrian detection.
Delivery Robot Model: identify roadside objects.
Monitor Form When Working Out: human pose estimation.

These are just a few brainstormed applications that fall outside a model's intended use cases; you can probably think of many more.

For the sake of simplicity, though, I will choose one of the models from the given list for “Person Detection” by answering the questions above.

What are the desired OUTPUTS?

To practice deploying an OpenVINO Toolkit pre-trained model on the edge, I want to detect whether a person is present in a given image. If a person is present, the model should return a bounding box around them.
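In practice, "detect a person and return a bounding box" means post-processing the detector's raw output. The sketch below assumes the output has already been decoded into the common SSD DetectionOutput row layout `[image_id, class_id, confidence, x_min, y_min, x_max, y_max]`, with coordinates normalized to [0, 1]. That layout, the person class id and the 0.5 threshold are all assumptions; a model such as person-detection-action-recognition-0006 exposes raw SSD head outputs that need an extra prior-box decoding step not shown here.

```python
from dataclasses import dataclass

# Hypothetical class id for "person"; check the chosen model's documentation.
PERSON_LABEL = 1


@dataclass
class PersonBox:
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    confidence: float


def person_boxes(detections, image_width, image_height,
                 label=PERSON_LABEL, threshold=0.5):
    """Turn SSD-style detection rows into pixel-space person boxes.

    Each row is assumed to be
    [image_id, class_id, confidence, x_min, y_min, x_max, y_max]
    with coordinates normalized to the range [0, 1].
    """
    boxes = []
    for image_id, class_id, confidence, x0, y0, x1, y1 in detections:
        if image_id < 0:
            # Assumed convention: a negative image_id marks the end of valid rows.
            break
        if int(class_id) != label or confidence < threshold:
            continue  # not a person, or not confident enough
        # Scale normalized coordinates to pixels and clamp them to the image.
        boxes.append(PersonBox(
            x_min=max(0, int(x0 * image_width)),
            y_min=max(0, int(y0 * image_height)),
            x_max=min(image_width, int(x1 * image_width)),
            y_max=min(image_height, int(y1 * image_height)),
            confidence=confidence,
        ))
    return boxes


if __name__ == "__main__":
    rows = [
        [0, 1, 0.90, 0.10, 0.20, 0.50, 0.80],  # confident person
        [0, 1, 0.30, 0.00, 0.00, 1.00, 1.00],  # person, below threshold
        [0, 2, 0.95, 0.00, 0.00, 1.00, 1.00],  # some other class
        [-1, 0, 0.0, 0.0, 0.0, 0.0, 0.0],      # padding row
    ]
    print(person_boxes(rows, image_width=640, image_height=480))
```

An empty list means "no person in the image", which answers the first half of the desired output directly.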

What kind of INPUTS do you expect?

Again, for simplicity and practical experience, I would like to test it on photos taken with a mobile phone camera (“mobile camera clicks”), so the person may be sitting, standing or doing anything else in the image.
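Phone photos are usually much larger than a detector's fixed input size and often in portrait orientation, so they must be resized before inference. One option, an assumption here rather than something this post prescribes, is letterboxing: scale the photo to fit while preserving its aspect ratio, pad the remainder, and later map detected boxes back to photo coordinates. The 680x400 input size below is hypothetical; check the chosen model's documentation for its actual input shape. Plain resizing without padding also works, but it stretches people, which may affect detection quality.

```python
def letterbox(src_w, src_h, dst_w, dst_h):
    """Compute the scale, resized size and padding needed to fit a
    src_w x src_h photo into a dst_w x dst_h input without stretching it."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Center the resized photo; the remaining border is padding.
    pad_x, pad_y = (dst_w - new_w) // 2, (dst_h - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)


def to_photo_coords(x, y, scale, pad):
    """Map a point from network-input pixels back to original photo pixels."""
    pad_x, pad_y = pad
    return (x - pad_x) / scale, (y - pad_y) / scale


if __name__ == "__main__":
    # A 12 MP portrait phone photo and a hypothetical 680x400 landscape input.
    scale, resized, pad = letterbox(3024, 4032, 680, 400)
    print(resized, pad)
    print(to_photo_coords(190, 0, scale, pad))
```

A portrait photo in a landscape input leaves wide side padding, so a person occupies fewer input pixels; that is worth keeping in mind when testing on phone shots.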

Does the Pre-Trained Model support such input requirements?

Given my input conditions, i.e., detecting a person in as many positions as possible (standing, sitting, lying down, etc.), “person-detection-action-recognition-0006” seems to be a good fit, as it has:

Pose coverage — sitting, writing, raising_hand, standing, turned around, lie on the desk.

What is the model's accuracy, and what are its other specifications?

Detector AP (internal test set 2) — 90.70%
Accuracy (internal test set 2) — 80.74%
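The "Detector AP" figure is average precision: detections are ranked by confidence, precision and recall are tracked down that ranking, and AP is the area under the resulting precision/recall curve. The sketch below uses the common all-point interpolation (as in PASCAL VOC 2010 and later). The evaluation protocol behind the model's internal test set is not stated, so treat this as an illustration of the metric, not of how 90.70% was computed. It assumes detections were already matched to ground-truth boxes (usually by an IoU threshold).

```python
def average_precision(is_true_positive, num_ground_truth):
    """All-point interpolated AP.

    is_true_positive: one bool per detection, sorted by descending confidence;
        True if the detection matched a not-yet-matched ground-truth box.
    num_ground_truth: number of ground-truth objects (must be > 0).
    """
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolation: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each recall increment (area under the curve).
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # 4 people in the ground truth; the 3rd-ranked detection is a false alarm.
    print(average_precision([True, True, False, True], num_ground_truth=4))
```

Missed people lower the final recall and false alarms lower precision, so a high AP means the detector is both thorough and reliable at the chosen matching threshold.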

Both the AP and the accuracy are quite good for this model, so it should give the desired results. Some of the other specifications that make it a good choice are:

Support of occluded pedestrians — YES
Occlusion coverage <50%
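Reading "Occlusion coverage <50%" as "less than half of the person is hidden", you can estimate whether a test photo falls in that range by measuring how much of the person's box is covered. This single-rectangle measure is a simplification and an assumption, not how the model card defines the figure; real occlusions are rarely rectangular and may come from several objects.

```python
def occluded_fraction(person, occluder):
    """Fraction of the person's box area covered by an occluder's box.

    Boxes are (x_min, y_min, x_max, y_max) in pixels.
    """
    overlap_w = max(0, min(person[2], occluder[2]) - max(person[0], occluder[0]))
    overlap_h = max(0, min(person[3], occluder[3]) - max(person[1], occluder[1]))
    person_area = (person[2] - person[0]) * (person[3] - person[1])
    return overlap_w * overlap_h / person_area


if __name__ == "__main__":
    # A seated person whose lower body is hidden behind a desk.
    person_box = (100, 50, 200, 250)
    desk_box = (80, 170, 300, 300)
    fraction = occluded_fraction(person_box, desk_box)
    print(f"{fraction:.0%} occluded, within <50% coverage: {fraction < 0.5}")
```

A person sitting at a desk, a common pose in this model's coverage list, typically lands below the 50% mark in this rough measure.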

Model Description: Based on the RMNet backbone that includes depth-wise convolutions to reduce the amount of computations for the 3x3 convolution block. The first SSD head from 1/8 and 1/16 scale feature maps has four clustered prior boxes and outputs detected persons (two class detector). The second SSD-based head predicts actions of the detected persons.
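The description says RMNet uses depth-wise convolutions to reduce the computation of the 3x3 convolution block. The sketch below shows the generic arithmetic behind that claim by comparing a standard 3x3 convolution with the MobileNet-style factorization into a 3x3 depthwise convolution plus a 1x1 pointwise convolution. This illustrates the general idea under that assumption; it is not RMNet's exact block layout, and the layer sizes are hypothetical.

```python
def standard_conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates for a regular k x k convolution
    (stride 1, output the same size as the h x w input)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k=3):
    """k x k depthwise convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 50 x 50 feature map, 64 input and 64 output channels.
    std = standard_conv_macs(50, 50, 64, 64)
    sep = depthwise_separable_macs(50, 50, 64, 64)
    print(std, sep, f"ratio = {sep / std:.3f}")
```

The ratio simplifies to 1/c_out + 1/k², so with 64 output channels and 3x3 kernels the factorized version needs roughly an eighth of the multiply-accumulates, which is what makes such backbones attractive on edge hardware.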

In my case, this is the only model that fits my requirements. If you have two or more models to choose from, focus first on the output: does the model give the desired output for your input types? If yes, then move on to model accuracy and so on.
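That selection order (output first, then input support, then accuracy) can be sketched as a filter-then-rank step. The model names, capability tags and accuracy numbers below are hypothetical placeholders, not real OpenVINO models or measurements. Also keep in mind that accuracies reported on different internal test sets are not strictly comparable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    outputs: set[str]           # what the model produces, e.g. {"person_boxes"}
    input_conditions: set[str]  # situations it was trained/validated for
    accuracy: float             # as reported in the model's documentation


def choose_model(candidates, needed_outputs, needed_inputs) -> Optional[Candidate]:
    # 1. Keep only models that give every desired output...
    # 2. ...and support every expected input condition.
    fits = [
        c for c in candidates
        if needed_outputs <= c.outputs and needed_inputs <= c.input_conditions
    ]
    # 3. Only then compare accuracy among the models that fit.
    return max(fits, key=lambda c: c.accuracy, default=None)


# Hypothetical candidates and numbers, for illustration only.
candidates = [
    Candidate("model-a", {"person_boxes"}, {"standing"}, 0.95),
    Candidate("model-b", {"person_boxes", "actions"}, {"standing", "sitting"}, 0.80),
    Candidate("model-c", {"person_boxes"}, {"standing", "sitting"}, 0.85),
]

if __name__ == "__main__":
    best = choose_model(candidates, {"person_boxes"}, {"standing", "sitting"})
    print(best.name if best else "no model fits")
```

Here model-a is ruled out despite having the highest accuracy, because it does not cover seated people; accuracy only breaks the tie between model-b and model-c.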


Other Brainstorming Questions you might ask yourself for better understanding or just for FUN!

Written by shaistha fathima