How to choose a Pre-Trained model!

shaistha fathima
Feb 21, 2020


Choosing one of the free pre-trained models from the OpenVINO Toolkit list for edge deployment.

Making the right choice!

This post is about choosing a free pre-trained model provided by the OpenVINO Toolkit for learning or demo purposes. Note that the OpenVINO Toolkit provides TWO types of pre-trained models: the Free version (these can be deployed directly to the edge with minimal model optimization steps, but cannot be tweaked or improved further) and the Public version (these can be modified further to improve the model).

There are a few questions you should ask yourself when selecting a good pre-trained model:

  • What are the desired OUTPUTS?
  • What kind of INPUTS do you expect?
  • Does the Pre-Trained Model support such input requirements?
  • What is the model accuracy and other specifications?
  • Compare with other models available — choose the best fit!

Sometimes a model can also serve purposes outside its intended tasks. For example:

  • Traffic light optimization: detect people, vehicles and bikes.
  • Assessing traffic levels in retail aisles: pedestrian detection.
  • Delivery robot: identify roadside objects.
  • Monitoring your form when working out: human pose estimation.

These are just a few brainstormed applications that fall outside a model's intended use cases; you can probably think of many more.

For simplicity, though, I will choose one of the models from the list for “Person Detection” by answering the questions above.

What are the desired OUTPUTS?

To practice deploying an OpenVINO Toolkit pre-trained model on the edge, I want to detect whether a person is present in a given image. If a person is present, the model should output a bounding box around them.
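Turning raw detections into the answer to “is a person present?” plus bounding boxes is usually a small post-processing step. The sketch below assumes each detection row follows the common SSD DetectionOutput layout [image_id, label, confidence, xmin, ymin, xmax, ymax] with coordinates normalized to [0, 1]. That layout, the hypothetical `person_label` value, the 0.5 threshold and the 4032x3024 photo size are all assumptions for illustration; person-detection-action-recognition-0006 may expose raw outputs that need decoding first, so check its documentation.

```python
from typing import List, Sequence, Tuple

# (xmin, ymin, xmax, ymax, confidence) in pixel coordinates
Box = Tuple[int, int, int, int, float]


def _clamp(v: float) -> float:
    # Keep normalized coordinates inside the image.
    return min(max(v, 0.0), 1.0)


def person_boxes(
    detections: Sequence[Sequence[float]],
    image_width: int,
    image_height: int,
    conf_threshold: float = 0.5,  # assumed threshold, tune for your use case
    person_label: int = 1,  # hypothetical class id for "person"
) -> List[Box]:
    """Keep confident person detections and scale them to pixel coordinates.

    Assumes each row is [image_id, label, confidence, xmin, ymin, xmax, ymax]
    with normalized coordinates (a common SSD output layout).
    """
    boxes: List[Box] = []
    for _image_id, label, conf, xmin, ymin, xmax, ymax in detections:
        if int(label) != person_label or conf < conf_threshold:
            continue
        boxes.append((
            round(_clamp(xmin) * image_width),
            round(_clamp(ymin) * image_height),
            round(_clamp(xmax) * image_width),
            round(_clamp(ymax) * image_height),
            conf,
        ))
    return boxes


def person_present(boxes: List[Box]) -> bool:
    # "Is a person in the image?" reduces to "is there at least one box?"
    return len(boxes) > 0


# Hypothetical detections for a 4032x3024 mobile phone photo.
detections = [
    [0, 1, 0.92, 0.10, 0.20, 0.40, 0.90],  # confident person
    [0, 1, 0.30, 0.50, 0.50, 0.60, 0.60],  # below threshold, dropped
]
boxes = person_boxes(detections, image_width=4032, image_height=3024)
print(person_present(boxes), boxes)
```

Because the coordinates are normalized, the same detections can be mapped back onto a photo of any resolution by passing its width and height.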

What kind of INPUTS do you expect?

Again, for simplicity and hands-on practice, I would like to test it on photos taken with a mobile phone camera. So the person may be sitting, standing or doing anything else in the image.

Does the Pre-Trained Model support such input requirements?

Given my input conditions, i.e., detecting a person in as many positions as possible (standing, sitting, lying down, etc.), “person-detection-action-recognition-0006” seems to be a good fit, as it has:

  • Pose coverage — sitting, writing, raising_hand, standing, turned around, lie on the desk.

What is the model accuracy and other specifications?

  • Detector AP (internal test set 2) — 90.70%
  • Accuracy (internal test set 2) — 80.74%
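Detector AP (average precision) summarizes the detector's precision/recall trade-off as its confidence threshold varies. The sketch below computes a generic all-point-interpolated AP. It assumes each detection has already been matched to the ground truth (for example, by an IoU threshold) and marked as a true or false positive. The exact protocol behind the 90.70% figure is not described here, so treat this as an illustration of the metric, not a reproduction of that number.

```python
from typing import Sequence


def average_precision(
    scores: Sequence[float],
    is_true_positive: Sequence[bool],
    num_ground_truth: int,
) -> float:
    """All-point interpolated AP from already-matched detections."""
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    precisions, recalls = [], []
    tp = fp = 0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Interpolation: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum precision over each step in recall (area under the PR curve).
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy example: 4 detections, 4 ground-truth people, one false positive.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 4)
print(f"AP = {ap:.3f}")
```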

Both the AP and the accuracy are good for this model, so it should give the desired results. Other specifications that make it a good choice are:

  • Support of occluded pedestrians — YES
  • Occlusion coverage <50%

Model Description: The model is based on the RMNet backbone, which includes depth-wise convolutions to reduce the amount of computation in the 3x3 convolution block. The first SSD head, working on 1/8- and 1/16-scale feature maps, has four clustered prior boxes and outputs detected persons (a two-class detector). The second SSD-based head predicts the actions of the detected persons.
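To get a feel for what “1/8 and 1/16 scale feature maps with four clustered prior boxes” means, the sketch below counts the candidate boxes (priors) that a detection head scores. For illustration only, it assumes a hypothetical 384x640 input, feature maps exactly equal to the input size divided by the stride, and four priors at every cell of both maps. Check the model's documentation for its real input size and for how its four priors are split across the two maps.

```python
from typing import Iterable


def prior_box_count(
    input_height: int,
    input_width: int,
    strides: Iterable[int] = (8, 16),  # 1/8 and 1/16 scale feature maps
    priors_per_cell: int = 4,  # assumption: 4 priors at every cell of every map
) -> int:
    """Count SSD prior boxes, assuming feature map size = input size // stride."""
    total = 0
    for stride in strides:
        fmap_h, fmap_w = input_height // stride, input_width // stride
        total += fmap_h * fmap_w * priors_per_cell
    return total


# Hypothetical 384x640 input: 48x80 cells at 1/8 scale, 24x40 cells at 1/16 scale.
print(prior_box_count(384, 640))
```

Under these assumptions, most priors (15,360 of 19,200) come from the finer 1/8-scale map. In SSD-style detectors, finer maps typically help with smaller people in the frame.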

In my case, this is the only model that fits my requirements. If you have two or more models to choose from, focus first on the output: does the model give the desired output for your input types? If yes, then move on to model accuracy and the other specifications.
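The decision order described above can be sketched as a small filter-and-rank function: desired outputs first, then input support, then accuracy. `ModelCard`, the capability tags such as "person_box", and the second candidate are hypothetical illustrations, not part of any OpenVINO API. The metrics and pose list are the ones quoted earlier for person-detection-action-recognition-0006.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass
class ModelCard:
    """Hypothetical summary of a model's documentation page."""
    name: str
    outputs: Set[str]           # what the model predicts (hypothetical tags)
    input_conditions: Set[str]  # e.g. poses the model is documented to cover
    detector_ap: Optional[float] = None
    accuracy: Optional[float] = None


def choose_model(
    candidates: Sequence[ModelCard],
    required_outputs: Set[str],
    required_inputs: Set[str],
) -> Optional[ModelCard]:
    # 1. Does it give the desired outputs?
    fits = [m for m in candidates if required_outputs <= m.outputs]
    # 2. Does it support the expected inputs?
    fits = [m for m in fits if required_inputs <= m.input_conditions]
    if not fits:
        return None
    # 3. Among the remaining models, prefer higher detector AP, then accuracy.
    return max(fits, key=lambda m: (m.detector_ap or 0.0, m.accuracy or 0.0))


candidates = [
    ModelCard(
        name="person-detection-action-recognition-0006",
        outputs={"person_box", "action"},
        input_conditions={"sitting", "writing", "raising_hand", "standing",
                          "turned_around", "lie_on_the_desk"},
        detector_ap=90.70,
        accuracy=80.74,
    ),
    # Hypothetical candidate that fails step 1 (no person output).
    ModelCard(name="hypothetical-vehicle-detector", outputs={"vehicle_box"},
              input_conditions=set()),
]

best = choose_model(candidates, {"person_box"}, {"sitting", "standing"})
print(best.name if best else "no model fits")
```

The tuple sort key means accuracy only breaks ties in AP. You could swap the order if action accuracy matters more for your application.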

Other Brainstorming Questions you might ask yourself for better understanding or just for FUN!

Which network architecture does your selected pre-trained model use, and why do you think that architecture was chosen?

For example, “person-detection-action-recognition-0006” uses an RMNet backbone, which includes depth-wise convolutions to reduce the amount of computation in the 3x3 convolution block.

RMNet is a custom backbone designed for fast but still accurate inference. Its architecture is inspired by the ResNet and MobileNet architectures.
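A quick multiply-add count shows why depth-wise convolutions make the 3x3 block cheaper. The sketch compares a standard 3x3 convolution with a depth-wise 3x3 convolution on the same feature map, assuming stride 1 and same padding. The 32x32x64 feature-map size is a hypothetical example, not a value taken from RMNet.

```python
def standard_conv3x3_mult_adds(h: int, w: int, c_in: int, c_out: int) -> int:
    # Every output pixel of every output channel looks at a 3x3 window
    # across all input channels.
    return h * w * c_out * c_in * 3 * 3


def depthwise_conv3x3_mult_adds(h: int, w: int, channels: int) -> int:
    # Each channel is filtered by its own 3x3 kernel; no mixing across channels.
    return h * w * channels * 3 * 3


h, w, c = 32, 32, 64  # hypothetical feature-map size
standard = standard_conv3x3_mult_adds(h, w, c, c)
depthwise = depthwise_conv3x3_mult_adds(h, w, c)
print(standard, depthwise, standard // depthwise)
```

With equal input and output channels, the saving factor equals the channel count (64 here). In MobileNet-style blocks, the cross-channel mixing that the depth-wise layer skips is typically handled by cheaper 1x1 convolutions around it.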

For a better understanding of RMNet and its use cases, read the paper “Fast and Accurate Person Re-Identification with RMNet”.

Which other networks could be used for this exact purpose and why do you think so?

SSD MobileNet V1 COCO, Faster R-CNN Inception V2 COCO, deep convolutional neural networks such as AlexNet, etc.

You can read this post for a better understanding!

That’s it! You have now successfully chosen a pre-trained model for practice and are ready for the next step: pre-processing and model optimization!

Happy Learning!


shaistha fathima

ML Privacy and Security Enthusiast | Research Scientist @openminedorg | Computer Vision | Twitter @shaistha24