Benchmarking the performance of pre-trained models with Intel OpenVINO

Ramya Kappagantu
5 min read · Dec 13, 2019

Deep Learning has evolved hand-in-hand with the digital era, which has brought about an explosion of data in all forms and from every region of the world. This data, often simply called big data, is drawn from sources such as e-commerce platforms and the internet. It is readily accessible and can be shared easily through cloud computing.

However, this data is normally unstructured and so vast that it could take humans decades to comprehend it and extract relevant information. Hence, companies are adopting AI systems for automated support.

Deep Learning, a subset of machine learning, learns from vast amounts of unstructured data that humans would normally take decades to process and understand.

The process of Machine Learning or Deep Learning essentially consists of two phases: feature extraction, and using the extracted features to perform classification or regression to obtain the output. Deep Learning comes into the picture when you want to extract more features and hence perform the second phase more accurately. This can be achieved by increasing the number of layers. But adding layers is not always the solution for better accuracy, because chasing accuracy can lead to overfitting. Also, as we keep adding layers, it takes a huge amount of time to extract the features, which eventually affects the performance of the model. To overcome these drawbacks, we have Transfer Learning.

Transfer Learning is mainly used when you already have knowledge learned from existing data and want to use it to solve a similar problem. Instead of starting from scratch with feature extraction, you reuse the features that have already been extracted and perform only the second step, which is either classification or regression.

Classic Deep Learning model — Feature extraction, classification.

Transfer Learning model — classification.
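The two-step split above can be sketched in a few lines of NumPy. This is only a toy illustration, not how any particular toolkit implements it: the fixed matrix `W_FROZEN` is a hypothetical stand-in for pre-trained weights, and the data, labels and function names are invented for the example. The point is that the feature extractor is never updated; only the small classifier on top is trained.

```python
import numpy as np

# Hypothetical stand-in for pre-trained weights: fixed, never updated below.
W_FROZEN = np.hstack([np.eye(4), -np.eye(4)])  # shape (4, 8)

def extract_features(x):
    """Frozen feature extraction: ReLU of a fixed linear map."""
    return np.maximum(x @ W_FROZEN, 0.0)

def train_head(features, labels, lr=0.1, epochs=500):
    """Train only a logistic-regression classifier on the frozen features."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # predicted probabilities
        grad = p - labels                               # gradient of log loss w.r.t. logits
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

def predict(features, w, b):
    return (features @ w + b > 0).astype(int)

# Toy data for the "similar problem": two classes split by a line.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = (x[:, 0] + x[:, 1] > 0).astype(int)

feats = extract_features(x)      # step 1 is reused as-is
w, b = train_head(feats, y)      # only step 2 is trained
accuracy = (predict(feats, w, b) == y).mean()
print(f"training accuracy of the new head: {accuracy:.2f}")
```

In a real setting the frozen part would be the convolutional layers of a pre-trained network, and the head would be trained on your own labelled data.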

Transfer Learning can be performed through a set of models called pre-trained models. A pre-trained model is a model that was trained on a large benchmark dataset to solve a problem similar to the one that we want to solve. Each pre-trained model has been trained on a number of datasets, and its performance metrics have been calculated. Based on these results, a suitable model can be chosen to solve a similar problem.

Intel does provide pre-trained models!

Intel OpenVINO Toolkit is a comprehensive toolkit that you can use to develop and deploy vision-oriented solutions on Intel platforms. Vision-oriented means the solutions use images or videos to perform specific tasks. A few of the use cases include autonomous navigation, digital surveillance cameras, robotics, and mixed-reality headsets.

You can find the Intel OpenVINO distribution of pre-trained models here.

Similarly, various platforms provide pre-trained models that can be used to solve various problems. So the obvious question is: which model performs best when several platforms provide pre-trained models for the same problem? Different tools exhibit different features and running performance when training different types of deep networks on different hardware platforms, which makes it difficult for end users to select an appropriate pair of software and hardware. Benchmarking answers this. Benchmarking is a comparative study of the performance of a network across different platforms. It can be both internal and external. Internal benchmarking is a process in which a company or an organisation looks within its own business to try and determine the best practice or methodology for conducting a particular task. External benchmarking, sometimes described as competitive benchmarking, compares business performance against other companies.

In this article, I benchmark the performance of person-detection-retail-0013 from the Intel OpenVINO toolkit against ssd_resnet_50_fpn_coco from TensorFlow. The problem I solved was pedestrian detection and re-identification, using the Mall dataset. Considering all the edge cases and possibilities of pedestrians in a crowd (overlaps between pedestrians are possible in crowds), the following graph was plotted using matplotlib.

Green: OpenVINO model; red: TensorFlow model
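One simple quantity that could be compared per frame in a plot like this is the fraction of annotated pedestrians each model finds. The sketch below is an assumption about how such a comparison might be computed, not a description of the exact metric behind the graph; the function name and the sample counts are hypothetical. It only compares counts, so it is a rough proxy that does not check whether each detection matches a real person.

```python
def detection_rate(detected_counts, ground_truth_counts):
    """Per-frame fraction of annotated pedestrians detected, capped at 1.0."""
    rates = []
    for detected, truth in zip(detected_counts, ground_truth_counts):
        if truth == 0:
            # No pedestrians annotated: perfect only if nothing was detected.
            rates.append(1.0 if detected == 0 else 0.0)
        else:
            rates.append(min(detected, truth) / truth)
    return rates

# Hypothetical counts for three frames (not taken from the experiment).
ground_truth = [4, 5, 0]
model_counts = [3, 5, 0]
print(detection_rate(model_counts, ground_truth))
```

The resulting list has one value per frame, so the two models can be drawn as two lines over the frame index.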

To increase the efficiency of the model and detect more pedestrians in the crowd, the image is centre-cropped and the benchmarking is repeated. The results follow:

Green: OpenVINO model; red: TensorFlow model
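Centre cropping itself is a small array-slicing operation. Here is a minimal sketch with NumPy, assuming the frame is a height × width × channels array; the function name and the crop size are illustrative, not the values used in the experiment.

```python
import numpy as np

def centre_crop(image, crop_height, crop_width):
    """Return the central crop_height x crop_width region of an image array."""
    height, width = image.shape[:2]
    if crop_height > height or crop_width > width:
        raise ValueError("crop size must not exceed the image size")
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return image[top:top + crop_height, left:left + crop_width]

# Hypothetical 480 x 640 RGB frame filled with zeros.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
cropped = centre_crop(frame, 240, 320)
print(cropped.shape)
```

Because the crop discards the borders, the pedestrians that remain occupy a larger share of the image passed to the model.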

To better understand benchmarking, let us have a look at how the frames are in the dataset we used.

Mall dataset: example frame

The dataset is also provided with ground truth values of each pedestrian present in the frame.

Mall dataset: annotated frame

The minimum pedestrian height is >100 px (this can be calculated from the ground truth provided). Performance varies according to the specifications each model was designed for. The pre-trained model person-detection-retail-0013 works well if the person in the frame is standing upright and parallel to the image frame, with an occlusion coverage of <50%. ssd_resnet_50_fpn_coco cannot detect persons that are too small or too close. It also cannot detect persons in frames with poor illumination.
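To make the height and occlusion conditions concrete, here is a small sketch that checks them for one pedestrian. It assumes bounding boxes in the form `(x_min, y_min, x_max, y_max)` in pixels, which is an assumption for illustration and not necessarily the format of the dataset's ground truth. The function names are hypothetical, and the thresholds are the 100 px and 50% figures quoted above.

```python
def box_height(box):
    """Height in pixels of a box given as (x_min, y_min, x_max, y_max)."""
    return box[3] - box[1]

def occlusion_coverage(box, occluder):
    """Fraction of the area of `box` that is covered by `occluder`."""
    overlap_w = max(0, min(box[2], occluder[2]) - max(box[0], occluder[0]))
    overlap_h = max(0, min(box[3], occluder[3]) - max(box[1], occluder[1]))
    area = (box[2] - box[0]) * (box[3] - box[1])
    return (overlap_w * overlap_h) / area if area > 0 else 0.0

def within_limits(box, occluder, min_height=100, max_occlusion=0.5):
    """True if the person is taller than min_height px and occluded less than max_occlusion."""
    return box_height(box) > min_height and occlusion_coverage(box, occluder) < max_occlusion

# Hypothetical boxes: a 120 px tall person, partly hidden by a neighbour.
person = (0, 0, 40, 120)
neighbour = (30, 0, 70, 120)
print(occlusion_coverage(person, neighbour), within_limits(person, neighbour))
```

In a crowded frame, each box would be checked against every other box, and the largest coverage value would decide whether the person falls inside the model's stated limits.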

Comparing the speeds of the networks, the Intel OpenVINO model is faster than the TensorFlow model. I attribute this to the OpenVINO model running in C++, whereas the TensorFlow model runs in Python. The average time taken by the ssd_resnet model to process a frame is 3 sec, whereas it is 0.34 sec for person-detection-retail-0013.
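An average time per frame like the ones above can be measured with a small timing harness. The sketch below is one possible way to do it, not the exact script used here: `infer` is a placeholder for whatever function runs a model on a single frame, and the dummy inference function exists only to make the example runnable.

```python
import time

def average_seconds_per_frame(infer, frames):
    """Run `infer` on every frame and return the mean wall-clock time per frame."""
    if not frames:
        raise ValueError("at least one frame is required")
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return elapsed / len(frames)

def dummy_infer(frame):
    """Placeholder for a real model call; it just does a little work."""
    return sum(frame)

frames = [list(range(1000)) for _ in range(50)]  # stand-in "frames"
avg = average_seconds_per_frame(dummy_infer, frames)
print(f"average time per frame: {avg:.6f} s")
```

For a fair comparison, both models should be timed on the same frames and the same machine, and model loading should be kept outside the timed loop.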

Conclusion:

The performance of various pre-trained models can be measured, and it depends on their feature extraction. To improve the performance of person-detection-retail-0013, we can create an image of a different aspect ratio by cropping. As we have seen, the minimum height of the person in the image, the image resolution and the occlusion coverage play an important role in benchmarking performance.

Also, the performance of a model depends on how well the data is collected. For instance, pedestrian detection could be indoor or outdoor. Some datasets contain only frames of indoor pedestrians, while others contain a combination of indoor and outdoor pedestrians. With such combined data, the result can sometimes be overfitting, because the model may also detect other objects as persons. Hence, before training your model, make sure the data is structured well for better performance. As we saw, Deep Learning learns from structured data and predicts the output for unstructured data. Intel OpenVINO models perform well even for semi-supervised data.

Also read:

Model Optimizer

Pedestrian Tracking Demo using OpenVINO Toolkit

This blog post was made as part of the Intel® Student Ambassador competition.
