Object Detection: A Comparison of the Performance of Deep Learning Models on the Edge Using the Intel Movidius Neural Compute Stick and Raspberry Pi 3

Published in

Intel Student Ambassadors

Mar 26, 2019

Vehicle detection involves determining whether a vehicle is present, which type of vehicle it is, and how many vehicles are present. A vehicle's presence must be detected first; once detected, the vehicle has to be classified. Classification is the main part: it identifies the type of vehicle (car, bus, bike, etc.). Smart Traffic Management aims to avoid congestion on roads, especially on highways, where manual governance and management are difficult. Traffic surveillance cameras find their primary application in traffic monitoring, which makes management easier. Equipping those cameras with modern detection technology is what enables Smart Traffic Management.

The edge device (Raspberry Pi, Intel Movidius, and a camera) requires no internet connection for vehicle detection; it uses the internet only to connect to the cloud for real-time data transfer. Instead of sending images or video to the cloud for further analysis, detection happens on the device, and only the results (i.e., pure text) are sent to the cloud, e.g., for statistics and visualization. This approach brings several benefits, chiefly addressing latency and communication bandwidth concerns.
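To make the "only text goes to the cloud" idea concrete, here is a minimal sketch of what such a result message could look like. The field names (`camera_id`, `counts`, `total`) and the function `build_payload` are hypothetical and chosen for illustration; the article does not specify the actual message format.

```python
import json
import time


def build_payload(counts, camera_id, timestamp=None):
    """Serialize per-type vehicle counts into a small UTF-8 JSON message.

    All field names here are hypothetical, not taken from the article.
    """
    if timestamp is None:
        timestamp = int(time.time())
    message = {
        "camera_id": camera_id,
        "timestamp": timestamp,
        "counts": dict(counts),
        "total": sum(counts.values()),
    }
    # sort_keys gives a stable byte layout, which simplifies testing.
    return json.dumps(message, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    payload = build_payload({"car": 3, "bus": 1, "bike": 2}, camera_id="cam-01")
    print(len(payload), "bytes:", payload.decode("utf-8"))
```

A message like this is on the order of a hundred bytes, while a single camera frame is typically tens of kilobytes or more, which is where the bandwidth saving comes from.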

Intel Movidius Neural Compute Stick (NCS)

The Intel Movidius Neural Compute Stick (NCS) is a tiny, fanless deep learning device that can be used to learn AI programming at the edge. The NCS is powered by the low-power, high-performance Intel Movidius Vision Processing Unit (VPU), which runs convolutional neural networks; here it is used for vehicle detection. Using it requires setting up software tools to compile, profile, and validate deep neural networks (DNNs), as well as the Neural Compute API for the NCS. We use the NCSDK.

An SDK is a collection of software used for developing applications for a specific device or operating system; in this case, it is the NCSDK. The model used for detection is MobileNet-SSD, chosen based on the results of an experimental comparison against the YOLO model.

The NCSDK includes a set of software tools to compile, profile, and validate DNNs as well as the Intel Movidius Neural Compute API known as NCAPI for application development in C/C++ or Python.
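As a rough sketch of how the Python NCAPI is typically used, the function below opens the first stick, loads a compiled graph file, runs one inference, and cleans up. It assumes the NCSDK v1 Python API (`mvnc.mvncapi`); the article does not say which NCSDK version was used, and the v2 API uses different names. The function name `run_on_ncs` and its parameters are hypothetical, and `input_tensor` is assumed to be a preprocessed NumPy float16 array.

```python
def run_on_ncs(graph_path, input_tensor):
    """Run one inference on the first attached NCS (assumes NCSDK v1).

    graph_path:   path to a graph file compiled with the NCSDK tools.
    input_tensor: preprocessed image as a NumPy float16 array (assumption).
    """
    # Imported lazily so this module still loads on machines without the NCSDK.
    from mvnc import mvncapi as mvnc

    devices = mvnc.EnumerateDevices()
    if not devices:
        raise RuntimeError("No Neural Compute Stick found")

    device = mvnc.Device(devices[0])
    device.OpenDevice()
    try:
        with open(graph_path, "rb") as f:
            graph = device.AllocateGraph(f.read())
        try:
            # The second argument is an arbitrary user object returned by GetResult.
            graph.LoadTensor(input_tensor, None)
            output, _ = graph.GetResult()
        finally:
            graph.DeallocateGraph()
    finally:
        device.CloseDevice()
    return output
```

In a real application the device and graph would be opened once and reused for every frame, since allocating the graph per frame would dominate the run time.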

YOLO (You Only Look Once) Model Framework

YOLO is a heavy architecture based on bounding-box regression, which makes it impractical for embedded vision applications. It is trained on Pascal VOC and can detect up to twenty different classes.

Architecture of YOLO

The YOLO architecture resembles a fully convolutional neural network (FCNN): the image (n×n) passes through the network once, and the output is an (m×m) prediction. The architecture splits the input image into an m×m grid and, for each grid cell, generates 2 bounding boxes and the class probabilities for those boxes. A bounding box is often larger than the grid cell itself.
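The grid layout can be sketched in a few lines. This assumes the values from the original YOLO paper (a 7×7 grid, with each box described by 5 numbers: x, y, width, height, and confidence) together with the 2 boxes per cell and 20 Pascal VOC classes mentioned above. The function names are hypothetical.

```python
def yolo_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Shape of the prediction tensor: one vector per grid cell.

    Each box contributes 5 values (x, y, w, h, confidence); class
    probabilities are shared by all boxes in the same cell.
    """
    return (grid, grid, boxes_per_cell * 5 + num_classes)


def responsible_cell(cx, cy, image_size, grid=7):
    """Return (row, col) of the grid cell containing an object's centre."""
    col = min(int(cx / image_size * grid), grid - 1)
    row = min(int(cy / image_size * grid), grid - 1)
    return row, col


if __name__ == "__main__":
    print(yolo_output_shape())
    print(responsible_cell(224, 100, image_size=448))
```

Because the class probabilities are shared per cell, two objects of different classes whose centres fall in the same cell cannot both be predicted.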

Limitations of YOLO

YOLO imposes strong spatial constraints on bounding box predictions, since each grid cell predicts only two boxes and can have only one class. This spatial constraint limits the number of nearby objects that the model can predict.
The YOLO model struggles with small objects that appear in groups.
It treats errors the same in small bounding boxes and in large bounding boxes. A small error in a large box is generally benign, but a small error in a small box has a much greater effect on IOU. Its main source of error is incorrect localization.
YOLO is a heavyweight model (269.9 MB), which gives a low recognition speed of 2–3 fps and lower accuracy.
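The point about IOU in the list above can be checked with a small calculation. The sketch below computes IOU for axis-aligned boxes given as (x1, y1, x2, y2) and applies the same 5-pixel shift to a large box and to a small one; the box sizes are arbitrary illustrative values.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    shift = 5  # identical localization error in pixels for both cases
    large = iou((0, 0, 200, 200), (shift, 0, 200 + shift, 200))
    small = iou((0, 0, 20, 20), (shift, 0, 20 + shift, 20))
    print(f"large box IOU: {large:.3f}")  # about 0.951
    print(f"small box IOU: {small:.3f}")  # 0.600
```

The same 5-pixel error leaves the large box almost perfectly matched but drops the small box to an IOU of 0.6, which is why a loss that weights both errors equally penalizes small objects too little.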

MobileNet-SSD Model Framework

MobileNet-SSD (MobileNet with Single Shot MultiBox Detection) is an efficient convolutional neural network architecture for mobile and embedded vision applications, where compute power is limited. The MobileNet architecture was proposed by Google. It is built on a streamlined design in which each normal convolution is replaced by a depthwise convolution followed by a pointwise convolution; together these are called a depthwise separable convolution. This significantly reduces the number of parameters compared with a network of the same depth that uses normal convolutions, resulting in lightweight deep neural networks. Depthwise separable convolutions sacrifice some accuracy in exchange for lower complexity; employing Single Shot MultiBox Detection compensates for that and improves accuracy as well.

Single Shot: the tasks of object localization and classification are done in a single forward pass of the network.
MultiBox: the name of a technique for bounding box regression developed by Szegedy et al.
Detector: the network is an object detector that also classifies the detected objects.
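To see how much depthwise separable convolutions save, the parameter counts of the two layer types can be compared directly. This is a minimal sketch that ignores bias terms; the layer size (3×3 kernel, 256 input and 256 output channels) is an illustrative assumption, not a layer taken from the article.

```python
def standard_conv_params(kernel, in_ch, out_ch):
    """Weights in a normal convolution: one k x k filter per (input, output) pair."""
    return kernel * kernel * in_ch * out_ch


def separable_conv_params(kernel, in_ch, out_ch):
    """Weights in a depthwise separable convolution (bias terms ignored)."""
    depthwise = kernel * kernel * in_ch  # one k x k filter per input channel
    pointwise = in_ch * out_ch           # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, m, n = 3, 256, 256  # illustrative layer size
    std = standard_conv_params(k, m, n)
    sep = separable_conv_params(k, m, n)
    print(std, sep, round(sep / std, 3))
```

The ratio works out to 1/n + 1/k², so with 3×3 kernels the separable layer needs roughly 8 to 9 times fewer weights than the normal one.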

Block diagram of the MobileNet-SSD architecture.

Results: MobileNet-SSD vs YOLO Model Comparison

When both models are trained on the COCO dataset (330K images, 80+ object categories), the following results are obtained.

Thus, MobileNet-SSD is more suitable for implementation on a low-power device; therefore, the MobileNet-SSD model has been deployed on the Movidius Neural Compute Stick.
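Once the model runs on the stick, its raw output has to be turned into vehicle counts. The sketch below assumes the flat output layout used by the common NCSDK MobileNet-SSD examples: the first value is the number of valid detections, the next 6 values are unused, and each detection is then 7 values (image id, class id, score, left, top, right, bottom, with coordinates between 0 and 1). The function names and the class-id mapping are hypothetical; the real mapping depends on the dataset the model was trained on.

```python
import math
from collections import Counter


def parse_detections(output, labels, min_score=0.5):
    """Decode a flat SSD output into (label, score, box) tuples.

    Assumed layout: [num_valid, 6 unused values, then 7 values per detection].
    """
    num_valid = int(output[0])
    detections = []
    for i in range(num_valid):
        base = 7 + i * 7
        values = [float(v) for v in output[base:base + 7]]
        if len(values) < 7 or not all(math.isfinite(v) for v in values):
            continue  # skip truncated or corrupted entries
        _, class_id, score, x1, y1, x2, y2 = values
        class_id = int(class_id)
        if score < min_score or class_id not in labels:
            continue
        detections.append((labels[class_id], score, (x1, y1, x2, y2)))
    return detections


def count_by_type(detections):
    """Count detections per vehicle type."""
    return Counter(label for label, _, _ in detections)


if __name__ == "__main__":
    vehicle_labels = {6: "bus", 7: "car", 14: "motorbike"}  # hypothetical mapping
    fake_output = [2, 0, 0, 0, 0, 0, 0,
                   0, 7, 0.91, 0.10, 0.20, 0.40, 0.60,
                   0, 6, 0.30, 0.50, 0.50, 0.90, 0.90]
    print(count_by_type(parse_detections(fake_output, vehicle_labels)))
```

Passing only vehicle classes in `labels` filters out everything else the detector recognizes, so the resulting counts cover vehicle presence, type, and number in one step.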

Next, we set up the NCS with the Raspberry Pi 3 and performed the same operations, obtaining the results below.

The NCS can propel the Raspberry Pi to a ~6.88x speedup over the standard CPU object detection!

Object detection results on the Intel Movidius Neural Compute Stick (NCS) when compared to just the Raspberry Pi (CPU). The NCS helps the Raspberry Pi to achieve a ~6.88x speedup.
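A speedup figure like the one above is the ratio of two frames-per-second measurements. The following is a minimal sketch of how such a comparison could be measured; `measure_fps` and `speedup` are hypothetical helpers, and `infer` stands for whatever function runs detection on one frame (on the CPU or on the NCS). It is not the benchmarking code used for the results reported here.

```python
import time


def measure_fps(infer, frames):
    """Run infer(frame) over all frames and return frames per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        infer(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def speedup(baseline_fps, accelerated_fps):
    """How many times faster the accelerated run is than the baseline."""
    return accelerated_fps / baseline_fps


if __name__ == "__main__":
    # Stand-in workloads; replace with real CPU and NCS inference functions.
    slow_fps = measure_fps(lambda f: time.sleep(0.02), range(20))
    fast_fps = measure_fps(lambda f: time.sleep(0.005), range(20))
    print(f"speedup: {speedup(slow_fps, fast_fps):.2f}x")
```

For a fair comparison, both runs should use the same frames and the same model, and setup costs such as loading the graph onto the stick should stay outside the timed loop.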

Thanks for Reading!!