SSD Object Detection in Real Time (Deep Learning and Caffe)

Published in ACM JUIT, Jun 13, 2020

In this article, we discuss SSD object detection (its features, advantages, and drawbacks) and implement a MobileNet SSD model in Caffe format, using OpenCV in Python.

What is Object Detection?

Object detection in computer vision is what it sounds like: identifying the objects in an image and localizing the area each one occupies. Object detection builds on image classification. Whether classification is performed with neural networks or with more primitive classifiers, it is always the first step. Detection extends it by localizing every object that can be found in a given frame.

Single Shot MultiBox Detector (SSD)

SSD object detection extracts feature maps using a base deep learning network, which is a CNN-based classifier, and then applies convolution filters to those feature maps to detect objects. Our implementation uses MobileNet as the base network (other options include VGGNet, ResNet, and DenseNet).

For an in-depth explanation of how SSD object detection works, refer to this Medium article by Jonathan Hui.

What is Caffe?

Caffe is a deep learning framework developed by Berkeley AI Research and community contributors, designed with speed and efficiency in mind. According to the Caffe website, it can process over 60 million images per day with a single NVIDIA K40 GPU, which is quoted as roughly 1 ms/image for inference and 4 ms/image for learning.

Do check out the Caffe GitHub and Caffe Website.

Code Implementation

Requirements

Python (ver 3.6) and OpenCV (ver 4.2)
Caffe MobileNet SSD model weights and prototxt definition here.

Directory Tree

Create a folder named Caffe and save the model weights and prototxt file in it
Create a Python script file named detectDNN.py

Importing libraries (Lines 1–8)

Constructing argument parsing (Lines 11–16)
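A minimal sketch of what the argument parsing step could look like with the standard `argparse` module. The flag names (`--prototxt`, `--model`, `--confidence`), the default threshold of 0.2, and the file names in the demonstration are assumptions for illustration, not necessarily those used in the original script.

```python
import argparse


def build_parser():
    """Build the command-line parser for the detection script."""
    parser = argparse.ArgumentParser(
        description="Real-time object detection with MobileNet SSD (Caffe)")
    # Path to the network definition file; the flag name is an assumption.
    parser.add_argument("-p", "--prototxt", required=True,
                        help="path to the Caffe prototxt definition")
    # Path to the trained weights file; the flag name is an assumption.
    parser.add_argument("-m", "--model", required=True,
                        help="path to the Caffe model weights")
    # Detections scoring below this value are discarded later on.
    parser.add_argument("-c", "--confidence", type=float, default=0.2,
                        help="minimum confidence needed to keep a detection")
    return parser


# Demonstration with an explicit argument list (hypothetical file names).
# In a real script, call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(
    ["--prototxt", "Caffe/deploy.prototxt", "--model", "Caffe/weights.caffemodel"])
print(vars(args))
```

Keeping the threshold as a command-line option makes it easy to trade missed objects against false positives without editing the script.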

Initializing the labels with object names and assigning a random color to each label (Lines 19–23)
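As an illustration of this step, the sketch below builds a label list and one random color per label using only the standard library. The 21 class names are the PASCAL VOC set commonly distributed with MobileNet SSD Caffe models; that list, its order, and the fixed seed are assumptions and must be checked against the model you actually download.

```python
import random

# Class names in the order the network reports class IDs.
# Assumed: the PASCAL VOC labels commonly shipped with MobileNet SSD.
LABELS = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
          "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
          "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
          "train", "tvmonitor"]

# A fixed seed (arbitrary choice) keeps colors the same between runs.
rng = random.Random(42)

# One random color per label, as three 0-255 integers.
# OpenCV drawing functions interpret such tuples in BGR order.
COLORS = [tuple(rng.randint(0, 255) for _ in range(3)) for _ in LABELS]

for name, color in zip(LABELS[:3], COLORS[:3]):
    print(name, color)
```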

Loading the MobileNet SSD model weights and prototxt definition, then initializing the video stream (Lines 25–33)
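One possible way to write this step is sketched below. `cv2.dnn.readNetFromCaffe` and `cv2.VideoCapture` are OpenCV calls; the helper names `load_detector` and `open_camera`, the up-front file check, and the camera index 0 are assumptions made for this example.

```python
from pathlib import Path


def load_detector(prototxt_path, model_path):
    """Load a Caffe network with OpenCV after checking that both files exist."""
    for path in (prototxt_path, model_path):
        if not Path(path).is_file():
            # Fail early with a clear message instead of an OpenCV error.
            raise FileNotFoundError(f"Model file not found: {path}")
    # Imported here so the path check above runs before OpenCV is loaded.
    import cv2
    return cv2.dnn.readNetFromCaffe(str(prototxt_path), str(model_path))


def open_camera(index=0):
    """Open a webcam stream; index 0 is usually the default camera."""
    import cv2
    capture = cv2.VideoCapture(index)
    if not capture.isOpened():
        raise RuntimeError(f"Could not open camera {index}")
    return capture
```

A script would call `load_detector` once at start-up with the two paths saved in the Caffe folder, then read frames from the object returned by `open_camera`.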

Reading input frames, resizing and extracting dimensions of frame (Lines 39–41)
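The resizing part of this step comes down to choosing a new width and scaling the height by the same ratio. The helper below is a small sketch of that arithmetic; the function name `scaled_size` and the target width of 400 pixels are assumptions for illustration.

```python
def scaled_size(width, height, target_width):
    """Return (new_width, new_height) that preserves the aspect ratio."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("all dimensions must be positive")
    ratio = target_width / width
    return target_width, int(round(height * ratio))


# A 640x480 webcam frame shrunk to a width of 400 pixels.
print(scaled_size(640, 480, 400))
```

With OpenCV, the frame dimensions come from `h, w = frame.shape[:2]`, and the result can be passed to `cv2.resize(frame, scaled_size(w, h, 400))`, which expects the size as (width, height).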

Converting the frame to a blob and passing it through the Caffe model. detections = nn.forward() stores the output layer of the neural network (Lines 43–49)
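Converting a frame to a blob mostly means resizing it to the network's input size and normalizing pixel values by subtracting a mean and multiplying by a scale factor. The sketch below shows that arithmetic on single pixel values. The constants (scale 0.007843, mean 127.5, input size 300x300) are the values commonly used with MobileNet SSD Caffe models and are assumptions here, not taken from the original script.

```python
SCALE = 0.007843         # roughly 1 / 127.5 (assumed for this model family)
MEAN = 127.5             # assumed mean subtracted from every channel
INPUT_SIZE = (300, 300)  # assumed network input resolution


def normalize_pixel(value, mean=MEAN, scale=SCALE):
    """Apply the mean-subtract-then-scale step used when building a blob."""
    return (value - mean) * scale


# 8-bit pixel values end up approximately in the range -1 to 1.
for pixel in (0, 127.5, 255):
    print(pixel, normalize_pixel(pixel))
```

In OpenCV the same preprocessing is done in one call, `cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)`, after which the blob is handed to the network with `nn.setInput(blob)` before calling `nn.forward()`.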

Looping over each detection and storing its confidence, that is, the predicted probability of the object for its label. Weak detections are filtered out, and the class index of each remaining object is stored (Lines 51–59)
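The filtering logic can be sketched without OpenCV by treating each detection as a row of seven numbers. The row layout `[image_id, class_id, confidence, x_min, y_min, x_max, y_max]` is the usual output format of SSD detection layers and is an assumption here, as are the function name and the sample values.

```python
def filter_detections(rows, min_confidence=0.2):
    """Return (class_id, confidence, box) for rows at or above the threshold.

    Each row is assumed to be
    [image_id, class_id, confidence, x_min, y_min, x_max, y_max],
    with box coordinates normalized to the range 0..1.
    """
    kept = []
    for row in rows:
        confidence = float(row[2])
        if confidence < min_confidence:
            continue  # weak detection, skip it
        class_id = int(row[1])  # index into the label list
        box = tuple(float(v) for v in row[3:7])
        kept.append((class_id, confidence, box))
    return kept


# Two made-up detections: one strong, one weak.
sample = [
    [0, 15, 0.92, 0.1, 0.2, 0.5, 0.9],
    [0, 7, 0.05, 0.0, 0.0, 1.0, 1.0],
]
print(filter_detections(sample))
```

With an OpenCV output array of shape (1, 1, N, 7), the rows would be `detections[0, 0]`.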

The next lines of code extract the localized coordinates of each object and draw a bounding box over it, along with its label and confidence percentage (Lines 61–74)
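SSD networks typically report box corners as fractions of the frame size, so they have to be scaled to pixels before drawing. The sketch below shows that conversion and the label text; the helper names, the clamping to the frame border, and the label format are assumptions for illustration.

```python
def to_pixel_box(box, frame_width, frame_height):
    """Scale a normalized (x_min, y_min, x_max, y_max) box to pixel coordinates.

    Results are clamped so the box never falls outside the frame.
    """
    def clamp(value, upper):
        return max(0, min(int(round(value)), upper))

    x_min, y_min, x_max, y_max = box
    return (clamp(x_min * frame_width, frame_width - 1),
            clamp(y_min * frame_height, frame_height - 1),
            clamp(x_max * frame_width, frame_width - 1),
            clamp(y_max * frame_height, frame_height - 1))


def format_label(name, confidence):
    """Build the caption drawn next to a box, e.g. 'person: 92.34%'."""
    return f"{name}: {confidence * 100:.2f}%"


print(to_pixel_box((0.1, 0.2, 0.5, 0.9), 400, 300))
print(format_label("person", 0.9234))
```

The resulting corners can then be passed to `cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)`, and the caption to `cv2.putText(frame, label, (x1, y1 - 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)`; when a box touches the top edge, the text position needs to move below `y1` to stay visible.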

Displaying the live stream with detections and bounding boxes, and handling an escape command. Finally, reporting FPS information and cleaning up (Lines 76–90)
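FPS reporting only needs a frame counter and a timer. The class below is a small standard-library sketch of that idea; the name `FpsCounter` and its methods are hypothetical and stand in for whatever helper the original script uses.

```python
import time


class FpsCounter:
    """Count processed frames and report the average frames per second."""

    def __init__(self):
        self._start = None
        self._end = None
        self.frames = 0

    def start(self):
        """Start the timer; returns self so calls can be chained."""
        self._start = time.perf_counter()
        return self

    def update(self):
        """Call once for every frame that was processed."""
        self.frames += 1

    def stop(self):
        """Freeze the timer once the loop has ended."""
        self._end = time.perf_counter()

    def elapsed(self):
        """Seconds between start() and stop(), or until now if still running."""
        if self._start is None:
            raise RuntimeError("start() must be called first")
        end = self._end if self._end is not None else time.perf_counter()
        return end - self._start

    def fps(self):
        seconds = self.elapsed()
        return self.frames / seconds if seconds > 0 else 0.0


# Simulate a short processing loop of five frames.
counter = FpsCounter().start()
for _ in range(5):
    time.sleep(0.01)  # stand-in for reading and processing a frame
    counter.update()
counter.stop()
print(f"elapsed: {counter.elapsed():.2f}s, approx. FPS: {counter.fps():.2f}")
```

In the real loop, `update()` would be called after each `cv2.imshow` call, the loop would break when `cv2.waitKey(1)` returns the chosen escape key, and clean-up would release the capture and call `cv2.destroyAllWindows()`.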

To execute the code, run the following command from a terminal in your project directory

The complete source code is on GitHub.

What are the drawbacks of Single Shot MultiBox Detector?

The SSD framework, though faster than many similar alternatives, has trouble detecting smaller objects (although the original SSD paper still reported better accuracy than the first version of YOLO).

What alternative object detection frameworks can be used?

Apart from SSD, other frameworks can be used for object detection, the more popular ones being YOLO and Fast/Faster R-CNN. Each has its own pros and cons; SSD is generally regarded as among the fastest and most efficient of these, though the ranking depends on the model versions and hardware being compared. To learn more about YOLO and its various versions, read here.

Resources

Object Detection through Deep Learning by Adrian Rosebrock, here.