Object Detection using Single Shot MultiBox Detection (SSD) and Deep Neural Network (DNN)

Hemanthhari2000
Published in featurepreneur
6 min read · Apr 3, 2021

Understand the Single Shot MultiBox Detector and OpenCV’s Deep Neural Network module

Photo by Max Bender on Unsplash

Introduction

Object detection matters a great deal right now because it is used in many fields such as healthcare, agriculture, autonomous driving, and more. It goes a step beyond image classification: it not only identifies the objects in an image but also tells us where they are through localization, that is, by drawing a bounding box around each object. This may sound like just another image classification algorithm, but it is far more powerful. Self-driving cars use object detection (a much more advanced version, obviously!) to detect what is in front of them. It is also used in healthcare to locate and classify different types of tumors and diseases in the human body.

The applications of object detection are endless. What makes it even more interesting is achieving it in real time, which has been very challenging so far. With a simple technique we can drastically boost the performance of real-time object detection, which shows up as a higher FPS (frames per second), that is, faster processing of each frame. This is exactly the method we will discuss in this article, using OpenCV’s Deep Neural Network module, or simply DNN.

Overview

Let’s take a look at the contents that we will cover in this article.

  • What is Object Detection?
  • What is Single Shot MultiBox Detector?
  • What is OpenCV’s Deep Neural Network (DNNs)?
  • Prerequisites
  • Implementation of SSD Object Detection using DNNs
  • Conclusion

What is Object Detection?

Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.

Object detectors are extensively used in computer vision tasks like image annotation, face recognition, face detection, object tracking, and many more. They are also used to solve real-world problems such as automated CCTV surveillance, where you keep track of objects that may pose a threat. Objects are tracked until a threat or abnormality is found, which can then be handled by recording the footage for a certain duration and alerting the respective authorities.

There are many methods to achieve object detection. Some of them are:

  • Single Shot MultiBox Detector (SSD)
  • Faster R-CNN
  • Histogram of Oriented Gradients (HOG)
  • Region-based Convolutional Neural Network (R-CNN)
  • YOLO (You Only Look Once)

Some of these methods may be familiar to you. Of these, we will look at SSD and implement it using OpenCV’s DNN module.

What is Single Shot MultiBox Detector?

Single Shot MultiBox Detector is a deep learning model used to detect objects in an image or a video source. It is a simple approach to the problem, yet it remains very effective. SSD has two components: the backbone model and the SSD head. The backbone model is a pre-trained image classification network used as a feature extractor; usually, its fully connected classification layer is removed. The SSD head is another set of convolutional layers added on top of this backbone, and its outputs are interpreted as the bounding boxes and classes of objects at the spatial locations of the final layer's activations.

Instead of using a traditional sliding window algorithm, SSD divides the image into a grid, and each grid cell is responsible for detecting objects in its region of the image. If no object is detected in a cell, we output nothing, or to be more precise, we output a “0”, indicating that no object was found.

What if there are many objects, possibly of the same class, in a single image? This is where anchor boxes come into play. Each grid cell is assigned multiple anchor boxes (also called prior boxes), which are predefined and have a fixed size and shape within the cell. Based on these, we are able to detect multiple objects in an image.
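
To make the anchor box idea concrete, here is a minimal sketch in plain Python. It is not how any particular SSD implementation builds its priors: the function name make_anchors, the 3x3 grid, the scale of 0.4, and the three aspect ratios are all made up for illustration. The last lines add up the box counts using the feature map sizes from the original SSD300 configuration, which is an assumption on our side, since the MobileNet-SSD model used later in this article may lay out its grids differently.

```python
from itertools import product


def make_anchors(grid_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) anchors, normalized to [0, 1], for a square grid.

    Hypothetical helper for illustration only.
    """
    anchors = []
    for row, col in product(range(grid_size), repeat=2):
        # Center of the grid cell, as a fraction of the image width/height.
        cx = (col + 0.5) / grid_size
        cy = (row + 0.5) / grid_size
        for ratio in aspect_ratios:
            # Every ratio keeps the same area; only the shape changes.
            w = scale * ratio ** 0.5
            h = scale / ratio ** 0.5
            anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical 3x3 grid with three anchor shapes per cell.
anchors = make_anchors(grid_size=3, scale=0.4, aspect_ratios=(1.0, 2.0, 0.5))
print(len(anchors))  # 3 * 3 cells * 3 shapes = 27

# (feature map size, anchors per cell) pairs of the SSD300 layout.
# Assumption: these numbers come from the original SSD300 setup, not from
# the MobileNet-SSD model used in this article.
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
total_boxes = sum(size * size * per_cell for size, per_cell in feature_maps)
print(total_boxes)  # 8732
```

Every one of those boxes gets its own class scores and box offsets from the SSD head, which is how a single forward pass can report many objects at once.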

What is OpenCV’s Deep Neural Network (DNNs)?

OpenCV's Deep Neural Network (DNN) module runs deep learning models that have already been trained. It performs inference only and does not train models. It can load models from frameworks such as Caffe, which is the framework behind the model used in this article. Inference can run on either the CPU or the GPU, and even a CPU alone gives pretty decent performance.

Prerequisites

All you need to run a simple Object Detection using SSD are:

  • Python (Obviously)
  • OpenCV
  • Download the MobileNetSSD prototxt from here.
  • Download the MobileNetSSD Caffe model from here.

Our Video looks like this

That’s it, yeah that's pretty much it. Now, let's get started implementing our object detection, shall we?

Implementation of SSD Object Detection using DNNs

Go ahead and open your project folder and create a new folder called object detection. Create a file called ssd-object-detection.py.

  • Imports
import numpy as np
import cv2

That’s all for our imports. Now, let's start defining some constants.

PROTOTXT = "MobileNetSSD_deploy.prototxt"
MODEL = "MobileNetSSD_deploy.caffemodel"
INP_VIDEO_PATH = 'cars.mp4'
OUT_VIDEO_PATH = 'cars_detection.mp4'
GPU_SUPPORT = 0
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
           "motorbike", "person", "pottedplant", "sheep", "sofa", "train",
           "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

Now let’s load the model. If your OpenCV build has CUDA support for the DNN module, go ahead and set the backend and target for it.

net = cv2.dnn.readNetFromCaffe(PROTOTXT, MODEL)
if GPU_SUPPORT:
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

Now, open your video capture and start reading each frame. Then pass each frame to the pre-trained model to detect the objects in it.

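# Open the input video
cap = cv2.VideoCapture(INP_VIDEO_PATH)

while True:
    ret, frame = cap.read()
    if not ret:
        break  # no more frames to read
    h, w = frame.shape[:2]
    # Resize to 300x300 and normalize: (pixel - 127.5) * 0.007843, roughly [-1, 1]
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()
    # detections has shape (1, 1, N, 7); loop over the N candidate boxes
    for i in np.arange(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            idx = int(detections[0, 0, i, 1])
            # Box coordinates are normalized, so scale them to the frame size
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
            cv2.rectangle(frame, (startX, startY), (endX, endY), COLORS[idx], 2)
            # Put the label above the box, or inside it when too close to the top
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
    # (display or write the annotated frame here)

cap.release()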
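
To see what the inner loop is doing without needing a model or a video, here is a small sketch in plain Python. It decodes rows laid out as [image_id, class_id, confidence, x_min, y_min, x_max, y_max], which is the layout the indexing in the loop above relies on. The helper name decode_detections and the two sample rows are made up for illustration; the class list is the one we defined earlier.

```python
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
           "motorbike", "person", "pottedplant", "sheep", "sofa", "train",
           "tvmonitor"]


def decode_detections(rows, frame_w, frame_h, threshold=0.5):
    """Turn raw detection rows into (label, confidence, box) tuples.

    Hypothetical helper: each row is assumed to be
    [image_id, class_id, confidence, x_min, y_min, x_max, y_max],
    with the four coordinates normalized to [0, 1].
    """
    results = []
    for _, class_id, confidence, x0, y0, x1, y1 in rows:
        if confidence <= threshold:
            continue  # drop weak detections
        # Scale normalized coordinates to pixels and truncate to integers.
        box = (int(x0 * frame_w), int(y0 * frame_h),
               int(x1 * frame_w), int(y1 * frame_h))
        results.append((CLASSES[int(class_id)], float(confidence), box))
    return results


# Made-up rows standing in for detections[0, 0] from net.forward().
fake_rows = [
    [0, 7, 0.92, 0.25, 0.25, 0.75, 0.5],    # confident "car"
    [0, 15, 0.30, 0.5, 0.5, 0.625, 0.875],  # weak "person", filtered out
]

found = decode_detections(fake_rows, frame_w=640, frame_h=480)
print(found)  # [('car', 0.92, (160, 120, 480, 240))]
```

With a real network you would pass detections[0, 0] as rows, since iterating over that array yields one 7-value row per candidate box.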

That’s it, we are done with the object detection. It is as simple as that. After all this, your code should look like this.

Just run the code. If everything went well, your output should look something like this.

That’s it. We implemented object detection with good FPS and accuracy. If your OpenCV build has CUDA support for the DNN module, you can enable it and you should see a much higher FPS. The accuracy stays the same, because the model itself does not change.
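
If you want to put a number on the FPS yourself, a rough way is to time how long a batch of frames takes to process. The sketch below is only an illustration: measure_fps is a hypothetical helper, and the short sleep stands in for the real per-frame work (blob creation, the forward pass, and drawing), so the printed value says nothing about the actual model.

```python
import time


def measure_fps(process_frame, frames):
    """Run process_frame on every frame and return the average frames per second.

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


# Stand-in workload: sleep briefly instead of running a real network.
fps = measure_fps(lambda frame: time.sleep(0.01), range(20))
print(f"{fps:.1f} FPS")
```

In the detection loop, you could wrap the per-frame work in a function and pass the frames read from the video instead of the stand-in range.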

Conclusion

In this article, we looked at object detection using the SSD model and OpenCV’s DNN module. We saw how SSD works and how to run it with OpenCV’s DNN module, and we implemented simple object detection using a pre-trained model. I hope this article was useful to you all. See you in my next article. Until then, as always: code, learn, repeat…

Follow for more…
