How I built a powerful object detector in under 100 lines of code.

Sakar Ghimire
Published in Analytics Vidhya · 5 min read · Feb 17, 2020

[Figure: Traffic image with various objects detected using YOLOv3]

Object Detection has been one of the hottest topics in the field of computer science. Ever since AlexNet took the ImageNet Large Scale Visual Recognition Challenge by storm, achieving a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up, the use of Convolutional Neural Networks and Deep Learning has been on the rise across computer vision tasks. Many different architectures and approaches have been proposed; the R-CNN model family, SSD, and YOLO are currently among the most widely used object detection architectures.

Object detection frameworks can mainly be divided into two groups: single-shot detectors and two-stage detectors. Single-shot architectures like YOLO or SSD provide better inference speed, while two-stage detectors boast higher accuracy. However, some newer single-shot architectures (like RetinaNet from FAIR) claim to be just as accurate as two-stage detectors like Faster R-CNN.

Object detection is one of the most widely researched topics worldwide, and the algorithms used for it are only going to improve with every passing day. Today, we’ll learn how to use a pretrained YOLO model to create our very own general object detector.

YOLOv3 has been trained on the Common Objects in Context (COCO) dataset and is able to recognize objects of 80 different classes.

For those of you who’d like to read in detail about the YOLOv3 architecture, here’s a link to the original YOLOv3 paper.

https://pjreddie.com/media/files/papers/YOLOv3.pdf

Here we’ll use OpenCV’s dnn module to run inference on the pretrained YOLOv3 model and create our very own object detector.

Before we begin, head on to https://pjreddie.com/darknet/yolo/
and download the pretrained weights and configuration files. Seven different versions of YOLO models are hosted on the site, each with its own pros and cons. Choose whichever one you feel like using. For my purposes, I am using YOLOv3-spp, with a mAP of 60.6 (arguably the best YOLOv3 model hosted on the site).

Then head on to https://drive.google.com/open?id=1aDjBUD0PN-N1GdRX-QQD-4gyGRx05gw2 and download the coco.names file and put it in the project directory as well. This file contains the class names which YOLOv3 is able to perform detection on.

Now that yolov3-spp.weights, yolov3-spp.cfg and coco.names are in the project directory, it’s time to do some short and efficient coding.

Importing the libraries

We will be using the OpenCV library to perform image processing and the ever so efficient Numpy for performing array operations. We’ll use the argparse library to parse the command line arguments.

import cv2
import numpy as np
import argparse

Loading Image Classes

Then we’ll define a short function, load_classes, to load the class names from the coco.names file.
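Since the embedded snippet may not render here, a minimal sketch of such a helper is shown below. It simply reads the *.names file and drops any empty trailing lines:

```python
def load_classes(path):
    # Read the *.names file: one class name per line.
    with open(path, "r") as f:
        names = f.read().split("\n")
    # Drop empty lines (e.g. a trailing newline at the end of the file).
    return list(filter(None, names))
```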

Where the MAGIC happens

We now have the helper function load_classes locked and loaded. Let’s write the main run function which will drive the object detection process.

First we’ll use the dnn module of OpenCV to read the darknet architecture using the weight and the cfg file we downloaded earlier.

net = cv2.dnn.readNet(opt.weights, opt.cfg)

opt is our argument parser object, which we’ll discuss later.

Then we’ll get the class names using our helper function load_classes. It takes the path for the *.names file as its only argument which will also be provided through command line.

classes = load_classes(opt.names)

Now, we’ll retrieve the output layer names from our predefined darknet architecture.

layer_names = net.getLayerNames()

outputlayers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

We’ll also define the colors and the font, which we’ll later use to draw on our input image.

colors = np.random.uniform(0, 255, size=(len(classes), 3))
font = cv2.FONT_HERSHEY_SIMPLEX

Now it’s time to initialize the VideoCapture object. It will be used to capture video from a stream. I will be using the video stream from my webcam; feel free to use any video by passing its path instead. See the VideoCapture documentation for more information on how to take various video streams as input.

cap = cv2.VideoCapture(0)

Now we’ll initiate a while loop. Each iteration of the loop will take a frame, perform detection on it and show it in the output screen.

First we’ll read a frame from the VideoCapture object, create a blob from it (scaling the pixel values by 1/255 ≈ 0.00392 and resizing to 320×320) and then pass it to the darknet architecture.

_, frame = cap.read()
height, width, channels = frame.shape

blob = cv2.dnn.blobFromImage(frame, 0.00392, (320, 320), (0, 0, 0), True, crop=False)

net.setInput(blob)
outs = net.forward(outputlayers)

Now, we’ll extract detections from the network outputs. For each candidate detection, the class_id is taken as the class with the highest score, and that highest score is used as the detection’s confidence; only detections above a confidence threshold are kept. For that, we’ll maintain three lists, class_ids, confidences and boxes, each one holding information as its name suggests.
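That loop can be sketched as follows. Wrapping it in a parse_detections helper (a name of my choosing, not from the original) keeps the step self-contained; each YOLO output row is [center_x, center_y, w, h, objectness, class scores…] in coordinates normalized to the input size:

```python
import numpy as np

def parse_detections(outs, width, height, conf_threshold=0.4):
    # Collect class ids, confidences and pixel-space boxes from YOLO outputs.
    class_ids, confidences, boxes = [], [], []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_threshold:
                # Coordinates are normalized; scale back to pixel space.
                cx, cy = int(detection[0] * width), int(detection[1] * height)
                w, h = int(detection[2] * width), int(detection[3] * height)
                # Convert center coordinates to the top-left corner.
                boxes.append([cx - w // 2, cy - h // 2, w, h])
                confidences.append(confidence)
                class_ids.append(class_id)
    return class_ids, confidences, boxes
```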

We’ll then perform non-max suppression to filter out multiple detections of the same object. It can be done using the NMSBoxes function of the dnn module.

indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.4, 0.6)

Now it’s time to take all the detected objects, draw bounding boxes with confidence values around them, write the class label above each bounding box, and then show the resulting frame on the output screen.

Argument Parser

Our object detection algorithm is all but ready to go. Now we just need to define the argument parser for taking command line inputs and call the detection algorithm with appropriate path for the weight file, cfg file and the *.names file and we’ll be on our way.

parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='./yolov3-spp.weights', help='path to the weights file')
parser.add_argument('--cfg', type=str, default='./yolov3-spp.cfg', help='path to the cfg file')
parser.add_argument('--names', type=str, default='./coco.names', help='*.names path')
opt = parser.parse_args()

If the files are kept in the project directory itself (./*), we don’t need to pass command line inputs, as the paths are already set as default values. If a file is elsewhere, we’ll need to provide its path explicitly through the corresponding command line argument.

The FINAL Code

The full implementation of the code is provided below:

As promised, we just built a very powerful general object detector in under 100 lines of code (84 to be precise 😉). The given implementation performs inference only on the CPU, so there might be some performance-related issues. We’ll need to use deep learning frameworks to harness the power of our GPUs and speed up the process. More on that later…
