Object Detection with OpenCV-Python using YOLOv3

Darshan Adakane
Published in Analytics Vidhya · Oct 19, 2019
Car, motorbike, and person detected using the YOLOv3 algorithm

Greetings everyone.

Object detection is becoming a fascinating field of application and research in Computer Vision. Thanks to faster computing power and advanced algorithms, we are getting computers to understand images and videos much the way humans do.

In this article we will see how, using OpenCV and Python, we can detect objects in a still image by applying the popular YOLO (You Only Look Once) algorithm.

I am assuming:

You have some knowledge of Python and are familiar with the Jupyter Notebook IDE. You understand how a Convolutional Neural Network (CNN) works (I would recommend Prof. Andrew Ng's course on CNNs on Coursera for learning).

A couple of clarifications:

OpenCV is the computer vision library/framework that we will be using to support our YOLOv3 algorithm. OpenCV has built-in support for the Darknet architecture.

Darknet is the architecture behind YOLO, and its model comes pre-trained to recognize 80 different classes. Our goal is to use Darknet (YOLOv3) in OpenCV to detect objects using the Python language.

Let's begin.

YOLO is an object detection algorithm (check out the original paper, which came out in 2015, here). The recent YOLOv3 is more powerful than the basic YOLO and YOLOv2, and it is both faster and more accurate than earlier algorithms like R-CNN. The core reason is the fully convolutional implementation of the layers: the network scans the image (or frame) only once to make its predictions, whereas other algorithms require multiple passes. (Power of convolution. Yay!)

Set up:

We need three main files:

  1. yolo.cfg (Download from here) — Configuration file
  2. yolo.weights (Download from here) — pre-trained weights
  3. coco.names (Download from here) — 80 class names

Let's begin coding:

Here I am using a Jupyter notebook.

The first step is to import the cv2 and numpy libraries. Then we load the YOLOv3 network using cv2.dnn.readNet, passing the weights and cfg files. Then we load all the class names into an array from the coco.names file.
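A minimal sketch of this step (assuming the downloaded files keep the names listed above and sit next to the notebook):

    import cv2
    import numpy as np

    # Load the YOLOv3 network from the pre-trained weights and config
    net = cv2.dnn.readNet("yolo.weights", "yolo.cfg")

    # Load the 80 class names, one per line
    with open("coco.names", "r") as f:
        classes = [line.strip() for line in f]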

Next we will determine the output layers, because that is where the detections come out. We use net.getLayerNames together with net.getUnconnectedOutLayers.
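A sketch (getUnconnectedOutLayers returns 1-based indices, and its exact return shape varies across OpenCV versions, hence the flatten()):

    layer_names = net.getLayerNames()
    # Indices are 1-based, so subtract 1 to index into layer_names
    output_layers = [layer_names[i - 1]
                     for i in np.array(net.getUnconnectedOutLayers()).flatten()]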

Next let us load an image. We will scale it down, keeping 40% of the width and 30% of the height, and save the resized image's dimensions in the height, width, and channels variables.
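A sketch, with the file name test_image.jpg as a placeholder for your own image:

    # Load the test image (replace the file name with your own)
    img = cv2.imread("test_image.jpg")
    # Scale down: fx is the width factor, fy the height factor
    img = cv2.resize(img, None, fx=0.4, fy=0.3)
    # img.shape gives (rows, columns, channels)
    height, width, channels = img.shape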

To view the image, use the following code, but remember to keep it at the end of the file. Any further code should always go above these three lines.
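The standard OpenCV display idiom (the window title is arbitrary):

    cv2.imshow("Image", img)
    cv2.waitKey(0)           # wait for any key press
    cv2.destroyAllWindows()  # then close the window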

Original Image

This is our original image, from which we want to detect as many objects as possible. But we cannot feed this image directly to the algorithm, so we first need to convert it. This is called blob conversion: it preprocesses the image into the 4-dimensional blob format the network expects (normalized, resized, with channels reordered).

We create the blob using cv2.dnn.blobFromImage, passing a few arguments: img is the image we loaded, the scalefactor is 0.00392 (roughly 1/255, to normalize pixel values), the blob size is (416, 416), (0, 0, 0) means no mean subtraction, and setting the flag to True swaps blue with red, since OpenCV stores channels as BGR while the model expects RGB.
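In code (crop=False is the library default, added here for explicitness):

    # 0.00392 ≈ 1/255 normalizes pixels to [0, 1]; (416, 416) is the
    # YOLOv3 input size; (0, 0, 0) = no mean subtraction; True swaps B and R
    blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0),
                                 True, crop=False)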

Now let's see what the three channels of the blob look like using the following code. We won't observe much difference, but this is what we will feed into the YOLO algorithm.
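A sketch that displays each channel of the blob (its shape is 1 × 3 × 416 × 416; the window titles are arbitrary):

    # Iterate over the blob's channels and show each one
    for b in blob:
        for n, img_blob in enumerate(b):
            cv2.imshow("blob channel " + str(n), img_blob)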

We now pass this blob to the network using net.setInput(blob) and then run a forward pass to the output layers. At this point all objects have been detected, and outs contains all the information we need to extract each object's position (center, width, height) and its class scores.
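In code:

    net.setInput(blob)
    # Forward pass through the network; outs is one array per output layer
    outs = net.forward(output_layers)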

Now let's evaluate outs by showing the information on screen. Mainly we want the confidence, meaning how confident the algorithm is when it predicts an object. For this, we loop through outs: for each detection we take its scores (one per class), get the class_id with the highest score using np.argmax, and read the confidence as the score at that class_id.

Now we set the confidence threshold to 0.5; anything above 0.5 counts as a detected object. We also compute center_x and center_y for the object's center, w for its width, and h for its height, scaling by the width and height variables we saved earlier. We will also draw a circle of thickness 2 at the center of the object, just as proof that the object has been detected.

Further, let's compute a rectangle around each detected object using center_x, center_y, w, and h, and record some information alongside it, like the class and confidence.
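A sketch of the full loop described in the last three paragraphs (the circle's radius and color here are illustrative; the thickness of 2 follows the text):

    class_ids = []
    confidences = []
    boxes = []

    for out in outs:
        for detection in out:
            scores = detection[5:]           # class scores start at index 5
            class_id = np.argmax(scores)     # best-scoring class
            confidence = scores[class_id]
            if confidence > 0.5:             # confidence threshold
                # Detections are relative coordinates; scale back to the image
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                # Circle at the center, thickness 2, as proof of detection
                cv2.circle(img, (center_x, center_y), 10, (0, 255, 0), 2)
                # Top-left corner of the bounding box
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)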

Running the above code gives the following output.

There might be cases where the same object is detected multiple times, like below. (In my test image above, no object is detected more than once, so I am showing that scenario using a different image.) You can see two boxes detected for each of the laptop and the monitor. We want to eliminate this.

To eliminate this, we will use Non-Max Suppression (NMS). NMS keeps only the best of the overlapping boxes: when boxes overlap beyond a threshold, all but the highest-confidence one are removed. The indexes variable keeps track of the boxes that survive, so the same object is no longer detected multiple times.
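A sketch using OpenCV's built-in helper (the score threshold 0.5 and the overlap threshold 0.4 are illustrative choices; tune them for your use case):

    # Indices of the boxes that survive Non-Max Suppression
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)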

Now, using the loop below over all found boxes, we draw a rectangle, color it, and put the class name on it only if the box appears in indexes.
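A sketch (the per-class random colors, the font, the font scale, and the label offset are assumptions; adjust to taste):

    # One random color per class, so each class is drawn consistently
    colors = np.random.uniform(0, 255, size=(len(classes), 3))
    font = cv2.FONT_HERSHEY_PLAIN

    for i in range(len(boxes)):
        if i in np.array(indexes).flatten():   # keep only NMS survivors
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            color = colors[class_ids[i]].tolist()
            cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
            cv2.putText(img, label, (x, y + 30), font, 2, color, 2)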

The final output will look something like this.

So we have detected a car, a person, and a motorbike in the test image.

Whoa! That was a lot to take in, I guess.

But I hope this helps in implementing the YOLOv3 algorithm. I have tried to make it understandable from a beginner's mindset. Please let me know if you have any questions or comments.

Convolution Rocks!

Happy Learning!

Thanks.

[You can find the complete code on GitHub. Star it if you like it. Thanks!]

(In the next article, I will show how to detect objects in real time using a webcam and the YOLOv3 algorithm.)
