An introduction to implementing the YOLO algorithm for multi object detection in images

Ann Mohan Kunnath
4 min read · Aug 4, 2020

YOLO is an extremely fast, real-time, multi-object detection algorithm. YOLO stands for “You Only Look Once”. This article uses YOLOv3, and this is the link to the YOLOv3 paper: https://pjreddie.com/media/files/papers/YOLOv3.pdf

The algorithm applies a single neural network to the entire image. The network divides the image into an S x S grid and predicts bounding boxes (boxes drawn around the objects in the image) together with probabilities for each of these regions.
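To make the grid idea concrete, here is a minimal sketch of how a point in the image, such as the centre of an object, maps to a grid cell. The 416 x 416 input size matches the first layer of the network printed later in this article; the grid size S = 13 and the function name `grid_cell_for_center` are assumptions chosen for illustration, and YOLOv3 itself predicts at more than one grid scale.

```python
def grid_cell_for_center(cx, cy, image_size=416, grid_size=13):
    """Return the (row, col) of the grid cell that contains the point (cx, cy).

    image_size and grid_size are illustrative defaults, not values read
    from the model.
    """
    cell_size = image_size / grid_size  # 32 pixels per cell for 416 / 13
    # Clamp so that a point on the far edge still lands in the last cell.
    col = min(int(cx // cell_size), grid_size - 1)
    row = min(int(cy // cell_size), grid_size - 1)
    return row, col


# An object centred at x=200, y=100 falls in row 3, column 6 of the grid.
print(grid_cell_for_center(200, 100))  # (3, 6)
```

The cell that contains the centre of an object is the one responsible for predicting that object's bounding box.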

These probabilities are predicted using logistic regression, and the bounding boxes are weighted by the associated probabilities. For class prediction, independent logistic classifiers are used instead of a softmax, so a single box can carry more than one label.
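The difference between independent logistic classifiers and a single softmax can be seen in a small sketch. The raw scores below are made-up numbers for three hypothetical classes, not values produced by the model.

```python
import math


def sigmoid(x):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def class_probabilities(logits):
    """Score each class independently with its own logistic classifier."""
    return [sigmoid(z) for z in logits]


logits = [2.0, 1.5, -3.0]  # hypothetical raw scores, for illustration only
probs = class_probabilities(logits)
print(probs)
print(sum(probs))  # not constrained to 1, unlike a softmax
```

Because each class is scored on its own, the probabilities do not have to sum to 1, which is what allows overlapping labels on the same box.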

In this article, I am going to demonstrate how to implement the YOLO algorithm with a pre-trained model.

First, we need to install DarkNet, an open source neural network framework. You can find more information about DarkNet at this link: https://pjreddie.com/darknet/

Step 1: We import the necessary libraries

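import cv2  # computer vision library
import matplotlib.pyplot as plt  # to plot
from darknet import Darknet  # to use DarkNet
# Helper functions used in the later steps. Assumption: they live in a local
# utils.py next to darknet.py; adjust the module name to match your setup.
from utils import load_class_names, detect_objects, print_objects, plot_boxes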

Step 2: We store the paths to the configuration file and the pre-trained weights in variables

config_file = './cfg/yolov3.cfg'
pretrained_weights = './weights/yolov3.weights'

Step 3: We instantiate an object of the DarkNet class

net = Darknet(config_file)

Step 4: We load the pre-trained weights

net.load_weights(pretrained_weights)

Step 5: We display the network and see how it looks

net.print_network()

A small part of the output is shown below:

layer     filters    size         input                output
0 conv    32         3 x 3 / 1    416 x 416 x 3    ->  416 x 416 x 32
1 conv    64         3 x 3 / 2    416 x 416 x 32   ->  208 x 208 x 64
2 conv    32         1 x 1 / 1    208 x 208 x 64   ->  208 x 208 x 32
3 conv    64         3 x 3 / 1    208 x 208 x 32   ->  208 x 208 x 64

In the full output, we can see that the network has 106 layers, including the classification / detection layers.
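The input and output sizes in the layer listing follow the usual convolution arithmetic. The sketch below reproduces the spatial sizes of the first three layers, assuming "same"-style padding of kernel_size // 2 on each side; that padding rule and the function name `conv_output_size` are assumptions for illustration, not something the printed output states.

```python
def conv_output_size(input_size, kernel_size, stride, padding=None):
    """Spatial output size of a convolution along one dimension."""
    if padding is None:
        padding = kernel_size // 2  # assumed padding: 1 for 3 x 3, 0 for 1 x 1
    return (input_size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(416, kernel_size=3, stride=1))  # layer 0: 416
print(conv_output_size(416, kernel_size=3, stride=2))  # layer 1: 208
print(conv_output_size(208, kernel_size=1, stride=1))  # layer 2: 208
```

A stride of 2 halves the width and height, which is why layer 1 takes a 416 x 416 input down to 208 x 208, while the number of filters sets the depth of the output.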

Step 6: Next, we read in an image and display it. This is the image we are going to apply the YOLO algorithm to. OpenCV reads images in BGR channel order, so we convert the image to RGB before plotting it.

plt.rcParams['figure.figsize'] = (15.0, 15.0)
img = cv2.imread('./images/city_scene.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img)

The output is as follows:

Step 7: Next, we load the names of the classes that the pre-trained model was trained on.

class_names_file = 'data/coco.names'
class_names = load_class_names(class_names_file)

Now, the variable class_names holds all the class names that the model was trained on. Let us display the values in class_names to get a better idea of what it contains.
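A names file such as coco.names is typically a plain text file with one class name per line. As a rough idea of what a loader for such a file might do, here is a minimal sketch; `parse_class_names` and `read_class_names` are hypothetical names used for illustration and are not the actual load_class_names helper.

```python
def parse_class_names(text):
    """Split the contents of a names file into a list of class names.

    Assumes one class name per line; blank lines are skipped.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_class_names(path):
    """Read a names file from disk and return its class names in order."""
    with open(path, "r", encoding="utf-8") as names_file:
        return parse_class_names(names_file.read())


# The position of a name in the list is the class index used by the model.
print(parse_class_names("person\nbicycle\ncar\n"))  # ['person', 'bicycle', 'car']
```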

Step 8: Display the class names

print(class_names)

The output is as follows:

['person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench','bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'sofa', 'pottedplant', 'bed', 'diningtable', 'toilet', 'tvmonitor', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

Step 9: The image is resized to the width and height expected by the first layer of the neural network

resized_image = cv2.resize(img, (net.width, net.height))

Step 10: We set the Intersection Over Union (IOU) threshold. The IOU of two boxes is defined as Area Of Overlap / Area Of Union. When a model is evaluated, the two boxes are the ground truth bounding box and the predicted bounding box.

iou_threshold = 0.4

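In evaluation terms, this means that a detection whose IOU with the ground truth box is greater than 0.4 counts as a true positive. At inference time there is no ground truth, so detectors typically apply the same measure to pairs of predicted boxes to decide whether two boxes cover the same object.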
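The IOU ratio can be computed in a few lines. This is a minimal sketch that assumes boxes are given as (x1, y1, x2, y2) corner coordinates; the helper functions used in this article may represent boxes differently.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if no overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 2 x 2 boxes that share a 1 x 1 corner: IOU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IOU of 1 means the boxes coincide exactly, and an IOU of 0 means they do not overlap at all.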

Step 11: We set the Non-Maximal Suppression (NMS) threshold. NMS suppresses overlapping bounding boxes and only retains the bounding box that has the maximum probability of object detection associated with it.

nms_threshold = 0.6
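The suppression step can be illustrated with the generic greedy form of NMS. The sketch below is a textbook version written for this explanation, not the code inside detect_objects; the (x1, y1, x2, y2) box format, the function names, and the sample detections are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, overlap_threshold):
    """Greedy NMS over a list of (box, score) pairs.

    Repeatedly keeps the highest-scoring box and discards every remaining
    box whose IOU with it exceeds overlap_threshold.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(best[0], d[0]) <= overlap_threshold]
    return kept


# Hypothetical detections: two near-duplicate boxes and one separate box.
detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),
    ((50, 50, 60, 60), 0.7),
]
kept = non_max_suppression(detections, overlap_threshold=0.4)
print(kept)  # the 0.8 box is suppressed by the overlapping 0.9 box
```

A lower overlap threshold suppresses more aggressively, while a higher one lets more overlapping boxes survive.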

Step 12: Next, we detect the objects and display them with their probabilities.

boxes = detect_objects(net, resized_image, iou_threshold, nms_threshold)
print_objects(boxes, class_names)
plot_boxes(img, boxes, class_names, plot_labels = True)

The output is as follows:

It took 3.684 seconds to detect the objects in the image.

Number of Objects Detected: 28

Objects Found and Confidence Level:

1. person: 0.999996
2. person: 1.000000
3. car: 0.707237
4. truck: 0.933031
5. car: 0.658086
6. truck: 0.666982
7. person: 1.000000
8. traffic light: 1.000000
9. person: 1.000000
10. car: 0.997369
11. bus: 0.998023
12. person: 1.000000
13. person: 1.000000
14. person: 1.000000
15. person: 1.000000
16. person: 1.000000
17. traffic light: 1.000000
18. traffic light: 1.000000
19. umbrella: 0.997282
20. traffic light: 1.000000
21. car: 0.989741
22. traffic light: 1.000000
23. traffic light: 0.999999
24. person: 0.999999
25. truck: 0.715035
26. traffic light: 1.000000
27. person: 0.999993
28. person: 0.999996

We can see the objects that have been detected along with their probabilities in the output above.

References:

  1. The Original Paper on YOLOv3, https://pjreddie.com/media/files/papers/YOLOv3.pdf
  2. Udacity Computer Vision Nanodegree, https://www.udacity.com/course/computer-vision-nanodegree--nd891
  3. Real Time Object Detection with YOLO, YOLOv2 and now YOLOv3, https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088

Ann Mohan Kunnath

AI/Deep Learning/Machine Learning Enthusiast. MS Business Analytics Graduate Student at University Of Cincinnati. https://www.linkedin.com/in/annmohankunnath/