Object Detection using YOLOv3

Pratik Patil
Published in Analytics Vidhya · 3 min read · May 27, 2020

Detect objects with YOLOv3, trained on the COCO dataset.

What is YOLO?

YOLO (You Only Look Once) is an extremely fast multi-object detection algorithm that uses a convolutional neural network (CNN) to detect and identify objects.

The neural network has the following architecture.

Source: https://arxiv.org/pdf/1506.02640.pdf

What is COCO Dataset?

Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset. It aims to enable research in object detection, instance segmentation, image captioning, and person keypoint localization.

Samples from COCO Dataset.

# Workflow:-

1) Reading input video

2) Loading the YOLOv3 network

3) Reading frames in a loop

4) Getting a blob from the frame

5) Implementing the forward pass

6) Getting bounding boxes

7) Non-maximum suppression

8) Drawing bounding boxes with labels

9) Writing processed frames

# Result:
A new video file with detected objects, bounding boxes, and labels

Step 1: Importing Libraries and Setting path

We will import the video in which the objects are to be detected and labelled, using the VideoCapture function in cv2.

Step 2: Load YOLOv3 Model:-

We need to load the YOLOv3 model with its weights and configuration files, along with the COCO class names file, and set the paths correctly.

Step 3: Read Frames

We read the frames from the video file one by one.

Step 4: Getting Blobs

A blob is a 4D numpy array with shape (images, channels, height, width). The transformation has the following parameters:

  • the image to transform
  • the scale factor (1/255 to scale the pixel values to [0..1])
  • the size, here a 416x416 square image
  • the mean value (default=0)
  • the option swapRB=True (since OpenCV uses BGR channel ordering)

Step 5: Implementing Forward Pass

Pass each blob through the network to obtain the outputs of the YOLO detection layers.

Step 6: Getting Bounding Boxes:-

For each detection, we take the class with the highest score, keep it only if its confidence exceeds a threshold, and scale the predicted box back to the size of the original frame.
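With a hand-built detection row standing in for real network output (85 values: centre x, centre y, width, height, objectness, then 80 class scores), the parsing looks like this:

```python
import numpy as np

H, W = 480, 640        # original frame size
conf_threshold = 0.5

# Synthetic detection row; class 2 gets the highest score.
detection = np.zeros(85, dtype=np.float32)
detection[:5] = [0.5, 0.5, 0.25, 0.5, 0.9]
detection[5 + 2] = 0.8

boxes, confidences, class_ids = [], [], []
scores = detection[5:]
class_id = int(np.argmax(scores))
confidence = float(scores[class_id])
if confidence > conf_threshold:
    # Box coordinates are relative, so scale them back to the frame size
    # and convert the centre point to a top-left corner.
    cx, cy, bw, bh = detection[:4] * np.array([W, H, W, H])
    boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
    confidences.append(confidence)
    class_ids.append(class_id)

print(boxes)  # [[240, 120, 160, 240]]
```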

Step 7:- Non-Maximum Suppression

Neighbouring windows tend to have similar scores and are all considered candidate regions, which leads to hundreds of proposals. Since the proposal-generation stage should have high recall, the constraints are kept loose here. However, running that many proposals through the classification network is cumbersome, so we filter them with a technique called non-maximum suppression: among overlapping boxes, only the highest-scoring one is kept.

Step 8: Drawing of Bounding Boxes:-

We draw bounding boxes for each of the objects detected in the frame, using the cv2.rectangle function.

Step 9: Writing processed Frames in File:-

In the last step, we write the frames with the drawn bounding boxes and labels to a new video file and save it.

Full Code at:-

Output Labelled Video:-
