Selecting and preparing a specific subset of images from the COCO dataset to train a YOLO object detection model

Published in The Startup, Apr 2, 2020 (5 min read)

Introduction

In this blog, we will explore the COCO dataset, a benchmark dataset for object detection and image segmentation. The data we will use contains 117k images with objects belonging to 80 classes. I will also show how to select a specific subset of images relevant to the problem we are trying to solve, for example, transportation-related images for a self-driving car problem.

Detecting transportation-related objects, people

Object detection is a combination of regression and classification. There are two tasks: first, we draw a bounding box around each object, which means predicting the coordinates of the box; this is regression. Then we classify the object inside each bounding box; this is classification.

So, for object detection, we have to predict the bounding boxes and their corresponding labels.
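To make the two outputs concrete, here is a minimal sketch of what the label for one image could look like. It assumes the row layout `[xmin, ymin, xmax, ymax, class_id]`, which is what the slices `[:, :4]` and `[:, 4:5]` used later in this post imply. The box values and class ids are made up for illustration.

```python
import numpy as np

# Hypothetical label for one image: one row per object,
# columns are xmin, ymin, xmax, ymax, class_id (values are made up).
label = np.array([
    [48.0, 240.0, 195.0, 371.0, 2.0],
    [8.0, 12.0, 352.0, 498.0, 0.0],
])

boxes = label[:, :4]       # regression target: box coordinates
class_ids = label[:, 4:5]  # classification target: one class id per box

print(boxes.shape, class_ids.shape)
```

Each row pairs one box with one class id, so the regression and classification targets stay aligned by row index.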

Libraries, Data requirements

We will use the libraries below to load and pre-process the data.

gluoncv (dependencies: pycocotools, Visual C++ Redistributable for Visual Studio 2015)
mxnet
matplotlib
opencv
numpy
pandas (used to read the class names file)
os (part of the Python standard library)
tqdm (to show the progress bar)

Working with COCO Data

Downloading Data

Please download the data from the links below:

http://images.cocodataset.org/zips/train2017.zip
http://images.cocodataset.org/zips/val2017.zip
http://images.cocodataset.org/annotations/annotations_trainval2017.zip
http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip

The train2017.zip file is about 18GB, so the download can take a while depending on your network bandwidth. The remaining files are roughly 1GB each or smaller. After downloading everything, unzip the files.
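If you prefer to unzip from Python rather than by hand, a minimal sketch using the standard `zipfile` module could look like this. The helper name `extract_archives` and its parameters are hypothetical, and the sketch assumes the archives sit in the current directory under the file names from the links above.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_names, download_dir=".", target_dir="."):
    """Extract each archive found in download_dir into target_dir.

    Returns the names of the archives that were actually extracted.
    """
    extracted = []
    for name in archive_names:
        archive_path = Path(download_dir) / name
        if not archive_path.is_file():
            print(f"Skipping {name}: not found in {download_dir}")
            continue
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(target_dir)
        extracted.append(name)
    return extracted


# File names taken from the download links above.
COCO_ARCHIVES = [
    "train2017.zip",
    "val2017.zip",
    "annotations_trainval2017.zip",
    "stuff_annotations_trainval2017.zip",
]

if __name__ == "__main__":
    extract_archives(COCO_ARCHIVES)
```

The loading code in this post points at the current directory ('.'), so extracting everything there keeps the images and annotations where that code will look for them.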

Loading Data

The snippet below loads the training and validation images and the names of the objects in the COCO dataset.

from gluoncv import data
import pandas as pd

train_dataset = data.COCODetection('.', splits=['instances_train2017'])
val_dataset = data.COCODetection('.', splits=['instances_val2017'])
# Getting names of the objects present in the COCO dataset.
# header=None assumes coco.names lists one class name per line with no header row.
names = pd.read_csv("./yolo_coco_data/coco.names", header=None)
print('Num of training images:', len(train_dataset))
print('Num of validation images:', len(val_dataset))

Visualizing single image

from gluoncv import utils
import matplotlib.pyplot as plt

# Loading a sample image and its respective label
train_image, train_label = train_dataset[1234]
# Getting bounding boxes of objects present in the loaded image.
bounding_boxes = train_label[:, :4]
# Getting classes of objects present in the image.
class_ids = train_label[:, 4:5]
print(class_ids)
# Visualizing the image with bounding boxes around the recognized objects
utils.viz.plot_bbox(train_image.asnumpy(), bounding_boxes, scores=None,
                    labels=class_ids, class_names=train_dataset.classes)
plt.savefig("example4.png")

After running the above snippet, you will get an image named example4.png. Please look at the screenshot below for the result. As you can see, each object is enclosed in a box and labelled accordingly.

Distribution of objects in the COCO dataset

The snippet below computes the frequency count of each object.

Frequency count: the number of images in which a particular object is present.

from tqdm.notebook import tqdm  # tqdm_notebook is deprecated in recent tqdm releases

# Id_counts maps each class name to the number of images containing that class.
Id_counts = {}
for k in range(80):
    Id_counts[names.values[k][0]] = 0
for i in tqdm(range(len(train_dataset))):
    train_image, train_label = train_dataset[i]
    class_ids = train_label[:, 4:5]
    for j in range(80):
        # An image counts once per class, however many instances it contains.
        if j in class_ids:
            Id_counts[names.values[j][0]] += 1
print(Id_counts)
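The counting rule is easier to verify on a small example. The sketch below applies the same idea to three made-up images and a hypothetical three-class list: an image contributes at most one count per class, no matter how many instances of that class it contains. The helper `count_images_per_class` is illustrative and uses `np.unique` instead of checking every class id in turn.

```python
import numpy as np


def count_images_per_class(labels, class_names):
    """Count, for each class, how many images contain at least one instance."""
    counts = {name: 0 for name in class_names}
    for label in labels:
        # Unique class ids in this image, so repeats within an image count once.
        present = np.unique(label[:, 4]).astype(int)
        for class_id in present:
            counts[class_names[class_id]] += 1
    return counts


# Toy data: three images, hypothetical three-class list.
toy_names = ["person", "bicycle", "car"]
toy_labels = [
    np.array([[0, 0, 10, 10, 0], [5, 5, 20, 20, 0], [1, 1, 8, 8, 2]], dtype=float),
    np.array([[2, 2, 9, 9, 2]], dtype=float),
    np.array([[3, 3, 7, 7, 1], [0, 0, 4, 4, 0]], dtype=float),
]

toy_counts = count_images_per_class(toy_labels, toy_names)
print(toy_counts)
```

The first image contains two people but adds only one to the person count, which is the behaviour the frequency count above relies on.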

The snippet below plots a bar graph of the frequencies.

import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
figure(num=None, figsize=(20, 10), dpi=80, facecolor='w', edgecolor='k')
# Convert the dict views to lists so matplotlib receives plain sequences.
plt.bar(range(80), list(Id_counts.values()), width=0.8, color='g')
plt.xticks(range(80), list(Id_counts.keys()), rotation=90)
plt.show()

Object distribution in the COCO dataset:

Frequency distribution of Objects in the COCO dataset

As you can see in the above graph, the most common object in the COCO dataset is person, which appears in more than 60k images.

We need images that are relevant to the problem statement. For example, if we are trying to solve the self-driving car problem, we need to detect vehicles, people, traffic lights, roads, etc. So we have to select images of cars, bicycles, trucks, roads, traffic lights, and people. In the next section, we will see how to get this specific set of images from the COCO dataset and how to pre-process the images and bounding boxes for the YOLO algorithm.

Utility functions

These are the utility functions we will use to pre-process bounding boxes, save those boxes to their respective files, and record the height and width of each box for calculating custom anchor boxes. I will explain what anchors are and how custom anchor boxes are useful in the next blog, where we will dive deep into the YOLO algorithm.

# This function pre-processes a bounding box into YOLO format.
# It returns two strings: "class cx cy w h" for the bbox file and "w h" for anchor clustering.
def yolo_preprocessing(class_index, point1, point2, width, height):
    center_x = (point1[0] + point2[0]) / (2 * width)
    center_y = (point1[1] + point2[1]) / (2 * height)
    x_width = abs(point1[0] - point2[0]) / width
    y_height = abs(point1[1] - point2[1]) / height
    yolo_line = " ".join(str(v) for v in (class_index, center_x, center_y, x_width, y_height))
    cluster_line = str(x_width) + " " + str(y_height)
    return yolo_line, cluster_line

# This function appends a line (bounding box or anchor data) to the given file.
def save_line(txt_path, line):
    with open(txt_path, 'a') as myfile:
        myfile.write(line + "\n")
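A worked example may help to see what the conversion in yolo_preprocessing does. The sketch below repeats the same arithmetic in a standalone helper (the name `to_yolo_box`, the box, and the image size are made up for illustration): corner coordinates in pixels become a center point plus a width and height, all expressed as fractions of the image size.

```python
def to_yolo_box(xmin, ymin, xmax, ymax, img_width, img_height):
    """Convert pixel corner coordinates to normalized (cx, cy, w, h)."""
    center_x = (xmin + xmax) / (2 * img_width)
    center_y = (ymin + ymax) / (2 * img_height)
    box_width = abs(xmax - xmin) / img_width
    box_height = abs(ymax - ymin) / img_height
    return center_x, center_y, box_width, box_height


# Made-up box with corners (64, 48) and (320, 240) in a 640x480 image.
cx, cy, w, h = to_yolo_box(64, 48, 320, 240, img_width=640, img_height=480)
print(cx, cy, w, h)  # all four values are fractions of the image size

# One line of a per-image bbox file: "class_id cx cy w h" (class id 2 is arbitrary here).
yolo_line = " ".join(str(v) for v in (2, cx, cy, w, h))
print(yolo_line)
```

Because every value is divided by the image width or height, the same line stays valid if the image is later resized, as long as the aspect ratio handling is consistent.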

Selecting images related to transportation (cars, bicycles, people, traffic lights, etc.)

Since there are a lot of images, we can keep a fixed percentage of the images containing each object. In the code below, I am selecting images containing objects such as people, bicycles, cars, buses, trucks, and traffic lights.

I keep an image for its people only when at least one of the other objects checked in the code is also present in the image (class ids 1, 2, 3, 5, and 7, which are bicycle, car, motorcycle, bus, and truck in the standard COCO class order). By doing this we can better train our YOLO model to recognize transportation objects and people in an image. This task is vital for the self-driving car problem.
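The selection rule can be written as a small standalone function, which makes it easy to try on a few cases before running it over the whole dataset. This is a sketch, not the code used in this post: the function name `should_include` is hypothetical and it keys the counters by class id rather than by class name. The id lists are copied from the selection code in this section.

```python
TARGET_IDS = [0, 1, 2, 3, 5, 7, 9, 11, 12]  # 0 is person
VEHICLE_IDS = {1, 2, 3, 5, 7}               # ids that must accompany a person


def should_include(present_ids, kept_counts, limits):
    """Decide whether an image should be kept.

    present_ids: set of class ids present in the image
    kept_counts: dict class id -> images kept so far that contain the class
    limits:      dict class id -> maximum number of images to keep for the class
    """
    for class_id in TARGET_IDS:
        if class_id not in present_ids:
            continue
        if kept_counts[class_id] >= limits[class_id]:
            continue  # quota for this class is already filled
        if class_id != 0:
            return True
        # Person: keep only if a vehicle-type object is also present.
        if present_ids & VEHICLE_IDS:
            return True
    return False


# Toy quotas: at most one image per class, nothing kept yet.
limits = {class_id: 1 for class_id in TARGET_IDS}
kept = {class_id: 0 for class_id in TARGET_IDS}

print(should_include({0}, kept, limits))     # person alone
print(should_include({0, 2}, kept, limits))  # person together with a car
print(should_include({9}, kept, limits))     # traffic light only
```

An image with only people is rejected, while a traffic light on its own is enough, because the extra condition applies to the person class only.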

Before running this code, create 3 new folders in the current directory named ‘images’, ‘bbox’, and ‘cluster’. After the code runs successfully, you will have all the relevant images in the images folder, the bounding boxes for each image in the bbox folder, and the heights and widths of all the bounding boxes in a single file in the cluster folder, which we will use to compute custom anchors.
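The folders can also be created from Python. A minimal sketch using the standard library, with the three folder names from the paragraph above:

```python
import os

# Folder names used by the selection code in this section.
for folder in ("images", "bbox", "cluster"):
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
```

Note that the save function defined earlier opens files in append mode, so rerunning the selection code adds lines to the existing files. Clearing these folders between runs avoids duplicated entries.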

import cv2
import os

# We will keep retaining_percentage% of the total images containing the required objects.
retaining_percentage = 10
# We will use Id_counts_mod to keep track of the count of images containing each object.
Id_counts_mod = {}
for k in range(80):
    Id_counts_mod[names.values[k][0]] = 0

imgFolder = 'images'
txtFolder = 'bbox'
cluster = 'cluster'
clustPath = os.path.join(cluster, 'clustering') + '.txt'

k = 0
for i in tqdm(range(len(train_dataset))):
    train_image, train_label = train_dataset[i]
    bounding_boxes = train_label[:, :4]
    class_ids = train_label[:, 4:5]
    include = False
    # You can include/exclude any objects by changing the array below.
    for j in [0, 1, 2, 3, 5, 7, 9, 11, 12]:
        if j != 0:
            if j in class_ids:
                if Id_counts_mod[names.values[j][0]] < Id_counts[names.values[j][0]] * (retaining_percentage / 100):
                    include = True
                    break
        else:
            # Class 0 (person) is kept only together with one of the classes in the inner list.
            if j in class_ids:
                if Id_counts_mod[names.values[j][0]] < Id_counts[names.values[j][0]] * (retaining_percentage / 100):
                    for g in [1, 2, 3, 5, 7]:
                        if g in class_ids:
                            include = True
                            break
                    if include:
                        break

    txtPath = os.path.join(txtFolder, str(k)) + '.txt'
    if include:
        imgPath = os.path.join(imgFolder, str(k)) + '.jpg'
        image = train_image.asnumpy()
        height, width = image.shape[:2]
        # The loaded image is RGB, while cv2.imwrite expects BGR channel order.
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        cv2.imwrite(imgPath, image)
        for j in range(80):
            if j in class_ids:
                Id_counts_mod[names.values[j][0]] += 1
        for j in range(len(class_ids)):
            x1 = int(bounding_boxes[j][0])
            y1 = int(bounding_boxes[j][1])
            x2 = int(bounding_boxes[j][2])
            y2 = int(bounding_boxes[j][3])
            # class_ids[j] is a one-element array, so take the integer id out of it.
            class_id = int(class_ids[j][0])
            # yolo_preprocessing returns the YOLO line and the "width height" line for clustering.
            line, line2 = yolo_preprocessing(class_id, (x1, y1), (x2, y2), width, height)
            save_line(clustPath, line2)
            save_line(txtPath, line)
        k += 1
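A quick way to sanity-check the saved files is to convert a line back to pixel coordinates and compare it with the original box. The sketch below assumes each line has the form "class_id cx cy w h" with the class id written as a plain number. The helper name `yolo_line_to_corners` and the sample line are made up for illustration.

```python
def yolo_line_to_corners(line, img_width, img_height):
    """Parse 'class_id cx cy w h' and return (class_id, xmin, ymin, xmax, ymax) in pixels."""
    class_id, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    xmin = (cx - w / 2) * img_width
    ymin = (cy - h / 2) * img_height
    xmax = (cx + w / 2) * img_width
    ymax = (cy + h / 2) * img_height
    return int(float(class_id)), xmin, ymin, xmax, ymax


# Made-up line for a 640x480 image.
sample = "2 0.3 0.3 0.4 0.4"
class_id, xmin, ymin, xmax, ymax = yolo_line_to_corners(sample, 640, 480)
print(class_id, round(xmin), round(ymin), round(xmax), round(ymax))
```

If the recovered corners differ from the original box by more than a pixel or two, the width and height were probably swapped or the wrong image size was used during pre-processing.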

Conclusion

In the next blog, I will talk about how to create a YOLOv3 model from scratch and train it using the custom data we extracted here from the COCO dataset, along with problem-specific anchors.

Thanks for taking the time to read this article; I hope you learned something new from it. I highly appreciate feedback. Happy reading.

Drink coffee and keep on learning