How to train an Object Detector with your own COCO dataset in PyTorch (Common Objects in Context format)

Understanding the Dataset & DataLoader in PyTorch

Takashi Nakamura, PhD
FullStackAI
4 min read · Nov 5, 2019


Update on 9-Apr-2020

  1. I have created a very simple example on GitHub. Please take a look at the link.
  2. I had an opportunity to present on Faster R-CNN. The slides can be found here. Note that I adapted figures from multiple sources (incl. textbooks, blog posts, etc.); the original material can be found via the links on the slides.

Background

PyTorch has multiple well-known Computer Vision models built in, which can readily be used for transfer learning as well as for training your own models. There are many examples and official tutorials, e.g. the TorchVision object detection finetuning tutorial.

After some research, I thought:

“Yes! This tutorial explains it very well, and the implementation might be straightforward; I can run some models for my dataset!”

However, it was not as easy as I thought. Although it is straightforward to use built-in datasets, it is not so easy to use your own custom data. PyTorch provides the DataLoader and Dataset classes, which are used in all of its examples. The question was “How do I modify them for my data?”

I had worked with Python for a while, but I was new to PyTorch.

Problem statement: Most datasets for object detection are in COCO format, and my training dataset was no exception. However, the official tutorial does not explicitly cover the COCO format.

This article summarises some findings towards “How to use your own COCO dataset in PyTorch.”

Dataset class

What is the Dataset class?

The Dataset class defines how to load and return individual samples of your data; together with a DataLoader, it lets you generate (or pull) data using multiple worker processes and feed it to the model. In short, it is an efficient data-generation utility.

Do we need to use the Dataset class?

Well, we are able to run deep learning models without the Dataset class, but loading an entire dataset into memory at once is generally expensive, so it is highly recommended to use the Dataset class and load samples on demand.

How do we use the Dataset class?

We need to define the following:

  1. __init__(): How to initialise the class
  2. __getitem__(): How to generate samples from the data (What kind of data you want)
  3. (Optional) __len__(): The total number of samples.

Example

  1. I have 3 JPEG images in a folder called my_data;
  2. The names of the images are img1.jpg, img2.jpg, and img3.jpg;
  3. The labels of the images are [0, 1, 1] (e.g. img1.jpg is a black cat image, whereas both img2.jpg and img3.jpg are tabby cat images);
  4. I would like to efficiently load the images and labels using the Dataset class.

What we need to do is: open the image file and fetch the label in __getitem__(), returning both. Note that the Dataset must return tensors.

Simple Dataset Class for returning images and labels
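A minimal sketch of such a class, assuming the folder my_data and the hard-coded file names and labels from the example above, might look like this:

```python
import os

import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class MyDataset(Dataset):
    def __init__(self, root="my_data"):
        self.root = root
        # File names and labels from the toy example above
        self.files = ["img1.jpg", "img2.jpg", "img3.jpg"]
        self.labels = [0, 1, 1]
        self.to_tensor = transforms.ToTensor()

    def __getitem__(self, index):
        # Open the image file and convert it to a tensor
        img = Image.open(os.path.join(self.root, self.files[index])).convert("RGB")
        img = self.to_tensor(img)
        # Wrap the label in a tensor as well
        label = torch.tensor(self.labels[index])
        return img, label

    def __len__(self):
        return len(self.files)
```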
Usage of the simple Dataset Class
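A possible usage of the sketch above is just indexing into the dataset:

```python
dataset = MyDataset()
img, label = dataset[0]   # first sample: image tensor and its label
print(img.shape, label)   # e.g. torch.Size([3, H, W]) tensor(0)
```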

Let’s check the result. We can use the DataLoader class to load our own dataset and plot the images.

Loading your simple Dataset and visualising the results
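A rough sketch of such a check, using the MyDataset class sketched above and matplotlib for plotting, might be:

```python
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader

loader = DataLoader(MyDataset(), batch_size=1, shuffle=False)

for img, label in loader:
    # img has shape [batch, channels, height, width]; drop the batch
    # dimension and move channels last for matplotlib
    plt.imshow(img[0].permute(1, 2, 0))
    plt.title(f"label: {label.item()}")
    plt.show()
```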

This is a toy example of creating and loading your own Dataset class. An excellent article about the Dataset class can be found here.

Step-by-step solution for COCO data

Tasks

  1. I followed the tutorial linked above. I needed to install pycocotools, which requires a C compiler and Cython;
  2. As the official tutorial mentions (and as seen in the simplified example above), the PyTorch data loading utility is the torch.utils.data.DataLoader class. It represents a Python iterable over a dataset. For my own data, I needed to create my own subclass of torch.utils.data.Dataset;
  3. An example of the COCO format can be found in this great post;
  4. I wanted to implement the Faster R-CNN model for object detection.

Modify Dataset class for COCO data

First, as the official documentation mentions, I needed to override __getitem__() to fetch a data sample for a given key. Subclasses can also optionally override __len__().

With pycocotools, I created my own Dataset class to:

  1. Load the annotation file
  2. Open the corresponding image files

Example COCO Dataset class
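A sketch of what such a class might look like, assuming the image folder and the path to the annotation JSON are passed to the constructor, and using pycocotools’ COCO API:

```python
import os

import torch
from PIL import Image
from pycocotools.coco import COCO
from torch.utils.data import Dataset


class MyCocoDataset(Dataset):
    def __init__(self, root, annotation, transforms=None):
        self.root = root                   # folder that contains the images
        self.coco = COCO(annotation)       # parse the COCO annotation file
        self.ids = list(sorted(self.coco.imgs.keys()))
        self.transforms = transforms

    def __getitem__(self, index):
        coco = self.coco
        img_id = self.ids[index]

        # 1. Load the annotations for this image
        ann_ids = coco.getAnnIds(imgIds=img_id)
        anns = coco.loadAnns(ann_ids)

        # 2. Open the corresponding image file
        path = coco.loadImgs(img_id)[0]["file_name"]
        img = Image.open(os.path.join(self.root, path)).convert("RGB")

        # COCO gives boxes as [xmin, ymin, width, height];
        # Faster R-CNN expects [xmin, ymin, xmax, ymax]
        boxes, labels, areas, iscrowd = [], [], [], []
        for ann in anns:
            xmin, ymin, w, h = ann["bbox"]
            boxes.append([xmin, ymin, xmin + w, ymin + h])
            labels.append(ann["category_id"])
            areas.append(ann["area"])
            iscrowd.append(ann["iscrowd"])

        my_annotation = {
            "boxes": torch.as_tensor(boxes, dtype=torch.float32),
            "labels": torch.as_tensor(labels, dtype=torch.int64),
            "image_id": torch.tensor([img_id]),
            "area": torch.as_tensor(areas, dtype=torch.float32),
            "iscrowd": torch.as_tensor(iscrowd, dtype=torch.int64),
        }

        if self.transforms is not None:
            img = self.transforms(img)

        return img, my_annotation

    def __len__(self):
        return len(self.ids)
```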

There are some ideas to highlight:

  1. In COCO format, the bounding box is given as [xmin, ymin, width, height]; however, Faster R-CNN in PyTorch expects the bounding box as [xmin, ymin, xmax, ymax].
  2. In the above tutorial, they implemented Mask R-CNN — which needs “mask” information for my_annotation. It is not required for Faster R-CNN.
  3. The inputs for a PyTorch model must be in tensor format. I defined get_transform() as below.
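In its simplest form, get_transform() only needs to convert the PIL image to a tensor; a minimal version might be:

```python
import torchvision.transforms as T


def get_transform():
    # Convert the PIL image to a FloatTensor scaled to [0, 1]
    return T.Compose([T.ToTensor()])
```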

Set up your own DataLoader

Once I had created my own Dataset class, it was time to set up a DataLoader.
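A possible setup, assuming hypothetical paths my_data/images and my_data/annotations.json, and using a small collate_fn because detection targets differ in size from image to image:

```python
from torch.utils.data import DataLoader


def collate_fn(batch):
    # Detection targets differ in size per image, so the default collate
    # (which stacks tensors) would fail; return tuples instead
    return tuple(zip(*batch))


my_dataset = MyCocoDataset(root="my_data/images",                  # hypothetical path
                           annotation="my_data/annotations.json",  # hypothetical path
                           transforms=get_transform())

data_loader = DataLoader(my_dataset,
                         batch_size=2,
                         shuffle=True,
                         num_workers=2,
                         collate_fn=collate_fn)
```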

Check DataLoader

Let’s check whether our DataLoader pulls images and annotations iteratively.
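A quick sanity check might look like this:

```python
for imgs, annotations in data_loader:
    # imgs is a tuple of image tensors, annotations a tuple of dicts
    print(len(imgs), imgs[0].shape)
    print(annotations[0]["boxes"], annotations[0]["labels"])
    break  # one batch is enough for a sanity check
```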

The output confirmed that the DataLoader returns image tensors and annotation dictionaries in pairs, as expected.

Run the model

Now we have prepared our own COCO-formatted data, ready for the Faster R-CNN model. It is straightforward to modify a few parameters in order to customise the model (e.g. the number of anchor boxes). A simplified implementation using some lines from the official tutorial is presented below:
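A sketch of such an implementation, loosely following the official tutorial and assuming num_classes = 2 (one object class plus background):

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Start from a COCO-pretrained Faster R-CNN and replace the box predictor
# so that it matches the number of classes in your own dataset
num_classes = 2  # assumption: one object class + background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.to(device)

params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)

num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    for imgs, annotations in data_loader:
        imgs = [img.to(device) for img in imgs]
        annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]

        # In training mode the model returns a dictionary of losses
        loss_dict = model(imgs, annotations)
        losses = sum(loss for loss in loss_dict.values())

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

    print(f"epoch {epoch}: loss {losses.item():.4f}")
```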

Conclusion

This article covered how to prepare your own COCO dataset for use with an object detection model in PyTorch. During the exercise, I concluded that PyTorch is less complicated than other deep neural network frameworks, especially for Computer Vision tasks. Having said that, I could not grasp the idea of the Dataset and DataLoader classes at the beginning, and hopefully this article helps you develop some intuition!
