How to train an Object Detector with your own COCO dataset in PyTorch (Common Objects in Context format)
Understanding the Dataset & DataLoader in PyTorch
Update on 9-Apr-2020
- I have created a very simple example on Github. Please take a look at the link.
- I had an opportunity to give a presentation on Faster R-CNN. The slides can be found here. Note, I adapted figures from multiple sources (inc. textbooks, blog posts, etc.); the original material can be found via the links on the slides.
Background
PyTorch has multiple well-known Computer Vision models built in, which can readily be used for transfer learning as well as for training your own models. There are many examples and official tutorials.
After some searching, I thought:
“Yes! This tutorial explains it very well, and the implementation might be straightforward; I can run some models for my dataset!”
However, it was not as easy as I thought. Although it is straightforward to use built-in datasets, it is not so easy to use your own custom data. PyTorch has the DataLoader and Dataset classes used in all their examples. The question was “How do I modify it for my data?”
I have worked with Python for a while now, but I was new to PyTorch.
Problem statement: Most datasets for object detection are in COCO format. My training dataset was also in COCO format. However, the official tutorial does not explicitly mention the use of COCO format.
This article summarises some findings towards “How to use your own COCO dataset in PyTorch.”
Dataset class
What is the Dataset class?
The Dataset class defines how individual samples are generated (or pulled). Together with the DataLoader, it lets you load data using multiple cores and feed the generated data to the model. In short, it is an efficient data generation utility.
Do we need to use the Dataset class?
Well, we are able to run deep learning models without the Dataset class, but loading a dataset is generally memory-intensive, so it’s highly recommended to use the Dataset class.
How do we use the Dataset class?
We need to define the following:
- __init__(): How to initialise the class
- __getitem__(): How to generate samples from the data (what kind of data you want)
- (Optional) __len__(): The total number of samples.
Example
- I have 3 jpeg images in the folder my_data;
- The names of the images are img1.jpg, img2.jpg, and img3.jpg;
- The labels of the images are [0, 1, 1] (e.g. img1.jpg is a black cat image, whereas both img2.jpg and img3.jpg are tabby cat images);
- I would like to efficiently load the image and label using the Dataset class.
What we need to do is open the image file and fetch the label in __getitem__(), returning both. Note, the Dataset should return tensors so that the DataLoader can batch them.
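A minimal sketch of what such a class could look like is given here. The class name MyDataset, its constructor arguments, and the manual conversion from a PIL image to a tensor are illustrative assumptions, not necessarily the original implementation.

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class MyDataset(Dataset):
    """Toy dataset: image files in one folder plus a list of integer labels."""

    def __init__(self, root, file_names, labels):
        self.root = root              # folder holding the images, e.g. "my_data"
        self.file_names = file_names  # e.g. ["img1.jpg", "img2.jpg", "img3.jpg"]
        self.labels = labels          # e.g. [0, 1, 1]

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, idx):
        # Open the image lazily, only when the sample is requested.
        path = os.path.join(self.root, self.file_names[idx])
        img = Image.open(path).convert("RGB")
        # HWC uint8 array -> CHW float tensor scaled to [0, 1].
        img = torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 255.0
        label = torch.tensor(self.labels[idx])
        return img, label


# Constructing the dataset does not touch the disk; files are read on indexing.
dataset = MyDataset("my_data", ["img1.jpg", "img2.jpg", "img3.jpg"], [0, 1, 1])
```

Indexing the dataset (for example dataset[0]) then returns an (image, label) pair of tensors, provided the files exist in my_data.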
Let’s check the result. We can use the DataLoader class to load my own dataset and plot the images.
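The batching behaviour can be sketched without any image files. The stand-in dataset below (RandomCatDataset, a hypothetical name) returns random tensors in place of the three cat images, and the batch size of 2 is an arbitrary choice; plotting is left out.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomCatDataset(Dataset):
    """Hypothetical stand-in: three random 'images' with labels [0, 1, 1]."""

    def __init__(self):
        self.images = torch.rand(3, 3, 32, 32)  # 3 samples, each 3 x 32 x 32
        self.labels = torch.tensor([0, 1, 1])

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


# num_workers=0 loads in the main process; raise it to load with several cores.
loader = DataLoader(RandomCatDataset(), batch_size=2, shuffle=False, num_workers=0)

# Each iteration yields one stacked batch of images and one batch of labels.
batches = [(tuple(imgs.shape), labels.tolist()) for imgs, labels in loader]
for shape, labels in batches:
    print(shape, labels)
```

With three samples and a batch size of 2, the loader yields one full batch and one final batch holding the remaining sample.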
This is a toy example of creating and loading your own Dataset class. An excellent article regarding Dataset can be found here.
Step-by-step solution for COCO data
Tasks
- I followed the tutorial linked above. I needed to install pycocotools, which needs a C compiler (install Cython first);
- As the official tutorial mentioned (and as seen in the simplified example above), the PyTorch data loading utility is the torch.utils.data.DataLoader class. It represents a Python iterable over a dataset. For my dataset, I needed to create my own Dataset class by subclassing torch.utils.data.Dataset;
- An example of the COCO format can be found in this great post;
- I wanted to implement a Faster R-CNN model for object detection.
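For orientation, the core of a COCO detection annotation file can be sketched as a Python dictionary. The file name, sizes, and category below are made-up values; only the key names follow the COCO format.

```python
import json

# Minimal COCO-style annotation structure (all values are illustrative).
coco_example = {
    "images": [
        {"id": 1, "file_name": "img1.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,          # links the box to an entry in "images"
            "category_id": 1,       # links the box to an entry in "categories"
            "bbox": [100.0, 120.0, 200.0, 150.0],  # [xmin, ymin, width, height]
            "area": 200.0 * 150.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "cat", "supercategory": "animal"},
    ],
}

# This string is what would be saved as the annotation .json file.
annotation_json = json.dumps(coco_example, indent=2)
```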
Modify Dataset class for COCO data
First, as the official documentation mentions, I needed to override __getitem__() to fetch a data sample for a given key. Subclasses can also optionally override __len__().
With pycocotools, I created my own Dataset class to:
- Load the annotation files
- Open the corresponding image files
There are some ideas to highlight:
- In COCO format, the bounding box is given as [xmin, ymin, width, height]; however, Faster R-CNN in PyTorch expects the bounding box as [xmin, ymin, xmax, ymax].
- In the above tutorial, they implemented Mask R-CNN, which needs “mask” information for my_annotation. It is not required for Faster R-CNN.
- The inputs for a PyTorch model must be in tensor format. I defined get_transform() to handle this conversion.
Set up your own DataLoader
Once I had created my own Dataset class, it was time to set up a DataLoader.
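One detail worth sketching: detection images can differ in size and each image has its own number of boxes, so the default batching (which stacks tensors) does not apply. A common workaround is a custom collate_fn that keeps the samples as tuples. The stand-in dataset FakeDetectionDataset and the batch size below are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class FakeDetectionDataset(Dataset):
    """Hypothetical stand-in yielding (image, target) pairs like a COCO dataset."""

    def __init__(self, sizes):
        self.sizes = sizes  # list of (height, width), deliberately different

    def __len__(self):
        return len(self.sizes)

    def __getitem__(self, idx):
        h, w = self.sizes[idx]
        img = torch.rand(3, h, w)
        target = {
            "boxes": torch.tensor([[0.0, 0.0, w / 2, h / 2]]),
            "labels": torch.tensor([1]),
        }
        return img, target


def collate_fn(batch):
    # Turn [(img1, t1), (img2, t2)] into ((img1, img2), (t1, t2)) without stacking.
    return tuple(zip(*batch))


data_loader = DataLoader(
    FakeDetectionDataset([(32, 48), (40, 40), (64, 32)]),
    batch_size=2,
    shuffle=False,
    num_workers=0,
    collate_fn=collate_fn,
)
```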
Check DataLoader
Let’s check whether our DataLoader pulls images and annotations iteratively.
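A check of this kind could be sketched as follows. The function summarise_batches is a hypothetical helper, and the hand-built fake_batch merely imitates what a loader with a tuple-style collate function would yield; any iterable of (images, annotations) batches can be passed in.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")


def summarise_batches(data_loader, device, max_batches=2):
    """Return, per batch, a list of (image shape, number of boxes) pairs."""
    summaries = []
    for batch_idx, (imgs, annotations) in enumerate(data_loader):
        if batch_idx >= max_batches:
            break
        # Move every image and every annotation tensor to the target device.
        imgs = [img.to(device) for img in imgs]
        annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
        summaries.append(
            [(tuple(img.shape), len(t["boxes"])) for img, t in zip(imgs, annotations)]
        )
    return summaries


# Hand-built stand-in for one batch of two images (values are made up).
fake_batch = (
    (torch.rand(3, 32, 48), torch.rand(3, 40, 40)),
    (
        {"boxes": torch.tensor([[1.0, 2.0, 10.0, 12.0]]), "labels": torch.tensor([1])},
        {
            "boxes": torch.tensor([[0.0, 0.0, 5.0, 5.0], [3.0, 3.0, 9.0, 9.0]]),
            "labels": torch.tensor([1, 1]),
        },
    ),
)
summary = summarise_batches([fake_batch], device)
print(summary)
```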
The output was given as:
Run the model
Now we have prepared our own COCO-formatted data, ready for the Faster R-CNN model. It is straightforward to modify a few parameters in order to customise the model (e.g. number of anchor boxes, etc.). A simplified implementation using some lines from the official tutorial is presented below:
Conclusion
This article covered how to prepare your own COCO dataset for use with an object detection model in PyTorch. During the exercise, I concluded that PyTorch is less complicated than other deep learning frameworks, especially for Computer Vision tasks. Having said that, I could not grasp the idea of the Dataset and DataLoader classes at the beginning, so hopefully this article helps you develop some intuition!