Detecto — An object detection library for PyTorch

Simplifying the process of creating custom-trained object detection models

Published in PyTorch, Apr 17, 2020

Detecto is a Python library built on top of PyTorch that simplifies the process of building object detection models. The library acts as a lightweight package that reduces the amount of code needed to initialize models, apply transfer learning on custom datasets, and run inference on images and videos.

GitHub repository: alankbi/detecto (github.com)

Getting Started

To see how simple it is to get started with Detecto, let’s load in a pre-trained model from torchvision’s model zoo and run inference on the following image:

First, right-click and save the image above to a folder on your computer, and then make sure you’ve installed Detecto by running pip3 install detecto. Afterward, run the following script from within the same folder:

The code above reads in the saved image (in my case named “fruit.jpg”), generates predictions on it from a pre-trained model, and plots the results:

Cropped from original image for better visualization

Detecto’s Model class is built on a Faster R-CNN ResNet-50 FPN architecture from torchvision’s models subpackage, pre-trained on the COCO 2017 dataset. By default, it can detect objects from COCO’s 80 categories, such as fruits, animals, vehicles, kitchen appliances, and more.

Of course, if all you want to do is use a default model, there isn’t much need for a dedicated package. However, if you want to train a model on a custom dataset, that’s where Detecto comes in.

Transfer Learning

There are a couple of tutorials out there that teach you how to use a pre-trained model and apply transfer learning on a custom dataset. However, in many of these scenarios, developers have to define custom classes for their dataset, make modifications to the pre-trained model, or write their own training and visualization methods from scratch. Sometimes, all you want is to quickly whip up some good results. Luckily, doing so with Detecto is easy.

To start off, Detecto comes with a Dataset class (which extends PyTorch’s own Dataset class) that accepts any data in the PASCAL VOC format; that is, each image has an associated XML annotation file (here is a great labeling tool for this format). Concretely, your dataset can be organized in either of the following ways:

# All images and XML files in the same folder:
images/
| image0.jpg
| image0.xml
| image1.jpg
| image1.xml
| ...

# Images and XML files in separate folders:
images/
| image0.jpg
| image1.jpg
| ...
labels/
| image0.xml
| image1.xml
| ...
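Both layouts come down to the same task: finding the XML annotation that belongs to each image. The stdlib-only sketch below illustrates that pairing step; it is not Detecto's code. Several details are assumptions of this sketch:

- the function name pair_images_with_labels;
- the set of image extensions it recognizes;
- matching files by their shared name stem. VOC annotations also record the image name in a <filename> element, which a real loader may use instead.

```python
from pathlib import Path

# Assumed set of image extensions for this sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pair_images_with_labels(image_dir, label_dir=None):
    """Return sorted (image_path, xml_path) pairs.

    Hypothetical helper, not part of Detecto's API. If label_dir is None,
    the XML files are looked up next to the images (first layout);
    otherwise they are looked up in label_dir (second layout). Pairing is
    done by file stem, and images without a matching XML file are skipped.
    """
    image_dir = Path(image_dir)
    label_dir = Path(label_dir) if label_dir is not None else image_dir
    pairs = []
    for image_path in sorted(image_dir.iterdir()):
        # Skip XML files and anything else that is not an image.
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        xml_path = label_dir / (image_path.stem + ".xml")
        if xml_path.exists():
            pairs.append((image_path, xml_path))
    return pairs
```

Passing only the images folder covers the first layout. Passing both folders covers the second.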

In both cases, reading in your dataset is as simple as the following:

As you can see, you can then index your dataset to get corresponding image-target pairs, which contain the labels and locations of the objects within each image. Importantly, this gives you a structured data format for training, which can then take as few as four lines of code:

In the above example, after loading our dataset from the “images” folder, we initialize a Model with a list of classes ['alien', 'bat', 'witch'] telling it what we want to predict. Then, we call fit, which will fine-tune the pre-trained model to learn how to detect our custom objects.
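The class list matters because detection models work with integer class indices, not names. The sketch below parses one PASCAL VOC annotation with Python's xml.etree.ElementTree. It maps each object name to an index, reserving 0 for the background class as torchvision's Faster R-CNN does. The helper parse_voc and the sample annotation are hypothetical. Where Detecto performs this conversion internally is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample annotation in PASCAL VOC format (values are made up).
SAMPLE_XML = """
<annotation>
  <filename>image0.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>alien</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>witch</name>
    <bndbox><xmin>300</xmin><ymin>40</ymin><xmax>420</xmax><ymax>160</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text, classes):
    """Parse one VOC annotation into a target dict.

    Hypothetical helper, not part of Detecto's API. Boxes are
    [xmin, ymin, xmax, ymax] in pixels. Label indices start at 1 because
    index 0 is reserved for the background class.
    """
    class_to_index = {name: i + 1 for i, name in enumerate(classes)}
    root = ET.fromstring(xml_text.strip())
    boxes, labels = [], []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # VOC coordinates may be written as integers or floats.
        box = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(box)
        labels.append(class_to_index[obj.findtext("name")])
    return {"filename": root.findtext("filename"), "boxes": boxes, "labels": labels}
```

With the class list ['alien', 'bat', 'witch'], the sample annotation yields labels [1, 3]. An object name missing from the list raises a KeyError, which is one reason the class list must cover every label in your XML files.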

Now, let’s run the model on an image and print out the results:

Output:

['alien', 'bat', 'witch']
tensor([[ 569.2125,  203.6702, 1003.4383,  658.1044],
        [ 276.2478,  144.0074,  579.6044,  508.7444],
        [ 277.2929,  162.6719,  627.9399,  511.9841]])
tensor([0.9952, 0.9837, 0.5153])
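The three printed values are parallel: position i in the labels list, the boxes tensor, and the scores tensor all describe the same detection. Each box is [x_min, y_min, x_max, y_max] in pixels. To show how to read them without Detecto or PyTorch installed, the sketch below copies the printed numbers into plain lists. It then picks the highest-scoring detection and drops low-confidence ones. The helper names and the 0.6 threshold are illustrative choices, not Detecto parameters.

```python
# Values copied from the printed output above, as plain Python lists.
labels = ["alien", "bat", "witch"]
boxes = [
    [569.2125, 203.6702, 1003.4383, 658.1044],
    [276.2478, 144.0074, 579.6044, 508.7444],
    [277.2929, 162.6719, 627.9399, 511.9841],
]
scores = [0.9952, 0.9837, 0.5153]


def top_prediction(labels, boxes, scores):
    """Return (label, box rounded to whole pixels, score) for the best detection."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], [round(c) for c in boxes[best]], scores[best]


def filter_by_score(labels, boxes, scores, threshold):
    """Keep only detections whose confidence is above threshold."""
    keep = [i for i, s in enumerate(scores) if s > threshold]
    return [labels[i] for i in keep], [boxes[i] for i in keep], [scores[i] for i in keep]


label, box, score = top_prediction(labels, boxes, scores)
print(f"{label} at {box} with {score:.1%} confidence")
# alien at [569, 204, 1003, 658] with 99.5% confidence
```

With a 0.6 threshold, the low-confidence witch detection (0.5153) is dropped, while the alien and bat detections are kept.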

Here, our top prediction was an alien with coordinates [569, 204, 1003, 658] and a confidence of 99.5%. Let’s also plot our predictions:

Detecto’s visualize module comes with many other visualization methods, including detection on videos and live camera feeds. Here’s what inference on a video looks like:

Once you’re done working, you can save your models to a .pth file and load them back in typical PyTorch fashion:

Advanced Usage

Detecto is great for quickly creating object detection models, but that doesn’t mean it’s limited in functionality either. An important part of object detection is data augmentation: applying artificial transformations to images in order to increase the diversity of the dataset. Because Detecto sits on top of PyTorch, developers can make use of the torchvision transforms module to augment their datasets:
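Geometric augmentations move objects around, so the bounding boxes have to move with the pixels. torchvision's classic transforms operate on images only. Whatever applies a horizontal flip to a detection dataset must therefore also update the boxes. The stdlib-only sketch below shows the arithmetic on a tiny image stored as a list of pixel rows. random_hflip and hflip_box are hypothetical helpers, not Detecto or torchvision functions.

```python
import random


def hflip_box(box, image_width):
    """Mirror an [x_min, y_min, x_max, y_max] box left to right.

    Hypothetical helper. A point at horizontal position x ends up at
    image_width - x, so the old right edge becomes the new left edge.
    """
    x_min, y_min, x_max, y_max = box
    return [image_width - x_max, y_min, image_width - x_min, y_max]


def random_hflip(image_rows, boxes, p=0.5, rng=random):
    """With probability p, flip the image and all of its boxes together.

    Hypothetical helper. image_rows is a list of rows, each a list of
    pixel values, standing in for a real image tensor.
    """
    if rng.random() >= p:
        return image_rows, boxes
    width = len(image_rows[0])
    flipped_rows = [list(reversed(row)) for row in image_rows]
    flipped_boxes = [hflip_box(box, width) for box in boxes]
    return flipped_rows, flipped_boxes
```

For example, in a 1000-pixel-wide image, a box spanning x = 100 to 300 becomes a box spanning x = 700 to 900 after the flip. Color changes such as saturation jitter leave the boxes untouched.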

In this example, we describe a series of transformations to apply to our dataset. As we get ready to train another model, we also define a DataLoader object that controls how the fit method (which we call in the next step) iterates over our dataset:
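Conceptually, a loader like this decides the order and grouping in which dataset items reach the training loop. The sketch below mimics batch_size and shuffle with the standard library. It keeps each batch as a list of images and a list of target dicts rather than stacking them, because images can differ in size and carry different numbers of boxes. torchvision's detection models accept inputs in that list form during training. iterate_batches is a hypothetical helper, not Detecto's DataLoader.

```python
import random


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield (images, targets) batches from an indexable dataset.

    Hypothetical helper. Each dataset item is an (image, target) pair.
    Batches are plain lists, so items with different sizes or box
    counts can share a batch. The last batch may be smaller.
    """
    indices = list(range(len(dataset)))
    if shuffle:
        # A seeded Random instance makes the shuffled order reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = [dataset[i] for i in indices[start:start + batch_size]]
        images = [image for image, _ in batch]
        targets = [target for _, target in batch]
        yield images, targets
```

Shuffling changes which items share a batch from epoch to epoch, while every item is still visited exactly once per pass.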

After passing in the DataLoader, we provide a validation dataset to track performance throughout training, as well as customize a multitude of other parameters. Below is the loss against the validation dataset at each epoch:

That said, Detecto is still a lightweight library, so after training a model you may need fine-tuning capabilities that it does not yet support. Thankfully, you don’t need to limit yourself to Detecto’s API: simply use the get_internal_model method to access the underlying PyTorch model, which you can then integrate into your code like any other PyTorch model.

Conclusion

In this article, I introduce Detecto and show how it can be used to make object detection with PyTorch dramatically easier. To learn more, check out these resources:

Please don’t hesitate to reach out with any questions or submit an issue!