Training an Object Detection Model in a Few Minutes Using Detectron2

Uridah Sami Ahmed · Red Buffer · Mar 29, 2021

Exploring Facebook’s Detectron2 to train an object detection model

Recently, I had to solve an object detection problem. I was looking at different models to try, including YOLO and SSD, when one of my colleagues recommended Detectron2. Having worked closely with PyTorch, and having (minutely) contributed to it, I was excited to find out that Detectron2 is implemented on top of PyTorch.

source: https://github.com/facebookresearch/detectron2

Detectron2 is Facebook AI Research’s next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

I don’t know if it was just my perception or whether other people have felt the same, but at first glance Detectron2 seems geared toward segmentation (probably because it started from maskrcnn-benchmark, and Mask R-CNN is mainly for instance segmentation; don’t judge if you know better 😛, we are all continuously learning ✌️). A quick dive, however, made it clear that it solves a bunch of different problems, including both segmentation and object detection.

Detectron2 includes a variety of models like Faster R-CNN, Mask R-CNN, RetinaNet, DensePose, Cascade R-CNN, Panoptic FPN, and TensorMask. It provides support for many different computer vision tasks including object detection, instance segmentation, human pose prediction, and panoptic segmentation.

The objective of this blog post is to share everything that I did not directly find in the Detectron2 tutorials and had to actively look for before understanding the correct application. We will cover the complete sequence of steps: registering a dataset, training a model, saving the weights, and then loading those weights again for future use.

We will be using Roboflow’s open-source aquarium dataset for this demonstration. It consists of a little over 600 images spanning 8 different classes of aquatic animals.

source: https://public.roboflow.com/object-detection/aquarium/2/images/14f175066ce74b470bf31fa0c7a096cd

I will be using Colab for this demonstration.

Step 1: Installing the dependencies

!pip install pyyaml==5.1
import torch, torchvision
assert torch.__version__.startswith("1.8")
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html

CUDA and PyTorch come pre-installed in Colab, so for our purposes we just need to install PyYAML and Detectron2.
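If you want to double-check the runtime before going further, a quick sanity check confirms the PyTorch build and that a GPU is actually visible:

import torch

print(torch.__version__)          # should start with 1.8 for the wheel above
print(torch.cuda.is_available())  # True on a Colab GPU runtime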

Step 2: Downloading and registering the dataset

If your dataset is in the COCO format, Detectron2 makes it very easy to work with. If not, you can write a function that parses your dataset into Detectron2’s standard format (a sketch of such a loader follows the download command below). Since we are working with a Roboflow dataset, we can download it in several predefined formats, so we select COCO JSON. Here is the command for downloading the dataset. (Go to this link, select your desired format, and it will generate a download link for you.)

!curl -L <your-download-link> > roboflow.zip; unzip roboflow.zip; rm roboflow.zip
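If your dataset is not in COCO format, registration instead goes through DatasetCatalog: you write a loader that returns a list of dicts in Detectron2’s standard format and register it under a name. Here is a minimal sketch; load_my_annotations and the fixed image dimensions are hypothetical stand-ins for whatever your annotation format actually provides.

from detectron2.data import DatasetCatalog
from detectron2.structures import BoxMode

def get_my_dicts(split):
    # Hypothetical parser: load_my_annotations stands in for your own
    # annotation-reading logic; each entry yields an image path and its boxes.
    dataset_dicts = []
    for idx, (image_path, boxes) in enumerate(load_my_annotations(split)):
        record = {
            "file_name": image_path,
            "image_id": idx,
            "height": 480,  # replace with the real image height
            "width": 640,   # replace with the real image width
            "annotations": [
                {
                    "bbox": box,                    # [x0, y0, x1, y1] in pixels
                    "bbox_mode": BoxMode.XYXY_ABS,
                    "category_id": class_id,
                }
                for box, class_id in boxes
            ],
        }
        dataset_dicts.append(record)
    return dataset_dicts

DatasetCatalog.register("my_dataset_train", lambda: get_my_dicts("train"))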

Once we have our dataset in the proper format, registering it is a one-liner. Optionally, it is also a good idea to register your metadata, which tells Detectron2 which class id corresponds to which class name and helps with visualization later.

from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances

register_coco_instances("aquarium_train", {}, "train/_annotations.coco.json", "train")
register_coco_instances("aquarium_val", {}, "valid/_annotations.coco.json", "valid")

MetadataCatalog.get("aquarium_train").thing_classes = ["creatures", "fish", "jellyfish", "penguin", "puffin", "shark", "starfish", "stingray"]
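To verify that the registration worked, you can draw the ground-truth boxes for a random training image before any training, along the lines of the official Detectron2 tutorial:

import random
import cv2
from google.colab.patches import cv2_imshow
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.utils.visualizer import Visualizer

# Pick one registered training sample and draw its ground-truth annotations
d = random.choice(DatasetCatalog.get("aquarium_train"))
img = cv2.imread(d["file_name"])
visualizer = Visualizer(img[:, :, ::-1], metadata=MetadataCatalog.get("aquarium_train"), scale=0.5)
out = visualizer.draw_dataset_dict(d)
cv2_imshow(out.get_image()[:, :, ::-1])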

Step 3: Training

My favorite thing about Detectron2 is its model zoo, from which you can pick a pre-trained model to fine-tune on your own dataset. It contains many baseline models for Faster R-CNN, RetinaNet, and Mask R-CNN.

Here is my training configuration. It may not be ideal, and tweaking it could yield better results, but it gives good enough AP scores for this demonstration.

import os
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("aquarium_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")  # let training initialize from the model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00125
cfg.SOLVER.MAX_ITER = 1600
cfg.SOLVER.STEPS = []  # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 8  # the number of classes; some popular unofficial tutorials incorrectly use num_classes+1 here (note copied from the official Detectron2 tutorial)

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
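DefaultTrainer writes TensorBoard event files to cfg.OUTPUT_DIR by default, so you can watch the loss curves while training runs in Colab:

%load_ext tensorboard
%tensorboard --logdir output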

Step 4: Inference and Evaluation

You can create a predictor from the configuration you defined and use it to run inference. Here we visualize the model’s predictions on three randomly chosen validation images.

import os
import random
import cv2
from google.colab.patches import cv2_imshow
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer, ColorMode
from detectron2.data import MetadataCatalog
from detectron2.data.datasets import load_coco_json

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # confidence threshold for predictions
predictor = DefaultPredictor(cfg)

dataset_dicts = load_coco_json("valid/_annotations.coco.json", "valid")
for d in random.sample(dataset_dicts, 3):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=MetadataCatalog.get("aquarium_train"),
                   scale=0.5,
                   instance_mode=ColorMode.IMAGE_BW)
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(out.get_image()[:, :, ::-1])

The results look something like this:

Detectron2 has a built-in evaluator for COCO-format datasets, which we can use to evaluate our model as well. The following code evaluates the trained model and reports an overall Average Precision (AP) score as well as a per-class breakdown.

from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

evaluator = COCOEvaluator("aquarium_val", cfg, False, output_dir="./output/")
val_loader = build_detection_test_loader(cfg, "aquarium_val")
print(inference_on_dataset(trainer.model, val_loader, evaluator))

Step 5: Saving and reloading the model

When training the model, the trainer saves the final weights as model_final.pth inside cfg.OUTPUT_DIR; this is the file we point cfg.MODEL.WEIGHTS to.

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")

PyTorch models require you to define some of the configuration at inference time as well, so we need to initialize a configuration variable and populate it the same way we did for training. Here is the code for reloading the saved model with a new configuration.

import torch
from detectron2 import model_zoo
from detectron2.config import get_cfg

new_cfg = get_cfg()
new_cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
new_cfg.MODEL.ROI_HEADS.NUM_CLASSES = 8
new_cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7
new_cfg.MODEL.WEIGHTS = "./output/model_final.pth"
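With the new configuration populated, building a predictor from it works exactly as before. A small usage sketch (test.jpg is a hypothetical image path; substitute one of your own):

import cv2
from detectron2.engine import DefaultPredictor

# Build a predictor from the reloaded configuration and run it on one image
new_predictor = DefaultPredictor(new_cfg)
im = cv2.imread("test.jpg")  # hypothetical path; use one of your own images
outputs = new_predictor(im)
print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)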

(Optional) Step 6: Saving the model to drive from Colab

I used Colab for training this model. In many cases you will need the trained model later, but as we know, a Colab session disconnects after a while, which means losing all the work you’ve done. The following snippet shows how to mount your Google Drive in Colab and save your model there.

from google.colab import drive

drive.mount('/content/gdrive')
model_save_name = 'detectron2_aquarium_model.pth'
path = f"/content/gdrive/My Drive/{model_save_name}"
torch.save(trainer.model.state_dict(), path)
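Since this saves only the state_dict, restoring it later means rebuilding the model from the same configuration and loading the weights back in. A minimal sketch, assuming new_cfg from Step 5 and the same Drive path:

import torch
from detectron2.modeling import build_model

model = build_model(new_cfg)              # rebuild the architecture from the config
model.load_state_dict(torch.load(path))   # load the weights saved to Drive
model.eval()                              # switch to inference mode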

I trained the above model using Colab; the notebook can be found here. It contains the code for installing all required libraries, the complete training process, and the method for saving your model to your Drive and then loading the weights back. Thanks to the availability of a GPU, training is fast and you can see results within a few minutes. If you keep the number of iterations around 300, the model trains in about 10 minutes; for my particular case I experimented and settled on 1600 iterations, which took a little over half an hour.
