Guide to fine-tuning a Pre-trained model for Object Detection tasks with Faster RCNN using Detectron2

Siladittya Manna
Published in The Owl
4 min read · Sep 1, 2023

In this article, we will go through the steps needed to fine-tune a pre-trained model for object detection, with Faster RCNN as the baseline framework in Detectron2. This article is a little different from conventional articles, as we won't be going through millions of lines of code. Instead, we will only go through the steps to follow.

In this article, we do not assume any specific type of pre-trained model. The model can be supervised, semi-supervised, or self-supervised. However, the structure of the model object in the pre-training stage matters, as the keys of the saved weights depend on it. For example, for SSL models trained using the lightly-ai library, the keys in the weights dictionary start with 'backbone.', whereas the keys supported by Detectron2 follow a different naming convention. The key names that the conversion step starts from can be seen by inspecting the state dictionary of the torchvision version of the ResNet50 model.
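To make the point about key names concrete, here is a minimal sketch of stripping a wrapper prefix from checkpoint keys. It assumes the checkpoint is a flat dictionary whose backbone entries start with 'backbone.'. The example keys, the `projection_head.` entry, and the helper name `strip_prefix` are hypothetical, and the string values are placeholders standing in for weight tensors.

```python
def strip_prefix(state_dict, prefix="backbone."):
    """Keep only entries whose key starts with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Hypothetical checkpoint: placeholder strings stand in for weight tensors.
checkpoint = {
    "backbone.conv1.weight": "w0",
    "backbone.layer1.0.conv1.weight": "w1",
    "projection_head.0.weight": "w2",  # not part of the backbone, so it is dropped
}

backbone_only = strip_prefix(checkpoint)
for key in backbone_only:
    print(key)
```

Printing the keys before and after such a step is a quick way to check whether they already look like the torchvision ResNet50 names.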

Let us proceed step by step.

First: Installation

For detailed instructions on how to install Detectron2, check the official installation guide in the Detectron2 documentation.

Second: Dataset

Detectron2 has built-in support for a few datasets. The datasets are assumed to exist in a directory specified by the environment variable DETECTRON2_DATASETS. Under this directory, detectron2 will look for datasets in the structure described below, if needed.

$DETECTRON2_DATASETS/
  coco/
  lvis/
  cityscapes/
  VOC20{07,12}/

You can set the location for built-in datasets with

export DETECTRON2_DATASETS=/path/to/datasets

If left unset, the default is ./datasets relative to your current working directory.

For Pascal VOC, the expected structure inside the dataset directory is:

VOC20{07,12}/
  Annotations/
  ImageSets/
    Main/
      trainval.txt
      test.txt
      # train.txt or val.txt, if you use these splits
  JPEGImages/
You may need to set DETECTRON2_DATASETS every time you restart the session. One way to avoid this is to add the export line to your .bashrc file.
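Before starting a long training run, it can help to check that the directory layout described above is actually in place. The following is a minimal sketch using only the standard library. The helper names `dataset_root` and `missing_voc_paths` are hypothetical, and the fallback to ./datasets is taken from the description above rather than from Detectron2's code.

```python
import os
from pathlib import Path


def dataset_root():
    """Return $DETECTRON2_DATASETS if it is set, otherwise ./datasets."""
    return Path(os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets")))


def missing_voc_paths(root, year, splits=("trainval", "test")):
    """List the expected Pascal VOC paths under `root` that do not exist."""
    voc = Path(root) / f"VOC{year}"
    expected = [voc / "Annotations", voc / "JPEGImages"]
    expected += [voc / "ImageSets" / "Main" / f"{split}.txt" for split in splits]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    root = dataset_root()
    for year in (2007, 2012):
        for path in missing_voc_paths(root, year):
            print("missing:", path)
```

If you use the train or val splits, pass them through the `splits` argument so that the corresponding .txt files are checked as well.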

Third: Convert Pre-trained weights to Detectron2 Format

Detectron2 provides a script to convert model weights saved from torchvision models (convert-torchvision-to-d2.py in the tools directory of the repository).

However, some modifications may be needed depending on the structure of the model object in the pre-training stage, as stated earlier. Several examples of such conversion scripts can be seen in the MoCo and DETR repositories. While the basic structure of the code remains the same, some additional logic needs to be added to handle the varying hierarchical structures of the models.
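The renaming step can be sketched as follows. This is a sketch under stated assumptions, not the converter itself: the rules below follow the torchvision-to-Detectron2 ResNet renaming as I understand it, so compare them against the script in the repository before relying on them. The function names `to_detectron2_key` and `convert` and the `prefix` argument are hypothetical, and placeholder strings stand in for weights.

```python
def to_detectron2_key(key, prefix=""):
    """Map one torchvision-style ResNet key to Detectron2-style naming."""
    if prefix and key.startswith(prefix):
        key = key[len(prefix):]  # e.g. drop a leading 'backbone.'
    if "layer" not in key:
        key = "stem." + key  # layers before the residual stages
    for stage in (1, 2, 3, 4):
        key = key.replace(f"layer{stage}", f"res{stage + 1}")
    for idx in (1, 2, 3):
        key = key.replace(f"bn{idx}", f"conv{idx}.norm")
    key = key.replace("downsample.0", "shortcut")
    key = key.replace("downsample.1", "shortcut.norm")
    return key


def convert(state_dict, prefix=""):
    """Rename all keys that start with `prefix` and wrap them for saving."""
    model = {
        to_detectron2_key(key, prefix): value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }
    # In a real conversion the values would be NumPy arrays and this
    # dictionary would be written to a .pkl file with pickle.dump.
    return {"model": model, "__author__": "custom", "matching_heuristics": True}


example = {
    "backbone.conv1.weight": "w0",
    "backbone.layer1.0.bn1.weight": "w1",
    "backbone.layer1.0.downsample.0.weight": "w2",
}
for new_key in convert(example, prefix="backbone.")["model"]:
    print(new_key)
```

The extra logic mentioned above usually lives in the prefix handling: a model wrapped differently during pre-training needs a different rule for recovering torchvision-style names before the renaming is applied.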

Fourth: Training

Detectron2 also provides several training scripts, including train_net.py, in the tools directory of its repository.

Training is launched with train_net.py. We pass the path to the config file, the number of GPUs to use, and the converted pre-trained weights, which are supplied through the MODEL.WEIGHTS key as in the example command below.

You can modify any hyper-parameter in the configuration file by passing it as a KEY VALUE pair at the end of the command or by editing the .yaml file directly. Note that the provided training configurations are set for 8-GPU training. Hence, for training on 1 GPU, we need to set SOLVER.IMS_PER_BATCH to 2 and SOLVER.BASE_LR to 0.0025.

python train_net.py --config-file configs/pascal_voc_R_50_C4_24k_model.yaml \
  --num-gpus 1 MODEL.WEIGHTS pretrained_ckpts/model_epoch199.pkl \
  SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025
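The 1-GPU values above are consistent with scaling the learning rate linearly with the batch size. The sketch below works through that arithmetic. It assumes an 8-GPU reference of 16 images per batch with a base LR of 0.02, which is not stated in this article, and takes the 24K iterations from the config name. The function name `scale_schedule` is hypothetical, and scaling the iteration count is only a starting point, not a rule that must be followed.

```python
def scale_schedule(images_per_batch, ref_batch=16, ref_lr=0.02, ref_max_iter=24000):
    """Scale LR linearly with batch size and iterations inversely.

    The reference values are assumptions about the 8-GPU configuration.
    """
    factor = images_per_batch / ref_batch
    return {
        "SOLVER.IMS_PER_BATCH": images_per_batch,
        "SOLVER.BASE_LR": ref_lr * factor,
        # Keeps the total number of images seen roughly constant.
        "SOLVER.MAX_ITER": round(ref_max_iter / factor),
    }


for key, value in scale_schedule(2).items():
    print(key, value)
```

With a batch size of 2 the factor is 2/16, so the LR becomes 0.02 / 8 = 0.0025, matching the value used in the command.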

Note: On my desktop, running with a batch size of 2 required about 7 GB of memory. I also found that an LR of 0.01 performed better than 0.0025. You also need to change the SOLVER.MAX_ITER argument if the batch size is decreased. I increased it from 24K to 150K, with the LR multiplied by 0.1 at iterations 100K and 120K. The hyperparameter values are subject to change depending on your application.
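The schedule in the note can be expressed as command-line overrides. The sketch below assembles such a command as a string. SOLVER.STEPS and SOLVER.GAMMA are the Detectron2 config keys for the decay steps and the decay factor. They are not named in the note, so treat this mapping as an assumption, and `build_train_command` is a hypothetical helper.

```python
import shlex


def build_train_command(config_file, weights, num_gpus=1, overrides=None):
    """Return a train_net.py command with KEY VALUE overrides at the end."""
    cmd = [
        "python", "train_net.py",
        "--config-file", config_file,
        "--num-gpus", str(num_gpus),
        "MODEL.WEIGHTS", weights,
    ]
    for key, value in (overrides or {}).items():
        cmd += [key, str(value)]
    return shlex.join(cmd)  # quotes values that contain spaces or parentheses


overrides = {
    "SOLVER.IMS_PER_BATCH": 2,
    "SOLVER.BASE_LR": 0.01,
    "SOLVER.MAX_ITER": 150000,
    "SOLVER.STEPS": "(100000, 120000)",  # iterations at which the LR is decayed
    "SOLVER.GAMMA": 0.1,                 # multiplicative decay factor
}

print(build_train_command(
    "configs/pascal_voc_R_50_C4_24k_model.yaml",
    "pretrained_ckpts/model_epoch199.pkl",
    overrides=overrides,
))
```

The same keys can instead be written into the SOLVER section of the .yaml file if you prefer not to pass them on the command line.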

We can also use PyTorch Lightning to train the Faster RCNN model, using the lightning_train_net.py file in the detectron2/tools directory. When using the PyTorch Lightning trainer, refer to this issue for some minor changes in the training script.

You can refer to this article for tips on training FasterRCNN using both PyTorch Lightning and conventional training loops with automatic mixed precision.

Fifth: Evaluation

To evaluate the model, add the --eval-only flag to the training command and point MODEL.WEIGHTS to the checkpoint to be evaluated. One such example is

python train_net.py --config-file configs/pascal_voc_R_50_C4_24k_model.yaml \
  --num-gpus 1 --eval-only MODEL.WEIGHTS /path/to/checkpoint_file

It is necessary to put --eval-only before MODEL.WEIGHTS, because everything from the first KEY VALUE pair onward is treated as a configuration override.
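The ordering requirement comes from how the trailing overrides are collected. The sketch below reproduces the behaviour with a small stand-in parser. It assumes the training script gathers the overrides with argparse.REMAINDER, and `make_parser` is a hypothetical simplification, not the actual Detectron2 parser.

```python
import argparse


def make_parser():
    """Simplified stand-in for the argument parser of the training script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", default="")
    parser.add_argument("--num-gpus", type=int, default=1)
    parser.add_argument("--eval-only", action="store_true")
    # Everything from the first positional token onward lands in `opts`.
    parser.add_argument("opts", nargs=argparse.REMAINDER, default=None)
    return parser


parser = make_parser()

# Flag placed before the KEY VALUE pairs: parsed as a flag.
good = parser.parse_args(
    ["--num-gpus", "1", "--eval-only", "MODEL.WEIGHTS", "ckpt.pth"]
)
# Flag placed after the KEY VALUE pairs: swallowed into the overrides.
bad = parser.parse_args(
    ["--num-gpus", "1", "MODEL.WEIGHTS", "ckpt.pth", "--eval-only"]
)

print(good.eval_only, good.opts)
print(bad.eval_only, bad.opts)
```

In the second call the flag never reaches the parser as an option, so the script would not run in evaluation mode and the stray token would end up among the config overrides.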

Inference Demo

For the inference demo, the detectron2 repo itself gives a nice step-by-step explanation of how to do it.

