Detectron2 config, optimizer, lr_scheduler (Part 1)

Published in Innovation-res, Nov 12, 2021

source: https://github.com/facebookresearch/detectron2

This post gives practical information about the configuration of the Mask RCNN model provided by the Detectron2 framework.

If you are working on a project with Detectron2, or you have already tried the official tutorial and are looking for something extra, this post will help you understand some slightly more advanced configuration options of the Detectron2 API.

All references and sources are noted as hyperlinks.

The code used in this post can be found in this GitHub gist.

Introduction

Detectron2 is a library by Facebook AI Research that provides state-of-the-art detection and segmentation algorithms. It is released under the Apache 2.0 license, which grants anyone permission for commercial use, modification, distribution, patent use and private use, subject to the license's conditions.


In this post we will show some slightly advanced settings for the Mask RCNN model of Detectron2.

Some notes on prerequisites:

The official installation guide, which is clear and detailed, can be found here.
The official getting-started tutorial with a first example can be found here, and some example projects can be found here.
The extensive official documentation.

Configuration file

The official documentation has an extensive reference list for the config here.


Let’s see an example of how to change the yaml config file.

Choose the model you need from the provided Model Zoo. In this example we use Mask RCNN with a ResNeXt-101 + Feature Pyramid Network (FPN) backbone.

Note: in each part below, the code is presented first and then briefly explained.

1. Model

from detectron2 import model_zoo
from detectron2.config import get_cfg
base_model = "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(base_model))  # load the base model's config
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(base_model)  # pre-trained weights

‘cfg.merge_from_file’ loads the config of the chosen base model
‘cfg.MODEL.WEIGHTS’ sets the pre-trained weights (here a checkpoint URL obtained with model_zoo.get_checkpoint_url from the yaml name, but it can also be the path to a .pkl or .pth file, in which case get_checkpoint_url is not needed)

Note: at this point the values from the base_model config file are loaded on top of detectron2's defaults. You can inspect them with print(cfg).

cfg.MODEL.BACKBONE.FREEZE_AT = 2
cfg.MODEL.ANCHOR_GENERATOR.SIZES = [[32, 64, 128, 256]]
cfg.MODEL.RPN.NMS_THRESH = 0.7
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1

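‘cfg.MODEL.BACKBONE.FREEZE_AT’ freezes the first 2 stages of the backbone (the stem and the first group of residual blocks) so they are not trained. To train all layers, set it to 0.
‘cfg.MODEL.ANCHOR_GENERATOR.SIZES’ the anchor sizes (i.e. the square root of the anchor area) in absolute pixels w.r.t. the network input. With FPN, a single inner list like this one is applied to every feature level, whereas the FPN base config uses one size per level ([[32], [64], [128], [256], [512]]).
‘cfg.MODEL.RPN.NMS_THRESH’ the Non-maximum Suppression (NMS) threshold applied to the Region Proposal Network (RPN) proposals.
‘cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE’ the number of proposals (RoIs) sampled per image to train the ROI heads (the RPN has its own setting, cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE).
‘cfg.MODEL.ROI_HEADS.NUM_CLASSES’ the number of foreground classes (Note: do not count the background as a class!).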
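This sketch shows what ANCHOR_GENERATOR.SIZES means in practice. It computes the width and height of the anchors at a single location. It assumes the same rule as detectron2's default anchor generator: each size is the square root of the anchor area, and each aspect ratio is height / width. The aspect ratios [0.5, 1.0, 2.0] are also an assumption here (they are detectron2's default ANCHOR_GENERATOR.ASPECT_RATIOS). The function name is hypothetical.

```python
import math

def cell_anchor_shapes(sizes, aspect_ratios):
    """Return (width, height) for every size / aspect-ratio pair at one location.

    Assumes size = sqrt(area) and aspect_ratio = height / width, so
    width = sqrt(area / ratio) and height = ratio * width.
    """
    shapes = []
    for size in sizes:
        area = float(size) ** 2
        for ratio in aspect_ratios:
            w = math.sqrt(area / ratio)
            h = ratio * w
            shapes.append((w, h))
    return shapes

# The sizes used in this post, with the (assumed) default three aspect ratios.
shapes = cell_anchor_shapes([32, 64, 128, 256], [0.5, 1.0, 2.0])
print(len(shapes))  # 4 sizes x 3 ratios = 12 anchors per location
for w, h in shapes[:3]:
    print(f"{w:.1f} x {h:.1f}")
```

Every anchor keeps the area of its size. Only its shape changes with the aspect ratio.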

cfg.MODEL.PIXEL_MEAN = [123.675, 116.28, 103.53]
cfg.MODEL.PIXEL_STD = [58.395, 57.12, 57.375]

The Yaml Config References (line 33) state that the channel order of MODEL.PIXEL_MEAN and MODEL.PIXEL_STD must be consistent with the input channel format, INPUT.FORMAT.
Since we use RGB here, we reversed the channel order of the base config's normalization stats, which are given in BGR order (the default format).
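The reordering is just a reversal of the channel axis. The minimal NumPy sketch below starts from the BGR-ordered stats (the RGB values above, reversed) and shows both the reorder and the per-channel normalization (x - mean) / std. It assumes an H x W x 3 RGB image with values in 0-255. The random image is a hypothetical stand-in for real data.

```python
import numpy as np

# Normalization stats in BGR order (the RGB values above, reversed).
pixel_mean_bgr = [103.53, 116.28, 123.675]
pixel_std_bgr = [57.375, 57.12, 58.395]

# Reverse the channel order to match INPUT.FORMAT = "RGB".
pixel_mean_rgb = pixel_mean_bgr[::-1]
pixel_std_rgb = pixel_std_bgr[::-1]
print(pixel_mean_rgb)  # [123.675, 116.28, 103.53]
print(pixel_std_rgb)   # [58.395, 57.12, 57.375]

# Per-channel normalization of an RGB image (H x W x 3, values 0-255).
rng = np.random.default_rng(0)
image_rgb = rng.integers(0, 256, size=(4, 4, 3)).astype(np.float32)
normalized = (image_rgb - np.array(pixel_mean_rgb)) / np.array(pixel_std_rgb)
print(normalized.shape)  # (4, 4, 3)
```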

Note: in some cases it is better to compute the normalization stats from your own dataset than to use the given stats from ImageNet or COCO (if you use transfer learning). Advice: if you have the time, compute the normalization stats of your dataset and compare the resulting metric with the one obtained using ImageNet's stats; it is worth the effort (for more info see this interesting PyTorch discussion).

You can use the code from this gist to find the mean and std for your own dataset (you might need to modify the Dataset class to fit your case).
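The computation itself is short. Here is a minimal NumPy sketch. It assumes your images are available as H x W x 3 uint8 arrays in the same channel order as INPUT.FORMAT. The function name and toy images are hypothetical. The function accumulates per-channel sums and sums of squares over all pixels, so larger images weigh more.

```python
import numpy as np

def channel_mean_std(images):
    """Per-channel mean and std over all pixels of an iterable of H x W x 3 images.

    Values stay on the 0-255 scale, matching MODEL.PIXEL_MEAN / MODEL.PIXEL_STD.
    """
    total = np.zeros(3, dtype=np.float64)
    total_sq = np.zeros(3, dtype=np.float64)
    count = 0
    for img in images:
        pixels = np.asarray(img, dtype=np.float64).reshape(-1, 3)
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    mean = total / count
    std = np.sqrt(total_sq / count - mean ** 2)  # population std
    return mean.tolist(), std.tolist()

# Toy example with two constant images (hypothetical data).
imgs = [np.full((2, 2, 3), 10, dtype=np.uint8), np.full((2, 2, 3), 30, dtype=np.uint8)]
mean, std = channel_mean_std(imgs)
print(mean, std)  # [20.0, 20.0, 20.0] [10.0, 10.0, 10.0]
```

A single streaming pass keeps memory constant, so the sketch also works with a data loader that yields one image at a time.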

2. Input

cfg.INPUT.RANDOM_FLIP = "none"
cfg.INPUT.FORMAT = "RGB" 
cfg.INPUT.MIN_SIZE_TRAIN = (256, ) 
cfg.INPUT.MAX_SIZE_TRAIN = 256
cfg.INPUT.MIN_SIZE_TEST = 0

‘cfg.INPUT.RANDOM_FLIP’ the flip mode used for data augmentation during training: “horizontal”, “vertical”, or “none” to disable flipping (e.g. if you create your own augmentations; we will see how in a future post).
‘cfg.INPUT.FORMAT’ the image format the model expects, e.g. RGB, YUV, HSV (one of the PIL modes) or BGR (the default). Here the channel order is RGB.
‘cfg.INPUT.MIN_SIZE_TRAIN’ the size of the shortest side of the image during training (Note: images are resized so that their shortest side matches this value, so smaller images are upscaled too; if you keep the base config's value, images are resized to a shortest side of up to 800 pixels even when their height or width is smaller)
‘cfg.INPUT.MAX_SIZE_TRAIN’ the maximum size of the longest side of the image during training
‘cfg.INPUT.MIN_SIZE_TEST’ the size of the shortest side of the image during testing; set it to 0 to disable resizing at test time.
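This plain-Python sketch of the shortest-edge resize rule shows how MIN_SIZE_TRAIN and MAX_SIZE_TRAIN interact. It assumes the rule mirrors the output-shape computation of detectron2's ResizeShortestEdge augmentation: scale the shortest side to the target, then shrink both sides if the longest side exceeds the maximum. The function name is hypothetical.

```python
def resized_shape(h, w, short_edge, max_size):
    """Return (new_h, new_w) after a shortest-edge resize capped at max_size."""
    scale = short_edge / min(h, w)
    if h < w:
        new_h, new_w = short_edge, scale * w
    else:
        new_h, new_w = scale * h, short_edge
    # If the longest side is now too large, shrink both sides by the same factor.
    if max(new_h, new_w) > max_size:
        scale = max_size / max(new_h, new_w)
        new_h, new_w = new_h * scale, new_w * scale
    return int(new_h + 0.5), int(new_w + 0.5)

# With MIN_SIZE_TRAIN = (256,) and MAX_SIZE_TRAIN = 256 as above:
print(resized_shape(480, 640, 256, 256))  # (192, 256)
print(resized_shape(100, 100, 256, 256))  # (256, 256): small images are upscaled
```

Under this rule, setting MAX_SIZE_TRAIN equal to the only MIN_SIZE_TRAIN value means non-square images end up with a shortest side below 256.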

3. Solver

cfg.SOLVER.IMS_PER_BATCH = 4  
cfg.SOLVER.BASE_LR = 3e-4 
cfg.SOLVER.CLIP_GRADIENTS.ENABLED = True
cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE = "value"
cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE = 1.0
cfg.SOLVER.AMP.ENABLED = True

‘cfg.SOLVER.IMS_PER_BATCH’ the batch size (number of images per iteration, summed over all GPUs)
‘cfg.SOLVER.BASE_LR’ the base learning rate
‘cfg.SOLVER.CLIP_GRADIENTS.ENABLED’ enables gradient clipping
‘cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE’ the type of gradient clipping: with “value”, each element of each gradient is clipped by absolute value; with “norm”, the norm of each parameter's gradient is clipped
‘cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE’ the maximum absolute value (or maximum norm, with “norm”) used for clipping gradients
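The two clipping types can be compared with a small NumPy sketch. It assumes that “value” clamps each gradient element to [-CLIP_VALUE, CLIP_VALUE]. It also assumes that “norm” rescales a parameter's whole gradient when its L2 norm exceeds the threshold. In PyTorch these behaviors correspond to torch.nn.utils.clip_grad_value_ and torch.nn.utils.clip_grad_norm_. The helper names are hypothetical.

```python
import numpy as np

def clip_by_value(grad, clip_value):
    """'value' clipping: clamp every element to [-clip_value, clip_value]."""
    return np.clip(grad, -clip_value, clip_value)

def clip_by_norm(grad, max_norm):
    """'norm' clipping: rescale the whole gradient if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    coef = max_norm / (norm + 1e-6)  # small epsilon avoids division by zero
    return grad * coef if coef < 1.0 else grad

grad = np.array([3.0, -4.0, 0.5])
print(clip_by_value(grad, 1.0))                 # elements clamped to [-1, 1]
print(np.linalg.norm(clip_by_norm(grad, 1.0)))  # just below 1.0
```

Value clipping can change the direction of the gradient, while norm clipping only shortens it.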

Note: by default the SGD optimizer is used; below we will see how to use any other optimizer and how to configure the warm-up and the learning rate scheduler.

‘cfg.SOLVER.AMP.ENABLED’ enables Automatic Mixed Precision (AMP) for training. It is implemented in AMPTrainer, which DefaultTrainer uses when AMP is enabled in the config as shown above. To use AMP for inference, run inference under autocast().
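Here is a minimal PyTorch sketch of inference under autocast. A toy nn.Linear serves as a hypothetical stand-in for the trained detectron2 model. The sketch uses float16 on a GPU. With no GPU it falls back to bfloat16 on the CPU, an assumption made so the sketch also runs without CUDA.

```python
import torch

# Hypothetical stand-in for a trained model; in practice this is your detectron2 model.
model = torch.nn.Linear(16, 4).eval()
inputs = torch.randn(2, 16)

device_type = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device_type == "cuda" else torch.bfloat16
model, inputs = model.to(device_type), inputs.to(device_type)

# Run the forward pass in mixed precision; no gradients are needed for inference.
with torch.no_grad(), torch.autocast(device_type=device_type, dtype=amp_dtype):
    outputs = model(inputs)

print(outputs.dtype)  # torch.float16 on GPU, torch.bfloat16 on CPU
```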

4. Save the config in yaml file

import os, datetime
now = datetime.datetime.now()
cfg.OUTPUT_DIR = f'./logs/{now.strftime("%Y%m%d_%H%M%S")}'
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

This creates a timestamped output directory where the weights and metrics will be saved.

# Dump the config as YAML in the output directory
with open(os.path.join(cfg.OUTPUT_DIR, "config.yaml"), "w") as f:
    f.write(cfg.dump())  # cfg.dump() returns the config as a plain YAML string

This saves the config as YAML in the output directory, so you can reuse it for future training runs. Loading it is as simple as this:

cfg_load = get_cfg()
cfg_load.merge_from_file(cfg.OUTPUT_DIR+'/config.yaml')

Once you have created the config file, you can load it. This can be used for inference afterwards, for example.

Optimizer

To use any optimizer you want, you need to modify the trainer class: we inherit from DefaultTrainer and override its build_optimizer class method. In the example below we use PyTorch's AdamW optimizer.

import torch
from detectron2.engine import DefaultTrainer
from detectron2.solver.build import get_default_optimizer_params
from detectron2.solver.build import maybe_add_gradient_clipping

class MyTrainer(DefaultTrainer):
    @classmethod
    def build_optimizer(cls, cfg, model):
        """
        Build an AdamW optimizer from config.
        Gradient clipping is added if cfg.SOLVER.CLIP_GRADIENTS.ENABLED is True.
        """
        params = get_default_optimizer_params(model)
        return maybe_add_gradient_clipping(cfg, torch.optim.AdamW)(
            params,
            lr=cfg.SOLVER.BASE_LR,
            weight_decay=cfg.SOLVER.WEIGHT_DECAY,
        )

Code modified from Panoptic-DeepLab, from this script.

Learning Rate Scheduler

One can add a learning rate scheduler by overriding the build_lr_scheduler method of the DefaultTrainer class. Another way is to set the related config arguments.

For example, to get something similar to the one cycle policy, with a linear warm-up followed by cosine annealing, you could use the following config arguments:

cfg.SOLVER.IMS_PER_BATCH = 4  
cfg.SOLVER.MAX_ITER = 4000
cfg.SOLVER.BASE_LR = 8e-4
cfg.SOLVER.LR_SCHEDULER_NAME = "WarmupCosineLR"
cfg.SOLVER.WARMUP_ITERS = int(0.2*cfg.SOLVER.MAX_ITER)

‘cfg.SOLVER.BASE_LR’ the upper bound of the learning rate: both the warm-up and the cosine decay are scaled by it
‘cfg.SOLVER.LR_SCHEDULER_NAME’ the scheduler to use, e.g. WarmupMultiStepLR (the default) or WarmupCosineLR (see here)
‘cfg.SOLVER.WARMUP_ITERS’ the number of warm-up iterations; here we warm up for 20% of the total iterations
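A plain-Python sketch of the resulting learning-rate curve helps check these numbers. It assumes the behavior of WarmupCosineLR in detectron2 v0.6 with the defaults WARMUP_METHOD = "linear", WARMUP_FACTOR = 0.001 and BASE_LR_END = 0.0. Under that behavior, the cosine is computed over all MAX_ITER iterations and the warm-up ramps linearly up to that curve. The function name is hypothetical.

```python
import math

def warmup_cosine_lr(it, base_lr, max_iter, warmup_iters,
                     warmup_factor=0.001, base_lr_end=0.0):
    """Learning rate at iteration `it`: linear warm-up, then cosine decay.

    Assumes the cosine is evaluated over the whole run (t = it / max_iter) and
    the warm-up ramps linearly from warmup_factor * base_lr to the cosine value
    reached at the end of the warm-up.
    """
    end = base_lr_end / base_lr

    def cosine(t):
        return end + 0.5 * (1.0 - end) * (1.0 + math.cos(math.pi * t))

    if it < warmup_iters:
        alpha = it / warmup_iters
        start = warmup_factor * cosine(0.0)
        mult = start + (cosine(warmup_iters / max_iter) - start) * alpha
    else:
        mult = cosine(it / max_iter)
    return base_lr * mult

# The settings used above: BASE_LR = 8e-4, MAX_ITER = 4000, WARMUP_ITERS = 800.
lrs = [warmup_cosine_lr(i, 8e-4, 4000, 800) for i in range(4000)]
peak = max(lrs)
print(lrs.index(peak), peak / 8e-4)  # peak at iteration 800, about 0.905 x BASE_LR
```

Under these assumptions, a 20% warm-up makes the learning rate peak at about 0.9 × BASE_LR rather than reaching BASE_LR exactly, because the cosine has already started to decay by the end of the warm-up.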

To sum up, we saw how to change some of the config arguments and dug into the config's argument list. We also showed how to use any optimizer and how to change the learning rate scheduler.

Please let me know if you have any thoughts, comments or suggestions!

Keep Learning!! 😁

The code used in this post: