Understanding Faster R-CNN Configuration Parameters

Lavanya P
4 min read · Sep 19, 2020


When I first started building a Faster R-CNN model with the TensorFlow Object Detection API, I couldn't find the parameter definitions in one place; I had to search the web, Stack Overflow, and blogs for each parameter. I've collated those findings in this post so that it's easy to get a quick idea of what these parameters mean and how they are used. It is difficult to tune the model without knowing what each parameter does.

The TensorFlow Object Detection API comes with sample config files (models/research/object_detection/samples/configs/) for models trained on various datasets. To tune a model, we need to understand what each parameter in the config file means.

Below are the details of Faster R-CNN model parameters in the TensorFlow Object Detection API config file.

num_classes: Number of object classes to be detected.

image_resizer: Image dimensions are important for object detection. There are two types of resizer: fixed_shape_resizer and keep_aspect_ratio_resizer.

fixed_shape_resizer: Resizes every image to a fixed height and width. Because the target shape is fixed, the aspect ratio is not preserved, so images can be skewed or warped.

keep_aspect_ratio_resizer: Resizes the input image, keeping its aspect ratio, to satisfy the minimum and maximum size constraints. With pad_to_max_dimension set, it pads the minor dimension instead of skewing or warping, which greatly improves stability in the face of natural web images.
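For reference, here is a minimal sketch of how each resizer appears in a Faster R-CNN config (only one image_resizer block is used in a real config; the dimension values below are illustrative, not prescriptive):

  # Option 1: resize every image to a fixed shape (aspect ratio not preserved)
  image_resizer {
    fixed_shape_resizer {
      height: 600
      width: 600
    }
  }

  # Option 2: resize within min/max bounds, preserving aspect ratio
  image_resizer {
    keep_aspect_ratio_resizer {
      min_dimension: 600
      max_dimension: 1024
      pad_to_max_dimension: true  # pads the minor dimension instead of warping
    }
  }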

feature_extractor: A FasterRCNNFeatureExtractor object that defines the backbone network. Common feature extractors include VGG16, ResNet, Inception, and MobileNet.

transfer_learning_dataset: Name of the dataset used for transfer learning (e.g., KITTI/COCO).

type: Name of the feature extractor, e.g., faster_rcnn_resnet50, faster_rcnn_resnet101, faster_rcnn_inception_resnet_v2_atrous, etc.

first_stage_features_stride: The output stride of the first-stage feature extractor, typically 16 (or 8 for a finer feature map).
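As a sketch, the feature extractor block in the sample configs looks like this (the type string must be one of the names predefined in the API, such as the one below):

  feature_extractor {
    type: "faster_rcnn_resnet101"    # backbone network
    first_stage_features_stride: 16  # output stride of the RPN feature map
  }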

first_stage_anchor_generator: An anchor_generator.AnchorGenerator object that generates the anchors used by the Region Proposal Network (RPN). The parameters below control the anchors it produces.

scales: Anchor scales relative to the base anchor size. Increase or decrease them depending on object size; for very small objects, try reducing the scales.

aspect_ratios: Aspect ratios of the anchor boxes. Anchors are defined to capture the scale and aspect ratio of the specific object classes you want to detect, and are typically chosen based on object sizes in the training dataset.

height_stride / width_stride: The stride here is a nominal value. We get a feature map in a CNN after several convolution and max-pooling operations; if the input image is 224 * 224 and the final feature map is 7 * 7, the stride is 224 / 7 = 32 (an approximation of how much the image was downsampled). height_stride and width_stride set the vertical and horizontal spacing, in pixels, between anchor centers, and usually match first_stage_features_stride.
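Putting scales, aspect ratios, and strides together, a sketch of the anchor generator block (the values mirror common sample-config defaults; shrink the scales for very small objects):

  first_stage_anchor_generator {
    grid_anchor_generator {
      scales: [0.25, 0.5, 1.0, 2.0]   # reduce for very small objects
      aspect_ratios: [0.5, 1.0, 2.0]  # tall, square, and wide anchors
      height_stride: 16               # anchor spacing in pixels, usually the feature stride
      width_stride: 16
    }
  }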

first_stage_nms_score_threshold: Score threshold for non-max suppression (NMS) for the Region Proposal Network (RPN). This value is expected to be in [0, 1], as it is applied directly after a softmax transformation. The recommended value for Faster R-CNN is 0.0.

first_stage_nms_iou_threshold: The Intersection over Union (IoU) threshold for performing non-max suppression (NMS) on the boxes predicted by the Region Proposal Network (RPN).

first_stage_max_proposals: Faster R-CNN has two networks: the first proposes regions where objects may be found, and the second tries to detect objects in those regions. Increasing the number of proposals from the first network increases accuracy but implies more computational work, because the second network has to search more potential areas.
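A sketch of these three first-stage settings, with values commonly seen in the sample configs:

  first_stage_nms_score_threshold: 0.0  # recommended 0.0 for Faster R-CNN
  first_stage_nms_iou_threshold: 0.7    # lower it to prune overlapping proposals more aggressively
  first_stage_max_proposals: 300        # fewer proposals = less compute, possibly lower accuracy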

initial_crop_size: A single integer indicating the output size (width and height are set to be the same) of the initial bilinear-interpolation-based cropping during ROI pooling.

maxpool_kernel_size: A single integer indicating the kernel size of the max pool op on the cropped feature map during ROI pooling.

maxpool_stride: A single integer indicating the stride of the max pool op on the cropped feature map during ROI pooling.
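Together, these three parameters implement ROI pooling. With the sample-config values below, each proposal's features are cropped to 14 * 14 and then max-pooled down to 7 * 7:

  initial_crop_size: 14    # bilinear crop of each proposal to 14 x 14
  maxpool_kernel_size: 2   # 2 x 2 max pool ...
  maxpool_stride: 2        # ... with stride 2 gives a 7 x 7 output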

use_dropout: Whether to use dropout in the box predictor. Dropout is a regularization technique used to avoid the risk of overfitting.

dropout_keep_probability: The probability of keeping a unit active when dropout is applied; a value between 0 and 1.
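In the sample configs, both dropout parameters live inside the second-stage box predictor; a minimal sketch (values illustrative):

  second_stage_box_predictor {
    mask_rcnn_box_predictor {
      use_dropout: true               # enable dropout in the box classifier head
      dropout_keep_probability: 0.5   # keep half the units during training
    }
  }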

batch_size: It defines the number of work elements in your batch. TensorFlow requires a fixed number and doesn't take GPU memory or data size into consideration. This number is highly dependent on your GPU hardware and image dimensions, and a large batch isn't strictly necessary for quality results. (It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize.)

optimizer: It defines how your weights get updated by backpropagation (see the optimizer proto linked below).

schedule: Learning rate scheduler: manual step, cosine decay, or exponential decay learning rate.

gradient_clipping_by_norm: Clips gradients by norm, which is necessary to avoid exploding gradients. We set the value to 10 through experimentation, but it can be adjusted.

num_steps: The number of steps is not the number of epochs. The number of training steps depends on the learning rate setting and the batch size, and is usually calculated as (train data size / batch size) * epochs.
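Tying batch_size, optimizer, gradient_clipping_by_norm, and num_steps together, here is a hedged train_config sketch. All numbers are illustrative; for example, 10,000 training images with batch size 1 for 20 epochs gives (10000 / 1) * 20 = 200,000 steps:

  train_config {
    batch_size: 1                      # constrained by GPU memory and image size
    optimizer {
      momentum_optimizer {
        learning_rate {
          manual_step_learning_rate {  # cosine/exponential schedules also exist
            initial_learning_rate: 0.0003
            schedule {
              step: 150000
              learning_rate: 0.00003
            }
          }
        }
        momentum_optimizer_value: 0.9
      }
      use_moving_average: false
    }
    gradient_clipping_by_norm: 10.0    # caps the gradient norm to avoid exploding gradients
    num_steps: 200000                  # (train data size / batch size) * epochs
  }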

from_detection_checkpoint: Set this to true. It is not described in any of the documentation, but it is required for your pre-trained object detection checkpoint to load correctly. If you use a pure classification checkpoint, leave it as false.

data_augmentation_options: Setting some augmentation options can dramatically increase the effective size of our dataset while improving the robustness of our detector. We can also use the AutoAugment option (autoaugment_image in the TF OD API preprocessor, linked below), which applies a set of strong data augmentations.
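These last two settings also belong in train_config; a minimal sketch (the checkpoint path is a placeholder, and autoaugment_image availability depends on your TF OD API version):

  train_config {
    fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"  # placeholder path
    from_detection_checkpoint: true  # true: detection checkpoint; false: classification checkpoint
    data_augmentation_options {
      random_horizontal_flip {
      }
    }
    data_augmentation_options {
      autoaugment_image {
        policy_name: "v0"  # learned augmentation policy from preprocessor.proto
      }
    }
  }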

Reference Links:

  1. https://www.tensorflow.org/api_docs/python/tf/image/ResizeMethod
  2. https://github.com/tensorflow/models/blob/master/research/object_detection/protos/optimizer.proto
  3. https://github.com/tensorflow/models/blob/master/research/object_detection/core/preprocessor.py
