Anchor Boxes in Faster-RCNN

Joydeep Medhi
Sep 25, 2018


Please visit my GitHub repo for more information!


Introduction

Faster-RCNN is one of the state-of-the-art object detection algorithms around.

If you are not familiar with Faster-RCNN, please go through this blog first.

Here is the link to the original paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

When we train Faster RCNN on custom datasets, we often get confused over how to choose hyper-parameters for the network. Anchor boxes (one of those hyper-parameters) are very important for detecting objects at different scales and aspect ratios, and we get improved detection results if we get the anchors right.

The training setup and hyper-parameters here follow the Tensorflow Object Detection API.

Faster-RCNN config file

faster_rcnn {
  # other hyper-parameters

  first_stage_anchor_generator {
    grid_anchor_generator {
      height: 256
      width: 256
      height_stride: 16
      width_stride: 16
      scales: 0.9
      scales: 1.14
      scales: 1.53
      aspect_ratios: 0.8
      aspect_ratios: 1.15
      aspect_ratios: 2.77
    }
  }
}

height & width

This is the base anchor size (i.e. for scale 1 and aspect ratio 1, the base anchor is 256 x 256).

height_stride & width_stride

This is the stride of the anchor centres. Generally, we want to visit each point of the feature map (the final convolutional layer) and create a set of anchors there, so the stride should equal the subsampling ratio of the network. In the case of VGG16 this ratio is 16; different network architectures have different subsampling ratios. Select this stride as per the base model or use case.
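A minimal sketch of this tiling (using numpy; the image size and variable names are mine, not the API's):

import numpy as np

# Hypothetical 640 x 480 image with the VGG16 stride of 16.
height_stride, width_stride = 16, 16
image_height, image_width = 480, 640

# One anchor centre per cell of the final feature map (30 x 40 here).
centre_y = np.arange(0, image_height, height_stride)
centre_x = np.arange(0, image_width, width_stride)
centres = np.stack(np.meshgrid(centre_x, centre_y), axis=-1).reshape(-1, 2)

print(centres.shape)  # (1200, 2) -> 30 * 40 anchor centres

Each centre then receives one anchor per (scale, aspect_ratio) pair, i.e. 9 anchors per centre with the config above.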

scales & aspect_ratios

The aspect ratio of an anchor box is simply width/height. The scale says how much bigger the anchor box is than the base box (e.g. a 512 x 512 box has scale 2 relative to a 256 x 256 base).

For aspect_ratio = ar and base_anchor = 256 x 256,
an anchor box of dimensions width_b x height_b is given by:
width_b  = scale * sqrt(ar) * base_anchor[0]
height_b = scale * base_anchor[1] / sqrt(ar)
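
Plugging the config values above into these formulas gives the actual anchor dimensions; here is a short Python sketch (the variable names are illustrative):

from math import sqrt

base_width, base_height = 256, 256
scales = [0.9, 1.14, 1.53]
aspect_ratios = [0.8, 1.15, 2.77]

# One anchor box per (scale, aspect_ratio) pair; note that
# width_b / height_b == ar and the box area scales with scale**2.
for scale in scales:
    for ar in aspect_ratios:
        width_b = scale * sqrt(ar) * base_width
        height_b = scale * base_height / sqrt(ar)
        print(f"scale {scale}, ar {ar}: {width_b:.0f} x {height_b:.0f}")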

Analysis of bounding boxes (Training data)

1. Convert the XML files to a .csv file.

xml_to_csv.py (modify this script as per your XML format; a minimal parsing sketch follows this list)

2. Open the EDA_of_bbox.ipynb Jupyter notebook for the analysis.
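
A minimal sketch of step 1, assuming Pascal VOC-style XML tags (xmin/ymin/xmax/ymax); this is not the repo's exact script, so adapt the tag names and paths to your own format:

import csv
import glob
import xml.etree.ElementTree as ET

rows = []
for xml_file in glob.glob("annotations/*.xml"):  # hypothetical folder
    root = ET.parse(xml_file).getroot()
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    # One CSV row per annotated object in the image.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([root.find("filename").text, width, height,
                     obj.find("name").text,
                     int(box.find("xmin").text), int(box.find("ymin").text),
                     int(box.find("xmax").text), int(box.find("ymax").text)])

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "width", "height", "class",
                     "xmin", "ymin", "xmax", "ymax"])
    writer.writerows(rows)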

In the notebook, we first resize the image dimensions with the _compute_new_static_size() function, then normalize the bounding-box heights and widths according to the new image dimensions.
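
The actual _compute_new_static_size() lives in the Object Detection API's preprocessor; here is a rough sketch of its keep-aspect-ratio logic, assuming the Faster R-CNN defaults of min_dimension=600 and max_dimension=1024 (the function name below is mine):

def compute_new_size(height, width, min_dimension=600, max_dimension=1024):
    # Scale so the shorter side becomes min_dimension...
    scale = min_dimension / min(height, width)
    # ...unless that would push the longer side past max_dimension.
    if max(height, width) * scale > max_dimension:
        scale = max_dimension / max(height, width)
    return int(round(height * scale)), int(round(width * scale))

new_height, new_width = compute_new_size(480, 640)  # -> (600, 800)
# Boxes are then normalized against the resized image,
# e.g. box_width / new_width and box_height / new_height.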

Then we find optimal clusters and cluster centres using K-Means. This is inspired by YOLO.
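
A minimal clustering sketch with scikit-learn (plain Euclidean K-Means on the normalized widths and heights; YOLO's variant swaps in an IoU-based distance):

import numpy as np
from sklearn.cluster import KMeans

# boxes: (N, 2) array of normalized (width, height) pairs from the CSV;
# random placeholder data here stands in for the real training boxes.
boxes = np.random.rand(500, 2)

# 9 clusters to match the 3 scales x 3 aspect ratios in the config above.
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(boxes)
for w, h in kmeans.cluster_centers_:
    print(f"centre: w={w:.3f}, h={h:.3f}, aspect_ratio={w / h:.2f}")

The resulting centres can then be mapped back to the scales and aspect_ratios fields of the config.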

Distribution of bounding boxes (scatter & KDE plot)

References

  1. KMeans in YOLO
  2. Cards Dataset (Reference)
  3. Advantage & Disadvantage of KMeans
  4. Different Clustering Algorithms
