ROI-pooling and ROI-align

Mar 30, 2017

Fast-rcnn: https://arxiv.org/pdf/1504.08083.pdf

Mask-rcnn: https://arxiv.org/pdf/1703.06870.pdf

1. ROI-pooling:

Insights:

To speed up training and testing, Fast-rcnn uses ROI pooling so that a single forward/backward pass over an input image serves all of the ROIs in that image.

Each bounding box from the input image is projected onto an area (ROI) of the last convolutional feature map. The challenge is that each ROI has a different size. An ROI is represented by a four-tuple (r, c, h, w): its top-left corner (r, c), its height h and its width w.

The ROI pooling layer divides the h x w ROI into an H x W grid of sub-windows, then max-pools each sub-window to produce an H x W map that represents the ROI. (Each sub-window, i.e. the max-pool kernel, is approximately h/H x w/W.) Because every ROI ends up with the same H x W size, Fast-rcnn can forward the image just once for ROIs of different scales.
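The division into sub-windows can be sketched in a few lines of plain Python. This is only an illustration for a single-channel feature map, not the Fast-rcnn implementation: the function name roi_max_pool and the floor/ceil rounding of the bin edges are assumptions made here (other rounding conventions exist).

```python
import math

def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool one ROI of a 2-D feature map into an out_h x out_w grid.

    feature: list of rows (one channel); roi: (r, c, h, w) in whole cells.
    Bin edges are snapped to whole cells with floor/ceil (an assumed
    convention for this sketch).
    """
    r, c, h, w = roi
    pooled = []
    for i in range(out_h):
        # Row range of bin i, quantized to integer cell indices.
        r0 = r + math.floor(i * h / out_h)
        r1 = r + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            # Column range of bin j, quantized the same way.
            c0 = c + math.floor(j * w / out_w)
            c1 = c + math.ceil((j + 1) * w / out_w)
            row.append(max(feature[y][x]
                           for y in range(r0, r1)
                           for x in range(c0, c1)))
        pooled.append(row)
    return pooled

# Toy 8 x 8 feature map whose value at (y, x) is 8*y + x.
feature = [[y * 8 + x for x in range(8)] for y in range(8)]
print(roi_max_pool(feature, (1, 2, 5, 4), 2, 2))  # [[27, 29], [43, 45]]
```

A 5 x 4 ROI and, say, a 3 x 6 ROI both come out as 2 x 2 maps, which is what lets the layers after the pooling work on a fixed-size input.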

Downside:

From an engineering point of view, reducing a high-resolution (h x w) region to a much smaller H x W map requires quantization: the ROI and sub-window boundaries are rounded to whole feature-map cells, so the pooled result is misaligned with the original box at the boundaries.
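A small numeric sketch makes the rounding error visible. The helper names bin_edges and quantized_edges are made up for this example, and floor is assumed as the rounding operation.

```python
import math

def bin_edges(length, bins):
    """Exact (real-valued) edges of `bins` equal sub-windows over `length` cells."""
    return [k * length / bins for k in range(bins + 1)]

def quantized_edges(length, bins):
    """The same edges snapped down to whole cells with floor (assumed rounding)."""
    return [math.floor(e) for e in bin_edges(length, bins)]

exact = bin_edges(7, 3)          # edges at 0, 7/3, 14/3 and 7
snapped = quantized_edges(7, 3)  # edges at 0, 2, 4 and 7
# How far each boundary moved, in feature-map cells.
shift = [e - s for e, s in zip(exact, snapped)]
print(exact, snapped, shift)
```

Here the two inner boundaries move by a third and two thirds of a cell. Since one feature-map cell covers many input pixels, even a sub-cell shift corresponds to a visible offset in the image.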

2. ROI-align:

ROI-align does not round the ROI or bin boundaries. In each ROI bin, the values at four regularly sampled locations are computed directly by bilinear interpolation, which avoids the misalignment problem.
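As a rough sketch of the idea for a single-channel feature map: the function names bilinear and roi_align are chosen for this example, the four samples are placed at 1/4 and 3/4 of each bin, and they are averaged. Treating feature[y][x] as the value at the point (y, x) and using an average rather than a max are both assumptions of this sketch.

```python
def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a real-valued point (y, x)."""
    rows, cols = len(feature), len(feature[0])
    # Clamp so the four neighbours always lie inside the map.
    y = min(max(y, 0.0), rows - 1.0)
    x = min(max(x, 0.0), cols - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, rows - 1), min(x0 + 1, cols - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_h, out_w):
    """Average four bilinear samples per bin; roi = (r, c, h, w) may be fractional."""
    r, c, h, w = roi
    bin_h, bin_w = h / out_h, w / out_w  # real-valued bin size, no rounding
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Four regularly spaced points at 1/4 and 3/4 of the bin.
            samples = [bilinear(feature,
                                r + (i + fy) * bin_h,
                                c + (j + fx) * bin_w)
                       for fy in (0.25, 0.75) for fx in (0.25, 0.75)]
            row.append(sum(samples) / 4.0)
        out.append(row)
    return out

# Toy 8 x 8 feature map whose value at (y, x) is 8*y + x.
feature = [[y * 8 + x for x in range(8)] for y in range(8)]
print(roi_align(feature, (1.5, 2.25, 3.0, 2.0), 2, 2))
```

Because the ROI corner and bin sizes stay fractional throughout, shifting the box by a fraction of a cell changes the output smoothly instead of jumping when a rounded boundary crosses a cell.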

ROI-align is reported to improve AP by about 3 points over ROI-pooling, for models trained on trainval35k.


Written by Sheng Hu