CRAFT (Object detection)

Tanmay Thaker
Published in Nerd For Tech
2 min read · Aug 10, 2021

CRAFT stands for Cascade Region-proposal-network And FasT R-CNN. It was proposed by researchers at the Chinese Academy of Sciences and Tsinghua University. In Faster R-CNN, a region proposal network (RPN) generates proposals. After ROI pooling, these proposals pass through the network for classification.

However, the CRAFT authors found a core problem in Faster R-CNN:

• In proposal generation, there is still a large proportion of background regions. The existence of many background samples causes many false positives.

In CRAFT, as shown above, another CNN (convolutional neural network) is added after the RPN to produce fewer proposals (300 here). Classification is then performed on these 300 proposals, yielding about 20 primitive detection results. Each primitive result is then refined using one-vs-rest classification.

Cascade Proposal Generation

Baseline RPN

  • An ideal proposal generator should generate as few proposals as possible while covering almost all object instances. Because of the resolution loss caused by CNN pooling operations and the fixed aspect ratios of the sliding window, the RPN is weak at covering objects with extreme shapes or scales.
  • The results above are for a baseline RPN based on VGG_M, trained on PASCAL VOC 2007 train+val and evaluated on the test set.
  • The recall rate varies a lot across object categories. Objects with extreme aspect ratios or scales, such as boats and bottles, are hard to detect.
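To make the notion of recall concrete, here is a minimal sketch of how proposal recall could be measured for one category. It assumes boxes in `(x1, y1, x2, y2)` corner format and a 0.5 IoU threshold for counting an object as covered; neither is stated in the article, and the function names `iou` and `proposal_recall` are hypothetical.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def proposal_recall(gt_boxes: List[Box], proposals: List[Box],
                    thresh: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by at least one proposal.

    The 0.5 default threshold is an assumption, not a value from the article.
    """
    if not gt_boxes:
        return 0.0
    covered = sum(
        1 for gt in gt_boxes
        if any(iou(gt, p) >= thresh for p in proposals)
    )
    return covered / len(gt_boxes)
```

With a roughly square proposal, a square object is covered while a tall, thin one (a bottle-like shape) is not, which is the kind of gap the per-category recall numbers expose.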

Proposed Cascade Structure

  • An additional classification network comes after RPN.
  • The additional network is a 2-class detection network, denoted as the FRCN net in the above figure. It uses the output of the RPN as training data.
  • After the RPN net is trained, the 2000 primitive proposals of each training image are used as training data for the FRCN net.
  • During training, positive and negative samples are selected by IoU: above 0.7 IoU for positives and below 0.3 IoU for negatives.
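The IoU-based sampling can be sketched as follows. This is an illustration only, assuming that a proposal whose best overlap with any ground-truth box is at least 0.7 counts as a positive and one below 0.3 as a negative. Marking the proposals in between as ignored (`-1`), and treating exactly 0.7 as positive, are assumptions; the function name `label_proposals` is hypothetical.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def label_proposals(proposals: List[Box], gt_boxes: List[Box],
                    pos_thresh: float = 0.7,
                    neg_thresh: float = 0.3) -> List[int]:
    """Return 1 (object), 0 (background), or -1 (ignored) per proposal."""
    labels = []
    for p in proposals:
        # Best overlap with any ground-truth box; 0.0 if the image has none.
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # assumption: ambiguous overlaps are skipped
    return labels
```

Because the FRCN net here is a 2-class network, the labels only distinguish object from background, not the object category.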

There are 2 advantages:

1) First, the additional FRCN net further improves the quality of the object proposals and removes more background regions, so the proposals fit the task requirements better.

2) Second, proposals from multiple sources can be merged as the input of the FRCN net so that complementary information can be used.
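As a rough sketch of the second point, proposals from several sources can be pooled, re-scored by a second-stage scorer, and cut down to a fixed budget. Everything named here is hypothetical: `cascade_filter` is not from the paper's code, `score_fn` merely stands in for the FRCN net's object-vs-background score, and exact-duplicate removal is a simplification chosen for this example. The default of 300 follows the proposal count mentioned earlier in the article.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def cascade_filter(sources: Sequence[List[Box]],
                   score_fn: Callable[[Box], float],
                   keep: int = 300) -> List[Box]:
    """Merge proposals from several sources and keep the top-scoring ones.

    score_fn is a placeholder for the second-stage (FRCN) objectness score.
    """
    merged: List[Box] = []
    seen = set()
    for proposals in sources:
        for box in proposals:
            key = tuple(box)
            if key not in seen:  # drop exact duplicates across sources
                seen.add(key)
                merged.append(key)
    # Highest score first; sorted() is stable, so ties keep merge order.
    ranked = sorted(merged, key=score_fn, reverse=True)
    return ranked[:keep]


# Usage with a toy scorer (box area) in place of a trained network.
rpn_boxes = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 4.0, 4.0)]
other_boxes = [(0.0, 0.0, 4.0, 4.0), (5.0, 5.0, 25.0, 25.0)]
top = cascade_filter(
    [rpn_boxes, other_boxes],
    score_fn=lambda b: (b[2] - b[0]) * (b[3] - b[1]),
    keep=2,
)
```

In a real pipeline the scorer would be the trained 2-class network, and near-duplicate boxes would typically need more than exact matching to be suppressed.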
