Review — R²MRF: Defocus Blur Detection via Recurrently Refining Multi-Scale Residual Features (Blur Detection)
In this story, Defocus Blur Detection via Recurrently Refining Multi-Scale Residual Features (R²MRF), by China University of Geosciences, National University of Defense Technology, Zhejiang Normal University, Alibaba Group (U.S.) Inc, and University of Sydney, is reviewed. In this paper:
- A novel recurrent residual refinement branch embedded with multiple residual refinement modules (RRMs) is designed.
- The deep features from different layers are aggregated to learn the residual between the intermediate prediction and the ground truth for each recurrent step in each residual refinement branch.
- The side output of each branch is fused to obtain the final blur detection map.
This is a paper in 2020 AAAI. (Sik-Ho Tsang @ Medium)
- R²MRF: Network Architecture
- Residual Refinement Module (RRM)
- Defocus Map Fusing
- Experimental Results
- Ablation Analysis
1. R²MRF: Network Architecture
- A Residual Refinement Module (RRM) is designed to learn the residual between the intermediate prediction and the ground truth; a recurrent residual refinement branch is constructed for each layer by embedding multiple RRMs into it.
- Aggregated Multi-level Deep Features (AMDF) are generated by upsampling the feature maps of the last four layers to the size of the feature maps extracted from the first layer, concatenating them together, and applying a convolution operation to merge these feature maps and reduce the feature dimensions:
- By doing so, multi-level features are integrated to enhance the capability for separating defocus regions from in-focus regions.
- Multiple RRMs are embedded into each network branch in a recurrent manner for feature refining, and the generated AMDF is used to refine the residual learning process in each RRM.
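The AMDF construction described above (upsample the last four layers' features to the first layer's size, concatenate, then merge with a convolution) can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: nearest-neighbor upsampling and a random 1×1 projection stand in for the learned operations, and all names are hypothetical.

```python
import numpy as np

def upsample_nearest(feat, target_hw):
    """Nearest-neighbor upsample a (C, H, W) feature map to target (H, W).
    Assumes the target size is an integer multiple of the source size."""
    c, h, w = feat.shape
    th, tw = target_hw
    feat = np.repeat(feat, th // h, axis=1)
    feat = np.repeat(feat, tw // w, axis=2)
    return feat

def amdf(features, out_channels, rng=np.random.default_rng(0)):
    """Aggregated Multi-level Deep Features (sketch): upsample the last four
    layers' feature maps to the first layer's spatial size, concatenate them
    along channels, and merge with a 1x1 convolution (here a random projection
    standing in for the learned conv) to reduce the feature dimensions."""
    th, tw = features[0].shape[1:]
    upsampled = [features[0]] + [upsample_nearest(f, (th, tw)) for f in features[1:]]
    concat = np.concatenate(upsampled, axis=0)                     # (sum C_l, H1, W1)
    weight = rng.standard_normal((out_channels, concat.shape[0]))  # 1x1 conv kernel
    return np.einsum("oc,chw->ohw", weight, concat)
```

For example, five feature maps of spatial sizes 16, 8, 4, 2, and 1 would all be merged into a single (out_channels, 16, 16) tensor.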
2. Residual Refinement Module (RRM)
- For each side output branch at the l-th layer, the residual map at the t-th recurrent step, R_l^t, can be calculated as:
- Then the output of the current recurrent step at the l-th layer is obtained by adding R_l^t to O_l^(t−1) in an element-wise manner, which is computed as:
- To further exploit image features at different scales at a fine-grained level, a dual-path network is constructed in each RRM, with the two paths using convolutional kernels of different sizes.
- In such a manner, information can be shared between the two paths, enabling the detection of image features at different scales.
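The recurrent update above, O_l^t = O_l^(t−1) + R_l^t, can be sketched in a few lines. This is a hedged numpy sketch: the residual predictor below is a random projection of the previous output stacked with the AMDF, standing in for the paper's learned dual-path convolutions, and all names are assumptions.

```python
import numpy as np

def residual_refine(prev_out, amdf_feat, rng):
    """One RRM step (sketch): predict the residual R_l^t between the
    intermediate prediction and the ground truth from the previous output
    O_l^(t-1) and the AMDF. A random projection stands in for the learned
    dual-path convolutions of the paper."""
    stacked = np.concatenate([prev_out[None], amdf_feat], axis=0)
    weight = rng.standard_normal(stacked.shape[0]) / stacked.shape[0]
    return np.einsum("c,chw->hw", weight, stacked)

def refine_branch(init_out, amdf_feat, steps=6, seed=0):
    """Recurrent residual refinement branch: O_l^t = O_l^(t-1) + R_l^t,
    repeated for the given number of recurrent steps (6 in the paper)."""
    rng = np.random.default_rng(seed)
    out = init_out
    for _ in range(steps):
        out = out + residual_refine(out, amdf_feat, rng)  # element-wise addition
    return out
```

The additive form means each step only has to model the remaining error of the previous prediction, rather than re-predicting the whole map.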
3. Defocus Map Fusing
3.1. Final Defocus Map
- The side output results predicted from the 5 different recurrent branches are first concatenated, then a convolution layer with a ReLU activation is imposed on the concatenated maps to obtain the final output defocus blur map:
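The fusion step above can be sketched as follows — a minimal numpy version in which a random weight vector stands in for the learned convolution layer, and the function name is hypothetical.

```python
import numpy as np

def fuse_side_outputs(side_outputs, rng=np.random.default_rng(0)):
    """Fuse the side outputs of the 5 recurrent branches (sketch):
    concatenate the maps along the channel axis, then apply a 1x1
    convolution (random weights here, standing in for the learned layer)
    followed by a ReLU activation."""
    concat = np.stack(side_outputs, axis=0)         # (5, H, W)
    weight = rng.standard_normal(concat.shape[0])
    fused = np.einsum("c,chw->hw", weight, concat)  # 1x1 conv over channels
    return np.maximum(fused, 0.0)                   # ReLU
```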
3.2. Model Training and Testing
- A ResNeXt network pretrained on ImageNet is used to initialize the parameters. There are thus five feature extraction layers: conv1, conv2_x, conv3_x, conv4_x, and conv5_x.
- The cross-entropy loss is used for each intermediate output of our network during the training process.
- The final loss function is defined as the loss summation of all intermediate predictions:
- where L_f is the loss of the final fusion layer.
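The loss summation described above can be sketched as follows, assuming the cross-entropy is a per-pixel binary cross-entropy averaged over the map (a reasonable reading for binary blur masks; the exact normalization in the paper may differ).

```python
import numpy as np

def bce(pred, gt, eps=1e-7):
    """Per-pixel binary cross-entropy between a predicted defocus map
    (values in (0, 1)) and a binary ground-truth mask, averaged over pixels."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(gt * np.log(pred) + (1 - gt) * np.log(1 - pred)))

def total_loss(side_preds, fused_pred, gt):
    """Final loss: the summation of the cross-entropy losses of all
    intermediate side outputs plus the loss L_f of the final fusion layer."""
    return sum(bce(p, gt) for p in side_preds) + bce(fused_pred, gt)
```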
- R²MRF is fine-tuned on part of Shi’s dataset.
- Data augmentation is performed by randomly rotating, resizing and horizontally flipping. The training set is enlarged to 9,664 images.
- The training process is completed after approximately 0.75 hours.
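The augmentation pipeline listed above (random rotation, resizing, and horizontal flipping) can be sketched as below. This is a simplified numpy stand-in, restricted to 90-degree rotations and integer-factor nearest-neighbor resizing for brevity; the paper's exact augmentation parameters are not specified here.

```python
import numpy as np

def augment(image, rng=np.random.default_rng(0)):
    """Data-augmentation sketch: randomly rotate (multiples of 90 degrees,
    for simplicity), horizontally flip, and resize (nearest-neighbor, by an
    integer factor) an image, mirroring the augmentations used for training."""
    out = np.rot90(image, int(rng.integers(0, 4)))  # random rotation
    if rng.random() < 0.5:
        out = out[:, ::-1]                          # horizontal flip
    factor = int(rng.integers(1, 3))                # resize by 1x or 2x
    return np.repeat(np.repeat(out, factor, axis=0), factor, axis=1)
```

In practice the same transform must be applied to the image and its ground-truth mask so the pixel-level labels stay aligned.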
4. Experimental Results
4.1. Quantitative Comparison
- R²MRF consistently performs favourably against other methods, such as DHDE, BTBNet, and DeFusionNet (CVPR’19), on the two datasets.
- It can be observed that R²MRF also consistently outperforms other counterparts.
4.2. Running Efficiency Comparison
- R²MRF is faster than all of the other methods.
5. Ablation Analysis
5.1. Effectiveness of RRM
- R²MRF_no_RRM: All of the RRMs are removed in R²MRF and the intermediate side outputs are directly refined without residual learning.
- As shown in the above table, R²MRF with RRMs performs significantly better than R²MRF_no_RRM.
- In addition, R²MRF_no_RRM also performs better than other previous methods, which also validates the efficacy of the AMDF for feature refining.
- As shown in the above figure, residual learning eases the optimization process, converging faster at early stages and reducing the training error compared with directly refining the intermediate side outputs.
5.2. Effectiveness of the Times of Recurrent Steps
- As can be seen, more recurrent steps yield better results. In addition, it should be noted that R²MRF obtains relatively stable results after 6 recurrent steps.
- The number of recurrent steps is empirically set to 6 in the experiments as a tradeoff between effectiveness and efficiency.
5.3. Effectiveness of the Final Side Outputs Fusion
- The final outputs of all the recurrent branches are represented as R²MRF_O1, R²MRF_O2, R²MRF_O3, R²MRF_O4, and R²MRF_O5.
- It can be observed that the fusing mechanism effectively improves the final results.
5.4. Effectiveness of Different Backbone Network Architectures
Blur Detection / Defocus Map Estimation
2017 [Park CVPR’17 / DHCF / DHDE] 2018 [Purohit ICIP’18] [BDNet (JENUCOM’18)] [DBM] [Kim JCGF’18] [BTBNet] 2019 [Khajuria ICIIP’19] [Zeng TIP’19] [PM-Net] [CENet] [DMENet] [DeFusionNet (CVPR’19)] 2020 [BTBCRL (BTBNet + CRLNet)] [DeFusionNET (TPAMI’20)] [BDNet (ACCESS’20)] [MsFEN+MsBEN] [E-Net+B-Net] [BR²Net] [DPN] [R²MRF]