Review — DPN: Blur Detection via Deep Pyramid Network with Recurrent Distinction Enhanced Modules (Blur Detection)
In this story, Blur Detection via Deep Pyramid Network with Recurrent Distinction Enhanced Modules, DPN, by Shenzhen University, Shenzhen Polytechnic, and Southern Illinois University, is reviewed. In this paper:
- A Deep Pyramid Network (DPN) with recurrent Distinction Enhanced Block modules is designed.
- A new Distinction Enhanced Block (DEB) is introduced to merge the high-level semantic information with the low-level details effectively.
- A new blur detection dataset (SZU-BD) is constructed.
This is a paper in 2020 JNEUCOM. (Sik-Ho Tsang @ Medium)
- DPN: Network Architecture
- Distinction Enhanced Block (DEB)
- Loss Function
- Datasets
- Ablation Study
- Experimental Results
1. DPN: Network Architecture
- The feature maps in lower layers keep the detailed information in the local areas, while the ones in higher layers are used to capture the semantic information of the entire image.
- VGG-16 is used as the pre-trained backbone.
- The five convolutional stages of VGG-16 are named Conv1 to Conv5, and their output feature maps are denoted F1, F2, …, F5.
- Each pair of adjacent feature maps, the lower-layer Fi and the higher-layer Fi+1, is fed into a Distinction Enhanced Block (DEB) to learn a new distinction-enhanced contextual feature ^Fi.
- The contextual feature map ^Fi at each layer is recurrently refined to generate the final blur detection results.
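The pairing of adjacent levels can be made concrete with some shape bookkeeping. This is a sketch under assumptions that the review does not state: a 352×352 input (the training size used for CUHK), the standard VGG-16 channel widths, each Fi taken before its stage's max-pooling, and a top-down order in which Fi is fused with the already refined ^Fi+1 (with ^F5 taken to be F5). Under these assumptions each higher level has exactly half the resolution of the level below it, which is why a 2× up-sampling aligns the pair. The names `vgg16_feature_shapes` and `deb_pairs` are hypothetical.

```python
# Shape bookkeeping for the DPN pyramid (a sketch under stated assumptions):
# - input is 352x352 with the standard VGG-16 channel widths,
# - Fi is the output of stage Conv_i taken BEFORE that stage's max-pooling,
# - fusion runs top-down, pairing Fi with the already refined ^F(i+1) (assumed),
#   and ^F5 is taken to be F5 itself.
VGG16_CHANNELS = [64, 128, 256, 512, 512]


def vgg16_feature_shapes(size=352):
    """Return (channels, height, width) for F1..F5."""
    shapes = []
    for i, channels in enumerate(VGG16_CHANNELS):
        side = size // (2 ** i)  # each earlier stage ends with a 2x2 max-pool
        shapes.append((channels, side, side))
    return shapes


def deb_pairs(num_levels=5):
    """Top-down (lower, higher) level pairs fed to the DEB: (4, 5), (3, 4), ..."""
    return [(i, i + 1) for i in range(num_levels - 1, 0, -1)]


shapes = vgg16_feature_shapes()
for level, shape in enumerate(shapes, start=1):
    print(f"F{level}: {shape}")
for low, high in deb_pairs():
    # the higher level has half the resolution, so a 2x up-sample aligns the pair
    assert shapes[high - 1][1] * 2 == shapes[low - 1][1]
    print(f"^F{low} = DEB(F{low}, ^F{high})")
```

Running it lists F1: (64, 352, 352) down to F5: (512, 22, 22), followed by the four fusion steps from ^F4 to ^F1.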
2. Distinction Enhanced Block (DEB)
- (a): One common way to fuse adjacent feature maps is element-wise addition, as in FPN.
- (b): Another common way is concatenation followed by a convolution, as in U-Net.
- (c): The proposed Distinction Enhanced Block (DEB) merges the high-level semantic information with the low-level details effectively, and at the same time enhances the distinction between clear and blurred regions.
- A 3×3 convolution is applied to the lower-layer feature map Fi, and its output is denoted F’i.
- The higher-layer feature map Fi+1 is up-sampled by a factor of 2, and the result is denoted F’i+1.
- F’i and F’i+1 are then concatenated, and a 1×1 convolution reduces the number of channels; the result is denoted Cat(F’i, F’i+1).
- In parallel, F’i is fed into a Distinction Block.
- The output of this block, denoted D(F’i), captures the distinctive information in the feature map.
- The distinction map D(F’i) is added element-wise to Cat(F’i, F’i+1) to obtain the final feature map ^Fi, which combines the contextual features with the structure features passed through the lateral connections.
- The DEB module therefore computes the enhanced feature map at layer i as: ^Fi = Cat(F’i, F’i+1) + D(F’i)
The purpose of the DEB module is to force a stronger influence of the structure features, so that the prediction better preserves details such as sharp boundaries, thin structures, and tiny in-focus objects.
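The DEB steps listed above can be traced on toy single-channel maps in plain Python. This is a sketch, not the paper's implementation: the zero-padded 3×3 convolution, nearest-neighbour up-sampling, and the reduction of the 1×1 convolution to a weighted sum of two maps are simplifications, and the distinction block D is a HYPOTHETICAL local-contrast form, D(F) = F − mean3×3(F), because the review does not give its formula. All function names are illustrative.

```python
# Toy, single-channel sketch of one DEB step on plain Python lists.
# Assumptions (not from the paper): zero-padded 3x3 conv, nearest-neighbour 2x
# upsampling, the 1x1 conv reduced to a weighted sum of two maps, and a
# HYPOTHETICAL distinction block D(F) = F - mean_3x3(F) (local contrast).


def conv3x3(x, k):
    """Zero-padded 3x3 cross-correlation of a 2D list x with kernel k."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += k[dr + 1][dc + 1] * x[rr][cc]
            out[r][c] = acc
    return out


def upsample2x(x):
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    return [[x[r // 2][c // 2] for c in range(2 * len(x[0]))] for r in range(2 * len(x))]


def fuse_1x1(a, b, wa, wb):
    """1x1 conv over the concatenation [a, b] with a single output channel."""
    return [[wa * p + wb * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def distinction(x):
    """HYPOTHETICAL D(F): each value minus its zero-padded 3x3 local mean."""
    mean_kernel = [[1 / 9] * 3 for _ in range(3)]
    local_mean = conv3x3(x, mean_kernel)
    return [[p - m for p, m in zip(rx, rm)] for rx, rm in zip(x, local_mean)]


def deb(f_low, f_high, k, wa=0.5, wb=0.5):
    """^Fi = Cat(F'i, F'i+1) + D(F'i) on toy single-channel maps."""
    f_low_p = conv3x3(f_low, k)      # F'i
    f_high_p = upsample2x(f_high)    # F'i+1
    cat = fuse_1x1(f_low_p, f_high_p, wa, wb)
    d = distinction(f_low_p)
    return [[c + e for c, e in zip(rc, rd)] for rc, rd in zip(cat, d)]


if __name__ == "__main__":
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    f_low = [[1.0] * 4 for _ in range(4)]    # toy Fi, 4x4
    f_high = [[0.0] * 2 for _ in range(2)]   # toy Fi+1, 2x2
    for row in deb(f_low, f_high, identity):
        print([round(v, 3) for v in row])
```

With a uniform F’i, this D is zero in the interior and nonzero only near the zero-padded border, so ^Fi departs from Cat(F’i, F’i+1) only where the local structure changes, which is the kind of behaviour a distinction term is meant to emphasize.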
3. Loss Function
- A multi-scale loss function is used, supervising the predicted blur maps at several scales.
- The feature maps at the finest scale capture details when reconstructing the blur map, while the feature maps at the coarsest scale provide a means to embed the high-level semantic information into the reconstruction.
- The feature maps at the intermediate scales incorporate mid-level features so that the mid-level information can be accurately reconstructed.
- However, the boundaries of the predicted blur maps are often unclear, so a Boundary Refinement (BR) module is introduced before generating the final blur maps.
- An extra boundary penalty loss is added to the loss function. It is computed from ^Gi and ^Bi, which denote the gradient magnitudes of the ground truth and of the predicted blur map at region i, respectively.
- The final loss function is the sum of the multi-scale loss and the boundary penalty loss.
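The two loss terms can be sketched in plain Python. The review does not reproduce the paper's formulas, so every concrete choice below is an assumption: binary cross-entropy at each scale with equal weights, the ground truth downsampled to each prediction's size by 2×2 average pooling, gradient magnitudes from forward differences computed per pixel (rather than per region i), and an L1 (mean absolute) difference between ^G and ^B. Only the overall structure, multi-scale loss plus boundary penalty, follows the text; all function names are hypothetical.

```python
import math


def bce(pred, gt, eps=1e-7):
    """Mean binary cross-entropy between two same-sized 2D maps in [0, 1]."""
    total, n = 0.0, 0
    for rp, rg in zip(pred, gt):
        for p, g in zip(rp, rg):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total += -(g * math.log(p) + (1 - g) * math.log(1 - p))
            n += 1
    return total / n


def downsample2x(x):
    """2x2 average pooling (assumes even height and width)."""
    return [[(x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4
             for c in range(0, len(x[0]), 2)]
            for r in range(0, len(x), 2)]


def multiscale_loss(preds, gt, weights=None):
    """Sum of per-scale BCE losses; each pred is compared with gt pooled to its size."""
    if weights is None:
        weights = [1.0] * len(preds)
    loss = 0.0
    for pred, w in zip(preds, weights):
        target = gt
        while len(target) > len(pred):
            target = downsample2x(target)
        loss += w * bce(pred, target)
    return loss


def grad_mag(x):
    """Per-pixel gradient magnitude via forward differences (0 past the last row/col)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = x[r][c + 1] - x[r][c] if c + 1 < w else 0.0
            gy = x[r + 1][c] - x[r][c] if r + 1 < h else 0.0
            out[r][c] = math.hypot(gx, gy)
    return out


def boundary_penalty(pred, gt):
    """Mean |G - B| between ground-truth and predicted gradient magnitudes (L1 assumed)."""
    g, b = grad_mag(gt), grad_mag(pred)
    diffs = [abs(p - q) for rg, rb in zip(g, b) for p, q in zip(rg, rb)]
    return sum(diffs) / len(diffs)


def total_loss(preds, gt):
    """Final loss = multi-scale loss + boundary penalty on the finest prediction."""
    return multiscale_loss(preds, gt) + boundary_penalty(preds[0], gt)
```

A blurry edge such as [0, 0.5, 0.5, 1] against a sharp ground-truth edge [0, 0, 1, 1] spreads its gradient over two pixels instead of one, so the boundary term penalizes it even when the per-pixel cross-entropy is modest.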
4. Datasets
4.1. CUHK Dataset
- 1000 images, including 704 out-of-focus images and 296 motion blur images.
- 750 for training, 50 for validation, and the remaining 200 for testing.
- The input images are resized to 352×352 for training.
- To prevent overfitting, data augmentation is used to enlarge the training set: the images are rotated by 0°, 90°, 180°, and 270°, and resized to 176×176 and 88×88.
- In total, 9,000 training images are obtained.
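The 9,000 figure is consistent with combining every training image with every rotation and every size, counting the original 352×352 size as one of three sizes: 750 × 4 × 3 = 9,000. That full cross product is an inference from the numbers, not something the review states. A quick check, with hypothetical names `ROTATIONS`, `SIZES`, and `augmentation_plan`:

```python
from itertools import product

ROTATIONS = [0, 90, 180, 270]   # degrees
SIZES = [352, 176, 88]          # square side lengths in pixels (352 = original)


def augmentation_plan(num_train=750):
    """Every (image index, rotation, size) combination, assuming a full cross product."""
    return list(product(range(num_train), ROTATIONS, SIZES))


plan = augmentation_plan()
print(len(plan))  # 750 * 4 * 3 = 9000
```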
4.2. SZU-Blur Detection Dataset (SZU-BD)
- 784 images, including 75 motion blur images and 709 out-of-focus images.
- Different resolutions, ranging from 500×468 pixels to 275×218 pixels.
- Some of the images come from the salient object detection datasets MSRA10K and DUT.
- None of the images duplicate those in CUHK, so the dataset can be used to test generalization ability.
5. Ablation Study
- Config. I: U-Net, i.e. using concatenation.
- Config. II: FPN, i.e. using element-wise addition.
- Config. III: The proposed DEB block.
- Config. IV: The proposed DEB block with boundary penalty term.
- The results show that “Config. II” improves on “Config. I” by a large margin. This improvement appears to come mainly from the multi-scale supervision used in FPN.
- “Config. III” outperforms “Config. II”, owing to the DEB block.
- Furthermore, “Config. IV” gives a small improvement over “Config. III”, because the additional boundary penalty term forces the results to have clearer boundaries.
- The PR curves show that the final model achieves high precision and high recall at the same time.
- As shown above, the results of “Config. III” and “Config. IV” are much closer to the ground truth.
- Compared with “Config. III”, the final model, “Config. IV”, produces clearer boundaries.
6. Experimental Results
6.1. Quantitative Comparisons
- The proposed method, DPN, performs the best in terms of MAE, Max(Fm), AP, WF, AUC and Sm metrics.
- It is also the second best in terms of AR.
- The PR curve of the proposed method lies above those of the other methods, such as Park CVPR’17 / DHCF / DHDE, EHS (DBM), and LM (Zeng TIP’19).
- The F-measure curve of the proposed method is also higher than the others by a large margin, and it stays higher over a wider range of thresholds, which shows that the method is insensitive to the choice of threshold.
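To make the threshold sweep behind the F-measure curve and Max(Fm) concrete, here is a small sketch of MAE and the F-measure over 256 evenly spaced thresholds. It assumes β² = 0.3, a common choice in saliency and blur detection evaluation (the review does not state the paper's value), and binarizes the ground truth at 0.5; the function names are hypothetical. A curve that stays high across many thresholds is what "insensitive to the choice of threshold" means in practice.

```python
def mae(pred, gt):
    """Mean absolute error between a predicted blur map and the ground truth."""
    errors = [abs(p - g) for rp, rg in zip(pred, gt) for p, g in zip(rp, rg)]
    return sum(errors) / len(errors)


def f_measure(pred, gt, threshold, beta2=0.3):
    """F-measure of the prediction binarized at `threshold` (beta^2 = 0.3 assumed)."""
    tp = fp = fn = 0
    for rp, rg in zip(pred, gt):
        for p, g in zip(rp, rg):
            predicted_blur = p >= threshold
            actual_blur = g >= 0.5  # ground truth binarized at 0.5 (assumption)
            if predicted_blur and actual_blur:
                tp += 1
            elif predicted_blur:
                fp += 1
            elif actual_blur:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


def f_curve(pred, gt, steps=256):
    """F-measure at `steps` evenly spaced thresholds in [0, 1]."""
    return [f_measure(pred, gt, t / (steps - 1)) for t in range(steps)]


gt = [[1.0, 0.0], [0.0, 0.0]]
pred = [[0.9, 0.2], [0.1, 0.0]]
print(mae(pred, gt), max(f_curve(pred, gt)))  # Max(Fm) is the peak of the curve
```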
6.2. Cross-Dataset Evaluation on SZU-BD
- Because Park CVPR’17 / DHCF / DHDE and EHS (DBM) did not state which images were used for training, the methods are also compared on the new SZU-BD dataset, to avoid inaccurate performance evaluations caused by overfitting.
- As shown, the deep-learning-based methods perform better than the handcrafted-feature-based methods.
- Among them, the proposed model achieves the best performance, outperforming Park CVPR’17 / DHCF / DHDE and EHS (DBM).
- In summary, the proposed model shows better generalization ability than existing state-of-the-art methods.
6.3. Visual Comparison
- Fast-moving objects are blurred, as in the example in the 5th row.
- Many other methods make mistakes in motion blur detection because they incorrectly assume that moving objects are clear and the background is blurred.
- In contrast, the proposed approach handles such cases correctly.
- In some cases, the regions of interest are clear regions rather than objects, such as in the 1st and 2nd rows.
- The proposed method also works well in these cases.
- Compared with the other methods, the proposed method produces more compact objects and clearer boundaries.
- For background objects with prominent colors but blurry details, the proposed method still identifies and segments them correctly, while the handcrafted-feature-based methods often fail on such scenes, such as in rows 1, 7, and 10.
- In particular, for images with multiple objects, the proposed method detects all objects without missing any and segments them accurately, as shown in the first, third, and last two rows.
6.4. Running Time Comparison
- Running time is measured on 100 images from CUHK.
- The proposed method achieves a good balance between prediction performance and running speed.
6.5. Failure Cases
- The method still fails in some cases. For example, as shown above, if there are reflective regions inside an object, it cannot completely distinguish the blurred regions from the reflective regions.
- A potential solution is to introduce an illumination-invariant prior into the network, or to collect a larger and more complex dataset.
6.6. Potential Applications
6.6.1. Camera Shake Detection
- Camera shake occurs often and generally blurs the whole image. The proposed model is very sensitive to such blurriness.
- The blur degree S of an image is defined as the mean value of its blur map.
- (a): An image blurred by camera shake; its blur degree is S = 0.9972.
- (c): A clear image; its blur degree is S = 0.0006.
- (b): The blur degree can also be measured for an image with depth of field; here S = 0.5817.
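Since S is the mean of the blur map, computing it takes one line. The sketch assumes blur-map values in [0, 1] with 1 meaning blurred, which is consistent with S = 0.9972 for the shaken image and S = 0.0006 for the clear one. The `is_camera_shake` rule and its 0.9 threshold are hypothetical illustrations, not from the paper.

```python
def blur_degree(blur_map):
    """Blur degree S: mean value of the blur map (1 = blurred, 0 = clear, assumed)."""
    values = [v for row in blur_map for v in row]
    return sum(values) / len(values)


def is_camera_shake(blur_map, threshold=0.9):
    """HYPOTHETICAL rule: treat the whole image as shaken if S exceeds `threshold`."""
    return blur_degree(blur_map) > threshold


shaken = [[1.0, 1.0], [1.0, 0.99]]
partial = [[1.0, 1.0], [0.0, 0.0]]   # e.g. blurred background, sharp foreground
print(blur_degree(shaken), is_camera_shake(shaken))
print(blur_degree(partial), is_camera_shake(partial))
```

A depth-of-field image like (b) lands in between, so a single threshold on S separates whole-image shake from partial, intentional blur only approximately.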
6.6.2. Depth of Field Magnification
- First, the proposed model estimates the blur map of an image with depth of field. Then the in-focus objects are extracted, and a stronger Gaussian blur is applied to the background.
- After depth-of-field magnification, the in-focus objects are more prominent, and the images look natural without noticeable artifacts.
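One way to realize this is to use the blur map B as an alpha matte and blend each pixel between the original image I and a more strongly Gaussian-blurred copy G(I): out = (1 − B)·I + B·G(I). The compositing rule, the σ value, and the single-channel grayscale input are assumptions for illustration; the review does not specify how the paper combines the two. The sketch uses a separable Gaussian with edge clamping, and all names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian taps covering about +/- 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    k = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]


def blur_1d(row, k):
    """Convolve one row with kernel k, clamping indices at the borders."""
    r, n = len(k) // 2, len(row)
    return [sum(k[j] * row[min(max(i + j - r, 0), n - 1)] for j in range(len(k)))
            for i in range(n)]


def gaussian_blur(img, sigma):
    """Separable Gaussian blur: horizontal pass, then vertical pass."""
    k = gaussian_kernel(sigma)
    rows = [blur_1d(row, k) for row in img]                # horizontal pass
    cols = [blur_1d(list(col), k) for col in zip(*rows)]   # vertical pass on columns
    return [list(r) for r in zip(*cols)]                   # transpose back


def magnify_dof(img, blur_map, sigma=3.0):
    """Keep in-focus pixels (B = 0) and push blurred ones (B = 1) toward G(I)."""
    extra = gaussian_blur(img, sigma)
    return [[(1 - b) * p + b * q for p, q, b in zip(ri, re, rb)]
            for ri, re, rb in zip(img, extra, blur_map)]
```

Because the blur map acts as a soft mask, pixels with intermediate blur values receive a proportional share of the extra blur, which helps avoid a hard seam around the in-focus object.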
Blur Detection / Defocus Map Estimation
2017 [Park CVPR’17 / DHCF / DHDE] 2018 [Purohit ICIP’18] [BDNet (JNEUCOM’18)] [DBM] [Kim JCGF’18] [BTBNet] 2019 [Khajuria ICIIP’19] [Zeng TIP’19] [PM-Net] [CENet] [DMENet] [DeFusionNet (CVPR’19)] 2020 [BTBCRL (BTBNet + CRLNet)] [DeFusionNET (TPAMI’20)] [BDNet (ACCESS’20)] [MsFEN+MsBEN] [E-Net+B-Net] [BR²Net] [DPN]