Review — DBM: Deep Blur Mapping (Blur Detection)

FCN-Like Network is Used, Outperforms Traditional Approaches

Sik-Ho Tsang
The Startup
6 min read · Jan 1, 2021


Happy New Year 2021!

Challenges in local blur mapping: many smooth regions cannot be distinguished between smooth textures and blurred regions

In this story, Deep Blur Mapping: Exploiting High-Level Semantics by Deep Neural Networks (DBM), by the University of Waterloo and The University of Sydney, is reviewed. In this paper:

  • It is found that high-level semantic information is critical to successfully identifying local blur. Thus, the authors proposed DBM, a deep network.
  • It is the first end-to-end local blur mapping algorithm based on a fully convolutional network (FCN).

This is a paper in 2018 TIP with over 10 citations where TIP has a high impact factor of 9.34. (Sik-Ho Tsang @ Medium)

Outline

  1. DBM: Deep Blur Mapping
  2. Experimental Results

1. DBM: Deep Blur Mapping

  • Several linearly cascaded FCNs based on the 16-layer VGGNet architecture are tried.
  • VGGNet is trimmed up to the last convolutional layer in each stage, i.e., conv1_2, conv2_2, conv3_3, conv4_3, and conv5_3, respectively, resulting in five FCNs with different depths.
  • For each network, a convolutional layer with a 1×1 receptive field is added, which performs a linear summation of the input channels. Finally, an in-network upsampling layer followed by a sigmoid nonlinearity is added.
  • All fully connected layers are removed, which significantly reduces the computational complexity. This speeds up the computation and reduces the memory storage at training and testing time.

Configuration I retains the spatial information intact, which is ideal for dense prediction. However, due to its shallow structure, it can only extract low-level information and fails to learn powerful semantic information from the image.

On the contrary, Configuration V has a very deep structure, which consists of 14 stacks of convolutional filters. Therefore, it is capable of transforming low-level features into high-level semantic information, but sacrifices fine spatial information due to max-pooling.

(a) Test image. (b) Configuration I. (c) Configuration II. (d) Configuration III. (e) Configuration IV. (f) Configuration V. (g) Ground truth
Results of Different Configurations
  • Optimal dataset scale (ODS) F-score is obtained by finding an optimal threshold for all images in the dataset.
  • Optimal image scale (OIS) F-score is obtained by averaging the best F-scores for all images.
  • Average precision (AP) is obtained by averaging over all recall levels.
  • The weights are initialized from the FCN model.
  • The 500 odd-numbered images in Shi’s dataset are used for training, and the 500 even-numbered images for testing.
  • Configuration V is found to have the highest ODS, OIS, and AP.
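The difference between ODS and OIS can be made concrete with a small sketch. This is illustrative code with hypothetical data, not the paper's evaluation script: ODS picks one threshold shared across the dataset, while OIS picks the best threshold per image and then averages.

```python
# Sketch: ODS vs. OIS F-scores for soft blur maps against binary ground truth.
import numpy as np

def f_score(pred, gt, t):
    """F-score of binarizing a soft blur map `pred` at threshold `t`."""
    b = pred >= t
    tp = np.logical_and(b, gt).sum()
    if b.sum() == 0 or gt.sum() == 0 or tp == 0:
        return 0.0
    precision, recall = tp / b.sum(), tp / gt.sum()
    return 2 * precision * recall / (precision + recall)

def ods_ois(preds, gts, thresholds=np.linspace(0.01, 0.99, 99)):
    # ODS: one optimal threshold for all images in the dataset.
    ods = max(np.mean([f_score(p, g, t) for p, g in zip(preds, gts)])
              for t in thresholds)
    # OIS: the best threshold is chosen per image, then F-scores are averaged.
    ois = np.mean([max(f_score(p, g, t) for t in thresholds)
                   for p, g in zip(preds, gts)])
    return ods, ois
```

By construction OIS ≥ ODS, since a per-image optimal threshold can only do at least as well as a single shared one.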

Thus, it can be concluded that semantic information plays a dominant role in local blur mapping, while spatial information is less relevant.

  • Configuration V is therefore used as the default architecture for the SOTA comparison.
  • The network architecture of Configuration V is as follows:
The network architecture of Configuration V
  • The network is similar to the one in FCN.
  • (However, here, the test set is used for finding the best configuration; the authors did not set up a validation set for the ablation study.)

2. Experimental Results

2.1. SOTA Comparisons

precision-recall curve
  • DBM achieves the highest precision at all recall levels, with improvements as large as 0.2.
  • This is because traditional methods tend to give flat regions high blur confidence and misclassify them as blurry regions.
  • Moreover, DBM exhibits a less steep decline over the middle recall range [0.2, 0.8].
ODS, OIS, AP
  • DBM significantly advances the state-of-the-art by a large margin with an ODS F-score of 0.853.
(a) Test images. (b) Su11 [32]. (c) Shi14 [11]. (d) DBM. (e) Ground truths.
  • DBM is able to robustly detect local blur from complex foreground and background.
  • First, it handles blur regions well across different scales, from the small motorcycle rider (in the 3rd row) to the big girl (in the 6th row).
  • Second, it is capable of identifying flat regions such as the car body (in the 1st row), clothes (in the 2nd and 6th rows), and the road sign (in the 4th row) as non-blurry.
  • Third, it is barely affected by strong structures after blurring and labels those regions correctly.

2.2. Importance of High-Level Semantics

Importance of High-Level Semantics
(a) Test image. (b) Training from scratch. (c) Training with FCNs and skip layers [18] (FCN-8s). (d) Training with weighted fusion only [45]. (e) Training with weighted fusion and deep supervision [45] (DSN). (f) DBM. (g) Ground truth.
  • When DBM is trained from scratch, without semantically meaningful initializations, the results (shown in the first row) are unsatisfactory.
  • Variants that make better use of low-level features at shallower layers, including FCNs with skip layers (FCN-8s), weighted fusion of side outputs, and weighted fusion of side outputs with deep supervision (DSN), dilute the benefits of high-level features and result in erroneous and non-uniform blur assignments, as shown in the figure above.

DBM, which solely interpolates from high-level feature maps, achieves performance comparable to its most sophisticated variant, DSN, and ranks best in terms of AP. These results manifest the central role of high-level semantics in local blur mapping.

2.3. Independence of Training Data

precision-recall curve
ODS, OIS, AP
  • When the training set and test set are swapped, similar results are obtained.

2.4. More Training Data

  • By randomly sampling 400 images from the even-numbered images, incorporating them into the odd-numbered training images, and testing DBM on the remaining 100 images, the performance improves from ODS = 0.862 to ODS = 0.869.

2.5. Running Time

Running Time in Seconds
  • 10 images of size 384×384×3 are tested on a computer with a 3.4 GHz CPU and 16 GB RAM.

From the above table, we can see that DBM keeps a good balance between prediction performance and computational complexity.

  • When the GPU mode is activated (we adopt an NVIDIA GTX Titan X GPU), DBM runs significantly faster.

2.6. Other Applications

The blur region segmentation results. (a) Shi14 [11]. (b) DBM
  • The blur map produced by DBM provides a useful mask to initialize segmentation without human intervention.
  • GrabCut in OpenCV is used for segmentation; DBM does a better job of segmenting images into blurred and clear regions.
  • A straightforward blur degree S of an image is the average value of its blur map.
  • The above figure shows a set of dog pictures ranked from left to right with increasing S, from which we can see that DBM robustly extracts blurred regions with high confidence and that the ranking results closely agree with human perception of blur.
(a) Test image (b) Ground truth blur map. (c) Magnification by Shi14 [11]. (d) Blur map by Shi14 [11]. (e) Magnification by DBM. (f) Blur map by DBM.
  • A shallow depth-of-field is often preferred in creative photography, such as portraits. With extracted blurred regions, it is easy to drive a computational photography approach to increase defocus for blur magnification.
  • It is clear that DBM is barely affected by the structures with blurring and delivers a perceptually more consistent result with smooth transitions from clear to blur regions.

2.7. Failure Case

Failure case of DBM
  • In such scenes, it is difficult to extract accurate and useful semantic information for local blur mapping. A potential solution is to retrain DBM on a larger database with more scene structure variations.
  • Also, DBM generates blur maps with coarse boundaries. Boundary refinement is one of the authors’ future works.


PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.