Review: U-Net+ResNet — The Importance of Long & Short Skip Connections (Biomedical Image Segmentation)

A Very Deep Fully Convolutional Network (FCN), With Both Long & Short Skip Connections

This time, a Fully Convolutional Network (FCN) with both long and short skip connections for biomedical image segmentation is reviewed.

Last time, I reviewed RoR (ResNet of ResNet, Residual Networks of Residual Networks), a 2018 TCSVT paper; if interested, please visit my review. In RoR, image classification accuracy is improved by using long and short skip connections, and the experimental results demonstrate their effectiveness.

This time, rather than just showing experimental results, the authors also demonstrate the effectiveness of long and short skip connections by analyzing the weights within the network.

Thus, although the purpose of this work is biomedical image segmentation, observing the weights within the network gives us a better understanding of long and short skip connections. The paper was published at 2016 DLMIA (Deep Learning in Medical Image Analysis) and has over 100 citations. (Sik-Ho Tsang @ Medium)

Electron Microscopy (EM) Image Segmentation

Outline

  1. Skip Connection in ResNet
  2. Long and Short Skip Connections
  3. Loss Functions
  4. Results
  5. Weight Analysis

1. Skip Connection in ResNet

ResNet Building Block
  • In ResNet, consecutive ResNet building blocks are used.
  • Only short skip connections are used; there are no long skip connections. (A minimal sketch of such a block follows this list.)
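Below is a minimal PyTorch sketch of a residual block with a short skip connection. The class name, channel sizes, and post-activation layout are my own illustrative assumptions, not taken from the paper:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A ResNet-style building block: the input skips over two convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                        # short skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)    # element-wise sum, then activation
```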

2. Long and Short Skip Connections

(a) Residual Network with Long Skip Connections, (b) Bottleneck Block, (c) Basic Block, (d) Simple Block. (Blue: Optional Downsampling, Yellow: Optional Upsampling)

(a) Residual Network with Long Skip Connections

  • With downsampling (blue): It’s a contracting path.
  • With upsampling (yellow): It’s an expanding path.
  • This is a U-Net-like FCN architecture.
  • And there are long skip connections from the contracting path to the expanding path (see the sketch after this list).
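A minimal sketch of such an architecture, assuming PyTorch. The names and sizes are illustrative assumptions, not the paper's exact configuration; this sketch joins the long skip by concatenation, as in U-Net, while summation is another common choice:

```python
import torch
import torch.nn as nn

class TinyUNetLikeFCN(nn.Module):
    """Contracting path -> expanding path, with one long skip connection."""
    def __init__(self, in_ch=1, base=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)               # downsampling (blue): contracting
        self.mid = nn.Sequential(nn.Conv2d(base, base, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2)     # upsampling (yellow): expanding
        self.dec = nn.Sequential(nn.Conv2d(2 * base, base, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(base, 1, 1)

    def forward(self, x):
        skip = self.enc(x)                 # feature map saved for the long skip
        x = self.mid(self.down(skip))
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)    # long skip: contracting -> expanding
        return self.head(self.dec(x))
```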

(b) Bottleneck Block

  • 1×1 Conv → 3×3 Conv → 1×1 Conv is used, hence the name bottleneck; it is already used in ResNet.
  • BN-ReLU is applied before each Conv, an idea from Pre-Activation ResNet. (A sketch follows this list.)
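A minimal sketch of a pre-activation bottleneck block, assuming PyTorch; the width reduction factor and names are my own assumptions for illustration:

```python
import torch.nn as nn

def preact(ch):
    """BN-ReLU placed *before* the convolution (Pre-Activation ResNet idea)."""
    return nn.Sequential(nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

class BottleneckBlock(nn.Module):
    """1x1 -> 3x3 -> 1x1 Convs, each preceded by BN-ReLU, plus a short skip."""
    def __init__(self, ch, mid=None):
        super().__init__()
        mid = mid or max(ch // 4, 1)   # reduced middle width: the 'bottleneck'
        self.body = nn.Sequential(
            preact(ch),  nn.Conv2d(ch, mid, 1),
            preact(mid), nn.Conv2d(mid, mid, 3, padding=1),
            preact(mid), nn.Conv2d(mid, ch, 1),
        )

    def forward(self, x):
        return x + self.body(x)        # identity short skip connection
```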

(c) Basic Block

  • Two 3×3 Convs; it is also used in ResNet.

(d) Simple Block

  • One 3×3 Conv.

(b)-(d)

  • All blocks contain short skip connections. (The basic and simple blocks are sketched below.)
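A minimal sketch of the basic and simple blocks in the same pre-activation style, again with illustrative names and sizes of my own:

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with a short skip, pre-activation style."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # short skip connection

class SimpleBlock(nn.Module):
    """A single 3x3 convolution with a short skip."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # short skip connection
```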
A Detailed Model Architecture

3. Loss Functions

Two loss functions are considered.

3.1. Loss Function Using Binary Cross-Entropy

  • A standard cross-entropy loss.
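For reference, the standard per-pixel binary cross-entropy over N pixels, where y_i ∈ {0, 1} is the ground-truth label and ŷ_i the predicted foreground probability (the paper's exact weighting or reduction may differ):

$$
\mathcal{L}_{\text{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\,\Big]
$$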

3.2. Dice Loss

  • Dice loss is another common loss for biomedical image segmentation.
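A commonly used soft form of the Dice loss over predicted probabilities ŷ_i and labels y_i is shown below; formulations vary (e.g., some add a smoothing term), and the paper's exact form may differ:

$$
\mathcal{L}_{\text{Dice}} = 1 - \frac{2\sum_{i}\hat{y}_i\, y_i}{\sum_{i}\hat{y}_i + \sum_{i} y_i}
$$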

4. Results

4.1. Dataset

  • Training Set: 30 Electron Microscopy (EM) images of size 512×512; 25 images are used for training and 5 are left out for validation.
  • Test Set: Another 30 images.
  • Full resolution is used as input to the network.
  • No post-processing steps.

4.2. Long and Short Skip Connections

Loss/Accuracy against epochs: (a) Long and Short, (b) Only Short, (c) Only Long

Best Losses

  • As observed, among the above three settings, using both long and short skip connections obtains the smallest loss and the highest accuracy.

4.3. Comparison with state-of-the-art Approaches

ISBI EM Segmentation Challenge (http://brainiac2.mit.edu/isbi_challenge/)
  • In the ISBI EM Segmentation Challenge, Vrand and Vinfo are used for ranking.
  • Foreground-restricted Rand Scoring Vrand: a weighted harmonic mean of the Rand split score and the Rand merge score. The split and merge scores can be interpreted as precision and recall in the classification of pixel pairs as belonging to the same segment (positive class) or different segments (negative class).
  • Information Theoretic Scoring Vinfo: a weighted harmonic mean of the information theoretic split score and the information theoretic merge score. It is based on mutual information (MI) and acts as an alternative to Rand scoring. (Both scores share the weighted harmonic mean form sketched after this list.)
  • The details of the two metrics:
    https://www.frontiersin.org/articles/10.3389/fnana.2015.00142/full
  • The proposed approaches (bottom of the table) are comparable to CUMedVision and U-Net. Though slightly inferior, the proposed approaches do not use any post-processing steps, making them an end-to-end learning solution.
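In general, a weighted harmonic mean of a split score s and a merge score m with weight α ∈ [0, 1] takes the form below; with α = 0.5 it reduces to the familiar F-score 2sm/(s + m). (The exact α used by the challenge is not stated here.)

$$
V = \left( \frac{\alpha}{s} + \frac{1-\alpha}{m} \right)^{-1}
$$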

5. Weight Analysis

(a) Long & short skip connections, (b) Only long skip connections with 9 repetitions of simple block, (c) Only long skip connections with 3 repetitions of simple block, (d) Only long skip connections with 7 repetitions of simple block without BN.
  • Blue: Small weight values.
  • Red: Large weight values.

(a) Long & short skip connections

  • Parameter updates appear to be well distributed when both long and short skip connections are present.

(b) Only long skip connections with 9 repetitions of simple block

  • When short skip connections are removed, the deep parts of the network get few updates.
  • When long skip connections are retained, at least the shallow parts of the model can be updated.

(c) Only long skip connections with 3 repetitions of simple block

  • When the model is shallow enough, all layers can be well updated using only long skip connections.

(d) Only long skip connections with 7 repetitions of simple block without BN.

  • Networks without batch normalization had diminishing updates toward the center of the network.

To conclude the weight analysis: layers closer to the center of the model cannot be effectively updated due to the vanishing gradient problem, which is alleviated by short skip connections.
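As a rough sketch of how such a weight analysis could be reproduced in PyTorch (the helper name and procedure are my own assumptions; the paper's exact methodology may differ), one can snapshot the parameters, train, and compare per-layer update magnitudes:

```python
import torch

def layer_update_magnitudes(model, before):
    """Mean absolute parameter change per layer since the 'before' snapshot."""
    mags = {}
    for name, p in model.named_parameters():
        if "weight" in name:
            mags[name] = (p.detach() - before[name]).abs().mean().item()
    return mags

# Usage sketch:
# before = {n: p.detach().clone() for n, p in model.named_parameters()}
# ... train for one epoch ...
# for name, m in layer_update_magnitudes(model, before).items():
#     print(f"{name}: {m:.3e}")  # small values near the center suggest vanishing updates
```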