Review: CUMedVision1 — Fully Convolutional Network (Biomedical Image Segmentation)

In this story, CUMedVision1, by CUHK, is reviewed. By using a fully convolutional network (FCN), CUMedVision1 outperforms the state-of-the-art approaches by a large margin on the benchmark dataset of the 2012 ISBI segmentation challenge.

The leaderboard:

In this challenge, serial section Transmission Electron Microscopy (ssTEM) images are segmented. An example is shown below:

Input Image (Left), Segmentation Results (Right) (Individual components are denoted by different colors.)

For the image above, experts are needed for labeling (or annotation). This process is time-consuming and expensive. If labeling can be automated by segmentation, we can save both time and cost.

You may ask: “Is biomedical image segmentation too narrow a topic to read about? I am not working in this field, so is it useful for me?” However, we can learn its techniques and apply them to other industries, for example quality control, automatic inspection, and automatic robotics in construction, fabrication, and manufacturing processes.

CUHK is very active in the field of deep learning. This work was published in 2016 AAAI with more than 60 citations at the time of writing, which is high for this field. (Sik-Ho Tsang @ Medium)

What Are Covered

  1. FCN Architecture
  2. Loss Function
  3. Boundary Refinement
  4. Results

1. FCN Architecture

CUMedVision1 FCN Architecture
  1. As in the figure above, we first have an input image on the left.
  2. The input image goes through the down-sampling path with convolutional and max-pooling layers. This path classifies the semantic meaning based on high-level abstract information.
  3. At certain layers just before pooling, the feature maps branch into an upsampling path with convolutional and deconvolutional layers. This path reconstructs fine details such as boundaries. Backwards strided convolution is used for upsampling, and we obtain the results at C1, C2 and C3.
  4. Finally, these results are added together, and softmax is applied to the fused map.

The basic idea is that

  1. Abstract information from higher layers helps to resolve the problem of what.
  2. Local information from lower layers helps to resolve the problem of where.

This idea has been adopted in many other deep learning frameworks.
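To make the fusion step concrete, here is a minimal numpy sketch of the idea: score maps from three depths (stand-ins for C1, C2, C3) are upsampled back to input resolution, summed, and passed through a softmax. Nearest-neighbour repetition is used here as a simple stand-in for the paper's backwards strided convolution; the shapes and strides are hypothetical.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax over the class axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def upsample(score, factor):
    # Nearest-neighbour stand-in for backwards strided (de)convolution:
    # brings a branch's score map back to the input resolution.
    return score.repeat(factor, axis=1).repeat(factor, axis=2)

# Hypothetical 2-class score maps at strides 2, 4 and 8 of a 16x16 input,
# as if taken just before the pooling layers.
rng = np.random.default_rng(0)
c1 = rng.standard_normal((2, 8, 8))   # branch C1
c2 = rng.standard_normal((2, 4, 4))   # branch C2
c3 = rng.standard_normal((2, 2, 2))   # branch C3

# Fuse: upsample each branch to full resolution, sum, then softmax.
fused = upsample(c1, 2) + upsample(c2, 4) + upsample(c3, 8)
prob = softmax(fused, axis=0)
print(prob.shape)  # (2, 16, 16): per-pixel class probabilities
```

The key design choice this illustrates is that the coarse branches contribute the "what" and the fine branches the "where", and simple addition lets both vote at every pixel.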

2. Loss Function

Loss Function

The first part is a standard regularization term using the l2 norm, which penalizes large weight values, i.e. it reduces overfitting.

The second part is the cross entropy loss term at C1, C2 and C3.

The third part is the cross entropy loss term at the final output.
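Putting the three parts together, a rough numpy sketch of the total loss might look as follows. All tensors, the weight-decay coefficient `lam`, and the assumption that C1–C3 have already been upsampled to full resolution are hypothetical stand-ins, not the paper's exact values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def cross_entropy(prob, labels):
    # Mean pixel-wise negative log-likelihood of the true class.
    h, w = labels.shape
    return -np.mean(np.log(prob[labels, np.arange(h)[:, None], np.arange(w)]))

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=(16, 16))                              # ground-truth mask
branch_scores = [rng.standard_normal((2, 16, 16)) for _ in range(3)]    # C1..C3 score maps
fused = sum(branch_scores)                                              # fused final score map
weights = [rng.standard_normal(10) for _ in range(4)]                   # stand-in weight tensors

lam = 1e-4  # hypothetical weight-decay coefficient
loss = (lam * sum((w ** 2).sum() for w in weights)                       # 1) l2 regularization
        + sum(cross_entropy(softmax(s), labels) for s in branch_scores)  # 2) auxiliary CE at C1-C3
        + cross_entropy(softmax(fused), labels))                         # 3) CE at the fused output
```

The auxiliary terms at C1–C3 act as deep supervision: each branch receives a gradient signal directly, not only through the fused output.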

3. Boundary Refinement

Boundary Refinement

After segmentation by the FCN, the segmentation boundary can sometimes be discontinuous. This is due to the fusion at the end of the FCN. The probability map pw(x) is linearly combined with the binary contour p(x) using the parameter wf, and wf is determined by minimizing the rand error.

This part is relatively minor, but we should note that boundary refinement may be needed after the FCN. Indeed, the binary contour p(x) is used in the middle of the FCN for training in CUMedVision2. I hope to cover it in the near future.
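As a rough sketch of the refinement step: combine the probability map with the contour map linearly and sweep the weight. The paper picks wf by minimizing the rand error on held-out data; since rand error is involved to compute, a simple squared-error proxy against a hypothetical ground truth is used here, and all maps are randomly generated placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
pw = rng.uniform(size=(16, 16))                              # pw(x): fused probability map
contour = (rng.uniform(size=(16, 16)) > 0.8).astype(float)   # p(x): binary contour map
gt = (rng.uniform(size=(16, 16)) > 0.5).astype(float)        # hypothetical ground truth

def refine(pw, contour, wf):
    # Linear combination of probability map and contour, kept in [0, 1].
    return np.clip(pw + wf * contour, 0.0, 1.0)

def proxy_error(p, target):
    # Stand-in for the rand error used in the paper.
    return np.mean((p - target) ** 2)

# Sweep candidate weights and keep the one with the lowest proxy error.
candidates = np.linspace(-1.0, 1.0, 21)
best_wf = min(candidates, key=lambda wf: proxy_error(refine(pw, contour, wf), gt))
refined = refine(pw, contour, best_wf)
```

A negative wf pushes probabilities down along detected contours, which is one simple way to re-open boundaries that the fusion step smoothed over.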

4. Results

2012 ISBI Challenges Results

There are 3 errors to be measured (these 3 errors were phased out in later competitions):

  • Rand Error: A measure of similarity between two clusters or segmentations. For the EM segmentation evaluation, the zero component of the original labels (background pixels of the ground truth) is excluded.
  • Warping Error: A segmentation metric that penalizes the topological disagreements (object splits and mergers).
  • Pixel Error: Squared Euclidean distance between the original and the result labels.
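Of the three, pixel error is the simplest to illustrate: it is just the squared Euclidean distance between the label maps. A tiny example with made-up 2×2 label maps:

```python
import numpy as np

gt   = np.array([[1, 0], [1, 1]], dtype=float)   # ground-truth labels
pred = np.array([[1, 0], [0, 1]], dtype=float)   # predicted labels

# Pixel error: squared Euclidean distance between the two label maps.
pixel_error = np.sum((gt - pred) ** 2)
print(pixel_error)  # 1.0 (one pixel disagrees)
```

Rand error and warping error are clustering- and topology-based and need more machinery, which is why they are harder to optimize directly.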

CUMedVision Versions

  • CUMedVision-N: N is the number of FCN networks. The final output is averaged over multiple FCNs for better results. This is a kind of ensemble (model averaging) technique, which has been used for many years; in deep learning, LeNet and AlexNet also used it.
  • CUMedVision-4(C1), CUMedVision-4(C2), CUMedVision-4(C3): These generally have higher errors.
  • CUMedVision-6(With C): With 6 FCNs, it has the lowest warping error.
  • CUMedVision-4(With fusion): By fusing the results from C1 to C3, it has the smallest rand error and pixel error.
  • CUMedVision-Ours: The best results from CUMedVision-4(With fusion) and CUMedVision-6(With C).
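The model-averaging step behind CUMedVision-N can be sketched in a few lines: average the probability maps of N independently trained FCNs, then threshold. The maps below are random placeholders; N=4 mirrors CUMedVision-4 but the values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical foreground-probability maps from N independently trained FCNs.
n_models = 4
probs = [rng.uniform(size=(16, 16)) for _ in range(n_models)]

# Ensemble by model averaging: mean probability map, then threshold
# to a binary segmentation mask.
mean_prob = np.mean(probs, axis=0)
segmentation = (mean_prob > 0.5).astype(np.uint8)
```

Averaging reduces the variance of any single network's mistakes, which is why the ensembles in the table outperform the individual branches.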

Inference time: 0.4 seconds for one test image with size 512×512.

Some Amazing Results: (Red: inner parts that look like boundaries but are not; Blue: blurred but successfully segmented boundaries)

If you have read about FCN for general image segmentation, you may find the FCN architectures very similar. CUMedVision1 successfully transplanted the FCN to biomedical image segmentation. If we have tasks that need automatic segmentation, especially binary segmentation, we may also try this solution.


  1. [2016 AAAI] [CUMedVision1]
    Deep Contextual Networks for Neuronal Structure Segmentation

My Reviews

[LeNet] [AlexNet] [FCN]