Review: Attention U-Net — Learning Where to Look for the Pancreas (Biomedical Image Segmentation)

With the Attention Gate (AG), the model automatically learns to focus on target structures of varying shapes and sizes.

Sik-Ho Tsang
Aug 27 · 5 min read

In this story, Attention U-Net, by Imperial College London, Nagoya University & Aichi Cancer Center, University of Luebeck, HeartFlow, and Babylon Health, is briefly reviewed.

  • With the Attention Gate (AG), the model automatically learns to focus on target structures of varying shapes and sizes.

This was published at MIDL 2018 and has more than 40 citations. (Sik-Ho Tsang @ Medium)


  1. Attention U-Net
  2. Analysis
  3. Experimental Results

1. Attention U-Net

Top: Attention Gate (AG), Bottom: Attention U-Net

1.1. Framework

  • As in U-Net and 3D U-Net, there is a contraction path on the left and an expansion path on the right.
  • Contraction path: a series of convs and max pooling that extracts features at progressively coarser scales, capturing context.
  • Expansion path: a series of upsampling and convs that restores the spatial resolution.
  • Feature maps at the same level are concatenated through skip connections.
  • Unlike U-Net (but like 3D U-Net), 3D convs are used because the input is a 3D CT image.
  • Another difference is that there is an Attention Gate (AG) on the skip connection at each level.
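
To make the "same level" pairing concrete, here is a minimal sketch of the spatial sizes along the two paths. It assumes size-preserving (padded) convs and 2×2×2 max pooling; the helper name `unet_level_shapes`, the input size and the depth are all hypothetical and not taken from the paper.

```python
def unet_level_shapes(input_shape, depth):
    """Spatial size of the feature maps at each level of the contraction path.

    Assumes size-preserving (padded) convs and 2x2x2 max pooling, so only
    the pooling changes the spatial size.
    """
    shapes = [tuple(input_shape)]
    for _ in range(depth):
        shapes.append(tuple(s // 2 for s in shapes[-1]))
    return shapes


# Hypothetical input volume of 64 x 128 x 128 voxels and 3 pooling steps.
shapes = unet_level_shapes((64, 128, 128), depth=3)

# Walk back up the expansion path: upsample the coarser map by 2 and pair it
# with the (gated) skip connection of the same size before concatenation.
for level in range(len(shapes) - 2, -1, -1):
    upsampled = tuple(2 * s for s in shapes[level + 1])
    print(f"level {level}: skip {shapes[level]} <- upsampled {upsampled}")
```

Each printed line pairs one skip connection with the upsampled map it is concatenated with; in Attention U-Net the skip side of every pair passes through an AG first.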

1.2. Attention Gate (AG)

  • The details of the AG are shown at the top of the figure above.
  • First, each of the two inputs (the skip-connection feature map and the gating signal from the coarser level) goes through its own 1×1×1 conv; the results are added together and passed through ReLU.
  • Second, another 1×1×1 conv is applied, this time with sigmoid as the activation function.
  • Since the sigmoid output lies in the range [0,1], it acts like a soft mask.
  • Unlike Residual Attention Network or SENet, which use channel-wise or class-wise masks, this is a voxel-wise mask.
  • After the sigmoid, the mask goes through the resampler, which is trilinear interpolation, so that its size matches the feature map it is element-wise multiplied with.
  • Finally, the gated feature maps are concatenated with the upsampled feature maps from the lower level.
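
The steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the weights are random, the channel counts are hypothetical, the skip features are brought to the coarse grid by plain striding (a stand-in for a strided conv), and the resampler is nearest-neighbour repetition instead of trilinear interpolation. The names `conv1x1x1` and `attention_gate` are made up for this example.

```python
import numpy as np


def conv1x1x1(x, weight, bias=None):
    """1x1x1 conv: the same linear map over channels at every voxel.

    x: (C_in, D, H, W), weight: (C_out, C_in), bias: (C_out,) or None.
    """
    out = np.einsum("oc,cdhw->odhw", weight, x)
    if bias is not None:
        out = out + bias[:, None, None, None]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(x, g, w_x, w_g, b_g, psi, b_psi):
    """x: skip features (C_x, 2D, 2H, 2W); g: gating signal (C_g, D, H, W)."""
    # Bring x onto the coarse grid of g (assumption: striding, not a strided conv).
    x_coarse = x[:, ::2, ::2, ::2]
    # Individual 1x1x1 convs on both inputs, summed, then ReLU.
    q = np.maximum(conv1x1x1(x_coarse, w_x) + conv1x1x1(g, w_g, b_g), 0.0)
    # Second 1x1x1 conv down to one channel, then sigmoid: a voxel-wise mask.
    alpha = sigmoid(conv1x1x1(q, psi, b_psi))
    # Resample the mask to the size of x (nearest neighbour here, trilinear in the paper).
    for axis in (1, 2, 3):
        alpha = np.repeat(alpha, 2, axis=axis)
    # Element-wise multiplication, broadcast over the channels of x.
    return alpha * x, alpha


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 8, 8))    # hypothetical skip features
g = rng.standard_normal((16, 2, 4, 4))   # hypothetical gating signal
w_x = 0.1 * rng.standard_normal((4, 8))
w_g = 0.1 * rng.standard_normal((4, 16))
b_g = np.zeros(4)
psi = 0.1 * rng.standard_normal((1, 4))
b_psi = np.zeros(1)

gated, alpha = attention_gate(x, g, w_x, w_g, b_g, psi, b_psi)
print(gated.shape, alpha.shape)
```

The mask has a single channel, so every channel of the skip features at a given voxel is scaled by the same coefficient; `gated` is what would then be concatenated with the upsampled decoder features.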

2. Analysis

  • From left to right (a-e, f-j): Axial and sagittal views of a 3D abdominal CT scan, attention coefficients, feature activations of a skip connection before and after gating.
  • Similarly, (k-n) visualise the gating on a coarse scale skip connection.
  • The filtered feature activations (d-e, i-j) are collected from multiple AGs. We can see that each gate selects a subset of organs.
The attention coefficients across different training epochs (3, 6, 10, 60, 150).
  • As shown in the figure above, the model gradually learns to focus on the pancreas, kidney, and spleen during training.

3. Experimental Results

3.1. Datasets

  • CT-150: 150 abdominal 3D CT scans acquired from patients diagnosed with gastric cancer (stomach cancer).
  • CT-82: 82 contrast-enhanced 3D CT scans with manual pancreas annotations performed slice by slice. This is the TCIA CT Pancreas benchmark (61 for training, 21 for testing).

3.2. Comparison with U-Net

Comparison with U-Net on CT-150
  • (120/30): 120 images for training, and 30 for testing.
  • (30/120): 30 images for training, and 120 for testing.
  • Under both settings, Attention U-Net consistently outperforms U-Net, with a higher Dice similarity coefficient (DSC) for different organs.
  • The inference time is only slightly longer than that of U-Net.
  • Since Attention U-Net has more parameters than U-Net without AGs, the authors also add more channels to U-Net so that its number of parameters is close to that of Attention U-Net. These results are also shown above.
  • Even so, the DSC of this larger U-Net is still lower than that of Attention U-Net.
  • Its inference time is also even longer.
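
For reference, the DSC between a predicted mask A and a ground-truth mask B is 2|A∩B| / (|A| + |B|). A minimal sketch on flattened binary masks is given below; the function name `dice_score`, the toy masks, and the choice to return 1.0 when both masks are empty are assumptions made for this example.

```python
def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks.

    pred, target: equal-length sequences of 0/1 (flattened voxel labels).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * intersection / total


# Toy example: 3 predicted voxels, 3 ground-truth voxels, 2 in common.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
print(dice_score(pred, target))  # 2 * 2 / (3 + 3) = 0.666...
```

A score of 1 means perfect overlap and 0 means no overlap.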

3.3. Fine-Tuning or Training From Scratch

TCIA CT Pancreas benchmark dataset
  • Initially, the models trained on the CT-150 dataset are directly applied to the CT-82 dataset to observe the applicability of the two models on different datasets.
  • BFT (before fine-tuning): Attention U-Net outperforms U-Net.
  • AFT (after fine-tuning): Attention U-Net still outperforms U-Net.
  • SCR (training the models from scratch): Attention U-Net still outperforms U-Net.

3.4. Comparison with State-of-the-art Approaches

Indirect Comparison with State-of-the-art Approaches
  • Attention U-Net adds only a few parameters (the AGs), uses a single model, has no cascaded U-Nets (which would bring many parameters), and needs no post-processing.
  • With this setup, Attention U-Net obtains a DSC of 81.48 ± 6.23 on CT-82, which is better than or comparable to other SOTA approaches.

3.5. Visualization

  • (a): Ground-truth pancreas segmentation is highlighted in blue.
  • (b): Ground-truth pancreas segmentation
  • (c): U-Net model prediction. The missed dense predictions by U-Net are highlighted with red arrows.
  • (d): Attention U-Net prediction.


[2018 MIDL] [Attention U-Net]
Attention U-Net: Learning Where to Look for the Pancreas

My Previous Reviews

Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]

Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]

Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN]

Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet] [Cascaded 3D U-Net] [Attention U-Net]

Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]

Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet]

Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]

Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD] [DS-CNN]

Generative Adversarial Network [GAN]
