Progressively Growing U-Nets: A New Perspective for Medical Image Segmentation

Jonas Massa
4 min read · Sep 18, 2019


Research on AI models has always been inspired by existing biological systems, particularly in the case of deep neural networks. More recently, facets of human learning have also been incorporated into this work. A fascinating example is the idea of curriculum learning, where the model starts by learning a simplified task and the complexity is increased in successive steps. This approach adapts ideas commonly used in the educational system, thereby drawing parallels to learning in human children. Karras et al. [2017] demonstrated the superior performance of this method for image synthesis tasks with their Progressively Growing GAN (ProgGAN) framework. So far, however, this technique has been applied in only a comparatively small number of research fields.

Training of the ProgGAN is implemented by sequentially adding layers of increasing resolution. The low-resolution layers first learn meaningful features on a coarse scale before the network is grown to the next resolution, and they continue to be trained in the process. The newly added high-resolution layers can then more easily focus on fine details, which is arguably harder to accomplish when the whole network is trained from scratch. Whenever untrained layers are added, a smooth transition is essential to prevent the already trained layers from collapsing.
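
Roughly, this fade-in can be sketched as follows. The block names (`new_block`, `to_rgb_new`, `to_rgb_old`) and the linear `alpha` schedule are illustrative, not the original ProgGAN code; `new_block` is assumed to upsample by 2× internally before applying its convolutions.

```python
import torch.nn.functional as F

def faded_output(features, new_block, to_rgb_new, to_rgb_old, alpha):
    """Blend the output of a freshly added high-resolution block with the
    upsampled output of the already trained lower-resolution path.

    alpha is ramped linearly from 0 to 1 while the new block trains, so the
    established layers never see an abrupt change in the loss landscape.
    """
    # Old path: skip the new block and simply upsample the low-resolution output.
    old = F.interpolate(to_rgb_old(features), scale_factor=2, mode="nearest")
    # New path: run the (initially untrained) high-resolution block.
    new = to_rgb_new(new_block(features))
    # Smooth transition between the two paths.
    return (1.0 - alpha) * old + alpha * new
```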

Interestingly, GANs can be used not only for image synthesis but also for segmentation. This is achieved by replacing the generator with a U-Net and employing it for supervised image-to-image translation. A similar approach was introduced by Collier et al. [2019]. The encoder structure of the U-Net remains fixed, whereas the decoder is grown during training. Additionally, skip connections are implemented at every scale.

Layers are added sequentially during training.
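
The sketch below shows one way such a growing decoder could be organized. The class, the channel widths, and the convention that the encoder `skips` are ordered from coarse to fine are my own illustrative assumptions, not the exact architecture from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GrowingDecoder(nn.Module):
    """U-Net decoder that is extended one resolution level at a time.

    The encoder is built at full depth from the start; only the decoder
    blocks up to `self.depth` are active, and a 1x1 convolution maps the
    currently finest feature map to a segmentation logit.
    """
    def __init__(self, channels=(512, 256, 128, 64, 32)):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(c_in + c_skip, c_out, 3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(0.2, inplace=True),
            )
            for c_in, c_skip, c_out in zip(channels[:-1], channels[1:], channels[1:])
        ])
        # One output head per resolution level, so the model can predict
        # at whichever scale is currently active.
        self.heads = nn.ModuleList([nn.Conv2d(c, 1, 1) for c in channels])
        self.depth = 0  # number of decoder blocks currently in use

    def grow(self):
        self.depth = min(self.depth + 1, len(self.blocks))

    def forward(self, bottleneck, skips):
        # `skips` holds encoder feature maps ordered from coarse to fine,
        # with skips[i] matching the resolution produced by block i.
        x = bottleneck
        for block, skip in zip(self.blocks[: self.depth], skips[: self.depth]):
            x = F.interpolate(x, scale_factor=2, mode="nearest")
            x = block(torch.cat([x, skip], dim=1))
        return self.heads[self.depth](x)
```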

For my master’s thesis, I employed the ProgGAN approach for the segmentation of retinal vessels on the DRIVE dataset (Staal et al. [2004]), which is of interest to the medical community for the treatment of diabetic retinopathy. The motivation behind this project was to investigate whether the high-resolution layers manage to focus on finer vessel structures and thereby make predictions with superior precision. A major challenge is the small size of the dataset, which contains only 20 training images. Current state-of-the-art algorithms still struggle to identify extremely thin vessels that are just one pixel wide.

Raw image (Left) and Ground Truth (Right).

Implementation Details

For the objective function, standard binary cross entropy was employed. The resolution levels of the input data were chosen as 40×40, 80×80, 160×160, 320×320 and 640×640. The model was trained on a desktop computer running Ubuntu 19.04 with a single Nvidia RTX 2070 GPU (8 GB of VRAM). Additionally, mixed-precision training was applied using the Apex library. The generator U-Net uses a combination of convolutional layers, batch normalization and LeakyReLU activations. Feature maps are downsampled with max-pooling and upsampled with nearest-neighbor interpolation. For all models, I use the Adam optimizer (β₁ = 0.0, β₂ = 0.5) with a learning rate of 0.0002.
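
Put together, the repeating building block and the optimizer setup look roughly like this. The LeakyReLU slope, the logits-based BCE loss, and the stand-in `model` are assumptions for illustration, not the exact thesis code.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Convolution + batch normalization + LeakyReLU, the repeating U-Net unit."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),  # slope is an assumed value
    )

# Downsampling with max-pooling, upsampling with nearest-neighbor interpolation.
downsample = nn.MaxPool2d(kernel_size=2)
upsample = nn.Upsample(scale_factor=2, mode="nearest")

# Binary cross entropy on the predicted vessel map (applied to logits here).
criterion = nn.BCEWithLogitsLoss()

# Adam with the betas and learning rate used for all models.
model = conv_block(3, 32)  # stand-in for the (progressively growing) U-Net
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.0, 0.5))
```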

ProgU-Net?

When the discriminator is removed from the training, one obtains a standalone progressively growing U-Net. As Son et al. [2017] already pointed out, a powerful discriminator can help guide the segmentation towards better results. However, there are some important issues that need to be taken into account:

  • Additional structures such as a discriminator can require a considerable amount of extra GPU memory.
  • Training tends to be less stable and is generally slower.
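
As a rough illustration of the difference between the two settings, the generator update only changes by one extra term. The adversarial weight `lambda_adv` and the discriminator interface below are assumptions, not the exact losses used in the thesis.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def segmentation_step(unet, images, masks, discriminator=None, lambda_adv=0.1):
    """One generator update: a plain ProgU-Net step if no discriminator is
    given, otherwise the GAN variant with an additional adversarial term."""
    logits = unet(images)
    loss = bce(logits, masks)
    if discriminator is not None:
        # Adversarial term: the discriminator should judge the predicted mask
        # as "real" (label 1) when paired with the input image.
        d_out = discriminator(images, torch.sigmoid(logits))
        loss = loss + lambda_adv * bce(d_out, torch.ones_like(d_out))
    return loss
```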

When comparing the final results, the growing GAN shows a slight improvement over the growing U-Net approach. More interestingly though, the growing U-Net achieves a better score than its static counterpart using the same hyperparameter settings.

Qualitative results of the static U-Net (Left) and the growing U-Net (Right). Violet-colored vessels indicate false negatives, whereas cyan-colored vessels are false positives.

Furthermore, the growing U-Net reduces the number of false-positive predictions. In my experiments, it achieved an AUC of 0.9771 and a Dice score of 0.8216, clearly outperforming the static version (AUC 0.958, Dice score 0.7307).

Conclusion

The aim of the present work was to examine the applicability of progressive growing to biomedical image segmentation, specifically with regard to fine retinal vessel structures. Although the ProgGAN approach did not outperform the static GAN employed by Son et al. [2017], the progressively growing U-Net performed considerably better than the plain U-Nets reported in other papers. To my knowledge, no previous work has focused on this particular aspect. Given its fast and stable training, I consider the growing U-Net a promising new approach that warrants further research. It is still unclear whether growing can generally improve existing U-Net frameworks or if the hyperparameter settings are in conflict with this method. Future work should therefore investigate the performance of growing U-Nets on a variety of other image segmentation tasks.
