Semantic segmentation to detect nuclei using U-Net

Prediction on the Data Science Bowl 2018 dataset using TensorFlow and Keras

Stanislav Melnikov
Analytics Vidhya
Sep 16, 2019


Introduction

This article shows how to use the U-Net convolutional neural network to detect nuclei in the Data Science Bowl 2018 Kaggle dataset. The TensorFlow/Keras frameworks are used to implement the model and the training and prediction procedures. The full code for this article is available on GitHub.

Dataset overview

The Data Science Bowl 2018 Kaggle dataset contains a large number of segmented nuclei images. The images were acquired under a variety of conditions and vary in cell type, magnification, and imaging modality (brightfield vs. fluorescence).

The dataset is organized as a set of folders, each containing two sub-folders: “images” and “masks”.

Data Science Bowl dataset structure

The sub-folder “images” contains a single file: an image of a tissue scan. The sub-folder “masks” contains several png files, one file per annotated nucleus. A mask file is a black-and-white picture of the same size as the original tissue scan, in which white pixels mark the location of the nucleus. The picture below shows a sample of the data. The original image is in the top left corner; the rest are mask images (only a few of the masks are shown).
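To make the folder layout concrete, below is a minimal sketch of loading one sample, assuming numpy and imageio (the author's actual loading code is in the notebooks on GitHub). The per-nucleus masks are merged into a single mask with np.maximum:

```python
import os
import numpy as np
import imageio.v2 as imageio

def load_sample(sample_dir):
    """Load the tissue scan and the combined nucleus mask for one folder."""
    images_dir = os.path.join(sample_dir, "images")
    image_name = os.listdir(images_dir)[0]          # a single scan file
    image = imageio.imread(os.path.join(images_dir, image_name))

    # Merge the one-png-per-nucleus masks into a single mask.
    masks_dir = os.path.join(sample_dir, "masks")
    mask = None
    for mask_name in os.listdir(masks_dir):
        nucleus = imageio.imread(os.path.join(masks_dir, mask_name))
        mask = nucleus if mask is None else np.maximum(mask, nucleus)
    return image, mask
```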

Scan images vary in color, magnification, and size. Below are some samples of the images (images are scaled to fit the document):

Dataset preparation

Image sizes

Convolutional neural networks work with images of the same size, while the Data Science Bowl 2018 Kaggle dataset contains images of different sizes. The first challenge is therefore to bring all dataset images to a common size. One possible approach is to resize the images to some standard size, but this loses information when downsizing and increases computational cost when upsizing. Resizing is even more questionable because the images have different width/height ratios: using different scaling coefficients along the x- and y-axes would distort the images.

In this article, we use another approach. The smallest image size in the dataset (256×256 pixels) is taken as the base. All images are decomposed into sets of tiles of this base size. An original 256×256 image is represented by a single tile; larger images are decomposed into overlapping 256×256 tiles. So an image of size 320×320 is represented by 6 tiles, as shown in the picture below:

Overlapping tiles has another positive effect: it increases the size of the training set.
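A sketch of the tiling step is shown below. The stride (i.e. the amount of overlap) is a free parameter here; the exact overlap rule used in the article, and therefore the number of tiles per image, is defined in the Dataset notebook on GitHub:

```python
def tile_positions(length, tile=256, stride=128):
    """Top-left coordinates of overlapping tiles along one axis."""
    if length <= tile:
        return [0]
    positions = list(range(0, length - tile + 1, stride))
    if positions[-1] != length - tile:
        positions.append(length - tile)   # make sure the border is covered
    return positions

def decompose(image, tile=256, stride=128):
    """Split a 2-D image into overlapping tile-by-tile patches."""
    return [image[y:y + tile, x:x + tile]
            for y in tile_positions(image.shape[0], tile, stride)
            for x in tile_positions(image.shape[1], tile, stride)]
```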

Image brightness

To bring all images into the same color space, color images are converted to grayscale.

Another challenge is that the dataset contains images of different brightness. Images with high brightness show nuclei as dark spots, while images with low brightness show nuclei as bright spots. See the picture below:

To mitigate the difference, we invert bright images.
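A minimal sketch of this normalization, assuming 8-bit pixels and a simple mean-brightness test (the 127 threshold is an assumption, not the exact rule from the article's code):

```python
import numpy as np

def normalize(rgb):
    """Convert to grayscale and invert bright images so that nuclei
    always end up as bright spots on a dark background."""
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights);
    # the slice drops a possible alpha channel.
    gray = rgb[..., :3] @ np.array([0.299, 0.587, 0.114])
    # If the image is bright overall, the nuclei are the dark spots,
    # so invert it (assumed threshold).
    if gray.mean() > 127:
        gray = 255.0 - gray
    return gray.astype(np.uint8)
```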

Training, validation and test sets

To achieve an even distribution of images with different brightness levels and nuclei densities, we divide the images into several clusters according to their raster statistics (min, max, mean, and standard deviation of pixel values). For details see the Dataset notebook on GitHub. The training, validation, and test datasets are created with equal proportions of each cluster. The image below shows several rows of images; each row represents a cluster.
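A sketch of this stratified split, assuming scikit-learn; the number of clusters and the split proportions below are illustrative assumptions, and the real values are in the Dataset notebook:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def raster_stats(gray):
    """Per-image statistics used as clustering features."""
    return [gray.min(), gray.max(), gray.mean(), gray.std()]

def stratified_split(images, n_clusters=8, seed=42):
    features = np.array([raster_stats(img) for img in images])
    clusters = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(features)
    idx = np.arange(len(images))
    # Stratifying by cluster keeps the cluster proportions equal in the
    # training, validation and test subsets (70/15/15 assumed here).
    train, rest = train_test_split(idx, test_size=0.3,
                                   stratify=clusters, random_state=seed)
    val, test = train_test_split(rest, test_size=0.5,
                                 stratify=clusters[rest], random_state=seed)
    return train, val, test
```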

U-Net architecture

The U-Net neural network architecture was originally proposed in U-Net: Convolutional Networks for Biomedical Image Segmentation. In the original version of the network, the input and output images have different sizes. In this article, we use a symmetric variant (https://github.com/zhixuhao/unet). The symmetry is achieved by using ‘same’ padding instead of the ‘valid’ padding of the original model.
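A condensed Keras sketch of this symmetric U-Net is below. It follows the common configuration of the zhixuhao variant (four down/up levels, 64 to 1024 filters); the article's actual model code on GitHub may differ in detail:

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two 3x3 convolutions with 'same' padding, so sizes are preserved."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 1)):
    inputs = layers.Input(input_shape)

    # Contracting path: convolution blocks followed by max pooling.
    skips, x = [], inputs
    for filters in (64, 128, 256, 512):
        x = conv_block(x, filters)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    x = conv_block(x, 1024)  # bottleneck

    # Expanding path: upsample, concatenate the skip connection, convolve.
    for filters, skip in zip((512, 256, 128, 64), reversed(skips)):
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skip])
        x = conv_block(x, filters)

    # A 1x1 convolution with sigmoid gives the per-pixel mask probability.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)
```

Thanks to the ‘same’ padding, the output mask has exactly the input size (256×256), so predicted tiles can be stitched back onto the original image.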

Training

The model was trained with the Adam optimizer and a learning rate of 1e-4. The picture below shows the evolution of the prediction masks generated at different steps of training. The last row shows the manual annotation.
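The corresponding compile/fit calls might look as follows; the optimizer and learning rate are as stated above, while the binary crossentropy loss and the batch size are assumptions:

```python
from tensorflow.keras.optimizers import Adam

model = build_unet()
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="binary_crossentropy",      # assumed loss
              metrics=["accuracy"])

# x_train / y_train: 256x256 grayscale tiles and their binary masks,
# scaled to [0, 1] and shaped (n, 256, 256, 1) (hypothetical arrays).
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=16, epochs=3)
```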

The plots below show how the loss, the accuracy, and the pixel difference changed during training of the model (a sketch of the pixel-difference metric follows the plots).

Loss plot
Accuracy plot
Pixel difference plot
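“Pixel difference” is not a built-in Keras metric; a plausible definition, assumed here, is the fraction of pixels where the binarized prediction disagrees with the annotation:

```python
import tensorflow as tf

def pixel_difference(y_true, y_pred, threshold=0.5):
    """Fraction of pixels where the thresholded prediction and the
    annotation disagree (an assumed definition of the plotted metric)."""
    pred = tf.cast(y_pred > threshold, tf.float32)
    true = tf.cast(y_true > threshold, tf.float32)
    return tf.reduce_mean(tf.abs(pred - true))
```

Such a function can be passed to model.compile(metrics=[...]) alongside the accuracy.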

Results

After 3 epochs of 4000 steps each, the training loss dropped to 0.006 and the validation loss to 0.14. The accuracy on the training dataset reached 0.9975, and the accuracy on the validation dataset 0.98. The picture below shows the prediction results for several images from the test dataset (first row: image; second row: prediction; third row: annotation):

The following picture shows the deviation of the results from the annotations for the images with the largest error (more than 6% of pixels classified wrongly). The first row contains the original images, the second row the predicted results, the third row the annotation masks, and the fourth row the pixel difference between the predicted and annotated images:

Despite the relatively large number of pixels that differ between the annotations and the predicted masks, the quality of the generated segmentation is not bad. Most differences appear on diffuse nucleus borders and overlapping nuclei. Nevertheless, all nuclei were detected.
