Simultaneous Segmentation and Classification of Nuclei in Multi-Tissue Histology Images

Published in Red Buffer, May 1, 2021

In the previous article, we gave a general overview of the role of AI in Computational Pathology. We discussed the importance of identifying cells (nuclei), which lets us carry out the morphometric analysis used to extract Digital Biomarkers and to perform downstream analysis.

This article gives a detailed explanation of Hover-Net, a state-of-the-art (SOTA) model that performs Nuclei Instance Segmentation and classification.

Network Architecture

Hover-Net follows the Encoder-Decoder design that is fairly common in segmentation models. The network takes an input patch of size 270x270x3, where 270 is the height and width of the patch and 3 is the number of RGB channels. The Encoder and Decoder of Hover-Net are explained in the sections below.

Encoder Block

The purpose of the Encoder Block in a segmentation model is to perform feature extraction on the input image. As we go through the Encoder Block the spatial resolution of feature maps decreases and the depth of the feature maps increases.

The arrangement of the layers in the Encoder block of Hover-Net is as follows:

Conv(7x7) -> Residual Block(x3,m=256) -> Residual Block(x4,m=512) -> Residual Block(x6,m=1024) -> Residual Block(x3,m=2048) -> Conv(1x1)

The Encoder's first layer is a Convolution Layer with a 7x7 filter, which gives it a large field of view. It is followed by batch normalization, which normalizes the activations, and a ReLU activation, which introduces non-linearity.
Next comes Preact-ResNet50, which consists of 4 groups of Residual Blocks containing 3, 4, 6, and 3 blocks respectively.
The last layer of the Encoder Block is a 1x1 Convolution, which is commonly used to reduce the depth of the feature map in a neural network.
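To make the layer arrangement above easier to scan, here is a minimal sketch that records the four residual stages as plain data and prints a readable summary. The names ENCODER_STAGES, total_residual_units, and describe_encoder are hypothetical and chosen only for this illustration; the numbers come from the arrangement listed in this article.

```python
# Encoder stages as listed in the article:
# (number of residual blocks in the stage, output channels m).
ENCODER_STAGES = [(3, 256), (4, 512), (6, 1024), (3, 2048)]


def total_residual_units(stages):
    """Count the residual blocks across all stages."""
    return sum(units for units, _ in stages)


def describe_encoder(stages):
    """Return one readable line per encoder layer group."""
    lines = ["Conv(7x7) + BatchNorm + ReLU"]
    for index, (units, channels) in enumerate(stages, start=1):
        lines.append(
            f"Stage {index}: {units} residual blocks, {channels} output channels"
        )
    lines.append("Conv(1x1)")
    return lines


for line in describe_encoder(ENCODER_STAGES):
    print(line)
print("Residual blocks in total:", total_residual_units(ENCODER_STAGES))
```

The stage sizes add up to 16 residual blocks. Assuming the standard bottleneck design with three convolutions per block, that is where the "50" in ResNet50 comes from: 16 x 3 convolutions plus the first convolution and the final classification layer of the original network.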

Decoder Block

The objective of the Decoder block is to upsample the features extracted by the Encoder Block. There are three branches in the Decoder block: the Nuclear Pixel (NP) branch, which segments the nuclei; the HoVer branch, which generates the Horizontal and Vertical Feature Maps used to separate overlapping nuclei; and the Nuclear Classification (NC) branch, which assigns a class label to each segmented nucleus.
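To give a feel for what the horizontal and vertical maps of the HoVer branch encode, the sketch below builds them for a single nucleus: each pixel stores its horizontal and vertical offset from the centre of the nucleus, scaled to the range [-1, 1]. The function name hover_maps, the pixel-list input format, and the exact scaling are assumptions made for this illustration, not the reference implementation.

```python
def hover_maps(pixels):
    """Map each (row, col) pixel of ONE nucleus to (horizontal, vertical) offsets.

    Offsets are measured from the centre of mass of the nucleus and scaled
    so that the farthest pixel in each direction has magnitude 1.
    """
    center_row = sum(r for r, _ in pixels) / len(pixels)
    center_col = sum(c for _, c in pixels) / len(pixels)
    # "or 1.0" avoids division by zero for a nucleus one pixel wide or tall.
    max_dx = max(abs(c - center_col) for _, c in pixels) or 1.0
    max_dy = max(abs(r - center_row) for r, _ in pixels) or 1.0
    return {
        (r, c): ((c - center_col) / max_dx, (r - center_row) / max_dy)
        for r, c in pixels
    }


# A toy 3x3 square nucleus given as (row, col) coordinates.
nucleus = [(r, c) for r in range(3) for c in range(3)]
maps = hover_maps(nucleus)
print(maps[(1, 1)])  # centre pixel
print(maps[(0, 0)])  # top-left pixel
print(maps[(2, 2)])  # bottom-right pixel
```

Because every nucleus gets its own map running from -1 to 1, the value jumps sharply at the border between two touching nuclei, which is what makes these maps useful for splitting overlapping instances.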

All three branches of the Decoder Block share the same structure apart from the last layer, which is task-specific.

The arrangement of the layers in each branch of the Decoder block of Hover-Net is as follows:

Upsample -> Conv(5x5) -> Dense Decoder Unit(x8) -> Conv(1x1) -> Upsample -> Conv(5x5) -> Dense Decoder Unit(x4) -> Conv(1x1) -> Upsample -> Conv(5x5) -> Conv(1x1)

The Upsample layer increases the size of the feature map by a factor of 2.
The Dense Decoder Unit consists of 2 pre-activation convolution layers; these units are stacked 8 times and 4 times respectively in the decoder branch.
The 1x1 convolution layer is used to reduce the depth of the upsampled feature maps.
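As a small illustration of the Upsample step, the sketch below doubles the height and width of a 2D feature map by repeating each value. Nearest-neighbour repetition is an assumption made for this example (the article only states the factor of 2), and upsample_2x is a hypothetical helper name.

```python
def upsample_2x(feature_map):
    """Double height and width of a 2D grid by repeating every value."""
    upsampled = []
    for row in feature_map:
        # Repeat each value twice along the width...
        wide_row = [value for value in row for _ in range(2)]
        # ...and emit the widened row twice along the height.
        upsampled.append(wide_row)
        upsampled.append(list(wide_row))
    return upsampled


small = [[1, 2],
         [3, 4]]
large = upsample_2x(small)
for row in large:
    print(row)
```

A decoder branch applies this step three times, so a feature map grows by a factor of 8 in each spatial dimension from upsampling alone.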

Loss Function

The Hover-Net loss function combines the losses of the three branches: the HoVer Branch (Horizontal and Vertical Feature Maps), the Nuclear Pixel Branch (NP), and the Nuclear Classification Branch (NC). These loss functions are explained in detail in the subsections below.

The overall loss is the weighted sum L = λa·La + λb·Lb + λc·Lc + λd·Ld + λe·Le + λf·Lf, where λa…λf are the weights associated with their respective losses.

HoVer Branch Loss

La is the mean squared error between the predicted horizontal and vertical distances and the ground truth (GT). Lb is the mean squared error between the horizontal gradient of the horizontal map and the vertical gradient of the vertical map, and the corresponding gradients of the GT.
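The two HoVer terms can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: gradients are approximated with forward differences, the loss is averaged over every pixel of the grid, and the helper names (mse, horizontal_gradient, vertical_gradient, hover_losses) are hypothetical. The actual model may use a different gradient operator and normalization.

```python
def mse(pred, target):
    """Mean squared error over two equally shaped 2D grids."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def horizontal_gradient(grid):
    """Forward difference along each row (x direction)."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in grid]


def vertical_gradient(grid):
    """Forward difference along each column (y direction)."""
    return [
        [grid[y + 1][x] - grid[y][x] for x in range(len(grid[0]))]
        for y in range(len(grid) - 1)
    ]


def hover_losses(pred_h, pred_v, gt_h, gt_v):
    """Return (La, Lb) for one pair of horizontal and vertical maps."""
    # La: MSE of the maps themselves, averaged over the two maps (assumption).
    loss_a = 0.5 * (mse(pred_h, gt_h) + mse(pred_v, gt_v))
    # Lb: MSE of the x-gradient of the horizontal map plus
    #     MSE of the y-gradient of the vertical map.
    loss_b = mse(horizontal_gradient(pred_h), horizontal_gradient(gt_h)) + mse(
        vertical_gradient(pred_v), vertical_gradient(gt_v)
    )
    return loss_a, loss_b


gt_h = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
gt_v = [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
# Prediction whose horizontal map is shifted by a constant 0.5.
pred_h = [[-0.5, 0.5, 1.5], [-0.5, 0.5, 1.5]]
pred_v = [row[:] for row in gt_v]
loss_a, loss_b = hover_losses(pred_h, pred_v, gt_h, gt_v)
print(loss_a, loss_b)
```

In this toy case La is non-zero because the values are shifted, while Lb is zero because a constant shift leaves the gradients unchanged. The two terms therefore penalize different kinds of error: La the values themselves, Lb the shape of the transitions.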

NP Branch Loss and NC Branch Loss

For the NP and NC branches, we calculate the cross-entropy loss (Lc and Le) and the dice loss (Ld and Lf). These two losses are then added together to give the overall loss of each branch.
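A minimal sketch of the two per-branch terms is shown below for the binary case (nucleus versus background), working on flat lists of predicted probabilities and 0/1 labels. The function names, the clamping constant, and the smoothing term eps are assumptions made for this illustration; for the NC branch the same idea would be applied across several classes.

```python
import math


def cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy for foreground probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def dice_loss(probs, labels, eps=1e-3):
    """One minus the Dice coefficient; eps keeps the ratio defined for empty masks."""
    intersection = sum(p * y for p, y in zip(probs, labels))
    denominator = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def branch_loss(probs, labels):
    """Unweighted sum of cross-entropy and dice loss for one branch."""
    return cross_entropy(probs, labels) + dice_loss(probs, labels)


labels = [1, 0, 1, 0]
print(branch_loss([1.0, 0.0, 1.0, 0.0], labels))  # near-perfect prediction
print(branch_loss([0.5, 0.5, 0.5, 0.5], labels))  # uninformative prediction
```

Cross-entropy scores every pixel independently, while the dice term looks at the overlap of the whole mask, which helps when nuclei cover only a small fraction of the patch.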

Cross-Entropy Loss and Dice Loss for NC and NP Branch

Evaluation Metrics

Hover-Net is evaluated on two tasks, each with its own metrics:

Nuclei Instance Segmentation

The evaluation metrics used for Instance Segmentation in Hover-Net are:

Nuclei Classification

F-Score
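For reference, the standard F-Score can be computed from counts of true positives, false positives, and false negatives as sketched below. This is the textbook F1 definition; the per-class variant used to score Hover-Net classification may weight the counts differently, and f_score is a hypothetical helper name.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Standard F1 score: harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        # No predictions and no ground-truth objects: define the score as 0.
        return 0.0
    return 2 * true_positives / denominator


# Example: 8 nuclei correctly detected, 2 spurious detections, 2 missed nuclei.
print(f_score(8, 2, 2))
```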

Datasets

The Hover-Net model has been trained and validated on 6 different datasets. These datasets are summarized in the table below.

Datasets used for Training and Validation of the Hover-Net Model

Model Hyper-Parameters

The Hover-Net model hyperparameters used for training and testing are listed below:

Image Size 270x270 (252x252)
Data Augmentation: Flip, Rotation, Gaussian Blur, and Median Blur
50 Epochs Training Using Pre-Trained Weights of Preact-Resnet 50
50 Epochs for Fine-Tuning All Layers.
Adam Optimizer with Learning Rate 0.0001
Reduction of Learning Rate to 0.00001 after 25 Epochs
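The learning-rate schedule in the list above can be written as a simple step function. This is a sketch, assuming zero-based epoch numbering and assuming the same schedule is applied within each 50-epoch phase; the constant and function names are hypothetical.

```python
BASE_LR = 1e-4      # Adam learning rate at the start of a phase
REDUCED_LR = 1e-5   # learning rate after the drop
DROP_EPOCH = 25     # number of epochs trained before the drop


def learning_rate(epoch):
    """Learning rate for a zero-based epoch index within one training phase."""
    return BASE_LR if epoch < DROP_EPOCH else REDUCED_LR


for epoch in (0, 24, 25, 49):
    print(epoch, learning_rate(epoch))
```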

Quantitative Results

The results of Hover-Net for Nuclei Instance Segmentation and Nuclei Classification can be seen in the sections below.

Nuclei Instance Segmentation

The table below shows the results of various instance segmentation models on different datasets. Hover-Net stands out in terms of performance against its counterparts.

Comparative results of various Instance Segmentation Models

Nuclei Classification

The table below compares the Nuclei Classification performance of Hover-Net with other models on the CoNSeP and CRCHisto datasets. Hover-Net again performs better than the other models.