Deep Gray Matter (DGM) Segmentation using 3D Convolutional Neural Network: application to QSM (Part 1)

Zhe Liu
Towards Data Science
6 min read · Oct 20, 2017


[Update 2018–02–04]: new results can be found in Part 2.

Code

Available on GitHub: https://github.com/zl376/segDGM_CNN

Motivation

Automated, computerized segmentation is desirable and indeed widely studied in MRI analysis. Specifically, for QSM (Quantitative Susceptibility Mapping), an MRI technique that quantifies and delineates the iron/calcium distribution in a subject, region of interest (ROI) measurement in deep gray matter (DGM) is a well-accepted metric for assessing agreement between different reconstruction algorithms. The DGM is usually selected as the target region because of its rich iron deposits, which manifest as bright signal on QSM.

Deep gray matter includes the basal ganglia (globus pallidus, putamen and caudate nucleus), the subthalamic nucleus and the substantia nigra.

Usually, DGM ROIs are drawn manually by experienced radiologists, which takes a painful amount of time and inherently limits the sample size of the analysis (<20 subjects). Therefore, I trained a neural network to segment the DGM on QSM, enabling ROI measurement analysis at a potentially large scale (e.g. ~500 subjects).

Method

A convolutional neural network (CNN) is chosen as the framework for segmentation. The model used in this work is based on Dolz et al., "3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study", NeuroImage 2017.

The current CNN is implemented with the following platform/packages:

  • TensorFlow
  • Python 3.5
  • Keras 2.0.2

Data Pre-processing

QSM images from n=20 subjects are collected and de-identified for this work. ROIs for each deep gray matter (DGM) structure are drawn on every case by an experienced expert and taken as the reference segmentation.

Example of QSM (left) and DGM ROIs (right)

The raw output of QSM reconstruction spans a range of matrix sizes, resolutions, and coverage (field of view), so the data need to be normalized before being fed to the network.

Data normalization

All 20 three-dimensional images are resampled to the same voxel size (1 mm, 1 mm, 3 mm) and cropped to a matrix size of (160, 220, 48). This provides a volume coverage of (16 cm, 22 cm, 14.4 cm), large enough for an average human brain. The susceptibility values are clipped to the range -0.2 ppm to 0.2 ppm and re-scaled to 0–255. 11 classes are extracted from the manual segmentation: 1 class for the background and 2 classes (left/right) for each of the 5 DGM structures. Processed images are saved as NIfTI files and fed to the CNN described below.
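
For concreteness, here is a minimal sketch of this normalization step (illustrative only, not the repository code; the function name, file paths, and the use of nibabel/scipy are assumptions):

```python
import numpy as np
import nibabel as nib
from scipy.ndimage import zoom

TARGET_VOX = (1.0, 1.0, 3.0)     # target voxel size in mm
TARGET_DIM = (160, 220, 48)      # target matrix size

def preprocess_qsm(path_in, path_out):
    img = nib.load(path_in)
    data = img.get_data().astype(np.float32)
    # Resample to the target voxel size (linear interpolation)
    factors = [v / t for v, t in zip(img.header.get_zooms()[:3], TARGET_VOX)]
    data = zoom(data, factors, order=1)
    # Center-crop or zero-pad each axis to the target matrix size
    out = np.zeros(TARGET_DIM, dtype=np.float32)
    src, dst = [], []
    for d_in, d_out in zip(data.shape, TARGET_DIM):
        n = min(d_in, d_out)
        src.append(slice((d_in - n) // 2, (d_in - n) // 2 + n))
        dst.append(slice((d_out - n) // 2, (d_out - n) // 2 + n))
    out[tuple(dst)] = data[tuple(src)]
    # Clip susceptibility to [-0.2, 0.2] ppm and rescale to 0-255
    out = np.clip(out, -0.2, 0.2)
    out = (out + 0.2) / 0.4 * 255.0
    nib.save(nib.Nifti1Image(out, np.eye(4)), path_out)
```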

15 cases are used as the training set, and 5 as the testing set.

Training the CNN

Overview of the 3D CNN, as proposed by Dolz et al., NeuroImage 2017.

As suggested in the solution for iSeg2017, each input 3D volume (both QSM and class labels) is divided into smaller patches of size (27, 27, 27), corresponding to an output patch size of (9, 9, 9), and patches containing only background labels are discarded from training. This trick dramatically reduces the data size and hence the memory needed for training. In this work, the slice thickness (3 mm) is 3-fold larger than theirs (1 mm), so the input patch size is chosen as (27, 27, 21), corresponding to an output size of (9, 9, 3).
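
A sketch of this patch extraction (illustrative, not the repository code): slide a (27, 27, 21) window over the volume with a stride matching the output patch, and keep only patches whose central (9, 9, 3) label block contains some foreground.

```python
import numpy as np

IN_PATCH = (27, 27, 21)   # input patch size
OUT_PATCH = (9, 9, 3)     # output (label) patch size
# Margin between input and output patches: (9, 9, 9)
MARGIN = tuple((i - o) // 2 for i, o in zip(IN_PATCH, OUT_PATCH))

def extract_patches(image, label):
    """Collect input/label patch pairs, discarding pure-background patches."""
    xs, ys = [], []
    for i in range(0, image.shape[0] - IN_PATCH[0] + 1, OUT_PATCH[0]):
        for j in range(0, image.shape[1] - IN_PATCH[1] + 1, OUT_PATCH[1]):
            for k in range(0, image.shape[2] - IN_PATCH[2] + 1, OUT_PATCH[2]):
                x = image[i:i + IN_PATCH[0],
                          j:j + IN_PATCH[1],
                          k:k + IN_PATCH[2]]
                y = label[i + MARGIN[0]:i + MARGIN[0] + OUT_PATCH[0],
                          j + MARGIN[1]:j + MARGIN[1] + OUT_PATCH[1],
                          k + MARGIN[2]:k + MARGIN[2] + OUT_PATCH[2]]
                if np.any(y > 0):   # keep only patches touching a DGM label
                    xs.append(x)
                    ys.append(y)
    return np.asarray(xs), np.asarray(ys)
```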

1/4 of the input data is held out for validation. Categorical cross-entropy is chosen as the loss function.
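
For illustration, here is a minimal Keras sketch of a network in this spirit (a simplification, not the repository model; the filter counts are assumptions). Nine un-padded 3x3x3 convolutions are one way to realize the (27, 27, 21) → (9, 9, 3) mapping, since each "valid" convolution trims 2 voxels per axis:

```python
from keras.models import Model
from keras.layers import Input, Conv3D

N_CLASSES = 11   # background + left/right of 5 DGM structures

def build_model():
    inp = Input(shape=(27, 27, 21, 1))   # single-channel QSM patch
    x = inp
    # Nine un-padded 3x3x3 convolutions: 27 - 2*9 = 9 and 21 - 2*9 = 3,
    # matching the (9, 9, 3) output patch size
    for n_filters in (32, 32, 32, 64, 64, 64, 128, 128, 128):
        x = Conv3D(n_filters, (3, 3, 3), activation='relu')(x)
    # Per-voxel 11-class softmax classifier
    out = Conv3D(N_CLASSES, (1, 1, 1), activation='softmax')(x)
    model = Model(inputs=inp, outputs=out)
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model
```

Calling `model.fit(x, y, validation_split=0.25, ...)` with one-hot labels of shape (9, 9, 3, 11) then reproduces the 1/4 validation split described above.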

Fine-tuning against false-positives

Since some patches are discarded before training, the information in those areas is never seen by the CNN. After training, if those discarded patches are passed through the network and the predictions are compared with the reference labels, it is observed that background voxels can be mis-classified as DGM, i.e. false positives.

These false positives are addressed with a second round of fine-tuning: the patches producing false positives are added to the input collection, and the network is re-trained on the augmented data, starting from the previously obtained "optimal" weights.
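
A sketch of how this fine-tuning pass could look (illustrative; the variable names `x_bg`/`y_bg` for the previously discarded background patches, the weight file name, and the batch size are assumptions):

```python
import numpy as np

def fine_tune(model, x_train, y_train, x_bg, y_bg,
              weights='weights_optimal.h5'):
    # Resume from the previously obtained "optimal" weights
    model.load_weights(weights)
    # Predict on the discarded background-only patches
    pred = model.predict(x_bg).argmax(axis=-1)          # (n, 9, 9, 3)
    # A patch is a false positive if any voxel is predicted as DGM
    fp = np.where((pred > 0).any(axis=(1, 2, 3)))[0]
    # Augment the training collection and re-train
    x_aug = np.concatenate([x_train, x_bg[fp]])
    y_aug = np.concatenate([y_train, y_bg[fp]])
    model.fit(x_aug, y_aug, validation_split=0.25,
              epochs=40, batch_size=32)
    return model
```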

Parameter selection

The learning rate is set to 0.0001 and later reduced to 0.00001. The maximum number of epochs is 40, and early stopping is triggered when the validation loss stops decreasing.
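
In Keras, this schedule can be expressed with callbacks (a sketch; the patience values and the plateau trigger are assumptions):

```python
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from keras.optimizers import Adam

model.compile(optimizer=Adam(lr=1e-4), loss='categorical_crossentropy')
callbacks = [
    # Drop the learning rate 10x (to 1e-5) when validation loss plateaus
    ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                      patience=3, min_lr=1e-5),
    # Stop early once validation loss no longer decreases
    EarlyStopping(monitor='val_loss', patience=5),
]
model.fit(x_train, y_train, validation_split=0.25,
          epochs=40, callbacks=callbacks)
```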

Post-processing

As noted in the large-scale study paper, the segmentation may contain small isolated "islands" outside the targeted DGM structure. This is handled by keeping only the largest connected component in the segmentation of each DGM ROI.
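
A sketch of this post-processing step with scipy (illustrative, not the repository code):

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(seg, n_classes=11):
    """Keep only the largest 26-connected component of each DGM label."""
    out = np.zeros_like(seg)
    structure = np.ones((3, 3, 3))           # 26-connectivity
    for c in range(1, n_classes):            # class 0 is background
        mask = (seg == c)
        labeled, n = ndimage.label(mask, structure=structure)
        if n == 0:
            continue
        sizes = ndimage.sum(mask, labeled, range(1, n + 1))
        out[labeled == (np.argmax(sizes) + 1)] = c
    return out
```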

Result and Evaluation

The precision (true positives vs. all positive predictions) and Dice score for each DGM are calculated and shown below.
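
For reference, the two metrics can be computed per structure as follows (a standard formulation, not copied from the repository):

```python
import numpy as np

def precision_and_dice(pred, ref, cls):
    """Precision and Dice score for one class label."""
    p = (pred == cls)
    r = (ref == cls)
    tp = np.logical_and(p, r).sum()
    precision = tp / float(p.sum())             # TP / all predicted positives
    dice = 2.0 * tp / float(p.sum() + r.sum())  # overlap agreement
    return precision, dice
```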

One good example:

One not-so-good example:

Under-estimation of specific DGM structures (especially the substantia nigra and caudate nucleus) is observed in the results; this also appears as low Dice scores for these two structures in the table.

Discussion

Sample size

In this work, the sample size is rather small (15 training + 5 testing), but it already gives promising results, as shown above. The expensive 3D convolutions and the memory required to hold all 3D patches are the major hindrances to using a larger sample size. On the current training platform (Core i3, 16 GB RAM, GTX 1080 Ti with 11 GB memory), the fine-tuning stage occupies ~50% of the GPU memory. Potentially, the sample size could be doubled (to ~40), and cross-validation could be employed to further evaluate the performance of the CNN.

Loss function

In the current training, categorical cross-entropy is used as the loss function. Since other metrics such as the Dice score are widely used to quantify ROI agreement, it would be interesting to see whether replacing cross-entropy with a Dice-based loss boosts the CNN's performance.
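
A sketch of what a soft (differentiable) Dice loss could look like in Keras (an assumption for future work, not part of the current training):

```python
from keras import backend as K

def soft_dice_loss(y_true, y_pred, smooth=1.0):
    # Sum over batch and spatial axes, keeping the class axis
    axes = [0, 1, 2, 3]
    intersection = K.sum(y_true * y_pred, axis=axes)
    denom = K.sum(y_true, axis=axes) + K.sum(y_pred, axis=axes)
    dice = (2.0 * intersection + smooth) / (denom + smooth)
    return 1.0 - K.mean(dice)        # average (1 - Dice) over classes

# Usage: model.compile(optimizer='adam', loss=soft_dice_loss)
```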

Under-segmentation

When applied to the 5 test cases, the CNN sometimes fails to delineate the entire structure of some DGM regions, especially the caudate nucleus. This might be related to the kernel size used in the convolutional layers.

This raises another interesting question: is under-segmentation really a problem for ROI analysis, especially for comparing different QSM reconstruction algorithms?

Multi-class vs. binary?

Given that the DGM structures are anatomically and functionally connected (at least in the basal ganglia, where the interfaces between the globus pallidus, putamen and caudate nucleus are difficult to determine), multi-class segmentation might not be the optimal framework for this specific task. Meanwhile, the entire task could be divided into smaller sub-tasks, each segmenting a single ROI with a binary classifier: for example, to segment the left globus pallidus, voxels in the target structure are labeled as class 1 and everything else as class 0. Training speed could benefit from this task division, since each sub-task (ROI) involves far fewer patches than the original multi-class task; when all sub-tasks are run simultaneously, training is accelerated by roughly a factor of the number of ROIs. The relabeling for one such sub-task is trivial, as sketched below.
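
A minimal sketch of this one-vs-rest relabeling (illustrative; `target_class` is whichever of the 10 DGM labels the sub-task segments):

```python
import numpy as np

def to_binary_label(label, target_class):
    """One-vs-rest relabeling: target structure -> 1, everything else -> 0."""
    return (label == target_class).astype(np.uint8)
```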

Multi-modality input

In the solution for iSeg2017, both T1- and T2-weighted images were used as input for each subject; the different contrasts in T1- and T2-weighted images can help distinguish different structures. In QSM studies, however, T1/T2 images are typically scarce or, if available, unregistered, since they are acquired in different sessions. Still, other image modalities such as the magnitude image (single echo) or the R2* map (showing how fast the signal decays across multiple echoes) are readily available with each QSM map and could be incorporated into the current CNN. Its performance with multi-modality input data will be an interesting topic to explore.
