Don’t Just Scan This: Deep Learning Techniques for MRI

Published in

Stanford AI for Healthcare

9 min read · Feb 7, 2018

Written by Nicholas Bien

Deep learning will soon help radiologists make faster and more accurate diagnoses.

Over the last decade, the ability of computer programs to extract information from images has increased tremendously. We owe most of this advancement to convolutional neural networks (CNNs), a type of neural network specialized for processing image data. CNNs have consistently outperformed classical machine learning (ML) techniques (e.g. support vector machines, random forests, k-nearest neighbors) since 2012, when AlexNet won the ImageNet Large Scale Visual Recognition Challenge (Krizhevsky et al. 2012). Most of the work of designing a classical ML algorithm lies in choosing appropriate features. In contrast, a deep neural network takes raw input (possibly after some preprocessing) and automatically learns features through training. It is thus essential to understand what data and architectures are best for the task at hand.

With that in mind, this article attempts to:

provide the necessary medical background and preprocessing tips for deep learning research on MRI
introduce CNN architectures for segmentation
survey current state-of-the-art MRI segmentation methods
discuss remaining challenges to improving deep learning models for MRI

MRI Basics

Magnetic resonance imaging (MRI) is an advanced imaging technique that is used to observe a variety of diseases and parts of the body. MRI’s unrivaled soft-tissue contrast makes it useful for detecting abnormal tissue known as “tumors” or “lesions”. Radiologists use MR exams, each of which consists of a series of cross-sectional gray-scale images, to diagnose disease, quantify tissue growth or atrophy over time, and guide surgical procedures. As we will see later, neural networks can analyze these images individually (as a radiologist would), or combine them into a single 3D volume to make predictions.

At a high level, MRI works by measuring the radio waves emitted by atoms subjected to a magnetic field. The appearance of tissue in an MRI depends on the tissue’s chemical composition and which particular MR “sequence” is employed. The most common sequence is T2-weighted MRI, in which tissues with more water or fat appear brighter due to their relatively high number of hydrogen atoms. In contrast, bone (as well as air) has low signal and appears dark on T2-weighted images. For brain MRIs, T1-weighted, T1-weighted with gadolinium contrast enhancement (T1-Gd) and Fluid Attenuated Inversion Recovery (FLAIR) are commonly used sequences along with T2-weighted images (Isin et al. 2016). In prostate cancer diagnosis, a different combination of sequences called multi-parametric MRI (mpMRI) is used (Sarkar et al. 2016). Determining which sequences to use for a given disorder or body part requires careful research or radiological expertise.

Brain MRIs labeled by sequence type. The far right image is a radiologist’s segmentation. (Havaei et al. 2016)

The deep learning task

Algorithmic methods for MRI analysis fall into two general categories: classification and segmentation. Classification assigns a label to an MRI series: normal/abnormal, level of severity, or a diagnosis. Segmentation is the process of delineating the boundaries, or “contours”, of various tissues. The common practices for data preprocessing, which we cover next, apply to both tasks. However, most research to date has focused on segmentation, so that will be the focus of the rest of this article.

MRI preprocessing

Most of the best models for MR imaging apply some preprocessing to their images before feeding them into a neural network. Some common preprocessing steps include:

Registration. If the patient moves during an MR screening, images may be offset from one another. If different sequences are combined in a single channel, or if a 3D network is used, then the images must first be aligned to a common orientation. In datasets hosted for public challenges, this is typically done before the data is released.
Bias field correction. MRI images are affected by bias field distortion, which causes the intensity to vary even across the same tissue (Pereira et al. 2016). The N4ITK method (Tustison 2011) is the most common method for correcting this.
Normalization. Due to the nature of MRI, even images of the same patient on the same scanner at different times can have different intensities. Many MRI segmentation models use an intensity normalization from Nyul et al. (2000) to alleviate this problem. Additionally, as is typical with CNNs, each input channel (i.e. sequence) is normalized to have zero mean and unit variance within the training set.
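
As a rough illustration of the last step only (per-channel zero mean and unit variance), here is a minimal NumPy sketch. It assumes images are stored with the sequence as the last axis; the function names, array shapes and synthetic intensities are hypothetical, and it does not implement registration, N4ITK or the Nyul et al. method.

```python
import numpy as np

def fit_channel_stats(train_images):
    """Compute per-channel mean and std over a training set.

    train_images: array of shape (N, H, W, C), where C indexes MR sequences.
    """
    mean = train_images.mean(axis=(0, 1, 2))
    std = train_images.std(axis=(0, 1, 2))
    return mean, std

def normalize(images, mean, std, eps=1e-8):
    """Apply training-set statistics to any array with channels last."""
    return (images - mean) / (std + eps)

rng = np.random.default_rng(0)
# Synthetic stand-in for four sequences with very different intensity scales.
train = rng.normal(
    loc=[100.0, 300.0, 50.0, 800.0],
    scale=[10.0, 40.0, 5.0, 90.0],
    size=(8, 16, 16, 4),
)
mean, std = fit_channel_stats(train)
train_norm = normalize(train, mean, std)
```

The statistics are computed once on the training set and then reused for validation and test images, so that all splits are scaled the same way.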

CNNs for Segmentation

While a CNN for classification outputs the probability of the entire image belonging to each class in question, a CNN for segmentation assigns a label to each pixel (or “voxel” if the image is 3D). An early approach to segmentation was to run patches of pixels, centered around the pixel of interest, through a CNN classifier (Ciresan et al. 2012). Doing this for each pixel produces the segmented image. Following Akkus et al. (2017), we will call this “patch-wise” segmentation. Although relatively simple, this approach is currently one of the most common for MRI segmentation tasks.

The goal of semantic segmentation is to predict a class for each pixel. The architecture above shows the “patch-wise” approach (Ciresan et al. 2012).
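
To make the patch-wise idea concrete, here is a minimal sketch that labels each pixel by classifying the patch centered on it. The classifier is a hypothetical stand-in (a threshold on the patch mean) rather than a trained CNN, and the patch size and reflect padding at the borders are assumptions for illustration.

```python
import numpy as np

def patchwise_segment(image, classify_patch, patch_size=5):
    """Label every pixel by classifying the patch centered on it.

    image: 2D array (H, W). classify_patch: function mapping a
    (patch_size, patch_size) array to an integer class label.
    patch_size is assumed to be odd so that each patch has a center pixel.
    """
    half = patch_size // 2
    # Reflect-pad so that border pixels also get a full patch.
    padded = np.pad(image, half, mode="reflect")
    labels = np.zeros(image.shape, dtype=np.int64)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            patch = padded[r:r + patch_size, c:c + patch_size]
            labels[r, c] = classify_patch(patch)
    return labels

def toy_classifier(patch):
    """Hypothetical stand-in for a trained CNN: threshold the patch mean."""
    return int(patch.mean() > 0.5)

image = np.zeros((12, 12))
image[3:9, 3:9] = 1.0  # a bright square standing in for a lesion
segmentation = patchwise_segment(image, toy_classifier)
```

The double loop runs the classifier once per pixel, which is why this approach is slow compared with networks that predict all pixels in one pass.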

Long et al. (2014) introduced fully convolutional networks (FCNs) for semantic segmentation. This network utilizes an encoder-decoder structure, where the input image is passed first through layers of convolution and then through layers of upsampling to produce pixel predictions of the same size as the original input. SegNet (Badrinarayanan et al. 2015) improves upon the original FCN, and achieves state-of-the-art performance today (Garcia-Garcia et al. 2017). Since the encoder-decoder architecture generates all pixel predictions at once, it is faster than patch-wise segmentation. An additional advantage is that a pre-trained CNN for classification can be used for the encoder portion of the network (however, the lack of large MRI datasets makes this less useful for MRI segmentation).

Example of encoder-decoder segmentation by SegNet (Badrinarayanan et al. 2015)
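
The size bookkeeping of an encoder-decoder can be sketched without any learned weights. The toy example below only shows how downsampling followed by upsampling returns a map of the original size; it uses 2x2 max pooling and nearest-neighbor upsampling as assumed stand-ins, whereas real FCN and SegNet models interleave these steps with learned convolutions.

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample a (H, W) map by taking the max over 2x2 blocks (H, W even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbor upsampling: repeat each value over a 2x2 block."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

image = np.arange(64, dtype=float).reshape(8, 8)
encoded = max_pool_2x2(max_pool_2x2(image))   # 8x8 -> 4x4 -> 2x2
decoded = upsample_2x(upsample_2x(encoded))   # 2x2 -> 4x4 -> 8x8
```

Because `decoded` has one value per input pixel, a final per-pixel classifier can produce the whole segmentation in a single forward pass.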

Other segmentation architectures attempt to explicitly solve the problem of local dependencies while still using a patch-wise network. For example, in a cascading architecture, one CNN outputs class probabilities for each pixel, which are then fed to a second CNN for the final segmentation. Another approach is to apply conditional random fields or other probabilistic models during post-processing.

The principles of 2D segmentation carry over well to 3D segmentation, but with an increase in computational complexity. Because of this, 3D volumes are sometimes segmented slice-by-slice by a 2D CNN then reconstructed. V-Net (Milletari et al. 2016) is a good example of a fully convolutional 3D segmentation network, which happens to be for segmenting prostate MRIs. The medical imaging field would particularly benefit from techniques for making 3D segmentation algorithms more efficient.
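
The slice-by-slice strategy mentioned above can be sketched as follows. The 2D model here is a hypothetical threshold function standing in for a trained 2D CNN, and the volume is assumed to be stored with the slice index as the first axis.

```python
import numpy as np

def segment_volume_by_slices(volume, segment_slice):
    """Apply a 2D segmentation function to each slice and restack.

    volume: array of shape (D, H, W).
    segment_slice: function mapping an (H, W) slice to (H, W) labels.
    """
    return np.stack([segment_slice(s) for s in volume], axis=0)

def toy_slice_model(slice_2d):
    """Hypothetical stand-in for a trained 2D CNN: intensity threshold."""
    return (slice_2d > 0.5).astype(np.int64)

volume = np.zeros((4, 8, 8))
volume[1:3, 2:6, 2:6] = 1.0  # a small bright block spanning two slices
labels_3d = segment_volume_by_slices(volume, toy_slice_model)
```

This keeps memory and compute close to the 2D case, at the cost of ignoring context between neighboring slices.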

State-of-the-art MRI Segmentation

The most established field for automatic MRI algorithms is brain tumor segmentation. In this field, the BRATS 2013 challenge dataset has become a benchmark for model comparison. Segmentation models are evaluated using Dice scores, which are calculated as:

$$\mathrm{Dice}(P, T) = \frac{2\,|P \cap T|}{|P| + |T|}$$

where P represents the segmented area and T represents the ground truth area. Dice scores range from 0 to 1, where a score of 1 represents perfect segmentation. Each model in the BRATS challenge receives three Dice scores, one for each part of the tumor (whole, core, and active).
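
For reference, the Dice score is commonly computed as twice the overlap between prediction and ground truth, divided by the sum of their sizes. The sketch below assumes binary masks of equal shape; treating two empty masks as a perfect match is a convention chosen here, not something specified by the challenge.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice = 2|P ∩ T| / (|P| + |T|) for boolean masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * np.logical_and(pred, truth).sum() / denom

truth = np.zeros((10, 10), dtype=bool)
truth[2:6, 2:6] = True   # 16 ground-truth pixels
pred = np.zeros((10, 10), dtype=bool)
pred[4:8, 2:6] = True    # 16 predicted pixels, 8 of which overlap the truth
score = dice_score(pred, truth)  # 2 * 8 / (16 + 16) = 0.5
```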

Isin et al. (2016) summarize the best models on the BRATS 2013 dataset to date. We analyze the two best automatic (as opposed to semi-automatic) models, as well as the best 3D model. The models, along with their Dice scores for the whole, core, and active tumor regions, are as follows (for reference, a human rater scored 0.88, 0.93, 0.74):

Pereira et al. 2016: 0.88, 0.83, 0.77
Havaei et al. 2017: 0.88, 0.79, 0.73
Urban et al. 2014: 0.87, 0.78, 0.74

In all of the models, the last dimension of the input is for sequences rather than colors. All applied some preprocessing, with that of Pereira et al. being the most extensive. All three models also made use of a post-processing technique whereby connected components below a certain size threshold are removed from the prediction. Pereira et al. augmented the data using 90° rotations, while Havaei et al. and Urban et al. did not find data augmentation helpful.
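
The connected-component post-processing step can be sketched as below. This is a plain breadth-first search over a 2D binary mask; the 4-connectivity and the size threshold are assumptions for illustration, since the thresholds used by the cited models are not given here. In practice a library routine such as `scipy.ndimage.label` would usually do the labeling.

```python
from collections import deque
import numpy as np

def remove_small_components(mask, min_size):
    """Drop 4-connected foreground components with fewer than min_size pixels."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros_like(mask, dtype=bool)
    cleaned = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or visited[r0, c0]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r0, c0)])
            visited[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and mask[nr, nc] and not visited[nr, nc]:
                        visited[nr, nc] = True
                        queue.append((nr, nc))
            # Keep the component only if it is large enough.
            if len(component) >= min_size:
                for r, c in component:
                    cleaned[r, c] = True
    return cleaned

prediction = np.zeros((8, 8), dtype=bool)
prediction[1:5, 1:5] = True   # 16-pixel region, kept
prediction[6, 6] = True       # isolated pixel, removed
cleaned = remove_small_components(prediction, min_size=4)
```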

Havaei et al. chose not to pursue a 3-dimensional network because the spacing between MRI slices wasn’t consistent. However, Urban et al. found comparable results with a relatively simple 3D patch-wise network. Pereira et al. introduced smaller filter sizes in the convolutional layers in order to train a deeper (11-layer) network. Their network also adds dropout to the fully connected layers to reduce overfitting. The architecture of Havaei et al. is the most complex, with two cascading CNNs, one for generating class probabilities and one for generating final predictions. In dealing with unbalanced class sizes (roughly 98% of the pixels are healthy), Havaei et al. first train with an equal number of healthy and unhealthy patches, then re-train only the output layer with a more representative distribution. (A more common way to handle this is to weight misclassification of unhealthy pixels more than misclassification of healthy pixels.) How the strengths of these and other models can be combined remains to be explored.
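
The weighting approach mentioned in parentheses above can be illustrated with a class-weighted cross-entropy. This is a minimal sketch with made-up predictions; the inverse-frequency weights are one illustrative choice and are not taken from any of the cited papers.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean cross-entropy, each pixel weighted by its true class.

    probs: (N, K) predicted class probabilities.
    labels: (N,) integer class labels.
    class_weights: (K,) weights, typically larger for rarer classes.
    """
    probs = np.clip(probs, 1e-12, 1.0)
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class
    weights = class_weights[labels]
    return float(np.sum(-weights * np.log(picked)) / np.sum(weights))

# 98 healthy pixels (class 0) and 2 tumor pixels (class 1).
labels = np.array([0] * 98 + [1] * 2)
# A degenerate model that always predicts "healthy" with 90% confidence.
probs = np.tile([0.9, 0.1], (100, 1))

uniform = weighted_cross_entropy(probs, labels, np.array([1.0, 1.0]))
# Inverse-frequency weights (an assumption for illustration).
balanced = weighted_cross_entropy(probs, labels, np.array([1 / 0.98, 1 / 0.02]))
```

With uniform weights the always-healthy model gets a low loss because tumor pixels are so rare; with the balanced weights the two missed tumor pixels contribute as much as all the healthy ones, so the same model is penalized much more heavily.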

Remaining Challenges

So far we have discussed many technical difficulties of working with MRI data, like how to normalize images, how to manage class imbalance, and how to handle multiple sequences. However, as deep learning models improve to the point of human performance in many medical imaging tasks, we will run into two bigger problems:

Data isn’t readily available in large quantities. Most online datasets for medical imaging contain hundreds of images, whereas ImageNet has 14 million. More collaboration between healthcare and tech is needed to establish a better pipeline for data acquisition.
What is ground truth? Radiologists often disagree significantly on the segmentation or diagnosis called for by an MRI. Deep learning models can often deal with random variability in ground truth labels, but any systemic bias in radiology will persist in deep learning models trained on radiologists’ predictions. On the flip side, a deep learning system that exceeds human performance could inform new discoveries in radiology, just as medical imaging problems are stimulating new developments in deep learning today.

I am extremely grateful to Matthew Lungren MD MPH, Assistant Professor of Radiology at the Stanford University Medical Center, and Bhavik Patel, MD, MBA, Assistant Professor of Radiology at the Stanford University Medical Center, for their guidance and valuable feedback. I would also like to thank Pranav Rajpurkar, Jeremy Irvin, Shubhang Desai, and Tanay Kothari of the Stanford Machine Learning Group for their comments.
