Superman isn’t the only one with X-Ray vision: Deep Learning for CT Scans

Tanay Kothari
Stanford AI for Healthcare
10 min read · Feb 5, 2018


Superman performing a CT scan in the pre-deep learning era. [Source]

If you were about to eat an apple and weren't sure whether it was rotten inside, you could cut it open and look. If you want to inspect a person, however, it's neither easy nor advisable to treat them like a suspect apple. That's where CT scans come to the rescue.

A computerized tomography (CT) scan consists of a series of X-ray images taken from different angles and combines them to create cross-sectional images, or slices, of the bones, blood vessels and soft tissues inside your body. These are much more detailed than single plain X-rays.

This article covers what CT scans are used for, how to use CT data for machine learning and other research, the current deep learning work with CT scans, the challenges researchers face, and other avenues for exploration.

Computed tomography of a human brain, from the base of the skull to the top. [Source]

What are CT Scans used for?

A CT scan generates X-ray images of slices of the body, allowing radiologists to analyze the interior of the body without surgery.

A CT scan has many uses, but is particularly well-suited to quickly examine people who may have internal injuries from car accidents or other types of trauma. A CT scan can be used to visualize nearly all parts of the body and is used to diagnose disease or injury as well as to plan medical, surgical or radiation treatment.

A series of CT scans converted into an animated image using Photoshop [Source]

Since CT scans consist of slices that together form a 3D view, the advent of this technology has revolutionized the healthcare industry by allowing radiologists to analyze the body in high-resolution, 3-dimensional representations. These provide an immense amount of information used to diagnose a variety of diseases in different parts of the body, from detecting tumor growth and metastases to internal injuries, bleeding, clots, malformations, and fractures, providing far better diagnoses than was previously possible with individual X-rays.

Data from CT Scans

Hounsfield Units

CT scans are generally distributed as DICOM files, each of which contains a 2D array of pixel intensities. These, however, aren't in the standard 0–255 range; instead, they are in Hounsfield Units, also called CT Numbers.

A DCM file is an image file saved in the Digital Imaging and Communications in Medicine (DICOM) image format. It stores a medical image, such as a CT scan or ultrasound, and may also contain information about the patient.

Hounsfield Units (HU) measure the radiodensity of a particular medium based on its attenuation coefficient.

Attenuation coefficients give a measure of how easily X-rays can pass through a particular medium. It quantifies how much the beam is weakened as it passes through the given material.
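Concretely, the Hounsfield scale is a linear transformation of the attenuation coefficient μ, calibrated so that distilled water sits at 0 HU and air at -1000 HU:

HU = 1000 × (μ − μ_water) / (μ_water − μ_air)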

Hounsfield values range from -1000 HU for air, to 0 HU for distilled water, to over +10,000 HU for metals. In CT scans, we usually care about the -700 to +100 range, where

  • Lungs occur from -700 to -600
  • Muscles are between +35 and +55
  • Blood is +13 to +50, while clotted blood is +50 to +75
  • Other organs and matter are around +10 to +100
  • Anything above +500 is typically dense bone or a foreign body such as metal, glass, or rock.

Once we clamp our input to a particular intensity window range as a preprocessing step, we can normalize the resulting data to 0–255 and produce an image which can be used to train machine learning models.
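As a rough illustration, here is a minimal preprocessing sketch using pydicom and NumPy; the window bounds and the file path are placeholders to adapt to the task at hand:

```python
import numpy as np
import pydicom

def load_hu_image(dcm_path):
    """Read a single DICOM slice and convert raw pixel values to Hounsfield Units."""
    ds = pydicom.dcmread(dcm_path)
    pixels = ds.pixel_array.astype(np.float32)
    # The rescale slope/intercept stored in the DICOM header map raw values to HU.
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", 0.0))
    return pixels * slope + intercept

def window_and_normalize(hu_image, hu_min=-700, hu_max=100):
    """Clamp to an HU window and rescale to 0-255 for model input."""
    clipped = np.clip(hu_image, hu_min, hu_max)
    normalized = (clipped - hu_min) / (hu_max - hu_min) * 255.0
    return normalized.astype(np.uint8)

# Example usage (the path is hypothetical):
# img = window_and_normalize(load_hu_image("trial_001/slice_042.dcm"))
```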

Data from a single trial

A trial contains all the data collected from a single visit by a patient for a procedure. In the case of a CT scan, this would consist of multiple DCM files, each containing an image of a single slice.

Each CT scan acquires images along the axial plane — these are typically also used to produce visualizations along the coronal and sagittal orientations.

Single axial slice of a chest CT scan [Source: Radiopaedia]
Single slices from CT scans along the Coronal and Sagittal orientations of the chest
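To make the relationship between the axial slices in a series and the coronal and sagittal views concrete, here is a small sketch that stacks the slices of one series into a 3D volume (sorted by the slice position stored in the DICOM headers) and pulls out the other two orientations as plain array slices. The directory path and the example indices are hypothetical.

```python
import glob
import numpy as np
import pydicom

def load_series_volume(series_dir):
    """Stack all axial slices of a series into a (depth, height, width) HU volume."""
    slices = [pydicom.dcmread(p) for p in glob.glob(f"{series_dir}/*.dcm")]
    # Sort by physical position along the scanner's z-axis so the stack is in order.
    slices.sort(key=lambda ds: float(ds.ImagePositionPatient[2]))
    volume = np.stack([ds.pixel_array.astype(np.float32) for ds in slices])
    # Use the rescale parameters of the first slice for the whole series (a simplification).
    slope = float(getattr(slices[0], "RescaleSlope", 1.0))
    intercept = float(getattr(slices[0], "RescaleIntercept", 0.0))
    return volume * slope + intercept

# The three anatomical orientations are just different axes of the same array:
# volume = load_series_volume("trial_001/ct_series")   # hypothetical path
# axial    = volume[40, :, :]    # one axial slice
# coronal  = volume[:, 256, :]   # one coronal slice
# sagittal = volume[:, :, 256]   # one sagittal slice
```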

This data can be used in multiple ways:

  • Process one image at a time: This loses the 3D detail that CT scans provide; however, it has been shown to perform well for analyzing lung nodules when detecting lung cancer.
  • Processing images in a series: Multiple images of a series along any particular orientation can be used together. This is the approach used by all of the current state-of-the-art models.
  • Processing multiple series: This is more computationally expensive and usually only provides marginal improvement over other methods. It involves processing all the images in a series, and using multiple series with an ensemble of models to process each one of them.

Deep Learning for CT Scans

Depending on how we process the data, we can have different models:

  • Single image processing: This is done using standard 2D convolutional networks like DenseNet and ResNet. These can either be used in an end-to-end pipeline that directly produces the output, or as feature extractors whose representations are passed into another model.
    With enough data, end-to-end models have been shown to outperform pipelines of separate models on these tasks.
  • Series / Multi-image processing: The images can be stacked together to form a 3D volume (or 3D point cloud), which can then be processed by 3D convolutional networks (see the sketch after this list).
    Another way to process this data is sequentially: by passing the slices through a recurrent neural network such as an LSTM, using hidden-layer features from pre-trained models like DenseNet.
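As a rough illustration of the first approach, here is a minimal 3D convolutional classifier in PyTorch. The layer sizes and the two-class output are arbitrary choices for the sketch, not the architecture of any published model.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Toy 3D CNN that classifies a stack of CT slices as normal vs. abnormal."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),  # global pooling handles variable volume sizes
            nn.Flatten(),
            nn.Linear(32, 2),
        )

    def forward(self, x):  # x: (batch, 1, depth, height, width)
        return self.classifier(self.features(x))

# volume = torch.randn(1, 1, 64, 128, 128)   # e.g. 64 slices of 128x128
# logits = Tiny3DCNN()(volume)               # shape (1, 2)
```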

Current Work

  • 2D Convolutional Neural Networks have been the standard approach for lung cancer detection from individual CT images. Some examples of this can be seen in this Github Repo.
  • Historically, CT scans have been used in conjunction with PET scans to detect lesions for lung cancer. Krzysztof Pawełczyk et al., 2017, show that they can detect and segment lesions based on only CT scan data using 3D Convolutional Networks.
    The work also details a dynamic way of managing class imbalance in the dataset, which is very common for medical datasets: every few epochs, accuracy is computed on the validation set for each class, and a new training subset is created in which the classes with lower accuracy appear in higher proportion (a simple sketch of this idea follows this list).
The architecture utilizing a 3D convolutional layer to process the set of sequential images
  • In the recent Lung Nodule Analysis challenge (LUNA16), all of the top submissions used CNNs to achieve good performance. One crucial factor in their success was the availability of expert knowledge: understanding localization, the processes used by radiologists, and other relevant medical expertise allows for model design choices that greatly improve performance on the task. As in many competitions, most teams used similar models and techniques, namely pre-trained CNN architectures; it was the preprocessing and data augmentation informed by expert knowledge that gave teams the edge. Sample code to work with the LUNA16 dataset: Github Repo
  • A recent work from Monika Grewal et al. of Parallel Dots, published in January 2018, introduced a model they call RADNet (Recurrent Attention DenseNet). It uses a DenseNet architecture to extract features, which are then passed through a bidirectional LSTM layer so that the context around each image in the series can be used to make better predictions.
  • Segmentation remains one of the major tasks in CT scan analysis. Patrick Christ et al. [11] [12] have shown that they can get much better performance on liver and tumor segmentation using Cascaded Fully Convolutional Networks (CFCNs) combined with Conditional Random Fields (CRFs). CRFs have been one of the most popular algorithms for segmentation in medical imaging over the past decade, and they continue to be used along with their 3D variants. Other architectures like U-Net [13] (for 2D image slices) and V-Net [14] (for 3D image volumes) have shown remarkable performance in image segmentation by using context from multiple resolutions to produce a final segmentation map.
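Below is a minimal sketch of the kind of dynamic rebalancing described above for the Pawełczyk et al. work: every few epochs, per-class validation accuracy is measured and a new training subset is drawn with the weaker classes oversampled. The inverse-accuracy weighting here is a simple stand-in, not the paper's exact scheme.

```python
import numpy as np

def rebalanced_indices(labels, per_class_val_accuracy, subset_size, rng=None):
    """Sample a training subset where classes with lower validation accuracy
    appear in higher proportion (sample weights ~ 1 / accuracy)."""
    rng = rng or np.random.default_rng()
    labels = np.asarray(labels)
    # Higher weight for classes the model currently handles poorly.
    class_weights = {c: 1.0 / max(acc, 1e-3) for c, acc in per_class_val_accuracy.items()}
    sample_weights = np.array([class_weights[c] for c in labels], dtype=np.float64)
    sample_weights /= sample_weights.sum()
    return rng.choice(len(labels), size=subset_size, replace=False, p=sample_weights)

# Example: class 1 (lesions) currently has low accuracy, so it gets oversampled.
# idx = rebalanced_indices(train_labels, {0: 0.95, 1: 0.60}, subset_size=5000)
```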

Challenges

  • As discussed in A Survey on Deep Learning in Medical Image Analysis, models for processing medical images have greatly benefited from pre-training on natural image datasets like ImageNet. They have also been shown to perform better when pre-trained on other medical imaging data, which adapts them to better leverage the intrinsic structure of medical images. It is a non-trivial task to figure out how best to use expert knowledge as a prior for a model, and whether its effect on the model will be positive or negative.
  • The data from CT scans is very high dimensional. Each series generally contains around 70–300 high-resolution images, which leads to a very large input dimensionality. However, the information useful for discriminating abnormal from normal trials can be minuscule in comparison, and is sometimes present only in a small subset of the data. One way to tackle this is to use Gaussian Mixture Models to pinpoint viable regions (a simple sketch follows below); however, this is still an open and very important problem.
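As a hedged illustration of that last idea, the sketch below fits a Gaussian Mixture Model to the HU values of a volume and keeps only the voxels assigned to components whose mean intensity falls inside a range of interest. The component count and the target range are illustrative choices, not a validated pipeline.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def candidate_region_mask(volume_hu, n_components=4, hu_range=(-700, 100), max_samples=200_000):
    """Return a boolean mask of voxels belonging to GMM components
    whose mean intensity lies inside hu_range."""
    voxels = volume_hu.reshape(-1, 1).astype(np.float32)
    # Fit on a random subsample to keep the GMM cheap on large volumes.
    rng = np.random.default_rng(0)
    sample = voxels[rng.choice(len(voxels), size=min(max_samples, len(voxels)), replace=False)]
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(sample)
    # Keep components whose mean HU falls in the intensity range of interest.
    interesting = [i for i, mu in enumerate(gmm.means_.ravel())
                   if hu_range[0] <= mu <= hu_range[1]]
    labels = gmm.predict(voxels)
    return np.isin(labels, interesting).reshape(volume_hu.shape)
```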

The Future of Deep Learning with CT scans

In the last few years, novel architectures like U-Net, V-Net, and cascaded fully convolutional networks have been shown to perform much better on segmentation and classification tasks when working with CT scan images. There is immense opportunity for new models to be developed to tackle some of the challenges of training with CT data. In addition, various techniques that have been used in other fields to process similar formats of data could be used to build models that perform better on medical imaging datasets:

  • VoxNet [5]: This work goes in depth on using 3D convolutional networks to learn deep representations of 3D volumetric point-cloud input data for object recognition and classification tasks. It provides a good baseline for working with 3D data.
  • OctNet [6]: It hierarchically and dynamically partitions the input into sections of different sizes based on the amount of detail they contain. This lets the network pay more attention to intricate, detail-rich sections, which could help in tasks like detecting nodules and lesions.
Static resolution of the input using a DenseConvNet vs using the OctNet architecture.

Conclusion

Since 2016, many papers have appeared that use publicly available datasets of CT scans of the abdomen, chest, and brain. However, a majority of them work with individual image slices and 2D CNNs. Working with the entire 3D series is an area being intensely explored right now. There is potential for applying various other model architectures to tackle these datasets in their full form, and for using machine learning for other diagnostic and detection tasks. As we develop better models, get access to more data, and increase our computational capacity, it will be exciting to see how these tools can aid radiologists and medical practitioners in providing better diagnoses.

I am extremely grateful to Matthew Lungren, MD, MPH, Assistant Professor of Radiology, and Bhavik Patel, MD, MBA, Assistant Professor of Radiology, at the Stanford University Medical Center for their guidance and valuable feedback. I would also like to thank the members of the Stanford Machine Learning Group for their comments.

Further Reading and References

[1] Towards Detecting High-Uptake Lesions from Lung CT Scans Using Deep Learning. Krzysztof Pawełczyk et al. 13 Oct 2017

[2] RADNET: Radiologist Level Accuracy using Deep Learning for Hemorrhage Detection in CT Scans. Monika Grewal et al. 3 Jan 2018

[3] A Survey on Deep Learning in Medical Image Analysis. Geert Litjens, Thijs Kooi et al. 4 Jun 2017

[4] LUNA16 Grand Challenge (Lung Nodule Analysis): Results from 2016

[5] VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition. Daniel Maturana and Sebastian Scherer. IROS, 2015

[6] OctNet: Learning Deep 3D Representations at High Resolutions. Gernot Riegler et al. 10 April 2017

[7] Image Data (Pixel/Voxel Data). Johannes Gutenberg University, Institute of Computer Science.

[8] Pixel-Based Machine Learning in Medical Imaging. Kenji Suzuki. 17 October 2011

[9] Automatic segmentation of liver tumor in CT images with deep convolutional neural networks. Li et al. October 2015

[10] Automatic liver tumor segmentation in follow-up CT studies using convolutional neural networks. Vivanti R., et al. 7 September 2015.

[11] Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields. Christ et al. 7 October 2016.

[12] Automatic Liver and Tumor Segmentation of CT and MRI Volumes Using Cascaded Fully Convolutional Neural Networks. Christ et al. 23 February 2017.

[13] U-Net: Convolutional Networks for Biomedical Image Segmentation. Ronneberger et al. 18 May 2015

[14] V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Fausto Milletari et al. 15 June 2016.

