Improving breast cancer screenings through machine learning

Profs. Kyunghyun Cho and Krzysztof J. Geras tackle inconclusive scans and false positives in their latest paper

Although deep convolutional neural networks perform well at object recognition in natural images, the technology remains difficult to apply in medicine. Medical professionals need high-resolution images that make fine details visible, and they need to view the human body from multiple angles to reach a complete diagnosis for their patients.

Enter the multi-view deep convolutional neural network (MV-DCN), created by the Center’s Professors Kyunghyun Cho and Krzysztof J. Geras, and a research team from NYU’s School of Medicine (Stacey Wolfson, S. Gene Kim, and Linda Moy). They published their findings this past March in “High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks.”

Breast cancer screening images served as the focus, the team explained, since breast cancer is the second leading cause of cancer death among women in the United States. Moreover, mammograms, the main imaging test for detecting breast cancer, lack precision. Between 10% and 15% of women who are screened are called back for further screenings or biopsies, and most of these recalls turn out to be false positives, creating anxiety and unnecessary costs.

MV-DCN, however, promises to improve the breast cancer screening process through its innovative architecture, which handles the four standard views, or angles, without sacrificing high resolution. Whereas the DCN architectures commonly used for natural images work with images of 224 x 224 pixels, MV-DCN works at a resolution of 2600 x 2000 pixels.
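To give a sense of scale for the resolutions mentioned above, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the one-value-per-pixel, 4-bytes-per-value storage is an assumption made for illustration, not a detail taken from the paper.

```python
# Back-of-the-envelope comparison of input sizes (illustrative only).
NATURAL_IMAGE = (224, 224)     # typical natural-image input, per the article
MAMMOGRAM_VIEW = (2600, 2000)  # resolution MV-DCN works with, per the article
NUM_VIEWS = 4                  # four standard views per screening exam
BYTES_PER_VALUE = 4            # assumption: one float32 value per pixel


def pixel_count(shape):
    """Return the number of pixels in an image of the given (height, width)."""
    height, width = shape
    return height * width


natural_pixels = pixel_count(NATURAL_IMAGE)
view_pixels = pixel_count(MAMMOGRAM_VIEW)
exam_pixels = NUM_VIEWS * view_pixels

# How many times more pixels one mammogram view has than a natural image.
ratio = view_pixels / natural_pixels

# Raw size of the four input views alone, before any network activations.
exam_megabytes = exam_pixels * BYTES_PER_VALUE / 1e6

print(f"natural image:  {natural_pixels:,} pixels")
print(f"mammogram view: {view_pixels:,} pixels (about {ratio:.0f}x more)")
print(f"full exam:      {exam_pixels:,} pixels, about {exam_megabytes:.1f} MB as float32")
```

Under these assumptions, a single view holds roughly a hundred times as many pixels as a typical natural-image input, and that is before counting the intermediate values the network computes for each layer.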

Additionally, MV-DCN produces predictions for medical professionals by assigning a probability to each of three possible outcomes (incomplete, normal, and benign), and it also indicates which part of the breast, if any, needs to be examined further.

Training MV-DCN, however, presented some stumbling blocks. Training neural networks typically depends on millions of annotated images, and medical images are much harder to acquire in such quantities. Although the researchers collected the largest data set of this kind in the literature, its 103,000 mammogram scans (collected from the NYU School of Medicine) remain small in comparison to data sets of natural images.
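As a rough illustration of how a network can assign probabilities to the three outcomes named above, the sketch below applies a softmax to a set of raw scores. The softmax output is an assumption (it is a common choice, but the article does not say what MV-DCN uses), and the function names and example scores are hypothetical.

```python
import math

# The three outcomes named in the article.
OUTCOMES = ("incomplete", "normal", "benign")


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Pair each outcome with its probability and return the most likely one."""
    probs = dict(zip(OUTCOMES, softmax(scores)))
    best = max(probs, key=probs.get)
    return probs, best


# Hypothetical raw scores for one exam; these are not taken from the paper.
example_scores = [0.2, 1.5, -0.3]
probabilities, most_likely = predict(example_scores)

for outcome, p in probabilities.items():
    print(f"{outcome}: {p:.3f}")
print("most likely:", most_likely)
```

Reporting the full distribution rather than a single label lets a reader see how confident the model is, which matters when the decision is whether to call a patient back.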

Deep neural networks also require a lot of computation to get from the input, which in this case is the four standard views in a breast cancer screening exam, to the output, a probability distribution over the possible decisions. Larger images like the ones used in this network amplify that computational cost, and because the images are so large, training speed is limited by the memory of the hardware.

In conclusion, the researchers found that performance increases with a larger training set and that the original resolution is necessary to achieve the best performance. They now hope to improve MV-DCN’s capabilities by collecting an even larger data set.

by Nayla Al-Mamlouk


Originally published at cds.nyu.edu on April 10, 2017.

Center for Data Science

This is the official research blog of the NYU Center for Data Science (CDS). Established in 2013, we are a leading data science training and research facility, offering an MS in Data Science and, as of 2017, one of the nation’s first Ph.D. programs in Data Science.
