Artificial intelligence for breast cancer detection

Oct 11, 2021

Abstract

In 2020, there were 2.3 million women diagnosed with breast cancer and 685 000 deaths globally. Screening for breast cancer with mammography has been introduced in various countries over the last 30 years.

In the last couple of years, commercial products based on artificial intelligence (AI) have been developed, and studies comparing their performance with that of experienced breast radiologists show that these algorithms perform on par with humans in retrospective data sets.

A, Digital mammography (DM) and, B, digital breast tomosynthesis (DBT) images in a 66-year-old woman not recalled during any of the original readings. Artificial intelligence (AI) identified a spiculated mass (outlined) on images obtained with both techniques during screening. C, DM image obtained 4 months later, after she discovered a palpable lump (not related to the actual cancer). Biopsy was performed, and an interval cancer, a grade II invasive ductal carcinoma of 6 mm, was diagnosed in the lesion that would have been recalled by AI.

Screening and Diagnosis

Digital mammography (DM) results in an immediate digital image, ready for evaluation and interpretation by the radiologist. However, mammography-based screening programs are subject to some limitations. First, mammography sensitivity is lower with higher breast density. As a result, up to 20%–30% of breast cancers go undetected during screening and later manifest symptomatically as interval cancers. Second, it is estimated that at least one out of three women participating in screening will have a false-positive recall during her lifetime, which not only harms women but also increases the cost and workload of health care systems.

When radiologists interpret screening mammograms, they search for lesions with very different characteristics that can be divided into two broad categories: calcification clusters and soft tissue findings. The calcifications of interest for the detection of breast cancer are small (as little as 0.2 mm) and relatively high in contrast. The shape of the calcifications and the distribution of the cluster are important biomarkers for malignancy. Soft tissue lesions are of different types: masses (with different shape and margin descriptors, such as spiculated, smooth, obscured, irregular), architectural distortions (abnormal configuration of the fibroglandular tissue), and asymmetries (dense tissue patterns in one breast with no correspondence in the contralateral breast).

To overcome the limitations of DM that stem from its two-dimensional nature, digital breast tomosynthesis (DBT) was introduced. DBT produces a stack of 2D slices of the imaged breast, which adds some resolution in the vertical direction. This partial tomographic effect reduces the masking effect of superimposed tissues. DBT has been shown to improve breast cancer screening detection rates by 30%–90% compared with DM, with a variable impact on recall rate.

Diagram of a (a) digital mammogram and a (b) digital breast tomosynthesis acquisition. Tissues in the breast that are only separated in the vertical direction appear superimposed in the mammogram, resulting in a loss of sensitivity and specificity. This effect is ameliorated in digital breast tomosynthesis by reconstructing a pseudo-3D image from several projections, each acquired with the x-ray source positioned at a different angle.

Deep learning convolutional neural networks

Deep learning convolutional neural networks (CNNs) process an image through multiple sequential stages, called layers. Each layer applies usually simple operators (multiplication, addition, and convolution) that combine the spatially correlated information contained in the image. Across these stages, the information is broken down into different representations, and analyzing these more abstract and simpler representations is what allows the network to recognize the image accurately.
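To make the idea of stacked layers concrete, here is a minimal sketch in plain Python. It is only an illustration: the tiny 4x4 "image", the two hand-picked kernels, and the function names `conv2d` and `relu` are assumptions made for this example, and a real CNN learns its kernels from data rather than having them written by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, this is technically a
    cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1] for _ in range(4)]

# Layer 1: a hand-picked kernel that responds to dark-to-bright transitions.
edge_kernel = [[-1, 1], [-1, 1]]
layer1 = relu(conv2d(image, edge_kernel))

# Layer 2: a summing kernel that pools the edge responses of layer 1
# into a coarser, more abstract representation.
sum_kernel = [[1, 1], [1, 1]]
layer2 = relu(conv2d(layer1, sum_kernel))

print(layer1)  # 3x3 map, non-zero only in the column where the edge sits
print(layer2)  # 2x2 map summarizing "how much edge" each region contains
```

Each layer is just multiplications and additions over small neighborhoods, and the output of one layer becomes the input of the next, which is the multi-stage process described above.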

Case Study

The data for this study were retrospectively collected from the Córdoba Tomosynthesis Screening Trial. A total of 15 987 DM and DBT examinations (which included 98 screening-detected and 15 interval cancers) from 15 986 women (mean age ± standard deviation, 58 years ± 6) were evaluated. The DM and DBT images were independently read by four breast radiologists.

Flowchart of the original screening strategies and how they were compared with the artificial intelligence (AI)–based screening strategy. If the original setting used digital mammography (DM), AI scores computed on DM images were used. Similarly, if the original setting used digital breast tomosynthesis (DBT), the AI scores computed on only DBT images were used. Cases were considered very likely normal if the AI score was 7 or lower (approximately 70% of screening volume).

An AI system based on a deep learning algorithm was used in this study to detect lesions suspicious for breast cancer on DM and DBT images. The most suspicious findings detected by the system are marked on every image and assigned a score between 1 and 100. Based on these findings, an examination-level score from 1 to 10 was generated, indicating the increasing likelihood that a visible cancer is present on the mammogram. The DBT images and the DM images of each examination were independently processed by the AI system, resulting in two AI scores per examination: an AI-DM score and an AI-DBT score.
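The triage rule described in the flowchart can be sketched as follows. This is a hypothetical illustration, not the vendor's software: the `Examination` class, the `triage` function, and the label strings are names invented here. The only details taken from the text are the 1 to 10 examination score, the two scores per examination, and the cutoff of 7 for "very likely normal".

```python
from dataclasses import dataclass

# Cutoff taken from the flowchart description: scores of 7 or lower
# are considered very likely normal.
NORMAL_THRESHOLD = 7


@dataclass
class Examination:
    exam_id: str        # hypothetical identifier
    ai_dm_score: int    # examination score (1-10) computed on DM images
    ai_dbt_score: int   # examination score (1-10) computed on DBT images


def triage(exam: Examination, modality: str) -> str:
    """Pick the AI score that matches the screening modality and apply the cutoff."""
    if modality == "DM":
        score = exam.ai_dm_score
    elif modality == "DBT":
        score = exam.ai_dbt_score
    else:
        raise ValueError(f"unknown modality: {modality!r}")
    if not 1 <= score <= 10:
        raise ValueError(f"examination score out of range: {score}")
    if score <= NORMAL_THRESHOLD:
        return "very likely normal"
    return "radiologist reading"


exam = Examination(exam_id="example-001", ai_dm_score=7, ai_dbt_score=9)
print(triage(exam, "DM"))   # very likely normal
print(triage(exam, "DBT"))  # radiologist reading
```

Because the two scores are computed independently, the same examination can be triaged differently depending on which modality the screening program uses.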

The output of the AI triaging was analyzed by a panel of radiologists (with 3 to 20 years of experience), and findings were considered true-positive only if the system correctly localized them and assigned them the highest suspicion score in the examination (on the region suspicion scale of 1–100).

The screening reading workload, sensitivity, and recall rate were compared between each original screening setting and the AI-based screening strategy by using the McNemar test for paired data, with an α of .05 indicating statistical significance. Screening workload was defined as the number of readings, and an estimate in hours was computed using the average reading time per examination, 25 seconds for a DM examination and 64 seconds for a DBT plus DM or synthetic mammography examination.
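As a rough sketch of these two calculations: the workload estimate is simply the number of readings multiplied by the average reading time, and McNemar's test compares paired outcomes using only the discordant pairs. The text does not say which variant of the test was used, so the exact binomial version below is an assumption, and the discordant counts in the example are made up purely for illustration.

```python
from math import comb

DM_SECONDS = 25    # average reading time for a DM examination
DBT_SECONDS = 64   # average reading time for a DBT plus DM or synthetic mammography examination


def workload_hours(n_readings: int, seconds_per_reading: float) -> float:
    """Convert a number of readings into an estimated workload in hours."""
    return n_readings * seconds_per_reading / 3600


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b and c are the discordant pair counts, for example cancers found
    by strategy A only and by strategy B only.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Double reading of DM: two readings for each of the 15 987 examinations.
print(round(workload_hours(2 * 15987, DM_SECONDS)))  # 222

# Hypothetical discordant counts, NOT taken from the study.
p_value = mcnemar_exact_p(b=2, c=12)
print(p_value < 0.05)  # True
```

The first result reproduces the 222 hours reported for the original double reading of DM.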

When comparing the AI-based strategy of DBT to the original double reading of DM, it was observed that AI-based DBT screening would have been carried out with a smaller workload (156 hours vs 222 hours, a relative workload reduction of 29.7% [95% CI: 23.8, 36.2], P < .001). The sensitivity would have been 25.0% higher in relative terms (95% CI: 15.8, 36.3; P < .001), with 95 of 113 cancers detected with AI-DBT screening (84.1%; 95% CI: 76.2, 89.7) and 76 of 113 with unaided DM screening (67.3%; 95% CI: 58.2, 75.2). Moreover, the recall rate would have been 27.1% lower in relative terms (95% CI: 24.1, 30.3; P < .001), with 588 of 15 987 women recalled with AI-DBT screening (3.7%; 95% CI: 3.4, 4.0) and 807 of 15 987 women recalled with unaided DM screening (5.1%; 95% CI: 4.7, 5.4).
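The relative figures in this paragraph follow directly from the raw counts, as the short check below shows. The function names are invented for this example. The interval method is also an assumption: the text does not state how the confidence intervals for the proportions were computed, and a Wilson score interval is used here simply because it gives bounds close to the reported ones.

```python
from math import sqrt


def relative_change(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old` (0.25 means +25%)."""
    return (new - old) / old


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


N_CANCERS = 113
N_WOMEN = 15987

# Sensitivity: 95 vs 76 cancers detected out of 113.
sens_gain = relative_change(95 / N_CANCERS, 76 / N_CANCERS)
# Recall rate: 588 vs 807 women recalled out of 15 987.
recall_change = relative_change(588 / N_WOMEN, 807 / N_WOMEN)
# Workload: 156 vs 222 hours.
workload_change = relative_change(156, 222)

print(f"sensitivity: {sens_gain:+.1%}")      # +25.0%
print(f"recall rate: {recall_change:+.1%}")  # -27.1%
print(f"workload:    {workload_change:+.1%}")  # -29.7%

low, high = wilson_interval(95, N_CANCERS)
print(f"AI-DBT sensitivity 95% CI: {low:.1%} to {high:.1%}")  # 76.2% to 89.7%
```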

Result

In a retrospective study of 15 987 examinations, artificial intelligence (AI) reduced screening workload by up to 70% for both digital mammography (DM)-based and digital breast tomosynthesis (DBT)-based screening programs without reducing sensitivity by 5% or more.

Using AI to transition from DM screening to DBT screening would yield a reduction of 30% in workload, a 25% improvement in sensitivity, and a reduction of 27% in recall rate.

Conclusion

Digital mammography and digital breast tomosynthesis screening strategies based on artificial intelligence systems could reduce workload by up to 70%.