Google DeepMind: Improving the Interpretability of the “Black Box” in Medical Imaging

Susan Ruyu Qi · Published in Health.AI · 6 min read · Aug 23, 2018
Segmentation video of an OCT scan, taken from DeepMind’s original publication

The key barrier for AI in healthcare is the “Black Box” problem. Most AI models are hard to interpret, and it is difficult to understand why they make a certain diagnosis or recommendation. This is a huge issue in medicine, for both physicians and patients.

DeepMind’s study, published last week in Nature Medicine, presents an artificial intelligence (AI) system capable of diagnosing many ophthalmic conditions from 3D retinal OCT scans. Its performance is on par with the best retinal specialists and superior to some human experts.

This AI system’s accuracy and range of diagnoses are certainly impressive, and it is the first AI model to reach expert-level performance on 3D diagnostic scans. From a clinical point of view, however, what is even more groundbreaking is the ingenious way in which the system operates and mimics the real-life clinical decision process. It addresses the “Black Box” issue, which has been one of the biggest barriers to the integration of AI technologies in healthcare.

An optical coherence tomography (OCT) scan of the retina

Two Neural Networks

DeepMind’s AI system improves the interpretability of the “Black Box” by using a framework with two separate neural networks. Instead of training one single neural network to identify pathologies from medical images, which would require a large amount of labelled data per pathology, their framework decouples the process into two stages: 1) segmentation (identifying the structures on the images) and 2) classification (analyzing the segmentation to produce diagnoses and referral suggestions).

DeepMind’s framework tackles the “Black Box” problem with two neural networks and a readily viewable intermediate representation (the tissue map) in between
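To make the decoupling concrete, here is a minimal sketch of how such a two-stage pipeline could be wired together in PyTorch. The module definitions, tensor shapes and class counts are illustrative assumptions for this post, not DeepMind’s actual architecture (their segmentation network, for instance, is a full three-dimensional U-Net):

```python
import torch
import torch.nn as nn

# Illustrative constants -- not DeepMind's actual values.
N_TISSUE_CLASSES = 15   # hypothetical number of tissue / disease-feature / artifact classes
N_DIAGNOSES = 10        # hypothetical number of retinal pathologies
N_REFERRALS = 4         # urgent, semi-urgent, routine, observation-only

class SegmentationNet(nn.Module):
    """Stand-in for the 3D U-Net: raw OCT volume -> per-voxel tissue map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv3d(1, N_TISSUE_CLASSES, kernel_size=3, padding=1)

    def forward(self, volume):           # (B, 1, D, H, W) raw scan
        return self.net(volume)          # (B, C_tissue, D, H, W) logits

class ClassificationNet(nn.Module):
    """Stand-in for the second network: tissue map -> diagnosis + referral."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(N_TISSUE_CLASSES, 8, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
        )
        self.diagnosis_head = nn.Linear(8, N_DIAGNOSES)
        self.referral_head = nn.Linear(8, N_REFERRALS)

    def forward(self, tissue_map):
        h = self.features(tissue_map)
        return self.diagnosis_head(h), self.referral_head(h)

# The intermediate tissue map is an ordinary tensor that a clinician-facing
# viewer can render and inspect before the second network ever sees it.
scan = torch.randn(1, 1, 16, 64, 64)                  # toy OCT volume
tissue_map = SegmentationNet()(scan).softmax(dim=1)   # viewable intermediate representation
diagnosis_logits, referral_logits = ClassificationNet()(tissue_map)
```

The point of the design shows in the last three lines: the hand-off between the two networks is a human-readable tissue map rather than an opaque feature vector.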

1. The Segmentation Network

Using a three-dimensional U-Net architecture, this first neural network translates raw OCT scans into tissue maps. It was trained on 877 clinical OCT scans. Of each scan’s 128 slices, only about three representative ones were manually segmented. This sparse annotation procedure significantly reduced the labelling workload and allowed the team to cover a large variety of scans and pathologies. The tissue maps identify the anatomy shown (the ten layers of the retina) and label disease features (e.g., intraretinal fluid, hemorrhage) as well as imaging artifacts.
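A hedged sketch of how such sparse annotation can be exploited in training: the segmentation loss is computed only on the handful of manually labelled slices of each volume, and the unlabelled slices are masked out. The function below illustrates the general technique under that assumption; it is not DeepMind’s training code.

```python
import torch
import torch.nn.functional as F

def sparse_slice_loss(logits, labels, annotated_slices):
    """Cross-entropy over only the manually segmented slices of a volume.

    logits: (B, C, D, H, W) segmentation logits for a D-slice OCT volume
    labels: (B, D, H, W) integer tissue labels (values on unlabelled slices are ignored)
    annotated_slices: indices of the slices that were manually segmented
    """
    return F.cross_entropy(
        logits[:, :, annotated_slices],   # keep only the labelled slices
        labels[:, annotated_slices],
    )

# e.g. ~3 representative slices annotated out of a 128-slice scan
logits = torch.randn(1, 15, 128, 64, 64)
labels = torch.randint(0, 15, (1, 128, 64, 64))
loss = sparse_slice_loss(logits, labels, annotated_slices=[20, 64, 110])
```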

This process mimics the typical clinical decision process: it allows physicians to inspect the AI’s segmentation and gain insight into the neural network’s “reasoning”. This intermediate representation is key to the future integration of AI into clinical practice, and it is particularly useful in difficult and ambiguous cases, where physicians can inspect and visualize the automated segmentation rather than simply being presented with a diagnosis and referral suggestion.

This segmentation technology also has enormous potential in clinical training as it can help professionals to learn to read medical images.

Furthermore, it can be used to quantify and measure retinal pathologies. Currently, retinal experts can only eyeball the differences between current and past OCT scans to gauge disease progression (e.g., more intraretinal fluid). With the AI’s automated segmentation, however, quantitative information such as the location and volume of any anomalies can be derived automatically. This data can then be used for disease tracking and research, for example as an endpoint in clinical trials.
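As an illustration of how simple this becomes once a tissue map exists: with the scanner’s voxel dimensions known, the volume of, say, intraretinal fluid reduces to counting labelled voxels. The label index and voxel size below are made-up values for the sketch.

```python
import numpy as np

VOXEL_SIZE_MM = (0.047, 0.004, 0.012)  # hypothetical (depth, height, width) voxel size
FLUID_LABEL = 11                        # made-up index of the "intraretinal fluid" class

def lesion_volume_mm3(tissue_map: np.ndarray, label: int) -> float:
    """Volume of one tissue class in a (D, H, W) integer segmentation map."""
    return int((tissue_map == label).sum()) * float(np.prod(VOXEL_SIZE_MM))

# Disease tracking: compare today's scan against a previous one.
previous = np.random.randint(0, 15, (128, 512, 512))
current = np.random.randint(0, 15, (128, 512, 512))
delta = lesion_volume_mm3(current, FLUID_LABEL) - lesion_volume_mm3(previous, FLUID_LABEL)
print(f"Change in intraretinal fluid volume: {delta:+.3f} mm^3")
```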

Left: a raw OCT scan; middle: manual segmentation; right: automated segmentation

2. The Classification Network

This second neural network analyses the tissue-segmentation maps and outputs both a diagnosis and a referral suggestion. It was trained on 14,884 OCT scan volumes from 7,621 patients. Segmentation maps were automatically generated for all scans. Clinical labels were obtained by examining each patient’s clinical records to determine retrospectively 1) the final diagnosis (after all investigations) and 2) the optimal referral pathway (in light of that diagnosis).

The classification network therefore takes segmentation maps and learns to prioritize patients’ need for treatment into four categories: urgent, semi-urgent, routine and observation-only. It also outputs a diagnosis in the form of probabilities for multiple, possibly concomitant retinal pathologies.

Output: predicted diagnosis probabilities and referral suggestions
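Because several pathologies can coexist in one eye, the diagnosis output is naturally modelled as an independent probability per condition, while the referral suggestion is a single triage decision. Here is a small sketch of how such outputs could be read out; the pathology names and the two-head split are illustrative assumptions, not the paper’s exact output layer:

```python
import torch

PATHOLOGIES = ["CNV", "macular edema", "drusen", "geographic atrophy"]  # illustrative subset
REFERRALS = ["urgent", "semi-urgent", "routine", "observation only"]

def interpret_outputs(diag_logits: torch.Tensor, ref_logits: torch.Tensor):
    """Turn raw network outputs into clinician-readable probabilities."""
    diag_probs = diag_logits.sigmoid()        # sigmoid: pathologies are not mutually exclusive
    ref_probs = ref_logits.softmax(dim=-1)    # softmax: exactly one triage decision
    diagnosis = {p: round(diag_probs[i].item(), 3) for i, p in enumerate(PATHOLOGIES)}
    return diagnosis, REFERRALS[int(ref_probs.argmax())]

diagnosis, referral = interpret_outputs(torch.randn(4), torch.randn(4))
print(diagnosis, "->", referral)
```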

Image Ambiguity and Ensembling

Image interpretation and segmentation can be difficult for humans and machines alike because of ambiguous regions, where the true tissue type cannot be deduced from the image and multiple equally plausible interpretations exist. To overcome this challenge, DeepMind’s framework uses an ensemble of five segmentation instances instead of one. Each network instance creates a full segmentation map for the given scan, resulting in five different hypotheses. These maps, just like different clinical experts, agree in areas with clear image structures but may differ in ambiguous, low-quality regions. The ensemble thus presents the ambiguities in the raw OCT scans to the subsequent decision (classification) network. The classification network also consists of an ensemble of five instances, each applied to each of the five segmentation maps, for a total of 25 classification outputs per scan.
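A minimal sketch of that ensembling logic, with untrained stand-ins in place of the real network instances: five segmentation hypotheses, each passed through five classifier instances, yield 25 predictions whose spread reflects the underlying ambiguity of the scan.

```python
import torch

N_SEG, N_CLS = 5, 5  # ensemble sizes described in the paper

# Stand-ins for trained instances; in reality each differs by initialisation and training.
seg_ensemble = [lambda scan: torch.randn(15, 16, 64, 64) for _ in range(N_SEG)]
cls_ensemble = [lambda tissue_map: torch.randn(4).softmax(-1) for _ in range(N_CLS)]

scan = torch.randn(1, 16, 64, 64)

# Five segmentation hypotheses: alike where the image is clear,
# divergent in ambiguous, low-quality regions.
hypotheses = [seg(scan) for seg in seg_ensemble]

# 5 x 5 = 25 referral predictions per scan.
predictions = torch.stack([cls(h) for h in hypotheses for cls in cls_ensemble])

consensus = predictions.mean(dim=0)     # average over all 25 outputs
uncertainty = predictions.std(dim=0)    # disagreement ~ image ambiguity
print(consensus, uncertainty)
```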

Results:

The framework achieved an area under the ROC curve of over 99% for most of the pathologies, on par with clinical experts. As for the referral suggestion, its performance matched that of the five best specialists and outperformed the other three.
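For readers less familiar with the metric: the area under the ROC curve summarizes, across all decision thresholds, how well a model separates scans with a pathology from scans without one; 0.5 is chance level and 1.0 is perfect. A toy illustration with scikit-learn on synthetic labels (not the study’s data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)   # synthetic ground truth for one pathology
# A strong model assigns higher probabilities to the positive scans.
y_score = np.clip(y_true * 0.8 + rng.normal(0.1, 0.2, size=200), 0, 1)

print(f"AUC = {roc_auc_score(y_true, y_score):.3f}")   # close to 1.0 here
```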

Future:

OCT is now one of the most common imaging procedures, with 5.35 million OCT scans performed in the US Medicare population in 2014 alone.

The widespread availability of OCT has not been matched by the availability of expert humans to interpret scans and refer patients to the appropriate clinical care.

DeepMind’s AI solution has the potential to lower the cost and increase the availability of screening for retinal pathologies using OCT. Not only can it automatically detect the features of eye diseases, but it also prioritizes patients most in need of urgent care by recommending whether they should be referred for treatment. This instant triaging process should drastically cut down the delay between the scan and treatment, allowing patients with serious diseases to obtain sight-saving treatments in time.

“Anytime you talk about machine learning in medicine, the knee-jerk reaction is to worry that doctors are being replaced. But this is not going to replace doctors. In fact it’s going to increase the flow of patients with real disease who need real treatments,” said Dr. Ehsan Rahimy, MD, a Google Brain consultant and vitreoretinal subspecialist in practice at the Palo Alto Medical Foundation.

*Addendum, April 11, 2021: this article has been renamed from its original title, “Google DeepMind might have just solved the “Black Box” problem in medical AI.” As some readers pointed out, the segmentation algorithm remains a black box itself and doesn’t truly solve the problem. However, from a clinical point of view, the segmented views of the 3D OCT scans are extremely valuable. The approach closely mirrors how physicians make diagnoses from imaging: 1) identify anomalies on the scan, then 2) associate those anomalies with a pathology. Adding an intermediate representation to the diagnostic algorithm therefore significantly improves the AI system’s interpretability, making it more feasible for future clinical deployment.

Read more from Health.AI:

AI 2.0 in Ophthalmology — Google’s Second Publication

Deep Learning in Ophthalmology — How Google Did It

Machine Learning and OCT Images — the Future of Ophthalmology

Machine Learning and Plastic Surgery

AI & Neural Network for Pediatric Cataract

The Cutting Edge: The Future of Surgery


Susan Ruyu Qi, MD, Ophthalmology Resident | clinical AI, innovations in ophthalmology and vision sciences