Teamwork: Radiologists and Deep Learning

Adam Mehdi · Published in The Startup · 3 min read · Jul 21, 2020

A recent paper in Nature detailed the prospects of radiologists cooperating with deep learning algorithms to analyze chest X-rays. The results and their implications are particularly interesting, and they are what I will expound upon in this article, adding my own interpretation and speculation.

The Experiment and Results

In the experiment, a deep learning model, individual radiologists, and a group of radiologists connected with “swarm technology” were tasked with diagnosing pneumonia from chest X-rays. Swarm technology sounds fancy, but it is essentially just real-time collaborative software. It is designed to help a group of radiologists cooperate more effectively than they would by simply voting on diagnoses, a method prone to unfavorable social influence.

The results were that the deep learning model (called CheXMax) performed best on some metrics, while the swarm radiologists outperformed it on others. So which one is better? Neither outright: the two excel at different aspects of the task.

Indeed, the model had higher sensitivity (it is better at catching pneumonia when it is actually present), while the swarm radiologists exhibited higher specificity (they are better at correctly ruling out pneumonia when it is absent). Because a highly sensitive model casts a wide net, CheXMax tends to produce more low-confidence positives for cases whose true label is negative. The swarm radiologists, being more specific, are better equipped to sort through those low-confidence positives and recognize which of them are actually negative.
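To make those two metrics concrete, here is a minimal sketch in Python with made-up confusion-matrix counts (illustrative numbers only, not figures from the paper):

```python
# Toy confusion-matrix counts for a pneumonia classifier.
# These numbers are illustrative only -- they are NOT from the paper.
tp = 85  # predicted pneumonia, pneumonia present
fn = 15  # predicted no pneumonia, pneumonia present
tn = 70  # predicted no pneumonia, pneumonia absent
fp = 30  # predicted pneumonia, pneumonia absent

sensitivity = tp / (tp + fn)  # share of true pneumonia cases that get caught
specificity = tn / (tn + fp)  # share of healthy cases correctly cleared

print(f"sensitivity = {sensitivity:.2f}")  # 0.85
print(f"specificity = {specificity:.2f}")  # 0.70
```

Pushing sensitivity up generally means shrinking the false negatives at the cost of more false positives, which is exactly the trade-off that produces the model’s extra low-confidence positives.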

So the researchers ran another test. The model first diagnosed all patients, and the swarm radiologists then re-diagnosed the low-confidence positives. Remarkably, the swarm correctly flipped the prediction label in 10 out of 11 cases.

The Implications

And hence we have arrived at the optimal pneumonia-diagnosing apparatus: a deep learning model augmented by a group of radiologists for its low-confidence positive predictions.
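As a rough sketch of what that apparatus could look like in code (the confidence band, function names, and review routine below are my own illustrative assumptions, not anything specified in the paper):

```python
from typing import Callable, List, Tuple

# Assumed score range for a "low-confidence positive" -- a placeholder, not a value from the paper.
LOW_CONF_BAND = (0.5, 0.7)

def triage(
    xrays: List[object],
    model_score: Callable[[object], float],  # deep learning model's pneumonia probability
    swarm_review: Callable[[object], bool],  # swarm radiologists' verdict (hypothetical hook)
) -> List[Tuple[object, bool]]:
    """Model screens every X-ray; only low-confidence positives go to the swarm."""
    results = []
    for xray in xrays:
        score = model_score(xray)
        positive = score >= LOW_CONF_BAND[0]
        if positive and score < LOW_CONF_BAND[1]:
            # Low-confidence positive: defer to the swarm's judgment.
            positive = swarm_review(xray)
        results.append((xray, positive))
    return results
```

The design mirrors the paper’s second test: the model handles the clear calls, and human attention is concentrated on the ambiguous positives where the swarm’s higher specificity pays off.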

To be frank, the complementarity of the deep learning model and human diagnosis is probably a happy accident for the jobs of a great many radiologists as they exist today. Not only does a single radiologist remain useful in a human-augmented deep learning system, but a whole group is needed to produce optimal predictions.

Although the deep learning model will cut the total number of cases that radiologists must review by surfacing only the low-confidence positives, each of those cases would require more radiologists if swarm technology is to be utilized. The authors suggest that for understaffed hospitals and clinics, a single radiologist augmenting the deep learning model is still better than none, but demand for more radiologists would nevertheless remain.

That is the current state of affairs, but I have reason to believe that it may change. The authors mention that the data from the low-confidence positive corrections will be fed back into the model. Thus, with a continually expanding high-quality dataset, it is reasonable to suspect that the deep learning model will surpass the performance of the swarm radiologists in terms of both sensitivity and specificity in due time, without need for any architectural innovations or novel techniques.
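One way to picture that feedback loop, as a purely illustrative sketch (the retraining cadence and the names are my assumptions, not details from the paper):

```python
# Hypothetical continual-learning loop: swarm-corrected low-confidence positives
# are appended to the training set, and the model is retrained once enough
# new labels accumulate. All names and numbers here are placeholders.
RETRAIN_EVERY = 1000  # assumed number of corrected labels per retraining round

corrected_cases = []  # (xray, swarm_label) pairs collected during deployment

def record_correction(xray, swarm_label, training_set, retrain_fn):
    """Store a swarm-reviewed case and retrain when enough have accrued."""
    corrected_cases.append((xray, swarm_label))
    if len(corrected_cases) >= RETRAIN_EVERY:
        training_set.extend(corrected_cases)  # grow the high-quality dataset
        retrain_fn(training_set)              # refit the model on the larger set
        corrected_cases.clear()
```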

In this light, the human augmentation of deep learning models does not necessarily highlight weaknesses inherent in the models, but rather serves as a tool for compensating for our current shortage of medical data. As we collect more data by exercising the model in real-life trials, this data shortage is apt to ease, and the optimality of teamwork for diagnosis will fade with it.

