Cognitive biases and augmented intelligence in radiology

Thijs Kooi
Published in Lunit Team Blog
Aug 2, 2020 · 12 min read

How cognitive biases affect diagnostic decisions in radiology and how AI can help out

Cognitive biases

“An apple a day keeps the doctor away”: an aphorism often told to kids to entice them to eat more fruit. Although it may hold an element of truth, it also serves as a nice example of the rhyme-as-reason effect, a cognitive bias that describes the tendency of people to perceive statements as more likely to be true when they are put in the form of a rhyme.

Biases in human reasoning affect all aspects of our lives and have been studied extensively. Daniel Kahneman, an Israeli psychologist, was one of the first and most prominent researchers in this field and received the Nobel Memorial Prize in Economic Sciences in 2002 for his work on cognitive biases and their effect on people’s economic decision-making.

Apart from economics, cognitive biases are also important in medicine. For example, humans consistently overestimate the likelihood of rare diseases (referred to as a zebra in American medical slang), causing overdiagnosis, which results in unnecessary stress for patients and economic burden for clinics.

Shortcomings in human cognitive capacities can have even more serious consequences than overdiagnosis. Human error in medicine has recently been identified as the third leading cause of death in the US. In radiology, errors stem not only from faults in reasoning, but also from faults in perception [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13].

Some important perceptual biases in radiology are:

  • Satisfaction of search bias: This bias can cause errors when the clinician stops searching an image for pathologies once an initial abnormality has been identified. As a result, more dangerous diseases can be overlooked, or a differential diagnosis can fail because of a lack of information.
  • Confirmation bias: Related to the satisfaction of search bias. In this case, however, the clinician has preconceived ideas about the disease that should be present and only looks for evidence that confirms this hypothesis.
  • Prevalence effect: The prevalence effect describes the fact that people are far more likely to miss rare events than common events, all other things being equal. An example of this is airport security: bags containing weapons are luckily very rare, but officers tend to miss them when they do appear. When the same signal is presented in a setting where it occurs more frequently, they have no problem detecting it. This bias has also been identified as a potential cause of human error in screening settings for relatively rare diseases [13].
  • Inattentional blindness: This phenomenon describes the failure to notice a completely obvious sign that is very different from what the person is trained or instructed to look for. A famous example of this bias at work is a study in which researchers inserted a picture of a gorilla into a slice of a chest CT scan; 83% of radiologists looking for lung nodules in the scan failed to see the obvious gorilla [12].

Biases causing misinterpretation in radiology (and medical diagnosis in general) are:

  • Anchoring bias: Related to the satisfaction of search bias, except that the clinician keeps looking for new information but stays locked into an initial diagnosis in spite of the new evidence.
  • Automation bias: The automation bias describes the behavior of people who rely too much on technology. For instance, if readers of radiological exams know a system in the background is helping them, they may become complacent and fully trust the system.
  • Zebra retreat: Similar to the prevalence effect, this applies to the detection of rare diseases. In the case of the zebra retreat, the clinician did notice the disease but assumed that the finding must be normal, because the particular disease is so rare. (This is the opposite of the zebra effect, the tendency to overestimate the probability of a rare disease.)

To prevent these biases, procedures such as checklists, decision trees and standardized reporting systems have been proposed. Computers, in the form of computer aided diagnosis (CAD) systems, can also mitigate these effects. What better way to aid someone than to compensate for their mistakes? This post discusses some common setups in which radiologists interact with AI systems and postulates how errors resulting from perceptual and cognitive biases could be mitigated.

Augmented intelligence

Barring a few exceptions, it will likely be some time before AI systems are allowed to read medical images completely autonomously. Until then, systems should complement radiologists and compensate for the mistakes humans make. The final performance is a function of the radiologist, the system and the interface between them. This paradigm is often referred to as ‘augmented intelligence’ or ‘complementary intelligence’ and was eloquently phrased by Gilbert et al. [14]:

“[…] Instead, we must focus on promoting the model of the “centaur”, a highly trained human working together with an AI to achieve more than would be possible alone.”

At the moment there are roughly three different setups in which the computer interacts with a radiologist. In common parlance, all of these are referred to as ‘computer aided diagnosis’ (CAD), although that term is also often used for just one specific subtype, described below. The setups require different ‘levels’ of automation. Similar to the automation levels defined for self-driving cars [15], one could construct a (somewhat hand-wavy) hierarchy, with the doctor at one end and an autonomous AI system at the other. A depiction is given in figure 1.

Figure 1. Different setups in which computers are used in radiology applications and their respective levels of automation. In the ‘computer aided detection (CADe)’ setup, an algorithm is used to pre-select areas in a case that look suspicious; the radiologist makes the diagnosis. In the ‘computer aided diagnosis (CADx)’ setup, an algorithm helps with the assessment of suspicious areas. In a ‘computer assisted simple triaging (CAST)’ setting, an algorithm pre-selects only important whole cases for radiologists to read. In the final setting, the algorithm performs the task completely autonomously. (image by author)
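
To make this hierarchy a bit more tangible, here is a minimal sketch (in Python, with hypothetical names) that encodes the four setups from figure 1 as increasing levels of automation, loosely mirroring the levels defined for driving automation [15]:

```python
from enum import IntEnum

class AutomationLevel(IntEnum):
    """Rough, hand-wavy hierarchy of automation in radiology AI (hypothetical labels)."""
    UNAIDED = 0     # the radiologist reads the case without help
    CADE = 1        # computer aided detection: marks suspicious regions
    CADX = 2        # computer aided diagnosis: scores regions or whole cases
    CAST = 3        # computer assisted triage: ranks or rules out whole cases
    AUTONOMOUS = 4  # the algorithm reads the case without a radiologist

# The ordering reflects how much of the reading task is delegated to the machine.
assert AutomationLevel.CADE < AutomationLevel.CAST < AutomationLevel.AUTONOMOUS
```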

1. Computer aided detection (also referred to as CADe)

In this setting, the radiologist opens an image and queries the system, which subsequently shows markers or heat maps on suspicious areas. The US Food and Drug Administration (FDA) describes CADe systems as [16]:

“CADe devices are computerized systems that incorporate pattern recognition and data analysis capabilities (i.e., combine values, measurements, or features extracted from the patient radiological data) and are intended to identify, mark, highlight, or in any other manner direct attention to portions of an image, or aspects of radiology device data, that may reveal abnormalities during interpretation of patient radiology images or patient radiology device data by the intended user […]”

Although promising, it was shown that early systems were mostly effective at catching errors in search, in particular for small pathologies such as calcifications in mammograms, but were simply distracting for findings that would have been found anyway. Because of poor standalone performance, these systems turned out to be ineffective in the clinic.

Figure 2. Example of a computer aided detection (CADe) system. An input image, a mammogram in this case (left), is fed through a machine learning algorithm that adds markers (middle) or generates a heatmap (right) on suspicious areas. (image by author)
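
To sketch what such a system does under the hood (the model itself is a stand-in here, and the threshold and function names are assumptions for illustration, not any particular product): a per-pixel suspicion heatmap is thresholded and reduced to a small set of markers that can be overlaid on the image.

```python
import numpy as np
from scipy import ndimage

def heatmap_to_markers(heatmap: np.ndarray, threshold: float = 0.5):
    """Turn a per-pixel suspicion heatmap into (row, col, score) markers.

    The threshold is a hypothetical operating point; in practice it would be
    chosen to reach a target sensitivity / false-positive rate.
    """
    binary = heatmap >= threshold
    labels, num_regions = ndimage.label(binary)  # connected suspicious regions
    indices = list(range(1, num_regions + 1))
    centers = ndimage.center_of_mass(heatmap, labels, indices)
    scores = ndimage.maximum(heatmap, labels, indices)
    return [(int(r), int(c), float(s)) for (r, c), s in zip(centers, scores)]

# Toy example: a fake 'heatmap' with a single suspicious blob.
heatmap = np.zeros((128, 128))
heatmap[40:50, 60:70] = 0.9
print(heatmap_to_markers(heatmap))  # one marker near (44, 64) with score 0.9
```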

The reason for this lack of ‘augmented intelligence’ has been attributed to the automation bias, the tendency of humans to trust automated systems too much. This bias can be a blessing and a curse: any system operating below human performance will drag the radiologist down, while any system operating above it will lift them up.

2. Computer aided diagnosis (also referred to as CADx)

CADx systems do not (only) mark suspicious areas in the image. Instead, the system provides additional information that is relevant for the diagnosis, such as a score for the image or part of the image. The FDA describes CADx as [16]:

“CADx devices are computerized systems intended to provide information beyond identifying, marking, highlighting, or in any other manner directing attention to portions of an image, or aspects of radiology device data, that may reveal abnormalities during interpretation of patient radiology images or patient radiology device data by the clinician.”

Figure 3. Examples of computer aided diagnosis (CADx) systems. An input image, in this case a mammogram (left), is fed through an ML algorithm. A user can query a region in the image to get a malignancy score for that particular region (middle) or simply get a score for the whole image (right), along with, potentially, a heatmap showing where the suspected malignancy is (image by author).
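
As a minimal sketch of this interface (the ‘model’ below is a trivial placeholder, and the max-pooling choice and all names are assumptions): the same underlying suspicion map can back both a region-level query and a whole-image score.

```python
import numpy as np

def malignancy_heatmap(image: np.ndarray) -> np.ndarray:
    """Placeholder for a trained model; returns per-pixel suspicion scores in [0, 1]."""
    return np.clip(image / (image.max() + 1e-8), 0.0, 1.0)

def score_region(heatmap: np.ndarray, rows: slice, cols: slice) -> float:
    """Interactive decision support: score for a user-queried region."""
    return float(heatmap[rows, cols].max())

def score_image(heatmap: np.ndarray) -> float:
    """Whole-exam score, here simply max-pooled over the suspicion map."""
    return float(heatmap.max())

image = np.random.rand(256, 256)  # stand-in for a mammogram
heatmap = malignancy_heatmap(image)
print(score_region(heatmap, slice(100, 140), slice(80, 120)))
print(score_image(heatmap))
```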

Some examples of CADx systems are:

A. Interactive decision support: In this setting, the radiologist queries a region in the image and the system shows a score that represents the degree of suspicion of that region [17].

B. Content based image retrieval: Content based image retrieval (CBIR) was first introduced in search engines to help people find similar content. In a medical context, the user queries an area in the image and the system shows a set of similar patches, for instance five positive and five negative cases that all look similar.

Although the idea has been discussed extensively [18, 19], to date few (if any) clinical applications exist. It is also difficult to say whether displaying similar images with their labels actually improves the reader’s performance over something like simple decision support.

Figure 4. Illustration of the use of a content based image retrieval (CBIR) system for computer aided diagnosis. A user can query a region in the image and the system will look for similar regions along with their respective diagnoses.
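
A sketch of the CBIR idea, assuming some feature extractor is available (the embedding below is a trivial stand-in, and all names and the distance metric are assumptions): the queried patch is mapped to a feature vector and compared against a library of patches with known diagnoses.

```python
import numpy as np

def embed(patch: np.ndarray) -> np.ndarray:
    """Stand-in for a learned feature extractor (e.g. the penultimate layer of a CNN)."""
    return np.array([patch.mean(), patch.std(), patch.max()])

def retrieve_similar(query_patch, library_patches, library_labels, k=5):
    """Return the k library patches most similar to the query, with their diagnoses."""
    query = embed(query_patch)
    distances = [np.linalg.norm(query - embed(p)) for p in library_patches]
    order = np.argsort(distances)[:k]
    return [(library_labels[i], float(distances[i])) for i in order]

# Toy library: 20 random patches alternately labelled benign/malignant.
rng = np.random.default_rng(0)
library = [rng.random((32, 32)) for _ in range(20)]
labels = ["malignant" if i % 2 else "benign" for i in range(20)]
print(retrieve_similar(rng.random((32, 32)), library, labels, k=5))
```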

CADx systems mostly target errors in interpretation (such as the anchoring bias and zebra retreat); search errors will largely remain unaffected, because the radiologist still has to search for suspicious areas. The automation bias may be less likely to gain a foothold, because the user typically has to query the system first.

3. Computer assisted triaging

Triaging systems rank patients based on urgency by estimating an outcome such as their condition or probability of recovery. The idea of using computers to do triaging is sometimes referred to as computer assisted simple triage (CAST). This was initially proposed for emergency room settings [20], but has recently caught on in other domains [21, 22]. The FDA describes computer aided triaging systems as:

“Computer-triage devices are computerized systems intended to, in any way, reduce or eliminate any aspect of clinical care currently provided by a clinician, such as a device for which the output indicates that a subset of patients (i.e., one or more patients in the target population) are normal and therefore do not require interpretation of their radiological data by a clinician.”

At the moment there are roughly two different settings:

A. Soft triaging: Here, all cases are ranked and presented to the doctor in that order. This allows the clinician to focus on the most pressing cases first. Figure 5 shows an illustration of such a system.

Figure 5. Illustration of a ‘soft triaging’ system. The algorithm ranks cases based on some measure of urgency (for instance, how likely it is that the case contains a disease) and sends them in the form of an ordered list to the practicing physician.
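
A minimal sketch of soft triaging (case IDs and scores are made up; the urgency score would come from a model like the ones sketched above): the worklist is simply sorted by the estimated urgency, most pressing first.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Case:
    case_id: str
    urgency: float  # e.g. model-estimated probability that the case contains a disease

def soft_triage(cases: List[Case]) -> List[Case]:
    """Return the worklist ordered from most to least urgent."""
    return sorted(cases, key=lambda case: case.urgency, reverse=True)

worklist = [Case("A", 0.05), Case("B", 0.92), Case("C", 0.40)]
print([case.case_id for case in soft_triage(worklist)])  # -> ['B', 'C', 'A']
```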

In a soft triaging setting, essentially all the cognitive biases that are present at the case level still apply. However, radiologists may be less likely to miss essential information due to fatigue, since they can schedule the urgent cases at times when they feel most rested, provided the diagnostic task allows them to.

If implemented well, the algorithm is expected to generate a better triage than humans, and therefore, at the case-list level, biases stemming from misinterpretation are expected to be mitigated.

B. Hard triaging (or rule-out systems): Similar to the soft triaging approach, the cases are ranked, but the bottom x% is no longer presented to a doctor, to free up time. This is particularly useful in low incidence settings such as screening, where large proportions of cases could be diagnosed automatically. Figure 6 displays an illustration of this setup.

Figure 6. Illustration of a ‘hard triaging’ system. Similar to a soft triaging system, an algorithm is used to rank cases. Here, however, a threshold is set on the measure of urgency and only the most urgent/suspicious cases are sent to the radiologist; the rest are reported automatically.
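
Hard triaging is then one extra step on top of the same ranking: everything below a score threshold (the bottom x%) is reported automatically and only the rest reaches the radiologist. The numbers below are made up for illustration; in practice the threshold would be set so that the sensitivity of the rule-out is acceptably high. The toy example also hints at why the prevalence effect may be mitigated: ruling out the lowest-scoring cases raises the disease prevalence in the remaining workload.

```python
import numpy as np

def hard_triage(scores: np.ndarray, rule_out_fraction: float = 0.7) -> np.ndarray:
    """Boolean mask of cases that still go to the radiologist (highest scores)."""
    threshold = np.quantile(scores, rule_out_fraction)
    return scores > threshold

# Toy screening population: 1% prevalence, diseased cases tend to score higher.
rng = np.random.default_rng(0)
diseased = rng.random(10_000) < 0.01
scores = np.where(diseased, rng.beta(5, 2, 10_000), rng.beta(2, 5, 10_000))

read = hard_triage(scores, rule_out_fraction=0.7)
print("prevalence before triage:", round(float(diseased.mean()), 4))
print("prevalence in the read set:", round(float(diseased[read].mean()), 4))  # noticeably higher
```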

Again similar to soft triaging, cognitive biases that apply at the case level are still present in all the cases that are shown to the radiologists. You could argue that the automation bias is eliminated for cases not shown to radiologists, or that those cases are simply the extreme case of automation bias: their diagnosis follows the system completely. An advantage, however, is that it is easier to evaluate the system for this subset, because user interactions do not have to be taken into account.

Errors suffered because of fatigue are likely to be mitigated, as a lot of time is freed up (unless this is again allocated to different tasks). The prevalence effect, the phenomenon where readers are more likely to miss signals in a low incidence setting, is also likely to be mitigated, as the remaining cases will have a higher incidence.

Autonomous AI

In 2018, the FDA gave approval for the first ever autonomous AI system, a system used in screening for diabetic retinopathy [23]. For most applications in medical image analysis, it may be some time before a similar system is realized, as development takes years and regulations should be strict [24]. Some intermediate steps could already be taken, though.

In some screening settings, such as lung and breast cancer screening, exams are sometimes read by two radiologists, typically independently. One of the two radiologists could be replaced by a system operating autonomously. In this case, the ‘augmented intelligence’ component is still there: concepts from ensemble learning apply, and the setup is somewhat simpler to analyze. An illustration of this setup is provided in figure 7.

Figure 7. In some radiological settings (such as screening for breast/lung cancer) images are read by two readers, typically independently. An intermediate step towards autonomous AI would be to first replace only one of the two readers. In this case, the system still has to ‘augment’ the remaining radiologist and compensate for mistakes they make (image by author).
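
A small sketch of how the combination could work, assuming a recall rule of the kind used in some double-reading screening programmes (the rule and names here are assumptions, not a description of any specific programme): a case is recalled when both readers flag it and sent to arbitration when they disagree.

```python
def double_read(human_recall: bool, ai_recall: bool) -> str:
    """Combine a human read and an AI read in a simple double-reading scheme."""
    if human_recall and ai_recall:
        return "recall"       # both readers consider the case suspicious
    if human_recall or ai_recall:
        return "arbitration"  # disagreement: a third reader or consensus decides
    return "no recall"        # both readers consider the case normal

print(double_read(human_recall=True, ai_recall=False))  # -> 'arbitration'
```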

In the case of a completely autonomous AI system, all human cognitive biases are alleviated. That does not mean the system is unbiased, though. If it was trained on biased data, you have a different problem: biases in the data, such as the center the data was collected at, the manufacturer of the scanner and the annotators, will still be reflected in the output.

To summarize, computers are powerful tools with great potential for medical diagnosis. Until systems can operate independently, they should help radiologists and compensate for the errors radiologists make. Carefully analyzing radiologists’ errors, for instance by looking at cognitive biases, could help boost the joint performance of the radiologist and the system.

References

  1. Kundel, H.L., Nodine, C.F. and Carmody, D., 1978. Visual scanning, pattern recognition and decision-making in pulmonary nodule detection. Investigative radiology, 13(3), pp.175–181.
  2. Pinto, A. and Brunese, L., 2010. Spectrum of diagnostic errors in radiology. World journal of radiology, 2(10), p.377.
  3. Kim, Y.W. and Mansfield, L.T., 2014. Fool me twice: delayed diagnoses in radiology with emphasis on perpetuated errors. American journal of roentgenology, 202(3), pp.465–470.
  4. Bruno, M.A., Walker, E.A. and Abujudeh, H.H., 2015. Understanding and confronting our mistakes: the epidemiology of error in radiology and strategies for error reduction. Radiographics, 35(6), pp.1668–1676.
  5. Berbaum, K.S., Franken Jr., E.A., Dorfman, D.D., Rooholamini, S.A., Kathol, M.H., Barloon, T.J., Behlke, F.M., Sato, Y., Lu, C.H., El-Khoury, G.Y. and Flickinger, F.W., 1990. Satisfaction of search in diagnostic radiology. Investigative radiology, 25(2), pp.133–140.
  6. Akgül, C.B., Rubin, D.L., Napel, S., Beaulieu, C.F., Greenspan, H. and Acar, B., 2011. Content-based image retrieval in radiology: current status and future directions. Journal of digital imaging, 24(2), pp.208–222.
  7. Graber, M., 2005. Diagnostic errors in medicine: a case of neglect. The Joint Commission Journal on Quality and Patient Safety, 31(2), pp.106–113.
  8. Busby, L.P., Courtier, J.L. and Glastonbury, C.M., 2018. Bias in radiology: the how and why of misses and misinterpretations. Radiographics, 38(1), pp.236–247.
  9. Bornstein, B.H. and Emler, A.C., 2001. Rationality in medical decision making: a review of the literature on doctors’ decision‐making biases. Journal of evaluation in clinical practice, 7(2), pp.97–107.
  10. Saposnik, G., Redelmeier, D., Ruff, C.C. and Tobler, P.N., 2016. Cognitive biases associated with medical decisions: a systematic review. BMC medical informatics and decision making, 16(1), p.138.
  11. https://radiopaedia.org/articles/cognitive-bias-in-diagnostic-radiology
  12. Drew, T., Võ, M.L.H. and Wolfe, J.M., 2013. The invisible gorilla strikes again: Sustained inattentional blindness in expert observers. Psychological science, 24(9), pp.1848–1853.
  13. Evans, K.K., Birdwell, R.L. and Wolfe, J.M., 2013. If you don’t find it often, you often don’t find it: why some cancers are missed in breast cancer screening. PloS one, 8(5).
  14. Gilbert, F.J., Smye, S.W. and Schönlieb, C.B., 2020. Artificial intelligence in clinical imaging: a health system approach. Clinical radiology, 75(1), pp.3–6.
  15. SAE On-Road Automated Vehicle Standards Committee, 2018. Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles. SAE International: Warrendale, PA, USA.
  16. U.S. Food and Drug Administration, 2012. Guidance for Industry and Food and Drug Administration Staff: Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data — Premarket Notification [510(k)] Submissions.
  17. Hupse, R., Samulski, M., Lobbes, M.B., Mann, R.M., Mus, R., den Heeten, G.J., Beijerinck, D., Pijnappel, R.M., Boetes, C. and Karssemeijer, N., 2013. Computer-aided detection of masses at mammography: interactive decision support versus prompts. Radiology, 266(1), pp.123–129.
  18. Cai, C.J., Reif, E., Hegde, N., Hipp, J., Kim, B., Smilkov, D., Wattenberg, M., Viegas, F., Corrado, G.S., Stumpe, M.C. and Terry, M., 2019, May. Human-centered tools for coping with imperfect algorithms during medical decision-making. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–14).
  19. Akgül, C.B., Rubin, D.L., Napel, S., Beaulieu, C.F., Greenspan, H. and Acar, B., 2011. Content-based image retrieval in radiology: current status and future directions. Journal of digital imaging, 24(2), pp.208–222.
  20. Goldenberg, R. and Peled, N., 2011. Computer-aided simple triage. International journal of computer assisted radiology and surgery, 6(5), p.705.
  21. Yala, A., Schuster, T., Miles, R., Barzilay, R. and Lehman, C., 2019. A deep learning model to triage screening mammograms: a simulation study. Radiology, 293(1), pp.38–46.
  22. Annarumma, M., Withey, S.J., Bakewell, R.J., Pesce, E., Goh, V. and Montana, G., 2019. Automated triaging of adult chest radiographs with deep artificial neural networks. Radiology, 291(1), pp.196–202.
  23. Abràmoff, M.D., Lavin, P.T., Birch, M., Shah, N. and Folk, J.C., 2018. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ digital medicine, 1(1), pp.1–8.
  24. American College of Radiology, 2020. Comments on “Public Workshop ‐ Evolving Role of Artificial Intelligence in Radiological Imaging” (Docket No. FDA‐2019‐N‐5592).
