The human-AI relationship in medicine — why should we care?

By Lana Tikhomirov

Since the advent of cognitive psychology research and its application to a diverse range of settings, one key truth has emerged: humans behave in unanticipated ways across a wide range of situations.

For example, the type of police line-up an eyewitness to a crime receives may determine whether they identify the right perpetrator. Some witnesses may even recall memories of events that never happened. And most relevant here: give a human an automated decision-aid, whether a simple aptitude test using pre-programmed criteria to pick the best candidate for a job or a complicated deep-learning algorithm, and you sometimes see a range of less-than-ideal outcomes, including automation bias, poorer use of the decision-aid as tasks become more difficult, or an erosion of the very skills needed to monitor the technology itself.

Even if ‘performance’ using an algorithm (however you’d define that, more on that later) isn’t necessarily worse, there is a chance it could be different. That deviation from the norm can create downstream effects, and those effects can feed back and change the very problem the algorithm set out to solve (see concept drift).
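
To make the concept-drift point slightly more concrete, here is a minimal, hypothetical sketch (the data, window sizes, threshold, and variable names are all assumptions for illustration, not details from any real deployment): it flags when a model’s recent output distribution has drifted away from the baseline it was validated against, one simple signal that the problem the algorithm was built to solve may have shifted underneath it.

```python
# A minimal sketch of one way concept drift can be flagged: compare the
# distribution of a model's recent outputs against a baseline window
# captured at deployment. Data and thresholds here are purely illustrative.
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(baseline_scores, recent_scores, alpha=0.01):
    """Return True if recent outputs differ distributionally from the baseline,
    using a two-sample Kolmogorov-Smirnov test."""
    _, p_value = ks_2samp(baseline_scores, recent_scores)
    return p_value < alpha

# Hypothetical usage: baseline collected at go-live, recent scores from last month.
rng = np.random.default_rng(0)
baseline = rng.beta(2, 5, size=5000)  # stand-in for historical risk scores
recent = rng.beta(2, 3, size=1000)    # stand-in for this month's scores

if drift_detected(baseline, recent):
    print("Output distribution has shifted; re-check the model before relying on it.")
```

A distributional check like this is only a first alarm bell, of course; it says nothing about whether clinicians’ decisions, or patient outcomes, have changed with it.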

The importance of ‘human factors’

Human factors, the general field that investigates the grey area between technology and its use by humans, has been a somewhat quiet figure in the public sphere. Most people who are familiar with the field have become so after they’ve encountered unusual behaviour between humans and machines. After is the key word here, because it often takes a hard whacking to recognise the importance of human factors. But what if we started recognising that people sometimes behave in completely unanticipated ways before the dreaded ‘after’?

Don’t get me wrong: after researching human factors and cognitive psychology in depth for almost four years, I can say that predicting human behaviour is difficult, and so is evaluating it. Research in this area therefore takes time and fairly rigorous methods from the cognitive sciences. Not all methods from the field have stood the test of time, but add insight from cognitive psychology and you’ll find some approaches are more robust than others.

The general public still believes many outdated and less-than-helpful theories of human behaviour (*coughs* nudging), and the replication crisis in psychology certainly hasn’t helped (poor statistical methods and study designs led to a slew of irreplicable research results), but it has taught us a lot.

Cognitive psychology, which uses many interdisciplinary techniques, exists to investigate and produce methodology to study human thinking and behaviour on a deeper level. We learn how not to be fooled by human data, and how to view behaviours as the product of what’s already ‘in’ the mind. We learn how to refine ‘human factors’ to really understand the human.

The medical AI divide

In tandem with the resistance toward human factors, there seems to be a divide in the medical AI community between those who call for more information around the clinical use of AI, from patient outcomes to the quality of decision-making (although these two factors actually go hand in hand), and those who believe that technical performance is the most important step in evaluating an algorithm.

There are many potential reasons behind this divide, but I think it really comes down to perspective. Add time pressure and economic strain, and quick-fix solutions without evidence become the norm. There’s a misconception that evidence-based approaches exist to ‘hinder’ innovation, that we are delaying the use of technology to ‘fix’ already strained healthcare systems.

It’s entirely reasonable to want better outcomes in healthcare, but, I would argue, it is less reasonable to turn a blind eye when asked to provide evidence of AI as a viable solution to certain healthcare problems. Wouldn’t you want proof that your algorithm does for healthcare what you’ve actually created it to do?

Industries such as aviation, defence, and automotive have prioritised pre- and post-deployment testing and monitoring of human behaviour with technology. Human factors expertise is highly sought after and well regarded there, and has taught us a lot about safety in these highly critical fields. The reasoning is perfectly simple: it’s better to be safe than sorry, and technologies can introduce a whole new world of ‘unsafe’ once they hit the human user if you’re not careful.

Of course, pilots are vastly different from doctors for good reason, but both are experts who navigate high-risk and (sometimes) stressful environments where human lives are in their hands. I recognise that there are many roles to fill in the effective deployment of AI, from the technical to the economic, bottom up to top down. Many hands make light work, as they say. I also recognise that clinicians are not blessed with ample time; they want solutions that are seamless and the least disruptive to their workflow and priorities. That’s why the heavy lifting in deployment shouldn’t be the sole responsibility of the user.

And although I wish I could write the solutions right now, the answers are not always apparent. We are decades behind, trying to compress years of research to safeguard algorithms being released into medicine right now. My work in this area is to actively seek answers to the tough questions, such as: how do you test AI to determine its safety in human decision-making?

Increasing our understanding of the human-AI relationship

I want to see actionable testing paradigms, increased knowledge of the human-AI relationship, an understanding of the blind spots, and a greater appreciation of the wider and complex medical system in which care is delivered to patients. It’s why I am involved with Project CANAIRI.

But before we can achieve this, we need to recognise where we’re falling behind, such as treating AI like a saviour rather than a statistical model, which arguably limits the many potential benefits of AI algorithms in medical applications. You know when to turn on the cruise control in your car and (hopefully) when to take back control of the wheel. We need, in a much larger way, to understand where AI fits within human medical expertise and within the nuts and bolts of the broader socio-technical system. After all, the data that trains the very algorithms in question is itself the product of humans, very much to its own detriment sometimes!

To be clear, none of this is to say that medical AI algorithms can’t demonstrate robustness and high accuracy (on certain metrics). But that might not translate into clinical outcomes, because, once again, the algorithm must pass through a clinical system, one with humans and all the unpredictability of behaviour described earlier at the forefront, that is far more intricate and complex than we can predict, quite apart from the other considerations of data and generalisability.

For instance, imagine giving a doctor a black box algorithm, and saying “use this algorithm which is highly effective on a certain set of cases but not on others. It’s kind of like a human but it’s also not like a human at all. It uses more data than you can fathom but we don’t necessarily know what elements of the data it uses. Trust it most of the time but not too much. Weigh it up with all the other pieces of infrastructure but prioritise it only when you think it’s right. Consider this and all these other clinical factors in your decision, but if it’s wrong, you’ll be the one on the line.” Not all that simple, is it?

Human decision-making is always going to drive medicine, just as it drives what you’ll eat for lunch today, whether you turn right onto a busy road, or who you pick out of an eyewitness line-up. We make mistakes, and we also make good decisions.

We have long-running fields dedicated to studying all the intricate and hidden processes that drive these decisions and what can influence them in either direction. We have 50 years of research demonstrating that even the simplest and most accurate of technological aids can stray from their initial purpose once they enter a human system. Why would the introduction of a new and unpredictable technology such as AI into one of the most high-risk professions be any exception?

To me, it is common sense to put human use at the forefront of medical AI evaluation. We need to divert our attention to the areas that need it most, so that medical AI algorithms have a shot at doing what we’ve (hopefully) built them for: positive outcomes for clinicians, the healthcare system and, most importantly, patients.

Lana Tikhomirov is an AIML PhD student and cognitive psychology researcher.


Written by Australian Institute for Machine Learning (AIML)

AIML conducts competitive research in machine learning, artificial intelligence, computer vision, and deep learning. We’re based in Adelaide, South Australia.
