Building Trust and Adoption in Machine Learning in Healthcare: What Clinicians Say Matters

Insights from interviews with 18 clinicians and recommendations for product managers and ML engineers / researchers, an abridged version of my MPH capstone for UC Berkeley’s School of Public Health

Below is an abridged version of my Spring 2020 MPH capstone for UC Berkeley’s School of Public Health. Visit the Building Trust and Adoption in Machine Learning in Healthcare site (link here) for the full version as well as upcoming summaries of interviews and additional research.

Note that citations, endnotes, exhibits, and detailed recommendations have been removed for ease of reading, but are available in the full text (link here).

Thank you to Dr. Ziad Obermeyer and Vince Law for being my readers and guiding me through this work.

The context of trust and adoption in ML in healthcare…

In healthcare, the terms artificial intelligence (“AI”) and machine learning (“ML”) are omnipresent and approaching the peak of the Gartner Hype Cycle. In fact, Gartner found that AI is the “most powerful and pervasive technology-based capability across care delivery, operations, and administrative activities.” Clinical ML tools have matured considerably, yet multiple barriers to broad adoption remain. Even as technological feasibility has improved significantly, avoiding another “AI Winter” in the clinical healthcare space will require work to improve product desirability. Therefore, for my UC Berkeley MPH independent capstone project in Spring 2020, I sought to gain, and now share, a better understanding of how frontline clinicians perceive ML in healthcare and what ML tool developers, such as product managers and ML engineers, can do to build trust and adoption.

How I did the work…

Over seven weeks, I reached out to clinicians (both physicians and nurses) with the intention of gathering diverse viewpoints — across various roles / job titles, geographies, specialties, healthcare organization types, and expected remaining tenures.

I also conducted secondary research with journal articles and popular media articles. Journal articles were gathered from the most influential medical journals, the Doctor Penguin weekly newsletter, and leading Twitter accounts, with an emphasis on those published no later than 2019. Popular media articles were sourced from a regular review of email newsletters.

Before the COVID-19 pandemic completely preoccupied clinicians, I was able to interview 18 individuals. These clinicians had diverse backgrounds and could speak to a range of topics.

See the full text (link here) for detailed interviewee inclusion criteria, the interview guide, a list of journals and email newsletters, and summary background information of interviewees.

The 10 Key Themes…

What I learned…

1. Familiarity with ML in healthcare

I hear things in the media about ML in healthcare. I hear it could be a great tool. There is a lot of potential for good but also harm.

Clinicians’ awareness of ML in healthcare varied widely and was not tied to their geography or tenure. Those who were more familiar had specialized training or administrative roles. Most of their learning came from popular media, which left clinicians attuned to a certain amount of hype in the space. This concern about hype is shared by other clinical researchers. However, most were not concerned about significant near-term impacts to their jobs, since they believed that healthcare fundamentally involves a human caring for a human.

2. Past and future use

Much of my time, up to eight hours per week, is spent on detailed interpretation and labeling of images. If I had a reliable ML tool, I could better focus on acute issues of my patients or treat more people.

Clinicians broadly had not used, or at least thought they had not used, ML tools in their clinical practice. However, they were familiar with triaging tools, which may utilize ML. Those who thought they had used ML tools appreciated reductions in errors but also felt pressure to trust the predictions. Looking forward, clinicians were mostly open to using ML tools, so long as they could trust them and still “be in the loop.”

3. Excitement and concern

In many ways, clinician sentiment regarding ML in healthcare was positive. However, when asked directly, they did voice several concerns. See the full text (link here) for a table summarizing their sentiment towards ML. These feelings align with what other clinical researchers see as key issues for broad adoption.

4. Ethics and privacy

I don’t know if I have really thought about it all that much.

When asked to explore ethical considerations further, viewpoints were mixed: many did not feel equipped to discuss them, others did not think it was their place to think about ethics, and a few had well-formed opinions on the matter. Questions around bias were the most common ethical concern, with many suspecting that much of the bias could originate from the data used to train the algorithms. There were also concerns about use cases that could lead to ethical dilemmas, such as end-of-life care, targeting of specific demographic / socio-economic groups, insurance company utilization management, and general for-profit interests.

Privacy is not a specific issue for ML; it is an issue for all digital technologies, and I have been wrestling with it my entire career.

Clinicians pointed to HIPAA as the best legal framework for managing privacy concerns; the rest of their privacy considerations related to non-healthcare tech companies. They felt that these large and small tech companies already had access to everyone’s data, so they were not especially concerned about privacy, or at least did not feel that the world was equipped to address it.

I have deferred my ethical questions to others. There are bioethicists who we work with who do the important thinking on this.

In general, clinicians found ethics important but were not equipped to evaluate ML tools against ethical and privacy standards. Without a strong understanding of either ML or ethical frameworks, they looked to others for help evaluating these concerns. Clinicians, especially those at larger healthcare organizations, relied on healthcare administrators tasked with figuring out these issues.

5. ML knowledge and model explainability

I know how to use the outputs of X-ray machines, CT scanners, and MRI machines to help my patients. And as I think of it, I only have a very rudimentary understanding of how those machines actually work; more of an intuition versus expertise. So maybe for an ML tool, I don’t need to know as much.

No clinician felt the need for more than a basic understanding of ML, similar to their basic understanding of how other devices work, such as X-ray machines, CT scanners, and MRI machines. Instead, they wanted to know that the algorithms were trained on representative data and had strong performance metrics, the same ones used to evaluate diagnostics. Even with these metrics, however, they worried that an ML tool would carry unanticipated risks and offer no way to understand errors when they occur. There have been many calls for more explainable models in the healthcare setting, yet there are limited perspectives on how ML will affect medical malpractice.

6. External validation needs

I would need to see some sort of clinical trial in which an ML tool was deployed, and I get that it is hard to do a trial like this. I would also want to see findings on if there are improvements in the quality of care, clinic throughput, and clinician and patient quality of life.

Clinicians wanted external validation from the FDA before using ML tools. While they preferred gold-standard randomized controlled trials (“RCTs”), today there are very few RCTs or prospective non-RCTs for deep learning models. Still, clinicians understood that some use cases would need to pass with less rigorous clinical evidence. Many considered endorsements from other healthcare providers or professional societies as forms of external validation, or looked to both senior and junior colleagues for advice. External validation served to build trust in these ML tools and to provide legal safety.

7. Future of clinical education

My current thought is that all new clinicians should be at least somewhat aware of the technology at a bare minimum — knowing very vaguely of how it works, which use cases are better vs. worse, how human clinical judgement will be impacted, and how clinical specialties might look in the future. At the moment, I think all of this information could be included in about four short lectures. But in the future, there may need to be a significant curriculum redesign.

Despite expressing a need for general clinical education reform, many clinicians predicted that clinical education will need to teach everyone, from early trainees to late-career practitioners, to be more data-literate. Today, an ML primer was all they wanted; tomorrow, they expected ML to be infused into all parts of clinical education. Work is already underway to include ML in clinical education and to think through its longer-term implications. Conversely, some thought that ML tools should be withheld from early trainees so as not to diminish the teaching of clinical judgement. Overall, clinicians did not see ML as automating them away but did see it driving meaningful change in their profession.

8. Desired use cases

Clinicians were excited to think of ways in which ML might be applied to their own clinical specialty and others’. While they were not told what was possible, many had ideas about what needs exist today. See the full text (link here) for a table of the desired use cases identified during interviews.

9. Implementation

I think there needs to be convincing validation as a first step. Then when it is fully implemented, there needs to be continued surveillance of performance. For example, these tools may be built in another setting, so we need to make sure they work at our place.

Clinicians wanted to make sure that implementations did not leave an unproven ML tool in a less-trained clinician’s hands and did not complicate their existing workflows. Those with administrative or business backgrounds knew the importance of a phased implementation and a systems-thinking mindset, such as considering what clinical delivery resources were available downstream of any automation. They wanted clinicians involved in implementations, yet struggled with the short-term distraction from care delivery.

10. Buying process

It has to have a very clear value proposition. What is the ROI going to be? Will there be a meaningful return? These companies tell me how we will practice better, but I also need to know how we will save or make money. Sadly, the system doesn’t incentivize us to do better, it incentivizes us to work faster.

Clinicians explained that the buying process for an ML tool would be complicated, with many stakeholders involved in decision making. These stakeholders appeared to differ in every setting, as did the proof points they needed to reach a buying decision. ROI was the most common value proposition, yet it needed to be positioned differently for a provider in a fee-for-service world versus one in value-based care. Clinicians were sensitive to the recent history of burnout from EHRs and trusted established technology brands over less-known startups. Smaller providers said they would struggle to bet on new technology.

What I plan to do next…

As ML advances and becomes more common in the healthcare setting, product managers and ML engineers / researchers must better understand clinicians’ viewpoints to build trust and adoption. Looking forward, I will continue my research, gathering and analyzing the perspectives of these ML tool developers and publishing additional findings.


Harry Goldberg
Building Trust and Adoption in Machine Learning in Healthcare

Beyond healthcare ML research, I spend time as a UC Berkeley MBA/MPH, WEF Global Shaper, Instant Pot & sous vide lover, yoga & meditation follower, and fiancé.