Why I Wrote a 59-Page Peer-Reviewed Paper on the 12 Best Practices for AI in Healthcare

I delve into the writing process, the underlying methodology, and the industry significance of my newly published peer-reviewed research.

Sergei Polevikov
7 min read · Sep 3, 2023
The author with his new publication.

I have no doubt that AI will become an integral part of clinical practice. However, as of now, it remains outside mainstream acceptance, primarily due to clinicians’ apprehension of the unknown and the lack of established best practices for AI in healthcare.

This gap inspired me to write this paper, which was eventually published in the peer-reviewed journal “Clinica Chimica Acta” on August 16, 2023 (doi: 10.1016/j.cca.2023.117519). A link to the full paper, available without a paywall, is provided at the end of this article.

This paper represents the first effort to consolidate existing research and lay a foundation for best practices in clinical AI.

Four years ago, when I co-founded WellAI, I did not anticipate that safety and industry best practices would be at the forefront of our customers’ concerns. I passionately promoted our product, the AI Health Assistant, as a groundbreaking tool for optimizing health outcomes. While I remain confident in its potential, it became evident that, although we adhered to HIPAA standards, compliance alone was not our customers’ primary concern about our AI system. They posed questions such as:

👉 What happens if something goes wrong? Can the AI system resolve issues both intelligently and transparently?

👉 Have the AI models undergone thorough training, testing, and validation? Are biases addressed?

👉 How secure and private is patient data (PHI)?

👉 What are the established best practices for AI in healthcare?

The last question’s answer was unsettlingly straightforward: there weren’t any.

Prompted by these concerns, I felt compelled to address the issue. The paper faced its share of rejections along the way, but its inception was driven by those frequent queries about best practices in the healthcare AI domain. To address them, I delved into over 200 articles, extracting their most valuable insights, which, combined with my own proposals, culminated in this review.

This isn’t my inaugural work on establishing industry best practices. In 2011, the finance world was rocked by the AXA Rosenberg scandal, leading to widespread confusion over a regulatory double standard applied to quants. I spearheaded an industry-wide conference in NYC, inviting all stakeholders, including regulators (the SEC), the CEO of AXA Rosenberg (a bold move, given the ongoing SEC investigation), and finance quants. I also co-authored best practices in quantitative finance, which, I must say, are vastly different from those in healthcare AI.

Since 2021, I’ve shared my research progress with colleagues from the Working Group on Artificial Intelligence and Genomic Diagnostics (WG-AIGD), where the topic was already on the agenda. Naturally, I assumed a leading role in these discussions. While the content of the paper is solely my contribution, Professor Larry Kricka played a pivotal role in refining its presentation.

As I faced repeated rejections from various journals, I consistently updated my paper to reflect recent developments. To my pleasant surprise, there were quite a few. Two recent efforts stand out:

🩸 At its annual meeting in June 2023, the American Medical Association (AMA) agreed to develop principles and recommendations concerning the benefits and potential unforeseen consequences of relying on AI-generated medical advice and content that may not be validated, accurate, or appropriate. [https://www.healthcarefinancenews.com/news/ama-agrees-develop-principles-benefits-and-unforeseen-consequences-ai-generated-medical-advice]

🩸 The American Academy of Family Physicians (AAFP) published its “Ethical Application of Artificial Intelligence in Family Medicine” in July 2023, setting forth an initial set of principles that it believes must be followed if AI/ML tools are to be applied in family medicine. [https://www.aafp.org/about/policies/all/ethical-ai.html]

The 12 key aspects of best practices for AI in clinical practice that I identified echo the AI principles put forth by the AMA and the AAFP.

For those interested in some specific “behind the scenes” discussions related to my research, here are some fun facts and details from my writing journey, based on discussions with reviewers, editors, and colleagues:

💠 The initial draft, dated 11/30/2021, contained 21,273 words spanning 53 pages. It can be accessed here: https://medium.com/@WellAI/best-practices-for-ai-in-healthcare-a-comprehensive-overview-49094cb7bbf8. Reviewers required me to reduce it to 9,411 words, or 25 pages, in the final version.

💠 This research endeavor presented a unique experience for me. Typically, my studies focus on the applications of mathematical modeling. However, this paper does not contain a single mathematical formula.

💠 In my first draft, I mentioned the names of 31 researchers, data scientists, and CEOs. I didn’t drop the names casually or simply as references like “Schmidt et al.” No, I believed these 31 individuals perfectly fit my narrative based on what they said, wrote, or discovered. The editors indicated that unless a name is a “popular household name,” it’s deemed promotional and should be moved to the References section. Out of these 31 names, they permitted only one to remain: Cassie Kozyrkov, Chief Decision Scientist at Google, the only person the editors considered an established household name.

💠 In the earlier version of the manuscript, I dedicated a significant amount of space to arguing that, while Andrew Ng is a brilliant data scientist, his concept of “data-centric AI” is not novel. I believed it was a clever play on words, but there was nothing groundbreaking about the approach. First, the idea that data should be central to an AI algorithm is decades old. Second, the notion that high-quality synthetic data is vital is also far from new. My discussion of this topic was so heated that the editor suggested I temper my tone. After I was also required to reduce the word count due to the journal’s article size restrictions, my entire discussion of data-centric AI was reduced to just two short sentences. Few would discern the passion behind my critique of Andrew Ng and his concept of data-centric AI.

💠 Reviewers strongly urged me to offer clear solutions and delve into specifics on how various ML and statistical challenges, such as insufficient sample size, inadequate statistical significance, and self-serving bias, can be addressed in practice. I hope the final version accomplishes this.
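To give a flavor of what addressing insufficient sample size looks like in practice, here’s a minimal Python sketch I put together for this post (it is my own illustration, not code from the paper; the effect size, power, and significance level are assumed values for demonstration). It uses a standard power analysis to determine how many patients each study arm needs:

```python
from statsmodels.stats.power import TTestIndPower

# How many patients per arm are needed to detect a modest effect
# (Cohen's d = 0.3) with 80% power at a 5% significance level?
analysis = TTestIndPower()
n_per_arm = analysis.solve_power(effect_size=0.3, power=0.8, alpha=0.05)
print(f"Required sample size per arm: {n_per_arm:.0f}")  # ~175
```

Running this kind of calculation before collecting data is the simplest defense against underpowered studies.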

💠 One reviewer suggested applying the recent recommendations from the American Statistical Association (ASA), especially those related to reporting the effect size (i.e., practical relevance or clinical significance) and confidence intervals for the estimated statistics.
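To make that concrete, here’s a minimal sketch (again my own illustration, not code from the paper, using simulated data) that reports both an effect size (Cohen’s d) and a 95% confidence interval for the difference in means between two hypothetical study arms:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical outcome measurements for two arms of a study
treatment = rng.normal(loc=1.2, scale=1.0, size=80)
control = rng.normal(loc=0.8, scale=1.0, size=80)

# Effect size: Cohen's d (standardized mean difference)
n1, n2 = len(treatment), len(control)
pooled_sd = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
d = (treatment.mean() - control.mean()) / pooled_sd

# 95% CI for the mean difference via Welch's t-test (needs SciPy >= 1.10)
res = stats.ttest_ind(treatment, control, equal_var=False)
ci = res.confidence_interval(confidence_level=0.95)

print(f"Cohen's d = {d:.2f}")
print(f"95% CI for the mean difference: ({ci.low:.2f}, {ci.high:.2f})")
```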

💠 A reviewer noted that while p-hacking can be a form of self-serving bias, not all self-serving biases manifest as p-hacking. Given this observation, I detailed which biases should be avoided. Specifically, I spent time explaining the dangers of p-hacking in clinical studies. Could we claim that p-hacking killed Babylon Health? Too early? In reality, it was misrepresentation and fraud that brought down Babylon Health, but I’ll delve into that in my next article.
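For readers who want intuition for why p-hacking is so dangerous, here’s a small simulation I wrote for this post (it is not in the paper): test enough endpoints on pure noise, report only the best p-value, and “significant” results appear almost for free. “Data shopping,” which I mention next, works the same way, swapping datasets for endpoints:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_endpoints, n_patients = 1000, 20, 50

false_positives = 0
for _ in range(n_trials):
    # Two study arms with NO true difference, measured on many endpoints
    a = rng.normal(size=(n_endpoints, n_patients))
    b = rng.normal(size=(n_endpoints, n_patients))
    p_values = stats.ttest_ind(a, b, axis=1).pvalue
    # The p-hacker reports only the most favorable endpoint
    if p_values.min() < 0.05:
        false_positives += 1

# Expect roughly 1 - 0.95**20, i.e., about 64%
print(f"'Significant' findings on pure noise: {false_positives / n_trials:.0%}")
```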

💠 Additionally, a reviewer suggested incorporating “data shopping” as a classic example of p-hacking. Hence, the inclusion of “data shopping”.

💠 One point of discussion absent from the initial draft was the significance of selecting the right metrics when dealing with imbalanced data. Standard metrics can often be misleading in these scenarios.
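A toy example makes the point (my own sketch with made-up numbers): at 1% disease prevalence, a “model” that never flags anyone scores roughly 99% accuracy while missing every single patient:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(1)
# Hypothetical screening dataset with 1% disease prevalence
y_true = (rng.random(10_000) < 0.01).astype(int)
# A useless "model" that always predicts the majority class (healthy)
y_pred = np.zeros_like(y_true)

print(f"Accuracy: {accuracy_score(y_true, y_pred):.1%}")  # ~99%: looks impressive
print(f"Recall:   {recall_score(y_true, y_pred):.1%}")    # 0%: misses every sick patient
```

Metrics such as sensitivity, precision, and the area under the precision-recall curve tell a far more honest story in these scenarios.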

💠 I’ve dedicated more time to discussing how to avoid overfitting. I’ve provided specific recommendations to mitigate this issue, such as employing regularization techniques, cross-validation, and early stopping.
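As a flavor of those recommendations, here’s a minimal scikit-learn sketch (an illustration I wrote for this post; the dataset and hyperparameters are made up) combining all three: L2 regularization, early stopping on a held-out validation split, and cross-validation for an honest performance estimate:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: many features, few of them informative (easy to overfit)
X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=0)

# L2 regularization (alpha) plus early stopping on a 20% validation split
clf = SGDClassifier(loss="log_loss", penalty="l2", alpha=1e-3,
                    early_stopping=True, validation_fraction=0.2,
                    n_iter_no_change=5, random_state=0)

# 5-fold cross-validation estimates out-of-sample performance
scores = cross_val_score(clf, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```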

💠 I explore the potential of techniques like active learning and federated learning to improve the generalization of AI models to real-world scenarios.
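To show why federated learning is appealing in healthcare, here’s a bare-bones federated-averaging sketch in plain NumPy (my own toy illustration with synthetic “hospital” data, not the paper’s method): each site trains a logistic regression locally, and only model weights, never patient records, travel to the server:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, steps=50):
    """One hospital trains locally; raw patient data never leaves the site."""
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))       # logistic regression predictions
        w -= lr * X.T @ (p - y) / len(y)   # gradient step on local data
    return w

# Three hospitals, each with its own (hypothetical) private dataset
sites = [(rng.normal(size=(100, 5)), rng.integers(0, 2, 100)) for _ in range(3)]

w_global = np.zeros(5)
for _ in range(10):
    # Each site starts from the current global model and trains locally
    local_ws = [local_update(w_global.copy(), X, y) for X, y in sites]
    # The server averages the weights (FedAvg) -- only parameters move
    w_global = np.mean(local_ws, axis=0)

print("Global model weights after 10 federated rounds:", np.round(w_global, 2))
```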

💠 A reviewer observed that, although I emphasized the importance of reproducibility in AI research, I hadn’t offered in-depth strategies or discussed the associated challenges. In the final version, I’ve addressed this, elucidating the vital components in a framework for reproducibility and explaining how they can be documented to promote reproducibility.
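To give a flavor of what documenting those components can look like, something as simple as a run manifest goes a long way. Here’s a hypothetical sketch (my own, not from the paper) that pins the random seed and records package versions, a data fingerprint, and hyperparameters next to every result:

```python
import hashlib, json, platform, random
import numpy as np

SEED = 20230816
random.seed(SEED)
np.random.seed(SEED)

def run_manifest(data_bytes: bytes, params: dict) -> dict:
    """Capture what is needed to rerun an experiment bit-for-bit."""
    return {
        "seed": SEED,
        "python": platform.python_version(),
        "numpy": np.__version__,
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "hyperparameters": params,
    }

data = b"...raw training data bytes..."  # placeholder for the real dataset
manifest = run_manifest(data, {"model": "logreg", "C": 1.0})
print(json.dumps(manifest, indent=2))
```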

💠 I hope you’ll notice that I’ve concentrated on the specifics of ethical AI guidelines that ought to be adhered to in clinical practice. The reviewers strongly emphasized this point.

I sincerely hope my research benefits patients, providers, and anyone who is hesitant about integrating AI into clinical practice. Our work is not yet done. Although I feel I’ve included an extensive list of specific proposals in the paper, the most crucial next step is for regulators, data scientists, patient advocates, and many other interested parties to collaborate and develop actionable plans. These plans should aim to advance AI in healthcare while ensuring all necessary safety precautions are in place. I would be honored if this work serves as the foundation for that collaboration.

P.S. As promised, here’s the exclusive ‘backstage pass’ to the article for my subscribers — no paywall: https://authors.elsevier.com/a/1heV62G2M6mLC. Enjoy, and thank you deeply for your support. You can’t imagine how much it means to me. I hope to one day share my full story of extreme personal adversity. If you haven’t followed me already, please consider doing so to stay updated and gain early access to more exclusive content. Your engagement and feedback fuel my passion and drive to keep writing and sharing. Thanks again for being a part of my community.
