My Notes from the FDA Patient Advisory Meeting on AI & ML in Medical Devices

The FDA held a seven-hour public virtual meeting of its Patient Engagement Advisory Committee to gather advice on the regulation and use of AI/ML-based medical devices.

These raw notes are my own interpretation of, and commentary on, what I heard and saw live during this publicly available session (recording available on YouTube here). While I did attend the private breakout session, none of that information is included here. I apologize for any misinterpretations.

I am very excited by and appreciative of the FDA for working to co-design AI/ML regulation with various stakeholders.

The FDA website for the event can be found here, which includes many attachments and additional information.

The FDA website on AI/ML regulation can be found here.

Visit my Building Trust and Adoption in Machine Learning in Healthcare site (link here) to see how clinicians view AI/ML tools.

Too Long; Didn’t Read (or Watch)

  • The Patient Engagement Advisory Committee (PEAC) is the FDA’s approach to including patients in the design and refinement of regulations. Today, it gathered input on future regulatory frameworks for modifications to AI/ML-based Software as a Medical Device (SaMD).
  • FDA sees much opportunity and promise in AI/ML given the boom in data availability; yet there is a need for a risk-based approach to regulating these fundamentally different medical devices. The FDA sees opportunities and challenges related to transparency to users, explainable algorithms, data integrity and diversity, and evolving gold standards.
  • Clear communication, broad representation (and the data used to measure representation), and human factors / interaction are critical in ML tool development.
  • The patients cared that the data used to validate the models be diverse and inclusive, that patient data ownership and stewardship be considered, that patients have as much information as clinicians, that digital diagnoses and treatments be understandable with clear suggestions for next steps, that models’ confidence levels be transparent, and that the gold standard be ever-rising to encourage innovation and quality.

Patient Engagement Advisory Committee (PEAC) Call to Order

Paul T. Conway, Chairperson, PEAC

  • Diane Johnson — J&J
  • Katherine Seelman — advocate; rehab professor at U Pitt
  • Cynthia Chauhan — advocate; clinical social worker, heart failure, kidney failure, glaucoma, cancer survivor, clinical trial participant
  • Bennet Dunlop — advocate; father of children with type 1 diabetes, type 2 diabetes patient
  • Amy Leong — advocate; nonprofit chair, chronic diseases, many joint replacements
  • Monica Parker — advocate; physician, clinical trial researcher
  • Rita Roy — advocate; physician, National Spine Health Foundation, HealthComm Associates, spinal fusion and knee replacement
  • Suz Schrandt — advocate; Expect, childhood arthritis, many joint replacements
  • Philip Rutherford — advocate; Faces & Voices of Recovery, substance use disorder
  • Steve Wilcox — advocate; Design Science, human factors
  • Michelle Tarver — FDA
  • Bakul Patel — FDA

Welcome and Opening Remarks

Jeffrey Shuren, M.D., J.D., CDRH Director, FDA

  • CDRH has been really busy during the pandemic
  • AI is hot right now and has been growing for a while
  • In 2019, FDA created an initial regulatory framework proposal on regulating AI/ML in SaMD and called for comments (link to download)
  • Today is about asking patients for their thoughts
  • FDA did a session in Summer 2020
  • Government needs to collaborate with the community when developing new regulations and solutions
  • “Collaborative Communities” for AI/ML include: (1) Xavier AI World Consortium and (2) Collaborative Community on Ophthalmic Imaging
  • CDRH recently (Fall 2020) launched the Digital Health Center of Excellence to support regulatory innovation and development related to AI/ML as well as other digital technologies
  • Upcoming events

Artificial Intelligence-Machine Learning: Validation

Bakul Patel, M.S.E.E., M.B.A., Director, Digital Health Division, CDRH, FDA

  • FDA sees much opportunity and promise in AI/ML given the boom in data availability, especially related to filling gaps in clinical supply and demand (e.g., not enough clinicians for patients)
  • Definition of SaMD
  • Definition of AI/ML
  • Goals for regulation
  • Giving patients access to technology
  • Helping manufacturers rapidly improve development
  • While maintaining safety
  • Typical Lifecycle of AI/ML models
  • There is a spectrum from “locked” models to continuously adaptive models (e.g., online learning, active learning); a brief sketch of the difference follows this list
  • Clinical evaluation and validation will still be required
  • How FDA initially thinks about regulating this space
  • How the FDA thinks about generalizability
  • There are many concerns (e.g., data quality and availability, model explainability, and catching and removing bias)
  • How the FDA thinks about training and testing data hygiene (e.g., class balance)
  • Where FDA would fit in the AI/ML lifecycle. The point of this meeting is to socialize and refine this
  • Most significant discussion points of today’s session
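
To make the locked-versus-adaptive distinction above concrete, here is a minimal sketch of my own (not FDA material): a “locked” model is frozen once validated, while an adaptive model keeps updating as new data arrive. Everything here is hypothetical, using scikit-learn and synthetic data:

```python
# Minimal sketch (mine, not from the FDA talk) contrasting a "locked" model
# with a continuously adaptive one; synthetic data stands in for real inputs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10))        # hypothetical patient features
y_train = (X_train[:, 0] > 0).astype(int)   # hypothetical binary label

# "Locked" model: trained once, validated, then frozen at clearance.
locked = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Adaptive model: keeps learning from post-market batches, so its behavior
# can drift after deployment -- the case the proposed framework targets.
adaptive = SGDClassifier(loss="log_loss", random_state=0)  # sklearn >= 1.1
adaptive.partial_fit(X_train, y_train, classes=[0, 1])
for _ in range(12):                         # e.g., monthly update batches
    X_new = rng.normal(size=(50, 10))
    y_new = (X_new[:, 0] > 0).astype(int)
    adaptive.partial_fit(X_new, y_new)      # model changes without re-review
```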

Artificial Intelligence-Machine Learning: Communication

Pat Baird, Head of Global Software Standards, Philips

  • Pat’s background
  • From triple aim to quadruple aim to aims that consider equity and inclusion
  • It is important to have good communication among diverse stakeholders — patients, clinicians, regulators, industry, and others
  • Bias is an important topic
  • Gave an example of one Philips customer wanting a model tuned on an entire country, while another customer wanted a model tuned on a specific metropolitan statistical area (MSA); some customers understand data drift better than others
  • Suggests a clear taxonomy of terms available to manufacturers, including the different types of bias and the different types of stakeholders. This could be the role of a “collaborative community”
  • Trustworthiness is another important topic
  • Need to balance over-trust and under-trust
  • Summary of presentation
  • Xavier Project — Xavier University had a workshop on AI in healthcare and the group later published a paper on developing trust. Annual conference link

Representation of Diverse Groups in Test Sets

CAPT Terri Cornelison, M.D., Ph.D., F.A.C.O.G, Director, Health of Women, CDRH, FDA

  • Most algorithm designs ignore age, gender, race, and ethnicity during training; yet science suggests there are clinically significant differences in health impact across these attributes
  • How different individual level information plays into differences in clinical health
  • These differences have an impact on clinical trials when results are looked at in aggregate rather than broken out by these attributes
  • It is important to have representative data in AI models and to do the analytics correctly
  • How not considering these attributes could mess up ML models
  • Summary

Cognitive Human Factors

Kimberly Kontson, Ph.D., Biomedical Engineer, Division of Biomedical Physics, CDRH, FDA

  • How do humans interact with AI/ML?
  • Think about perception, cognition, action
  • Device = the training manuals plus the device itself
  • There are a number of risks associated with AI SaMDs

Open Committee Discussion

  • Examples of AI/ML tools being built now in collaboration with patients
  • Run by human factors teams — interviews, focus groups
  • One person pushed back, wanting patients involved in the upstream design of tools
  • Is it actually a bad thing if attributes are included in models?
  • No. These attributes (e.g., race) are clinically relevant in certain situations and can be used to validate that selection bias is not occurring and that performance does not differ by attribute (a minimal audit sketch follows)
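
Here is a minimal sketch of my own (not from the discussion) of what such an audit could look like, assuming a fitted binary classifier `model` and a demographic array `groups` alongside the test set; all names are hypothetical:

```python
# Minimal sketch (mine) of auditing differential performance by attribute.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def performance_by_group(model, X_test, y_test, groups):
    """Report AUROC separately per demographic group in the test set."""
    y_test, groups = np.asarray(y_test), np.asarray(groups)
    rows = []
    for g in pd.unique(groups):
        mask = groups == g  # each group needs both outcome classes present
        scores = model.predict_proba(X_test[mask])[:, 1]
        rows.append({"group": g, "n": int(mask.sum()),
                     "auroc": roc_auc_score(y_test[mask], scores)})
    return pd.DataFrame(rows)

# A large AUROC gap between groups flags possible selection bias or
# under-representation in the training data.
```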

Virtual Breakout Summations

  • How does the app perform? How does it compare to the standard of care? Where does it fit into the healthcare ecosystem? Is it FDA approved, safe, secure, and effective? How was the model validated? Is there a potential conflict of interest? Who made the application?
  • How is the data going to be stored and used? Is the algorithm locked or continuous? If data are used in future algorithm development, how will informed consent be handled? How is race/ethnicity used in the model? What would be possible next steps in using the product based on the outcomes?
  • There is a need for more data broken out by demographics. What was the other attribute information / how was the data organized by age, gender, etc.?
  • Links to additional information about the results / disease. Streamlined way to connect with clinician about results (e.g., telehealth).
  • Information on how the training and validation data relate to the patient’s demographics. Disease progression information and future expectations.
  • They wanted humans to explain the results. Direction to other clinicians if some are not available. Direct connection to schedule doctor appointments. Connection to manufacturer customer service. Desire for more directive recommendations (e.g., “Highly recommended that you see a clinician”). Make it clear that the app is not diagnostic. Link to a care pathway document showing which step the patient is currently at. Explanation of how patient attributes may impact model performance.

Open Public Hearing

Zach Hornberger — MITA

  • Trust is critical with innovations that change healthcare delivery. It is not built easily or quickly
  • FDA should maintain its role of ensuring safety and public health

Erica Brown — Colon Town

  • Excited by AI because it might be able to catch disease that would otherwise not be caught, reduce unnecessary scans, and reduce healthcare disparities
  • AI could be great for clinicians and patients, making diagnosis better than the human eye
  • Imaging and diagnostics would not need to be unnecessarily repeated, reducing cost and patient burden
  • Diseases impact people differently by attribute (e.g., race) and are treated differently. AI could make sure that everyone receives evidence-based care tailored to their personalized situation
  • FDA should continue to ensure safety, yet help drive innovation to support these three needs

Cecil Motley — Cardiometric Medical Systems

  • There are various ways to evaluate software code: stress testing and failure mode analysis
  • Respiratory device in development looking to get FDA approval
  • Point of care diagnostic device for COVID detection looking to get FDA approval

Keith Dreyer — American College of Radiology Data Science Institute

  • There is very limited clinical evidence from AI algorithms
  • Bigger issues in cleared algorithms
  • Past public comments from ACR
  • Summary

Noah Zimmerman — Tempus Labs

  • Diversity is critical in SaMD development
  • AI can bring diversity into healthcare because it can make sense of Real World Data (RWD) when traditional approaches cannot
  • RWD is closer to the representative population (e.g., it better captures racial and gender minorities underrepresented in current clinical research)
  • Diversity in model evaluation / validation is more important than in training
  • A lot of data is needed for training, so you cannot always be picky about what you include or exclude
  • However, evaluation must include a good balance of diversity to ensure that the models work for all populations
  • Again, RWD can make test sets / hold out sets more robust
  • Regulatory focus should be disproportionately on validation data (a stratified-split sketch follows)
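
One way to operationalize that emphasis (my interpretation, not the speaker’s code): stratify the hold-out split on demographic attributes so every group is represented in evaluation, even when the training pool is whatever data could be gathered. A sketch with scikit-learn and synthetic data:

```python
# Minimal sketch (my interpretation, not Tempus code): stratify the hold-out
# split on (label, group) so each subgroup keeps its share in the test set.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                      # hypothetical features
y = rng.integers(0, 2, size=1000)                    # hypothetical labels
group = rng.choice(["A", "B", "C"], size=1000, p=[0.7, 0.2, 0.1])

strata = [f"{yi}-{gi}" for yi, gi in zip(y, group)]  # joint label-group strata
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.2, stratify=strata, random_state=0)
# g_te now mirrors the group mix, so subgroup metrics are computable for all.
```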

Amit Kaushal — Stanford University

  • Did research to see where AI biomedical data come from
  • There are “data deserts” in biomedical AI
  • This can cause bias and lower performance in under-represented populations
  • It is important that AI training data mirror the populations in which the AI will be used

Open Committee Discussion

A need was expressed for more representative data being available to AI research. Is there an agreed-upon metric to prove data are representative?

  • No, but this is important. Breakout session participants mentioned that it was important to see that models were trained on people like them. (A simple representativeness check is sketched below.)
  • There also needs to be consideration of HIPAA privacy and security
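
Since no agreed-upon metric exists, here is one simple check I could imagine (my sketch, not anything proposed at the meeting): compare a cohort’s demographic mix against a reference population with a chi-square goodness-of-fit test. The counts and proportions below are made up:

```python
# Minimal sketch (mine): test whether a study cohort's demographic mix
# deviates from a reference population (e.g., census shares). The numbers
# here are illustrative only.
import numpy as np
from scipy.stats import chisquare

cohort_counts = np.array([620, 180, 120, 80])          # hypothetical cohort by group
reference_props = np.array([0.60, 0.13, 0.19, 0.08])   # hypothetical census shares

expected = reference_props * cohort_counts.sum()
stat, p_value = chisquare(f_obs=cohort_counts, f_exp=expected)
print(f"chi2={stat:.1f}, p={p_value:.3g}")  # small p => cohort differs from reference
```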

There is a concern around health literacy during the data collection process. How is that being addressed?

  • The public needs to understand the risks associated with models.
  • It was important for breakout session participants to hear confidence scores for recommendations (a patient-facing confidence sketch follows)
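
As an illustration of surfacing confidence to patients, here is a minimal sketch of my own; the thresholds and wording are invented for illustration, not a clinical standard:

```python
# Minimal sketch (mine) of turning a model's predicted probability into
# plain-language, patient-facing wording. Thresholds are illustrative only.
def confidence_message(probability: float) -> str:
    pct = round(probability * 100)
    if probability >= 0.9:
        return f"The model is highly confident ({pct}%). Please see a clinician."
    if probability >= 0.6:
        return f"The model is moderately confident ({pct}%). A follow-up is recommended."
    return f"The model's confidence is low ({pct}%). Consider re-testing or asking a clinician."

print(confidence_message(0.87))  # -> moderately confident message
```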

Industry, rather than academic medical centers, seems to be playing the role of data accountability

  • Breakout session participants mentioned wanting to know more about the developer of the app and how the incentive structures were built

FDA needs to focus more on data collection for models

  • Breakout session participants were very thoughtful about where the data came from

Committee Discussion of FDA Questions

  • One committee member agreed that validation data diversity should be the focus of attention rather than the training data
  • Validation should happen in clinical sites that are not exclusively academic training centers, since there are many other types of clinical delivery sites
  • FDA should demand diversity in data inputs for ML models
  • FDA should continue to encourage diversity in participants of clinical trials. This will have to overcome current mistrust by various communities.
  • Patient advisors and advocates must be included in the product design and development process. This will help patients trust the process
  • To the point that data is the new water: there have been geopolitical issues related to control of water, and we remember moments of inappropriate medical science (e.g., the Tuskegee Study). Patients who contribute their RWD should give informed consent.
  • Providers must be informed about how patient data is being used and how research is going, not just academic researchers
  • The IRB process should be used for medical ML model development
  • Intentionality and unconscious bias are very important to consider during the development of ML
  • To build the trust of patients, FDA should partner with industry to improve health literacy and health education so that patients can make more informed decisions
  • Patients will be more open to participating in ML research if they better understand what the research will be used for
  • There must be a wide funnel for where data are gathered to build models, and the FDA should require this
  • It is important to have diversity in ML developers
  • Device manufacturers should not make the use of the device conditional on an individual providing their data for further development
  • One committee member is concerned about devices that change from screening to diagnostics, or other changes in device use. That would need more regulatory scrutiny
  • One committee member thinks that the FDA will not be approving a device based on what it does today, but instead approving its trajectory. That would follow the 510(k) process for each approval
  • One committee member wants over-disclosure and communication to patients
  • One committee member brought up the Cures Act guidance that some CDS tools directed at clinicians are not going to be regulated as closely, while those directed at patients are more regulated
  • Doctors do and should communicate to patients the risks of various devices
  • Patients have the right to understand what prompted the modification of devices
  • Healthcare should be based on shared medical decisions. Patients should have the option of learning about what modifications have been made to SaMDs
  • One committee member was curious about how SaMDs can be recalled
  • Patients want to know why a SaMD is updated. There are some differences in views on whether information should be shared with both patients and doctors, or with doctors only.
  • There should be news flashes for updates to SaMDs, as well as direct communication to providers. Patient advocacy organizations can then help broadcast the message
  • The companies should be held responsible for notifying patients about big changes
  • FDA should share information about product updates on digital channels, such as social media, as well as physical channels
  • Yes! The bar should always be raised, making it more competitive to improve
  • However, there might be a concern that a lower quality but lower cost device gets pushed out of the market
  • Post-market analysis might play a role here and change the bar
  • AI products are fundamentally different from traditional products in that they might change over time, so post-market surveillance would be measuring a dynamic product
  • Need to have clear checks and balances on how these products work
  • This is the same as a pacemaker
  • Patients need to understand how these products work and that there is informed consent. Patient preference is very important.
  • Prevention of issues should be more important than waiting to mitigate errors
  • Risk should be understood in advance and this is no different than other medical devices
  • It might be helpful for patients to understand what thresholds are used to trigger an alert
  • For autonomous at-home diagnostics, there needs to be clear instructions for patients on what next steps are needed once a diagnosis is made
  • It is important to consider how diagnostic news is received by the patient and family; with autonomous at-home solutions, there is no human around to address stress and concern
  • Things that are perceived as intuitive must be tested extensively with patients. Intuition may differ greatly based on background, such as age. It is important to have technical support staff available to the patient when questions arise. Patients should see the same information as clinicians.
  • Human factor considerations should be built for specific people, not the average person.
  • It is hard for patients to understand if a device is on and if it is working well.
  • One committee member liked the idea of patients who are using similar AI devices connecting with each other
  • It is important to be mindful of the internet infrastructure that is available for each patient
  • Patients and clinicians should see the same information and patients should receive it in a way that they understand
  • Universal design in the disability community has supported product use by many different types of patients, and that orientation should be leveraged for AI products

--

Harry Goldberg
Building Trust and Adoption in Machine Learning in Healthcare

Beyond healthcare ML research, I spend time as a UC Berkeley MBA/MPH, WEF Global Shaper, Instant Pot & sous vide lover, yoga & meditation follower, and fiancé.