Clinical Trials Demystified: An AI Researcher’s Guide (Part 1) — Workflow for Clinical Trials

Jimeng

Jimeng Sun (Keiji AI)

As a professor working at the intersection of AI and healthcare, and as founder of Keiji AI, I’m incredibly excited about the transformation happening in clinical trials. The recent SCOPE Summit 2025, the largest clinical operations conference, highlighted how AI is revolutionizing this $80B+ industry. Let us walk you through why this is one of the most promising applications of AI that you’ve probably never thought about.

Clinical trials are essential to translating cutting-edge biomedical discoveries into lifesaving treatments, yet they’re often seen as a complex maze, particularly by AI researchers entering healthcare. However, this complexity offers exciting opportunities: AI can profoundly streamline and enhance each step — from trial design and patient recruitment to regulatory submission and post-market monitoring. In this 4-part primer, specifically crafted for AI researchers by fellow AI experts, we demystify the clinical trial process and highlight practical entry points where AI can make transformative impacts. Whether your passion is natural language processing, generative AI, predictive modeling, or computer vision, the clinical trial domain is ripe for innovation. Dive in to discover how your expertise can accelerate trials, reduce costs, improve patient safety, and ultimately bring groundbreaking therapies to patients faster.

Clinical Trial Workflow and AI Opportunities

Clinical trials follow a multi-stage process from early research to post-market monitoring. Below is a stepwise workflow outlining key stages, with indications of where AI can streamline these steps.

  • Preclinical Research: Before human testing, a new drug or device undergoes laboratory and animal studies to evaluate safety and biological activity. Successful preclinical results support an Investigational New Drug (IND) application to regulators. In the U.S., an IND is required to obtain FDA authorization to ship an experimental drug and administer it to humans (IND and NDA: what is the difference?). The IND dossier includes preclinical data, manufacturing information, and the initial trial protocol, so that regulators can confirm participants will not be exposed to unreasonable risk (IND and NDA: what is the difference?). Clinical trials can commence once the IND is cleared, or after a waiting period of about 30 days if the FDA imposes no clinical hold.
  • Trial Planning & Site Selection: In parallel with the IND process, sponsors finalize the clinical trial protocol, which details objectives, design, methods, and statistical considerations (GUIDELINE FOR GOOD CLINICAL PRACTICE), and obtain ethics approval. Sponsors select investigative sites (hospitals/clinics) and qualified investigators to conduct the trial, weighing factors such as patient availability, facilities, and past performance. Recruitment strategies (e.g. outreach campaigns, patient databases) are put in place to enroll eligible participants. AI Contribution: Machine learning can assist in protocol design (e.g. dose selection and sample size) by analyzing prior trials (Clinical Trial Briefing Infographic). AI tools can also analyze real-world data to identify promising sites and eligible patients, reducing guesswork. For example, AI-driven analysis of historical site performance can predict which sites will enroll well (AI in clinical trials | Salesforce US), and natural language processing (NLP) can scan electronic health records to find eligible patients faster.
  • Phase I (Safety & Dosing): First-in-human trials in a small group of subjects (typically tens), often healthy volunteers, to assess safety, tolerability, and pharmacokinetics (Step 3: Clinical Research | FDA). Phase I trials gradually escalate the dose to determine a safe dosage range and identify side effects. Only about 70% of drug candidates pass Phase I. These studies are typically open-label and may be conducted in specialized Phase I units. AI Contribution: In Phase I, AI can automate data capture from wearable sensors and laboratory systems for real-time safety monitoring. An AI system can flag abnormal vital signs or lab results to researchers for prompt action, augmenting traditional safety monitoring (AI in clinical trials | Salesforce US).
  • Phase II (Efficacy & Dosing): Trials in a larger group of patients with the target condition (often 100–300) to evaluate efficacy and optimal dosing while continuing to monitor safety (Phases In Clinical Trials Explained by CRO Diagram Research). Phase II provides initial evidence of whether the drug works in humans and refines the dose regimen. Only about one-third of candidates progress past Phase II. AI Contribution: AI algorithms can help identify patient subgroups that respond better (through subgroup pattern mining) and help maintain the quality of incoming data. NLP can support protocol adherence by mining visit notes for protocol deviations, and can automate the coding and scoring of patient-reported outcomes.
  • Phase III (Pivotal Trials): Large-scale trials (hundreds to thousands of patients, possibly multinational) to definitively assess efficacy and safety compared with standard treatment or placebo (Phases In Clinical Trials Explained by CRO Diagram Research). Phase III trials generate the comprehensive data required for regulatory approval: they confirm therapeutic benefit, characterize common side effects, and collect data in diverse populations. Roughly 25–30% of drugs in Phase III succeed. AI Contribution: Given the large volume of Phase III data, AI can assist in risk-based monitoring: machine learning models highlight anomalies that might indicate data errors or misconduct, focusing human monitoring effort where it is needed. If the trial involves medical imaging, AI-based computer vision can be applied (for example, automatically measuring tumor sizes in oncology trials) to improve consistency in endpoint assessments (Artificial Intelligence in Medical Imaging — Spectral AI). Predictive models can also forecast enrollment trends or patient drop-out risk, enabling proactive adjustments to keep the trial on track.
  • Regulatory Submission (NDA/BLA): If Phase III demonstrates safety and efficacy, the sponsor compiles all trial data and study reports into a New Drug Application (NDA), or a Biologics License Application (BLA) for biologics, for FDA review; the European counterpart is a Marketing Authorization Application to the EMA. The NDA contains extensive documentation of manufacturing, preclinical and clinical data, and proposed labeling (IND and NDA: what is the difference?). Regulators examine whether the benefits outweigh the risks and whether the product is of high quality. AI Contribution: AI tools can accelerate data preparation by converting trial datasets into submission-ready formats and checking compliance with data standards. Notably, the FDA now requires submission data to follow CDISC standards (e.g. SDTM for tabulated trial data and ADaM for analysis datasets) (FDA Binding Guidance: A Pivotal Milestone for CDISC Standards), which makes the data more machine-readable. This standardization is itself “AI-ready”: it enables regulatory reviewers to run automated analyses on the submitted data (FDA Binding Guidance: A Pivotal Milestone for CDISC Standards). AI might also be used to detect errors or inconsistencies in the large NDA dossier before submission, acting as a quality gate.
  • Phase IV (Post-Market Surveillance): After approval, the drug enters the market but monitoring continues. Phase IV studies (and other post-marketing surveillance) collect real-world data on long-term safety or rarer side effects in broader patient populations (Phases In Clinical Trials Explained by CRO Diagram Research). These studies may also explore new uses or populations. AI Contribution: AI systems are especially valuable in Phase IV for analyzing real-world data (RWD) from electronic health records, insurance claims, and patient registries. For instance, AI algorithms sifting through millions of records in health databases can detect signals of rare adverse events and prompt further investigation. Such AI-driven pharmacovigilance can identify safety issues more quickly than traditional methods. Regulators and sponsors are increasingly leveraging these tools: FDA’s Sentinel Initiative uses large healthcare data networks, and FDA has collaborated with real-world evidence platforms (such as Aetion) to rapidly assess real-world safety and effectiveness data (FDA selects Aetion Evidence Platform | Aetion FDA Regulatory Science).
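
To make the Phase I safety-monitoring idea concrete, here is a minimal sketch in Python. It is a rule-based baseline rather than a learned model: each reading is compared with a reference range and out-of-range values are flagged for review. The measure names, the `REFERENCE_RANGES` values, and the `flag_out_of_range` helper are hypothetical placeholders; a real trial would use protocol-specified limits, and an AI system would typically layer per-subject baselines and trend detection on top of rules like these.

```python
from dataclasses import dataclass

# Hypothetical reference ranges for illustration only; a real trial would use
# the limits specified in its protocol.
REFERENCE_RANGES = {
    "heart_rate_bpm": (50.0, 100.0),
    "systolic_bp_mmhg": (90.0, 140.0),
    "alt_u_per_l": (0.0, 40.0),  # liver enzyme ALT
}


@dataclass
class Observation:
    subject_id: str
    measure: str
    value: float


def flag_out_of_range(observations, ranges=REFERENCE_RANGES):
    """Return (subject_id, measure, value, direction) for each out-of-range reading."""
    flags = []
    for obs in observations:
        if obs.measure not in ranges:
            continue  # no reference range defined: skip instead of guessing one
        low, high = ranges[obs.measure]
        if obs.value < low:
            flags.append((obs.subject_id, obs.measure, obs.value, "low"))
        elif obs.value > high:
            flags.append((obs.subject_id, obs.measure, obs.value, "high"))
    return flags


# Made-up readings for three subjects
readings = [
    Observation("S01", "heart_rate_bpm", 72.0),
    Observation("S02", "heart_rate_bpm", 118.0),
    Observation("S02", "alt_u_per_l", 35.0),
    Observation("S03", "systolic_bp_mmhg", 85.0),
]

for flag in flag_out_of_range(readings):
    print(flag)
```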
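
The Phase IV pharmacovigilance idea can be illustrated with disproportionality analysis, a classical signal-detection technique that AI-driven systems often build on. The sketch below computes the proportional reporting ratio (PRR) for one drug-event pair from a collection of spontaneous reports. The report data are synthetic, each report is simplified to a single (drug, event) pair, and the screening thresholds in `is_signal` (at least 3 reports, PRR of at least 2, lower 95% confidence bound above 1) are illustrative assumptions, not a regulatory rule.

```python
import math


def two_by_two(reports, drug, event):
    """Count the 2x2 table for one drug-event pair.

    Each report is simplified to a single (drug, event) pair:
      a = target drug & target event    b = target drug & other events
      c = other drugs & target event    d = other drugs & other events
    """
    a = b = c = d = 0
    for r_drug, r_event in reports:
        if r_drug == drug:
            if r_event == event:
                a += 1
            else:
                b += 1
        elif r_event == event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def prr_with_ci(a, b, c, d, z=1.96):
    """Proportional reporting ratio with an approximate 95% CI (log-normal approximation)."""
    prr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se_log)
    upper = math.exp(math.log(prr) + z * se_log)
    return prr, lower, upper


def is_signal(a, b, c, d, min_reports=3, min_prr=2.0):
    """Illustrative screening rule; the thresholds are assumptions."""
    if a < min_reports or c == 0:
        return False  # too few cases, or no comparator reports for this event
    prr, lower, _ = prr_with_ci(a, b, c, d)
    return prr >= min_prr and lower > 1.0


# Synthetic spontaneous-report database (made-up counts)
reports = (
    [("drug_x", "liver injury")] * 8
    + [("drug_x", "headache")] * 92
    + [("other", "liver injury")] * 20
    + [("other", "headache")] * 1980
)

for event in ("liver injury", "headache"):
    a, b, c, d = two_by_two(reports, "drug_x", event)
    prr, lower, upper = prr_with_ci(a, b, c, d)
    print(f"{event}: PRR={prr:.2f} (95% CI {lower:.2f}-{upper:.2f}), signal={is_signal(a, b, c, d)}")
```

A flagged pair is only a hypothesis for follow-up: spontaneous reports lack a true denominator, so disproportionality shows that an event is reported unusually often with a drug, not that the drug causes it.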

AI Opportunities Across the Workflow: Modern AI techniques provide support at multiple stages of a trial. During protocol design, AI can simulate trial outcomes under various designs, helping optimize endpoints and inclusion criteria. For patient recruitment, NLP and predictive models mine health records and public data to match patients to trial criteria much faster than manual screening. In site selection and trial monitoring, machine learning can predict which trial sites will perform well (e.g. enroll diverse populations and meet data-entry timelines) and flag sites that may need additional support. AI-driven analytics also enable adaptive trials: in pre-planned adaptive designs, algorithms can continuously analyze incoming efficacy data and recommend modifications to the trial, such as dropping inferior treatment arms or recalculating sample size. Humans still make the final decisions, but AI provides the data-driven insights that streamline the clinical trial pipeline. The result can be shorter trials and lower costs, getting new therapies to patients faster.
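
The site-flagging idea can be sketched with a simple robust outlier score. Here each site is summarized by one hypothetical metric (average days from patient visit to data entry), and sites whose robust z-score, computed from the median and the median absolute deviation (MAD), exceeds an assumed threshold of 3.5 are flagged for follow-up. This is a statistical stand-in for the predictive models described in the text; the metric, the site names, and the threshold are all assumptions.

```python
import statistics


def robust_z_scores(values):
    """Robust z-scores using the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # MAD is zero (most values identical): the score is undefined,
        # so return 0.0 as a neutral fallback
        return [0.0 for _ in values]
    # 1.4826 rescales the MAD to match the standard deviation for normal data
    return [(v - med) / (1.4826 * mad) for v in values]


def flag_sites(metric_by_site, threshold=3.5):
    """Flag sites whose metric is unusually high (one-sided: larger lag is worse)."""
    sites = list(metric_by_site)
    scores = robust_z_scores([metric_by_site[s] for s in sites])
    return [s for s, z in zip(sites, scores) if z > threshold]


# Hypothetical metric: mean days from patient visit to data entry, per site
entry_lag_days = {
    "site_A": 3.0,
    "site_B": 4.0,
    "site_C": 2.5,
    "site_D": 3.5,
    "site_E": 14.0,
}

print(flag_sites(entry_lag_days))
```

Using the median and MAD rather than the mean and standard deviation keeps a single badly lagging site from inflating the spread and masking itself.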
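
For the adaptive-trial example of dropping inferior arms, one common Bayesian approach is to give each experimental arm a Beta posterior on its response rate and estimate, by Monte Carlo sampling, the probability that each arm is the best. Arms whose probability falls below a pre-specified threshold become candidates for dropping. The sketch assumes binary responses, a uniform Beta(1, 1) prior, a 5% threshold, and made-up interim counts; the function names are hypothetical. A real design would pre-register these choices, handle the control arm separately, and check its operating characteristics by simulation.

```python
import random


def prob_each_arm_best(successes, totals, draws=20000, seed=0, prior=(1.0, 1.0)):
    """Monte Carlo estimate of P(arm has the highest response rate).

    Each arm's response rate gets a Beta(prior_a + successes, prior_b + failures)
    posterior, the conjugate update for binary outcomes.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    prior_a, prior_b = prior
    wins = [0] * len(successes)
    for _ in range(draws):
        samples = [
            rng.betavariate(prior_a + s, prior_b + n - s)
            for s, n in zip(successes, totals)
        ]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


def arms_to_drop(successes, totals, threshold=0.05, **kwargs):
    """Return (indices of arms with P(best) below threshold, all P(best) values)."""
    probs = prob_each_arm_best(successes, totals, **kwargs)
    dropped = [i for i, p in enumerate(probs) if p < threshold]
    return dropped, probs


# Made-up interim data for three experimental arms
arm_names = ["dose_A", "dose_B", "dose_C"]
responders = [12, 25, 3]
enrolled = [50, 50, 50]

dropped, probs = arms_to_drop(responders, enrolled)
for name, p in zip(arm_names, probs):
    print(f"{name}: P(best) = {p:.3f}")
print("candidates to drop:", [arm_names[i] for i in dropped])
```

The output is a recommendation for the trial's decision-makers, consistent with the point that humans make the final call.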
