Reverse Engineering the Statistical Analyses in the Moderna Protocol

YP
Dec 6, 2020

As a non-expert trying to understand the statistics behind the Covid-19 Phase 3 vaccine trials, I ran into many information gaps (stemming from unstated assumptions) during my deep dive through Moderna / Pfizer / AstraZeneca’s trial protocols. Lacking the relevant context, I found the protocols unduly complex. Below are answers to some key questions that others may find useful, along with amateur code to reproduce Moderna’s statistical trial design.

First, a recap of the key statistical considerations for the uninitiated:

1. Vaccine Efficacy (VE)

  • 60% as the target VE
  • 30% as the null hypothesis (i.e. H0: VE ≤ 0.3)

2. Number of Participants

Total Sample Size

  • 30,000, which allows for 15% of participants being excluded.

Covid-19 Positive Sample Size

  • 53 cases for Interim Analysis 1 (35% of total)
  • 106 cases for Interim Analysis 2 (70% of total)
  • 151 cases at the Final Analysis
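The interim case counts follow from the timing fractions. Here is a quick base R check, assuming the counts are simply 35% and 70% of 151 rounded up to whole cases (the rounding rule is my assumption, not something the protocol spells out):

```r
# Planned final number of cases and the interim timing fractions from the list above
final_cases <- 151
timing <- c(IA1 = 0.35, IA2 = 0.70)

# Assumption: interim counts are the fractions of 151, rounded up to whole cases
interim_cases <- ceiling(timing * final_cases)
print(interim_cases)

# Realized information fractions after rounding
print(round(interim_cases / final_cases, 3))
```

The realized fractions come out marginally above 35% and 70%, which is why the protocol figures are quoted as approximate percentages of the total.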

3. Statistical analyses

1-sided O’Brien-Fleming boundary for efficacy monitoring

  • 4.6% probability of boundary crossing at IA1
  • 61.5% probability of boundary crossing at IA2
  • 90.0% probability of boundary crossing at the final analysis

1-sided false positive error rate of 0.025 with log-rank test statistic

  • 0.0002 nominal alpha at IA1
  • 0.0073 nominal alpha at IA2
  • 0.0227 nominal alpha at the final analysis
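To see where the overall 0.025 goes, it helps to look at the alpha spending function. The sketch below uses the Lan-DeMets O'Brien-Fleming-type spending formula in base R; the function name `ldof_spend` is mine. It returns the cumulative type I error spent by each look, which is a different quantity from the nominal alphas listed above (those are the p-value thresholds at each look). The cumulative values should land near 0.0002, 0.0074 and 0.025, the same figures that show up in the "P(Cross) if HR=0.7" rows of the gsBoundSummary() output further down.

```r
# Lan-DeMets O'Brien-Fleming-type spending function (one-sided alpha).
# Returns the cumulative type I error that may be "spent" by information fraction t.
ldof_spend <- function(t, alpha = 0.025) {
  2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))
}

timing <- c(IA1 = 0.35, IA2 = 0.70, Final = 1.00)
cum_alpha <- ldof_spend(timing)
print(round(cum_alpha, 4))

# Alpha newly spent at each look (differences of the cumulative values)
print(round(diff(c(0, cum_alpha)), 4))
```

Almost nothing is spent at the first look, which is why the bar for stopping early at IA1 is so high.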

VE at bound, rejecting H0: VE ≤ 30%.

  • VE ≥ 0.741 at IA1 (HR ≤ 0.259)
  • VE ≥ 0.565 at IA2 (HR ≤ 0.435)
  • VE ≥ 0.495 at the final analysis (HR ≤ 0.505)
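The VE and HR figures in these lists are two views of the same number, linked by VE = 1 − HR. A tiny base R helper makes the conversions explicit (the helper names are mine, purely for illustration):

```r
# VE and hazard ratio are complements: VE = 1 - HR
ve_to_hr <- function(ve) 1 - ve
hr_to_ve <- function(hr) 1 - hr

# Design inputs: null VE of 30% and target VE of 60%
print(ve_to_hr(c(null = 0.30, target = 0.60)))

# Efficacy bounds from the list above, expressed as hazard ratios
ve_bounds <- c(IA1 = 0.741, IA2 = 0.565, Final = 0.495)
print(ve_to_hr(ve_bounds))
```

This is why the gsDesign calls later in the post use hazard ratios of 0.7 and 0.4 rather than efficacies of 30% and 60%.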

How did they come up with all these numbers? To answer that, it’s helpful to know why the phase 3 trial enrolled 30,000 participants but planned to make decisions on the basis of only 151 positive cases.

According to the WHO’s blueprint:

This fixed number of 150 endpoints is set to provide sufficient power to detect a predefined target level of VE, rejecting the initially specified null hypothesis that VE is < 30%…. In simpler terms, if the true vaccine efficacy is 60%, then analyzing a total of 150 cases would provide a 90% chance that the actual results are at least as promising as 50 vs 100 cases. Such a result would indicate 50% vaccine efficacy (with a 95% confidence interval of 30% to 65% for vaccine efficacy).

Huh. So ≥150 cases, 90% power, 60% “true” VE and 50% “desired” VE can be treated as the magic numbers that define this model, with the rest of the numbers derived from them. Hopefully, that helps to reduce the protocols’ complexity 🙂.
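The WHO's "50 vs 100 cases" example can be checked by hand. The sketch below assumes 1:1 randomization with equal follow-up in both arms, so that VE is one minus the ratio of case counts. The WHO does not say how its interval was computed; the exact binomial (Clopper-Pearson) interval used here is my assumption, and it only lands close to the quoted 30% to 65%.

```r
# WHO example: 150 cases split 50 (vaccine) vs 100 (placebo)
cases_vaccine <- 50
cases_placebo <- 100

# With equal-sized arms and equal follow-up, VE is one minus the ratio of case counts
ve_hat <- 1 - cases_vaccine / cases_placebo
print(ve_hat)

# Exact (Clopper-Pearson) 95% interval for the share of cases in the vaccine arm
ci_share <- as.numeric(binom.test(cases_vaccine, cases_vaccine + cases_placebo)$conf.int)

# Convert a share p to VE via VE = 1 - p / (1 - p); a high share means a low VE,
# so the two ends of the interval swap places
ve_ci <- rev(1 - ci_share / (1 - ci_share))
print(round(ve_ci, 3))
```

The lower end of that interval sitting right around 30% is the link between the "50% point estimate" and the "30% null hypothesis" in the success criteria.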

OK but how is the total sample size calculated?

Moderna states that their sample size is calculated with the R package gsDesign. I was able to find a match to their original trial design after some “trial” and error, by plugging in H0 VE (30%, i.e. a hazard ratio of 0.7), H1 VE (60%, i.e. a hazard ratio of 0.4), expected placebo incidence rate (0.75%), dropout rate (2%), study duration (25 months), enrollment duration (23.5–24 months) into gsSurv().

library(gsDesign)
x <- gsSurv(
  k = 3, test.type = 1,                  # 3 analyses, one-sided efficacy bound only
  hr = 0.4, hr0 = 0.7,                   # H1: VE = 60%; H0: VE = 30% (HR = 1 - VE)
  lambdaC = 0.00075, eta = 0.02,         # placebo event rate and dropout rate
  T = 25, minfup = 1,                    # study duration and minimum follow-up, in months
  ratio = 1,                             # 1:1 randomization
  sfu = sfLDOF, timing = c(0.35, 0.7),   # Lan-DeMets O'Brien-Fleming spending; looks at 35% and 70%
  alpha = 0.025, sided = 1, beta = 0.1   # 2.5% one-sided Type I error, 90% power
)
cat(summary(x))
## One-sided group sequential design with 3 analyses, time-to-event outcome with sample size 25988 and 151 events required, 90 percent power, 2.5 percent (1-sided) Type I error to detect a hazard ratio of 0.4 with a null hypothesis hazard ratio of 0.7. Enrollment and total study durations are assumed to be 24 and 25 months, respectively. Efficacy bounds derived using a Lan-DeMets O'Brien-Fleming approximation spending function with none = 1.
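As a rough cross-check on how 25,988 participants turn into about 151 cases, the expected number of events can be computed by hand in base R. This is a sketch under my own assumptions, not gsDesign's actual code: exponential event and dropout times with the rates passed to gsSurv() treated as monthly hazards, uniform enrollment over 24 months, and a vaccine-arm hazard of 0.4 times the placebo hazard. The function `expected_events` is a hypothetical helper.

```r
# Expected number of observed events in one arm, assuming exponential event times
# (hazard `lambda` per month), exponential dropout (hazard `eta` per month), and
# uniform enrollment, so potential follow-up is uniform on [min_followup, study_months].
expected_events <- function(n, lambda, eta, study_months, min_followup) {
  total <- lambda + eta
  # Average of exp(-total * f) over follow-up times f ~ Uniform(min_followup, study_months)
  mean_surv <- (exp(-total * min_followup) - exp(-total * study_months)) /
    (total * (study_months - min_followup))
  # Probability that a participant has an observed event before dropout or study end
  p_event <- lambda / total * (1 - mean_surv)
  n * p_event
}

n_per_arm <- 25988 / 2
ev_placebo <- expected_events(n_per_arm, lambda = 0.00075,       eta = 0.02,
                              study_months = 25, min_followup = 1)
ev_vaccine <- expected_events(n_per_arm, lambda = 0.00075 * 0.4, eta = 0.02,
                              study_months = 25, min_followup = 1)

total_events <- ev_placebo + ev_vaccine
print(round(c(placebo = ev_placebo, vaccine = ev_vaccine, total = total_events), 1))
```

Under these assumptions the total should land close to the 151 events in the summary above, which suggests the sample size is essentially "however many people it takes to observe 151 cases".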

The output sample size is then multiplied by 115%, to allow for 15% of participants being excluded, which gives the expected sample size of ~30,000.
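The arithmetic, for completeness (a two-line base R check using the sample size reported by gsSurv()):

```r
# Sample size reported by gsSurv()
n_design <- 25988

# Inflate by 15% to allow for participants excluded from the analysis
n_enrolled <- n_design * 1.15
print(n_enrolled)

# Rounded to the nearest thousand
print(round(n_enrolled, -3))
```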

If that is not convincing enough, take a look at how well gsBoundSummary() matches the trial’s statistical considerations.

gsBoundSummary(x)
## Analysis     Value               Efficacy
## IA 1: 35%    Z                   3.6128
## N: 15476     p (1-sided)         0.0002
## Events: 53   ~HR at bound        0.2586
##              P(Cross) if HR=0.7  0.0002
##              P(Cross) if HR=0.4  0.0463
## IA 2: 70%    Z                   2.4405
## N: 22338     p (1-sided)         0.0073
## Events: 106  ~HR at bound        0.4350
##              P(Cross) if HR=0.7  0.0074
##              P(Cross) if HR=0.4  0.6146
## Final        Z                   2.0002
## N: 25506     p (1-sided)         0.0227
## Events: 151  ~HR at bound        0.5052
##              P(Cross) if HR=0.7  0.0250
##              P(Cross) if HR=0.4  0.9000

It actually matches to the 3rd decimal place, which is pretty snap.
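Two of the columns in that table can be recovered from the Z-values alone. The base R sketch below converts each boundary Z into a nominal one-sided p-value, and into an approximate hazard ratio at the bound. The second step assumes 1:1 randomization and the common approximation Var(log HR) ≈ 4 / events; that is my simplification rather than gsDesign's exact calculation, so expect agreement with the "~HR at bound" column only to within about 0.001.

```r
# Efficacy boundary Z-values and event counts from the gsBoundSummary() output above
z      <- c(IA1 = 3.6128, IA2 = 2.4405, Final = 2.0002)
events <- c(IA1 = 53,     IA2 = 106,    Final = 151)
hr0    <- 0.7   # null hazard ratio (VE = 30%)

# Nominal one-sided p-value at each boundary
p_nominal <- pnorm(-z)
print(round(p_nominal, 4))

# Approximate HR at the boundary, assuming Var(log HR) is roughly 4 / events
hr_bound <- hr0 * exp(-z * 2 / sqrt(events))
print(round(hr_bound, 3))
```

Read this way, the bound is just "the observed hazard ratio has to sit Z standard errors below 0.7", and the standard error shrinks as cases accumulate.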

Why does the null hypothesis presume a Vaccine Efficacy of at most 30% (i.e. H0: VE ≤ 0.3)? The same null hypothesis is also used by Pfizer and AstraZeneca, but it is better explained by the WHO’s Covid-19 Vaccine Trial Blueprint.

Subject to adaptation as the trial proceeds, a successful vaccine will have a sequential-monitoring-adjusted 95% lower bound of the confidence interval on vaccine efficacy that exceeds 30%. The point estimate for vaccine efficacy (VE) should be at least 50%, in agreement with the minimum requirement given in the WHO Target Product Profile.

This is a little strange because if the goal is a VE of at least 50%, the null hypothesis to reject ought to be set just below it, for example H0: VE ≤ 0.499. The confidence interval does not provide the same guarantee, since the true VE could fall within the 30–50% range, and a higher H0 efficacy would be harder to reject. If it sounds like cheating in a statistical sense, it kinda is! The WHO goes on to say…

If widespread transmission persists such that a meaningfully higher ‘null hypothesis’ could be statistically rejected by accumulating more endpoints in an acceptably short period of time, the study will continue in order to accumulate those endpoints to yield greater certainty about vaccine efficacy. To avoid penalizing vaccine developers for evaluating their individual vaccines in a common core trial, there will not be a formal multiplicity adjustment in the statistical analysis of vaccine efficacy based on the number of vaccine regimens under study. In summary, these success criteria have been set so that a vaccine with estimated efficacy of 50% or higher would have high likelihood of being successful in a trial of feasible size and duration.

Huh. Well, exigencies matter, and the same criteria appear in the FDA’s guidance for Covid-19 vaccine licensure.

To be clear, current evidence suggests that multiple Covid-19 vaccines are both safe and effective. But for kicks let’s take a look at what a statistical guarantee of ≥50% would require by subbing in VE = 0.499 (HR = 0.501).

# hr0 = 0.501 puts the null at VE = 49.9%; note minfup = 1.5 here (it was 1 in the first design)
x <- gsSurv(k = 3, test.type = 1, lambdaC = 0.00075, hr = 0.4, hr0 = 0.501, eta = 0.02, T = 25, minfup = 1.5, ratio = 1, sfu = sfLDOF, timing = c(0.35, 0.7), alpha = 0.025, sided = 1, beta = 0.1)
cat(summary(x))
## One-sided group sequential design with 3 analyses, time-to-event outcome with sample size 165690 and 978 events required, 90 percent power, 2.5 percent (1-sided) Type I error to detect a hazard ratio of 0.4 with a null hypothesis hazard ratio of 0.5. Enrollment and total study durations are assumed to be 23.5 and 25 months, respectively. Efficacy bounds derived using a Lan-DeMets O'Brien-Fleming approximation spending function with none = 1.
gsBoundSummary(x)
## Analysis      Value               Efficacy
## IA 1: 35%     Z                   3.6128
## N: 100532     p (1-sided)         0.0002
## Events: 342   ~HR at bound        0.3390
##               P(Cross) if HR=0.5  0.0002
##               P(Cross) if HR=0.4  0.0463
## IA 2: 70%     Z                   2.4405
## N: 145114     p (1-sided)         0.0073
## Events: 684   ~HR at bound        0.4157
##               P(Cross) if HR=0.5  0.0074
##               P(Cross) if HR=0.4  0.6146
## Final         Z                   2.0002
## N: 165690     p (1-sided)         0.0227
## Events: 978   ~HR at bound        0.4408
##               P(Cross) if HR=0.5  0.0250
##               P(Cross) if HR=0.4  0.9000

The study would have to be roughly 6.4 times larger (165,690 vs 25,988 participants, and 978 vs 151 events). 🙂
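There is a rule of thumb that explains most of the jump. In the usual Schoenfeld-type approximation, the number of events needed scales with 1 / log(HR / HR0)², so moving the null from 0.7 to 0.501 while keeping the target at 0.4 shrinks the gap to be detected and blows up the event count. The base R sketch below is my own back-of-the-envelope check, not how gsSurv() computes its numbers, so it only gets into the same ballpark as the two designs above.

```r
# Rule of thumb (Schoenfeld-type approximation, used here only for illustration):
# required events scale with 1 / log(hr / hr0)^2
events_scale <- function(hr, hr0) 1 / log(hr / hr0)^2

# Null at VE = 49.9% versus null at VE = 30%, both with a target HR of 0.4
approx_ratio <- events_scale(0.4, 0.501) / events_scale(0.4, 0.7)
print(round(approx_ratio, 2))

# Ratios implied by the two gsSurv() designs
design_ratio <- c(events = 978 / 151, sample_size = 165690 / 25988)
print(round(design_ratio, 2))
```

Both routes put the inflation at a bit over six-fold.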
