How to Calculate Key Statistics in Vaccine Clinical Trials

Pfizer/BioNTech achieved 95% vaccine efficacy

A total of 43,000 volunteers participated in Pfizer/BioNTech’s Phase 3 vaccine trial. Half of the participants received the vaccine, and the other half received the placebo. The results show 170 cases of Covid-19 across the trial: 162 in the placebo group and 8 in the vaccine group.

With these figures, we can carry out the following calculations.

  1. Calculate the vaccine efficacy rate
  2. Calculate the p-value
  3. Estimate the lower and upper bounds of the 95% confidence interval

1. Vaccine Efficacy Rate

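The vaccine efficacy rate (VER) is the relative reduction in cases in the vaccine group compared with the placebo group. Because the two groups are the same size, it can be calculated directly from the case counts: VER = (placebo cases - vaccine cases) / placebo cases.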

#import python libraries
import numpy as np
from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint
#calculate VER
ver = (162 - 8)/162
print('VER=%.4f' % ver)

Output: VER=0.9506
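The calculation above relies on both groups being the same size. As a minimal sketch of the more general case, the function below works from attack rates (cases divided by group size), so it would also cover unequal groups. The function name `vaccine_efficacy` and its parameter names are illustrative assumptions, not part of any library.

```python
def vaccine_efficacy(cases_vaccine, n_vaccine, cases_placebo, n_placebo):
    """Return 1 - (attack rate in vaccine group / attack rate in placebo group)."""
    attack_rate_vaccine = cases_vaccine / n_vaccine
    attack_rate_placebo = cases_placebo / n_placebo
    return 1 - attack_rate_vaccine / attack_rate_placebo

# Assumption: 43,000 participants split evenly, as described in the article
group_size = 43000 // 2
ver_general = vaccine_efficacy(8, group_size, 162, group_size)
print('VER=%.4f' % ver_general)
```

With equal group sizes the group size cancels out, so this should agree with the VER computed from the raw case counts.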

2. p-value

“The p-value tells us how likely it is to get a result like this if the Null Hypothesis is true” -Dr Nic’s Maths[1].

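Before calculating the p-value, calculate the z-score. The code below uses z = (p_sample - p_population) / sqrt(p_sample * (1 - p_sample) / n), where p_sample is the case rate in the vaccine group, p_population is the case rate in the placebo group, and n is the number of samples used for normalization.

The p-value for a two-tailed test is 2 times the p-value of a one-tailed test. At a significance level of α=0.05, the two dark blue areas (Fig. 1) sum to a probability of 0.05.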

Figure 1: Normal Probability

For example, given a z-score of -1.96, a simple way to calculate the p-value is to use the Cumulative Distribution Function (CDF), which takes the z-score as an argument and returns the area to its left (the left tail shaded blue in Fig. 1).
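As a quick check of the z-score=-1.96 example, the sketch below evaluates the standard normal CDF and doubles the left tail to get the two-tailed probability. It uses `statistics.NormalDist` from the Python standard library as a stand-in for `norm.cdf`, so that it runs without SciPy; for a standard normal distribution the two should give the same value.

```python
from statistics import NormalDist

z = -1.96
# Area under the standard normal curve to the left of z
left_tail = NormalDist().cdf(z)
# A two-tailed test counts both tails, so the left tail is doubled
two_tailed = 2 * left_tail
print('left tail=%.4f, two-tailed=%.4f' % (left_tail, two_tailed))
```

Output: left tail=0.0250, two-tailed=0.0500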

#initialize variables
population = 43000/2 # participants per group (vaccine and placebo groups are the same size)
p_population = 162/population # case rate in the placebo group
p_sample = 8/population # case rate in the vaccine group
n = 162 # normalization number i.e. the no. of samples
#calculate z-score
z_score = (p_sample - p_population)/np.sqrt(p_sample*(1-p_sample)/n)
#calculate two-tailed p-value using cumulative distribution function (cdf)
p_value = norm.cdf(z_score)*2
print('p_value=%f' % p_value)

Output: p_value=0.000002
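For comparison, a common textbook alternative is the pooled two-proportion z-test, which uses both group sizes and a pooled case rate in the standard error. The sketch below is a different technique from the calculation above, not the article's method; the function name `two_proportion_z_test` is a hypothetical helper, and the even 21,500/21,500 split is an assumption taken from the trial description.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p-value 2*(1 - Phi(|z|)), written with erfc so that
    # very small tail areas do not round to zero
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed

group_size = 43000 // 2  # assumption: even split between vaccine and placebo
z_pooled, p_pooled = two_proportion_z_test(8, group_size, 162, group_size)
print('z=%.2f, p_value=%.3g' % (z_pooled, p_pooled))
```

Under these assumptions the p-value is again far below 0.05, so the conclusion does not change: a difference this large would be very unlikely if the null hypothesis were true.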

3. Lower and Upper Bound

The “proportion_confint” function uses the Gaussian (normal) approximation to the Binomial distribution by default. It takes the count of successes (or failures), the total number of trials, and the significance level as arguments. It returns the lower and upper bound of the confidence interval [2].

#estimate lower and upper bounds at a significance level of 0.05 (95% confidence)
lower, upper = proportion_confint(162-8, 162, alpha=0.05)
print('lower=%.4f, upper=%.4f' % (lower, upper))

Output: lower=0.9173, upper=0.9840
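To see where these bounds come from, the normal-approximation interval can be sketched by hand as p ± z * sqrt(p(1-p)/n), using only the Python standard library. The variable names below are illustrative, and the sketch assumes the default normal method of `proportion_confint`.

```python
import math
from statistics import NormalDist

successes, trials, alpha = 162 - 8, 162, 0.05
p_hat = successes / trials
# Critical value for a two-sided interval, about 1.96 when alpha=0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
# Standard error of a proportion under the normal approximation
std_err = math.sqrt(p_hat * (1 - p_hat) / trials)
lower_manual = p_hat - z_crit * std_err
upper_manual = p_hat + z_crit * std_err
print('lower=%.4f, upper=%.4f' % (lower_manual, upper_manual))
```

Output: lower=0.9173, upper=0.9840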

References

[1] Dr Nic’s Maths and Stats (2019), accessible at https://www.youtube.com/channel/UCG32MfGLit1pcqCRXyy9cAg

[2] Dr Brownlee J. “Confidence Intervals for Machine Learning” (2018), accessible at https://machinelearningmastery.com/confidence-intervals-for-machine-learning
