Confidence vs. Prediction Intervals

How choices influence results: the case of stillbirth following vaccination

Dr. Marc Jacobs
Jun 22, 2022

Some time ago, Nature Communications published a systematic review and meta-analysis by Prasad et al. on COVID-19 vaccination and several important outcomes, one of which was stillbirth. In short, the review showed that vaccination protected pregnant women from stillbirth. In their defense, the authors did state that the relationship is not causal, but we all know that once a significant p-value (or an equivalent confidence limit) is reported, further questions tend to stop. The forest plot they used to showcase the result is shown below:

The forest plot clearly shows a beneficial odds ratio of 0.85. Although the confidence limits are narrow, they do touch the 0.99 boundary. This should make any clinician looking at these results wary, since it means that a different statistical choice will most likely abolish the finding.

In my view, the systematic review cannot support the statement that vaccination decreases the risk of stillbirth. Although the search criteria and methodological considerations of the paper look sound, I had trouble replicating the results using the same software, the same statistical package, and the same choices as deduced from the publication. To this end, I submitted my concerns and code to the last author of the paper, Dr. Asma Khalil, who was so kind as to reply and share the exact code used (the code can be found at the end). The data were already available via the journal's website, which is great; you do not often get the raw data. In the end, the replication procedure was very straightforward.

After running the code and applying the six methods communicated by the authors, I obtained six pooled odds ratios (OR), each with a confidence interval below one. The methods used were: a generalized linear mixed model (GLMM) with fixed study effects (OR: 0.85 [0.71; 0.99]), a GLMM with a conditional binomial-normal model (OR: 0.85 [0.73; 0.99]), a GLMM with a conditional hypergeometric-normal model (OR: 0.85 [0.73; 0.99]), a Mantel-Haenszel model (OR: 0.86 [0.74; 0.99]), an inverse-variance model (OR: 0.86 [0.74; 0.99]), and a Peto model (OR: 0.86 [0.74; 0.99]). All models except the first GLMM provide random-effects estimates. The model reported in the paper was a GLMM yielding an OR of 0.85 [0.73; 0.99].

The first of the six code snippets sent by the last author.
Results of the meta-analysis.
Forest plot of the model.
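For readers who want to follow along, here is a minimal sketch of what such a replication could look like with the metabin() function from R's meta package, whose options map directly onto the six method names above. The dataset and its column names are my assumptions for illustration, not the authors' actual code.

```r
library(meta)

# Assumed data layout (hypothetical, one row per study):
#   study               - study label
#   ev.vac,   n.vac     - stillbirths / pregnancies, vaccinated arm
#   ev.unvac, n.unvac   - stillbirths / pregnancies, unvaccinated arm
fit_or <- function(method, model = "UM.FS") {
  metabin(ev.vac, n.vac, ev.unvac, n.unvac, data = dat, studlab = study,
          sm = "OR", method = method, model.glmm = model,
          prediction = TRUE)  # also request the prediction interval
}

m1 <- fit_or("GLMM", "UM.FS")  # GLMM with fixed study effects
m2 <- fit_or("GLMM", "CM.AL")  # GLMM, conditional binomial-normal
m3 <- fit_or("GLMM", "CM.EL")  # GLMM, conditional hypergeometric-normal
m4 <- fit_or("MH")             # Mantel-Haenszel
m5 <- fit_or("Inverse")        # inverse variance
m6 <- fit_or("Peto")           # Peto

summary(m2)  # pooled OR with confidence and prediction intervals
forest(m2)   # forest plot of the model
```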

Now, what should become immediately clear is that although the confidence limits stay within the boundary (OR < 1), the prediction interval does not. That is not surprising, since predictions carry additional variance components that are not included in the confidence interval. The confidence interval tells you something about the uncertainty of the estimated coefficient of interest. The prediction interval, however, shows how certain you can be about any new case coming up. Hence, the results show that it is really not that clear whether a new pregnant woman will have any benefit from being vaccinated with regard to stillbirth. This alone should make you very humble about the results.

An example using a simulated dataset showing the relationship between height and weight. The points are the observed data, the blue line is the mean regression line, and the grey shaded area is the confidence interval for the coefficient of height on weight. The red lines, however, mark the prediction interval. Although you may be pretty confident about your coefficient, your predictions carry the added burden of other variance components (here, the residual).
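In code, the distinction is easy to see. A minimal simulation (all numbers made up) shows that the prediction interval is systematically wider than the confidence interval:

```r
set.seed(42)
height <- rnorm(100, mean = 175, sd = 10)           # cm
weight <- -80 + 0.9 * height + rnorm(100, sd = 8)   # kg, with residual noise
fit    <- lm(weight ~ height)

new <- data.frame(height = seq(150, 200, by = 10))
ci  <- predict(fit, new, interval = "confidence")   # uncertainty in the mean line
pi  <- predict(fit, new, interval = "prediction")   # uncertainty for a new person

# The prediction interval adds the residual variance on top of the
# uncertainty of the estimated coefficients, so it is always wider.
cbind(new, ci_lwr = ci[, "lwr"], ci_upr = ci[, "upr"],
           pi_lwr = pi[, "lwr"], pi_upr = pi[, "upr"])
```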

Now, when results land this close to the boundary of significance, it is important to start looking further, because any choice made could have influenced the results, and you have no idea which choices the authors made. Even after receiving the code, I have no idea why they chose one of the six models over the others.

In the beginning, I obtained completely different results because I used a different model: a logistic regression model with fixed study effects and the Hartung-Knapp (HK) adjustment, which yielded an OR of 0.85 [0.70; 1.03].

My model before correspondence.
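A sketch of that specification, under the same assumed data layout as before; hakn = TRUE is the switch for the Hartung-Knapp adjustment in older versions of the meta package (newer versions use method.random.ci = "HK"):

```r
# Logistic regression (GLMM with fixed study effects) plus HK adjustment;
# 'dat' and its columns are the assumed layout from the earlier sketch.
m_hk <- metabin(ev.vac, n.vac, ev.unvac, n.unvac, data = dat, studlab = study,
                sm = "OR", method = "GLMM", model.glmm = "UM.FS",
                hakn = TRUE, prediction = TRUE)
summary(m_hk)  # yielded OR 0.85 [0.70; 1.03] in my run
```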

The HK adjustment is a well-known adjustment in meta-analysis and has been adopted in papers highlighting good practice for the analysis of odds ratios, although it is not without its own drawbacks. Nevertheless, considering that the claim that vaccination protects against stillbirth sits on the edge of acceptable certainty, I included the correction from the start.

Applying it to all the methods used by the authors showed that only the Mantel-Haenszel (OR: 0.86 [0.77; 0.96]), inverse-variance (OR: 0.86 [0.77; 0.96]), and Peto models (OR: 0.86 [0.75; 0.98]) retained 95% confidence intervals that did not include one. Hence, the HK adjustment nullifies the statistical significance of the GLMM reported in the study (OR: 0.85 [0.70; 1.03]) (see the figure above).
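Applying the adjustment across specifications is a one-liner with update.meta(); a sketch, assuming the m1 through m6 objects from the replication above:

```r
# Switch the HK adjustment on for all six earlier fits and extract the
# pooled random-effects OR with its 95% CI (stored on the log scale).
fits_hk <- lapply(list(m1, m2, m3, m4, m5, m6), update, hakn = TRUE)
sapply(fits_hk, function(m)
  round(exp(c(OR = m$TE.random, lower = m$lower.random,
              upper = m$upper.random)), 2))
```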

Even if I were to agree with the authors that the HK adjustment is unnecessary, as it performs best in situations where heterogeneity is high and study sample sizes are similar, I cannot ignore that each of the six models has a prediction interval that exceeds one, with or without the HK adjustment. This means that a prediction of the effectiveness of vaccination on stillbirth for a new cohort would cross the odds-ratio threshold of one. As a result, the findings are too uncertain to be communicated to future patients.

Then there is the issue of the study designs included. Cohort and case-control studies each have their own limitations (which may even differ per study), and they sit on different rungs of the evidence-based pyramid. When I applied the logistic regression model with fixed study effects and ran a sub-group analysis by study design, the positive effect of vaccination on stillbirth disappeared completely within each design.

Logistic regression model I used before correspondence: sub-group analysis by study design.
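In metabin(), a sub-group analysis is a single extra argument; a sketch, where 'design' is an assumed column name (older versions of the meta package take byvar = instead of subgroup =):

```r
# Sub-group analysis by study design on my pre-correspondence model.
m_sub <- metabin(ev.vac, n.vac, ev.unvac, n.unvac, data = dat, studlab = study,
                 sm = "OR", method = "GLMM", model.glmm = "UM.FS",
                 hakn = TRUE, subgroup = design)
summary(m_sub)  # pooled ORs per design stratum
```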

We notified the authors of these findings and received the reply that the study-design-stratified analysis is not entirely accurate, partly due to inconsistent labeling in the dataset. In addition, the study of Theiler et al. is self-labeled as a case-control study but, according to the authors of the review, does not actually follow that design. Although I acknowledge that it can be difficult at times to extract all relevant information from a publication, and that one may disagree with the authors of the included papers, I do expect the information in a review to be usable for sub-group analysis; study design is a standard form of sub-group analysis. Nevertheless, I wanted to redo my initial analysis using the authors' GLMM to see whether my initial non-significant finding (OR: 0.92 [0.69; 1.23]) remained. It turned out that such a model cannot be fitted, most likely because the strata are too small to estimate a random component (two studies labeled "cohort", two labeled "cohort study", and one labeled "case-control study").

Could not perform sub-group analysis using the model of the authors.
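A sketch of how one would surface that failure cleanly; the model.glmm variant used here is my assumption about which GLMM the authors reported:

```r
# Attempt the authors' GLMM variant per design stratum; with only a handful
# of studies per stratum the random component cannot be estimated, and
# tryCatch() reports the fitting error instead of halting the script.
tryCatch(
  metabin(ev.vac, n.vac, ev.unvac, n.unvac, data = dat, studlab = study,
          sm = "OR", method = "GLMM", model.glmm = "CM.AL", subgroup = design),
  error = function(e) message("Sub-group GLMM failed: ", conditionMessage(e))
)
```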

Finally, the authors claim that it is the national-level data from the UK, Israel, and Canada that allowed the connection between vaccination and stillbirth to become statistically significant in the first place. To verify this, I applied the authors' GLMM, with and without the HK adjustment, and computed prediction intervals. Indeed, without HK adjustment, the three studies yield a pooled random-effects OR of 0.85 [0.73; 0.99]; with it, 0.85 [0.61; 1.19]. Nevertheless, both models yield a prediction interval of [0.32; 2.29], which is completely unacceptable for communication purposes.
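The subset analysis is again a short sketch; the study labels below are hypothetical stand-ins for the actual study names in the dataset:

```r
# Restrict to the three national-level studies and refit with and without HK.
nat <- subset(dat, study %in% c("UK", "Israel", "Canada"))  # hypothetical labels
m_nat    <- metabin(ev.vac, n.vac, ev.unvac, n.unvac, data = nat,
                    studlab = study, sm = "OR", method = "GLMM",
                    prediction = TRUE)
m_nat_hk <- update(m_nat, hakn = TRUE)
summary(m_nat)     # OR 0.85 [0.73; 0.99] without HK in my run
summary(m_nat_hk)  # OR 0.85 [0.61; 1.19] with HK; PI [0.32; 2.29] either way
```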

In summary, the statement that vaccination protects against stillbirth sits on the edge of acceptable certainty under the analytical methods used by the authors but disappears when adjustments are applied. In addition, sub-group analyses are infeasible using the authors' methods, and the prediction interval of every model considered always exceeds one. This means that the choices made influence the confidence intervals but not the prediction intervals, and it very much remains to be seen whether new studies will strengthen the current shaky evidence base.

When boundary findings are susceptible to tipping by the choices made, and unable to inform the next patient, it is perhaps best if such findings are deliberately downgraded to prevent any grand claims about the protective effects of vaccination. Otherwise we give false hope, and that is not what evidence-based medicine should be about.


Dr. Marc Jacobs

Scientist. Builder of models and enthusiast of statistics, research, epidemiology, probability, and simulations for 10+ years.