
Large Variability in Serology Estimates of Prevalence of SARS-CoV-2

Steve Yadlowsky
Apr 18 · 5 min read

PhD Student, Stanford University

The recent analysis from Bendavid et al. claims that the sensitivity- and specificity-adjusted prevalence of antibodies to SARS-CoV-2 is 2.49%, with a 95% confidence interval of 1.80–3.17%. However, these confidence intervals are calculated using an approximate formula that is not appropriate for the present analysis. We show that these confidence intervals are too narrow, and that we cannot confidently infer anything about how small the prevalence of SARS-CoV-2 may be from the results of the study. Therefore, we call for more extensive testing of the antibody test used in their study, and more careful analysis of statistical error, before concluding anything from the reported results. In statistical parlance, we show that under the null hypothesis of zero prevalence of SARS-CoV-2 antibodies in the community, P = 0.0911 for the reported results. We show how to use the parametric bootstrap to get more accurate confidence intervals; however, we cannot compute these intervals for the demographics-adjusted prevalence due to the way the survey data are reported.

Knowing the prevalence of SARS-CoV-2 antibodies in the general population is important for epidemiological modeling and public health planning, as states and counties look for ways to safely ease the current widespread shelter-in-place orders, and we laud the authors of the recent serology study for working on this important problem. However, to be useful for these purposes, the estimated prevalence must be precise, with trustworthy error bars. Confidence intervals are a commonly used choice of error bars, because they show the range of parameters that are statistically consistent with the data. For the purposes of replicability, scientific and medical communities often report 95% confidence intervals, meaning that if the study were repeated many times, the confidence interval should include the true result at least 95% of the time. The choice of 95% is not without controversy, but it is a commonly reported error bar that trades off the value of having precise estimates against the likelihood of being wrong.

Sometimes, confidence intervals can be difficult to calculate, and the 95% confidence intervals reported in the study by Bendavid et al. use a standard statistical approximation that often gives accurate results. However, due to the small number of tests used to validate the accuracy of the serology tests, and the nonlinear nature of the analysis, this approximation is inaccurate for the discussed results. The originally reported 95% confidence intervals do not include 0, implying that if no one in the county had antibodies to SARS-CoV-2, the chance of obtaining their results would be less than 5%. We show that this probability is actually around 9.1%.

To show this, we must discuss a bit of statistics. The problem at hand is that the accuracy of the serology test itself is unknown. These tests can, for a variety of reasons, give a positive result (saying that someone does have the antibody) when in fact the person does not; this is known as a false positive. If the fraction of negative subjects who get a positive test (known as the false positive rate, or FPR) were known, then the approximate number of people positive for this reason could be subtracted off to estimate the number of true cases. However, the FPR is never known exactly. The original results use the estimated FPR reported by the test manufacturer, which tested 371 samples known to be negative and found that 2 of them tested positive. If a slightly different sample had been tested, perhaps 3 would have been positive, or perhaps only 1. Luckily, statistics can tell us exactly how likely each of these outcomes is under different scenarios.
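As a small illustration of this point, here is a sketch in Python (using SciPy) of how likely various false-positive counts would be in a fresh batch of 371 known-negative samples, if the true FPR equaled the 2-of-371 point estimate quoted above:

```python
from scipy.stats import binom

n = 371                 # known-negative samples tested by the manufacturer
observed = 2            # positives observed among them
fpr_hat = observed / n  # point estimate of the false positive rate (~0.54%)

# If the true FPR equaled the point estimate, how likely would each
# false-positive count be in a fresh batch of 371 negative samples?
for k in range(6):
    print(f"P({k} false positives) = {binom.pmf(k, n, fpr_hat):.3f}")
```

Even at the point estimate, observing 1 or 3 false positives instead of 2 is quite likely, which is exactly why the uncertainty in the FPR cannot be ignored.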

Under the null hypothesis of zero prevalence, we would expect the distribution of positive results in the community sample to be equivalent to that in the negative sample from the manufacturer's test. Zero prevalence is a special case that simplifies the statistics, because it does not depend on the true positive rate, population weights, or clusters from testing multiple family members. We can therefore test the null hypothesis of zero prevalence by testing whether there is a statistically significant difference between the fraction of positive results in the negative sample and in the community sample. Using the Fisher exact test for comparing binomial proportions, we find that the one-sided P-value for this hypothesis is 0.0911. The one-sided Pearson chi-squared test gives a probability of 0.066, and the Yates-corrected one-sided test gives 0.101. This contradicts the reported 95% confidence interval for the prevalence, which was derived by applying the delta method and assuming asymptotic normality.
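For concreteness, this test can be sketched in a few lines of Python with SciPy. The community counts used here (50 positives out of 3,330 tests) are taken from the preprint and are an assumption of this sketch, since they are not restated above:

```python
from scipy.stats import fisher_exact

# 2x2 table of [positives, negatives]:
# row 1: community sample (assumed 50 of 3,330, per the preprint),
# row 2: manufacturer's known-negative validation sample (2 of 371).
table = [[50, 3330 - 50],
         [2, 371 - 2]]

# One-sided test: is the positive rate in the community sample higher
# than the rate in the known-negative sample?
oddsratio, p_value = fisher_exact(table, alternative="greater")
print(f"one-sided Fisher exact P = {p_value:.4f}")
```

A P-value near 0.09 is well above the conventional 0.05 threshold, so the data are statistically consistent with zero prevalence.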

To get accurate confidence intervals, we recommend the parametric bootstrap, a simulation method that plugs the estimated parameters back into the model, simulates new data from it, and uses the distribution of estimates across repeated simulations to construct confidence intervals. If the parametric model is accurate, this can provide accurate confidence intervals without requiring asymptotic normality. We cannot adjust for demographics using the reported data, because the number of positive tests in each demographic group was not reported. Therefore, we report confidence intervals for the prevalence among the population from which the survey sample was drawn (recruited via Facebook). Using the parametric bootstrap, we get 95% confidence intervals of (0.00%, 1.82%). The code to reproduce these numbers is available here.
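A minimal sketch of such a parametric bootstrap in Python (NumPy) follows. The survey counts (50 of 3,330) and specificity counts (2 false positives in 371) follow the preprint; the sensitivity counts are purely illustrative assumptions, so the resulting interval will not match the one reported above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Survey and specificity counts per the preprint; the sensitivity counts
# (103 true positives of 122 known positives) are assumed for illustration.
n_survey, k_survey = 3330, 50   # community tests, positives
n_spec, k_spec = 371, 369       # known negatives, correct (negative) results
n_sens, k_sens = 122, 103       # known positives, correct (positive) results (assumed)

def adjusted_prevalence(k_surv, k_sp, k_se):
    """Standard Rogan-Gladen correction for test error."""
    fpr = 1 - k_sp / n_spec
    tpr = k_se / n_sens
    raw = k_surv / n_survey
    return (raw - fpr) / (tpr - fpr)

pi_hat = min(max(adjusted_prevalence(k_survey, k_spec, k_sens), 0.0), 1.0)

# Probability that a random survey participant tests positive under the model.
fpr_hat = 1 - k_spec / n_spec
tpr_hat = k_sens / n_sens
p_test = pi_hat * tpr_hat + (1 - pi_hat) * fpr_hat

# Parametric bootstrap: resample all three binomial counts from the fitted
# model and re-run the full estimator each time.
B = 10_000
boot = np.empty(B)
for b in range(B):
    k_sp = rng.binomial(n_spec, k_spec / n_spec)
    k_se = rng.binomial(n_sens, k_sens / n_sens)
    k_sv = rng.binomial(n_survey, p_test)
    boot[b] = np.clip(adjusted_prevalence(k_sv, k_sp, k_sp and k_se), 0.0, 1.0)

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI: ({lo:.2%}, {hi:.2%})")
```

Note how the interval propagates the uncertainty in the FPR and TPR estimates through the nonlinear adjustment, rather than relying on a normal approximation.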

Related concerns and recommendations about the statistics in this preprint were shared here on Medium, and in this Twitter thread. Like this article, those commentaries discuss better approximations for the statistics of this study and are worth reading. The parametric bootstrap approach we advocate here builds on those approaches by accounting for more of the sources of uncertainty, although it still does not adjust for survey sampling biases or cluster variance. We believe adjusting for both of these would further widen the confidence intervals (higher variance), and it may also shift their center (lower bias).

These confidence intervals are wide, preventing confident inference about the prevalence of SARS-CoV-2 antibodies. To narrow them, the false positive rate and true positive rate of the test should be investigated more thoroughly. Because the false positive rate can vary regionally, these validation studies should be performed locally using a large number of high-quality samples. Using more accurate serological tests, or performing confirmatory tests on positive individuals, could also improve the precision of the study. Additionally, a larger survey should be conducted. The simulation provided above could easily be adapted to give power estimates for different choices of validation-sample and survey-sample sizes.
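As a sketch of such a power calculation in Python (NumPy and SciPy), one can simulate studies at different validation-sample sizes and count how often the one-sided Fisher exact test described above rejects zero prevalence. The true prevalence, FPR, and TPR values used here are illustrative assumptions, not estimates from the study:

```python
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(1)

def power(n_neg, n_survey, true_prev=0.012, fpr=0.005, tpr=0.84,
          B=500, alpha=0.05):
    """Fraction of simulated studies in which the one-sided Fisher exact
    test rejects zero prevalence. All rates are illustrative assumptions."""
    p_test = true_prev * tpr + (1 - true_prev) * fpr
    rejections = 0
    for _ in range(B):
        k_surv = rng.binomial(n_survey, p_test)   # positives in the survey
        k_fp = rng.binomial(n_neg, fpr)           # false positives in validation
        _, p = fisher_exact([[k_surv, n_survey - k_surv],
                             [k_fp, n_neg - k_fp]],
                            alternative="greater")
        rejections += p < alpha
    return rejections / B

p_small = power(n_neg=371, n_survey=3330)   # validation sample as in the study
p_large = power(n_neg=4000, n_survey=3330)  # a much larger validation sample
print(f"power with 371 negatives: {p_small:.2f}, with 4000: {p_large:.2f}")
```

Under these assumed rates, enlarging the validation sample substantially increases the power to distinguish a small true prevalence from zero.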

Steve Yadlowsky

Written by

PhD student at Stanford University studying statistics, machine learning, and causal inference applied to medicine and clinical informatics.

