Where You Live Affects What Your COVID-19 Test Means

Nir Yungster
11 min read · Apr 18, 2020


An important disclaimer before you read this: in this post, I discuss COVID-19 testing, with topics including false positives, false negatives, and the interpretation of testing results and their consequences. PLEASE be careful and consult your doctor before using the information to make medical decisions for yourself or others. This information is also not meant to contradict social policy or public health policy that may be in place, but to help people understand those policies more fully and help everyone navigate these times in an informed manner.

Math can be hard. When a colleague complained that a method for solving partial differential equations didn’t make sense, John von Neumann, one of the preeminent mathematicians of the 20th century, wrote back saying,

“in mathematics you don’t understand things. You just get used to them.”

Sometimes the issue isn’t so much that a question is difficult to comprehend, but that the answer runs completely counter to our intuition. Nowhere is this more true than in probability, a subject for which human intuition is notoriously ill-suited. Questions that are seemingly straightforward can have answers that are completely vexing. Take the birthday problem as an example:

Given a room with N randomly selected people, what is the chance that at least one pair of people in the room share a birthday?

Let me pose the problem slightly differently. Say I’m willing to bet you $20 that a room of people has at least one pair with shared birthdays. You certainly wouldn’t take that bet if there were 367 people in the room since it would be guaranteed that two people share a birthday. But would you take the bet if it was 100 people? 50 people? 25?

It turns out that 23 people is enough for a greater than 50% chance that at least one birthday pair exists in the room. What matters is not the number of people, but the number of distinct pairs of people that are implied — a room of 23 people has 253 distinct pairs (23 choose 2 combinations). I’d happily take my side of the bet with 23 or more people in the room.
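
If you want to convince yourself, the probability is easy to compute directly: the chance that no pair matches is the product of each successive person avoiding all the birthdays already taken. A minimal sketch (ignoring leap days, so using 365 possible birthdays):

```python
def collision_prob(n, days=365):
    """Chance that at least two of n people share a birthday."""
    p_unique = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken
        p_unique *= (days - k) / days
    return 1 - p_unique

print(f"{collision_prob(23):.1%}")  # ~50.7%
print(f"{collision_prob(50):.1%}")  # ~97.0%
```

With 23 people the chance just crosses 50%, and with 50 people it is already about 97%.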

If this seems surprising, I assure you that you are not alone. A professor of mine once posed this exact bet to a group of 50 engineering and science PhD students from Northwestern University in a probability seminar. Despite a whopping 97% chance that at least one pair of students in that room shared a birthday, several students happily took his $20 bet — and lost. The professor declined his winnings, though not without a big smirk.

The point is that probability can be highly counterintuitive to anyone. Unless you’ve seen this problem before, or you possess a superhuman gift for combinatorics, it’s very easy to underestimate the question at hand and make a big mistake. And this type of error isn’t limited to students — scientists and doctors make it too.

This brings me to the subject of probability in COVID-19 testing. Consider the following hypothetical scenario:

It’s a Saturday morning in downtown Cleveland, Ohio and Robin is heading to a drive-thru clinic to get tested for the COVID-19 virus. The city of Cleveland recently decided to test 5% of its inhabitants in order to understand the current prevalence of the virus, and Robin was randomly selected. Much to her surprise and dismay, and despite not displaying any symptoms, she tests positive. She’s instructed to stay home and isolate herself for the next 14 days.

Here’s where the probability question comes in: what is the chance that Robin actually has the COVID-19 virus?

First, we have to acknowledge that this is in fact a matter of probability. While there is plenty of evidence suggesting asymptomatic people can carry and spread COVID-19, we also have to confront the fact that like almost any test, COVID-19 tests are not infallible. They can make two types of errors.

  1. They can incorrectly indicate that a patient is healthy when they are in fact infected (a false negative).
  2. They can give a positive result for COVID when a patient is in fact completely healthy (a false positive).

A test with high specificity might have a false positive rate of only 1%, meaning that it correctly tells COVID-negative patients that they don’t have the virus 99% of the time. At the very least, we should agree that the chance of Robin having COVID is less than 100% because it is possible her result could have been a false positive.

Coming back to the question we posed then, what is the chance that Robin has COVID-19 given that she tested positive? In other words, what is the chance that her test was a TRUE positive and not a false positive?

One might be tempted to think: well, if the test has only a 1% false positive rate, there’s a 99% chance that Robin has COVID-19. This is in fact wrong. Very wrong, actually. Even assuming the test has zero false negatives, a 1% false positive rate would imply Robin has only about a 10% chance of having COVID. Let me repeat that: even though she tested positive, Robin would have only a 10% chance of having the virus.

(SPEED BUMP: before you run off and tell someone they probably don’t have COVID, please read on!)

This cognitive dissonance is sometimes referred to as the false positive paradox or the base rate fallacy, and it’s a common mistake for anyone not acquainted with Bayesian statistics. The situation arises when the incidence of a disease (also known as the base rate) is lower than or comparable to the false positive rate of the test for that condition. As of April 17th, approximately 0.1% of Cleveland, Ohio residents had been confirmed positive for COVID-19.

Map of COVID-19 cases as of April 17, 2020 (From New York Times, https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html)

To see how this gives rise to our ~10% estimate, let’s fully play out our hypothetical scenario.

  • The city of Cleveland randomly tests 5% of its population of 384,000 residents, which comes to roughly 19,000 people.
  • Assuming a 0.1% infection rate (i.e. 1 in 1000), we would expect 19 people of those tested to actually have COVID-19.
  • The test has a 1% false positive rate, which means that 1% of the non-infected population, about 190 people, would be expected to test positive for COVID-19 in error. (The math here goes like this: our non-infected population is 99.9% of 19,000 = 18,981, and 1% of that gives 189.8 false positives.)
  • That means that of all the people who test positive for COVID-19 in Cleveland through random testing, only 19 out of 209 actually would have COVID-19.
  • Robin, the subject of our hypothetical story, would thus only have a 19/209, or about a 9% chance of actually being COVID positive despite testing positive.
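
The head-count arithmetic above can be checked in a few lines (assuming, as the bullets do, a false negative rate of zero):

```python
tested = 19_000
base_rate = 0.001   # 0.1% of Clevelanders infected
fp_rate = 0.01      # 1% false positive rate

true_pos = tested * base_rate                # 19 expected real cases
false_pos = (tested - true_pos) * fp_rate    # ~190 healthy people flagged
p_infected = true_pos / (true_pos + false_pos)
print(f"P(infected | positive) = {p_infected:.0%}")  # ~9%
```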

Suddenly, a 1% false positive rate doesn’t seem so great.

There are several factors that can change this calculus. First, we’re assuming above that our testing has no false negatives — in other words, that our test will catch all 19 people who actually have COVID. In fact, it is very likely that false negatives are occurring at a much higher rate than false positives, with some estimates suggesting rates of 30% or higher. Among the scenarios contributing to false negatives with swab tests are patients with low levels of the virus, errors in sampling or processing, and patients for whom the infection is not respiratory (e.g. GI-based infections).

The higher the false negative rate, the worse the base rate fallacy becomes. If we introduce a 30% false negative rate, we would expect only about 13 of the 19 infected people to test positive, and Robin’s chance of being infected would drop to about 13/203, or roughly 6.5%.

We could also be making a mistake in the opposite direction if our base rate assumption is too low. Suppose the 0.1% figure vastly underestimates the infection rate — almost certainly the case given that many people who are symptomatic have not been tested. If we assume that only 20% of all cases are being captured and that the true infection rate is 0.5% (and keeping a 30% false negative rate), Robin’s chance of having COVID rises to 26%.
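
Both adjustments fall out of a single Bayes calculation, where sensitivity = 1 − false negative rate. A sketch with the rates used above:

```python
def p_infected_given_positive(base_rate, sensitivity=0.70, fp_rate=0.01):
    """Bayes' theorem: P(infected | positive test)."""
    true_pos = base_rate * sensitivity        # infected AND caught by the test
    false_pos = (1 - base_rate) * fp_rate     # healthy AND flagged in error
    return true_pos / (true_pos + false_pos)

print(f"{p_infected_given_positive(0.001):.1%}")  # ~6.5% at a 0.1% base rate
print(f"{p_infected_given_positive(0.005):.1%}")  # ~26.0% at a 0.5% base rate
```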

(Note that all of the math we’ve done here is exactly what is captured by Bayes’ Theorem, the basis of Bayesian Statistics.)

An important point here is that all of this is predicated on there being no particular reason to suspect that Robin has COVID compared to anyone else living in Cleveland. In other words, we’re assuming that the base rate of infection for Clevelanders is the best estimate we have about Robin prior to her being tested. In Bayesian statistics, we would call the base rate in Cleveland Robin’s “prior probability” of infection. Of course, if Robin were exhibiting symptoms of COVID, or if she had been in contact with others being treated for COVID, that would give us reason to update that prior. Suppose instead that, given her symptoms and history, experts assign Robin a 50% prior probability of having COVID before conducting the swab test. Then, if the test returns a positive result, we would predict a 98%+ likelihood of her being infected (still using a 30% false negative rate).

When the CDC was recommending testing only for patients with symptoms and histories consistent with COVID-19, it was in effect selecting patients with high prior probabilities for COVID, and in doing so reducing the chance that a positive test carries only minimal likelihood of infection (or worse, leads to misinterpretation).
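
The 98%+ figure follows from the same Bayes update, with the base rate replaced by the expert prior. A quick check, using the same assumed error rates as above:

```python
prior = 0.50   # expert-assessed prior for a symptomatic patient
sens = 0.70    # sensitivity, i.e. a 30% false negative rate
fp = 0.01      # 1% false positive rate

posterior = prior * sens / (prior * sens + (1 - prior) * fp)
print(f"P(infected | positive) = {posterior:.1%}")  # ~98.6%
```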

Let’s think about some consequences of what we’ve discussed.

  • First, testing results must be interpreted through the lens of probability. While the thought may be unsettling, a test result cannot tell you if you have COVID or not; it can only update the probability that you have it.
  • When the false positive rate of a test is greater than the population infection rate, a positive test result for someone asymptomatic (and with no other relevant history) still implies that the patient is more than likely not infected.
  • Interpretation of test results — even when the test is identical — depends on the population being tested. Right now, a positive test result in New York City should be interpreted to be far more conclusive than a positive test result in Cleveland because the base rate of infection in New York is 15-20 times higher.

That last point is pretty mind-blowing if you think about it. Imagine that Robin has a twin sister in New York City, who like her, is asymptomatic and has no relevant COVID-19 history. In Cleveland, a positive result for Robin implied a 9% chance of being infected given a 0.1% base infection rate for Cleveland. Let’s redo the math with a 2% base rate for New York City.

  • Let’s assume the city still tests 19,000 people. We would estimate our true positives to be 2% of 19,000 = 380 people (again assuming no false negatives).
  • We would estimate our false positives to be 1% of our non-infected population, or 186 people. (We expect 98% of 19,000 to be non-infected, and 1% of that to be false positives).
  • The likelihood of a positive test being a true positive is then 380 / (380+186) = 67%
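
The side-by-side comparison, under the same no-false-negative assumption as the bullets above:

```python
def ppv(base_rate, fp_rate=0.01):
    # P(infected | positive), assuming the test catches every real case
    return base_rate / (base_rate + (1 - base_rate) * fp_rate)

print(f"Cleveland (0.1% base rate): {ppv(0.001):.0%}")  # ~9%
print(f"New York  (2.0% base rate): {ppv(0.02):.0%}")   # ~67%
```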

Identical tests in two different cities have completely different interpretations. This can be confusing not just for patients but for doctors and other medical staff as well. Consider the implication for a doctor or healthcare worker who changes location from New York to an area of the country with a much lower base rate. Having worked in and acclimated to the reality of a high infection rate, they may not be aware of the need to shift the interpretation of individual tests to match a new probabilistic reality.

Now, let me stress that positive tests should be treated with seriousness whenever and wherever they occur. The probabilistic interpretation of such tests, however, plays an important role in informing testing policy (alongside other issues such as the availability of testing supplies) when considering overall public health. In New York, testing patients with no suspicion of COVID may offer valuable new information, while in a place like Cleveland it may offer little information and not make sense as part of a standard protocol. This underscores the vital importance of state and local public health officials in guiding policy and informing citizens during this pandemic. Given the vast differences in interpretation of test results, it makes complete sense that New York City and Cleveland may have very different policies and protocols when it comes to testing.

While so far we’ve only discussed testing for COVID, the same exact principles apply to antibody testing as well. In that case, a test would address the question not of whether a person currently has COVID-19, but of whether they have developed antibodies for it after having been exposed. The relevant base rate for antibody prevalence would not be the proportion of the population that have COVID presently, but the fraction that previously had the virus and developed antibodies to it.

Proper interpretation of antibody test results is at least as important to public health as interpretation of the diagnostic tests. The consequences of misinterpreting a positive antibody test are significant. An individual who thinks a positive test means 100% certainty of having antibodies may naively act as if they have immunity, not worrying about best practices like frequent hand-washing, not touching their face, and social distancing. If that individual never had symptoms and lived in a city with a very low infection rate, they could easily have less than a 50% chance of having COVID antibodies even with a positive test result. Bottom line: antibody tests, too, can only offer a probability that you have antibodies.

So how can we square this with the calls for widespread antibody testing? Won’t such a policy lead to widespread false positives? The short answer is that such studies have to be done with a careful understanding of Bayesian statistics, using antibody tests that are well understood. In cases where a population with a very low base rate is being studied, researchers will have to use tests with extremely high specificity (i.e. low false positive rates) to ensure meaningful results.
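
To see just how demanding “extremely high specificity” becomes, we can solve Bayes’ theorem for the largest tolerable false positive rate. A sketch with hypothetical numbers (a 0.1% base rate, a perfectly sensitive test, and a target of at least even odds that a positive is real):

```python
def max_fp_rate(base_rate, target_ppv, sensitivity=1.0):
    """Largest false positive rate that still achieves the target
    P(infected | positive); derived by solving Bayes' theorem for fp."""
    return base_rate * sensitivity * (1 - target_ppv) / (target_ppv * (1 - base_rate))

fp = max_fp_rate(0.001, 0.50)
print(f"Required specificity: {1 - fp:.2%}")  # ~99.90%
```

In other words, at a 0.1% base rate even a 50/50 interpretation of a positive result already requires roughly 99.9% specificity.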

A handful of studies have already started conducting antibody testing in populations at large to more accurately understand the base rate for COVID antibodies. Such studies provide vital information to epidemiologists modeling the spread of COVID (including for possible future waves of the illness), to experts making local policy recommendations, and to governments organizing response efforts. On top of that, understanding the prevalence of antibodies in any one given population will be vitally important for interpreting any one test result. Questions of probability can’t solely be left to expert statisticians — doctors and patients must inform themselves too.

On a personal note, please stay safe and follow the guidelines from your local health officials. If you test positive for COVID-19, take it seriously and listen to your doctors. If you test positive for an antibody, take it with a grain of salt.

Read part two of this post for a more visual explanation of the false positive paradox, and be sure to check out my interactive tool for understanding positive test results.



Nir Yungster

Data Scientist • Occasional Writer • Cleveland Sports Fan