Should We Randomly Test Everybody for Covid-19?

A Bayesian approach with real data may point us in a different direction

Andrea Cazzaro
Curated Newsletters
10 min read · Sep 24, 2020

Photo by Erik Mclean on Unsplash

Disclosure: I am not a health scientist. The information included in this article comes from an analysis that has been neither peer-reviewed nor supervised by a health scientist. Please refer to more accurate and validated resources if you are looking for clear answers on Covid-19.

We all remember the WHO’s statement at the beginning of the pandemic: “Test, test, and test”. It was a clear strategy that many countries adopted, including Italy. After China, Italy was the first country to experience the rapid spread of Covid-19, but also the first to recover from scenes that seemed drawn from Dante’s Inferno.

Everything happened so fast that hospitals and labs were not ready to handle thousands of requests for help. Hence, the Italian Health Organization decided to test only people with critical symptoms. This decision was severely criticized by many doctors, such as Andrea Crisanti, who, while conducting research in a small community, discovered the danger of asymptomatic transmission (2020). According to his study, a considerable share of people without symptoms tested positive for the coronavirus and possibly transmitted the disease to others.

As time passed, labs became fully equipped to process several thousand tests daily. As of today, September 18th 2020, Italy is processing more than 100,000 tests per day, testing symptomatic and asymptomatic citizens randomly. Now, more than ever, Italian citizens can get a free test and know within a few hours whether they are positive or negative. Is this, however, a good strategy to fight the pandemic?

We should not forget that tests are not 100% accurate, and we should take a Bayesian approach to understand whether massive testing increases the chance of isolating infected patients. Indeed, when interpreting a test result, we should take into account the prior probability that the person is infected. When very few people are infected, massive testing without repeated testing could become useless in fighting the pandemic.

Dataset

For this analysis, we will consider the public dataset of the Italian Civil Protection (Protezione Civile). The dataset is updated daily and it contains information on tests, positive cases, intensive care beds, hospitalization rates, and other important figures since the 24th of February 2020.

Introduction to Bayes’ Theorem

According to Bayes’ Theorem, before determining the probability of an event (such as a person truly having the virus after a positive test), we need to take into account our prior knowledge of the conditions that might affect this event. In the case of random massive testing, we need to consider 3 important figures:

  1. The probability that the person has the virus
  2. The true positive rate (when a person who has the virus is tested positive), also known as the sensitivity of a test
  3. The true negative rate (when a person who does not have the virus is tested negative), also known as the specificity of a test

Why? Since many tests are not 100% reliable, the probability that a positive or negative result is correct depends on the reliability of the test, but also on the probability that the tested person has the virus. Indeed, the larger the share of people who do not have the virus, the larger the share of positive results that are false positives.

Once we have these figures, the calculation becomes very easy. Following Bayes’ formula, we perform the following calculation to find the probability that a person who tested positive is truly positive:

Probability that the person has the virus * True positive rate / {(Probability that the person has the virus * True positive rate) + (Probability that the person does not have the virus * False positive rate)}

and this formula to find the probability that a person who tested negative is truly negative:

Probability that the person does not have the virus * True negative rate / {(Probability that the person does not have the virus * True negative rate) + (Probability that the person has the virus * False negative rate)}
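The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the article's plots: the function names are hypothetical, and the 2% prior in the usage lines is an assumed value chosen only to show how the functions are called. The false positive rate is taken as 1 minus the specificity, and the false negative rate as 1 minus the sensitivity.

```python
def positive_predictive_value(prior, sensitivity, specificity):
    """P(has the virus | tested positive), from Bayes' formula."""
    true_pos = prior * sensitivity                # infected and tests positive
    false_pos = (1 - prior) * (1 - specificity)   # not infected but tests positive
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(prior, sensitivity, specificity):
    """P(does not have the virus | tested negative), from Bayes' formula."""
    true_neg = (1 - prior) * specificity          # not infected and tests negative
    false_neg = prior * (1 - sensitivity)         # infected but tests negative
    return true_neg / (true_neg + false_neg)


# Assumed prior of 2% (hypothetical), with a 95% true positive rate
# and a 70% true negative rate.
print(positive_predictive_value(0.02, 0.95, 0.70))
print(negative_predictive_value(0.02, 0.95, 0.70))
```

Both functions take the prior as the probability of having the virus, so the second one converts it internally to the probability of not having it.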

Let’s start the analysis.

The mystery of the infected

It is very hard to know how many people had Covid-19 in the first months of the pandemic. Some studies, conducted in different parts of the world, revealed that the number of infected people could be 10 times higher than the number of positive tests due to the impossibility of massive testing (New York Times, 2020). We will consider this hypothesis as true for our analysis and I will show later why, but first, we need to look at the real numbers, which have been collected by hospitals.

As you can see from the plot above, the blue and the green curves follow the same distribution and almost converge at the end, while the red curve looks flatter, without a second peak at the end. This means that, at the moment, many people are testing positive for Covid-19 but are not being hospitalized, either because they are asymptomatic or because their symptoms are not severe.

Positive vs tested ratio

As the number of tests increased over time, we need to look at the percentage of tested people who tested positive.

As we can see from the plot above, during the peak of the pandemic, almost 30% of tested people tested positive for Covid-19. Today, only 2% of tested people test positive. This means that today a positive test is comparatively rare.
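A ratio like the one described above could be computed as sketched below. The sketch assumes the dataset reports tests as a running total and positive cases as daily counts; that layout, the function name, and the sample numbers are all assumptions made for illustration and are not taken from the Protezione Civile data.

```python
def daily_positive_ratio(cumulative_tests, new_positives):
    """Share of each day's tests that came back positive.

    cumulative_tests: running total of tests, one value per day (assumed layout)
    new_positives:    new positive cases reported on each day
    The first day is skipped because its daily test count cannot be derived.
    """
    ratios = []
    for day in range(1, len(cumulative_tests)):
        tests_today = cumulative_tests[day] - cumulative_tests[day - 1]
        if tests_today <= 0:
            # No tests that day, or a downward correction in the data
            ratios.append(None)
        else:
            ratios.append(new_positives[day] / tests_today)
    return ratios


# Made-up numbers, for illustration only
tests = [1000, 3000, 6000, 6000]
positives = [0, 500, 300, 0]
print(daily_positive_ratio(tests, positives))
```

Days with no usable test count are returned as None rather than dropped, so the output stays aligned with the dates of the input.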

Finding our 3 figures

Now that we have a clear understanding of the numbers, we need to find our 3 main figures to apply Bayes’ Theorem. We have the number of positive tests, which we are going to multiply by 10, and we need the true positive rate and the true negative rate (the false positive rate is simply 100% minus the true negative rate). There are different opinions on these numbers, so we will consider Harvard’s guidelines (Shmerling, 2020), which propose a true positive rate of 95% and a true negative rate of 70–98%. For the second rate, we will use 70% to show that, when there are many non-infected people, the probability of being truly negative after a negative test stays high even without a higher specificity.

Probability of being positive when testing positive

By following Bayes’ Theorem, random massive testing with a low number of infected people can give very ambiguous results.

As we can see from the plot above, the probability of being positive after one positive test is very low. During the peak of the pandemic, the probability went up to around 25%, and that is with the number of infected people assumed to be 10 times higher than the number of people who tested positive. Today, the probability is around 15%. What happens if we take the number of infected people to be the number of people who tested positive (without the 10x multiplication)?

The probability of being positive after one positive test is close to 0. The good news is that there is a solution to this problem, and it is called repeated testing. By repeating the test 2 or 3 times and updating the formula each time with the new probability of being infected, we can reach a high level of certainty. After a third positive test, the probability of being truly positive is at least 80% (considering today’s numbers). Hence, random one-time testing may be completely useless, because the confidence level of a single test is too low.
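The updating step described above, where each positive result becomes the prior for the next test, can be sketched as follows. The sketch assumes that repeated tests on the same person are independent given their true infection status, which real tests may not satisfy. The function names and the 2% starting prior are hypothetical, and the printed values are not meant to reproduce the article's plots, whose exact inputs are not given here.

```python
def update_on_positive(prior, sensitivity, specificity):
    """One Bayesian update: P(infected | positive test)."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def posterior_after_positives(prior, sensitivity, specificity, n_tests):
    """Probability of infection after each of n_tests consecutive positives."""
    posteriors = []
    belief = prior
    for _ in range(n_tests):
        # The posterior from the previous test is the prior for the next one
        belief = update_on_positive(belief, sensitivity, specificity)
        posteriors.append(belief)
    return posteriors


# Assumed starting prior of 2% (hypothetical), 95% sensitivity, 70% specificity
for i, p in enumerate(posterior_after_positives(0.02, 0.95, 0.70, 3), start=1):
    print(f"after positive test {i}: {p:.3f}")
```

How quickly the probability rises depends strongly on the starting prior and on the specificity, so the same loop can be rerun with other values to see how many tests a given level of certainty would require.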

Probability of being negative when testing negative

Now that you have seen the previous plots, what do you expect to see for the probability of being negative after a negative test? Yes, a very different picture. As the probability of being non-infected is much higher than the probability of being infected, the resulting probability of being negative after a negative test is very high.

As you can see from the plot above, doing one, two or three tests does not change the probability of being a true negative.
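The same updating idea can be applied to negative results. The sketch below assumes, hypothetically, that 98% of tested people are not infected and that repeated tests are independent given the true status. The function names are illustrative only.

```python
def update_on_negative(prior_healthy, sensitivity, specificity):
    """One Bayesian update: P(not infected | negative test)."""
    true_neg = prior_healthy * specificity
    false_neg = (1 - prior_healthy) * (1 - sensitivity)
    return true_neg / (true_neg + false_neg)


def healthy_after_negatives(prior_healthy, sensitivity, specificity, n_tests):
    """Probability of not being infected after each consecutive negative."""
    beliefs = []
    belief = prior_healthy
    for _ in range(n_tests):
        belief = update_on_negative(belief, sensitivity, specificity)
        beliefs.append(belief)
    return beliefs


# Assumed prior: 98% of tested people are not infected (hypothetical)
print(healthy_after_negatives(0.98, 0.95, 0.70, 3))
```

With a prior this high, the first negative test already pushes the probability very close to 1, so later tests have almost nothing left to add.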

How to verify the real number of infected people and decide how many repeated tests to perform

As mentioned before, studies have hypothesized that the real number of infected people may be 10 times higher than the number of positive tests due to the impossibility of massive testing. If we consider that many tests are performed just once, we can try to estimate the real number of infected people by building the same Bayesian model and comparing the probability of being positive (when testing positive) with the positive/tested ratio.

The plot above was created with the same model we used for the previous analyses. In this case, we have a sequence of 6,000,000 numbers because we would like to calculate the probability of being positive after testing positive, assuming that there could be at most 6 million infected people (10% of Italy’s population).

For example, if there are 1,000,000 infected people, the probability of being positive after one positive test is 0.2 or 20%. Now, if we go back to the plot of the positive/tested ratio, we can see that at the peak of the pandemic the percentage of people who tested positive was around 30%. Looking at the plot above, the probability of being positive after testing positive reaches 30% when there are at least 1,300,000 infected people. During the peak, the number of confirmed positive cases was around 80,000. Considering that many people could not be tested at the time, so the number of positive cases could have been higher, we can assume that the real number of infected people may have been 10 times higher.
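A sweep over possible numbers of infected people could be sketched as below. This is only an illustration of the idea, under the assumption that the prior is the number of infected divided by a population of 60 million (inferred from 6 million being described as 10% of it). How the prior was actually built for the article's plot is not stated, so the printed values should not be expected to match it. The function names and the coarse grid are hypothetical.

```python
def ppv(prior, sensitivity, specificity):
    """P(infected | one positive test), from Bayes' formula."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def ppv_by_infected(infected_counts, population, sensitivity, specificity):
    """Map each assumed number of infected people to P(infected | positive)."""
    return {
        n: ppv(n / population, sensitivity, specificity)
        for n in infected_counts
    }


POPULATION = 60_000_000  # assumption: 6 million is described as 10% of it
grid = range(500_000, 6_000_001, 500_000)  # coarse grid instead of every integer
for n, p in ppv_by_infected(grid, POPULATION, 0.95, 0.70).items():
    print(f"{n:>9,} infected -> {p:.3f}")
```

Comparing the resulting curve with the observed positive/tested ratio is then a matter of finding the number of infected people at which the two values meet.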

As we saw in the previous analyses, one random test is not enough to be sure that a person is truly infected. However, we can see how a second test gives a very high certainty above a threshold of around 500,000 infected people, while a third test gives certainty even with a very small number of infected people. In case you wonder how the probability of being negative after testing negative changes with the number of infected people, the plot below answers that question.

The danger of false negatives

Now, imagine for a moment that we are not calculating probabilities of being negative based on random testing, but on targeted testing. For example, a symptomatic patient gets tested by their doctor, who, based on the symptoms, sets the probability of being negative at 0.2 (0.8 probability of being positive).

As the prior probability is set by the doctor and not derived from the whole population, the probability of being negative after one negative test becomes at most 0.4. Is this enough? Considering that having a false negative person going around the streets is much worse than having a false positive person staying at home, one test is not enough. According to the model, three tests appear to be the most appropriate measure when testing symptomatic patients.

Conclusion

Although governments are turning their attention to random massive testing, this may not be the most efficient strategy to tackle the pandemic. As shown above, random one-time testing does not provide enough certainty to declare that a person is truly positive, and it produces a high rate of false positives. This issue may be prevented with targeted and repeated testing, which may be seen as a more efficient method to increase the conditional probability of being positive.

While the WHO’s recommendation is to “test, test, and test”, a better recommendation would be “test 1, test 2, and test 3”. Indeed, a third test, updated with the previous tests’ probability of being positive, confirms the presence of a positive case with high certainty. The opposite holds for the probability of being negative after one negative test. When the rate of infected patients is low, one negative test confirms the presence of a negative case with probability close to 1. However, when testing symptomatic patients, the conditional probability of being negative should be updated in order to avoid false negatives.

References

Andrea Cazzaro

“Felix, qui potuit rerum cognoscere causas” (Virgil). My interests: economics, technology, computer and data science. My bio: https://bit.ly/37NxIBy.