# Coronavirus and the Fallacy of Increasing Sample Sizes, or Incidence Versus Prevalence

A growing total number of infected persons might seem like a sign of an epidemic, and it could be, but it could also just reflect the growing sample size.

For example, if a disease affects 10 percent of a population and infections are spread evenly across it, then a random sample of 1,000 people would be expected to show around 100 people infected.

If the number of people being tested grows, then the number of infected persons found in the sample is likely to grow at the same rate as the sample size.

From that same 10% infected population, a larger sample of 10,000 people would likely turn up about 1,000 infected, and a sample of 50,000 about 5,000. So the number of infected increases in lock-step with the sample size. That holds whether the sample size increases linearly, exponentially, or any other way.
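The arithmetic above can be checked with a short simulation. This is a minimal sketch, assuming each sampled person is infected independently with probability 10%; the function names and the fixed random seed are illustrative choices, not part of the original article.

```python
import random


def expected_infected(rate, sample_size):
    """Expected number of infected people in a random sample."""
    return round(rate * sample_size)


def simulate_infected(rate, sample_size, rng):
    """Draw one random sample and count how many people are infected."""
    return sum(rng.random() < rate for _ in range(sample_size))


RATE = 0.10  # constant infection rate from the example in the text
SAMPLE_SIZES = [1_000, 10_000, 50_000]

rng = random.Random(0)  # fixed seed (an assumption) so the run is repeatable
results = [
    (n, expected_infected(RATE, n), simulate_infected(RATE, n, rng))
    for n in SAMPLE_SIZES
]

for n, expected, simulated in results:
    print(f"sample={n:>6}  expected={expected:>5}  simulated={simulated:>5}")
```

The simulated counts wander a little around the expected ones, but both scale with the sample size even though the underlying rate never changes.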

So an exponentially increasing number of infected persons can be an artifact of an exponentially increasing sample size. While the rapidly rising counts might look like high incidence (many new infections), the prevalence (the overall rate of infection in the total population) might be static or even declining.

Instead of total infections, the incidence rate, taken here as the number of infections divided by the sample size, might be a more useful measure in this situation.
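As a rough illustration of that measure, the sketch below divides infections by sample size for a few samples. The `weekly_samples` figures are hypothetical, reusing the 10% example from earlier, and `infection_rate` is an illustrative helper name.

```python
def infection_rate(infected, sample_size):
    """Share of the sample that is infected (infections / sample size)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return infected / sample_size


# Hypothetical testing data: (people tested, people found infected).
weekly_samples = [(1_000, 100), (10_000, 1_000), (50_000, 5_000)]

rates = [infection_rate(infected, tested) for tested, infected in weekly_samples]

for (tested, infected), rate in zip(weekly_samples, rates):
    print(f"tested={tested:>6}  infected={infected:>5}  rate={rate:.1%}")
```

The raw counts rise fifty-fold across these samples, while the rate stays flat at 10%.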

The incidence rate can also be seen as an average: the average share of people infected in a sample. If that rate hovers around a common value for every sample, like 12% for the Coronavirus in the U.S., then, provided the samples are random, the law of large numbers says the average of those sample rates will approach the rate for the whole population, namely 12%, and by the Central Limit Theorem the sample rates will be roughly normally distributed around it.
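A small simulation can illustrate the averaging argument. This sketch assumes independent random samples from a population with a true 12% rate; the sample size, number of samples, and seed are arbitrary choices made for illustration.

```python
import random
import statistics

POPULATION_RATE = 0.12  # the 12% figure used in the text
SAMPLE_SIZE = 1_000     # people per sample (assumed)
NUM_SAMPLES = 200       # number of independent samples (assumed)

rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_rate(rate, sample_size, rng):
    """Infection rate observed in one random sample."""
    infected = sum(rng.random() < rate for _ in range(sample_size))
    return infected / sample_size


sample_rates = [
    sample_rate(POPULATION_RATE, SAMPLE_SIZE, rng) for _ in range(NUM_SAMPLES)
]

mean_of_rates = statistics.mean(sample_rates)  # average of the sample averages
spread = statistics.stdev(sample_rates)        # how much single samples vary

print(f"mean of sample rates: {mean_of_rates:.4f}")
print(f"std dev of sample rates: {spread:.4f}")
```

Individual samples scatter around 12%, while their average lands much closer to it. That only holds when the samples are random draws from the same population.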

Comparing two samples, there could still be evidence of a growing rate of infection from one sample to a larger one; the number of infections would just have to grow by more than the sample size did. If the sample size grew exponentially, then the number of infections would need to grow faster than that exponential to indicate a growing infection rate.
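That comparison can be made explicit with a little algebra. As a sketch, let $n_1$ and $n_2$ be the two sample sizes, $I_1$ and $I_2$ the infections found in each, and $r_i = I_i / n_i$ the infection rate in sample $i$. These symbols are introduced here for illustration and assume positive sample sizes and $r_1 > 0$.

$$
\frac{I_2}{I_1} = \frac{r_2\, n_2}{r_1\, n_1} = \frac{r_2}{r_1} \cdot \frac{n_2}{n_1}
\qquad\Longrightarrow\qquad
\frac{I_2}{I_1} > \frac{n_2}{n_1} \iff r_2 > r_1
$$

In words, infections outgrow the sample size exactly when the infection rate itself has gone up.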

For example, below is a graph of a population with a constant infection rate (10%) but with an exponentially increasing sample size. The number of infections, in orange, simply parallels the sample size.
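The data behind a graph like this can be sketched in a few lines. The starting sample size of 1,000, the weekly doubling, and the eight-week span are assumptions made for illustration, not values taken from the original figure.

```python
RATE = 0.10  # constant infection rate

# Sample size doubles every week (exponential growth), starting at 1,000.
sample_sizes = [1_000 * 2 ** week for week in range(8)]
infections = [round(RATE * n) for n in sample_sizes]


def growth_factors(series):
    """Week-over-week growth factor for each consecutive pair of values."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


sample_growth = growth_factors(sample_sizes)
infection_growth = growth_factors(infections)

for week, (n, i) in enumerate(zip(sample_sizes, infections)):
    print(f"week={week}  sample={n:>7}  infections={i:>6}  rate={i / n:.0%}")
```

Both series double every week, so the two curves have the same shape and the rate stays at 10% throughout.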

Here’s another example, with the sample size increasing linearly alongside an increasing infection rate. The total number of infections increases faster than the sample size.
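A comparable sketch for this second scenario follows. The figures are hypothetical: the sample grows by 1,000 people per week while the infection rate climbs from 10% by two percentage points per week.

```python
WEEKS = 6

# Sample size grows linearly; the infection rate (in percent) rises each week.
sample_sizes = [1_000 + 1_000 * week for week in range(WEEKS)]
rate_percent = [10 + 2 * week for week in range(WEEKS)]

# Integer arithmetic keeps the infection counts exact.
infections = [n * pct // 100 for n, pct in zip(sample_sizes, rate_percent)]


def growth_factors(series):
    """Week-over-week growth factor for each consecutive pair of values."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


sample_growth = growth_factors(sample_sizes)
infection_growth = growth_factors(infections)

for s, i in zip(sample_growth, infection_growth):
    print(f"sample grew x{s:.2f}  infections grew x{i:.2f}")
```

In every week the infections grow by a larger factor than the sample does, which is the signal of a rising infection rate rather than just more testing.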

Of course, rates of infection are never completely uniform, so changes in incidence rates can also result from sample bias across different demographics, high-risk groups, or any number of other factors.

## Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Written by

## Llewellyn Jones

Publisher and data journalist at Investigative Economics https://www.investigativeeconomics.org