What is the chance of meeting a COVID-19 infected person in grocery stores?
Update 10/2/2020 The COSRE manuscript has been published in Health & Place.
Update 5/20/2020 An animation of the community social risk trends in the U.S. has been added.
Update 5/16/2020 The algorithm has been changed based on advice from Manó.
Update 4/15/2020 Based on this article, a simple web tool has been developed to estimate the risk of COVID-19 in communities across the United States: https://zihengsun.github.io/covid.html
This is probably the most pressing question right now, in the middle of a pandemic. I was inspired to use the birthday paradox to estimate the chance, and the results genuinely shocked me.
My wife has been freaking out whenever she needs to go out to shop for food and essentials over the last two weeks. She first checks the parking lot of our regular Asian grocery store, and if more than one-third of the spots are occupied, she heads straight back home. The fear of COVID-19 has left her in a very depressed mood. I tried to calm her down with comforting words like “relax… take it easy…”, which eventually backfired and made her feel that I am not on her side and do not understand how dangerous the situation is.
Today, while I was looking at two classic math problems, the birthday paradox and the gambler’s fallacy, an idea suddenly struck me: the birthday paradox seems to be a perfect fit for estimating the chance of encountering another person with COVID-19 in a grocery store. I wrote a birthday paradox program and ran it on today’s data for Fairfax County. The results shocked me, and I cannot find anything obviously wrong in the program. Now I am as worried as my wife.
Reuse the idea of Birthday Paradox
The birthday paradox has a lot in common with the grocery problem. It asks how many people are needed before two of them are likely to share a birthday (ignoring the year). The answer is that only 57 people are enough to make a shared birthday almost certain (99%). That is a little surprising, since a year has 365 days, far more than 57. Borrowing the idea for the grocery problem, we need to calculate the probability that at least one of the people in the store has COVID-19. In other words, we draw a small group of people at random from the entire population and calculate the probability that the group contains at least one infected person. That is difficult to calculate directly, so, as in the birthday paradox, we first calculate the chance of the opposite event (no pair shares a birthday / no infected person is in the store) and subtract it from 1 to get the chance of a clash (at least one pair shares a birthday / at least one infected person is in the store).
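For readers who want to check the 99% figure, here is a minimal sketch of the classic birthday calculation. The function name birthday_clash is my own choice for illustration, and it assumes 365 equally likely birthdays.

```python
def birthday_clash(people_count, days=365):
    """Probability that at least two of people_count people share a birthday."""
    no_clash = 1.0
    for i in range(people_count):
        # person number i+1 must avoid the i birthdays that are already taken
        no_clash *= (days - i) / days
    return 1 - no_clash


for n in (23, 57):
    print(n, "people:", round(birthday_clash(n) * 100, 2), "%")
```

With 23 people the chance already passes 50%, and with 57 people it passes 99%.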
Here is how I did it. First, I reused the birthday paradox algorithm and changed the number of options from days (365) to people (total_population). For each person drawn into the store, the chance that they are not infected is the number of remaining uninfected people divided by the number of remaining people, and multiplying these factors together gives the no-clash probability. Then I subtract that probability from 1 to get the chance of a clash with COVID-19. Simple and realistic.
# Define a function to calculate the chance
def covid_clash(store_people_count, total_population, potential_covid_cases):
    # calculate no clash probability
    p = 1
    for i in range(store_people_count):
        p = p*((total_population-i-potential_covid_cases)/(total_population-i))
    print("no clash : ", p)
    # get clash probability
    clashp = 1-p
    # make it percentage
    return round(clashp*100, 4)
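Actually, this calculation is sampling without replacement (a hypergeometric probability), but because the store crowd is tiny compared with the whole population, the result is very close to the binomial approximation 1 - (1 - potential_covid_cases/total_population)^store_people_count.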
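To see how close the two calculations are, here is a small sketch that compares the loop used in this article with the binomial shortcut. The function names clash_without_replacement and clash_binomial are mine, and both return a fraction rather than a percentage.

```python
def clash_without_replacement(store_people_count, total_population, potential_covid_cases):
    """Same loop as covid_clash, returned as a fraction and without printing."""
    p = 1.0
    for i in range(store_people_count):
        p *= (total_population - i - potential_covid_cases) / (total_population - i)
    return 1 - p


def clash_binomial(store_people_count, total_population, potential_covid_cases):
    """Binomial shortcut: every person is infected independently with rate cases/population."""
    infection_rate = potential_covid_cases / total_population
    return 1 - (1 - infection_rate) ** store_people_count


exact = clash_without_replacement(50, 1100000, 1000)
approx = clash_binomial(50, 1100000, 1000)
print("without replacement:", exact)
print("binomial shortcut  :", approx)
print("difference         :", abs(exact - approx))
```

For a county-sized population the difference is negligible. It only becomes noticeable when the crowd is a large share of the population.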
Now we are ready to calculate the probability for Fairfax County in Northern Virginia. The county’s population has kept growing in recent years, and the US Census estimates 1.1 million residents (1 July 2019). The University of Virginia COVID-19 dashboard shows 77 confirmed cases in the county (Mar 26 2020). Considering that the virus has a two-week incubation period and the infection rate is around 2.5, it seems reasonable to assume there are 1000 cases in this region right now (including people who have contracted the virus, potential cases, self-resolved patients, and people with no symptoms). Suppose there are 50 people in the grocery store when I go inside. These numbers are the inputs to the function above:
store_people = 50 # suppose there are 50 people including the staff in the store
county_population = 1100000 # Fairfax county has 1100000 people (estimate)
potential_covid_cases = 1000 # assumed number of actual cases (confirmed plus undetected)
print("The chance of COVID clash if going to a store with ", store_people, " people is ", covid_clash(store_people,county_population,potential_covid_cases), "%")
The result is:
no clash : 0.955542
The chance of COVID clash if going to a store with 50 people is 4.4458 %
A chance of less than 5 percent seems pretty low. I tell myself it should be safe to shop around here. But another thought pops up: many people doubt the reported infection numbers, and CDC scientists estimate that half a million people will be infected. Let’s be pessimistic and increase the number of infected people to 10000. The new result is:
no clash : 0.633412
The chance of COVID clash if going to a store with 50 people is 36.6588 %
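Almost 40%. That is highly risky. In that case, there is a nearly 40% chance that at least one of the 50 people in the store is carrying the virus, and it could be the person picking potatoes beside you. (The chance that any one particular shopper is infected is much lower, about 10000/1100000 ≈ 0.9%.) Scary.
Besides grocery stores, hundreds of people lined up outside Costco today. Let’s see the chance of a clash if I go to Costco. The other variables stay the same except for the number of people in the store: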
store_people = 200 # Costco
county_population = 1100000
potential_covid_cases = 10000
print("The chance of COVID clash if going to Costco with ", store_people, " people is ", covid_clash(store_people,county_population,potential_covid_cases), "%")
The result is:
no clash : 0.16095
The chance of COVID clash if going to Costco with 200 people is 83.905 %
That is insane. 84%. I won’t go to Costco in the next two weeks.
Let’s think about the people in New York City. Today the confirmed cases reached 23112, and the total population is 8.4 million (1 July 2018). The function call looks like:
store_people = [50, 100, 200, 300, 400]
city_population = 8400000
covid_cases = 23112
for num in store_people:
    print("The chance of COVID clash is ",
          covid_clash(num,city_population, covid_cases), "% if going to a grocery store in New York City with ", num, " people inside.")
The results are:
The chance of COVID clash is 12.8693 % if going to a grocery store in New York City with 50 people inside.
The chance of COVID clash is 24.0826 % if going to a grocery store in New York City with 100 people inside.
The chance of COVID clash is 42.3656 % if going to a grocery store in New York City with 200 people inside.
The chance of COVID clash is 56.2457 % if going to a grocery store in New York City with 300 people inside.
The chance of COVID clash is 66.7832 % if going to a grocery store in New York City with 400 people inside.
The more people in the store, the higher the risk of encountering an infected person. Precise foot traffic data for the stores could give us a more accurate risk estimate.
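The same calculation can be turned around to ask how crowded a store can get before the risk passes a level you are willing to accept. The sketch below is only an illustration: max_store_people and clash_probability are hypothetical helper names, and the 50% threshold is an arbitrary choice.

```python
def clash_probability(store_people_count, total_population, potential_covid_cases):
    """Chance (as a fraction) that at least one person in the store is infected."""
    p = 1.0
    for i in range(store_people_count):
        p *= (total_population - i - potential_covid_cases) / (total_population - i)
    return 1 - p


def max_store_people(max_risk, total_population, potential_covid_cases):
    """Largest crowd size whose clash probability stays at or below max_risk."""
    healthy = total_population - potential_covid_cases
    no_clash = 1.0
    count = 0
    while count < healthy:
        # no-clash probability if one more uninfected person has to be drawn
        next_no_clash = no_clash * ((healthy - count) / (total_population - count))
        if 1 - next_no_clash > max_risk:
            break
        no_clash = next_no_clash
        count += 1
    return count


# New York City numbers from above, with an arbitrary 50% risk threshold
limit = max_store_people(0.5, 8400000, 23112)
print("Stay below", limit, "people to keep the chance of a clash under 50%")
```

This gives a rule of thumb similar to my wife’s parking lot check: count the crowd, and turn around if it is above the limit.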
Italy is the country hit the hardest so far. Many people blame the Champions League match between Atalanta and the Spanish club Valencia for it. It was the biggest soccer game in Atalanta’s history, and a third of Bergamo’s population made the short trip to Milan’s San Siro Stadium (https://time.com/5809848/game-zero-soccer-game-italy/). About 40000 fans entered the stadium that day. Let’s suppose that back then there were only 1000 cases in all of Italy. The population of the city of Bergamo is around 120,287 (https://worldpopulationreview.com/countries/italy-population/), and as a rough simplification the calculation below treats all 1000 cases as part of that population. Let’s calculate the chance of sharing the stadium with an infected person at that game:
game_fans = 40000
bergamo_population = 120287
covid_cases = 1000
print("The chance of COVID clash is ",
covid_clash(game_fans,bergamo_population, covid_cases), "% if going to the Champions league game with ", game_fans, " fans.")
The result is:
The chance of COVID clash is 100.0 % if going to the Champions league game with 40000 fans.
This is what shocks me. It is almost 100% certain (after rounding to four decimal places) that you would be there with someone carrying the virus. There was almost no chance of avoiding it. It is possible that the game was a major reason for the later outbreak, which made Bergamo one of the epicenters of this pandemic.
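Once the probability hits 100%, it no longer tells us how bad the situation is. A complementary number is the expected count of infected people in the crowd, which for a randomly drawn crowd is simply crowd size × cases / population. The helper name expected_infected below is mine, and the inputs are the same assumed numbers as above.

```python
def expected_infected(crowd_size, total_population, potential_covid_cases):
    """Expected number of infected people in a randomly drawn crowd."""
    return crowd_size * potential_covid_cases / total_population


# stadium scenario from above
print(round(expected_infected(40000, 120287, 1000), 1))  # 332.5
# pessimistic Fairfax grocery store scenario
print(round(expected_infected(50, 1100000, 10000), 2))   # 0.45
```

Under these assumptions the stadium crowd would contain hundreds of infected fans on average, while the grocery store would contain less than one.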
U.S. Trends (update on May 20)
Using COVID-19 data from Johns Hopkins University, population data from the U.S. Census (2010), and store foot traffic data from SafeGraph, daily risk maps are generated and combined into an animated GIF. It clearly shows that the risk in New York has increased significantly since March, and it also shows some signs of easing in the last few days.
The safest way is to stay alone or in a closed group with no contact with other people. If you go outside, the chance of meeting someone with COVID-19 increases with the number of people you meet. It also depends on the overall population and the overall number of COVID-19 cases in your region. Please feel free to use the program introduced here to calculate your own chances in your city or county. I hope it can give you a daily estimate and calm your worries.