Predicting COVID-19 With Tax Returns
To explore more data on COVID-19, please go to covid19.topos.com
Income disparities and COVID-19
The relationship between income inequality and COVID-19 has been widely covered over the last 60 days. Findings show that the number of deaths and hospitalizations is much higher in low-income neighborhoods and in cities with high levels of inequality.[1] Most of these studies rely on income data provided by the US Census, which is self-reported, often extrapolated from relatively small samples (as in the ACS[2]), and fairly simple in the way income is measured (e.g. “Median Household Income”). A more nuanced and complete view of income can be gleaned from IRS income tax data. Tax data is publicly available only at the zip code level, but it provides a highly detailed economic portrait of neighborhoods, particularly through the types of deductions and credits that are claimed (dependents, capital gains, education credits, etc.). And rather than being voluntarily self-reported (as is the case with the Census/ACS), tax returns are mandated by law, with consequences for false reporting.
Thus, while Census data can tell us how many people in a geography have salaries within specific intervals or what the median income of a neighborhood is, tax data can tell us how many people earned $25k to $50k, what proportion of those earnings was deductible (and for what reasons), and how much went to healthcare contributions. In this article, we study the relationship between fine-grained income tax data and COVID-19 cases at the zip code level in NYC.
To focus our study, we started with tax metrics that generally indicate the top and bottom of the economic spectrum: tax credits that are open only to earners below a certain threshold (like the Child Tax Credit), and income from financial instruments generally used by the wealthy (like capital gains). The table below shows a selection of the correlation coefficients between cases per capita and income metrics in NYC.
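For readers who want to reproduce this kind of comparison, here is a minimal sketch of how a correlation between cases per capita and a tax metric could be computed at the zip code level. It assumes a Pearson coefficient (the article does not say which coefficient was used), and it expresses each tax metric as a share of returns filed. The rows are made-up toy values, and the field names (`child_credit_returns`, `capital_gains_returns`, and so on) are hypothetical, not actual IRS column names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy, made-up rows (NOT real data): one record per hypothetical zip code.
# Field names are illustrative assumptions, not real IRS or NYC column names.
rows = [
    {"zip": "A", "population": 50000, "cases": 1500, "returns": 20000,
     "child_credit_returns": 6000, "capital_gains_returns": 1000},
    {"zip": "B", "population": 40000, "cases": 800, "returns": 18000,
     "child_credit_returns": 3600, "capital_gains_returns": 2700},
    {"zip": "C", "population": 60000, "cases": 600, "returns": 30000,
     "child_credit_returns": 3000, "capital_gains_returns": 9000},
    {"zip": "D", "population": 30000, "cases": 1200, "returns": 12000,
     "child_credit_returns": 4200, "capital_gains_returns": 360},
]

# Normalize raw counts so zip codes of different sizes are comparable.
cases_per_capita = [row["cases"] / row["population"] for row in rows]

correlations = {}
for metric in ("child_credit_returns", "capital_gains_returns"):
    # Share of filed returns that claim this item in each zip code.
    shares = [row[metric] / row["returns"] for row in rows]
    correlations[metric] = pearson(cases_per_capita, shares)

for metric, r in correlations.items():
    print(f"{metric}: r = {r:+.2f}")
```

Normalizing both sides (cases by population, claims by number of returns) keeps large zip codes from dominating the coefficient. With real data, the same loop could run over every available tax metric.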