Time Will Tell — Blog 4
Surjit S Bhalla and Arvind Virmani
June 6, 2020
Abstract: What determines the magnitude of the spread of COVID19 within countries, at any stage of the S-shaped cycle? Our exploration of various exogenous determinants leads us to three important conclusions. First, higher temperatures mean fewer infections. Second, urban areas have a greater chance of infections than rural areas, a result that lends support to the hypothesis that social distancing helps prevention. Third, for infections per se, the share of elderly males in the population does not contribute to greater COVID19 infections.
Our next two blogs will examine the cross-country pattern of COVID19 deaths, and the contribution of policy measures towards infections and deaths.
Covid19 Day 150: What we know — and what we don’t know
The Johns Hopkins University data set on Covid19 starts the clock on January 22nd for four countries with their first observed corona case: Korea, Japan, Taiwan and the USA. However, China’s first case occurred sometime in November 2019 and in Thailand the first case was documented on Jan 14, 2020.
The world has gone through an incredible journey since then. Even with Jan 1st as the start date, May 29th marked the 150th day of the crisis, a day we want to honor by discussing what we know and what we don’t know. As of now, it is 154 days and counting.
Spread of Virus
There are a few primary, and commonly assumed, exogenous determinants of the spread of SARS Corona Virus 2 (hereafter Covid19). Now that we have more than 150 days of data, and at least one case of the virus in 202 countries (and territories), there is a strong data foundation for establishing which exogenous factors account for the observed variation in Covid19 across time and space.
This Time Will Tell Blog is about the diffusion of Covid19 cases; the next Blog will examine the determinants of the more important question of deaths attributed to the virus.
Weather/Temperature: One factor speculated about early in the life and times of Covid19 was the effect of temperature on its diffusion. It was suggested that cold dry weather accelerated the spread of the disease. An alternative view drew on the experience of seasonality of influenza during winter months. Both explanations point to temperature being an important factor. In the first six countries with the virus (including the origin country China), four had temperatures much below 13 degrees C (the mean population-weighted world temperature in January), one country, Taiwan, was close to this temperature (15 C) and only one, Thailand, was well above it at 24 C.
This “coincidence” of early diffusion is meant to be suggestive. As we will soon observe, temperature is one of the very few strong, and consistent, determinants of diffusion. Please see our draft paper “Arrival and Departure — Part I.” (https://egrowfoundation.org/research/covid-19-arrival-and-departure/ ).
Old Age, and Men, and Virus Diffusion: Italian Covid19 cases and deaths exploded onto the world stage in March, and since then elderly/older males have been believed to be the worst victims of the virus. However, it is important to answer whether age influences virus diffusion or the incidence of death (with Covid19 as the cause), or both. To test this hypothesis, and using country level population data published by the UN, we have extracted the male population in each country over the age of 50, over the age of 60, 70 and 80. We test for the effect of elderly men (age>=50, age>=60, age>=70, and age>=80); broadly similar results are obtained, but age>=60 yields the highest explanatory power in regressions involving Covid cases.
Urbanization or population density: The spread of the Spanish Flu in England, a century ago, was attributed by a few studies to the slums in London. Casual inference based on within country data suggests that cases are largely concentrated in Metros and large cities, with noticeably fewer rural cases. Thus, the degree of urbanization in a country may be a factor. A parallel explanation is offered using a population density variable (population size divided by inhabitable area). However, what constitutes an inhabitable area is debatable. In any case, the degree of urbanization dominates population density in regressions and the latter is therefore not used in our analysis.
Diffusion and Cases: Covid19 has made “flattening of the curve” a household term. As we all know, the curve is a graphical representation of the spread of the virus, with the Y-axis representing the log of the number of cases and the X-axis the number of days of Covid19 in that country (or region, or state, etc.). The X-axis is not necessarily the calendar date, as it is in most time-series economic analysis. For example, day 1 in any diffusion analysis can either be a fixed date, e.g. January 22nd, 2020, or it can be the first day when the virus was observed in each country. Which specification one uses can make a large difference to the interpretation of the results.
This distinction is important because the number of observed cases on any date is affected by the number of days since the first case was observed (in that country). Historical evidence on the diffusion of communicable diseases shows an S-shaped pattern: a slow start followed by an acceleration, followed by a deceleration and finally a flattening out. Both the initial speed of the rise in cases, and the speed of flattening, are parameters which vary between countries. Most investigators have used the logistic curve to represent this pattern. We find that for COVID19, the Gompertz curve provides a better approximation of the elongated S than the logistic curve.
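To make the difference between the two curves concrete, here is a minimal sketch of their standard textbook forms. The parameter names and the numbers are our own illustrative assumptions, not the estimates from our paper. The logistic curve is symmetric around its midpoint (half the ceiling), whereas the Gompertz curve has its fastest growth at about 37% of the ceiling (ceiling divided by e) and then approaches the ceiling slowly, which is the "elongated" S.

```python
import math

def logistic(t, ceiling, rate, midpoint):
    """Symmetric S-curve: cumulative cases on day t."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

def gompertz(t, ceiling, displacement, rate):
    """Asymmetric S-curve: quick early rise, slow approach to the ceiling."""
    return ceiling * math.exp(-displacement * math.exp(-rate * t))

# Hypothetical parameters, chosen only to show the two shapes side by side.
CEILING = 100_000.0
for day in (0, 20, 40, 60, 80, 100):
    print(day,
          round(logistic(day, CEILING, 0.15, 40.0)),
          round(gompertz(day, CEILING, 20.0, 0.075)))
```

With these made-up numbers both curves start near zero and end near the ceiling, but the Gompertz curve takes longer to flatten after its point of fastest growth.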
Capturing Cross-Sections: One indirect and admittedly imperfect way to approximate the determinants of a cross-section sample of countries (a snapshot) is to estimate separate regressions for countries classified according to number of days since the first observed case, hereafter daycvc. Note that this affects the sample of country observations at any point of calendar time. For example, for a sample of 167 countries (excluding small economies or territories or islands with a population less than 500,000), there were only 29 countries with a positive case on February 15, 2020. Counting from Jan. 22nd, this would be day 24 of the pandemic. Thus, if the models were estimated by date (say February 15th) there would only be 29 countries with diffusion data. But if the data are ordered by days since the first case was observed for each individual country, there are 167 countries (as of today) with at least 24 days of virus diffusion.
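The re-ordering described above can be sketched in a few lines. This is an illustration under our own assumptions: the two countries and their counts are invented, the function names are hypothetical, and we count the day of the first observed case as day 1 (counting it as day 0 would shift everything by one).

```python
from datetime import date

# Invented cumulative case counts by calendar date (not real data).
cases = {
    "A": {date(2020, 1, 22): 1, date(2020, 1, 23): 2, date(2020, 1, 24): 5},
    "B": {date(2020, 1, 22): 0, date(2020, 1, 23): 0, date(2020, 1, 24): 3},
}

def to_daycvc(series):
    """Re-index {date: cumulative cases} by days since the first observed case."""
    positive_dates = sorted(d for d, n in series.items() if n > 0)
    if not positive_dates:
        return {}
    first = positive_dates[0]
    # Assumption: the day of the first case is daycvc = 1.
    return {(d - first).days + 1: series[d] for d in positive_dates}

def cross_section(all_series, day):
    """Cases in every country that has reached the given daycvc."""
    snapshot = {}
    for country, series in all_series.items():
        by_day = to_daycvc(series)
        if day in by_day:
            snapshot[country] = by_day[day]
    return snapshot

print(cross_section(cases, 1))  # both countries have a daycvc of 1
print(cross_section(cases, 3))  # only country A has reached daycvc 3
```

By calendar date, only country A has a case on January 22nd; by daycvc, both countries contribute an observation for day 1.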
We therefore adopt a novel approach for estimating diffusion models. We estimate such models for daycvc equal to different days; Table 1 presents the results for daycvc equal to 40, 60, 80 and 100 days. For day 100, there are 37 observations, i.e. only 37 countries have had a virus case for a minimum of 100 days.
Elderly males and COVID infections
One important result that emerges is that the percentage of males in the population over the age of 60 is not significant in explaining the number of COVID cases. This holds irrespective of the definition (male population above 50, male population above 60, etc.) and irrespective of the time elapsed (daycvc 40, 60, 80 or 100). This counterintuitive result could be due to greater care being exercised by every country in protecting its aged population from catching the virus, after learning from the world-wide experience of fatality rates among the aged population, and especially among the aged male population. As explained in our next blog, aged males are more likely to die from the virus, as was confirmed by the “vision” of the Italian experience.
Temperature: Among the three major explanatory variables, temperature is consistently the most significant and has the expected negative sign, i.e. the higher the temperature, the lower the diffusion. The coefficient is also remarkably stable in value, at around -.08. Evaluated at the mean on day 90, each 1 degree increase in temperature led to 8243 fewer cases (coefficient -.106, mean number of cases 77767). There is also some evidence that the importance of this variable has increased and strengthened over time. This could be due to the arrival of summer in the northern hemisphere and of winter in the southern hemisphere, slowing the spread of COVID cases in the former and accelerating it in the latter. One implication of this result is that countries which are traditionally affected strongly by the seasonal flu may see a similar pattern for COVID. The virus pattern has continually confounded experts, hence the emphasis on may.
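The arithmetic behind the 8243 figure can be reproduced as follows. The sketch assumes, as the log scale of our charts suggests, that the regression is of log cases on temperature, so that the coefficient is approximately the proportional change in cases per degree. The variable names are ours, and the "exact" line is a standard refinement for log-linear models that we add for comparison; it is not a number reported in the text.

```python
import math

coefficient = -0.106   # temperature coefficient quoted in the text (day 90)
mean_cases = 77767     # mean number of cases quoted in the text (day 90)

# Linear approximation: change in cases ~ coefficient * mean cases.
linear_change = coefficient * mean_cases

# Exact change implied by a log-linear model: (exp(b) - 1) * mean cases.
exact_change = (math.exp(coefficient) - 1.0) * mean_cases

print(round(linear_change))  # the approximation used in the text
print(round(exact_change))   # somewhat smaller in absolute size
```

The linear approximation gives the 8243 fewer cases per degree quoted above; the exact log-linear conversion gives a slightly smaller reduction, so the headline number is best read as approximate.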
The third important result is that urbanization is important in explaining the spread of the virus, but only after 40 days have passed. The significance of urbanization also increases with days of presence of the virus and the magnitude increases to 0.04 for daycvc equal to 80, from 0.032 for daycvc equal to 40. This result is supportive of the social distancing hypothesis.
Covid19 — What we know, and what we don’t know — and what can we conclude
There is no precedent for this unique pandemic. It has tied policy makers, economists, and epidemiologists in knots. Everybody who could be proved wrong has been proved wrong, and we are all humbled by the uncertain uncertainty of COVID. Given this reality, we eschew the convention of summing up. Instead, in the belief that pictures convey complex phenomena better than words or equations, we present three charts representing our results and some inferences.
Chart 1 presents results in an unconventional manner. Economists are used to seeing charts in which the distance of an observation from the trend is the error, i.e. if the actual value is above the trend line then the country is performing well. In the COVID case, an actual number of cases higher than predicted (“trend”) means the country is performing badly, i.e. it had more cases than predicted.
Thus, the X-axis in Chart 1 reports the predicted (log) magnitude of the virus on day 80 in each country (results reported in Table 1). The Y-axis shows the (log) actual cases on day 80.
Chart 1: Actual vs. Predicted COVID cases on day 80
The chart represents the combined effect of all three factors on the prediction of cases. For example, for the USA, the error in prediction seems to be the highest (it is the largest positive distance away from the red line of equality between predicted and actual). Sri Lanka and Canada are at opposite ends of the temperature and COVID spectrum, yet both are on the line. The countries below the line are the good performers, and those farther below it are the better performers, e.g. Nepal.
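For readers who want to reproduce the reading of the chart, the distance from the line of equality is simply the difference between actual and predicted cases in logs. The sketch below uses invented figures and hypothetical names (the countries X, Y and Z and the tolerance of 0.05 log units are our assumptions, not values from Table 1).

```python
import math

def log_residual(actual_cases, predicted_cases):
    """Vertical distance from the line of equality, in log units."""
    return math.log(actual_cases) - math.log(predicted_cases)

def position(actual_cases, predicted_cases, tolerance=0.05):
    """Classify a country relative to the line (tolerance is an assumption)."""
    residual = log_residual(actual_cases, predicted_cases)
    if residual > tolerance:
        return "above the line (more cases than predicted)"
    if residual < -tolerance:
        return "below the line (fewer cases than predicted)"
    return "on the line"

# Invented day-80 figures, for illustration only.
for name, actual, predicted in [("X", 20000, 5000),
                                ("Y", 1000, 1000),
                                ("Z", 300, 2400)]:
    print(name, round(log_residual(actual, predicted), 2),
          position(actual, predicted))
```

A positive residual corresponds to a point above the red line, and a negative residual to a point below it.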
The issue of which country has performed well, and why, deserves a more detailed examination and one outside the pay-grade (ambit) of this blog! One casual inference is that the countries below the line (the good performers) appear to be doing well because cases are being under-reported. For some countries this might very well be the case, i.e. under-reporting is a huge factor. For other countries, there might be other explanations. We leave it to others, and to our future notes/blogs, to discover the true explanation!