Modeling the Covid-19 Curve and impact of Social Distancing — Part 1

Manasa Hariharan
Apr 2, 2020

The last few weeks have been weird, to say the least. It feels like a realistic zombie novel come to life, and while things seem to be moving dizzyingly fast in this global healthcare crisis, I find myself with more time on my hands than I ever asked for. And so, like anyone who loves numbers and hates being cooped up, I decided to take a look at the data and try to understand the Covid-19 pandemic. Using daily data published by the Johns Hopkins University CSSE, this post is my attempt at understanding and explaining the math behind the Covid-19 disease curve and the need for social distancing.

The dataset is updated every day with confirmed cases and deaths by province/country, compiled from multiple official sources. We consider a simple epidemiological SEIR model to describe the disease curve and try to understand the effect of social distancing, self-isolation and lockdown on the growth rate and the size of the pandemic. All code used in this post can be found on my GitHub page.

Exploring the Dataset

By now, I am sure everyone has seen the usual “Flattening the Curve” data visualizations and the global heatmaps showing the Covid-19 hotspots and most affected regions. I’m leaving the links above in case you haven’t, and a much simpler visual look at the dataset down below. As of the 29th of March, over 199 countries have been affected by the respiratory virus. As seen below, the progression of the disease has been very different in each region, depending on many factors including the early steps taken to contain it. In South Korea, quick and early measures were taken to get the outbreak under control, and we see that the size of the outbreak is much smaller than in the USA, where there has been confusion, delay and discord in the government’s response to the crisis. The quality of the data varies greatly depending on how widespread testing is in each country, but we are going to overlook that for now and assume the data is an accurate reflection of ground reality in these countries.

Active cases = Confirmed cases - Deaths - Recovered
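As a small illustration of the formula above, here is a Python sketch (the analysis in this post was done in R). The function name and the numbers are made up for the example; it assumes the three inputs are cumulative daily counts covering the same days, which is how the Johns Hopkins series are commonly laid out.

```python
def active_cases(confirmed, deaths, recovered):
    """Daily active cases from three cumulative series of equal length."""
    if not (len(confirmed) == len(deaths) == len(recovered)):
        raise ValueError("all three series must cover the same days")
    return [c - d - r for c, d, r in zip(confirmed, deaths, recovered)]


# Made-up cumulative counts for five consecutive days (illustration only).
confirmed = [100, 150, 230, 340, 500]
deaths = [1, 2, 4, 7, 11]
recovered = [0, 5, 12, 30, 61]

active = active_cases(confirmed, deaths, recovered)
print(active)  # [99, 143, 214, 303, 428]
```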

The Model — SEIR

The SEIR model is a widely used epidemiological model for predicting the size of an outbreak. It assumes that, over the course of an outbreak, a population is divided into 4 compartments: the Susceptible (people who are at risk of contracting the disease but are currently healthy), the Exposed (people who have contracted the virus but haven’t started showing symptoms yet, and hence aren’t infectious themselves), the Infectious (the people we know as confirmed cases of Covid-19, who are exhibiting symptoms and hence can spread the disease to others near them) and the Recovered/Dead (people who have recovered from the disease or have died, and hence are neither infectious nor at risk of contracting the disease).

In the beginning, almost everyone in the population is in the Susceptible compartment, with a very small number in the Exposed and Infectious compartments and no one in the Recovered compartment. As the disease spreads, more and more people move into the Exposed and then the Infectious compartments, and finally into the Recovered compartment. The rates at which people move between compartments are described below.

SEIR model equations
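The equations from the original figure are not reproduced in this text. For reference, the standard SEIR system for a fixed population N = S + E + I + R, written with the symbols defined in this post, is usually given as below. Dividing the transmission term by N is an assumption about the normalization used in the figure.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N} \\
\frac{dE}{dt} &= \frac{\beta S I}{N} - \sigma E \\
\frac{dI}{dt} &= \sigma E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
```

The four right-hand sides sum to zero, which is the fixed-population assumption in equation form.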

Here S, E, I and R stand for the 4 compartments in the model. β is the transmission rate per infectious individual, i.e. the average number of people an infectious person infects per day while almost everyone else is still susceptible. σ is the inverse of the mean incubation period, where the incubation period is the time it takes for symptoms to show after contracting the disease (for Covid-19 it is estimated to be around 5.1 days, hence σ = 1/5.1 ≈ 0.1961 per day). γ is the recovery rate, which is the inverse of the mean duration for which a person exhibits symptoms and is infectious.

β and γ are not fixed for a given disease and vary greatly depending on demographics and on the policies in place. In our analysis, we will estimate β and γ using the data we have from different countries so far. We will fix σ, as it is assumed to be intrinsic to the virus regardless of location.
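To make the model concrete, here is a minimal Python sketch that integrates the standard fixed-population SEIR equations with a hand-written fourth-order Runge-Kutta step (the post's own code uses R and deSolve). Only σ = 1/5.1 comes from the text; the values of β and γ, the population, the seed of 10 infectious people and all function names are illustrative assumptions.

```python
def seir_derivatives(state, beta, sigma, gamma):
    """Right-hand side of the SEIR equations for state = (S, E, I, R)."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n
    return (-new_exposed, new_exposed - sigma * e, sigma * e - gamma * i, gamma * i)


def shifted(state, slope, step):
    """Return state + step * slope, component by component."""
    return tuple(x + step * k for x, k in zip(state, slope))


def simulate_seir(beta, sigma, gamma, population, initial_infectious, days, steps_per_day=10):
    """Integrate the SEIR equations with fixed-step RK4; return one state per day."""
    dt = 1.0 / steps_per_day
    state = (population - initial_infectious, 0.0, float(initial_infectious), 0.0)
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = seir_derivatives(state, beta, sigma, gamma)
            k2 = seir_derivatives(shifted(state, k1, dt / 2), beta, sigma, gamma)
            k3 = seir_derivatives(shifted(state, k2, dt / 2), beta, sigma, gamma)
            k4 = seir_derivatives(shifted(state, k3, dt), beta, sigma, gamma)
            state = tuple(
                x + dt / 6.0 * (a + 2 * b + 2 * c + d)
                for x, a, b, c, d in zip(state, k1, k2, k3, k4)
            )
        daily.append(state)
    return daily


SIGMA = 1 / 5.1  # fixed: inverse of the 5.1-day mean incubation period

# beta, gamma, population and the seed below are illustrative, not fitted values.
trajectory = simulate_seir(beta=1.0, sigma=SIGMA, gamma=0.5,
                           population=1_000_000, initial_infectious=10, days=200)
peak_day = max(range(len(trajectory)), key=lambda day: trajectory[day][2])
print(f"sigma = {SIGMA:.4f}")
print(f"peak on day {peak_day} with {trajectory[peak_day][2]:,.0f} infectious")
```

Because the four derivatives sum to zero, the total S + E + I + R stays at the starting population throughout the run, which is a quick sanity check on any implementation.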

This model is simple and effective, but it makes several assumptions, including:

  1. People who recover from Covid-19 have lifelong immunity to the virus, as with measles and unlike the seasonal flu. This is information we do not have yet, but research seems to suggest there is a good chance we will be immune at least for a while after recovering.
  2. This version of the model assumes that the total population remains the same throughout the outbreak. This holds if the duration is short enough and the difference between births and deaths is insignificant. For persistent or deadly outbreaks, however, that might not be the case, and birth and death rates would need to be incorporated into the model.
  3. People are not infectious during the incubation period. This hasn’t been verified yet, and scientists believe people with little to no symptoms could still possibly spread the disease.
  4. We aren’t looking into the effect of imported cases, i.e. people who fly into the region. This is super important because in most countries, even after the first few cases were detected, people carrying the virus kept flying in, so the rate of spread is higher than in a region with no imported cases.
  5. Lastly, this model obviously doesn’t capture the nuance of how a population behaves. It assumes everyone interacts with everyone else at the same rate; however, most of us interact only with a close circle of people every day. The transmission rate β also changes as the infection progresses and measures like social distancing are introduced, but the model only captures an overall average rate and doesn’t account for reactive measures.

Fitting the Model

All code for this analysis is written in R. The deSolve package was used to solve the equations written above. R’s optim function was used to estimate the parameters β and γ by minimizing the RMSE between the model and the total active cases by day. For each country, the population in the model was set to that country’s population.
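The fitting idea can be sketched in a few lines of Python. This is not the R code behind this post: a coarse grid search stands in for optim, a forward-Euler loop stands in for deSolve, and the "observed" series is synthetic, generated from known parameters so that the recovered values can be checked. Comparing the Infectious compartment to active cases is my reading of the setup, and all names here are hypothetical.

```python
import math

SIGMA = 1 / 5.1  # fixed incubation rate from the post


def simulate_infectious(beta, gamma, population, initial_infectious, days, steps_per_day=10):
    """Forward-Euler SEIR run; returns the infectious count at the end of each day."""
    dt = 1.0 / steps_per_day
    s, e, i = population - initial_infectious, 0.0, float(initial_infectious)
    daily = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population
            s, e, i = (s - dt * new_exposed,
                       e + dt * (new_exposed - SIGMA * e),
                       i + dt * (SIGMA * e - gamma * i))
        daily.append(i)
    return daily


def rmse(predicted, observed):
    """Root mean squared error between two equally long series."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def grid_search(observed, population, initial_infectious, betas, gammas):
    """Return (rmse, beta, gamma) for the grid point with the lowest RMSE."""
    days = len(observed) - 1
    best = None
    for beta in betas:
        for gamma in gammas:
            predicted = simulate_infectious(beta, gamma, population, initial_infectious, days)
            score = rmse(predicted, observed)
            if best is None or score < best[0]:
                best = (score, beta, gamma)
    return best


# Synthetic "observations" from known parameters, so the recovered values can be checked.
observed = simulate_infectious(1.2, 0.4, 1_000_000, 10, days=40)
betas = [k / 10 for k in range(5, 21)]    # 0.5, 0.6, ..., 2.0
gammas = [k / 10 for k in range(1, 11)]   # 0.1, 0.2, ..., 1.0
score, beta_hat, gamma_hat = grid_search(observed, 1_000_000, 10, betas, gammas)
print(f"beta = {beta_hat}, gamma = {gamma_hat}, RMSE = {score:.3f}")
```

A grid search is slow and coarse compared with a proper optimizer, but it does not depend on starting values, which makes it a useful cross-check on whatever optim returns.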

Before we go into the actual modeling process, I should issue a disclaimer: I’m not an epidemiologist, and these results are not meant to be used for any real-world predictions. This is a very simplified version of the models epidemiologists actually use, and it is a way to learn the basics and interact with data related to something that is affecting us all.

Some of the issues to look out for when fitting:

  1. optim is not perfect for this problem. The outbreak is still in its early stages, and the infection curves for a lot of different R0s (in this model, R0 = β/γ) look very similar early on. As a result, many different sets of parameters can give very low RMSEs, and the result depends too heavily on the initial values provided. This is something I hope to fix with more work in a future post.
  2. Make sure to tighten the convergence tolerance for optim, as the optimization tends to end early. There are several ways to do so: we can change reltol or abstol, or rescale the parameters with parscale, in the function’s control options.
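The first issue can be made concrete with a short Python sketch. While almost everyone is still susceptible, the standard SEIR equations grow exponentially at a rate r that satisfies (r + σ)(r + γ) = σβ, so for any γ there is a β that produces exactly the same early growth. The sketch takes the Italy estimates reported later in this post (β = 1.42, γ = 0.6) as the reference point; the function names are hypothetical, and the linearization is a textbook approximation rather than part of the original analysis.

```python
import math

SIGMA = 1 / 5.1  # fixed incubation rate from the post


def early_growth_rate(beta, gamma, sigma=SIGMA):
    """Dominant eigenvalue of the SEIR system linearized around S close to N."""
    return (-(sigma + gamma) + math.sqrt((sigma - gamma) ** 2 + 4 * sigma * beta)) / 2


def beta_for_growth_rate(rate, gamma, sigma=SIGMA):
    """Beta that produces the given early growth rate for a chosen gamma."""
    return (rate + sigma) * (rate + gamma) / sigma


target = early_growth_rate(beta=1.42, gamma=0.6)
for gamma in (0.2, 0.4, 0.6, 0.8):
    beta = beta_for_growth_rate(target, gamma)
    print(f"gamma={gamma:.1f}  beta={beta:.2f}  R0={beta / gamma:.2f}  "
          f"growth rate={early_growth_rate(beta, gamma):.4f}/day")
```

Every row has the same early growth rate but a different R0, which is why a fit based only on the first weeks of data can land on very different parameter pairs depending on where it starts.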

Using Italy’s data up to 31st March and optimizing to estimate the best parameters, we get β = 1.42 and γ = 0.6. We plug these parameters back into the SEIR model to simulate an infection curve. The results are shown below, with the actual curve so far overlaid. The results are frightening. On the worst day, the model shows that over 5% of the population of Italy could be affected. That’s over 3 million people affected on the same day and up to 60,000 dead (assuming a 2.3% case fatality rate, although what we should really be using is the infection fatality rate, which is unknown).

However, the good news is that it’s not going to be this bad. The effects of social distancing haven’t been included in this model (they would decrease our β), and with faster tests emerging, infectious people are identified and isolated faster (which increases γ). Our effective reproduction number should therefore drop soon, and the actual number affected will hopefully be much lower than our dire prediction.

Running the same optimization process with the same set of starting values for 4 different countries, we get the following result:

The difference between the curves reflects the effect that early containment policies and social distancing measures have on the spread of the virus. Spain seems to have it worse because of its smaller population and the alarming increase in its number of cases. The USA has been hit much harder than all of these countries, but I have not included its data, since the data we have only shows cases increasing after March 1, and the infection was probably spreading undetected for much longer due to slow and inconclusive tests. This would bias our model and give inaccurate results.

Effect of Social Distancing

Quantifying the effect of social distancing could be done in many ways, and I would like to try more complicated and effective methods in a later post. For now, we are going to look at a fairly simple calculation. Since social distancing directly affects β, we add a parameter ρ that represents the level of social distancing and scales β, so the effective transmission rate becomes ρβ. ρ varies between 0 and 1, with 1 indicating no social distancing and business as usual, and 0 indicating total isolation and no spread. This parameter is applied to the curve from day 1, which isn’t practical because most social distancing is reactive: there is usually a significant lag between noticing an outbreak and ramping up social distancing procedures. It is, however, useful for visualizing the difference it would make even if we cut down interactions by a small percentage. Below is the simulation for Italy, with ρ of 1, 0.8 and 0.6. We can see how much difference even a 20% decrease in interactions makes to our infection curve.
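The same experiment can be sketched in Python by multiplying β by ρ in the transmission term. This is a rough forward-Euler sketch, not the R code behind the figure: β, γ and σ are the values quoted in this post, while the round population figure for Italy, the seed of 100 infectious people and the function name are assumptions, so the numbers it prints will not match the plotted curves exactly.

```python
SIGMA = 1 / 5.1            # fixed incubation rate from the post
BETA, GAMMA = 1.42, 0.6    # Italy estimates quoted in the post
POPULATION = 60_000_000    # assumed round figure for Italy
INITIAL_INFECTIOUS = 100   # assumed seed, not taken from the post


def peak_infectious(rho, days=365, steps_per_day=10):
    """Peak infectious count and its day when beta is scaled by rho from day 1."""
    dt = 1.0 / steps_per_day
    s, e, i = POPULATION - INITIAL_INFECTIOUS, 0.0, float(INITIAL_INFECTIOUS)
    peak, peak_day = i, 0
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_exposed = rho * BETA * s * i / POPULATION
            s, e, i = (s - dt * new_exposed,
                       e + dt * (new_exposed - SIGMA * e),
                       i + dt * (SIGMA * e - GAMMA * i))
        if i > peak:
            peak, peak_day = i, day
    return peak, peak_day


for rho in (1.0, 0.8, 0.6):
    peak, day = peak_infectious(rho)
    print(f"rho={rho:.1f}: peak of {peak:,.0f} infectious ({peak / POPULATION:.1%}) on day {day}")
```

Lower values of ρ both shrink the peak and push it later, which is the "flattening" effect in its simplest form.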

For my next stage, I am hoping to fit my model better using a Monte Carlo estimation (to introduce noise and increase confidence in our estimates) and a more scientific choice of starting values. I am also hoping to try a Hidden Markov Model to fit my data (I’m not sure if this would work given the scarcity of data, so let me know if you have any tips!). I hope this was useful and interesting. You can find the code used for this analysis up here. Stay safe, stay sane and stay indoors.
