Coronavirus model using R — Colombia

4 min readFeb 27, 2020

This morning I saw a post by Learning Machines where I learned to build a simple model for epidemics such as Coronavirus( 2019-nCoV). I decided to replicate this model for a smaller population — Colombia, using the data gathered by Rami Krispin. These are my findings.

Load libraries

library(ggplot2)
library(deSolve)
library(tidyverse)

Data

I downloaded the data <coronavirus> from RamiKrispin’s Github

Managing Data

In order to be able to use this data for our model, I added all the data points so that we had a number of total cases for each given day. I took out Feb-13 because it added more cases due to new identification techniques, and this may affect the way the model fits. I am yet to try to work out the model with this info, I will try it again someday.

c1 <- coronavirus %>%
  select(date,type,cases) %>%
  group_by(date,type) %>%
  summarise(total_cases = sum(cases))c2 <- c1[which(c1$type=="confirmed" & c1$date != "2020-02-13"   ),]c2$cumtotal_cases <- cumsum(c2$total_cases)ggplot(data= c2 ,aes(date,cumtotal_cases, colour = type)) + geom_line()

Data gathered from Rami Krispin’s GitHub

Model

This model is adapted from a Learning Machine’s post-

I set the population = 50,000,000 which is a good enough estimate for Colombia right now. Also, I created a vector Infected which takes in the info from the <coronavirus> data.

SIR models, the dynamics of the outbreak and uses three differential equations, in order to do so:

dS, dI, and dR, you can think of them as the rate of change for Susceptibles, Infected and Recovered for a given t.

Beta is the parameter that controls the transition between Susceptibles and Infected and Gamma controls the transition between Infected and Recovered:

pop <-50000000Infected <- as.integer(c2$cumtotal_cases)Dia <- 1:length(Infected)SIR <- function(time, state, parameters) {
  par <- as.list(c(state, parameters))
  with(par, {
    dS <- -beta/pop * I * S
    dI <- beta/pop * I * S - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
    })
}

Sum of Squared Differences

RSS takes the info given and minimizes the Sum of Squared Difference, between the number of infected -I(t)- and the corresponding number of predicted cases by our model I*(t)

RSS(β,γ)=∑ (I(t)−I*(t))²

init <- c(S = pop-Infected[1], I = Infected[1], R = 0)
RSS <- function(parameters) {
  names(parameters) <- c("beta", "gamma")
  out <- ode(y = init, times = Dia, func = SIR, parms = parameters)
  fit <- out[ , 3]
  sum((Infected - fit)^2)
}
 
Opt <- optim(c(0.5, 0.5), RSS, method = "L-BFGS-B", lower = c(0, 0), upper = c(1, 1)) # optimize with some sensible conditionsOpt_par <- setNames(Opt$par, c("beta", "gamma"))
Opt_par##      beta     gamma 
## 0.5770345 0.4229655t <- 1:100 # time in days
fit <- data.frame(ode(y = init, times = t, func = SIR, parms = Opt_par))

Interesting Stats

In the learning machine’s post R0 is calculated at 2.073224, we have a much lower value at 1.364, this R0 refers to the number of healthy people that get infected per number of infected people. I believe it is much lower because the data that I am using here shows a far slower expansion, also maybe the population difference might affect this indicator.

The height for the pandemic in our model is at day 64 with almost 2 million people infected and a maximum of 40,000 dead. Compared to Learning Machines height at day 50 with over 200 million infected and 4 million dead.

par(old)
 
R0 <- setNames(Opt_par["beta"] / Opt_par["gamma"], "R0")Height <- fit[fit$I == max(fit$I), "I", drop = FALSE] # height of pandemicMax_Dead <- max(fit$I) * 0.02 # max deaths with supposed 2% mortality rateR0 ## 1.364259Height ## 64 - 1966588Max_Dead ## 39331.75

Graphs

I like nice graphs, so the code is long, therefore I will just leave the first one.

graph11 <- fit %>% gather(key, value, -time)bigplot <- ggplot(graph11, mapping = aes(x = time, y = value, color = key) ) +   
  geom_line(size =1.25)+  
  scale_color_manual( values = 
    c("red1", "green3","gray1"))+ 
  
  theme(
    plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
    plot.caption = element_text(size = 8, face = "italic"),
    legend.position="top",
    legend.title = element_blank(),
    legend.box = "horizontal" ,
    legend.text=element_text(size=8.5),
    panel.grid.minor = element_blank(), 
    panel.grid.major = element_line(color = "gray50", size = 0.5),
    panel.grid.major.x = element_blank(),
    panel.background = element_blank(),
    line = element_blank(),
    axis.ticks.length = unit(.15, "cm"),
    axis.ticks.y = element_blank(),
    axis.title.x = element_text(color="black", 
    size=12),
    axis.title.y = element_text(color="black",
    size=10,
    face="italic"))+
  
  scale_y_continuous(expand = c(0, 0),
    limits=c(0.0,50000000),
    breaks=seq(0.0,50000000,10000000), 
    name = "Number of subjects")+
  
  scale_x_continuous(expand = c(0, 0),
    name = "Days")+
  
  labs(title = "SIR Model 2019-nCov Colombia", 
       caption = "Info taken from RamiKrispin. Adapted model from Learning Machines.")bigplot

Final Thoughts

This model is very simple and there is a huge number of other variables that need to be considered, such as density, climate, government response … etc.

It was fun to build. Many thanks to RamiKrispin for the data and to Learning Machine (Holger Von Jouanne) for the tutorial.