Coronavirus model using R — Colombia
This morning I saw a post by Learning Machines where I learned to build a simple model for epidemics such as Coronavirus( 2019-nCoV). I decided to replicate this model for a smaller population — Colombia, using the data gathered by Rami Krispin. These are my findings.
Load libraries
library(ggplot2)
library(deSolve)
library(tidyverse)
Data
I downloaded the data <coronavirus> from RamiKrispin’s Github
Managing Data
In order to be able to use this data for our model, I added all the data points so that we had a number of total cases for each given day. I took out Feb-13 because it added more cases due to new identification techniques, and this may affect the way the model fits. I am yet to try to work out the model with this info, I will try it again someday.
c1 <- coronavirus %>%
select(date,type,cases) %>%
group_by(date,type) %>%
summarise(total_cases = sum(cases))c2 <- c1[which(c1$type=="confirmed" & c1$date != "2020-02-13" ),]c2$cumtotal_cases <- cumsum(c2$total_cases)ggplot(data= c2 ,aes(date,cumtotal_cases, colour = type)) + geom_line()
Model
This model is adapted from a Learning Machine’s post-
I set the population = 50,000,000 which is a good enough estimate for Colombia right now. Also, I created a vector Infected which takes in the info from the <coronavirus> data.
SIR models, the dynamics of the outbreak and uses three differential equations, in order to do so:
dS, dI, and dR, you can think of them as the rate of change for Susceptibles, Infected and Recovered for a given t.
Beta is the parameter that controls the transition between Susceptibles and Infected and Gamma controls the transition between Infected and Recovered:
pop <-50000000Infected <- as.integer(c2$cumtotal_cases)Dia <- 1:length(Infected)SIR <- function(time, state, parameters) {
par <- as.list(c(state, parameters))
with(par, {
dS <- -beta/pop * I * S
dI <- beta/pop * I * S - gamma * I
dR <- gamma * I
list(c(dS, dI, dR))
})
}
Sum of Squared Differences
RSS takes the info given and minimizes the Sum of Squared Difference, between the number of infected -I(t)- and the corresponding number of predicted cases by our model I*(t)
RSS(β,γ)=∑ (I(t)−I*(t))²
init <- c(S = pop-Infected[1], I = Infected[1], R = 0)
RSS <- function(parameters) {
names(parameters) <- c("beta", "gamma")
out <- ode(y = init, times = Dia, func = SIR, parms = parameters)
fit <- out[ , 3]
sum((Infected - fit)^2)
}
Opt <- optim(c(0.5, 0.5), RSS, method = "L-BFGS-B", lower = c(0, 0), upper = c(1, 1)) # optimize with some sensible conditionsOpt_par <- setNames(Opt$par, c("beta", "gamma"))
Opt_par## beta gamma
## 0.5770345 0.4229655t <- 1:100 # time in days
fit <- data.frame(ode(y = init, times = t, func = SIR, parms = Opt_par))
Interesting Stats
In the learning machine’s post R0 is calculated at 2.073224, we have a much lower value at 1.364, this R0 refers to the number of healthy people that get infected per number of infected people. I believe it is much lower because the data that I am using here shows a far slower expansion, also maybe the population difference might affect this indicator.
The height for the pandemic in our model is at day 64 with almost 2 million people infected and a maximum of 40,000 dead. Compared to Learning Machines height at day 50 with over 200 million infected and 4 million dead.
par(old)
R0 <- setNames(Opt_par["beta"] / Opt_par["gamma"], "R0")Height <- fit[fit$I == max(fit$I), "I", drop = FALSE] # height of pandemicMax_Dead <- max(fit$I) * 0.02 # max deaths with supposed 2% mortality rateR0 ## 1.364259Height ## 64 - 1966588Max_Dead ## 39331.75
Graphs
I like nice graphs, so the code is long, therefore I will just leave the first one.
graph11 <- fit %>% gather(key, value, -time)bigplot <- ggplot(graph11, mapping = aes(x = time, y = value, color = key) ) +
geom_line(size =1.25)+
scale_color_manual( values =
c("red1", "green3","gray1"))+
theme(
plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
plot.caption = element_text(size = 8, face = "italic"),
legend.position="top",
legend.title = element_blank(),
legend.box = "horizontal" ,
legend.text=element_text(size=8.5),
panel.grid.minor = element_blank(),
panel.grid.major = element_line(color = "gray50", size = 0.5),
panel.grid.major.x = element_blank(),
panel.background = element_blank(),
line = element_blank(),
axis.ticks.length = unit(.15, "cm"),
axis.ticks.y = element_blank(),
axis.title.x = element_text(color="black",
size=12),
axis.title.y = element_text(color="black",
size=10,
face="italic"))+
scale_y_continuous(expand = c(0, 0),
limits=c(0.0,50000000),
breaks=seq(0.0,50000000,10000000),
name = "Number of subjects")+
scale_x_continuous(expand = c(0, 0),
name = "Days")+
labs(title = "SIR Model 2019-nCov Colombia",
caption = "Info taken from RamiKrispin. Adapted model from Learning Machines.")bigplot
Final Thoughts
This model is very simple and there is a huge number of other variables that need to be considered, such as density, climate, government response … etc.
It was fun to build. Many thanks to RamiKrispin for the data and to Learning Machine (Holger Von Jouanne) for the tutorial.