Ivermectin, Covid-19 and death
Bayesian meta-analysis on published data using binomial, negative binomial, beta, and zero-inflated beta distributions.
Ivermectin as a possible pre-treatment for Covid-19 has received a lot of attention. Both in the number of studies conducted as well as the backlash it received from (mostly) the medical community. This post is NOT about showing that ivermectin does or does not work, but about ways to analyze the data. Which, by the way, are extracted from this living systematic review. As you can see on the site, there is quite some discussion going on about the validity of the studies and the meta-analysis itself which is fine. It is science.
In short, I took a download of the data and focused on mortality alone. Pre-selection was done on those studies that labeled the dose administered and the number of events in both groups, clearly. The sole focus on mortality is because that metric is more robust to the multitudes of bias that can creep into a study. A possible, but heavy limitation, of almost all studies included is that ivermectin is almost never given as the sole medication. Nevertheless, this post is not for changing clinical policy, but about analyzing datasets like these. As you will see, they can be quite challenging, despite the freedom to tackle them in multiple ways.
So, lets get started. This will be quite the post, so I will add the codes in between.
library(ggplot2)
library(dplyr)
library(brms) # Bayesian modeling through Stan
library(tidybayes) # Manipulate Stan objects in a tidy way
library(broom) # Convert model objects to data frames
library(broom.mixed) # Convert brms model objects to data frames
library(vdemdata) # Use data from the Varieties of Democracy (V-Dem) project
library(betareg) # Run beta regression models
library(extraDistr) # Use extra distributions like dprop()
library(ggdist) # Special geoms for posterior distributions
library(gghalves) # Special half geoms
library(ggbeeswarm) # Special distribution-shaped point jittering
library(ggrepel) # Automatically position labels
library(patchwork) # Combine ggplot objects
library(marginaleffects) # Calculate marginal effects for frequentist models
library(emmeans) # Calculate marginal effects in even fancier ways
library(modelsummary) # Create side-by-side regression tables
library(RColorBrewer)
library(modelr)
library(reshape2)
library(grid)
library(gridExtra)
library(lme4)
library(sjPlot)
library(sjmisc)
library(metafor)
library(meta)
library(dmetar)
The data I took are from the https://ivmmeta.com/ website. Since the data cannot be downloaded, I recreated the data in an Excel file, gthe results of which you can see below. Unfortunately, I could not extract all the data.
Lets take a look at what I fetched and transform each study into an odds ration.
df <- read_excel("Data.xlsx")
View(df)
skimr::skim(df)
df%>%print(n=50)df%>%
mutate(OR = (Treatment_cases/Treatment_N) / (Control_cases/Control_N))%>%
ggplot(., aes(y=Study, x=OR, colour=Setting))+
geom_point()+
geom_vline(xintercept=1, col="black", lty=2)+
theme_bw()
Now, the easiest way to analyze this dataset would be to use formal meta-analysis in a frequentist way. I have written several times on how a meta-analysis is actually nothing more than a mixed model. You will see the same happening here. Using the meta package, I will let the built-in functions calculate the odds ratio of each study and transform them into a summary estimate using a generalized linear mixed model. From that model you can do a lot of additional tests, like testing for publication bias, influence analysis, sub-group and meta-regression.
dfOR_meta<-dfOR%>%
dplyr::select(Study,
Treatment_cases,
Treatment_N,
Control_cases,
Control_N,
Outcome)%>%
dplyr::filter(Outcome=="death")%>%
tidyr::drop_na()meta1<-metabin(Treatment_cases,
Treatment_N,
Control_cases,
Control_N,
studlab=Study,
tau.common=FALSE,
data = dfOR_meta,
method = "GLMM",
model.glmm = "CM.AL",
sm="OR",
MH.exact = TRUE,
fixed = FALSE,
random = TRUE,
method.tau = "ML",
prediction = TRUE,
haln=TRUE)
meta1
summary(meta1)
forest.meta(meta1, layout = "RevMan5")
exp(meta1$TE.random)
meta1$tau2
find.outliers(meta1)
def.inf <- InfluenceAnalysis(meta1, random = TRUE)
plot(def.inf, "baujat")
plot(def.inf, "influence")
plot(def.inf, "es")
plot(def.inf, "i2")
drapery(meta1,
labels ="studlab",
type = "pval",
legend = FALSE)
funnel(meta1)
metabias(meta1,
method.bias = 'linreg',
k.min = 5,
plotit = T)
The above result is best portrayed using a forest plot. Here you can see the odds ratio for each study which is the data that drives the results. Also, you can see the summary estimates and the prediction intervals. In a previous blog post, I already mentioned the implications of reporting a significant confidence interval when there is a much wider prediction interval underneath. One should look at both, and if the prediction interval is really that wide the confidence interval does little to communicate confidence in for any form of clinical application. It already seems that this will be the case here as well.
Now, when you have so many studies, it is good to take a look at how they influence results and what their individual contribution is. As you can see below, I have plotted the effect size changes per included study. Actually, what it shows is how the effect sizes change if you omit a study.
It becomes immediately clear that the majority of the studies are really small, except the bottom two. If we go back to the previous forest plot, you can see that those studies have indeed a much larger sample size but also many more events. To me, it was not clear how to actually interpret these results as showcased on the website, so I eventually took them out. There is a risk in that, having now only small studies, but at least ones that I can understand.
Based on the above results, I decide to delete three studies. For one, I did not find their presentation on the website to be very clear, showing extreme numbers of death in the control group. It could very well be my like of understanding, but I still wanted to see what happens if delete those studies.
dfOR_meta2<-dfOR%>%
dplyr::select(Study,
Treatment_cases,
Treatment_N,
Control_cases,
Control_N,
Outcome)%>%
dplyr::filter(Outcome=="death")%>%
dplyr::filter(!Study %in% c('Efimenko',
'Mayer',
'Baguma'))%>%
tidyr::drop_na()meta2<-metabin(Treatment_cases,
Treatment_N,
Control_cases,
Control_N,
studlab=Study,
tau.common=FALSE,
data = dfOR_meta2,
method = "GLMM",
model.glmm = "CM.AL",
sm="OR",
MH.exact = TRUE,
fixed = FALSE,
random = TRUE,
method.tau = "ML",
prediction = TRUE,
haln=TRUE)
forest.meta(meta2, layout = "RevMan5")
Up until now, we only looked at the studies from a global perspective, not taking into account further context such as Setting or Dose. Lets go for a sub-group analysis, and look closer at Setting.
dfOR_meta2_subgroup<-dfOR%>%
dplyr::select(Study,
Treatment_cases,
Treatment_N,
Control_cases,
Control_N,
Outcome,
Setting)%>%
dplyr::filter(Outcome=="death")%>%
dplyr::filter(!Study %in% c('Efimenko',
'Mayer',
'Baguma'))%>%
tidyr::drop_na()
meta2_sub<-metabin(Treatment_cases,
Treatment_N,
Control_cases,
Control_N,
studlab=Study,
subgroup=Setting,
tau.common=FALSE,
data = dfOR_meta2_subgroup,
method = "GLMM",
model.glmm = "CM.AL",
sm="OR",
MH.exact = TRUE,
fixed = FALSE,
random = TRUE,
method.tau = "ML",
prediction = TRUE,
haln=TRUE)
meta2_sub
summary(meta2_sub)
forest.meta(meta2_sub, layout = "RevMan5")
exp(meta1$TE.random)
meta1$tau2
find.outliers(meta1)
def.inf <- InfluenceAnalysis(meta1, random = TRUE)
plot(def.inf, "baujat")
plot(def.inf, "influence")
plot(def.inf, "es")
plot(def.inf, "i2")
drapery(meta1,
labels ="studlab",
type = "pval",
legend = FALSE)
funnel(meta1)
metabias(meta1,
method.bias = 'linreg',
k.min = 5,
plotit = T)
The sub-group analysis shows no effect for the early setting, a slight positive effect in the late setting, and a positive effect at prophylaxis. The test for sub-group differences is non-significant, but I do not care about that. Clinically, it makes all the difference in which setting you apply the experimental treatment, so the sub-group analysis needs to be made.
However, the last two settings exhibit severe heterogeneity which limits usability of the data. I even have to question if this level of sub-group division is sufficient. Probabilities are conditionally dependent and can react heavily by the in-or exclusion of a variable, or a study. Also, even if not depicted, it does not take very long to decipher that the prediction interval for the late setting will also exceed one, meaning that Ivermectin data cannot be used to individually predict who will benefit from it.
Time to take it up a notch and do so more modelling. We have not touched Dose yet, and I want to have a little bit more freedom in the analysis itself.
So, like I said, the meta-analysis is a generalized linear mixed model which we can and may approach from a different perspective using different libraries that are available to us in R. Lets see what happens if I use the glmer function of the lme4 package and look at the number of events within each sample. For now, I just want to take a look at the variation. To do this, I do need to change the way the data is build up.
Please already note that I am preparing the data for the Beta distribution by calculating the ratio. Since the Beta does not like 0 or 1, I will add or subtract a very small number.
dfmelt<-df%>%
dplyr::select(-Outcome)%>%
melt(., id.vars=c("Study", "Setting", "Dose"))%>%
tidyr::separate(variable, c("Treatment", "N"))%>%
dplyr::arrange(Study)%>%
tidyr::spread(N, value)%>%
dplyr::filter(!Study %in% c('Efimenko',
'Mayer',
'Baguma'))
dfmelt$ratio<-dfmelt$cases/dfmelt$N
dfmelt2<-dfmelt%>%
dplyr::mutate(ratio = ifelse(ratio == 0, 0.001, ratio),
Dose = ifelse(Treatment == "Control", 0, Dose))%>%
dplyr::mutate(Treatment = forcats::fct_recode(Treatment,
"Ivermectine" = "Treatment"))
dim(dfmelt2)
dfmelt2%>%print(n=90, na.print="")fit_random_int<-glmer(cbind(cases,N)~1 + (1|Study),
data=dfmelt2,
family=binomial(link="logit"))
summary(fit_random_int)
plot(fit_random_int)
plot_model(fit_random_int, type="re")
coef(fit_random_int)
fixef(fit_random_int)
exp(fixef(fit_random_int))
ranef(fit_random_int)
fixef(fit_random_int)[[1]]+ranef(fit_random_int)[[1]]
So, lets add Treatment.
fit_random_int_b<-glmer(cbind(cases,N)~Treatment + (1|Study),
data=dfmelt2,
family=binomial(link="logit"))
anova(fit_random_int, fit_random_int_b)
plot_model(fit_random_int_b, type="diag")
plot_model(fit_random_int_b, type="re")
One thing I did not really do is look at the raw data, and graph it. Sure, we had the first plot showing the odds ratios per study but that is not enough. I want to look at the data in a more raw form, because it could very well be that the binomial distribution is not the best one to choose. You never really know up front, and some might argue it does not even matter, but lets not take that route.
If you look at the raw data, you can see that a lot of times there is no death in one of the groups. That is good, it means Covid-19 was not lethal, but it also means that we could be dealing with zero-inflated models and we need to approach those differently.
ggplot(dfmelt2, aes(x = ratio, fill="All data")) +
geom_density(alpha = 0.6) +
scale_fill_viridis_d(option = "plasma", end = 0.8) +
labs(x = "Proportion of deaths", y = "Density") +
ggthemes::theme_clean() +
theme(legend.position = "bottom")g1<-ggplot(dfmelt2, aes(x = Treatment, y = ratio)) +
geom_half_point(aes(color = Treatment),
transformation = position_quasirandom(width =0.1),
side = "l", size = 0.5, alpha = 0.5) +
geom_half_boxplot(aes(fill = Treatment), side = "r") +
scale_fill_viridis_d(option = "plasma", end = 0.8) +
scale_color_viridis_d(option = "plasma", end = 0.8) +
guides(color = "none", fill = "none") +
labs(x = "Treatment", y = "Proportion of deaths") +
ggthemes::theme_clean()
g2<-ggplot(dfmelt2, aes(x = ratio, fill = Treatment)) +
geom_density(alpha = 0.6) +
scale_fill_viridis_d(option = "plasma", end = 0.8) +
labs(x = "Proportion of deaths", y = "Density", fill = "Treatment") +
ggthemes::theme_clean() +
theme(legend.position = "bottom")
grid.arrange(g1,g2,ncol=2)
ggplot(dfmelt2, aes(x = ratio, fill = Treatment)) +
geom_density(alpha = 0.6) +
scale_fill_viridis_d(option = "plasma", end = 0.8) +
labs(x = "Proportion of deaths", y = "Density", fill = "Treatment")+
ggthemes::theme_clean() +
facet_wrap(~Setting, scales="free")+
theme(legend.position = "bottom")
Below, you can see me going through several different models. Each of them uses the binomial distribution which is probably not the best distribution to use. For me, it was about seeing what would happen if I would add/delete the fixed and/or random effects.
fit0<-glm(cbind(cases,N)~Treatment+Dose+Setting,
data=dfmelt2,
family=binomial())
fit<-glmer(cbind(cases,N)~Treatment+Dose+Setting +
(1|Study),
data=dfmelt2,
family=binomial())
fit2<-glmer(cbind(cases,N)~Treatment+Dose+
(1|Study),
data=dfmelt2,
family=binomial())
fit3<-glmer(cbind(cases,N)~Treatment*Dose+
(1|Study),
data=dfmelt2,
family=binomial())
fit4<-glmer(cbind(cases,N)~1+(1|Study),
data=dfmelt2,
family=binomial())
fit5<-glmer(cbind(cases,N)~Treatment+(1|Study),
data=dfmelt2,
family=binomial())
AIC(fit0, fit, fit2, fit4, fit5)
anova(fit, fit2)
anova(fit4, fit5)
Time to go Bayesian! I have made several posts in the past month showcasing the Bayesian way of analysis, and I will not go into Bayesian theory. For that, you need to take a look at my blog list or read up on many of the other great blogs available on the web. No, what is important here is how we apply Bayesian analysis on this dataset because, like I already mentioned, Ivermectin is a treatment that is not without controversy. But, before I move to the prior, it is time to hunt for a better distribution.
Lets start with the same analysis as the one I applied using glmer. My favorite Bayesian package is brms.
get_prior(cases | trials(N)~Treatment+Dose+Setting + (1|Study),
data=dfmelt2,
family=binomial(link="logit"))
bayes_binom<-brm(
cases | trials(N)~Treatment+Dose+Setting + (1|Study),
data=dfmelt2,
family=binomial(link="logit"),
warmup = 1000,
iter = 4000,
chains = 4,
cores=6,
seed = 123)
summary(bayes_binom)
plot(bayes_binom)
pp_check(bayes_binom, ndraws=100)
prior_summary(bayes_binom)
pp_check(bayes_binom, type = "error_hist", ndraws = 11)
pp_check(bayes_binom, type = "scatter_avg", ndraws = 100)
pp_check(bayes_binom, type = "stat_2d")
pp_check(bayes_binom, type = "loo_pit")
The above model looks good, despite being far from ready. It is not a Bayesian model, it has priors coming directly from the data. But for me, the most important part is being able to get a result that adhered to all the assumptions of a Bayesian model in terms of chain convergence and sampling. It did.
Now, we can create a very long list of plots that show how the model performed. Well, actually, it shows how the sampling performed. The posterior can deviate as much as it wants from the likelihood. It does not need to fit.
yrep<-posterior_predict(bayes_binom)
y<-dfmelt2$cases[!is.na(dfmelt2$cases) & !is.na(dfmelt2$Dose)]
group<-dfmelt2$Treatment[!is.na(dfmelt2$cases) & !is.na(dfmelt2$Dose)]
x<-dfmelt2$Dose[!is.na(dfmelt2$cases) & !is.na(dfmelt2$Dose)]
bayesplot::ppc_dens_overlay(y, yrep)
bayesplot::ppc_ecdf_overlay(y, yrep[sample(nrow(yrep), 25), ])
bayesplot::ppc_hist(y, yrep[1:8, ])
bayesplot::ppc_boxplot(y, yrep[1:8, ])
bayesplot::ppc_dens(y, yrep[200:202, ])
bayesplot::ppc_freqpoly(y, yrep[1:3,], alpha = 0.1, size = 1, binwidth = 5)
bayesplot::ppc_freqpoly_grouped(y, yrep[1:3,], group) + yaxis_text()
bayesplot::ppc_dens_overlay_grouped(y, yrep[1:25, ], group = group)
bayesplot::ppc_ecdf_overlay_grouped(y, yrep[1:25, ], group = group)
bayesplot::ppc_violin_grouped(y, yrep, group, size = 1.5)
bayesplot::ppc_violin_grouped(y, yrep, group, alpha = 0, y_draw = "points", y_size = 1.5)
bayesplot::ppc_violin_grouped(y, yrep, group, alpha = 0, y_draw = "both",
y_size = 1.5, y_alpha = 0.5, y_jitter = 0.33)bayesplot::bayesplot_theme_set(ggplot2::theme_linedraw())
bayesplot::color_scheme_set("viridisE")
bayesplot::ppc_stat_2d(y, yrep, stat = c("mean", "sd"))
bayesplot::bayesplot_theme_set(ggplot2::theme_grey())
bayesplot::color_scheme_set("brewer-Paired")
bayesplot::ppc_stat_2d(y, yrep, stat = c("median", "mad"))
Next up is code to assess if we chose the correct distribution. We call this the probability integral transform (PIT), which basically uses CDF properties to convert continuous random variables into uniformly distributed ones. However, this only holds exactly if the distribution used to generate the continuous random variables is the true distribution. A very nice blog post about this can be found here. So, using leave-one-out cross-validation,the model transforms the data and if the distribution chosen is the ‘actual’ distribution (you can never know) the transformation towards a uniform distribution will show the properties of a normal distribution in a QQ-plot. Easy, no? No, of course not, but besides the fact that you can never know what the CORRECT distribution is, it is a helpful tool to check. But it is also a bit counterintuitive when it comes to Bayesian analysis, as the posterior is a function of the prior and the likelihood. Having a solid prior, and a robust data collection procedure from which to get the likelihood, is more important than checking if the distribution is correct.
Anyhow, lets still have a look.
loo1 <- loo(bayes_binom, save_psis = TRUE, cores = 4)
psis1 <- loo1$psis_object
lw <- weights(psis1)
bayesplot::ppc_loo_pit_overlay(y, yrep, lw = lw)+theme_bw()
bayesplot::ppc_loo_pit_qq(y, yrep, lw = lw)+theme_bw()
bayesplot::ppc_loo_pit_qq(y, yrep, lw = lw, compare = "normal")+theme_bw()
keep_obs <- 1:30
bayesplot::ppc_loo_intervals(y, yrep, psis_object = psis1, subset = keep_obs)+theme_bw()
bayesplot::ppc_loo_intervals(y, yrep, psis_object = psis1, subset = keep_obs,
order = "median")+theme_bw()ce<-conditional_effects(bayes_binom)
pe<-plot(ce, rug=TRUE)
grid.arrange(pe$Treatment,
pe$Dose,
pe$Setting,
ncol=3)
And, once a suitable model has been made, one can take a look at the conditional distributions, which are shown here. Overall no effect of treatment, no effect of dose, but a clear effect of setting.
However, we are not finished yet.
I posted before that discrete data, like binomial data, can be analyzed in multiple ways. You can choose the binomial, like I did before, but also calculate the ratio and apply the beta-distribution. So, lets use the ratio I already calculated before, and plot the data.
ggplot(dfmelt2, aes(y=Study,
x=ratio,
colour=Treatment))+
geom_point()+
theme_bw()ggplot(dfmelt2, aes(y=ratio,
x=log(N),
colour=Study))+
geom_point()+
facet_wrap(~Treatment)+
theme_bw()+
theme(legend.position = "none")ggplot(dfmelt2, aes(y=ratio,
x=log(N)))+
geom_density_2d_filled()+
facet_wrap(~Treatment)+
theme_bw()+
theme(legend.position = "none")ggplot(dfmelt2, aes(y=ratio,
x=log(N)))+
geom_density_2d_filled()+
facet_wrap(~Setting, scales="free")+
theme_bw()+
theme(legend.position = "none")
Below, a precursor to a spatial analysis of the data which tells me, graphically, which group has more granular data. It is not a formal way to compare groups, but rather and additional step to get some grip on the data.
And, a first step towards a beta-regression. Here, I will once again use all factors as main effects, and not include the random component of Study. We already know it NEEDS to be included, but for now lets just work through the data and apply the beta distribution.
model_beta <- betareg(ratio ~ Treatment + Dose + Setting,
data = dfmelt2,
link = "logit")
summary(model_beta)
plot(model_beta)
tidy(model_beta)
marginaleffects(model_beta,
newdata =
datagrid(
Treatment = c("Ivermectine", "Control"),
Dose = c(14,0),
Setting = c("Early", "Early")))
We also create a beta regression using the brms family. Without a specified prior, this is just likelihood analysis, and now I do include the random component of Study. In Bayesian analysis, all parameters are random, so its a bit of a misnomer, but what I meant to say is that I will include a parameter Study which is built up from all levels of Study. This is different then estimating all studies separately.
Here we go, a non-Bayesian Bayesian analysis of the data using the Beta distribution. Now, the model itself should look straightforward, except one thing and that is phi or the precision estimate. Normally, you can use the Beta distribution using the alpha and beta values, and those will help you calculate the mean and variance of a distribution.
For instance, a Beta distribution using alpha=5 and beta =6 will look like this:
But, we can also use the Beta distribution using the mean and the precision, which you can add as a custom distribution in brms. The formula, mathematically, looks like this:
So, what I am going to model is two parts: the mean, and the variance, and that is what I can do in brms, separately.
model_beta_bayes <- brm(
bf(ratio ~ Treatment+Dose+Setting+(1|Study),
phi ~ Treatment),
data = dfmelt2,
family = Beta(),
chains = 4, iter = 2000, warmup = 1000,
cores = 6, seed = 1234,
backend = "cmdstanr")
summary(model_beta_bayes)
plot(model_beta_bayes)
tidy(model_beta_bayes, effects = "fixed")
model_beta_bayes %>%
gather_draws(`b_.*`, regex = TRUE) %>%
dplyr::mutate(component = ifelse(stringr::str_detect(.variable, "phi_"), "Precision", "Mean"),
intercept = stringr::str_detect(.variable, "Intercept"))%>%
ggplot(., aes(x = .value,
y = forcats::fct_rev(.variable),
fill = component)) +
geom_vline(xintercept = 0) +
stat_halfeye(aes(slab_alpha = intercept),
.width = c(0.8, 0.95), point_interval = "median_hdi") +
scale_fill_viridis_d(option = "viridis", end = 0.6) +
scale_slab_alpha_discrete(range = c(1, 0.4)) +
guides(fill = "none", slab_alpha = "none") +
labs(x = "Coefficient", y = "Variable",
caption = "80% and 95% credible intervals shown in black") +
facet_wrap(vars(component), ncol = 1, scales = "free_y") +
ggthemes::theme_clean()
model_beta_bayes %>%
spread_draws(`b_.*`, regex = TRUE) %>%
mutate(across(starts_with("b_phi"), ~exp(.))) %>%
mutate(across((!starts_with(".") & !starts_with("b_phi")), ~plogis(.))) %>%
gather_variables() %>%
median_hdi()
emmeans(model_beta_bayes, ~ Treatment,
transform = "response") %>%
contrast(method = "revpairwise")
emmeans(model_beta_bayes, ~ Treatment | Setting,
transform = "response") %>%
contrast(method = "revpairwise")
model_beta_bayes %>%
recover_types(dfmelt2) %>%
emmeans::emmeans( ~ Treatment) %>%
emmeans::contrast(method = "pairwise") %>%
gather_emmeans_draws() %>%
ggplot(aes(x = .value,
y = contrast)) +
stat_halfeye(alpha=0.5)+
theme_bw()emmeans(model_beta_bayes, ~ Treatment | Setting,
transform = "response") %>%
contrast(method = "pairwise")%>%
gather_emmeans_draws()%>%
dplyr::select(Setting, .value)%>%
group_by(Setting)%>%summarise(mean = mean(.value),
sd = sd(.value))model_beta_bayes %>%
recover_types(dfmelt2) %>%
emmeans::emmeans( ~ Treatment | Setting, adjust="sidak") %>%
emmeans::contrast(method = "pairwise") %>%
gather_emmeans_draws() %>%
ggplot(aes(x = .value,
y = contrast,
fill=Setting)) +
stat_halfeye(alpha=0.5)+
theme_bw()
Below the same Bayesian model but without random Study.
model_beta_bayes_fixed <- brm(
bf(ratio ~ Treatment+Dose+Setting,
phi ~ Treatment),
data = dfmelt2,
family = Beta(),
chains = 4, iter = 2000, warmup = 1000,
cores = 6, seed = 1234,
backend = "cmdstanr")
summary(model_beta_bayes_fixed)
LOO(model_beta_bayes_fixed, model_beta_bayes)
Still, lets take a look at how the estimates behave.
g1<-model_beta_bayes_fixed %>%
epred_draws(newdata = tibble(Treatment = c("Ivermectine", "Control"), Setting = c("Late", "Late"), Dose = c(24,24)))%>%
ggplot(., aes(x = .epred, y = Treatment, fill = Treatment)) +
stat_halfeye(.width = c(0.8, 0.95), point_interval = "median_hdi")+
scale_fill_viridis_d(option = "plasma", end = 0.8) +
guides(fill = "none") +
labs(x = "P (Death | Treatment, Setting = Late, Dose = 24mg)", y = NULL,caption = "80% and 95% credible intervals shown in black") +
theme_bw()
g2<-model_beta_bayes_fixed %>%
recover_types(dfmelt2) %>%
emmeans::emmeans( ~ Treatment) %>%
emmeans::contrast(method = "pairwise") %>%
gather_emmeans_draws() %>%
ggplot(aes(x = .value, y = contrast)) +
stat_halfeye()+
theme_bw()
g3<-model_beta_bayes_fixed %>%
recover_types(dfmelt2) %>%
emmeans::emmeans( ~ Treatment | Setting, adjust="sidak") %>%
emmeans::contrast(method = "pairwise") %>%
gather_emmeans_draws() %>%
ggplot(aes(x = .value, y = contrast, fill=Setting)) +
stat_halfeye()+
theme_bw()+
theme(legend.position="bottom")
grid.arrange(g1,g2,g3,nrow=3)
Below a new model, this time adding Setting to the precision parameter.
model_beta_bayes_fixed_phi <- brm(
bf(ratio ~ Treatment+Dose+Setting,
phi ~ Treatment + Setting),
data = dfmelt2,
family = Beta(),
inits = "0",
chains = 4, iter = 2000, warmup = 1000,
cores = 6, seed = 1234,
backend = "cmdstanr")
summary(model_beta_bayes_fixed_phi)
plot(model_beta_bayes_fixed_phi)
LOO(model_beta_bayes_fixed,
model_beta_bayes,
model_beta_bayes_fixed_phi)model_beta_bayes_fixed_phi %>%
predicted_draws(newdata = tibble(Treatment = c("Ivermectine", "Control"),Setting = c("Late", "Late"),Dose = c(24,24))) %>%
mutate(is_zero = .prediction == 0,.prediction = ifelse(is_zero, .prediction - 0.01, .prediction))%>% ggplot(., aes(x = .prediction)) +geom_histogram(aes(fill = is_zero), binwidth = 0.025,boundary = 0, color = "white") + geom_vline(xintercept = 0) +
scale_fill_viridis_d(option = "plasma", end = 0.5,guide = guide_legend(reverse = TRUE)) + labs(x = "Predicted P(Death | Treatment, Setting = Late, Dose = 24)", y = "Count", fill = "Is zero?") + facet_wrap(vars(Treatment), ncol = 2,labeller = labeller(quota = c(`Treatment` = "Treatment", `Control` = "Control"))) + ggthemes::theme_clean() + theme(legend.position = "bottom")
I already mentioned that there are perhaps too many zero’s in the dataset and if that is the case, I need to model that as well. So, how many zero’s did I actually predict using this model? Well, none! That does not seem correct, so we need to take a closer look pretty soon at the number of zero’s in the data, and find a way to model it.
model_beta_bayes_fixed_phi %>%
epred_draws(newdata = tidyr::expand_grid(Treatment = "Ivermectine",
Setting = "Late",
Dose = seq(0, 120, by = 1)))%>%
ggplot(., aes(x = Dose,
y = .epred)) +
stat_lineribbon() +
scale_fill_brewer(palette = "Purples") +
geom_hline(yintercept=0, col="black", lty=2)+
labs(x = "Dose ivermectine (mg)",
y = "P(Deaths%|Dose & Setting = Late)",
fill = "Credible interval") +
ggthemes::theme_clean() +
theme(legend.position = "bottom")emtrends(model_beta_bayes_fixed_phi, ~ Dose, var = "Dose",
transform = "response")
emtrends(model_beta_bayes_fixed_phi, ~ Dose, var = "Dose",
at = list(Dose = c(20, 50, 80)),
transform = "response")
model_beta_bayes_fixed_phi %>%
emtrends(~ Dose, var = "Dose",
at = list(Dose = c(14,24,48,96)),
transform = "response") %>%
gather_emmeans_draws()%>%
ggplot(., aes(x = .value,
fill = factor(Dose))) +
geom_vline(xintercept = 0) +
stat_halfeye(.width = c(0.8, 0.95), point_interval = "median_hdi",
slab_alpha = 0.75) +
scale_fill_viridis_d(option = "viridis", end = 0.6) +
labs(x = "Average marginal effect of dose (ivermectine)",
y = "Density", fill = "Dose",
caption = "80% and 95% credible intervals shown in black") +
ggthemes::theme_clean() +
theme(legend.position = "bottom")model_beta_bayes_fixed_phi %>%
emtrends(~ Dose | Setting, var = "Dose",
at = list(Dose = c(14,24,48,96)),
transform = "response") %>%
gather_emmeans_draws()%>%
ggplot(., aes(x = .value,
fill = factor(Setting))) +
geom_vline(xintercept = 0) +
stat_halfeye(.width = c(0.8, 0.95), point_interval = "median_hdi",
slab_alpha = 0.75) +
scale_fill_viridis_d(option = "viridis", end = 0.6) +
facet_wrap(~Dose, scales="free")+
labs(x = "Average marginal effect of dose (ivermectine)",
y = "Density", fill = "Dose",
caption = "80% and 95% credible intervals shown in black") +
theme_bw() +
theme(legend.position = "bottom")
But first, lets tackle the zero-inflation part of the data which I know exists.
So, the Zero-Inflated Beta distribution actually does one thing that the Beta-distribution does not do and that is model the excess zero part. And so, we have three parameters now: mean, precision, and zero-inflation.
Lets see how a first simple model looks like. I will model the mean and precision as a function of an intercept and the random Study component, and just look for the intercept of the zero-inflated component. In other words, I want to see if there IS a zero-inflated response.
model_beta_zi_int_only <- brm(
bf(ratio ~ 1 + (1|Study),
phi ~ 1 + (1|Study),
zi ~ 1),
data = dfmelt2,
family = zero_inflated_beta(),
chains = 4,
iter = 2000,
warmup = 1000,
cores = 6,
seed = 1234,
backend = "cmdstanr")
summary(model_beta_zi_int_only)
plot(model_beta_zi_int_only)
pp_check(model_beta_zi_int_only, ndraws=200)
LOO(model_beta_bayes_random,
model_beta_bayes_random2,
model_beta_zi_int_only)
tidy(model_beta_zi_int_only, effects = "fixed")
zi_intercept <- tidy(model_beta_zi_int_only, effects = "fixed") %>%
filter(component == "zi", term == "(Intercept)") %>%
pull(estimate)
plogis(zi_intercept)
df_random<-model_beta_zi_int_only %>%
spread_draws(r_Study[Study,term])
g1<-ggplot(df_random, aes(x=r_Study,
alpha=0.5))+
geom_density()+
theme_bw()+theme(legend.position = "none")
prior_summary(model_beta_zi_int_only)
stancode(model_beta_zi_int_only)
g2<-ggplot(df_random, aes(x=r_Study,
fill=Study,
alpha=0.5))+
geom_density()+
theme_bw()+theme(legend.position = "none")
grid.arrange(g1,g2,ncol=2)
dfmelt2 %>%
data_grid(Study) %>%
add_epred_draws(model_beta_zi_int_only,
dpar = c("mu", "phi", "zi"), allow_new_levels=TRUE) %>%
sample_draws(100) %>%
ggplot(aes(x = .epred, fill=Study)) +
geom_density(alpha=0.5) +
theme_bw()+
theme(legend.position="none")
Lets take it up a notch and include the variables of interest. I am still including them as main effects, because of the sparsity of the data. This could be contested of course.
model_beta_zi <- brm(
bf(ratio ~ Treatment+Dose+Setting,
phi ~ Treatment,
zi ~ Treatment),
data = dfmelt2,
family = zero_inflated_beta(),
chains = 4, iter = 2000, warmup = 1000,
cores = 6, seed = 1234,
backend = "cmdstanr")
summary(model_beta_zi)
pp_check(model_beta_zi, ndraws=200)
LOO(model_beta_zi_int_only,
model_beta_zi)
yrep<-posterior_predict(model_beta_zi)
x<-dfmelt2$Dose[!is.na(dfmelt2$cases)]
y<-dfmelt2$ratio[!is.na(x)]%>%na.omit()%>%as.vector()
group<-dfmelt2$Treatment[!is.na(x)]
bayesplot::bayesplot_theme_set(ggplot2::theme_linedraw())
bayesplot::color_scheme_set("viridisE")
bayesplot::ppc_stat_2d(y, yrep, stat = c("mean", "sd"))
bayesplot::bayesplot_theme_set(ggplot2::theme_grey())
bayesplot::color_scheme_set("brewer-Paired")
bayesplot::ppc_stat_2d(y, yrep, stat = c("median", "mad"))theme_set(theme_bw())
ce<-conditional_effects(model_beta_zi)
ce_plot<-plot(ce, rug=TRUE)
grid.arrange(ce_plot$Treatment,
ce_plot$Dose,
ce_plot$Setting,
ncol=3)
model_beta_zi %>%
predicted_draws(newdata = tibble(Treatment = c("Ivermectine", "Control"),
Setting = c("Late", "Late"),
Dose = c(24,24))) %>%
mutate(is_zero = .prediction == 0,
.prediction = ifelse(is_zero, .prediction - 0.01, .prediction))%>%
ggplot(., aes(x = .prediction)) +
geom_histogram(aes(fill = is_zero), binwidth = 0.025,
boundary = 0, color = "white") +
geom_vline(xintercept = 0) +
scale_fill_viridis_d(option = "plasma", end = 0.5,
guide = guide_legend(reverse = TRUE)) +
labs(x = "Predicted P(Death | Treatment, Setting = Late, Dose = 24)",
y = "Count",
fill = "Is zero?") +
facet_wrap(vars(Treatment), ncol = 2,
labeller = labeller(quota = c(`Treatment` = "Ivermectine",
`Control` = "Control"))) +
ggthemes::theme_clean() +
theme(legend.position = "bottom")
get_variables(model_beta_zi)
model_beta_zi %>%
recover_types(dfmelt2)%>%
spread_draws(b_Dose)%>%
ggplot(., aes(x=b_Dose, fill=as.factor(.chain)))+
geom_density(alpha=0.5)+
labs(fill="Chain")+
theme_bw()model_beta_zi%>%
emmeans(~ Treatment | Setting + Dose,
at = list(Dose = seq(0, 120, by = 10)),
epred = TRUE,
re_formula = NULL) %>%
contrast(method = "revpairwise") %>%
gather_emmeans_draws()%>%
ggplot(., aes(x = .value, y=Setting, fill=factor(Dose))) +
stat_halfeye(.width = c(0.8, 0.95),
point_interval = "median_hdi",
alpha=0.5) +
labs(x = "Effect of Treatment & Dose on the proportion of deaths by Setting", y = NULL,
caption = "80% and 95% credible intervals shown in black") +
ggthemes::theme_clean()
get_variables(model_beta_zi)
model_beta_zi %>%
recover_types(dfmelt2)%>%
spread_draws(b_Dose)%>%
ggplot(., aes(x=b_Dose, fill=as.factor(.chain)))+
geom_density(alpha=0.5)+
labs(fill="Chain")+
theme_bw()
Now we have our zero-inflated model we can once again extract posterior estimates from it.
model_beta_zi%>%
emmeans(~ Treatment | Setting + Dose,
at = list(Dose = seq(0, 120, by = 10)),
epred = TRUE,
re_formula = NULL) %>%
contrast(method = "revpairwise") %>%
gather_emmeans_draws()%>%
ggplot(., aes(x = .value, y=Setting, fill=factor(Dose))) +
stat_halfeye(.width = c(0.8, 0.95),
point_interval = "median_hdi",
alpha=0.5) +
labs(x = "Effect of Treatment & Dose on the proportion of deaths by Setting", y = NULL,
caption = "80% and 95% credible intervals shown in black") +
ggthemes::theme_clean()
model_beta_zi %>%
emmeans(~ Treatment + Dose,
at = list(Dose = seq(0, 120, by = 1)),
epred = TRUE,
re_formula = NULL) %>%
gather_emmeans_draws()%>%
ggplot(.,
aes(x = Dose,
y = .value,
color = Treatment,
fill = Treatment)) +
stat_lineribbon(aes(fill_ramp = stat(level))) +
scale_fill_viridis_d(option = "plasma", end = 0.8) +
scale_color_viridis_d(option = "plasma", end = 0.8) +
scale_fill_ramp_discrete(range = c(0.2, 0.7)) +
geom_point(data=dfmelt2, aes(x=Dose, y=ratio), col="black")+
facet_wrap(vars(Treatment), ncol = 2,
labeller = labeller(Treatment = c(`Ivermectine` = "Ivermectine",
`Control` = "Control"))) +
labs(x = "Dose (ivermectine)",
y = "Predicted proportion of deaths",
fill = "Treatment", color = "Treatment",
fill_ramp = "Credible interval") +
ggthemes::theme_clean() +
theme(legend.position = "bottom")
Lets run additional models to compare them against the previous model. I do include interactions in some of these models.
model_beta_zi_2 <- brm(
bf(ratio ~ Treatment*Dose+Setting,
phi ~ Treatment,
zi ~ Treatment),
data = dfmelt2,
family = zero_inflated_beta(),
chains = 4, iter = 2000, warmup = 1000,
cores = 6, seed = 1234,
backend = "cmdstanr")
summary(model_beta_zi_2)
LOO(model_beta_zi, model_beta_zi_2)model_beta_zi_3 <- brm(
bf(ratio ~ Treatment + Dose+Setting,
phi ~ Treatment,
zi ~ Treatment),
data = dfmelt2,
family = zero_inflated_beta(),
chains = 4, iter = 2000, warmup = 1000,
cores = 6, seed = 1234,
backend = "cmdstanr")
summary(model_beta_zi_3)
LOO(model_beta_zi,
model_beta_zi_2,
model_beta_zi_3)
ndraws = 50
dfmelt2%>%
group_by(Treatment, Setting) %>%
data_grid(Dose= seq_range(Dose, n = 101)) %>%
add_epred_draws(model_beta_zi_3, ndraws = ndraws) %>%
ggplot(aes(x = Dose, y = ratio, color = factor(Treatment))) +
geom_line(aes(y = .epred, group = paste(Treatment, .draw)), alpha=0.3) +
geom_point(data = dfmelt2, aes(x=Dose, y=ratio, color=Treatment)) +
scale_color_brewer(palette = "Dark2")+
facet_wrap(~Setting)+
theme_bw()+
theme(legend.position = "none")dfmelt2%>%
group_by(Treatment, Setting) %>%
data_grid(Dose= seq_range(Dose, n = 101)) %>%
add_predicted_draws(model_beta_zi_3) %>%
ggplot(aes(x = Dose, y = ratio,
color = factor(Treatment),
fill =factor(Treatment))) +
stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = 1/4) +
geom_point(data = dfmelt2) +
scale_fill_brewer(palette = "Set2") +
scale_color_brewer(palette = "Dark2")+
facet_wrap(~Setting)+
labs(fill = "Treatment", color="")+
guides(color = FALSE)+
theme_bw()
dfmelt2%>%
group_by(Treatment, Setting) %>%
data_grid(Dose= seq_range(Dose, n = 101)) %>%
add_predicted_draws(model_beta_zi_3) %>%
ggplot(aes(x = Dose, y = ratio)) +
stat_lineribbon(aes(y = .prediction),
.width = c(.99, .95, .8, .5), color = brewer.pal(5, "Blues")[[5]]) +
geom_point(data = dfmelt2) +
scale_fill_brewer() +
facet_grid(Treatment ~ Setting, space = "free_x", scales = "free_x")+
theme_bw()
So, the plots above show the posterior draws coming from the model. This all looks very nice, and neat, and impressive, but it is not a full Bayesian analysis. Without an informative prior, it cannot be, so we will do that now!
Below is the single final model I want to present. Its a zero-inflated beta-regression model from which I am modelling the mean, the precision and the zero-inflation. The priors are all the same — an informative no difference-what-so-ever prior. Ivermectin has had a lot of discussion with many people arguing that it cannot be of any help to Covid-19. It is a de-wormer in animals so why would you ever use it? Hence, I will add priors adding the mathematical equivalent of almost being certain it does not do anything to the proportion of deaths.
model_formula<-bf(
ratio ~ Treatment + Dose + Setting + (1|Study),
phi ~ Treatment + Setting,
zi ~ Treatment + Setting)
get_prior(
model_formula,
data = dfmelt2,
family = zero_inflated_beta())
priors<-c(prior(normal(0, 2), class = b, coef=Dose),
prior(normal(0, 2), class = b, coef=SettingLate),
prior(normal(0, 2), class = b, coef=SettingProphylaxis),
prior(normal(0, 2), class = b, coef=TreatmentIvermectine),
prior(normal(0, 2), class = Intercept),
prior(normal(0, 2), class = sd, group=Study),
prior(normal(0, 2), class = Intercept, dpar=phi),
prior(normal(0, 2), class = b, coef=SettingLate, dpar=phi),
prior(normal(0, 2), class = b, coef=SettingProphylaxis, dpar=phi),
prior(normal(0, 2), class = b, coef=TreatmentIvermectine, dpar=phi),
prior(normal(0, 2), class = Intercept, dpar=zi),
prior(normal(0, 2), class = b, coef=SettingLate, dpar=zi),
prior(normal(0, 2), class = b, coef=SettingProphylaxis, dpar=zi),
prior(normal(0, 2), class = b, coef=TreatmentIvermectine, dpar=zi))
bayes_beta_final<-
brm(model_formula,
data=dfmelt2,
family=zero_inflated_beta(),
prior=priors,
control = list(adapt_delta = 0.91,
max_treedepth = 11),
chains = 4, iter = 4000, warmup = 1000,
cores = 6, seed = 1234,
threads = threading(12),
backend = "cmdstanr")
summary(bayes_beta_final)
plot(bayes_beta_final)
pp_check(bayes_beta_final, ndraws=100)
LOO(model_beta_zi,
model_beta_zi_2,
model_beta_zi_3,
bayes_beta_final) # final model definitly not the best
yrep<-posterior_predict(bayes_beta_final)
x<-dfmelt2$Dose[!is.na(dfmelt2$cases)]
y<-dfmelt2$ratio[!is.na(x)]%>%na.omit()%>%as.vector()
group<-dfmelt2$Treatment[!is.na(x)]
bayesplot::bayesplot_theme_set(ggplot2::theme_linedraw())
bayesplot::color_scheme_set("viridisE")
bayesplot::ppc_stat_2d(y, yrep, stat = c("mean", "sd"))+theme_bw()
bayesplot::bayesplot_theme_set(ggplot2::theme_grey())
bayesplot::color_scheme_set("brewer-Paired")
bayesplot::ppc_stat_2d(y, yrep, stat = c("median", "mad"))+theme_bw()
Off to the fun part — the posterior draws.
bayes_beta_final%>%
predicted_draws(newdata = tibble(Treatment = c("Ivermectine", "Control"),
Setting = c("Late", "Late"),
Dose = c(24,24),
Study = c('Ravikirti', 'Ravikirti'))) %>%
mutate(is_zero = .prediction == 0,
.prediction = ifelse(is_zero, .prediction - 0.01, .prediction))%>%
ggplot(., aes(x = .prediction)) +
geom_histogram(aes(fill = is_zero), binwidth = 0.025,
boundary = 0, color = "white") +
geom_vline(xintercept = 0) +
scale_fill_viridis_d(option = "plasma", end = 0.5,
guide = guide_legend(reverse = TRUE)) +
labs(x = "Predicted P(Death | Treatment, Setting = Late, Dose = 24, Study = Ravikirti)",
y = "Count",
fill = "Is zero?") +
facet_wrap(vars(Treatment), ncol = 2,
labeller = labeller(quota = c(`Treatment` = "Ivermectine",
`Control` = "Control"))) +
ggthemes::theme_clean() +
theme(legend.position = "bottom")
bayes_beta_final %>%
emmeans(~ Treatment + Dose ,
at = list(Dose = seq(0, 120, by = 20)),
epred = TRUE,
re_formula = NULL) %>%
gather_emmeans_draws()%>%
ggplot(.,
aes(x = Dose,
y = .value,
color = Treatment,
fill = Treatment)) +
stat_lineribbon(aes(fill_ramp = stat(level))) +
scale_fill_viridis_d(option = "plasma", end = 0.8) +
scale_color_viridis_d(option = "plasma", end = 0.8) +
scale_fill_ramp_discrete(range = c(0.2, 0.7)) +
facet_wrap(vars(Treatment), ncol = 2,
labeller = labeller(Treatment = c(`Ivermectine` = "Ivermectine",
`Control` = "Control"))) +
labs(x = "Dose (ivermectine)",
y = "Predicted proportion of deaths",
fill = "Treatment", color = "Treatment",
fill_ramp = "Credible interval") +
ggthemes::theme_clean() +
theme(legend.position = "bottom")
bayes_beta_final %>%
emmeans(~ Setting + Dose,
at = list(Dose = seq(0, 120, by = 20)),
epred = TRUE,
re_formula = NULL) %>%
gather_emmeans_draws()%>%
ggplot(.,
aes(x = Dose,
y = .value,
color = Setting,
fill = Setting)) +
stat_lineribbon(aes(fill_ramp = stat(level))) +
scale_fill_viridis_d(option = "plasma", end = 0.8) +
scale_color_viridis_d(option = "plasma", end = 0.8) +
scale_fill_ramp_discrete(range = c(0.2, 0.7)) +
facet_wrap(vars(Setting), ncol = 2,
labeller = labeller(Setting = c(`Early` = "Early",
`Late` = "Late",
`Prophylaxis` = "Prophylaxis"
))) +
labs(x = "Dose (ivermectine)",
y = "Predicted proportion of deaths",
fill = "Setting", color = "Setting",
fill_ramp = "Credible interval") +
ggthemes::theme_clean() +
theme(legend.position = "bottom")
bayes_beta_final %>%
emmeans(~ Treatment + Setting + Dose,
at = list(Dose = seq(0, 120, by = 20)),
epred = TRUE,
re_formula = NULL) %>%
gather_emmeans_draws()%>%
ggplot(.,
aes(x = Dose,
y = .value,
color = Treatment,
fill = Treatment)) +
stat_lineribbon(aes(fill_ramp = stat(level))) +
scale_fill_viridis_d(option = "plasma", end = 0.8, alpha=0.5) +
scale_color_viridis_d(option = "plasma", end = 0.8, alpha=0.5) +
scale_fill_ramp_discrete(range = c(0.2, 0.7)) +
geom_point(data=dfmelt2, aes(x=Dose, y=ratio, col=Treatment))+
facet_wrap(vars(Setting),
ncol = 3,
labeller = labeller(Setting = c(`Early` = "Early",
`Late` = "Late",
`Prophylaxis` = "Prophylaxis"))) +
labs(x = "Dose (ivermectine)",
y = "Predicted proportion of deaths",
fill = "Treatment",
color = "Treatment",
fill_ramp = "Credible interval") +
ggthemes::theme_clean() +
theme(legend.position = "bottom")
And, here comes the final model which shows the contrast between the treatments as a function of Setting and Dose. The Dose is an exrapolation, for sure, and so should be take with caution, but it does show neately:
- That the proportion of deaths is lower in Ivermectine than the control.
- The the effect differs per Setting.
- That Prophylaxis does not have enough data to tell us really anything.
- That the prior of ‘no-effect’ cannot take away what is in the data, which may still have its limits.
Hence, these findings are not final and the analysis would benefit from future Bayesian updating. But for now, it seems that the data are telling us that the effect of Ivermectin on the proportion of deaths is unlikely to be zero. Predicting who exactly will benefit is impossible at this time.
bayes_beta_final%>%
emmeans(~ Treatment | Setting + Dose,
at = list(Dose = seq(0, 120, by = 10)),
epred = TRUE,
re_formula = NULL) %>%
contrast(method = "revpairwise") %>%
gather_emmeans_draws()%>%
ggplot(., aes(x = .value, y=contrast, fill=factor(Dose))) +
stat_halfeye(.width = c(0.8, 0.95),
point_interval = "median_hdi",
alpha=0.5) +
labs(x = "P(death | Treatment, Setting & Dose) - Informative prior assuming no effects overal", y = NULL,
caption = "80% and 95% credible intervals shown in black") +
facet_wrap(~Setting, scales="free")+
labs(fill="Dose")+
ggthemes::theme_clean()
Let me know if something is missing, amiss or just plain wrong!