ABC data
Bayesian analysis
Bayesian analysis is often called the ‘other way’ or ‘the new kind of statistics’, but that is of course nonsense. Bayes’ Theorem predates Fisher’s work, and it is much truer to science and the acquisition of knowledge than maximum likelihood ever can be.
To show that Bayesian analysis is really not that difficult, nor far off from how we humans deal with (new) information, I have already posted several Bayesian analyses on well-known datasets such as iris, ChickWeight, mtcars and Rent99. Today, I will show how I analyzed a simulated dataset called ABC. It is nothing fancy, but it does not have to be. Bayes’ Theorem works wonders, especially in small samples, and its power comes from beyond the current dataset. That is why a Bayesian analysis will always tell you more than any maximum likelihood approach ever can.
When it comes to Bayes and Bayesian analysis, it often seems to come down to knowing three words: prior, likelihood, posterior. Agreed, these are important words to know, but they are not the main driver of Bayesian analysis. That would be you, the modeler. The researcher. The one trying his or her best to acquire as much solid information as possible prior to creating a new dataset (the likelihood), in order to extend past knowledge into new, current knowledge (the posterior). Modeling is inherently human, and Bayes had a great way of showing that conditional way of working.
Let's get started on the ABC dataset and work our way through it. I placed the code at the end, because I want the graphs to be the primary driver of this post. HOW to do it you can figure out later; first, it is important that you become familiar with the absence of the p-value and embrace uncertainty.
In life, that is often all we have.
Alright, so let's quickly turn our minds to Bayesian analysis. I will be using the brms package, although many other packages exist, such as rstanarm or MCMCglmm. Like most things in R, you can find ten different ways of doing the same thing.
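To give an idea of what the syntax looks like, here is a minimal brms sketch. The data frame name `ABC` appears in the post, but the column names `value` and `condition` are my illustrative placeholders, not taken from the actual script at the end.

```r
# Minimal brms sketch. Assumes a data frame `ABC` with a numeric outcome
# `value` and a factor `condition` (levels A, B, C); these column names
# are illustrative placeholders.
library(brms)

fit <- brm(
  value ~ condition,      # the same formula syntax you would hand to lm()
  data   = ABC,
  family = gaussian(),
  chains = 4, iter = 2000, seed = 123
)
summary(fit)
```

The formula interface is deliberately close to lm(), which is what makes brms an easy entry point.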
A Bayesian model with no pre-determined prior is just a Frequentist model using maximum likelihood; the results should be practically identical. If we do not specify a prior, the brms package will do it for us, because the sampling engine requires one. You can see down below that five priors are determined, based on the dataset that will be analyzed. This is NOT Bayesian analysis.
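If you want to see those defaults for yourself, brms can list them before any sampling happens. A sketch, using the same illustrative `value`/`condition` names:

```r
# Ask brms which priors it would use by default (no model is fitted here).
# `ABC`, `value` and `condition` are illustrative names.
library(brms)

get_prior(value ~ condition, data = ABC)
# Returns one row per parameter class; the flat entries for the `b`
# coefficients are the non-informative defaults referred to above.
```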
Now, the graph below is the most interesting. We already showed p-values above, but no Bayesian could care less about p-values, and neither should you. All that matters is the estimate, the effect size and the density of the posterior distribution showing the difference between any combination of groups. Hence, if we see that the posterior for the difference between categories A and C does not contain zero, the data give us no reason to believe that the difference is zero. Here, it hovers around 1.6 with quite some uncertainty surrounding that number. This means that although there is a difference between A and C (considering this dataset and our non-informative priors), the difference comes with a range that needs to be reported alongside it. It does NOT mean that the difference will always be around 1.6.
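A sketch of how such a between-group difference can be pulled straight from the posterior draws of a fitted brms model (here called `fit`, a hypothetical model with the same illustrative coding as before):

```r
# With default treatment coding, b_conditionC is the posterior of the
# C-minus-A difference. `fit` is a hypothetical fitted brms model.
library(brms)

draws <- as_draws_df(fit)                           # one row per posterior draw
quantile(draws$b_conditionC, c(0.025, 0.5, 0.975))  # 95% credible interval
hypothesis(fit, "conditionC = 0")                   # brms' built-in contrast summary
```

The credible interval is exactly the "range that needs to be reported" mentioned above.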
If you would like a better, though not completely identical, comparison between the Frequentist and the Bayesian way of analyzing, look below. The biggest difference, though not of great importance now, is that the linear regression (lm) model treats condition as a fixed effect, whereas the Bayesian model treats it as random. By the way, be careful when discussing fixed and random effects with a ‘Bayesian’: according to Bayesian theory, all parameters are inherently random.
When creating a Bayesian model in R, the brms package needs to talk to the Stan sampler, and it does so by transforming the model into Stan code before sending it off. Down below, you can see that code. If you like, you can also model your data directly in Stan; below, you will also see Stan code written from scratch. Tedious, is it not?
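You do not need to fit anything to see that generated code. A sketch, again with illustrative column names:

```r
# Print the Stan program brms would generate, without any sampling.
library(brms)

make_stancode(value ~ condition, data = ABC)  # the generated Stan model
make_standata(value ~ condition, data = ABC)  # the matching data list Stan receives
```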
Now, let's compare the results from a linear model (linear_results) to a Bayesian model that includes condition either as a fixed (bayes_results2) or a random (bayes_results) effect.
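Under the same illustrative column names, the three models in that comparison could be specified as follows; only the object names (linear_results, bayes_results, bayes_results2) come from the post itself.

```r
# Hypothetical specifications of the three models being compared.
library(brms)

linear_results <- lm(value ~ condition, data = ABC)         # Frequentist, condition as fixed
bayes_results2 <- brm(value ~ condition, data = ABC)        # Bayesian, condition as fixed
bayes_results  <- brm(value ~ (1 | condition), data = ABC)  # Bayesian, condition as random
```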
One of the things I never really liked about R is that you can conduct the same analysis in many different ways, but the surrounding features tend to differ. For instance, the sampling procedure comes from rstan, which allows my Stan code to be connected to the dataset and the results to be retrieved in R format. From that format I can obtain posterior draws and plot them, but the result looks somewhat different than if I were to use the brms package combined with bayesplot. Hence, many, many combinations are possible, but you always need to make sure you know which package you are using.
I already said before that you can also sample from the data using rstanarm.
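For completeness, the same kind of model in rstanarm looks almost identical (again with illustrative names):

```r
# Equivalent sketch using rstanarm instead of brms.
library(rstanarm)

fit_arm <- stan_glm(value ~ condition, data = ABC, family = gaussian())
summary(fit_arm)
posterior_interval(fit_arm, prob = 0.95)  # credible intervals for the coefficients
```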
Up until now I have used non-informative priors, which makes any of the above hardly Bayesian. There is never, ever a reason NOT to use informative priors. If you still decide to go for non-informative priors, you are telling yourself and your audience that you have no idea what it is you are doing. You are saying that you are unwilling to use past knowledge, and that you would rather let everything depend on that single piece of data you have.
That cannot be. So I built a model in which I used a normal distribution for the intercept and two gamma distributions for the variances. What makes them informative is not their mean, but their variance: I made it small enough to let them have an effect. Although the ABC dataset is a simulated one, I now have priors that will have an influence. THIS is how Bayesian analysis should be conducted.
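In brms, priors like these are attached with set_prior. The exact numbers below are placeholders, not the values used in the actual analysis; in practice they would come from earlier studies.

```r
# Sketch of an informative-prior setup: a tight normal on the intercept and
# gamma priors on the variance components. All numbers are placeholders.
library(brms)

informative_priors <- c(
  set_prior("normal(10, 0.5)", class = "Intercept"),  # small sd makes it informative
  set_prior("gamma(2, 2)",     class = "sigma"),      # residual standard deviation
  set_prior("gamma(2, 2)",     class = "sd")          # group-level standard deviation
)

fit_inf <- brm(value ~ (1 | condition), data = ABC,
               prior = informative_priors,
               sample_prior = "yes")  # keep prior draws for prior-vs-posterior plots
```

Setting `sample_prior = "yes"` lets you overlay prior and posterior in one plot, which is exactly the kind of check discussed next.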
All of the above plots show some sampling variation, but no extremes. That is ALL I am interested in. Priors and data may very well deviate from each other. No harm. That is how science works.
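A cheap way to see what such priors imply, before touching any data, is to simulate from them in base R (placeholder parameter values again):

```r
# Simulate draws from the placeholder priors to inspect what they imply.
set.seed(1)
prior_mu    <- rnorm(4000, mean = 10, sd = 0.5)   # intercept prior
prior_sigma <- rgamma(4000, shape = 2, rate = 2)  # variance-component prior

round(quantile(prior_mu, c(0.025, 0.975)), 2)  # where the prior puts the mean
mean(prior_sigma)                              # theoretical mean is shape/rate = 1
```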
So, I hope this post was informative in the sense that it shows that Bayesian analysis is not that difficult to run. The real difficulty extends way beyond the scope of this post: making sure that, before you conduct your experiment, you have acquired all worthwhile knowledge needed to fill the gap. Not only will that knowledge determine the prior, it will also determine the way you sample new data.
Acquiring knowledge requires focus, but above all, more knowledge.
Let me know if something is incorrect or amiss!