ABC data

Bayesian analysis

Dr. Marc Jacobs
Jun 14, 2022

Bayesian analysis is often called ‘the other way’ or ‘the new kind of statistics’, but that is of course nonsense. Bayes’ Theorem is older than Fisher’s work and is much truer to science and the acquiring of knowledge than maximum likelihood ever can be.

To show that Bayesian analysis is really not that difficult, or far off from how we humans deal with (new) information, I have already posted several Bayesian analyses on well-known datasets such as Iris, ChickWeight, mtcars, and Rent99. Today, I will show how I analyzed a simulated dataset called ABC. It is nothing fancy, but it does not have to be. Bayes’ Theorem works wonders, especially in small samples, and its power comes from beyond the current dataset. That is why a Bayesian analysis will always tell you more than any kind of Maximum Likelihood approach ever can.

When it comes to Bayes and Bayesian analysis, it often seems to come down to knowing three words: prior, likelihood, posterior. Agreed, these are important words to know, but they are not the main driver of Bayesian analysis. That would be you, the modeler. The researcher. The one who tries his or her best to acquire as much solid information as possible before creating a new dataset (the likelihood) and extending past knowledge (the prior) into current knowledge (the posterior). Modeling is inherently human, and Bayes had a great way of showing that conditional way of working.

Let's get started on the ABC dataset and work our way through it. I placed the code at the end, because I want the graphs to be the primary driver of this post. HOW to do it you can figure out later; first, it is important that you become familiar with the absence of the p-value and embrace uncertainty.

In life, that is often all we have.

The creation of data. Image by author.
A plot of the simulated dataset. Image by author.
And a way of showing how the mean of each category differs from the grand mean. These plots are extremely easy to make in R, but add very little except more numbers and statistics. Image by author.
Another plot showing how you can create p-values for every pairwise comparison across the five levels. The overkill is clear to see, but make no mistake: the p-value is still a dominant driver in the sciences. Image by author.
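For readers who want to follow along, here is a minimal sketch of how such a dataset could be simulated. The group means, spread, sample size, and the names ABC, condition, and response are my own placeholders, not necessarily the values used in the images above:

```r
# A minimal sketch of simulating a five-level dataset like ABC.
# The means, sd, and sample size are illustrative assumptions.
library(ggplot2)

set.seed(42)
condition <- rep(LETTERS[1:5], each = 20)  # five categories, A through E
response  <- rnorm(100,
                   mean = rep(c(10, 11, 12, 11.5, 10.5), each = 20),
                   sd   = 1)
ABC <- data.frame(condition = factor(condition), response = response)

# Quick look at the simulated data per category
ggplot(ABC, aes(condition, response)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, alpha = 0.4)
```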

Alright, so let's quickly turn our minds to Bayesian analysis. I will be using the brms package, although many other packages exist, such as rstanarm or MCMCglmm. Like most things in R, you can find ten different ways of doing the same thing.

A Bayesian model with no pre-determined prior is just a Frequentist model using Maximum Likelihood; the results should be practically identical. If we do not specify a prior, the brms package will do it for us, because the sampling engine requires one. You can see down below that five priors are determined, based on the dataset that will be analyzed. This is NOT Bayesian analysis.
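You can ask brms which priors it would fall back on before fitting anything. A minimal sketch, assuming the placeholder ABC data frame from above and a random intercept per condition:

```r
library(brms)

# Inspect the default (data-dependent) priors brms would use
# for a model with a random intercept per condition.
get_prior(response ~ 1 + (1 | condition), data = ABC)

# Fit without specifying any prior: brms falls back on its defaults.
fit_default <- brm(response ~ 1 + (1 | condition),
                   data = ABC, chains = 4, cores = 4, seed = 42)
summary(fit_default)
```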

Priors determined by the brms package by looking at the data. Image by author.
The summary results coming from the brms package and the convergence of each of the three parameters of interest. Although you saw five priors in the graph above, the model actually contains only three parameters of interest. The difference between source 'default' and source '(vectorized)' is only in naming; the three priors for class sd all refer to the same thing. Hence, brms uses only one, and that is why you see only three distributions with accompanying chains. Image by author.
Below you see three graphs showing the same thing in slightly different ways: the posterior distribution of each of the five categories. Image by author.

Now, the graph below is the most interesting. We already showed p-values above, but no Bayesian could care less about p-values, and neither should you. All that matters is the estimate, the effect size, and the density of the posterior distribution showing the difference between any combination of groups. Hence, if the posterior of the difference between categories A and C does not contain zero, the difference is not zero. Here, it hovers around 1.6, with quite some uncertainty surrounding that number. This means that although there is a difference between A and C (given this dataset and our non-informative priors), the difference comes with a range that needs to be reported. It does NOT mean that the difference will always be around 1.6.
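Here is a sketch of how such a difference distribution can be computed by hand from the posterior draws; the column names follow brms' convention for group-level effects and assume the placeholder model fitted above:

```r
library(brms)

# Extract posterior draws and compute the A-vs-C contrast directly.
# The common intercept cancels, so the group-level deviations suffice.
draws   <- as_draws_df(fit_default)
diff_AC <- draws$`r_condition[A,Intercept]` - draws$`r_condition[C,Intercept]`

# Summarize the posterior of the difference: a point estimate plus
# a credible interval, instead of a p-value.
mean(diff_AC)
quantile(diff_AC, probs = c(0.025, 0.975))
hist(diff_AC, breaks = 40, main = "Posterior of A - C")
```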

Image by author.

If you would like a better, though not completely identical, comparison between the Frequentist and Bayesian ways of analyzing, look below. The biggest difference, though not of great importance now, is that the linear regression (lm) model treats the condition as a fixed effect, whereas the Bayesian model treats it as random. BTW, be careful when discussing fixed and random effects with a ‘Bayesian’: according to Bayesian theory, all parameters are inherently random.

Linear regression (lm) model showing the summary and assumption plots. Image by author.
The Bayesian model. The estimates are not directly comparable because the lm model specified fixed effects whereas the Bayesian model specified random effects. Image by author.

When creating a Bayesian model in R, the brms package needs to talk to the Stan sampler, and it does so by transforming the model and data into Stan code before sending it to Stan. Down below, you can see that code. If you like, you can also model your data directly in Stan. Below, you see Stan code written from scratch. Tedious, is it not?
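If you want to inspect the generated Stan code without digging through images, brms exposes it directly; a quick sketch with the same placeholder model:

```r
library(brms)

# Show the Stan program brms would compile for this model,
# and the data list it would pass to Stan.
make_stancode(response ~ 1 + (1 | condition), data = ABC)
make_standata(response ~ 1 + (1 | condition), data = ABC)
```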

Model in the Stan language. brms will often use half-Cauchy distributions for modelling variance components. Image by author.
Here you can see both the Cauchy and the Normal distributions in use. The Cauchy is not my favorite for modelling variance; I prefer the gamma or beta. Image by author.
You can run the Stan code directly and get Bayesian results. Image by author.
And the Bayesian results. For each sampled parameter you obtain the average, the standard error, and several percentiles of the distribution. You also see metrics like n_eff and Rhat, which belong to standard Bayesian evaluation but offer little to me. I prefer plotting the posterior distributions and checking whether the chains show the appropriate noise-like behavior. Image by author.

Now, let's compare the results from a linear model (linear_results) to a Bayesian model including the condition either as fixed (bayes_results2) or random (bayes_results).
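A hedged sketch of how these three models could be set up; the object names match the post, but the exact formulas are my reconstruction:

```r
library(brms)

# Frequentist linear model: condition as a fixed effect.
linear_results <- lm(response ~ condition, data = ABC)

# Bayesian model with condition as a fixed effect ...
bayes_results2 <- brm(response ~ condition, data = ABC,
                      chains = 4, cores = 4, seed = 42)

# ... and with condition as a random (group-level) effect.
bayes_results <- brm(response ~ 1 + (1 | condition), data = ABC,
                     chains = 4, cores = 4, seed = 42)

summary(linear_results)
summary(bayes_results2)
summary(bayes_results)
```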

The “posterior” estimates coming from a linear model and two Bayesian models. Of course, the linear model cannot provide posterior estimates; those are likelihood estimates. Image by author.
And the results, which deviate somewhat for all three, with the biggest difference in Bayes Random. That makes quite some sense, considering how fixed and random components work. It is also why Mixed Models (containing fixed and random components) are the closest thing to Bayesian modeling, especially if you know that Mixed Models are the models used for meta-analysis. Image by author.

One of the things I never really liked about R is that you can conduct the same analysis in many different ways, but the surrounding features tend to differ. For instance, the sampling procedure comes from rstan, which allows my Stan code to be connected to the dataset and to retrieve the results in R format. From that format I can obtain posterior draws and plot them, but it looks somewhat different than if I were to use the brms package combined with bayesplot. Hence, many, many combinations are possible, but you always need to make sure you know which package you are using.
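A sketch of that rstan-plus-tidybayes route. The Stan program below is my own minimal reconstruction (including the half-Cauchy mentioned earlier), not the exact code shown in the images:

```r
library(rstan)
library(tidybayes)
library(dplyr)

# A minimal Stan program written from scratch: one mean per category,
# a shared residual sd with a half-Cauchy prior.
stan_code <- "
data {
  int<lower=1> N;
  int<lower=1> J;
  int<lower=1, upper=J> condition[N];
  vector[N] y;
}
parameters {
  vector[J] mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 5);              // half-Cauchy via the lower bound
  y ~ normal(mu[condition], sigma);
}
"

# Connect the Stan code to the placeholder ABC data and sample.
stan_data <- list(N = nrow(ABC),
                  J = nlevels(ABC$condition),
                  condition = as.integer(ABC$condition),
                  y = ABC$response)

fit_stan <- stan(model_code = stan_code, data = stan_data,
                 chains = 4, iter = 2000, seed = 42)

# tidybayes turns the draws into a tidy data frame for plotting.
fit_stan %>%
  tidy_draws() %>%
  head()
```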

Once again, the Stan code, built from scratch. Image by author.
The sampling procedure. Image by author.
The magic of tidyverse and tidybayes to get everything you want from your posterior draws. Image by author.
Posterior distribution of the difference between each of the categories. No p-values. You have to love this! Image by author.

As I said before, you can also fit the model using rstanarm.
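A minimal rstanarm sketch, under the same placeholder names:

```r
library(rstanarm)

# Same model, rstanarm flavor. It defaults to weakly informative
# priors, so this is still not really 'Bayesian' in the sense
# argued further below.
fit_arm <- stan_glmer(response ~ 1 + (1 | condition), data = ABC,
                      chains = 4, cores = 4, seed = 42)
summary(fit_arm)
plot(fit_arm)   # default posterior plot
```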

The syntax for Bayesian modeling could not be simpler. However, no informative priors were provided, so we can hardly call this model Bayesian. Image by author.
The results from rstanarm. Image by author.
Default posterior distribution plot coming from rstanarm. Image by author.
The plots when using bayesplot. This is all you need to get an idea about the sampling efficiency and accuracy of your model. The chains need to have some form of convergence mixed with acceptable noise. No drift, no patterns. That is what you want to see. Image by author.
And the posterior draws and plots coming from the model and its data. Image by author.
And a very quick way to achieve what we have seen above with a simple piece of syntax, using MCMCglmm, emmeans, and tidybayes. So many flavors, so many options to choose from. Image by author.
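A sketch of that quick route; the emmeans-to-tidybayes hand-off is a documented workflow, but the exact calls here are my own reconstruction:

```r
library(MCMCglmm)
library(emmeans)
library(tidybayes)
library(dplyr)
library(ggplot2)

# Fit the model with MCMCglmm, then let emmeans compute the category
# means and tidybayes tidy the posterior draws for plotting.
# emmeans needs the data argument because MCMCglmm does not store it.
fit_mcmc <- MCMCglmm(response ~ condition, data = ABC, verbose = FALSE)

emmeans(fit_mcmc, ~ condition, data = ABC) %>%
  gather_emmeans_draws() %>%
  ggplot(aes(x = .value, y = condition)) +
  stat_halfeye()
```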

Up until now I have used non-informative priors, which makes any of the above hardly Bayesian. There is never, ever a reason NOT to use informative priors. If you still decide to go for non-informative priors, you are telling yourself and your audience that you have no idea what it is you are doing. You are saying that you are unwilling to use past knowledge and would rather let everything depend on that single piece of data you have.

That cannot be. So I built a model in which I used a normal distribution for the intercept and two gamma distributions for the variances. What makes them informative is not their mean, but their variance: I made that small enough to let them have an effect. Although the ABC dataset is a simulated one, I now have priors that will have an influence. THIS is how Bayesian analysis should be conducted.
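Here is a sketch of how such informative priors could be specified in brms; the specific means and rates below are illustrative assumptions, not the exact values I used:

```r
library(brms)

# Informative priors: a normal prior on the intercept and gamma
# priors on the two standard deviations. What makes them informative
# is how tightly they are specified, not where they are centered.
informative_priors <- c(
  set_prior("normal(11, 0.5)", class = "Intercept"),
  set_prior("gamma(2, 2)", class = "sd"),     # between-category sd
  set_prior("gamma(2, 2)", class = "sigma")   # residual sd
)

fit_informative <- brm(response ~ 1 + (1 | condition), data = ABC,
                       prior = informative_priors,
                       chains = 4, cores = 4, seed = 42,
                       sample_prior = "yes")
summary(fit_informative)
```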

The model used in brms. Image by author.
The results. You can clearly see how the posterior draws deviate from the priors, which is not a problem at all, as long as the chains converge. And they do. Image by author.
The distribution of the observed values and the posterior draws (Yrep). The ‘solar flares’ extending beyond the responses in the dataset are where the posterior distributions show the discrepancy between the prior and the likelihood. Image by author.
Four more diagnostic plots, all looking good. Images by author.

All of the above plots should show some sampling variation, but no extremes. That is ALL I am interested in. Priors and data may very well deviate. No harm. That is how science works.

You can see that the median response deviates from the posterior draws, BUT the draws themselves are sampled beautifully. Image by author.
Prediction plots on the observation level. Image by author.
Error plots. Image by author.
More diagnostic plots, checking whether the distributional family of the response (here Gaussian) was chosen wisely. It seems it was, but that should hardly come as a surprise. Image by author.
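The diagnostic plots above can be reproduced with brms' pp_check; a sketch, reusing the informative-prior fit from the earlier sketch:

```r
library(brms)

# Posterior predictive checks: compare the observed data to draws
# from the fitted model in a few complementary ways.
pp_check(fit_informative, ndraws = 50)                     # density overlay
pp_check(fit_informative, type = "stat", stat = "median")  # median check
pp_check(fit_informative, type = "error_hist")             # error plots
pp_check(fit_informative, type = "intervals")              # per observation
```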

So, I hope this post was informative in the sense that it shows that Bayesian analysis is not that difficult to run. The real difficulty extends far beyond the scope of this post: making sure that, before you conduct your experiment, you have acquired all the worthwhile knowledge needed to fill the gap. Not only will that knowledge determine the prior, it will also determine the way you sample new data.

Acquiring knowledge requires focus, but above all, more knowledge.

Let me know if something is incorrect or amiss!


Dr. Marc Jacobs

Scientist. Builder of models, and enthusiast of statistics, research, epidemiology, probability, and simulations for 10+ years.