ABC data

Bayesian analysis

Dr. Marc Jacobs
Jun 14, 2022

Bayesian analysis is often called ‘the other way’ or ‘the new kind of statistics’, but that is of course nonsense. Bayes’ Theorem is older than Fisher’s work and is much truer to science and the acquiring of knowledge than maximum likelihood ever can be.

To show that Bayesian analysis is really not that difficult, or far off from how we humans deal with (new) information, I have already posted several Bayesian analyses on well-known datasets such as Iris, ChickWeight, mtcars, and Rent99. Today, I will show how I analyzed a simulated dataset called ABC. It is nothing fancy, but it does not have to be. Bayes’ Theorem works wonders, especially in small samples, and its power comes from beyond the current dataset. That is why a Bayesian analysis will always tell you more than any kind of Maximum Likelihood approach ever can.

When it comes to Bayes and Bayesian analysis, it often seems to come down to knowing three words: prior, likelihood, posterior. Agreed, these are important words to know, but they are not the main driver of Bayesian analysis. That would be you, the modeler. The researcher. The one who tries his or her best to acquire as much solid information as possible before creating a new dataset (the likelihood) and extending past knowledge (the prior) into current knowledge (the posterior). Modeling is inherently human, and Bayes had a great way of showing that conditional way of working.

Let's get started on the ABC dataset and work our way through it. I placed the code at the end, because I want the graphs to be the primary driver of this post. HOW to do it you can figure out later; first, it is important that you become familiar with the absence of the p-value and embrace uncertainty.

In life, that is often all we have.

The creation of data. Image by author.
A plot of the simulated dataset. Image by author.
And a way of showing how the mean of each category differs from the grand mean. These plots are extremely easy to make in R, but add very little except more numbers and statistics. Image by author.
Another plot showing how you can create p-values for every pairwise comparison across the five levels. The overkill is clear to see, but make no mistake: the p-value is still a dominant driver in the sciences. Image by author.
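For readers who want to follow along, here is a minimal sketch of how such a dataset could be simulated. The group means, spread, sample size, and the names ABC, condition, and response are my own placeholders, not necessarily the values used in the images above:

```r
# A minimal sketch of simulating a five-level dataset like ABC.
# The means, sd, and sample size are illustrative assumptions.
library(ggplot2)

set.seed(42)
condition <- rep(LETTERS[1:5], each = 20)  # five categories, A through E
response  <- rnorm(100,
                   mean = rep(c(10, 11, 12, 11.5, 10.5), each = 20),
                   sd   = 1)
ABC <- data.frame(condition = factor(condition), response = response)

# Quick look at the simulated data per category
ggplot(ABC, aes(condition, response)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, alpha = 0.4)
```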

Alright, so let's quickly turn our minds to Bayesian analysis. I will be using the brms package, although many other packages exist, such as rstanarm or MCMCglmm. Like most things in R, you can find ten different ways of doing the same thing.

A Bayesian model with no pre-determined prior is just a Frequentist model using Maximum Likelihood; the results should be practically identical. If we do not specify a prior, the brms package will do it for us, because the sampling engine requires one. You can see down below that five priors are determined, based on the dataset that will be analyzed. This is NOT Bayesian analysis.
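You can ask brms which priors it would fall back on before fitting anything. A minimal sketch, assuming the placeholder ABC data frame from above and a random intercept per condition:

```r
library(brms)

# Inspect the default (data-dependent) priors brms would use
# for a model with a random intercept per condition.
get_prior(response ~ 1 + (1 | condition), data = ABC)

# Fit without specifying any prior: brms falls back on its defaults.
fit_default <- brm(response ~ 1 + (1 | condition),
                   data = ABC, chains = 4, cores = 4, seed = 42)
summary(fit_default)
```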

Priors determined by the brms package by looking at the data. Image by author.
The summary results coming from the brms package and the convergence of each of the three parameters of interest. Although you saw five priors in the graph above, the model actually contains only three parameters of interest. The difference between source 'default' and source '(vectorized)' is only in naming; the three priors for class sd all refer to the same thing. Hence, brms uses only one, and that is why you see only three distributions with accompanying chains. Image by author.
Below you see three graphs showing the same thing in slightly different ways: the posterior distribution of each of the five categories. Image by author.

Now, the graph below is the most interesting. We already showed p-values above, but no Bayesian could care less about p-values, and neither should you. All that matters is the estimate, the effect size, and the density of the posterior distribution showing the difference between any combination of groups. Hence, if the posterior of the difference between categories A and C does not contain zero, the difference is not zero. Here, it hovers around 1.6, with quite some uncertainty surrounding that number. This means that although there is a difference between A and C (given this dataset and our non-informative priors), the difference comes with a range that needs to be reported. It does NOT mean that the difference will always be around 1.6.
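Here is a sketch of how such a difference distribution can be computed by hand from the posterior draws; the column names follow brms' convention for group-level effects and assume the placeholder model fitted above:

```r
library(brms)

# Extract posterior draws and compute the A-vs-C contrast directly.
# The common intercept cancels, so the group-level deviations suffice.
draws   <- as_draws_df(fit_default)
diff_AC <- draws$`r_condition[A,Intercept]` - draws$`r_condition[C,Intercept]`

# Summarize the posterior of the difference: a point estimate plus
# a credible interval, instead of a p-value.
mean(diff_AC)
quantile(diff_AC, probs = c(0.025, 0.975))
hist(diff_AC, breaks = 40, main = "Posterior of A - C")
```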

Image by author.

If you would like a better, though not completely identical, comparison between the Frequentist and Bayesian ways of analyzing, look below. The biggest difference, though not of great importance now, is that the linear regression (lm) model treats the condition as a fixed effect, whereas the Bayesian model treats it as random. BTW, be careful when discussing fixed and random effects with a ‘Bayesian’: according to Bayesian theory, all parameters are inherently random.

Linear regression (lm) model showing the summary and assumption plots. Image by author.
The Bayesian model. The estimates are not directly comparable because the lm model specified fixed effects whereas the Bayesian model specified random effects. Image by author.

When creating a Bayesian model in R, the brms package needs to talk to the Stan sampler, and it does so by transforming the model and data into Stan code before sending it to Stan. Down below, you can see that code. If you like, you can also model your data directly in Stan. Below, you see Stan code written from scratch. Tedious, is it not?
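If you want to inspect the generated Stan code without digging through images, brms exposes it directly; a quick sketch with the same placeholder model:

```r
library(brms)

# Show the Stan program brms would compile for this model,
# and the data list it would pass to Stan.
make_stancode(response ~ 1 + (1 | condition), data = ABC)
make_standata(response ~ 1 + (1 | condition), data = ABC)
```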

Model in the Stan language. brms will often use half-Cauchy distributions for modelling variance components. Image by author.
Here you can see both the Cauchy and the Normal distributions in use. The Cauchy is not my favorite for modelling variance; I prefer the gamma or beta. Image by author.
You can run the Stan code directly and get Bayesian results. Image by author.
And the Bayesian results. For each sampled parameter you obtain the average, the standard error, and several percentiles of the distribution. You also see metrics like n_eff and Rhat, which belong to standard Bayesian evaluation but offer little to me. I prefer plotting the posterior distributions and checking whether the chains show the appropriate noise-like behavior. Image by author.

Now, let's compare the results from a linear model (linear_results) to a Bayesian model including the condition either as fixed (bayes_results2) or random (bayes_results).
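A hedged sketch of how these three models could be set up; the object names match the post, but the exact formulas are my reconstruction:

```r
library(brms)

# Frequentist linear model: condition as a fixed effect.
linear_results <- lm(response ~ condition, data = ABC)

# Bayesian model with condition as a fixed effect ...
bayes_results2 <- brm(response ~ condition, data = ABC,
                      chains = 4, cores = 4, seed = 42)

# ... and with condition as a random (group-level) effect.
bayes_results <- brm(response ~ 1 + (1 | condition), data = ABC,
                     chains = 4, cores = 4, seed = 42)

summary(linear_results)
summary(bayes_results2)
summary(bayes_results)
```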

The “posterior” estimates coming from a linear model and two Bayesian models. Of course, the linear model cannot provide posterior estimates; those are likelihood estimates. Image by author.
And the results, which deviate somewhat for all three, with the biggest difference in Bayes Random. That makes quite some sense, considering how fixed and random components work. It is also why Mixed Models (containing fixed and random components) are the closest thing to Bayesian modeling, especially if you know that Mixed Models are the models used for meta-analysis. Image by author.

One of the things I never really liked about R is that you can conduct the same analysis in many different ways, but the surrounding features tend to differ. For instance, the sampling procedure comes from rstan, which allows my Stan code to be connected to the dataset and to retrieve the results in R format. From that format I can obtain posterior draws and plot them, but it looks somewhat different than if I were to use the brms package combined with bayesplot. Hence, many, many combinations are possible, but you always need to make sure you know which package you are using.
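A sketch of that rstan-plus-tidybayes route. The Stan program below is my own minimal reconstruction (including the half-Cauchy mentioned earlier), not the exact code shown in the images:

```r
library(rstan)
library(tidybayes)
library(dplyr)

# A minimal Stan program written from scratch: one mean per category,
# a shared residual sd with a half-Cauchy prior.
stan_code <- "
data {
  int<lower=1> N;
  int<lower=1> J;
  int<lower=1, upper=J> condition[N];
  vector[N] y;
}
parameters {
  vector[J] mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 5);              // half-Cauchy via the lower bound
  y ~ normal(mu[condition], sigma);
}
"

# Connect the Stan code to the placeholder ABC data and sample.
stan_data <- list(N = nrow(ABC),
                  J = nlevels(ABC$condition),
                  condition = as.integer(ABC$condition),
                  y = ABC$response)

fit_stan <- stan(model_code = stan_code, data = stan_data,
                 chains = 4, iter = 2000, seed = 42)

# tidybayes turns the draws into a tidy data frame for plotting.
fit_stan %>%
  tidy_draws() %>%
  head()
```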

Once again, the Stan code, built from scratch. Image by author.
The sampling procedure. Image by author.
The magic of tidyverse and tidybayes to get everything you want from your posterior draws. Image by author.
Posterior distribution of the difference between each of the categories. No p-values. You have to love this! Image by author.

As I said before, you can also fit the model using rstanarm.
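A minimal rstanarm sketch, under the same placeholder names:

```r
library(rstanarm)

# Same model, rstanarm flavor. It defaults to weakly informative
# priors, so this is still not really 'Bayesian' in the sense
# argued further below.
fit_arm <- stan_glmer(response ~ 1 + (1 | condition), data = ABC,
                      chains = 4, cores = 4, seed = 42)
summary(fit_arm)
plot(fit_arm)   # default posterior plot
```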

The syntax for Bayesian modeling could not be simpler. However, no informative priors were provided, so we can hardly call this model Bayesian. Image by author.
The results from rstanarm. Image by author.
Default posterior distribution plot coming from rstanarm. Image by author.
The plots when using bayesplot. This is all you need to get an idea about the sampling efficiency and accuracy of your model. The chains need to have some form of convergence mixed with acceptable noise. No drift, no patterns. That is what you want to see. Image by author.
And the posterior draws and plots coming from the model and its data. Image by author.
And a very quick way to achieve what we have seen above with a simple piece of syntax, using MCMCglmm, emmeans, and tidybayes. So many flavors, so many options to choose from. Image by author.
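A sketch of that quick route; the emmeans-to-tidybayes hand-off is a documented workflow, but the exact calls here are my own reconstruction:

```r
library(MCMCglmm)
library(emmeans)
library(tidybayes)
library(dplyr)
library(ggplot2)

# Fit the model with MCMCglmm, then let emmeans compute the category
# means and tidybayes tidy the posterior draws for plotting.
# emmeans needs the data argument because MCMCglmm does not store it.
fit_mcmc <- MCMCglmm(response ~ condition, data = ABC, verbose = FALSE)

emmeans(fit_mcmc, ~ condition, data = ABC) %>%
  gather_emmeans_draws() %>%
  ggplot(aes(x = .value, y = condition)) +
  stat_halfeye()
```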

Up until now I have used non-informative priors, which makes any of the above hardly Bayesian. There is never, ever a reason NOT to use informative priors. If you still decide to go for non-informative priors, you are telling yourself and your audience that you have no idea what it is you are doing. You are saying that you are unwilling to use past knowledge and would rather let everything depend on that single piece of data you have.

That cannot be. So I built a model in which I used a normal distribution for the intercept and two gamma distributions for the variances. What makes them informative is not their mean, but their variance: I made that small enough to let them have an effect. Although the ABC dataset is a simulated one, I now have priors that will have an influence. THIS is how Bayesian analysis should be conducted.
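Here is a sketch of how such informative priors could be specified in brms; the specific means and rates below are illustrative assumptions, not the exact values I used:

```r
library(brms)

# Informative priors: a normal prior on the intercept and gamma
# priors on the two standard deviations. What makes them informative
# is how tightly they are specified, not where they are centered.
informative_priors <- c(
  set_prior("normal(11, 0.5)", class = "Intercept"),
  set_prior("gamma(2, 2)", class = "sd"),     # between-category sd
  set_prior("gamma(2, 2)", class = "sigma")   # residual sd
)

fit_informative <- brm(response ~ 1 + (1 | condition), data = ABC,
                       prior = informative_priors,
                       chains = 4, cores = 4, seed = 42,
                       sample_prior = "yes")
summary(fit_informative)
```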

The model used in brms. Image by author.
The results. You can clearly see how the posterior draws deviate from the priors, which is not a problem at all, as long as the chains converge. And they do. Image by author.
The distribution of the observed values and the posterior draws (Yrep). The ‘solar flares’ extending beyond the responses in the dataset are where the posterior distributions show the discrepancy between the prior and the likelihood. Image by author.
Four more diagnostic plots, all looking good. Images by author.

All of the above plots should show some sampling variation, but no extremes. That is ALL I am interested in. Priors and data may very well deviate. No harm. That is how science works.

You can see that the median response deviates from the posterior draws, BUT the draws themselves are sampled beautifully. Image by author.
Prediction plots on the observation level. Image by author.
Error plots. Image by author.
More diagnostic plots, checking whether the distributional family of the response (here Gaussian) was chosen wisely. It seems it was, but that should hardly come as a surprise. Image by author.
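The diagnostic plots above can be reproduced with brms' pp_check; a sketch, reusing the informative-prior fit from the earlier sketch:

```r
library(brms)

# Posterior predictive checks: compare the observed data to draws
# from the fitted model in a few complementary ways.
pp_check(fit_informative, ndraws = 50)                     # density overlay
pp_check(fit_informative, type = "stat", stat = "median")  # median check
pp_check(fit_informative, type = "error_hist")             # error plots
pp_check(fit_informative, type = "intervals")              # per observation
```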

So, I hope this post was informative in the sense that it shows that Bayesian analysis is not that difficult to run. The real difficulty extends far beyond the scope of this post: making sure that, before you conduct your experiment, you have acquired all the worthwhile knowledge needed to fill the gap. Not only will that knowledge determine the prior, it will also determine the way you sample new data.

Acquiring knowledge requires focus, but above all, more knowledge.

Let me know if something is incorrect or amiss!


Dr. Marc Jacobs

Scientist. Builder of models, and enthusiast of statistics, research, epidemiology, probability, and simulations for 10+ years.