MTCars
The Bayesian Way
I have been writing several ‘The Bayesian Way’ posts up until now, and I will not stop until I have pushed out several more. The reason for doing them is that I want you to get acquainted with Bayesian analysis from a programmatic point of view. R has some great tools for modelling via Bayes’ theorem and, as with most things R, there are at least four different ways to do it.
Today, I will use Bayesian analysis on the mtcars dataset, which is a standard dataset in R. That is nice, because you can easily recreate the code, which I will post at the bottom. The reason for doing it this way is that I want you to look at what I did from a graphical perspective, looking at the output and reading the text underneath. What I show in the post is not the entire plethora of code I will showcase at the end, but just enough to get a clue. Then it is off to doing it yourself by exploring and trying out the multitude of options available for running a Bayesian analysis, hopefully never forgetting the primary concepts such as the prior, the likelihood, the posterior, and conditional probability.
Alright, so let’s get started. The dataset should be somewhat familiar to you, and if not, it is no Pandora’s box.
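A minimal setup sketch, assuming you have the brms package installed (it compiles models via Stan, so the first fit takes a while):

```r
# mtcars ships with base R; brms provides the Bayesian modelling interface
library(brms)

data(mtcars)
head(mtcars)  # mpg, cyl, hp, and friends: 32 cars, 11 numeric columns
```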
Off to the most important part of Bayesian analysis: the prior. The prior is the knowledge you have up until now, and that knowledge should account for something. It is not vague; it should mean something, or else why are we doing this analysis in the first place?
But, just to get started, no priors at all. We will let the brms package decide, which either does not want to choose (flat priors) or derives them from the data, which is almost a self-fulfilling prophecy.
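A sketch of such a default-prior fit; the formula here (mpg on cyl and hp) is my assumption about the model being discussed, not necessarily the exact one in the full code below:

```r
# Fit with brms' defaults: flat priors on the regression coefficients,
# data-derived priors for the intercept and sigma
fit_default <- brm(mpg ~ cyl + hp, data = mtcars,
                   family = gaussian(), seed = 123)

# Inspect which priors brms actually used
prior_summary(fit_default)
```

You can also call `get_prior(mpg ~ cyl + hp, data = mtcars)` before fitting to see the defaults without compiling anything.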
So, that was fun, but not interesting at all. Time to use informative priors, meaning that I will state my current knowledge on the subject and select distributions and hyperparameters that mimic that knowledge in the best way possible.
Distributions for variance estimates are the most interesting, since they have a natural boundary of zero to the left. Variance can be infinite, but there is a lower limit, which is zero. No negative variance is allowed, so if we used the normal distribution, we would allow it to become negative, which can hamper the sampling procedure. As such, I will use the gamma distribution. And I want it to be not too tight, but certainly not infinitely wide either.
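One quick way to pick hyperparameters is to just plot a few candidate gamma densities and see where the mass falls; the shape/rate values below are illustrative, not the ones used in the post:

```r
# Candidate gamma priors for sigma: support is (0, Inf), so no negative
# values, unlike a normal prior
curve(dgamma(x, shape = 2, rate = 0.5), from = 0, to = 20,
      xlab = "sigma", ylab = "density")            # mass around plausible SDs
curve(dgamma(x, shape = 1, rate = 0.1), add = TRUE, lty = 2)  # much vaguer
```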
For the rest, I expect a stronger relationship between mpg and cyl than for hp, but I am more sure about hp. There is an interaction effect between cyl and hp, but not an excessive one. I have an informative intercept and, of course, the gamma distribution for the standard deviation of the residual variance, which in brms is labeled sigma.
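Translating those beliefs into brms syntax could look like this; every hyperparameter below is an illustrative placeholder, and the exact values used are in the full code at the bottom:

```r
# Informative priors matching the stated beliefs (placeholder values)
my_priors <- c(
  set_prior("normal(-3, 2)",       class = "b", coef = "cyl"),    # stronger effect, less certain
  set_prior("normal(-0.05, 0.02)", class = "b", coef = "hp"),     # weaker effect, more certain
  set_prior("normal(0, 0.01)",     class = "b", coef = "cyl:hp"), # modest interaction
  set_prior("normal(20, 5)",       class = "Intercept"),          # informative intercept
  set_prior("gamma(2, 0.5)",       class = "sigma")               # bounded at zero
)

fit_informed <- brm(mpg ~ cyl * hp, data = mtcars,
                    family = gaussian(), prior = my_priors, seed = 123)
summary(fit_informed)
```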
Alright, so instead of just sticking with the Gaussian distribution, which happens often, let’s also use Bayesian analysis to model a categorical variable. For this example, we will use cylinder, which in this dataset can be either four, six, or eight. I will build a model that predicts, or shows, the probability of an engine having four, six, or eight cylinders. I am not sure what the scientific value of such an exercise is, but let’s do it anyhow, just because we can (and analyzing data just because we can happens too often already). At least you can see what it will look like.
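In brms this means switching the family to `categorical()`; the predictors chosen here are my assumption for the sketch:

```r
# cyl must be a factor to be modelled as a categorical outcome
mtcars$cyl_f <- factor(mtcars$cyl)

fit_cat <- brm(cyl_f ~ mpg + hp, data = mtcars,
               family = categorical(), seed = 123)

# Posterior probabilities of four, six, or eight cylinders per car
fitted(fit_cat)
```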
Alright, so this is the end of the post. The code is down below, and shoot me a message if something is missing, unclear, or downright wrong!