MTCars

The Bayesian Way

Dr. Marc Jacobs
5 min read · Jun 9, 2022

I have been doing several ‘The Bayesian Way’ posts up until now, and I will not stop until I have pushed out several more. The reason for doing them is that I want you to get acquainted with Bayesian analysis from a programmatic point of view. R has some great tools for modelling via Bayes’ theorem and, as with most things R, you can model in at least four different ways.

Today, I will use Bayesian analysis on the mtcars dataset, which is a standard dataset in R. That is nice, because you can easily recreate my code, which I will post at the bottom. The reason for doing that is that I want you to look at what I did from a graphical perspective, looking at the output and reading the text underneath. What I show in the post is not the entire plethora of code I will showcase at the end, but just enough to get a clue. Then, it is off to doing it yourself by exploring and trying out the multitude of options available for running a Bayesian analysis, hopefully never forgetting the primary aspects such as the prior, likelihood, posterior, and conditional probabilities.

Alright, so let’s get started. The dataset should be somewhat familiar to you and, if not, it is no Pandora’s box.

The skimr package is great for getting that first glimpse of the data.
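If you want to follow along, that first glimpse is a one-liner (a minimal sketch, assuming skimr is installed):

```r
# Quick overview of every variable in mtcars: type, missingness, mean, sd, and a tiny histogram
library(skimr)
skim(mtcars)
```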
Then off to using the ggplot2 and ggstatsplot packages for getting clear, and not so clear, graphics. There is such a thing as overkill, and the latter plot is definitely showcasing it.
To the left another overkill plot and to the right a nice clean correlation plot which can come in handy.
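The correlation plot can be produced along these lines; a sketch, since the exact plotting calls may differ from what I used:

```r
# Correlation matrix of all mtcars variables
library(ggstatsplot)
ggcorrmat(data = mtcars)
```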

Off to the most important part of Bayesian analysis — the prior. The prior is the knowledge you have up until now, and that knowledge should account for something. It should not be vague; it should mean something, or else why are we doing this analysis in the first place?

But, just to get started, no priors at all. We will let the brms package decide, which either does not want to (flat priors) or gets them from the data, which is almost a self-fulfilling prophecy.
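A minimal sketch of what that looks like in brms; the formula below (mpg regressed on hp and cyl) is an assumption for illustration:

```r
library(brms)

# Ask brms which priors it would use by default for this model
get_prior(mpg ~ hp + cyl, data = mtcars, family = gaussian())

# Fit with those defaults: flat priors on the regression coefficients
fit_flat <- brm(mpg ~ hp + cyl, data = mtcars,
                family = gaussian(), chains = 4, seed = 2022)
```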

And, here we go. This is maximum likelihood, since without priors the posterior estimates from the model reflect whatever is inside the data — which amounts to maximum likelihood.
Chains of course look good, since no integration between prior and likelihood was necessary. To the right, the distribution of the hp variable for all four chains. They do not have to be the same but should be, to some extent, overlapping.
The relationship between the variables (intercept, hp, and cyl), as can be seen coming from the different sampling chains. As already mentioned, the chains should be overlapping, but may deviate. They should not deviate much, as you would not expect drivers doing completely different laps when racing on a race track. At a certain moment, it should all coincide. The plot to the left shows the assumption of the Gaussian model, which is that the residual variance has no relationship with any of the parameters — and here it does not: the variance is unrelated to hp.
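Those diagnostics come straight from the fitted object; a sketch, assuming the fit_flat object from above:

```r
# Trace and density plots per chain
plot(fit_flat)

# Pairwise relationships between the intercept, hp, and cyl draws across chains
pairs(fit_flat)
```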

So, that was fun, but not interesting at all. Time to use informative priors, meaning that I will state my current knowledge on the subject and select distributions and hyperparameters that mimic that knowledge in the best way possible.

Distributions for variance estimates are the most interesting, since they have a natural boundary of zero to the left. There can be infinite variance, but there is a downward limit, which is zero. No negative variance is allowed, and so if we were to use the normal distribution, we would allow it to become negative, which can hamper the sampling procedure. As such, I will use the gamma distribution. And I want it to be not too tight, but for sure also not infinite.

The PQRS tool is great for quickly looking at several distributions and calculating their mean and variance.
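If you do not have PQRS at hand, base R does the same eyeballing; the shape and rate below are placeholders, not the exact values I ended up with:

```r
# Density of a candidate gamma prior for the residual standard deviation
curve(dgamma(x, shape = 2, rate = 0.5), from = 0, to = 15,
      xlab = "sigma", ylab = "density")

# For a gamma distribution: mean = shape / rate, variance = shape / rate^2
c(mean = 2 / 0.5, variance = 2 / 0.5^2)
```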

For the rest, I expect a stronger relationship between mpg and cyl than for hp, but I am more sure about hp. There is an interaction effect between cyl and hp, but not an excessive one. I have an informative intercept and, of course, the gamma distribution for the standard deviation of the residual variance, which in brms is labeled sigma.

The priors I used.
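In brms, a prior set in that spirit could look roughly like this; the hyperparameters are illustrative guesses, not necessarily the exact ones I used:

```r
# Hypothetical informative priors: a stronger cyl effect, a smaller but more certain hp effect,
# a modest interaction, an informative intercept, and a gamma prior on sigma
priors <- c(
  set_prior("normal(-2, 1)",       class = "b", coef = "cyl"),
  set_prior("normal(-0.05, 0.02)", class = "b", coef = "hp"),
  set_prior("normal(0, 0.05)",     class = "b", coef = "hp:cyl"),
  set_prior("normal(30, 5)",       class = "Intercept"),
  set_prior("gamma(2, 0.5)",       class = "sigma")
)

fit_inform <- brm(mpg ~ hp * cyl, data = mtcars, family = gaussian(),
                  prior = priors, chains = 4, seed = 2022)
```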
The model and the chains.
The posterior sampling for the response to the left, and to the right a neat animation of the posterior draws from the model I made, on top of the latest data I used. The animation clearly shows that posterior draws are extremely important to assess and dig deeper into your model. For instance, the spread of the draws tells you something about how (un)sure the model is, and the distance between the posterior draws and the likelihood shows you how influential the prior was. There is nothing wrong with that, but it does go to show that modeling is science, and science means using both the knowledge you had up until the point of the newest data, and the point thereafter. THEY DO NOT HAVE TO COINCIDE. Many people still do not get that and are frightened by models that deviate from the latest data points. Me? I love it!
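The draws underneath such an animation can be pulled out and overlaid on the data with a posterior predictive check; a sketch, using the fit_inform object assumed above:

```r
# Overlay 100 posterior draws of the response on the observed mpg distribution
pp_check(fit_inform, ndraws = 100)
```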
And the predictions show this very clearly and nicely. Prediction intervals are something quite different from the confidence or credible intervals you get for the parameters.
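In brms the difference shows up directly: fitted values carry credible intervals for the expected response, while predictions also include the residual noise (again a sketch built on fit_inform):

```r
# Credible intervals for the expected mpg of each car
head(fitted(fit_inform))

# Prediction intervals: wider, because residual noise is added on top
head(predict(fit_inform))
```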

Alright, so instead of just sticking with the Gaussian distribution, which happens often, let’s also use Bayesian analysis to estimate a categorical variable. For this example, we will use cylinder, which in this dataset can be either four, six, or eight. I will build a model that will predict, or show, the probability of an engine having four, six, or eight cylinders. I am not sure what the scientific value of such an exercise is, but let’s do it anyhow, just because we can (and analyzing data just because we can happens too often already). At least you can see what it will look like.
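A sketch of such an ordinal model; I assume mpg as the single predictor here, matching the mpg = 20 example further down:

```r
# Treat the cylinder count as an ordered outcome and model it with a cumulative link
mtcars$cyl_ord <- ordered(mtcars$cyl)

fit_cyl <- brm(cyl_ord ~ mpg, data = mtcars,
               family = cumulative(link = "logit"),
               chains = 4, seed = 2022)
```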

The data, and the priors coming from the likelihood.
The Bayesian model which is NOT really a Bayesian model, since its priors come straight from the likelihood.
The sampling space to the left, and the probability distribution to the right. What you see to the right is the probability, and actually the frequency, of cars with four (1), six (2) and eight (3) cylinders. The six and eight cylinders connect to intercept[1] and intercept[2]. That is because I analyze the data in an ordinal way, using a cumulative link function.
The predictions coming from the model make more sense.
To the left, the observed and predicted values, with the probability plot showcasing the ordinal structure used. There is a clear sigmoid relationship for cylinders four and eight, but a normal one for six. At least, in this dataset that seems to be the case. To the right, the animation showing the left plot. Here you can see how the posterior draws make up the confidence bands portrayed. THIS is why only using Bayesian analysis you can say that a value has a 95% or 88% or whatever XX% probability of popping up.
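That probability plot comes out of brms almost for free; a sketch, assuming the fit_cyl object from above:

```r
# Predicted probability of four, six, or eight cylinders as a function of mpg
conditional_effects(fit_cyl, categorical = TRUE)
```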
And perhaps the strangest plot of all. This is the posterior correlation plot between the cylinder classes for mpg = 20. Well, it actually is not that strange, since the family used is the cumulative family, which means that the levels of the response variable are entwined. Can’t say the plot is helping me a lot, but at least it showcases what you can do with all the posterior draws. That, and much more.
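That plot can be rebuilt from the posterior draws of the class probabilities at mpg = 20, roughly like this (the plotting step itself is left out):

```r
# Posterior draws of P(cyl = 4), P(cyl = 6), P(cyl = 8) for a car with mpg = 20
draws <- posterior_epred(fit_cyl, newdata = data.frame(mpg = 20))

# draws has dimensions [n_draws, 1 observation, 3 categories];
# correlate the three category probabilities across draws
cor(draws[, 1, ])
```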

Alright, so this is the end of the post. Code down below, and shoot me a message if something is missing, unclear, or downright wrong!
