Regression : Bayesian vs. Frequentist

Hannah Do
5 min read · Nov 26, 2021


Bayesian and Frequentist statistics are both methods of statistical inference that evaluate the evidence for a hypothesis. The two approaches are often debated when solving problems of chance and aiming for the best guess of an unknown parameter, and the debate applies to regression as well.

Introduction

Let’s think of a coin-flipping example. If you toss a coin in the air and catch it, will the top come up heads or tails? What is the probability that the coin lands heads?

Now, most people would answer that the probability is 50/50, because it’s common sense and we all know it from experience.

However, the approach is quite different for statisticians. Here is how each would answer :

Bayesian : Based on my knowledge, it’s 50/50, but it could be different for you.

  • The Bayesian perspective is subjective. Bayesians weigh the hypothesis against previous experience — also known as the ‘prior’ — and do not assume that one answer will be the same for everyone.

Frequentist : Hmm.. I don’t know yet. Give me that coin, let me try.

  • The frequentist approach uses only ‘current’ and limited data. A frequentist statistician would flip the coin as many times as possible to estimate the probability.

Statistics : Are you Bayesian or Frequentist?
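The frequentist answer above can be sketched in a few lines of Python: simulate many flips and take the observed frequency of heads as the estimate. (The 0.5 used to simulate the coin is an assumption for the illustration — a frequentist would not know it in advance.)

```python
import random

# Simulate flipping a fair coin many times (true bias 0.5 assumed
# only to generate the data) and estimate the heads-probability
# from the observed frequency.
random.seed(42)
flips = [random.random() < 0.5 for _ in range(10_000)]
estimate = sum(flips) / len(flips)
print(estimate)  # near 0.5
```

The more flips we record, the closer the observed frequency gets to the underlying probability — which is exactly the frequentist notion of probability as a long-run frequency.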

Definition : Bayesian

In precise terms, the Bayesian approach begins by specifying a prior distribution over the parameters to be estimated. A parameter θ is treated as a random variable with a probability distribution, known as the prior distribution. This prior is combined with the likelihood of the observed data through Bayes’ theorem to produce the posterior distribution, which blends information from both the prior and the data.

Bayes Theorem
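Written out, Bayes’ theorem combines the prior and the likelihood into the posterior:

```latex
\underbrace{P(\theta \mid D)}_{\text{posterior}}
  = \frac{\overbrace{P(D \mid \theta)}^{\text{likelihood}} \,
          \overbrace{P(\theta)}^{\text{prior}}}
         {\underbrace{P(D)}_{\text{evidence}}}
```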

Definition : Frequentist

The frequentist approach makes inferences about the underlying truths of the experiment using only data from the current experiment. In contrast to the Bayesian approach, the parameter θ is an unknown but fixed quantity. Since θ is not a random variable, estimation techniques such as MLE, sandwich estimation, and bootstrap methods are used instead.

MLE : Maximum Likelihood Estimates
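As a hypothetical sketch of MLE, the code below estimates a coin’s heads-probability by minimizing the negative Bernoulli log-likelihood over a grid of candidate values; the closed-form answer is simply the sample proportion.

```python
import numpy as np

# Hypothetical data: 7 heads in 10 flips.
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def neg_log_likelihood(theta):
    # Bernoulli log-likelihood, negated so that minimizing it
    # maximizes the likelihood.
    heads = flips.sum()
    return -(heads * np.log(theta) + (len(flips) - heads) * np.log(1 - theta))

# Grid search over (0, 1); the minimum lands at the sample proportion.
grid = np.linspace(0.01, 0.99, 999)
theta_mle = grid[np.argmin([neg_log_likelihood(t) for t in grid])]
print(round(theta_mle, 2))  # 0.7, matching 7/10
```

No prior appears anywhere in this calculation — the estimate depends only on the observed data, which is the defining trait of the frequentist approach.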

Applications in Regression

According to Bayesian and Frequentist Regression Methods (Jon Wakefield, Springer), the choice between the Bayesian and frequentist approaches in regression is often based on the size of the dataset.

  • Small Dataset : A Bayesian approach with thoughtfully specified priors is recommended, as frequentist models can produce skewed results on small samples.
  • Medium to Large Dataset : A frequentist approach using sandwich estimation or quasi-likelihood is known to be robust, unless strong prior information is available.
  • Highly Complex Dataset : A Bayesian approach is preferred, as formulating the model is straightforward and convenient compared to the frequentist approach.

However, in a world where vast amounts of data are available to anyone, these boundaries are becoming blurred, and the more fundamental question of how to approach different types of datasets remains under debate.

Now, let’s take a look at how the two approaches differ when implementing a regression analysis.

- Bayesian Regression

As previously mentioned, the Bayesian approach requires creating a posterior distribution from the prior and the given data. However, calculating the posterior distribution for continuous parameters is often computationally intractable. Therefore, sampling methods such as Markov Chain Monte Carlo (MCMC) are used to draw samples from the posterior.

  • Monte Carlo is a general technique of drawing random samples, and Markov Chain indicates that the next sample drawn is based only on the previous sample value.

The idea is to draw enough samples that the approximation converges to the true posterior distribution of the model parameters.
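The idea can be sketched with a minimal Metropolis sampler (one member of the MCMC family) in plain NumPy, here applied to a coin’s heads-probability with a uniform prior — an assumption chosen to keep the example short; libraries such as PyMC3 automate this kind of sampling for full regression models.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(50) < 0.7          # 50 flips of a coin with assumed true bias 0.7
heads = int(data.sum())

def log_posterior(theta, heads, n):
    # Uniform prior on (0, 1) plus binomial log-likelihood (up to a constant).
    if not 0 < theta < 1:
        return -np.inf
    return heads * np.log(theta) + (n - heads) * np.log(1 - theta)

samples = []
theta = 0.5                           # arbitrary starting point
current = log_posterior(theta, heads, len(data))
for _ in range(20_000):
    proposal = theta + rng.normal(0, 0.1)          # propose a nearby value
    prop_lp = log_posterior(proposal, heads, len(data))
    if np.log(rng.random()) < prop_lp - current:   # Metropolis acceptance rule
        theta, current = proposal, prop_lp
    samples.append(theta)                          # Markov chain: each step
                                                   # depends only on the last

posterior = np.array(samples[5_000:])              # discard burn-in
print(round(posterior.mean(), 2))                  # close to heads / 50
```

The retained samples form an empirical approximation of the posterior: their mean and spread estimate the parameter and its uncertainty, and more iterations bring the approximation closer to the true posterior.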

This can be implemented with PyMC3, a well-known Python library for probabilistic programming and Bayesian inference. The library includes a GLM (Generalized Linear Models) module that helps build a Bayesian linear model in Python.

Normal Distribution of the coefficient matrix (β), data matrix (X), and standard deviation (σ)
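Written out, the caption above corresponds to the model’s likelihood — the response is normally distributed around the linear predictor:

```latex
y \sim \mathcal{N}\!\left(\beta^{\mathsf{T}} X,\; \sigma^{2}\right)
```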

In addition, we can build the posterior distribution by relating the features to the target and customizing a prior distribution for the data. With a few lines of code, the library turns point estimates into full distributions for the model coefficients.

Bayesian Linear Regression in Python — Using ML to Predict Student Grades by Will Koehrsen
  • Note that an entire range of values is produced for each coefficient, because the Bayesian approach does not assume certainty about the true values.

- Frequentist Regression

Frequentist regression may sound more familiar to those who have just started learning machine learning. Parameters and hypotheses are viewed as unknown but fixed quantities; in other words, the frequentist approach simply works with the given data.

There are many methods available for estimation :

  • Maximum Likelihood Estimation — MLE
  • Quasi-likelihood
  • Sandwich Estimation
  • Bootstrap Method
  • Ordinary Least Squares — OLS

Ordinary Least Squares method (OLS)

Among the methods above, OLS may be the simplest representation of the frequentist approach, since it involves no prior or posterior distribution at all.
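A minimal OLS sketch on simulated data — the true intercept 1 and slope 2 are assumptions made only to generate the example:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: y = 2x + 1 plus Gaussian noise.
x = rng.uniform(0, 10, 100)
y = 2 * x + 1 + rng.normal(0, 1, 100)

# OLS closed form: beta = (X^T X)^{-1} X^T y,
# with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta)  # roughly [1, 2]
```

The output is a single point estimate of the coefficients — no distribution over them — which is the hallmark of the frequentist approach.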

As can be seen in the image above, the regression line fitted with fewer samples (n=10, left graph) deviates more than the one fitted with abundant samples (n=100, right graph). The frequentist method requires sufficient data to establish confidence in the model parameters.

Conclusion

Simple Linear Regression

When it comes to regression, the Bayesian approach views a parameter as a random variable, while the frequentist approach views parameters as fixed quantities. From a predictive point of view, there is often no significant difference between the two approaches — so you can decide for yourself which one fits your dataset better.
