Bayesian Statistics — Explained in simple terms with examples

Shashank Parameswaran
5 min read · Sep 16, 2020



This article explains Bayesian statistics in layman's terms and how it differs from other approaches.

Life is full of uncertainties. Ask yourself: what is the probability that you will go to work tomorrow? What is the probability that it will rain this week? Will I contract the coronavirus? As you read through these questions, in the back of your mind you have already applied some Bayesian statistics to form a conjecture. Bayesian statistics helps us use past observations and experiences to better reason about the likelihood of a future event. The term “Bayesian” comes from the prevalent use of Bayes’ theorem, which was named after the Reverend Thomas Bayes, an 18th-century Presbyterian minister.


“Bayesian methods better correspond to what non-statisticians expect to see.”

“Customers want to know P (Variation A > Variation B), not P(x > Δe | null hypothesis) ”

“Experimenters want to know that results are right. They want to know how likely a variant’s results are to be best overall. And they want to know the magnitude of the results. P-values and hypothesis tests don’t actually tell you those things!”

Let’s try to understand Bayesian Statistics with an example.

Let’s assume you live in a big city and are out shopping, and you momentarily see a very famous person. Let’s call her X.

Now you come back home wondering if the person you saw was really X.

Let’s say you want to assign a probability to this.

Since you live in a big city, you would think that coming across this person has a very low probability, and you assign it a value of 0.004. Bayesian statistics partly involves using your prior beliefs, also called priors, to make assumptions about everyday problems.

Mathematically, we can write this as:

P (seeing person X | personal experience) = 0.004

The next day, since you follow person X on social media, you come across her post with her posing right in front of the same store. You are now almost convinced that you saw the same person. You update the probability of having seen her to 0.85.

Mathematically, we can write this as:

P (seeing person X | personal experience, social media post) = 0.85

You want to be sure that you saw this person, so you start looking for other outlets of the same shop. You find 3 other outlets in the city. Now you are less convinced that you saw her, and you update the probability to 0.36.

Mathematically, we can write this as:

P (seeing person X | personal experience, social media post, outlet search) = 0.36

Bayesian statistics is about using your prior beliefs, also called priors, to make assumptions about everyday problems and continuously updating these beliefs with the data you gather through experience. Your revised reasoning about an event, after incorporating the extra data, is called the posterior probability. The posterior belief can act as the prior belief when you have newer data, which allows you to continually adjust your beliefs and estimates.
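The updating loop described above can be sketched in code. The likelihood values below are invented for illustration (the article only states the resulting beliefs, not the likelihoods behind them), so treat this as a toy reconstruction of the celebrity story, not its exact arithmetic:

```python
# A minimal sketch of sequential Bayesian updating: the posterior after one
# piece of evidence becomes the prior for the next. All likelihood numbers
# here are hypothetical, chosen only to illustrate the mechanics.

def update(prior, likelihood_if_true, likelihood_if_false):
    """Apply Bayes' theorem for a binary hypothesis H given evidence E.

    prior               -- P(H) before seeing E
    likelihood_if_true  -- P(E | H)
    likelihood_if_false -- P(E | not H)
    Returns the posterior P(H | E).
    """
    numerator = likelihood_if_true * prior
    evidence = numerator + likelihood_if_false * (1 - prior)
    return numerator / evidence

# Start with a low prior of having seen the celebrity, as in the story.
belief = 0.004

# Evidence 1: a social-media post placing her at the same store.
# We assume such a post is far more likely if you really did see her.
belief = update(belief, likelihood_if_true=0.9, likelihood_if_false=0.01)

# Evidence 2: discovering three other outlets of the same shop.
# This evidence is more likely if you did NOT see her, so belief drops.
belief = update(belief, likelihood_if_true=0.3, likelihood_if_false=0.6)

print(round(belief, 3))  # prints 0.153
```

Note how the second update reuses the first posterior as its prior: that chaining is exactly the "posterior becomes prior" idea from the paragraph above.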

Bayes’ theorem formalizes this concept:

P(A | B) = P(B | A) × P(A) / P(B)

Here P(A) is the prior probability of event A, P(B | A) is the likelihood of observing evidence B if A is true, P(B) is the overall probability of the evidence, and P(A | B) is the posterior probability of A after seeing B.

Let’s take another small example.

Let’s say you want to predict the bias present in an unfair six-faced die.

One way to do this would be to toss the die n times and estimate the probability of each face from its relative frequency. This is commonly called the frequentist approach.
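The frequentist approach can be sketched as follows. The bias weights of the simulated die are invented for illustration; the point is only that relative frequencies converge to the true probabilities as the number of rolls grows:

```python
import random

# A minimal sketch of the frequentist approach: roll a (simulated) biased
# die many times and estimate each face's probability by its relative
# frequency. The true_weights below are a hypothetical bias.

random.seed(42)
true_weights = [0.25, 0.10, 0.15, 0.15, 0.10, 0.25]

n = 100_000
rolls = random.choices(range(1, 7), weights=true_weights, k=n)

estimates = [rolls.count(face) / n for face in range(1, 7)]
for face, p in enumerate(estimates, start=1):
    print(f"P(face {face}) is about {p:.3f}")
```

With 100,000 rolls each estimate lands within about one percentage point of the true weight; with only a handful of rolls the estimates would be very noisy, which is where the Bayesian approach below helps.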

Another way is to examine the surface of the die to reason about how the probability could be distributed. Say you find a curved surface on one edge and a flat surface on another; you could then assign more probability to the faces near the flat edges, since the die is more likely to come to rest on them. This is the Bayesian approach.
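One standard way to formalize the die example is a Dirichlet-multinomial model: the surface inspection becomes prior pseudo-counts over the six faces, which are then combined with observed rolls. The pseudo-counts and roll data below are invented for illustration:

```python
# A minimal sketch of the Bayesian approach to the die example, using a
# Dirichlet prior (a common choice, not something the article prescribes).
# Suppose inspecting the die suggests faces 1 and 6 are favoured, so they
# get larger prior pseudo-counts. All numbers are hypothetical.

prior_counts = [4, 1, 1, 1, 1, 4]        # belief from the surface inspection
observed_counts = [30, 12, 15, 14, 9, 20]  # hypothetical observed rolls

# Posterior mean of the Dirichlet-multinomial model: add prior pseudo-counts
# to the observed counts and normalise.
posterior_counts = [p + o for p, o in zip(prior_counts, observed_counts)]
total = sum(posterior_counts)
posterior_mean = [c / total for c in posterior_counts]

for face, p in enumerate(posterior_mean, start=1):
    print(f"P(face {face}) is about {p:.3f}")
```

When data is scarce the prior dominates the estimate; as rolls accumulate, the observed counts swamp the pseudo-counts and the Bayesian and frequentist answers converge, which is the trade-off the conclusion below alludes to.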


Advantages of the Bayesian approach:

  • It excels at combining information from different sources
  • Bayesian methods make your assumptions very explicit
  • It provides a natural and principled way of combining prior information with data, within a solid decision-theoretic framework. You can incorporate past information about a parameter to form a prior distribution for future analysis. All inferences follow logically from Bayes’ theorem.
  • It provides interpretable answers, such as “the true parameter Y has a probability of 0.95 of falling in a 95% credible interval.”
  • Recent developments in Markov chain Monte Carlo (MCMC) methodology facilitate the implementation of Bayesian analyses of complex data sets containing missing observations and multidimensional outcomes.

Disadvantages:

  • It does not tell you how to select a prior. There is no correct way to choose a prior. Bayesian inferences require skills to translate subjective prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results.
  • It can produce results that are heavily influenced by the priors. From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior.
  • It often comes with a high computational cost, especially in models with many parameters.

Conclusion:

Most problems can be solved using either approach. Frequentist statistics tries to eliminate uncertainty by providing point estimates and confidence intervals. Bayesian statistics tries to preserve and refine uncertainty by adjusting individual beliefs in light of new evidence. The Bayesian approach is especially useful when only limited data points are available for an event. A mix of Bayesian and frequentist reasoning marks the new era.

References:

Kurt, W. (2019). Bayesian Statistics The Fun Way. No Starch Press.

https://www.quantstart.com

https://documentation.sas.com

http://blog.analytics-toolkit.com
