“When you have excluded the impossible, whatever remains, however improbable, must be the truth”

Maya Toteva
Published in Human Systems Data
5 min read · Apr 12, 2017

-Sherlock Holmes (Sir Arthur Conan Doyle)

I borrowed this quote from Brandon Rohrer, who used it to describe the way Bayesian inference works in his short but very informative video, which I recommend to all of you.

Even though I had heard a lot about Bayesian methods for statistical analysis, I was reluctant to find out more about them because they seemed intimidating. The method is typically presented as a mathematical tool for predicting the probability that an event will occur, and that presentation sent me a clear message to steer clear of it.

For this assignment, I had to give the Bayesian method another chance to win me over from the dark side of traditional statistics. I must admit that if I had relied solely on Kruschke’s article to make me embrace Bayesian methods, that would never have happened. However informative and detailed his comparison between Frequentism and Bayesianism is, it failed to make the connection between theory and practical application. After I finished reading the article I still had a lot of questions, but I was aware of the bigger picture. I guess that, to better understand the Bayesian method, one must first have extensive knowledge of inferential statistical methods. Clearly, I had reached the necessary academic level for a proper reintroduction to Bayesian methods for data analysis.

What becomes clear from Kruschke’s article is that there are two approaches to carrying out statistical inference: Bayesian and Frequentist. One provides estimates to express confidence, the other refines the original beliefs as new evidence emerges. One uses conditional distributions within stated parameters (hypotheses), the other uses probability distributions. Frequentists rely heavily on p-values, F-statistics, and t-statistics, which, as we have already discussed, depend on the sample size. In other words, they use a fluctuating, easily manipulated value to attest to statistical significance. In contrast, Bayesian analysis uses a descriptive model that is easily adapted to different situations (Kruschke, 2010). The model consists of a prior belief, data, and a posterior belief. To explain this model in layman’s terms I will use an example from Halls-Moore (2014). He suggests as a prior belief the notion that the Moon is going to collide with the Earth. With every night that passes, more evidence is collected, and Bayesian inference uses it to correct the prior belief, supporting a posterior belief that such an occurrence is highly unlikely. I understand that!
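The prior, data, posterior cycle in the Moon example can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: I treat each night as an independent trial, put a Beta prior on the nightly collision probability, and start from a flat Beta(1, 1) prior. None of these choices, nor the function name `posterior_mean_collision`, comes from the cited article.

```python
def posterior_mean_collision(nights_without_collision, prior_a=1.0, prior_b=1.0):
    """Posterior mean of the nightly collision probability.

    Assumption for illustration: a Beta(prior_a, prior_b) prior, updated
    with n collision-free nights, becomes Beta(prior_a, prior_b + n).
    The mean of a Beta(a, b) distribution is a / (a + b).
    """
    a = prior_a
    b = prior_b + nights_without_collision
    return a / (a + b)


# The belief in a collision shrinks as collision-free nights accumulate.
for nights in (0, 10, 1000, 100000):
    print(nights, posterior_mean_collision(nights))
```

With the flat prior, the belief starts at one half and drops toward zero as the evidence piles up, which is the behavior the example describes.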

But I was only able to wrap my head around Bayesian inference after I watched a video introduction by Brandon Rohrer, retrieved from the R-Bloggers website. He gave an excellent example set in a movie theater. A person with long hair drops a ticket in the hallway. The question is how to address this person: as “ma’am” or as “sir”? In situations like this we have to make a guess. But what if the person was standing in line for the men’s restroom? This additional piece of information makes the guess less speculative. It helps us use common sense to make an inference. We make more accurate guesses if we use what we already know about the situation.

Bayes approaches this problem in a very logical way. He assumes that out of 100 women in the theater, 50 have long hair and 50 have short hair. Knowing that more women than men have long hair, he also assumes that out of 100 men, only 4 have long hair and 96 have short hair. In this case, it is safe to assume that the person who lost the ticket is probably a woman. But when we add the information that the person in question is waiting in line for the men’s restroom, the previous assumption is no longer valid. In this scenario, a different set of assumptions applies. If 100 people are waiting in line for the men’s restroom, 98 will be men and only 2 will be women. Following the previous logic that half of the women have long hair, one of the two women has short hair and the other has long hair. For the men the assumptions are different. The prior assumption was that out of 100 men, 96 have short hair and 4 have long hair, so of the 98 men in line about 4 have long hair. In this case, there are about four times as many men with long hair as women with long hair. It is safe to infer that the person who dropped the ticket is probably a man.
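The head counts in this paragraph are easy to check with a short Python sketch. The hair-length shares are the assumptions stated in the example, not real data, and the helper name `long_hair_counts` is made up for illustration.

```python
# Share of each group with long hair, as assumed in the example.
LONG_HAIR_SHARE = {"women": 0.50, "men": 0.04}


def long_hair_counts(people):
    """Return the expected number of long-haired people in each group."""
    return {group: n * LONG_HAIR_SHARE[group] for group, n in people.items()}


# Whole theater: 100 women and 100 men.
theater = long_hair_counts({"women": 100, "men": 100})
# Line for the men's restroom: 2 women and 98 men.
restroom_line = long_hair_counts({"women": 2, "men": 98})

print("theater:", theater)
print("restroom line:", restroom_line)
```

In the theater, long-haired women far outnumber long-haired men (50 against 4). In the restroom line the balance flips to roughly 4 long-haired men against 1 long-haired woman.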

Presented mathematically, the scenario looks like this:

P(event) = # of something / # of everything

P(w) = # women / # people = 50/100 = .5

P(m) = # men / # people = 50/100 = .5

Now the scenario with the line for the men’s restroom:

P(w) = 2/100 = .02

P(m) = 98/100 = .98

Now it is time to introduce Bayes’ theorem. It is based on three probability concepts, the first of which is conditional probability. If we know that a particular person is a woman, what is the probability that this person has long hair? And in the other direction: if I know that B is true (long hair), what is the probability that A (woman) is true too? These two questions have different answers, because A and B are not interchangeable. The second concept is joint probability. What is the probability that a person is both a man and has long hair? The third concept is marginal probability, or the probability of someone having long hair in general.
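To make the three concepts concrete, here is a small Python sketch that computes them for the whole-theater scenario. The input numbers are the assumptions of the example, and the variable names are my own.

```python
# Assumed inputs, taken from the movie-theater example.
p_woman = 0.5               # P(woman): half of the audience
p_man = 0.5                 # P(man)
p_long_given_woman = 0.5    # conditional: P(long hair | woman)
p_long_given_man = 0.04     # conditional: P(long hair | man)

# Joint probability: the person belongs to the group AND has long hair.
p_woman_and_long = p_woman * p_long_given_woman
p_man_and_long = p_man * p_long_given_man

# Marginal probability: long hair, regardless of group.
p_long = p_woman_and_long + p_man_and_long

print(f"P(woman and long hair) = {p_woman_and_long:.2f}")
print(f"P(man and long hair)   = {p_man_and_long:.2f}")
print(f"P(long hair)           = {p_long:.2f}")
```

The marginal probability is just the sum of the joint probabilities over the two groups, which is why it answers the question about long hair "in general".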

Combined, these three concepts make up Bayes’ theorem:

P(A | B) = P(B | A) P(A) / P(B)

For the example: P(woman | long hair) = P(long hair | woman) P(woman) / P(long hair)
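Applied to the ticket example, the theorem can be sketched in Python as follows. The function `posterior_man` is a hypothetical helper, and its default values are the hair-length assumptions from the example. It asks the question from the man's side: given long hair, how probable is it that the person is a man?

```python
def posterior_man(p_man, p_long_given_man=0.04, p_long_given_woman=0.5):
    """P(man | long hair) via Bayes' theorem for a two-group population."""
    p_woman = 1.0 - p_man
    # Marginal probability of long hair across both groups.
    p_long = p_long_given_man * p_man + p_long_given_woman * p_woman
    return p_long_given_man * p_man / p_long


# Prior P(man) = .5 in the theater, .98 in the line for the men's restroom.
print(f"theater:       {posterior_man(0.5):.3f}")
print(f"restroom line: {posterior_man(0.98):.3f}")
```

Under these assumptions the probability that the long-haired person is a man is about 7% in the theater as a whole, and about 80% in the restroom line. The only thing that changed between the two runs is the prior.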

This easy explanation of Bayesian methods for data analysis made the difference for me. Sometimes all we need to understand a concept is to see it in a real-world scenario.

In the concluding paragraph of his article, Kruschke makes the case for Bayesian statistics. He argues that traditional statistics restricts scientists to the descriptive models established by convention. Bayesian analysis, on the contrary, does not tell us how to think, what models to consider, or how to apply the probability concepts. It only tells us what values are credible if we wish to consider them. Bayesian analysis opens the possibility of exploring probabilities without the pressure of achieving statistical significance or rejecting the null hypothesis. Just pure scientific exploration, wherever it takes us.

Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14(7), 293–300.

https://www.quantstart.com/articles/Bayesian-Statistics-A-Beginners-Guide

https://www.r-bloggers.com/the-basics-of-bayesian-statistics/
