Why does the world need a Bayesian perspective?

Published in

NeuralSpace

15 min read · Feb 15, 2020

This post is part of the series Bayesian Neural Networks (see Post 1 and Post 2). It covers some history of Bayesianism and explains why we need to see the world through a Bayesian perspective.

We take for granted that the cases are equally possible, that is to say, that each case can occur as easily as any other. — Jakob Bernoulli, Ars Conjectandi 1713

Do certainties exist?

On a fine Friday evening, a group of four friends was enjoying a game of poker. In no time, the game got serious, and a fair amount of money was on the table. A fifth friend, who was not playing but was a connoisseur of the game, was making predictions about the winning hand.

Soon another friend joined the group. Since he did not know the game, he decided to watch. He had hoped the game would be over by the time he arrived so that they could all have a good time together, but the game was so intense that everyone decided to continue for a few more rounds. The new friend sat with the expert (the fifth friend) and decided to make his own predictions about the winning hand for every round.

Like any newcomer would, the new friend counted the number of players in the room. He observed that the cards were dealt equally and watched how the game was played. Based on these observations, he predicted the winning chance of every player to be 1 in 4, i.e. 25%. A fair observation indeed. The expert, however, had been following the game for some time and had prior knowledge of it. He used this prior knowledge before predicting the winner, and his bets came close to the actual winner of that round. Still, poker is a highly uncertain and complex game, and it is very difficult to predict the winner correctly from a few past observations.

What was the real difference between the two predictions? The new friend observed the state of the game and made his decision based on that alone: there are 4 players, and each has an equal chance of winning. This is similar to tossing a coin and predicting the chance of getting a head or a tail. The chance is equal for both sides (Head or Tail), with a probability of 1 out of 2, or 50 per cent. But what if the coin is tossed 10 times and comes up heads every time? Will the next prediction still be 50-50, or will it change?

Is everything a probability?

Two coin tosses, one after the other, are independent events, meaning the result of one toss does not depend on the other. In that case, the probability should again be 1 out of 2, a 50% chance of getting a head or a tail. But shouldn’t the previous pattern play some role here?

1: Let’s start with the belief that everything in the world can be represented in the form of probabilities. This belief is itself not certain and can be argued about. However, let us analyze it: some things in this world are certain, and many things carry a lot of uncertainty. If it looks sunny outside and it actually is sunny, that is almost certain at the moment; we can assume it holds with a very high probability (close to 1). Again, this can be argued about, but let’s take it as given.

Statistics is the ‘Science of Uncertainty’ -Noel Cressie and Christopher K. Wikle, Statistics for Spatio-Temporal Data, Wiley, 2011, p. 4

Back to the coin toss. Assuming a fair coin, both outcomes (Head or Tail) are equally likely, so it can be argued that the next toss will be equally likely to land either way. Associating an equal probability of 50 per cent with each case means seeing the world through the CLASSICAL FRAMEWORK.

Under a classical framework, outcomes that are equally likely to happen have equal probabilities.

So, in our case, getting a Head is as likely as getting a Tail, and hence the probability is one-half for each case.
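The classical rule can be written down in a couple of lines. The following is only a minimal sketch: the function name `classical_probability` and the player labels are made up for illustration, and it assumes every outcome really is equally likely.

```python
from fractions import Fraction

def classical_probability(favourable, outcomes):
    """Classical probability: favourable outcomes / all equally likely outcomes."""
    outcomes = list(outcomes)
    hits = sum(1 for outcome in outcomes if outcome in favourable)
    return Fraction(hits, len(outcomes))

coin = ["Head", "Tail"]
players = ["P1", "P2", "P3", "P4"]  # hypothetical labels for the four poker players

p_head = classical_probability({"Head"}, coin)
p_player = classical_probability({"P1"}, players)
print(p_head, p_player)  # 1/2 1/4
```

Note that nothing observed so far (such as ten heads in a row) enters this calculation; only the list of possible outcomes does.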

NOTE: A high probability expresses a high degree of certainty about the occurrence of an event. However, certainty is still a debated term, and assigning 100% probability does not make an event certain until it is observed. It is possible to put 100 per cent belief in an event whose occurrence cannot be taken as certain until the believer observes it.

2: Let’s think about how we know that the probability of each outcome is one-half for a coin toss. We know that there are two possible outcomes and we assume both are equally likely. But in the previous 10 tosses the outcome was a Head every time. How do we know that the coin is fair and not biased towards Heads? We can toss it 10 more times. Are 20 tosses enough to determine whether the coin is fair? Maybe not. Let’s toss it 10 more times, and 10 more times, and so on. How do we know when to stop? We can define a stopping point ourselves and check whether both outcomes have occurred about equally often. If the coin is unbiased and we could toss it an infinite number of times, the relative frequency of each outcome would converge to one-half. This is how a FREQUENTIST sees the world.

A frequentist framework assumes a hypothetical infinite sequence of events and bases its decision on the relative frequency of the outcome in that hypothetical infinite sequence.
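We cannot toss a coin infinitely often, but we can simulate a long finite run and watch the relative frequency settle. The sketch below is illustrative only: `running_frequency`, the seed and the assumed `p_head=0.5` are choices made here, not part of the original argument.

```python
import random

def running_frequency(n_tosses, p_head=0.5, seed=0):
    """Simulate coin tosses and return the relative frequency of heads after each toss."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    freqs = []
    for i in range(1, n_tosses + 1):
        if rng.random() < p_head:
            heads += 1
        freqs.append(heads / i)
    return freqs

freqs = running_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    # The frequency can be far from 0.5 early on and drifts towards it as n grows.
    print(n, round(freqs[n - 1], 4))
```

The early entries show why 10 or 20 tosses say little about fairness: only the long-run frequency is stable.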

If we assume the coin is unbiased, which could in principle be checked by tossing it an infinite number of times and recording the outcomes, the probability of a Head on the 11th toss is one-half again. But this relies on the belief that the coin is fair. How can we check whether the coin is fair or biased?

Well, the answer sounds simple: toss the coin again and again, an infinite number of times, to determine whether it is fair. In practice, tossing a coin any finite number n of times cannot tell us whether it is completely unbiased. Many factors may be responsible for an unequal outcome, such as tossing the coin in a certain way. But assume that no external factors influence the outcome and that it depends only on the coin; then we can judge from the outcomes whether the coin is fair or biased. All we could then say, however, is that the coin is fair or that it is biased. So under the frequentist paradigm, the probability that the coin is fair is either 0 or 1, which is not very satisfying.

Let’s dig a bit deeper using the sunny-day example from before. Suppose we want to know whether tomorrow will be sunny. From a classical point of view, we can say that tomorrow will be sunny with a chance of one-third (assuming a day can be sunny, rainy or cloudy, each equally likely). From a frequentist point of view, we would have to imagine a hypothetical infinite sequence of tomorrows and take the fraction of them that are sunny, which is hard to do. So what should we do?

3: Let’s assume that a trick is being used to get the desired outcome in our coin toss experiment. Once the person who had been tossing the coin was replaced, the next 10 tosses gave 6 heads and 4 tails, which looks more believable. But we are the only ones who know about the trick. When the same person is brought back to toss the coin, we have prior information that most other people do not have, and we can use it to predict the outcome of the next toss. We know that if the same trick is used again, there is a high chance of getting a Head, so we bet more heavily on Head than on Tail. This way of seeing the world is the BAYESIAN perspective.

We have prior knowledge about the event, which we use alongside the observed data to predict the outcome. It looks more like the way things are done in real life. However, we are the ones who know about the trick; another person watching the coin toss might not. His prior information will differ from ours, and hence so will his prediction.
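One common way to make "prior plus data" concrete for a coin is a Beta prior on the probability of Heads, updated by counting heads and tails. This is a sketch under assumptions that are not in the story itself: the Beta model, the two prior settings and the 3 observed heads are all hypothetical numbers chosen for illustration.

```python
def beta_update(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing `heads` and `tails` (conjugate update)."""
    return alpha + heads, beta + tails

def prob_head(alpha, beta):
    """Predicted probability that the next toss is a Head (mean of the Beta distribution)."""
    return alpha / (alpha + beta)

# Hypothetical priors: one observer knows about the trick, the other trusts the coin.
priors = {
    "knows the trick": (8.0, 2.0),        # prior belief: Heads about 80% of the time
    "assumes a fair coin": (20.0, 20.0),  # prior belief: firmly around 50%
}

heads, tails = 3, 0  # hypothetical new tosses seen by both observers
predictions = {}
for observer, (a, b) in priors.items():
    a_post, b_post = beta_update(a, b, heads, tails)
    predictions[observer] = (prob_head(a, b), prob_head(a_post, b_post))
    before, after = predictions[observer]
    print(f"{observer}: before={before:.3f}, after={after:.3f}")
```

Both observers see the same tosses, yet their predictions differ because their priors differ; with more data the two predictions move closer together.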

The ‘Dutch Book’ Argument

A different dutch book (a movie) but with similar motive

So how can a person use the Bayesian perspective if it is subjective to one’s beliefs? There is a way to deal with it. Let’s get back to the coin toss example. This time, before tossing the coin, let’s bring in a moderator who has two jobs:

1. He will make sure that the coin toss results are what they are claimed to be, so we reduce the chance of interference with the outcome.

2. He will allow only those people to make a prediction who are willing to bet for or against getting a Head.

But there is one condition: for every bet, the amount won is 1 Euro less than the amount lost. That is, if you place a bet that pays 5 Euros when you win, you have to pay 6 Euros when you lose. This is tricky and risky. Now person A knows the coin tosser’s secret: if the tosser uses the trick, the chance of getting a Head is high. He will use this prior knowledge to place bets when he sees the trick being used. Person B, on the other hand, is unaware of the trick. He bets based on what he knows about coin tosses, i.e. that the outcome is either Head or Tail with an equal probability of one-half. This shows that a Bayesian perspective is very subjective, because the prior information can be subjective.

Suppose player B plays the game 10 times and we assume that no trick is involved. What do you think the outcome will be? Will player B take home some winnings, or will he lose money? No matter which outcome player B bets on, he should expect to lose money over the 10 games. The reason is that his bets are not coherent.

Coherence means that the standard rules of probability are followed.

In our case, the bets are not coherent. Let’s calculate it. Winning here means getting the desired outcome (the one we bet on). Suppose we put our money on Head. The probability of winning is ½. If we win, we gain 5 Euros, and if we lose, we lose 6 Euros. The expected gain per game, E, is:

$$E = \tfrac{1}{2}(+5) + \tfrac{1}{2}(-6) = -0.5 \text{ Euros}$$

This means that on average player B will lose 50 cents per game. This is not a desirable bet. A series of bets constructed so that losing money is inevitable is known as a ‘Dutch book’, a term popular in gambling as well as in statistics.
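The 50-cent figure is an expected value, and it is easy to check with exact fractions. The sketch below uses the numbers from the example (win 5 Euros, lose 6 Euros, probability one-half); the helper name `expected_gain` is made up here, and the last step is an added illustration of how often a single 10-game run still ends in profit.

```python
from fractions import Fraction
from math import comb

def expected_gain(p_win, win_amount, loss_amount):
    """Expected gain of one bet: win `win_amount` with p_win, else lose `loss_amount`."""
    return p_win * win_amount - (1 - p_win) * loss_amount

per_game = expected_gain(Fraction(1, 2), 5, 6)
print(per_game)       # -1/2
print(10 * per_game)  # -5

# With w wins out of 10 the net result is 5*w - 6*(10 - w), which is positive only if w >= 6.
p_ahead = sum(Fraction(comb(10, w), 2**10) for w in range(6, 11))
print(p_ahead)        # 193/512
```

So player B loses 5 Euros on average over 10 games, although an individual run of 10 games can still end ahead, in roughly 38% of cases.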

A Dutch book can, however, be avoided by following the rules of probability coherently, as Bayesian statistics does.
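To see how incoherent beliefs lead to a guaranteed loss, consider a textbook-style setup that differs slightly from the game above. It is an assumption made for this sketch: an agent quotes, for each outcome, the price at which he would buy or sell a ticket that pays 1 Euro if that outcome occurs. The prices 3/5 and the function name are hypothetical.

```python
from fractions import Fraction

def dutch_book_payoffs(prices):
    """Agent's net result per outcome when an opponent exploits the quoted ticket prices.

    `prices` maps each outcome to the price the agent considers fair for a ticket
    paying 1 Euro if that outcome occurs. The outcomes are assumed to be exhaustive
    and mutually exclusive, so exactly one ticket pays out.
    """
    total = sum(prices.values())
    if total > 1:
        # Opponent sells the agent one ticket per outcome: agent pays `total`, collects 1.
        net = 1 - total
    else:
        # Opponent buys one ticket per outcome from the agent: agent collects `total`, pays 1.
        net = total - 1
    return {outcome: net for outcome in prices}

incoherent = {"Head": Fraction(3, 5), "Tail": Fraction(3, 5)}  # prices sum to 6/5
coherent = {"Head": Fraction(1, 2), "Tail": Fraction(1, 2)}    # prices sum to 1

print(dutch_book_payoffs(incoherent))  # the same loss of 1/5 whichever side lands
print(dutch_book_payoffs(coherent))    # zero either way: no sure loss can be forced
```

When the quoted prices obey the rules of probability (here, summing to 1), this particular strategy gains the opponent nothing.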

Now let’s get back to the poker game. We have four players playing a fair game of poker. Poker is, however, a highly unpredictable game. Two friends, let’s name them Adam and Bob, are watching closely and trying to bet on who will win a given round. Adam is an expert and has been watching the game closely for some rounds. Bob was outside, has just joined, and has no knowledge of who won the previous rounds.

Subjective Probability

Now let’s assist Bob and Adam in their predictions using the three frameworks we just saw: Classical, Frequentist and Bayesian.

If Bob follows the classical approach, his prediction is based on the idea that outcomes that are equally likely have equal probabilities. Since the probabilities sum to 1, every player has an equal 25 per cent chance of winning, and it is hard to favour any one of them.
If Bob follows the frequentist framework, he runs into trouble: it is hard to run infinitely many repetitions of this round under the same circumstances, and poker involves a lot of luck and depends on the mental state of the players. What we have is a single-case probability, and we must make a prediction from that. This is where the frequentist framework has a hard time analyzing the situation and making a decision.
A more natural way to do the analysis is the Bayesian framework. A Bayesian Bob would say that he has a certain degree of belief that player A is going to win the game. Where does this belief come from, and what does it indicate? Consider the statement “I believe that there is a 10 per cent chance that it is going to rain tomorrow”. What I am implying is that I put 10 per cent confidence in the event (rain tomorrow) happening. This degree of belief is very subjective, and we will soon see that it also depends on subjective prior knowledge of the event. So, let’s take a second to define probability from the Bayesian perspective, which we can call subjective probability.

Subjective probability is the confidence, or degree of belief, that someone places in the occurrence of an event.

Now, what about Adam? Let’s consider all the cases for Adam:

From the classical perspective, Adam’s prediction does not change. It is the same as Bob’s: all four players are equally likely to win, so he will either pick one at random or, if he doesn’t want to take a risk, not bet at all.
From the frequentist point of view, there is a difference now. Adam has seen a finite number of games and knows their outcomes. One way to predict is to count the games he has witnessed, see who won most often, and bet on that player. However, a frequentist would not be sure that the number of observations is large enough to justify a bet. No experiment can be repeated an infinite number of times; there must be a finite point at which it stops, and that point can be decided beforehand. But the number of observations here is certainly not enough to favour one player over another, so it is hard to say whether Adam should place a bet.
Now take the Bayesian perspective. Based on the games observed so far, Adam has a prior belief that one player (perhaps thanks to techniques like bluffing) is winning most of the hands. The prior is very subjective, as discussed above, and Adam is free to choose his own prior before he makes a prediction. There are two scenarios. First, all players are equally likely to win: there is no guarantee that the trick that won a player the majority of hands is being applied in this round, so any player can win with a 25 per cent chance. Second, one player is playing well and we should bet in favour of that player: Adam believes that there is a 50 per cent chance that the favourite will win this round.
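One way Adam could arrive at a number like 50 per cent is to start from a uniform belief over the four players and add the wins he has observed. The sketch below assumes a simple Dirichlet-style counting model, and the win counts are hypothetical (the story does not say how many rounds each player won); it shows one possible route to Adam’s belief, not the only one.

```python
from fractions import Fraction

def predictive(prior_counts, observed_wins):
    """Predicted win probability per player: (prior count + observed wins) / total."""
    posterior_counts = [a + w for a, w in zip(prior_counts, observed_wins)]
    total = sum(posterior_counts)
    return [Fraction(c, total) for c in posterior_counts]

uniform_prior = [1, 1, 1, 1]       # one pseudo-count per player: no favourite to begin with

bob_wins_seen = [0, 0, 0, 0]       # Bob has just arrived and has seen nothing
adam_wins_seen = [5, 1, 1, 1]      # hypothetical: the favourite won 5 of the 8 rounds Adam watched

bob = predictive(uniform_prior, bob_wins_seen)
adam = predictive(uniform_prior, adam_wins_seen)
print([str(p) for p in bob])   # ['1/4', '1/4', '1/4', '1/4']
print([str(p) for p in adam])  # ['1/2', '1/6', '1/6', '1/6']
```

With no observations the Bayesian answer coincides with the classical 25 per cent; with Adam’s observations the favourite moves up to one-half.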

There is a subjective probability involved here, and it is up to each person to decide what bets he is ready to place on the occurrence of certain events. For example, suppose Adam gets 100 Euros if he predicts the winner of a round correctly and loses nothing if he is wrong. In this case, he will go with the player with the most wins so far. If, on the contrary, a wrong prediction costs him 100 Euros, he might not want to bet on any player.
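Whether a bet is worth taking depends on both the degree of belief and the stakes. The sketch below applies a plain expected-payoff rule; the rule itself ("bet only if the expected payoff is positive") and the function names are assumptions made for illustration, while the 100 Euro stakes and the 50 and 25 per cent beliefs come from the example.

```python
def expected_payoff(p_correct, reward, penalty):
    """Expected payoff of a bet that pays `reward` if right and costs `penalty` if wrong."""
    return p_correct * reward - (1 - p_correct) * penalty

def should_bet(p_correct, reward, penalty):
    """Bet only when the expected payoff is strictly positive (an assumed decision rule)."""
    return expected_payoff(p_correct, reward, penalty) > 0

adam_belief = 0.5   # Adam's belief that the favourite wins this round
bob_belief = 0.25   # Bob's belief in any single player

print(should_bet(adam_belief, reward=100, penalty=0))    # True: nothing to lose
print(should_bet(adam_belief, reward=100, penalty=100))  # False: expected payoff is exactly 0
print(expected_payoff(bob_belief, reward=100, penalty=100))  # -50.0
```

The same belief leads to different decisions once the penalty changes, which is why the bet a person accepts reveals something about his subjective probability.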

Probabilities do not exist!

Famous lines by Bruno de Finetti (Source)

The famous line by the subjective Bayesian Bruno de Finetti is: ‘Probabilities do not exist’. With this statement, de Finetti asserts that objective probabilities do not exist. That is, probability is subjective and rests on the bet that a person is willing to take on the occurrence of an event. De Finetti introduces the concept of prevision, or probability via expectation.

The prevision of a random variable X, “according to your opinion, is the value x’ which You would choose” if “You are committed to accepting any bet whatsoever with gain C (X-x’) where C is arbitrary (positive or negative) at the choice of an opponent”. -Bruno de Finetti, Theory of probability (p 87)
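The betting definition can be explored numerically. The sketch below makes a simplifying assumption that de Finetti’s definition itself does not need: that your beliefs about X are already summarised by a probability distribution (here a die, with each face at 1/6). Under that assumption, the expected gain of the bet C(X - x') is C(E[X] - x'), and an opponent who chooses the sign of C can make it negative unless x' equals E[X]. The function names are made up for illustration.

```python
from fractions import Fraction

def expectation(dist):
    """Expected value of X for a distribution given as {value: probability}."""
    return sum(Fraction(x) * p for x, p in dist.items())

def worst_case_expected_gain(dist, x_prime, stake=1):
    """Your expected gain from C * (X - x_prime) when the opponent picks C = +stake or -stake."""
    gap = expectation(dist) - x_prime
    return min(stake * gap, -stake * gap)

die = {face: Fraction(1, 6) for face in range(1, 7)}  # assumed beliefs about one die roll

print(expectation(die))                                # 7/2
print(worst_case_expected_gain(die, Fraction(7, 2)))   # 0
print(worst_case_expected_gain(die, 3))                # -1/2
print(worst_case_expected_gain(die, 4))                # -1/2
```

Announcing any value other than 7/2 lets the opponent pick the side of the bet that costs you on average, so under these beliefs 7/2 is the only value you can safely announce.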

The dilemma

If Bayesian probability is so subjective and depends on subjective beliefs, are we good at reasoning about our beliefs? Is everyone who thinks in a Bayesian way equally good at it? How do we know that one belief is better than another? How do we make sure that we follow the rules of probability here?

How do we reason about someone’s belief? Remember the statement “I believe there is a 10 per cent chance that it will rain tomorrow”. The 10 per cent chance can be written as the probability of an event happening. In other words, it is the same as writing P(rain tomorrow) = 0.1.

Basically, we have associated a number with the degree of belief. This work was done by Frank Ramsey and Bruno de Finetti in the 1920s and 1930s.

We have assigned numerical values to beliefs. But how do we make sure that the numbers make sense and follow the rules of probability? We answered this before: the concept of the Dutch book applies here. No one will place a set of bets on which he knows he will lose for sure. This requirement makes the bets coherent with each other, meaning they follow the rules of probability.

The Dutch book argument goes back to Frank Ramsey, although the term ‘Dutch book’ itself seems to have entered the literature later.

Finally, in the poker game discussed above, Adam had prior information and made predictions based on it. This prior knowledge gets updated as the game progresses. The prior plays an important role in his predictions.

I will write about priors in a different post.

Summary

If a die is rolled once, a classicist will see all six outcomes as equally likely and will assign each a probability of one-sixth. A frequentist will analyze the empirical evidence from past rolls, which may have landed about equally often on each side, and will base his prediction on that. A subjective Bayesian might hold a belief that all sides are equally likely, or might base his belief on a different prior or on other information, and will look for more concrete information on which to place his bets.
De Finetti introduced the idea of previsions. These previsions should be coherent in order to avoid an inevitable loss; a set of bets that ensures a sure loss at the hands of a clever opponent is known as a Dutch book.
Bayesian inference uses Bayes’ theorem to update the probability of a hypothesis as more evidence becomes available. The posterior is defined as the prior times the likelihood, divided by the evidence. The prior is subjective and may differ for two people observing the same event.
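The rule "posterior = prior × likelihood / evidence" can be tried on the coin that landed Heads 10 times in a row. This is a minimal sketch with made-up numbers: it assumes only two hypotheses, a fair coin and a trick coin that lands Heads 90% of the time, and compares two observers with different priors. None of these values come from the post.

```python
def posterior(prior, likelihood):
    """Bayes' theorem over a finite set of hypotheses: prior * likelihood / evidence."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # total probability of the observed data
    return {h: value / evidence for h, value in unnormalised.items()}

p_head = {"fair": 0.5, "trick": 0.9}  # hypothetical chance of Heads under each hypothesis
n_heads = 10                          # observed: 10 Heads in 10 tosses
likelihood = {h: p ** n_heads for h, p in p_head.items()}

open_minded = posterior({"fair": 0.5, "trick": 0.5}, likelihood)
sceptic = posterior({"fair": 0.99, "trick": 0.01}, likelihood)

print({h: round(p, 4) for h, p in open_minded.items()})
print({h: round(p, 4) for h, p in sceptic.items()})
```

Both observers shift strongly towards the trick coin after the same evidence, but the sceptic, who started with a 1% prior, ends up less convinced than the open-minded observer.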

If you are interested in creating your own Bayesian Neural Networks in PyTorch, check out the code here.

If you have any comments or suggestions, find me here.