How Bayesian Inference Works in Data Analysis

Bharatsinghkushwah
Analytics Vidhya · May 15, 2020

We can generate evidence for our prior beliefs through random experiments

In high school you may have learned Bayes' theorem. It states that if we know the probability of an event (the outcome of a random experiment) conditioned on several other events (here we call them beliefs), then we can find the probability of one of those beliefs conditioned on the outcome of the random experiment.

For example, if a random experiment A depends on an event B (our belief), then by the law of total probability,

P(A) = P(A ∩ B) + P(A ∩ notB), which gives

P(A) = P(A|B)P(B) + P(A|notB)P(notB)

When we know the conditional probabilities of A given B and given notB, we can calculate the conditional probability of our belief given the outcome, i.e. P(B|A):

P(B|A) = P(A|B)P(B) / P(A)

In Bayesian inference, we call the probability of our belief before seeing any evidence the prior, and the probability of that belief after observing the outcome of the random experiment the posterior.

In a real-world scenario, the evidence comes from the actual data we want to analyze, and the posterior probability of our belief is calculated as a function of that evidence and the prior.

Probabilistic programming is largely an outcome of Bayesian inference, and in data science it is used for prediction and for testing how much confidence we can place in our beliefs. It has an advantage over ordinary machine learning models: it shows the risk associated with a prediction and how much confidence we can have in the model. It is generally not used for model generation or implemented in production software; rather, it is used to draw insights and inferences by sampling parameter values thousands of times or more.

I will illustrate how Bayesian inference works with the help of two examples. The first is a simple mathematical puzzle, and the second is an introductory example of probabilistic programming.

Assume we have two bags named A and B containing red and blue balls: bag A has 5 red and 7 blue balls, and bag B has 4 red and 8 blue balls. Now I draw a ball from one of the bags (it is not known which bag) and it turns out to be red; let us denote this event by X.

Here, I have a belief that the ball was taken from bag A (the belief could come from anything: a gut feeling, experience, or domain knowledge). Let us denote the event that the ball was taken from bag A as ‘A’ and from bag B as ‘B’. Drawing from either bag is equally likely, so

P(A) = P(B) = 0.5. Also,

P(X|A), i.e. the probability that the ball is red given it was taken from bag A, is 5/12.

Similarly, P(X|B) = 4/12.

Now we want P(A|X), the probability that the ball came from bag A given that a red ball was drawn. This is our posterior, and P(A) = 0.5 is our prior. To find it we use Bayes' theorem, where the denominator P(X), the overall probability of drawing a red ball, comes from the law of total probability:

P(A|X) = P(X|A)P(A) / (P(X|A)P(A) + P(X|B)P(B)), which gives

P(A|X) = (0.5 × 5/12) / (0.5 × 5/12 + 0.5 × 4/12) = 5/9 ≈ 0.56

So this is the posterior probability of our belief. Based on this result we can update our belief; this is how Bayesian inference shapes our beliefs. Our updated belief is that there is about a 56% chance the ball was taken from bag A, given that a red ball was drawn.
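To see the same calculation as code, here is a small Python sketch of the bag example, using exactly the numbers above:

```python
from fractions import Fraction

# Priors: drawing from either bag is equally likely
p_A = Fraction(1, 2)
p_B = Fraction(1, 2)

# Likelihoods: probability of a red ball from each bag
p_red_given_A = Fraction(5, 12)   # bag A: 5 red out of 12 balls
p_red_given_B = Fraction(4, 12)   # bag B: 4 red out of 12 balls

# Evidence: overall probability of drawing a red ball (law of total probability)
p_red = p_red_given_A * p_A + p_red_given_B * p_B

# Posterior: probability the ball came from bag A, given it is red (Bayes' theorem)
p_A_given_red = (p_red_given_A * p_A) / p_red
print(p_A_given_red, float(p_A_given_red))   # 5/9 ≈ 0.56
```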

The second example is from the book “Bayesian Methods for Hackers”. It uses Bayesian methods in loss-function minimization. A loss function is used to make a regression line fit the data as well as possible by finding the best parameters of the line. For example, if you are trying to fit a straight line to your data, the equation of the line would be: Yhat = aX + b

X and Y are the variables denoting the coordinates of the data points, a and b are the line parameters, and Yhat (pronounced “Y-hat”) is the value predicted by the linear regression model. If we use MSE (mean squared error) as the error metric, our loss function is the average of the squared differences between the observed and predicted values: MSE = (1/n) Σ (Y − Yhat)².
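To make this concrete, here is a minimal Python sketch (the toy data values below are made up purely for illustration) that computes this MSE loss and finds the best-fitting ‘a’ and ‘b’ by ordinary least squares:

```python
import numpy as np

# Toy data points (values made up purely for illustration)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

def mse_loss(a, b, X, Y):
    """Mean squared error of the line Yhat = a*X + b against the data."""
    Yhat = a * X + b
    return np.mean((Y - Yhat) ** 2)

# Ordinary least squares picks the (a, b) that minimize this MSE
a_best, b_best = np.polyfit(X, Y, deg=1)
print(a_best, b_best, mse_loss(a_best, b_best, X, Y))
```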

By replacing Yhat with aX + b in this loss function, we try to find the best values for ‘a’ and ‘b’. Linear regression is fine if our data is linear, but if the data is non-linear, or only roughly linear with a lot of noise, then linear regression alone won’t produce good results. Suppose our data looks like this:

This is our (simulated) trading-signal data; clearly a linear regression line cannot fit it properly. Here the Bayesian method comes into the picture: suppose we modify the line’s equation to Yhat = aX + b + c

Here we introduce a new variable ‘c’ to model the data better; c is called a variable because its value is not fixed and changes with respect to X. The variable c accounts for the deviation of the data from linearity and makes the line fit the data better.

Using probabilistic programming, such as PyMC3 (the library used throughout “Bayesian Methods for Hackers”), we can generate samples for the variable c, assuming it is normally distributed. We will not go into the details of probabilistic programming or PyMC here; the point is simply that you can find the best values for our random variable ‘c’ using a PyMC model. Once we have the results of the PyMC model, we can plot the new regression line alongside the old one, which looks like this:

Here we used Bayesian inference, assuming a normal distribution for the variable c, and the result fits much better than linear regression alone.
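For reference, a minimal PyMC3 sketch of this kind of model might look like the following. This is not the book’s exact code: the data are simulated here with made-up values, and c is treated, as an assumption, as normally distributed deviation of the data around the line.

```python
import numpy as np
import pymc3 as pm

# Simulated "trading signal"-style data (values made up for illustration)
np.random.seed(42)
x = np.linspace(0, 10, 200)
y = 0.7 * x + 1.5 + np.random.normal(0, 2.5, size=x.size)

with pm.Model() as model:
    # Priors over the line parameters a and b
    a = pm.Normal("a", mu=0, sigma=10)
    b = pm.Normal("b", mu=0, sigma=10)
    # Spread of c, the normally distributed deviation of the data from the line
    sigma_c = pm.HalfNormal("sigma_c", sigma=5)
    # Likelihood: each observed y is the line aX + b plus the normal deviation c
    obs = pm.Normal("obs", mu=a * x + b, sigma=sigma_c, observed=y)
    # Draw posterior samples for a, b and sigma_c
    trace = pm.sample(2000, tune=1000)

# Posterior means give the fitted line and the typical size of c
print(trace["a"].mean(), trace["b"].mean(), trace["sigma_c"].mean())
```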

I hope this gave you some understanding of Bayesian methods.
