Probability for Machine Learning #2 (basics part 2)
Introduction:
As mentioned before, every Sunday a new topic related to probability (and later other ML and data science topics) will come up. This is the second part of the probability basics, where we will mainly cover conditional probability, the law of total probability, and the Bayes rule, with a really cool problem at the end.
Disclaimer
I really assume that readers here know the basics of probability and also some other basic concepts like conditional probability or independence. If not, then you are requested to see my previous tutorial about the basics of probability (part 1). Without further ado, let's get started; it will be really fun.
Contents
→ Conditional Probability (extended)
→ Law of Total Probability
→ Bayes rule and theorem
→ Solution of a sample problem based on the Bayes theorem
→ Appendix PDF
Conditional Probability (extended)
In the previous tutorial I explained what conditional probability is and how it is applied, for a sample space containing two events {A₁, A₂}.
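For two events this is just the multiplication rule, written out here in standard notation for reference:

```latex
P(A_1 \cap A_2) = P(A_1)\, P(A_2 \mid A_1)
```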
Now if we talk about a sample space containing three events, and in general n events, we can see the following results (NOTE: the proofs are provided in the appendix PDF).
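For three events the rule simply chains one more conditional on top:

```latex
P(A_1 \cap A_2 \cap A_3) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2)
```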
So for the general case of a sample space containing n events, the formula becomes the following:
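This is the chain rule (multiplication rule) of probability, sketched in standard notation:

```latex
P\!\left(\bigcap_{i=1}^{n} A_i\right)
  = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2)
    \cdots P\!\left(A_n \,\middle|\, A_1 \cap A_2 \cap \dots \cap A_{n-1}\right)
```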
If you want to know where this equation comes from, refer to the appendix PDF.
Law of Total Probability
So far we have seen conditional as well as unconditional probabilities. The law of total probability is very simple. We know there are several factors in the atmosphere that can cause rainfall, so computing whether there will be rainfall on a particular day depends on several factors; and where there is dependency, there are intersections and conditional probabilities. But even with all these factors, if we actually want to know the overall chance of rainfall on a particular day, there comes the concept of the Law of Total Probability.
For example, suppose there are n events on which the outcome depends, where an arbitrary event is denoted Aᵢ and the main event of interest is S.
So the Law of Total Probability basically allows us to pool all the conditional probabilities P(S|Aᵢ) together, weighting each one by the corresponding unconditional probability, i.e. summing the terms P(Aᵢ)×P(S|Aᵢ), to compute the total probability of interest.
The mathematical representation is shown below.
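In standard notation, with the Aᵢ forming a mutually exclusive and exhaustive set of events, it reads:

```latex
P(S) = \sum_{i=1}^{n} P(A_i)\, P(S \mid A_i)
```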
Bayes Theorem
The basic Bayes rule is very simple: we know that, in general,
P(A|B) ≠ P(B|A), but we can convert P(A|B) into P(B|A), and the basic method used in this conversion is the Bayes rule, shown below:
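Written out, the rule is:

```latex
P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}
```

So knowing P(A|B) together with the two unconditional probabilities P(A) and P(B) is enough to recover P(B|A).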
So what is the general Bayes theorem?
Let us assume it is known that an event X can occur only if one of the mutually exclusive and exhaustive events {A₁, A₂, …, Aₙ} occurs.
Also let us assume that the unconditional probabilities
P(A₁), P(A₂), …, P(Aₙ)
and the conditional probabilities P(X|A₁), P(X|A₂), …, P(X|Aₙ) are known; then:
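Combining the Bayes rule with the law of total probability in the denominator, the general theorem reads:

```latex
P(A_i \mid X) = \frac{P(A_i)\, P(X \mid A_i)}{\sum_{j=1}^{n} P(A_j)\, P(X \mid A_j)}
```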
Now let's solve one problem in order to understand the Bayes theorem in more detail:
Assume there are three urns A, B, and C, such that urn A contains
{6 red, 4 white} balls, urn B contains {2 red, 6 white} balls, and urn C contains {1 red, 5 white} balls. An urn is chosen at random and a ball is drawn from it. So what is the probability that the chosen urn is A when the drawn ball turns out to be red?
The solution of this problem is based on the above theorem, as shown below:
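Since each urn is equally likely to be picked, the unconditional (prior) probabilities are:

```latex
P(A) = P(B) = P(C) = \frac{1}{3}
```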
So up to here we have worked out the probability of choosing any urn at random; now we want to work out the different conditional probabilities of drawing a red ball from each urn, as shown below:
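Let X denote the event that the drawn ball is red. Each urn's conditional probability is just its fraction of red balls:

```latex
P(X \mid A) = \frac{6}{10}, \qquad
P(X \mid B) = \frac{2}{8}, \qquad
P(X \mid C) = \frac{1}{6}
```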
So now all the ingredients are ready, it's time to make the main recipe, i.e. finding P(A|X) with the help of the Bayes theorem. The final steps to work out this problem are shown below:
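Plugging everything into the Bayes theorem:

```latex
P(A \mid X)
  = \frac{P(A)\, P(X \mid A)}
         {P(A)\, P(X \mid A) + P(B)\, P(X \mid B) + P(C)\, P(X \mid C)}
  = \frac{\tfrac{1}{3} \cdot \tfrac{6}{10}}
         {\tfrac{1}{3}\left(\tfrac{6}{10} + \tfrac{2}{8} + \tfrac{1}{6}\right)}
  = \frac{36}{61} \approx 0.59
```

If you want to double-check the arithmetic, a minimal Python sketch like the following reproduces the same number (the variable names are just illustrative):

```python
# Urn problem: P(urn A | red ball) via the Bayes theorem.
priors = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}        # each urn equally likely
likelihoods = {"A": 6 / 10, "B": 2 / 8, "C": 1 / 6}  # P(red | urn)

# Law of total probability: P(red)
p_red = sum(priors[u] * likelihoods[u] for u in priors)

# Bayes theorem: P(A | red)
p_a_given_red = priors["A"] * likelihoods["A"] / p_red
print(p_a_given_red)  # 0.5901639... == 36/61
```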
And that's it, congratulations 🎉🎉🎉🥳😀 on today's micro content in probability. After this you can now compute:
→ The conditional probability
→ The law of total probability
→ The Bayes theorem
→ Applications of the Bayes theorem
Shortly, I will upload a small article containing a problem that goes through all the concepts discussed here…
Appendix (all the pics with proofs are here)
Please go through this link in order to see the proofs and get the notes all at once as a cheat sheet…😁