This is the 5th post in the blog post series ‘Probability & Statistics for Data Science’. It covers the following topics related to Bayesian statistics and their significance in data science:
- Frequentist Vs Bayesian Statistics
- Bayesian Inference
- Test for Significance
- Significance in Data Science
Frequentist Vs Bayesian Statistics
Frequentist statistics tests whether an event (hypothesis) occurs or not. It calculates the probability of an event as its long-run frequency over many repetitions of the same experiment. A common criticism of the frequentist approach is that the result of an experiment depends on the number of times the experiment is repeated.
Frequentist statistics has some notable weaknesses in its design and interpretation that raise concerns in real-life problems:
- p-values and confidence intervals (CIs) depend heavily on the sample size.
- Confidence intervals (CIs) are not probability distributions over the parameter.
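To make the sample-size dependence concrete, here is a minimal sketch. It assumes a two-sided z-test for a proportion with a normal approximation, which is my choice for illustration and not a test named in this post. The function name `two_sided_p_value` and the coin-flip counts are hypothetical. The same observed proportion of 55% heads leads to very different p-values at different sample sizes.

```python
import math

def two_sided_p_value(successes, n, p0=0.5):
    """Two-sided p-value for H0: p = p0, using the normal approximation."""
    p_hat = successes / n
    std_err = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / std_err
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

# Same observed proportion (55% heads), different sample sizes (made-up data)
p_small = two_sided_p_value(55, 100)
p_large = two_sided_p_value(550, 1000)

print(f"n=100:  p-value = {p_small:.4f}")
print(f"n=1000: p-value = {p_large:.4f}")
```

With 100 flips the result would not be called significant at the usual 0.05 level, while with 1000 flips it would be, even though the observed proportion is identical.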
Bayesian statistics is a mathematical procedure that applies probabilities to statistical problems. It gives people the tools to update their beliefs in light of new data.
Frequentist vs. Bayesian Inference - The Basics of Bayesian Statistics | Coursera
Bayesian Inference
To understand Bayesian inference, you need to understand conditional probability and Bayes' theorem. If you want to review these concepts, please refer to my earlier post in this series.
Probability for Data Science
Bayesian inference is a method of statistical inference in which Bayes’ theorem is used to update the probability for a hypothesis as more evidence or information becomes available.
An important part of Bayesian inference is the establishment of parameters and models. Models are the mathematical formulation of the observed events. Parameters are the factors in the models affecting the observed data. To define our model correctly, we need two mathematical models beforehand: one to represent the likelihood function and the other to represent the distribution of prior beliefs. The product of these two, once normalized, gives the posterior belief distribution.
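The prior-times-likelihood step can be sketched in a few lines. This is a minimal illustration, assuming a coin whose probability of heads can only take one of three candidate values. The helper name `posterior`, the candidate values, and the observed flips are all hypothetical.

```python
def posterior(prior, likelihood):
    """Multiply prior by likelihood for each hypothesis, then normalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}

# Hypothetical candidate values for the coin's probability of heads
thetas = [0.25, 0.5, 0.75]
prior = {t: 1 / 3 for t in thetas}  # equal prior belief in each candidate

# Made-up observation: 3 heads in 4 flips. The binomial coefficient is the
# same for every candidate, so it cancels in the normalization and is omitted.
heads, flips = 3, 4
likelihood = {t: t**heads * (1 - t) ** (flips - heads) for t in thetas}

post = posterior(prior, likelihood)
for t in thetas:
    print(f"theta={t}: prior={prior[t]:.3f}, posterior={post[t]:.3f}")
```

After seeing mostly heads, the posterior shifts belief towards the larger candidate value, while the three posterior probabilities still sum to one.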
A likelihood function is a function of the parameters of a statistical model, given specific observed data. Probability describes the plausibility of a random outcome given a model parameter value, without reference to any observed data, while likelihood describes the plausibility of a model parameter value, given specific observed data.
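The distinction can be illustrated with the binomial formula read in two directions. This is a sketch under assumed numbers (10 flips, 7 observed heads); the helper name `binom_pmf` is hypothetical.

```python
import math

def binom_pmf(k, n, theta):
    """Probability of k heads in n flips when P(heads) = theta."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

n = 10

# Probability: fix the parameter (theta = 0.5) and vary the outcome k.
# Summed over all possible outcomes, the probabilities add up to 1.
prob_total = sum(binom_pmf(k, n, 0.5) for k in range(n + 1))

# Likelihood: fix the observed data (7 heads) and vary the parameter theta.
grid = [i / 100 for i in range(101)]
likelihood = [binom_pmf(7, n, t) for t in grid]
best_theta = grid[likelihood.index(max(likelihood))]

# The area under the likelihood curve is not 1 (here it is about 1/11),
# so the likelihood is not a probability distribution over theta.
area = sum(likelihood) * 0.01

print(prob_total, best_theta, area)
```

The same formula is a probability distribution when read as a function of the outcome, but only a plausibility score when read as a function of the parameter.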
Likelihood function - Wikipedia
Prior & Posterior Belief distribution
The prior belief distribution represents the strength of our beliefs about the parameters based on previous experience. The posterior belief distribution is obtained by multiplying the likelihood function by the prior belief distribution and normalizing the result.
As we collect more data, the likelihood increasingly dominates, and our posterior belief moves away from the prior belief and towards what the data support.
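A minimal sketch of how the posterior behaves as data accumulate, assuming a Beta prior with a binomial likelihood. This conjugate pair is my choice for illustration and is not named in this post. The prior parameters and the flip counts are made up, and the function names are hypothetical.

```python
def beta_posterior(a, b, heads, flips):
    """Beta(a, b) prior + binomial data -> Beta posterior parameters."""
    return a + heads, b + (flips - heads)

def beta_mean(a, b):
    return a / (a + b)

# Hypothetical prior belief: the coin is biased towards tails (mean 0.2)
prior_a, prior_b = 2, 8

# Made-up data sets that all show 60% heads, at growing sample sizes
experiments = [(10, 6), (100, 60), (1000, 600)]

posterior_means = []
for flips, heads in experiments:
    a, b = beta_posterior(prior_a, prior_b, heads, flips)
    posterior_means.append(beta_mean(a, b))
    print(f"n={flips}: posterior mean = {posterior_means[-1]:.3f}")
```

With only 10 flips the posterior mean sits between the prior mean (0.2) and the observed proportion (0.6). With 1000 flips it is almost entirely determined by the data.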
Test for Significance
The Bayes factor is the Bayesian counterpart of the p-value. It compares how well two hypotheses predict the observed data. The null hypothesis in the Bayesian framework places all of its probability at a particular value of the parameter (say θ = 0.5) and zero probability elsewhere. The alternative hypothesis is that all values of θ are possible, hence a flat curve representing the distribution.
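Using Bayes factors instead of p-values is beneficial in many cases, since they do not depend on the experimenter's sampling intentions (for example, a planned stopping rule) and, unlike a p-value, their interpretation as evidence does not change with the sample size.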
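For the coin example described above (a point null at θ = 0.5 against a flat alternative), the Bayes factor can be computed exactly. The sketch below relies on the fact that, under a uniform prior on θ, the marginal probability of any particular number of heads in n flips is 1/(n + 1). The function name `bayes_factor_01` and the data are hypothetical.

```python
import math

def bayes_factor_01(heads, flips, theta0=0.5):
    """Bayes factor of H0: theta = theta0 against H1: theta ~ Uniform(0, 1)."""
    # Probability of the data under the point null
    m0 = math.comb(flips, heads) * theta0**heads * (1 - theta0) ** (flips - heads)
    # Probability of the data averaged over the flat prior of H1
    m1 = 1 / (flips + 1)
    return m0 / m1

# Made-up data sets
bf_fair = bayes_factor_01(50, 100)    # data look like a fair coin
bf_biased = bayes_factor_01(70, 100)  # data look biased towards heads

print(f"50/100 heads: BF01 = {bf_fair:.2f}")
print(f"70/100 heads: BF01 = {bf_biased:.5f}")
```

A value above 1 means the data favour the null hypothesis, and a value below 1 means they favour the alternative. Unlike a p-value, the Bayes factor can therefore express support for the null as well as against it.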
Replacing p-values with Bayes-Factors: A Miracle Cure for the Replicability Crisis in Psychological…
Highest Density Interval (HDI)
The Highest Density Interval (HDI) is a type of credible interval, the Bayesian counterpart of the confidence interval (CI). The HDI is formed from the posterior distribution after observing the new data.
Using an HDI instead of a CI is beneficial because it does not depend on the experimenter's sampling intentions, and it can be read directly as the range that contains the parameter with the stated probability.
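One simple way to approximate such an interval is a grid search over the posterior. This is a sketch only, assuming a flat prior, a binomial likelihood, and a unimodal posterior. The function name `hdi_from_grid` and the data (7 heads in 10 flips) are hypothetical.

```python
def hdi_from_grid(grid, density, mass=0.95):
    """Approximate the highest density interval of a unimodal density."""
    total = sum(density)
    weights = [d / total for d in density]
    # Visit grid points from most to least probable
    order = sorted(range(len(grid)), key=lambda i: weights[i], reverse=True)
    included, cumulative = [], 0.0
    for i in order:
        included.append(grid[i])
        cumulative += weights[i]
        if cumulative >= mass:
            break
    # For a unimodal density the included points form one contiguous interval
    return min(included), max(included)

heads, flips = 7, 10
grid = [i / 1000 for i in range(1001)]
# Flat prior times binomial likelihood; constants cancel when normalizing
posterior = [t**heads * (1 - t) ** (flips - heads) for t in grid]

low, high = hdi_from_grid(grid, posterior)
print(f"95% HDI for theta: [{low:.3f}, {high:.3f}]")
```

Every value inside the resulting interval has a higher posterior density than any value outside it, which is what distinguishes an HDI from other credible intervals with the same coverage.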
Confidence vs. Credibility Intervals
Moreover, there is a nice article published on Analytics Vidhya that elaborates on these concepts with examples:
Bayesian Statistics explained to Beginners in Simple English
Significance in Data Science
Bayesian statistics encompasses a specific class of models that could be used for Data Science. Typically, one draws on Bayesian models for one or more of a variety of reasons, such as:
- having relatively few data points
- having strong prior intuitions
- having high levels of uncertainty
There are also scenarios where Bayesian statistics performs drastically differently; please read the following discussion for details:
What's the relationship between bayesian statistics and machine learning?