Probability Vs Likelihood for Dummies

sarang manjrekar · Published in Analytics Vidhya · Oct 5, 2021

One of the most important yet difficult distinctions to grasp in my data science journey has been "Probability vs Likelihood". In this post we will explore the difference between the two through various viewpoints.

1. Difference through the Definition Lens

Probability is used to find the chance of occurrence of a particular outcome, whereas likelihood is generally used to find the parameter values that maximize the chances of an observed outcome having occurred.

𝐿(𝜃|𝑂)=𝑃(𝑂|𝜃)

Or

Likelihood describes the extent to which the sample provides support for any parameter value. Higher support corresponds to a higher value of likelihood.

The "parameter" being talked about here is the parameter of the probability distribution assumed to underlie the data.
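To make the distinction concrete, here is a minimal Python sketch (my own illustration, not from the original post) using a simple coin-flip model: probability fixes the parameter and asks about outcomes, while likelihood fixes an observed outcome and asks about parameter values.

```python
# Illustration only: a binomial coin-flip model.
from scipy.stats import binom

# Probability: the parameter (a fair coin, p = 0.5) is fixed; the outcome varies.
print(binom.pmf(k=7, n=10, p=0.5))          # P(7 heads in 10 flips | p = 0.5)

# Likelihood: the observation (7 heads in 10 flips) is fixed; the parameter varies.
for p in (0.3, 0.5, 0.7):
    print(f"L(p={p} | 7 heads of 10) = {binom.pmf(k=7, n=10, p=p):.4f}")
```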

2. Difference through the Statistical Lens

All the probabilities of a distribution always sum up to 1:

∑ Pi = 1

The likelihoods of an observation across different parameter values, however, need not add up to 1.
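A quick numerical check of this point, under the same coin-flip assumption as the sketch above:

```python
import numpy as np
from scipy.stats import binom

n, k = 10, 7   # 7 heads observed in 10 flips

# Probabilities over all possible outcomes, for a fixed parameter, sum to 1.
print(binom.pmf(np.arange(n + 1), n, 0.5).sum())      # -> 1.0

# Likelihoods of the fixed observation, over a grid of parameter values, do not.
p_grid = np.linspace(0.01, 0.99, 99)
print(binom.pmf(k, n, p_grid).sum())                  # -> generally != 1
```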

3. Difference in Use Cases

Use case for Probability: A train yard has 1000 trains, 500 of them having 20 carriages, 300 having 15 carriages and 200 having 10 carriages. What's the probability that train no. 1001 built by the yard will have more than 10 carriages?

Use case for Likelihood: I see a train passing by in front of me and happen to see the 4th carriage of the train. What is the likelihood of the train having 10 carriages in total?
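Here is a rough sketch of both use cases in Python. The modelling assumptions (the new train follows the yard's historical mix, and the carriage I notice is equally likely to be any carriage of the train) are my own simplifications, not spelled out in the post.

```python
# Probability use case: assume train no. 1001 follows the yard's historical mix.
counts = {20: 500, 15: 300, 10: 200}            # carriages -> number of trains
total = sum(counts.values())
p_more_than_10 = sum(n for c, n in counts.items() if c > 10) / total
print(p_more_than_10)                           # -> 0.8

# Likelihood use case: assume the carriage I saw is a uniformly random carriage.
# Then P(seeing carriage 4 | train has N carriages) = 1/N for any N >= 4,
# and we can compare how well different values of N explain the observation.
for n_carriages in (10, 15, 20):
    likelihood = 1 / n_carriages if n_carriages >= 4 else 0.0
    print(f"L(N={n_carriages} | saw carriage 4) = {likelihood:.3f}")
```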

================================

Probability is probably (pun intended) a fairly well-understood concept. To get further insight into what "likelihood" is, I would like to take a real-world use case from the world of Cricket, one of my favourite games.

For one of my other projects, I had recently scraped data from the ESPNCricInfo website. The dataset considered here consists of the runs scored by a batter in a Test match.

Statistically speaking, the sample space for the runs scored by a batter is the set of whole numbers {0, 1, 2, 3, …, 100, …, 400, …}. But scoring a century (100 or more runs) is a rare event in a game of cricket, and hence for the sake of simplicity I will only consider scores ranging from 0 to 100.

Let's have a look at the distribution densities generated from these datasets for the last few decades. The assumption is that each decade has its own distribution of runs scored by a batsman.

[Figure: Distribution density of batsman scores in Test matches played during the decade 1980–1990]

[Figure: Distribution density of batsman scores in Test matches played during the decade 1990–2000]

As evident from the images, the underlying distribution is an exponentially decreasing function with some parameter λ such that:

f(x) = λ·e^(−λx)
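As a hedged sketch of how such a λ could be fitted in practice (the scores array below is a made-up placeholder, not the actual ESPNCricInfo data): for an exponential model, the maximum-likelihood estimate of λ is simply the reciprocal of the sample mean.

```python
import numpy as np

# Placeholder data: a made-up array standing in for one decade's Test scores.
scores = np.array([0, 12, 4, 55, 23, 7, 0, 31, 18, 2])

def exp_pdf(x, lam):
    """Exponential density f(x) = lambda * exp(-lambda * x)."""
    return lam * np.exp(-lam * x)

# MLE of the exponential rate parameter: 1 / sample mean.
lam_hat = 1 / scores.mean()
print(lam_hat)                 # fitted lambda for this (made-up) decade
print(exp_pdf(0, lam_hat))     # the density at a score of 0 is just lambda itself
```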

[Figure: Distribution density of batsman scores in Test matches played during the decade 2000–2010]

Problem Statement: Find the likelihood of a batter getting dismissed for a score of 0 runs in a game.

Duck : Getting out without scoring runs in a Cricket match

In this case the observed data is [0]. In a more practical use case, the observed dataset will be a set of observations rather than just one data point. What needs to be done is to find the parameter λ that maximizes the support for the observed data.

Now we have three different lambdas, one for each decade's distribution.

Now, since 𝐿(𝜃|𝑂)=𝑃(𝑂|𝜃), the likelihood of the observed score of 0 under an exponential density with parameter λ is just that density evaluated at the observation:

L(λ | 0) = f(0) = λ·e^(−λ·0) = λ

Determining the maximum likelihood is easy here ( L(λ₁₉₉₀ | 0) = 0.1406 ) with just one observed data point in our use case. Maximum Likelihood Estimation (MLE) is the method generally used to find the optimal values of parameters (λ in this case).
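A short sketch of that comparison, using the fact derived above that L(λ | 0) = λ. The λ values below are illustrative placeholders of my own; only the 1990–2000 value is implied by the post's reported L(λ₁₉₉₀ | 0) = 0.1406.

```python
import numpy as np

def exp_pdf(x, lam):
    return lam * np.exp(-lam * x)

# Illustrative lambdas per decade; only the 1990-2000 value follows from the post.
lambdas = {"1980-1990": 0.11, "1990-2000": 0.1406, "2000-2010": 0.12}

observed_score = 0
likelihoods = {decade: exp_pdf(observed_score, lam) for decade, lam in lambdas.items()}
print(likelihoods)                              # for x = 0, each likelihood is just lambda
print(max(likelihoods, key=likelihoods.get))    # decade whose lambda gives the highest support
```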

Thanks for reading!

Please do share feedback and any concerns in the comments section.

For data visualization enthusiasts and Cricket buffs, here are my blogs on data analysis of the ESPNCricInfo dataset.
