Probability vs. Likelihood: The Ultimate Explanation You’ll Ever Need

Manoj Kumar
May 2, 2024 · 4 min read

Let’s begin by exploring a common question: What is the probability of a house being priced between $600K and $620K?

Suppose you have also been given a dataset of observed house prices, like the small sample sketched below.
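To make things concrete, here is a small hypothetical sample (the numbers below are made up purely for illustration, in thousands of dollars):

```python
# Hypothetical observed house prices, in $K (illustrative values only).
house_prices = [455, 512, 498, 603, 577, 640, 489, 531, 615, 560]
```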

To address this question, we need to remember that house prices are continuous random variables. This means we should use a probability density function (PDF) to describe them.

To determine the probability that a house falls within a specific price range, you compute the area under the PDF curve over that range. For this example, assume the dataset indicates that house prices follow a normal distribution. The PDF of this distribution, where the variable Y represents house prices, can be expressed as follows:
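For a normal distribution with mean μ and standard deviation σ, this PDF is the familiar bell curve:

```latex
f(y;\, \mu, \sigma) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\,
\exp\!\left(-\frac{(y-\mu)^{2}}{2\sigma^{2}}\right)
```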

But wait: knowing that house prices follow a normal distribution isn’t sufficient to calculate specific probabilities unless we also know the parameters of the distribution, namely the mean (μ) and the standard deviation (σ). The shape and spread of the PDF depend heavily on these values. Let’s consider two scenarios to illustrate this:

Case 1: Known Parameters (μ,σ)
If the values of μ and 𝜎 are known, calculating the probability of house prices falling between $600K and $620K is straightforward. We simply integrate the PDF over this range.
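For instance, if we knew μ = 500 and σ = 100 (in $K; these values are picked only for illustration), the integral is just a difference of the normal CDF at the two endpoints:

```python
from scipy.stats import norm

# Illustrative parameters (in $K); in Case 1 these are assumed known.
mu, sigma = 500, 100

# P(600 <= Y <= 620) = F(620) - F(600), where F is the normal CDF.
prob = norm.cdf(620, loc=mu, scale=sigma) - norm.cdf(600, loc=mu, scale=sigma)
print(f"P(600K <= price <= 620K) ≈ {prob:.4f}")
```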

Case 2: Unknown Parameters (μ, σ)
If we don’t know 𝜇 and σ, we cannot directly calculate the probability because the shape of the distribution remains undefined. First, we need to estimate these parameters based on our data. This involves considering various (μ, σ) combinations and determining which best describes our dataset.

Let’s plot three different normal distributions for house price data using varied parameters (𝜇, σ).
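A minimal sketch of such a plot, using the same three (μ, σ) pairs that are scored below:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Candidate (mu, sigma) pairs in $K, matching the three cases compared below.
candidates = [(300, 50), (500, 100), (700, 150)]

y = np.linspace(0, 1200, 500)  # house prices in $K
for mu, sigma in candidates:
    plt.plot(y, norm.pdf(y, loc=mu, scale=sigma), label=f"μ={mu}, σ={sigma}")

plt.xlabel("House price ($K)")
plt.ylabel("Density")
plt.legend()
plt.show()
```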

We aim to determine which distribution best fits the actual house price data. To achieve this, we define a function 𝐿 that takes two parameters (μ, 𝜎) and returns a score. This score quantifies how well the distribution corresponding to these parameter values fits the data.

Consider the following parameters and their respective scores from function 𝐿:

  1. L(μ=300, σ=50) = 2.6
  2. L(μ=500, σ=100) = 6.2
  3. L(μ=700, σ=150) = 5.3

From these scores, it is clear that the function 𝐿 gives the highest value for the distribution with parameters 𝜇=500 and σ=100, indicating that this distribution provides the best fit among the three.
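As a sketch of how such scores can be computed (reusing the hypothetical house_prices sample from above; in practice the comparison is usually done on the log scale, so these numbers will not match the illustrative scores listed above):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical data from earlier, in $K.
house_prices = [455, 512, 498, 603, 577, 640, 489, 531, 615, 560]

# Log-likelihood of the data under a Normal(mu, sigma) model.
def log_likelihood(mu, sigma, data):
    return np.sum(norm.logpdf(data, loc=mu, scale=sigma))

for mu, sigma in [(300, 50), (500, 100), (700, 150)]:
    score = log_likelihood(mu, sigma, house_prices)
    print(f"log L(μ={mu}, σ={sigma}) = {score:.2f}")
```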

This function L is the likelihood function.

Defining the Likelihood Function
The likelihood function 𝐿 is proportional to the probability of observing our data given specific parameter values.

But wait: what do we actually want? We want the best possible distribution over all possible values of (μ, σ), which we can write as:
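One compact way to write this, using the semicolon notation explained just below (the hats denote the estimated values):

```latex
(\hat{\mu}, \hat{\sigma})
  \;=\; \arg\max_{\mu,\, \sigma}\;
  L(\mu, \sigma\,;\, y_1, y_2, \ldots, y_n)
```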

Note that we are not looking for the probability of the parameters given the data (y1, y2, …), but rather the parameters that best explain the data.

In the expression above, the parameters (μ, σ) appear after a semicolon (;) rather than a vertical bar (|). This notation emphasizes that μ and σ are not values we condition on, but parameters we seek to estimate. Using the semicolon in this context signifies that we are looking at the probability of observing the data (y1, y2, …) as parameterized by (μ, σ).

Assuming that the house prices y1, y2, … are independent means that the price of one house does not influence another. Under this assumption, we can express the joint probability of observing these specific prices, parameterized by μ and σ, as the product of the individual densities. Mathematically, this relationship can be written as:
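Under the independence assumption, each observation contributes one factor of the normal density, so (in the notation used so far):

```latex
L(\mu, \sigma\,;\, y_1, \ldots, y_n)
  \;\propto\; \prod_{i=1}^{n} f(y_i\,;\, \mu, \sigma)
  \;=\; \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left(-\frac{(y_i - \mu)^{2}}{2\sigma^{2}}\right)
```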

Now, we want to maximize the likelihood function L, which corresponds to maximizing this product term, since the likelihood is proportional to it.
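In practice it is easier to maximize the logarithm of the likelihood, which turns the product into a sum. A minimal sketch, again using the hypothetical house_prices sample; for a normal model the maximum has a closed form (the sample mean and the uncorrected sample standard deviation), which the numerical search should recover:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Hypothetical data from earlier, in $K.
house_prices = np.array([455, 512, 498, 603, 577, 640, 489, 531, 615, 560])

# Negative log-likelihood: minimizing this is the same as maximizing the likelihood.
def neg_log_likelihood(params, data):
    mu, sigma = params
    sigma = abs(sigma)  # keep the scale positive during the search
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[400.0, 80.0], args=(house_prices,),
                  method="Nelder-Mead")
mu_hat, sigma_hat = result.x

print(f"Numerical MLE:  μ ≈ {mu_hat:.1f}, σ ≈ {abs(sigma_hat):.1f}")
print(f"Closed form:    μ = {house_prices.mean():.1f}, σ = {house_prices.std():.1f}")
```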

I hope this explanation has helped clarify the distinction between probability and likelihood. It’s important to understand that likelihood is proportional to probability, but it is not itself a probability. This distinction is crucial in statistical modeling, particularly when we aim to optimize parameter estimates based on observed data.

We will explore how the concept of likelihood is used in machine learning in another blog post: link

If you have any questions about my explanation, feel free to reach out to me on LinkedIn


Manoj Kumar

GATE AIR 13, MTech in Data Science. Passionate about simplifying ML, AI, and statistics through intuitive Maths