Probability Distribution & PDF, CDF, PMF

Ishika Mittal
The Business Club, IIT (BHU) Varanasi
Jun 7, 2020

Before we launch into these heavy terms, let me explain what a random variable is, and then why it matters here.

As Investopedia puts it: ‘A random variable is a variable whose value is unknown or a function that assigns values to each of an experiment’s outcomes. Random variables are often designated by letters and can be classified as discrete, which are variables that have specific values, or continuous, which are variables that can have any values within a continuous range.’

In short, remember two things:

  1. A random variable assigns a number to each outcome of a random experiment, so its value is unknown until the experiment is performed.
  2. Random variables come in two types: discrete and continuous.

Example: if X is the number that shows when you roll a die, then getting a six is the event X = 6.
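To make this concrete, here is a small Python sketch (an illustration added here, not code from the original post). It models one roll of a die as picking an outcome at random, and X as a function that turns that outcome into a number; the helper names `roll` and `die_value` are hypothetical.

```python
import random

# Outcomes of the experiment "roll one six-sided die"
SAMPLE_SPACE = ["one", "two", "three", "four", "five", "six"]

def roll(rng: random.Random) -> str:
    """Perform the experiment once: every outcome is equally likely."""
    return rng.choice(SAMPLE_SPACE)

def die_value(outcome: str) -> int:
    """The random variable X: assigns a number to each outcome."""
    return SAMPLE_SPACE.index(outcome) + 1

rng = random.Random(0)  # fixed seed so the run is reproducible
values = [die_value(roll(rng)) for _ in range(10)]
print(values)  # ten observed values of X, each between 1 and 6
print(values.count(6), "of the rolls were the event X = 6")
```

Before each roll we only know which values X can take, not which one will appear; the fixed seed simply makes the demonstration repeatable.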

Now, let us talk about how it is related to Probability Distributions.

Probability Distribution

A probability distribution is a statistical function that describes all the possible values of a random variable, together with the probability of each value occurring (or, for a continuous variable, how the probability is spread over ranges of values).

For example, take the classic case of tossing a fair coin, and let the random variable X be the result. The probability distribution of X is P(X = heads) = 0.5 and P(X = tails) = 0.5.
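A discrete probability distribution can be written down as a table of values and probabilities. The Python sketch below is an illustration in which a dictionary stands in for that table. It checks that the probabilities are valid, then simulates tosses to show the observed frequencies settling near 0.5.

```python
import random
from collections import Counter

# Distribution of X for one toss of a fair coin: each value with its probability
coin_dist = {"heads": 0.5, "tails": 0.5}

# A valid discrete distribution: probabilities are non-negative and sum to 1
assert all(p >= 0 for p in coin_dist.values())
assert abs(sum(coin_dist.values()) - 1.0) < 1e-12

# Draw many tosses according to the distribution and compare frequencies
rng = random.Random(42)  # fixed seed for reproducibility
tosses = rng.choices(list(coin_dist), weights=list(coin_dist.values()), k=10_000)
freq = {value: count / len(tosses) for value, count in Counter(tosses).items()}
print(freq)  # both relative frequencies should be close to 0.5
```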

The Gaussian (normal) distribution is one of the most widely used distributions, because many natural quantities (for example, the heights of all 18-year-olds in India) approximately follow it. When graphed, a normal distribution forms a bell curve that is symmetric about the mean, and its mean, median, and mode are all equal.

Image source: https://towardsdatascience.com/understanding-the-68-95-99-7-rule-for-a-normal-distribution-b7b7cbf760c2
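The 68-95-99.7 rule from the linked Towards Data Science article can be checked numerically. This sketch uses only Python's standard library. `normal_cdf` is a helper written for this example on top of `math.erf`, and the height parameters (mean 170 cm, standard deviation 7 cm) are hypothetical values chosen for illustration.

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), computed via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_within(k: float) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma); identical for every mu and sigma."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973

# Hypothetical heights ~ N(170, 7^2): half the mass lies below the mean,
# which is why the median equals the mean for a symmetric bell curve.
print(normal_cdf(170, mu=170, sigma=7))  # 0.5
```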

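This ties in with the Central Limit Theorem, which states that the distribution of sample means approaches a normal distribution as the sample size grows, regardless of the shape of the population distribution (assuming the samples are independent, of identical size, and drawn from a population with finite variance). This helps explain why quantities built up from many small, independent effects tend to look normal.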
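A quick simulation makes the theorem tangible. The sketch below is an illustration under assumed settings. It draws samples from an exponential population (strongly skewed, not bell-shaped), averages each sample, and reports the mean, spread, and skewness of those averages for several sample sizes n. As n grows, the spread should shrink roughly like 1/sqrt(n) and the skewness should move toward 0, the value for a normal distribution.

```python
import random
import statistics

def sample_means(n: int, trials: int, rng: random.Random) -> list[float]:
    """Means of `trials` samples, each of size n, drawn from an Exponential(1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

def skewness(xs: list[float]) -> float:
    """Sample skewness: about 2 for Exponential(1) data, 0 for a normal distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(1)  # fixed seed for reproducibility
for n in (1, 5, 50):
    means = sample_means(n, 5000, rng)
    print(f"n={n:>2}  mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.pstdev(means):.3f}  skew={skewness(means):.2f}")
```

The exponential population is only an example; any population with finite variance should show the same trend.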

Probability Mass Function (PMF):

The PMF is a function that gives the probability that a discrete random variable (one that can take only a finite or countably infinite set of values) is exactly equal to some value:

$$f(x) = P(X = x)$$

Here, f is the PMF, returning the probability P that the random variable X takes the value x. For example, if we roll a fair die and let X be the number that shows, the PMF of X is:

$$f(x) = \frac{1}{6}, \quad x \in \{1, 2, 3, 4, 5, 6\}$$
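Here is the fair-die PMF, f(x) = 1/6 for x = 1, ..., 6 and 0 otherwise, written as a Python function, plus a helper that estimates a PMF from observed data by relative frequency. Both functions are illustrative sketches with hypothetical names; `Fraction` keeps the theoretical probabilities exact.

```python
from collections import Counter
from fractions import Fraction

def die_pmf(x: int) -> Fraction:
    """PMF of a fair six-sided die: f(x) = P(X = x)."""
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)

def empirical_pmf(samples: list[int]) -> dict[int, float]:
    """Estimate a PMF from data: the relative frequency of each observed value."""
    counts = Counter(samples)
    total = len(samples)
    return {x: c / total for x, c in sorted(counts.items())}

print(die_pmf(3))                            # 1/6
print(die_pmf(7))                            # 0 (7 is not a possible value)
print(sum(die_pmf(x) for x in range(1, 7)))  # 1 (a PMF always sums to 1)
print(empirical_pmf([1, 2, 2, 3, 3, 3]))     # {1: 0.16666666666666666, 2: 0.3333333333333333, 3: 0.5}
```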

Probability Density Function (PDF):

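When a random variable is continuous, it can take any value in a range, so we cannot list its values and add up their probabilities as we do with a PMF; in fact, the probability of any single exact value is zero. Instead, we describe the variable with a probability density function (PDF) f(x), and we get probabilities by integrating the PDF over an interval:

$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$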
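To see how integration turns a density into a probability, the sketch below integrates the standard normal PDF numerically over [a, b]. Simpson's rule is a choice made for this example; any numerical integrator would do. The sketch also shows two properties that often trip people up: the probability of a single exact value is 0, and a density value can be larger than 1.

```python
import math
from typing import Callable

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """PDF of N(mu, sigma^2). A density, not a probability: it can exceed 1."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_between(pdf: Callable[[float], float], a: float, b: float,
                 steps: int = 10_000) -> float:
    """P(a <= X <= b) = integral of the PDF from a to b (composite Simpson's rule)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = (b - a) / steps
    total = pdf(a) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(a + i * h)
    return total * h / 3

print(round(prob_between(normal_pdf, -1.0, 1.0), 4))  # 0.6827
print(prob_between(normal_pdf, 0.5, 0.5))             # 0.0: a single point has zero probability
print(normal_pdf(0.0, sigma=0.1) > 1)                 # True: densities are not probabilities
```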

Example: PDFs are used to analyze the risk of a particular security, such as an individual stock. The PDF of its returns is often drawn as a bell-shaped curve: the peak sits at the mean return and represents neutral market risk, while the tails on either side represent risk (large losses) and reward (large gains).
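As a rough illustration of the stock example, the sketch below uses Python's built-in `statistics.NormalDist` to model daily returns as a bell curve. Its `pdf` method gives the height of the curve, and its `cdf` method gives the area under the PDF up to a point. The mean (0.05%) and standard deviation (1.5%) are hypothetical numbers, not estimates from real market data.

```python
from statistics import NormalDist

# Hypothetical model: daily returns ~ Normal(mean = 0.05%, sd = 1.5%).
# Illustrative assumptions only, not estimates from real market data.
returns = NormalDist(mu=0.0005, sigma=0.015)

peak = returns.pdf(returns.mean)     # height of the bell curve at its peak (a density)
p_big_loss = returns.cdf(-0.03)      # left tail: P(return <= -3%), the "risk" side
p_big_gain = 1 - returns.cdf(0.03)   # right tail: P(return >= +3%), the "reward" side

print(f"density at the mean:     {peak:.2f}")
print(f"P(loss worse than -3%):  {p_big_loss:.2%}")
print(f"P(gain better than +3%): {p_big_gain:.2%}")
```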

Cumulative Distribution Function (CDF):

The CDF is a statistical function that gives the probability that a random variable is less than or equal to a certain value, F(x) = P(X ≤ x). Every random variable, discrete or continuous, has a cumulative distribution function.

  1. For a discrete random variable,

$$F(x) = P(X \le x) = \sum_{t \le x} f(t)$$

where f is the PMF of X.

  2. For a continuous random variable,

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$

where f is the PDF of X.
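The two cases can be sketched in Python: a discrete CDF is a running sum of the PMF, and a continuous CDF is the integral of the PDF from -∞ up to x. The discrete part uses the fair-die PMF (1/6 per face). The continuous part integrates the PDF of an exponential distribution, an example chosen here because its CDF has the simple closed form 1 - e^(-λx) to compare against. All function names are hypothetical.

```python
import math
from fractions import Fraction

def die_cdf(x: float) -> Fraction:
    """Discrete case: F(x) = sum of the PMF f(t) = 1/6 over die faces t <= x."""
    return sum((Fraction(1, 6) for t in range(1, 7) if t <= x), Fraction(0))

def exp_pdf(t: float, lam: float = 1.0) -> float:
    """PDF of an Exponential(lam) variable (zero for t < 0)."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def exp_cdf(x: float, lam: float = 1.0, steps: int = 100_000) -> float:
    """Continuous case: F(x) = integral of the PDF from -inf to x (midpoint rule).
    The PDF is zero below 0, so the integral effectively starts at 0."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(exp_pdf((i + 0.5) * h, lam) for i in range(steps))

print(die_cdf(3))    # 1/2
print(die_cdf(3.5))  # 1/2: the CDF of a discrete variable is a step function
print(die_cdf(6))    # 1
# Numeric integral vs closed form: the two should agree to about 6 decimal places
print(round(exp_cdf(1.0), 6), round(1 - math.exp(-1.0), 6))
```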

When it comes to data analysis, the probability distribution of a dataset, together with its PMF, PDF, and CDF, is among the handiest tools for building an intuitive understanding of the data.
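In practice, a handy data-analysis tool is the empirical CDF: for any x, the fraction of observations less than or equal to x. Here is a minimal sketch using the standard library's `bisect` module; the height values are made up for illustration.

```python
from bisect import bisect_right
from collections.abc import Callable

def ecdf(data: list[float]) -> Callable[[float], float]:
    """Return the empirical CDF: F(x) = fraction of observations <= x."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

# Hypothetical sample of heights in cm (illustrative values only)
heights = [158, 162, 165, 167, 170, 171, 173, 176, 180, 185]
F = ecdf(heights)
print(F(170))          # 0.5: half of the sample is at most 170 cm
print(F(175) - F(165)) # fraction of the sample in the range (165, 175]
```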

For further reading, head over to Investopedia and Towards Data Science for applications of these concepts in Python and R.
