Important Distributions in Probability & Statistics

Saurabh Raj
Published in Analytics Vidhya
Aug 4, 2020 · 7 min read

Random variables follow different types of distributions in probability space; the distribution determines a variable's behaviour and helps in making predictions.

Table of contents:

  • Introduction
  • Gaussian/Normal Distribution
  • Binomial Distribution
  • Bernoulli Distribution
  • Log Normal Distribution
  • Power Law Distribution
  • Uses of Distributions

Introduction

Whenever we come across an experiment in probability, we talk about a random variable, which is simply a variable that takes the possible outcomes of that experiment as its values. For example, when we roll a die, we expect a value from the set {1, 2, 3, 4, 5, 6}. So we define a random variable X which takes one of these values every time we roll.

Depending upon the experiment, the random variable can take either discrete or continuous values. The die example is a discrete random variable, since it takes only discrete values. But if we are talking about house prices in a particular town, the associated random variable can take continuous values (e.g. $550,000, $1,200,523.54, etc.).

When we plot the values of a random variable against the frequency of their appearance in an experiment, we get a frequency distribution plot in the form of a histogram. After using Kernel Density Estimation to smooth this histogram, we get a fine curve. This curve is referred to as the “distribution”.

The orange smoothed curve is the probability distribution
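If you want to see this smoothing in action, here is a quick sketch (assuming numpy and scipy are installed; the data is synthetic, just for illustration) that builds a histogram and a KDE curve from the same samples:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic "experiment": 1,000 observations of a continuous random variable
rng = np.random.default_rng(42)
samples = rng.normal(loc=170, scale=10, size=1000)

# Frequency distribution: histogram counts over 20 bins
counts, bin_edges = np.histogram(samples, bins=20)

# Kernel Density Estimation smooths the histogram into a continuous curve
kde = gaussian_kde(samples)
grid = np.linspace(samples.min(), samples.max(), 200)
density = kde(grid)          # the smoothed "distribution" evaluated on a grid

print(counts[:5])            # raw histogram counts (first few bins)
print(density[:5])           # smoothed density values (first few grid points)
```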

Gaussian/Normal Distribution

The Gaussian/Normal distribution is a continuous probability distribution in which the random variable lies symmetrically around a mean (μ), with spread controlled by the variance (σ²).

General expression for the Gaussian distribution curve:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

Mean (μ): It decides the position of the peak on the X-axis. All the data are located symmetrically on either side of the line X = μ. As you can observe in the image, the Blue, Red and Yellow curves are spread on either side of X = 0, but the Green curve has its centre at X = −2. So just by looking at these curves, we can say that the mean of the Blue, Red and Yellow curves is 0, whereas that of the Green curve is −2.

Variance (σ²): It decides the spread and height of the curve. Variance is simply the square of the standard deviation. Notice that in the image the σ² values for all four curves are given. Even without looking at the values, we can see that the Yellow curve has the lowest height and the maximum spread, and spread can be intuitively understood as standard deviation. So the Yellow curve has the maximum variance of the four; similarly, the Blue curve has the minimum.

If we put μ = 0 and σ = 1, the Normal distribution is called the Standard Normal Distribution (or Standard Normal Variate) and the general expression simplifies to:

Standard Normal Distribution expression:

f(x) = (1 / √(2π)) · exp(−x² / 2)

Now one might wonder: what does the denominator signify? It's there to ensure that the area under the Normal distribution curve is always equal to 1.
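Here is a small sketch (assuming scipy is available) that integrates the Gaussian density numerically to check that the area under the curve really is 1:

```python
import numpy as np
from scipy.integrate import quad

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density; 1/(sigma*sqrt(2*pi)) is the normalising constant."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Integrate the density over the whole real line; it is ~1 for any mu and sigma
area, _ = quad(gaussian_pdf, -np.inf, np.inf, args=(2.0, 3.0))
print(area)   # ~1.0
```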

We get a lot of useful information about segmentation of data from Normal Distribution. Look at the image:

Values segmentation diagram for Normal Distribution

As you can see, the distribution holds 34.1% of the total mass between the mean and one standard deviation to the right, (34.1 + 13.6) = 47.7% between the mean and two standard deviations, and roughly 49.9% within three standard deviations. Since the curve is symmetrical, the same holds on the left side.

So if we know that a property follows a Normal distribution, e.g. the weights of the people in a town, we can easily estimate a lot of values without performing an extensive analysis. This is the power of the Normal distribution.
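Here is a quick sketch (assuming scipy is available) that recovers the 34.1%, 47.7% and 49.9% figures directly from the standard normal CDF:

```python
from scipy.stats import norm

# Standard normal: fraction of mass between the mean and 1, 2, 3 standard deviations
for k in (1, 2, 3):
    mass = norm.cdf(k) - norm.cdf(0)
    print(f"mean to +{k} sigma: {mass:.1%}")
# mean to +1 sigma: 34.1%
# mean to +2 sigma: 47.7%
# mean to +3 sigma: 49.9%
```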

Binomial Distribution

As the name suggests, the “Bi” stands for the 2 possible outcomes of an experiment: Yes or No, Pass or Fail, 1 or 0, etc. In the simplest terms, this is the distribution of the number of successes across multiple repeated, independent experiments, where each experiment's outcome is either a “Success” or a “Failure”.

Binomial Distribution

As you can observe from the image, it is a discrete probability distribution. Its main parameters are n (the number of trials) and p (the probability of success).

Now suppose an event has probability p of SUCCESS; then the probability of FAILURE is (1 − p). Let us say you repeat the experiment n times (number of trials = n). Then the probability of getting k successes in n independent Bernoulli trials is:

Probability Mass Function of the Binomial distribution:

P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)

where k ∈ {0, 1, …, n} and C(n, k) = n! / (k! (n − k)!).

Note: We will see what a Bernoulli trial is in the next section.

Let me ask a simple question. Suppose a cricket match is going on between India and Australia. Rohit Sharma has already scored 151*, and from experience you know that once past 150 Rohit has a probability of 0.3 of hitting a six on any ball. It's the last over, and your father asks you what the chances are that Rohit will hit exactly 4 sixes in it. How would you find out?

This is a typical example of Binomial trials. With n = 6 balls, p = 0.3 and k = 4, the solution is:

P(X = 4) = C(6, 4) · 0.3^4 · 0.7^2 ≈ 0.06

Note: C(6, 4) is nothing but ⁶C₄, the number of ways of placing the 4 sixes among the 6 balls.
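Here is a small sketch (assuming scipy is available) that computes the same probability, both by hand from the PMF and with scipy's binomial PMF:

```python
from math import comb
from scipy.stats import binom

n, p, k = 6, 0.3, 4           # 6 balls in the over, P(six) = 0.3, exactly 4 sixes

# Straight from the PMF: C(6, 4) * 0.3^4 * 0.7^2
manual = comb(n, k) * p**k * (1 - p)**(n - k)

# Same thing with scipy's binomial PMF
prob = binom.pmf(k, n, p)

print(manual, prob)           # both ~0.0595, i.e. roughly a 6% chance
```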

Bernoulli Distribution

Within the Binomial distribution we have a special case known as the Bernoulli distribution, where n = 1, meaning just a single trial is conducted. When we put n = 1 in the PMF (Probability Mass Function) of the Binomial distribution, C(n, k) equals 1 and the function becomes:

PMF of the Bernoulli distribution:

P(X = k) = p^k · (1 − p)^(1 − k)

where k ∈ {0, 1}.

Now let's go back to the India vs Australia match. Say that when Rohit hits a ton, the chance of India winning is 0.7. So you can simply tell your father that there is a 70% chance that India will win. That is nothing but a very basic Bernoulli trial.
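Here is a tiny sketch (assuming scipy is available) showing that the Bernoulli PMF is just the Binomial PMF with n = 1:

```python
from scipy.stats import bernoulli, binom

p = 0.7                        # P(India wins | Rohit hits a ton)

# Bernoulli PMF: p^k * (1 - p)^(1 - k) for k in {0, 1}
print(bernoulli.pmf(1, p))     # 0.7 -> "success" (India wins)
print(bernoulli.pmf(0, p))     # 0.3 -> "failure"

# Identical to a Binomial distribution with a single trial (n = 1)
print(binom.pmf(1, 1, p))      # 0.7
```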

Log Normal Distribution

We have seen the shape of the Normal distribution, and at first glance many would say that the Log Normal curve looks like a Normal distribution that is skewed to the right.

Suppose there is a random variable X which follows a Log Normal distribution with parameters μ and σ² (these are the mean and variance of log X, not of X itself). Say we observe n values of X: (x1, x2, x3, …, xn). Now take the natural log of all these values and create a new random variable Y = [log(x1), log(x2), log(x3), …, log(xn)]. This random variable Y will be Normally distributed.

In other words, if Y is a Normally distributed random variable and we take its exponential X = exp(Y), then X follows a Log Normal distribution. In simple language, as the name suggests, the Log Normal distribution is the distribution of a random variable whose natural log is Normally distributed.

It has the same parameters as the Gaussian, mean (μ) and variance (σ²), but they describe the underlying Normal distribution of log X.
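Here is a small sketch (assuming numpy is available) that draws log-normal samples and checks that their logs behave like a Normal variable with the same μ and σ:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5                 # parameters of the *underlying* Normal

# X ~ LogNormal: generated as the exponential of a Normal(mu, sigma) variable
x = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

# Y = log(X) should be (approximately) Normal with mean mu and std sigma
y = np.log(x)
print(y.mean(), y.std())             # ~1.0 and ~0.5
```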

Power Law/Pareto Distribution

A Power Law is a relationship between two quantities in which a relative change in one quantity produces a proportional relative change in the other, i.e. one varies as a power of the other. It is associated with the 80–20 rule, which says that roughly 80% of the mass is concentrated in about 20% of the values. As you can see in the image, the slightly darker left portion contains 80% of the mass and the bright yellow portion on the right contains 20%.

When a probability distribution follows a power law we say it is a Pareto Distribution.

Pareto distribution is controlled by two parameters: x_m and α.

x_m can be thought of like the mean, in that it controls the scale of the curve, and α can be thought of like σ, in that it controls the shape of the curve. (Note: x_m is not the mean and α is not σ; this is just an intuitive analogy.)

Now as we can see in the image, all four curves have their peak located at x=1. So, we can say that x_m = 1 for all the curves.

As we can observe from the image, as α increases the peak goes up, and in the extreme case of α tending to infinity the curve collapses into a vertical spike at x_m. This limiting shape is called a Dirac delta function.

As α decreases, the curve becomes flatter and more spread out.

PDF of the Pareto distribution:

f(x) = α · x_m^α / x^(α + 1) for x ≥ x_m, and 0 for x < x_m
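Here is a quick sketch (assuming numpy and scipy are available) that evaluates this PDF by hand and checks it against scipy's Pareto distribution, whose shape parameter plays the role of α and whose scale plays the role of x_m:

```python
import numpy as np
from scipy.stats import pareto

x_m, alpha = 1.0, 3.0

def pareto_pdf(x, x_m, alpha):
    """Pareto density: alpha * x_m**alpha / x**(alpha + 1) for x >= x_m, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= x_m, alpha * x_m**alpha / x**(alpha + 1), 0.0)

xs = np.array([0.5, 1.0, 2.0, 5.0])
print(pareto_pdf(xs, x_m, alpha))           # peak value alpha/x_m at x = x_m
print(pareto.pdf(xs, alpha, scale=x_m))     # scipy agrees (shape parameter = alpha)
```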

Uses of Distributions

If we know that a particular property follows a certain distribution, we can take a sample, estimate the parameters involved, and then use the probability distribution function to answer a lot of questions.

For example: in a town of 100,000 people, we have to do a height analysis, but we cannot survey such a large population. So we select a random sample and find its sample mean and sample standard deviation.

Now suppose a doctor or expert tells us that height follows a Normal distribution. Then we can easily answer many questions.
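Here is a small sketch of that workflow (assuming scipy is available; the sample, the 168 cm / 7 cm figures and the 180 cm threshold are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

# Stand-in for a measured random sample of heights from the town (in cm)
sample = rng.normal(loc=168, scale=7, size=500)

# Estimate the parameters of the assumed Normal distribution from the sample
mu_hat = sample.mean()
sigma_hat = sample.std(ddof=1)

# Answer questions about the whole town without surveying everyone,
# e.g. what fraction of people are taller than 180 cm?
taller_than_180 = 1 - norm.cdf(180, loc=mu_hat, scale=sigma_hat)
print(f"estimated share taller than 180 cm: {taller_than_180:.1%}")
```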
