9 Common Probability Distributions with Mean & Variance derivations

Atul Sharma
Published in Analytics Vidhya
6 min read, Nov 9, 2020

I have been searching for the derivations (as many of you probably have too) of the well-crammed Mean & Variance formulae of various probability distributions. These derivations already exist, but they are scattered, so I decided to go through each distribution in detail and bring together my interpretation along with the derivations each one needs. If you have been looking for an end-to-end discussion of random variable probability distributions with Mean and Variance derivations for each, your quest ends here.

In this blog, you will find structured details about the types of random variables and their probability distributions, with relevant examples. Every probability distribution discussed follows this pattern:

1. Features about the distribution

2. Mathematical Equation

3. Visual Representation (Plot) with example

4. Mean & Variance derivations leading to the well-crammed formulae

Let’s begin!!!

We will divide the probability distributions into two major groups based on the type of random variable:

1. Discrete (Random Variable)

2. Continuous (Random Variable)

(Image by author)
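  • CMF is the Cumulative Mass Function (discrete case) and CDF is the Cumulative Distribution Function (continuous case)
  • 0 ≤ p(x) ≤ 1 for all x for a discrete probability mass function; a continuous density f(x) only has to be non-negative and can exceed 1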

Discrete random variable:

We can plot the probability either on a count (sum) basis or on a proportion basis. For example:

Count (Sum) basis:

After tossing a coin 10 times, we can plot the probability of each count of heads (0 heads, 1 head, 2 heads, …, 10 heads).

(Image by author)

Proportion basis:

After tossing a coin 10 times, we can plot the probability of each proportion of heads (0/10 = 0, 1/10 = 0.1, 2/10 = 0.2, …, 10/10 = 1).

(Image by author)
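Both plots use the same probabilities; only the x-axis changes. A minimal Python sketch, assuming a fair coin (p = 0.5, which the text does not state explicitly), computes them with the binomial formula covered later in this blog:

```python
from math import comb

# Assumption: a fair coin, so P(heads) = 0.5 on each of the 10 tosses.
n, p = 10, 0.5

# Count basis: probability of k heads, k = 0..10 (binomial PMF).
count_probs = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Proportion basis: identical probabilities, x-axis rescaled from k to k/n.
proportion_probs = {k / n: prob for k, prob in count_probs.items()}

for k, prob in count_probs.items():
    print(f"{k:2d} heads (proportion {k / n:.1f}): {prob:.4f}")
```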

Continuous random variable:

We can draw the probability distribution by joining the tips of a very large number of frequency bins with a smooth curve (as the number of bins approaches ∞, each bin shrinks to a single real value of the continuous random variable). The value of the curve's function f(x) gives only the height at a particular point; to calculate a probability, we need to compute the area under the curve.

E.g. on a real-number frequency basis:

Plotting the heights of a population of males. The assumed range for this example is [150 cm, 200 cm], containing all real values, e.g. 155.58 cm, 176.2 cm, etc.

(Image by author)

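To calculate the probability of an outcome, we use integration. For a continuous random variable the probability of any single exact value is 0, so we compute the probability of a range of values.

Probability of a value lying between x1 and x2:

$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$$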
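The integral can also be approximated numerically. This minimal Python sketch applies the trapezoidal rule to a normal density; the parameters μ = 175 cm and σ = 7 cm are hypothetical (the article only fixes the 150 cm to 200 cm range), and the closed-form normal CDF serves as a cross-check.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve f(x) at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def area_under_curve(f, x1, x2, steps=10_000):
    """Trapezoidal approximation of the integral of f from x1 to x2."""
    h = (x2 - x1) / steps
    total = 0.5 * (f(x1) + f(x2))
    for i in range(1, steps):
        total += f(x1 + i * h)
    return total * h

# Hypothetical parameters for male heights (not given in the article).
mu, sigma = 175.0, 7.0

def height_pdf(x):
    return normal_pdf(x, mu, sigma)

p_numeric = area_under_curve(height_pdf, 170.0, 180.0)

# Cross-check with the closed-form normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2.
def normal_cdf(x):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

p_exact = normal_cdf(180.0) - normal_cdf(170.0)
print(f"P(170 <= X <= 180): numeric {p_numeric:.6f}, exact {p_exact:.6f}")

# A single point has zero width, hence zero area (probability).
print(area_under_curve(height_pdf, 175.0, 175.0))
```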

I hope you now have clarity about how probabilities are represented for both discrete and continuous random variables. Let’s move on to the types of probability distributions:

1. Discrete Probability Distributions

Bernoulli distribution, Binomial distribution, Geometric distribution, Negative Binomial distribution, Hypergeometric distribution, Poisson distribution

2. Continuous Probability Distributions

Uniform distribution, Normal (Gaussian) distribution, Exponential distribution

Discrete Probability Distributions

Bernoulli distribution:

Features –

1. A single event/trial

2. Two possible outcomes — Success or Failure (Mutually Exclusive and Exhaustive)

3. P(Success) = p

4. P(Failure) = 1 − p

If all the above features hold for a random variable X, then X has a Bernoulli distribution.

(Image by author)
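Since X takes only the values 0 and 1, the derivation is short: E[X] = 0·(1 − p) + 1·p = p and E[X²] = 0²·(1 − p) + 1²·p = p, so Var(X) = E[X²] − (E[X])² = p − p² = p(1 − p). The minimal Python sketch below checks this by summing directly over the PMF p(x) = p^x (1 − p)^(1 − x); the helper `mean_and_variance` and the value p = 0.3 are illustrative choices, not from the article.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)

def mean_and_variance(pmf, support):
    """E[X] = sum of x p(x); Var(X) = E[X^2] - (E[X])^2."""
    mean = sum(x * pmf(x) for x in support)
    second_moment = sum(x**2 * pmf(x) for x in support)
    return mean, second_moment - mean**2

p = 0.3  # example success probability (arbitrary)
mean, var = mean_and_variance(lambda x: bernoulli_pmf(x, p), [0, 1])
print(mean, var)         # values obtained by summing over the PMF
print(p, p * (1 - p))    # formulae: E[X] = p, Var(X) = p(1 - p)
```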

Binomial distribution:

Features –

1. n independent events/trials

2. Two possible outcomes — Success or Failure (Mutually Exclusive and Exhaustive)

3. P(Success) = p

4. P(Failure) = 1 − p

5. X represents the number of successes in n trials/events

If all the above features hold for a random variable X, then X has a Binomial distribution.

(Image by author)
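A quick way to reach the formulae: X is the sum of n independent Bernoulli(p) variables, so E[X] = np, and because variances of independent variables add, Var(X) = np(1 − p). The sketch below (example parameters n = 10, p = 0.3 are arbitrary) checks both by summing x·p(x) and x²·p(x) over the PMF P(X = x) = C(n, x) p^x (1 − p)^(n − x).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x) for x = 0..n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3  # example parameters (arbitrary)
support = range(n + 1)

mean = sum(x * binomial_pmf(x, n, p) for x in support)
second_moment = sum(x**2 * binomial_pmf(x, n, p) for x in support)
var = second_moment - mean**2

print(round(mean, 10), n * p)            # E[X] = np
print(round(var, 10), n * p * (1 - p))   # Var(X) = np(1 - p)
```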

Geometric distribution:

Features –

1. A sequence of independent events/trials (the number of trials is not fixed in advance)

2. Two possible outcomes — Success or Failure (Mutually Exclusive and Exhaustive)

3. P(Success) = p

4. P(Failure) = 1 − p

5. X represents the number of trials needed to get the first success

So for the first success to occur at the xth trial:

- The first (x − 1) trials must be failures

- The xth trial must be a success

$P(X = x) = (1-p)^{x-1}\,p$ for $x = 1, 2, 3, \ldots$

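Summing over all x gives a geometric series: $\sum_{x=1}^{\infty} p(x) = p\sum_{x=1}^{\infty}(1-p)^{x-1} = \frac{p}{1-(1-p)} = 1$.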

(Image by author)
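Writing q = 1 − p and using the series Σ x q^(x−1) = 1/(1 − q)² gives E[X] = p/p² = 1/p; similarly Σ x² q^(x−1) = (1 + q)/(1 − q)³ gives E[X²] = (2 − p)/p², so Var(X) = (2 − p)/p² − 1/p² = (1 − p)/p². The Python sketch below checks these numerically with P(X = x) = (1 − p)^(x − 1) p; because the support is infinite, it truncates the sums at x = 2000, an assumption that is harmless here because the remaining tail is vanishingly small for the example value p = 0.25.

```python
def geometric_pmf(x, p):
    """P(X = x) = (1 - p)^(x - 1) p for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

p = 0.25  # example success probability (arbitrary)

# The support is infinite; truncate where the remaining tail is negligible.
support = range(1, 2001)

total = sum(geometric_pmf(x, p) for x in support)
mean = sum(x * geometric_pmf(x, p) for x in support)
var = sum(x**2 * geometric_pmf(x, p) for x in support) - mean**2

print(round(total, 10))                  # should be ~1
print(round(mean, 10), 1 / p)            # E[X] = 1/p
print(round(var, 10), (1 - p) / p**2)    # Var(X) = (1 - p)/p^2
```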

Negative Binomial distribution:

Features –

1. A sequence of independent events/trials (the number of trials is not fixed in advance)

2. Two possible outcomes — Success or Failure (Mutually Exclusive and Exhaustive)

3. P(Success) = p

4. P(Failure) = 1 − p

5. X represents the number of trials needed to get the rth success

So for the rth success to occur at the xth trial:

  • The first (x − 1) trials must contain exactly (r − 1) successes
  • The xth trial must be a success
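Counting the ways to place the (r − 1) earlier successes among the first (x − 1) trials gives the PMF P(X = x) = C(x − 1, r − 1) p^r (1 − p)^(x − r) for x = r, r + 1, …. Since X is the sum of r independent geometric waiting times, E[X] = r/p and Var(X) = r(1 − p)/p². A minimal Python check with arbitrary example values r = 3, p = 0.4, truncating the infinite support at x = 1000:

```python
from math import comb

def neg_binomial_pmf(x, r, p):
    """P(X = x) = C(x - 1, r - 1) p^r (1 - p)^(x - r) for x = r, r + 1, ..."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

r, p = 3, 0.4  # example parameters (arbitrary)

# Truncated infinite support; the probability mass beyond x = 1000 is negligible.
support = range(r, 1001)

total = sum(neg_binomial_pmf(x, r, p) for x in support)
mean = sum(x * neg_binomial_pmf(x, r, p) for x in support)
var = sum(x**2 * neg_binomial_pmf(x, r, p) for x in support) - mean**2

print(round(total, 10))                      # should be ~1
print(round(mean, 8), r / p)                 # E[X] = r/p
print(round(var, 8), r * (1 - p) / p**2)     # Var(X) = r(1 - p)/p^2
```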

Hypergeometric distribution:

Features –

1. Events are dependent (random sampling without replacement)

2. Randomly sampling n objects without replacement from a population that contains ‘a’ successes and ‘N-a’ failures

3. X represents the number of successes in the sample

4. The binomial distribution (with p = a/N) provides a reasonable approximation to the hypergeometric when the sample is no more than 5% of the population (n ≤ 0.05N)

(Image by author)
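The PMF is P(X = x) = C(a, x) C(N − a, n − x) / C(N, n), with E[X] = n·a/N and Var(X) = n (a/N)(1 − a/N)(N − n)/(N − 1); the last factor is the finite-population correction, which approaches 1 when the sample is a small fraction of the population, one way to see why the binomial works as an approximation there. The Python sketch below checks the moments and compares the PMF with Binomial(n, a/N) for a hypothetical population N = 1000, a = 300 and a sample of n = 20 (2% of the population).

```python
from math import comb

def hypergeom_pmf(x, N, a, n):
    """P(X = x) = C(a, x) C(N - a, n - x) / C(N, n)."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    """Binomial PMF, used here as the approximation."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical population: N = 1000 objects, a = 300 successes; sample n = 20.
N, a, n = 1000, 300, 20
support = range(0, min(n, a) + 1)

mean = sum(x * hypergeom_pmf(x, N, a, n) for x in support)
var = sum(x**2 * hypergeom_pmf(x, N, a, n) for x in support) - mean**2

p = a / N
print(round(mean, 10), n * p)                               # n a / N
print(round(var, 10), n * p * (1 - p) * (N - n) / (N - 1))  # finite-population correction

# Binomial approximation with p = a/N, since n is at most 5% of N.
max_diff = max(abs(hypergeom_pmf(x, N, a, n) - binomial_pmf(x, n, p)) for x in support)
print(f"largest PMF difference vs binomial: {max_diff:.4f}")
```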

Poisson distribution:

Features –

1. Counting the number of occurrences of an event in a given unit of time, distance, area, or volume

2. Events occur independently, and the probability of occurrence in a given length of time does not change over time

3. X represents the number of events in a fixed unit of time

4. The binomial distribution tends toward the Poisson distribution as n → ∞ and p → 0 while np = λ stays constant

(Image by author)
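For the PMF P(X = x) = e^(−λ) λ^x / x!, pulling out one factor of λ gives E[X] = λ Σ e^(−λ) λ^(x−1)/(x − 1)! = λ, and the same trick applied twice gives E[X(X − 1)] = λ², so Var(X) = λ² + λ − λ² = λ. The sketch below checks this numerically (example rate λ = 4, support truncated at x = 99) and also illustrates the binomial limit by fixing np = λ while n grows.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 4.0  # example rate (arbitrary)
support = range(100)  # truncated; probabilities beyond x = 99 are negligible for lam = 4

mean = sum(x * poisson_pmf(x, lam) for x in support)
var = sum(x**2 * poisson_pmf(x, lam) for x in support) - mean**2
print(round(mean, 10), round(var, 10))  # both should equal lam

# Binomial -> Poisson: grow n while keeping np = lam (so p = lam / n shrinks).
gaps = {}
for n in (10, 100, 10_000):
    p = lam / n
    gaps[n] = max(
        abs(comb(n, x) * p**x * (1 - p) ** (n - x) - poisson_pmf(x, lam))
        for x in range(11)
    )
    print(f"n = {n:>6}: largest PMF gap on x = 0..10 is {gaps[n]:.6f}")
```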

Continuous Probability Distributions

Uniform distribution:

Features –

1. Continuous real-number output on an interval [a, b], with every value equally likely (constant density 1/(b − a))

2. Probability determination using integration (area under the curve)

(Image by author)
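With density 1/(b − a) on [a, b]: E[X] = ∫ x/(b − a) dx = (b² − a²)/(2(b − a)) = (a + b)/2 and E[X²] = (b³ − a³)/(3(b − a)) = (a² + ab + b²)/3, so Var(X) = (a² + ab + b²)/3 − (a + b)²/4 = (b − a)²/12. A minimal Python check, using a hypothetical midpoint-rule helper `integrate` and reusing the 150 cm to 200 cm range from the height example; the variance is computed as E[(X − E[X])²], which equals E[X²] − (E[X])².

```python
def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a, b = 150.0, 200.0  # example interval, reusing the height range from earlier

def uniform_pdf(x):
    """Constant density 1/(b - a) on [a, b]."""
    return 1.0 / (b - a)

mean = integrate(lambda x: x * uniform_pdf(x), a, b)
var = integrate(lambda x: (x - mean) ** 2 * uniform_pdf(x), a, b)
print(round(mean, 6), (a + b) / 2)       # E[X] = (a + b)/2
print(round(var, 6), (b - a) ** 2 / 12)  # Var(X) = (b - a)^2/12
```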

Normal (Gaussian) distribution:

Features –

1. Continuous real-number output with an unequal, bell-shaped probability density, arising under the influence of chance causes

2. Probability determination using integration (area under the curve)

(Image by author)
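With density f(x) = exp(−(x − μ)²/(2σ²)) / (σ√(2π)), the mean follows from the symmetry of the density about μ, and substituting z = (x − μ)/σ reduces E[(X − μ)²] to σ² ∫ z² φ(z) dz = σ², where φ is the standard normal density. The Python sketch below confirms E[X] = μ and Var(X) = σ² by numerical integration; μ = 175 cm and σ = 7 cm are hypothetical height parameters, and the integral is truncated at μ ± 10σ.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

mu, sigma = 175.0, 7.0  # hypothetical height parameters, not from the article
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # tails beyond 10 sigma are negligible

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
var = integrate(lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma), lo, hi)
print(round(total, 6), round(mean, 6), round(var, 6))  # approximately 1, mu, sigma^2
```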

Exponential distribution:

Features –

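1. Closely related to the Poisson distribution: if the number of events per unit time is Poisson with rate λ, the waiting time between events is exponential with the same rate λ

2. Measures the waiting time for a single event (the time between events in a Poisson process)

3. Events occur independently and at a constant rate

4. X represents the time until the next event occurs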
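With density f(x) = λe^(−λx) for x ≥ 0, integration by parts gives E[X] = 1/λ and E[X²] = 2/λ², so Var(X) = 2/λ² − 1/λ² = 1/λ². The sketch below checks both numerically (example rate λ = 0.5, upper limit truncated at 80/λ) and also shows the Poisson link: the probability of waiting longer than t equals e^(−λt), the Poisson probability of zero events in an interval of length t.

```python
from math import exp

lam = 0.5  # example rate: 0.5 events per unit time (arbitrary)

def expon_pdf(x):
    """Exponential density lam * e^(-lam x) for x >= 0."""
    return lam * exp(-lam * x)

def integrate(f, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

hi = 80 / lam  # truncate the infinite upper limit; the tail mass e^(-80) is negligible
mean = integrate(lambda x: x * expon_pdf(x), 0.0, hi)
var = integrate(lambda x: (x - mean) ** 2 * expon_pdf(x), 0.0, hi)
print(round(mean, 6), 1 / lam)      # E[X] = 1/lambda
print(round(var, 6), 1 / lam**2)    # Var(X) = 1/lambda^2

# Link to Poisson: P(X > t) = e^(-lam t), the Poisson probability of
# zero events in an interval of length t (Poisson mean lam * t).
t = 3.0
p_wait_longer = 1 - integrate(expon_pdf, 0.0, t)
p_zero_events = exp(-lam * t)
print(round(p_wait_longer, 6), round(p_zero_events, 6))
```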

I hope this blog was useful for you and helped you explore the genesis of the notorious formulae of probability distributions. That brings us to the end of this blog; I will be posting more in the future.

Thanks!!!
