Probability Theory: Continuous Probability Distribution

Ashish Arora
Aug 3, 2023 · 9 min read


In our last post, we built a solid foundation in probability theory, random variables, and probability distribution functions.

In this blog, we will discuss commonly used Continuous Probability Distributions, and in the upcoming blog, we will cover Discrete Probability Distributions. Stay tuned!

Continuous Probability Distribution

Gaussian Normal distribution

The normal distribution is a probability distribution whose density, when plotted on a graph, forms a bell-shaped curve with most values clustering around a central region.

It is named after the German mathematician Carl Friedrich Gauss, who extensively studied its properties.

It is a parametric distribution that is defined by two parameters: the mean (μ) and the standard deviation (σ).

Normal Probability Density Function Formula:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
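To make the formula concrete, here is a minimal Python sketch (assuming NumPy and SciPy are installed) that implements this density by hand and checks it against scipy.stats.norm.pdf:

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Normal density: (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2))."""
    coeff = 1.0 / (sigma * np.sqrt(2 * np.pi))
    return coeff * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

x = np.linspace(-4, 4, 9)      # a few sample points
mu, sigma = 0, 1               # illustrative parameter values

# The hand-rolled density should agree with SciPy's implementation.
print(np.allclose(normal_pdf(x, mu, sigma), norm.pdf(x, loc=mu, scale=sigma)))  # True
```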

Properties of normal distributions:

  • The mean, median, and mode are all equal, though not necessarily 0; they can be any value.
  • The distribution is symmetric about the mean — half the values fall below the mean and half above the mean.
  • The empirical rule, also known as the 68–95–99.7 rule, applies to any normal distribution.
  • Most values cluster close to the mean; values far from the mean become increasingly rare.

The mean and the standard deviation can be understood as the location parameter and the scale parameter, respectively.

Normal distributions with different Means

The mean determines where the peak of the curve is centered. Increasing the mean moves the curve right while decreasing it moves the curve left.

Normal distributions with different Standard Deviation

The standard deviation stretches or squeezes the curve. A small standard deviation results in a narrow curve, while a large standard deviation leads to a wide curve.
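As an illustration, a small plotting sketch (assuming NumPy, SciPy, and Matplotlib are installed; the parameter values are arbitrary) makes the location and scale behaviour visible:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-8, 8, 400)

# Same spread, different centres: the curve slides left or right.
for mu in (-2, 0, 2):
    plt.plot(x, norm.pdf(x, loc=mu, scale=1), label=f"mu={mu}, sigma=1")

# Same centre, different spreads: the curve gets wider and flatter.
for sigma in (0.5, 2):
    plt.plot(x, norm.pdf(x, loc=0, scale=sigma), linestyle="--", label=f"mu=0, sigma={sigma}")

plt.legend()
plt.title("Effect of the mean (location) and standard deviation (scale)")
plt.show()
```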

Empirical rule

The empirical rule, also known as the 68–95–99.7 rule, applies to any normal distribution, regardless of its specific mean and SD values.

The empirical rule states that for any normal distribution:

  • Approximately 68% of the data falls within one SD of the mean.
  • Approximately 95% of the data falls within two SD of the mean.
  • Approximately 99.7% of the data falls within three SD of the mean.

The empirical rule is a quick way to get an overview of your data and check for any outliers.
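These percentages can be verified numerically with the normal CDF; here is a short sketch (assuming SciPy is installed; the mean and SD are arbitrary, since the rule holds for any normal distribution):

```python
from scipy.stats import norm

mu, sigma = 100, 15  # arbitrary choice; the percentages do not depend on it

for k in (1, 2, 3):
    # P(mu - k*sigma <= X <= mu + k*sigma)
    p = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(f"within {k} SD: {p:.4f}")

# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```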

Why is the normal distribution used so widely in statistics?

  • When a distribution is symmetric, the probability of an observation being on one side of the mean is the same as the probability of it being on the other side.
  • In a symmetric distribution, the mean, median, and mode are all equal. This property simplifies measures of central tendency, as there is a single value that represents the center of the distribution.
  • Estimating the mean and standard deviation is straightforward because of the symmetry around the mean.

Small Request:

Kindly follow Ashish Arora and give a clap to this if you find this content better after reading it! I dedicate an extensive effort to curating informative and well-researched materials just for you. Your support motivates me and helps me reach more people. 🙏

Standard Normal Distribution (Z-distribution)

The standard normal distribution, also called the z-distribution, is a special normal distribution where the mean is 0 and the standard deviation is 1.

Every normal distribution can be converted to the standard normal distribution by turning the individual values into z-scores; this process is known as standardization:

z = (x − μ) / σ

The z-score tells us the number of standard deviations a data point is above or below the mean. A positive z-score indicates that the data point is above the mean, while a negative z-score indicates that the data point is below the mean.

Each z-score maps to a probability associated with a given value in a normal distribution. This helps in determining the likelihood of observing a value, or range of values, within the distribution.

Given the z-score, we can then find the probability by looking it up in a Z-table.

A Z-table provides cumulative probabilities for the standard normal random variable being less than or equal to a given value, Z.

Let’s understand this with an example.

Find the probability of SAT scores in your sample exceeding 1380.

The mean of our distribution is 1150, and the standard deviation is 150. The z-score tells you how many standard deviations away 1380 is from the mean:

z = (1380 − 1150) / 150 ≈ 1.53

The cumulative probability at a z-score of 1.53, i.e. the probability of an SAT score being 1380 or less, is 93.7%. But we need the probability of an SAT score of more than 1380, so we subtract this value from 1.

Probability (X > 1380) = 1 − 0.937 = 0.063

That means only about 6.3% of SAT scores in your sample exceed 1380.
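The same calculation can be reproduced in a few lines of Python, using scipy.stats.norm as a programmatic Z-table (a sketch assuming SciPy is installed):

```python
from scipy.stats import norm

mu, sigma, score = 1150, 150, 1380

z = (score - mu) / sigma      # (1380 - 1150) / 150 ≈ 1.53
p_at_most = norm.cdf(z)       # P(X <= 1380) ≈ 0.937
p_more = 1 - p_at_most        # P(X > 1380)  ≈ 0.063

print(round(z, 2), round(p_at_most, 3), round(p_more, 3))  # 1.53 0.937 0.063
```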

Student-t Distribution

Before understanding the t-distribution, it is important to know how and why it came into existence and what problem it solves.

In a standard normal distribution, we were dealing with population data itself or with a sample that was large enough to capture the population statistics.

However, the problem arises when we don’t have enough resources to work with a large sample, yet we want to find the population statistics with a limited sample or when the population standard deviation is unknown.

To solve this, the English statistician William Sealy Gosset derived the t-distribution in 1908 by considering the distribution of the standardized mean across multiple small samples.

He found that:

  1. The distribution followed a bell-shaped curve, similar to the normal distribution, but with thicker tails.
  2. The shape of the distribution depends on the sample size (degrees of freedom): the smaller the sample size, the flatter the curve.

In the context of the t-distribution, the degrees of freedom are related to the sample size and represent the number of independent pieces of information available in the sample.

The degrees of freedom in the t-distribution = n − 1.

Estimating the population standard deviation

  1. Similar to the z-score, Gosset formulated a t-score in which the population standard deviation is replaced by the sample standard deviation: t = (x̄ − μ) / (s / √n).
  2. He also took the sample size into account by dividing the sample standard deviation by the square root of the sample size, because he realized that the sample standard deviation varies from one sample to another due to the inherent variability in the data.

Why divide the sample standard deviation by √n and not n?

  1. The variability of the sample mean shrinks as the sample size grows, but it shrinks in proportion to √n, not n: the variance of the sample mean is σ²/n, so its standard deviation (the standard error) is σ/√n.
  2. If we divided the sample standard deviation by n directly, the estimated standard error would shrink far too quickly as n grows, understating the true variability of the sample mean and leading to inaccurate results.

Hence dividing the sample standard deviation by the square root of n in the t-distribution helps to adjust for the uncertainty associated with smaller sample sizes, providing more accurate and reliable estimates of population parameters.
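A quick simulation sketch (assuming NumPy; the population parameters and sample size are made up for illustration) shows why √n is the right divisor: the spread of the sample means shrinks like σ/√n, not σ/n:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 10.0, 25, 100_000   # illustrative values

# Draw many samples of size n and record each sample's mean.
sample_means = rng.normal(loc=50, scale=sigma, size=(trials, n)).mean(axis=1)

print(round(sample_means.std(), 2))    # ≈ 2.0, matching sigma / sqrt(n)
print(sigma / np.sqrt(n))              # 2.0
print(sigma / n)                       # 0.4, far too small a spread
```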

Let’s now understand the T-distribution.

T-distribution Definition:

The T-distribution is a continuous probability distribution that allows for more variability in the data and provides more accurate results when working with smaller sample sizes.

Like the standard normal distribution, it is also symmetric around a mean of 0; however, its standard deviation is not 1.

The standard deviation gets close to 1 as the sample size increases. Until then, the curve has a flatter shape than the z-distribution, along with thicker tails.

Similar to the z-distribution, the t-distribution comes with probability tables for multiple degrees of freedom. So, by calculating the t-score and using the degrees of freedom (sample size − 1), we can find the probability.
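A short sketch with scipy.stats.t (the degrees of freedom and t-scores below are arbitrary examples) shows both the heavier tails at small degrees of freedom and how a t-score is turned into a probability without a printed table:

```python
from scipy.stats import norm, t

# Tail probability beyond 2: heavier for small df, close to the normal for large df.
for df in (3, 10, 30, 100):
    print(f"df={df:3d}  P(T > 2) = {t.sf(2, df):.4f}")
print(f"normal  P(Z > 2) = {norm.sf(2):.4f}")

# Example: a sample of n = 10 (df = 9) with a t-score of 1.8.
print(t.sf(1.8, df=9))   # P(T > 1.8) ≈ 0.053
```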

Uniform Distribution

A uniform distribution is a probability distribution that describes a situation where all possible outcomes have an equal chance of occurring in a particular range.

The uniform distribution is represented as a rectangular shape. The constant height signifies that each value within the range has an equal probability of being observed.

The uniform distribution has two main parameters: the minimum value (a) and the maximum value (b). These parameters define the range over which the uniform distribution is defined.

Here, any value between a and b (inclusive) has an equal probability of occurrence, while values outside this range have a probability of 0.

Let’s illustrate this with an example because, as a student, I know we have a stereotype that the uniform distribution is used only with discrete random variables, since they are countable.

Example: Consider the time it takes for a traffic signal to turn green after it has been red. Assume that the time the signal takes to turn green is uniformly distributed between 20 seconds and 40 seconds.

  • The minimum value (a) = 20 seconds.
  • The maximum value (b) = 40 seconds.

The continuous uniform probability density function:

f(x) = 1 / (b − a) = 1 / (40 − 20) = 1/20, for 20 ≤ x ≤ 40

f(x) = 0, otherwise

This means the density is constant at 1/20 across the whole interval: no time between 20 seconds and 40 seconds is more likely than any other.

Because the distribution is continuous, the probability of the signal turning green at exactly 25 seconds (or any other single instant) is 0; probabilities are assigned to intervals instead.

For example, the probability of the signal turning green between 25 and 30 seconds is (30 − 25) / 20 = 0.25, and the same holds for any interval of that width within 20 to 40 seconds.
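This is easy to confirm with scipy.stats.uniform, which parameterizes the distribution as loc = a and scale = b − a (a sketch assuming SciPy is installed):

```python
from scipy.stats import uniform

# Signal turns green between 20 s and 40 s: a = 20, width b - a = 20.
signal = uniform(loc=20, scale=20)

print(signal.pdf(25))                    # density at 25 s: 1/20 = 0.05
print(signal.cdf(30) - signal.cdf(25))   # P(25 <= X <= 30) = 5/20 = 0.25
print(signal.cdf(40) - signal.cdf(20))   # P(20 <= X <= 40) = 1.0
```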

Log-Normal Distribution

A random variable follows a log-normal distribution when its logarithm is normally distributed; the probability density function (PDF) describes how likely such a variable is to take values near a given point.

The log-normal distribution is commonly used to model data that is skewed to the right and often occurs in various fields such as finance, economics, and biology.

The PDF of the log-normal distribution is given by:

f(x) = (1 / (xσ√(2π))) · exp(−(ln x − μ)² / (2σ²)), for x > 0

where μ and σ are the mean and standard deviation of ln(x).

Unlike the standard normal distribution (Z-distribution), which has a standard Z-table for finding cumulative probabilities, the log-normal distribution does not have a dedicated lookup table.

However, because ln(X) itself follows a normal distribution, cumulative probabilities can be found by standardizing ln(x) and using the Z-table, or, more commonly, by using statistical software.
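A short sketch with SciPy illustrates this (note SciPy's parameterization: s = σ and scale = e^μ, where μ and σ describe ln X; the numbers below are arbitrary): the log-normal CDF is just the normal CDF applied to the standardized value of ln x.

```python
import numpy as np
from scipy.stats import lognorm, norm

mu, sigma = 0.5, 0.75          # illustrative parameters of ln(X)
x = 2.0

# Direct log-normal CDF ...
p1 = lognorm(s=sigma, scale=np.exp(mu)).cdf(x)
# ... equals the standard normal CDF of the standardized log value.
p2 = norm.cdf((np.log(x) - mu) / sigma)

print(p1, p2, np.isclose(p1, p2))  # both ≈ 0.602, True
```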

— — — — — — — — — — — — —

So, that is all for this post. In the next post, we will discuss Discrete Probability Distributions, and the Chi-Square Distribution will be covered along with the Chi-Square Test of Independence.

Happy learning!

Feel free to find me on LinkedIn and GitHub.
