Maths for ML — Probability Distributions

Raghunath D

9 min read · Mar 24, 2019

In this post, we will learn about random variables and probability distributions.

Check this blog: http://blogs.ubc.ca/math105/

Random Variables

A random variable is a function that assigns a numerical value to each outcome of a random experiment.

Probability Distribution

A probability distribution is a mathematical description of the probabilities with which a random variable takes its possible values.

Probability Mass Function (PMF):

In probability and statistics, a probability mass function (PMF) is a function that gives the probability that a discrete random variable is exactly equal to some value.
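To make the definition concrete, here is a minimal sketch that builds a PMF by counting outcomes. The experiment (tossing two fair coins) and the names `outcomes`, `number_of_heads`, and `pmf` are assumptions chosen for illustration; they do not come from the original post.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical experiment: toss two fair coins; all four outcomes are equally likely.
outcomes = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """The random variable X: maps an outcome such as ('H', 'T') to a number."""
    return outcome.count("H")

# Count how many outcomes map to each value of X, then divide by the total.
counts = Counter(number_of_heads(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

The probabilities in a PMF are non-negative and add up to 1, which is easy to check with `sum(pmf.values())`.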

Mean of a Discrete random variable:

The mean of any discrete random variable is an average of the possible outcomes, with each outcome weighted by its probability.
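As a small sketch of this weighted average, the snippet below computes the mean of a fair six-sided die. The die and the helper name `mean` are assumptions made for illustration only.

```python
from fractions import Fraction

# Hypothetical example: PMF of a fair six-sided die, stored as {value: probability}.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def mean(pmf):
    """Probability-weighted average of the possible values: sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

# A valid PMF must sum to 1.
assert sum(pmf.values()) == 1

mu = mean(pmf)
print(mu)         # 7/2
print(float(mu))  # 3.5
```

The result, 3.5, is not a face of the die, so the mean of a discrete random variable is not always one of its possible values.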

Variance of a Discrete random variable:

Example 1:

Note: The mean does not need to be a possible value of X or an integer.

Variance and Standard Deviation for Discrete Probability Distribution
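The variance of a discrete random variable is the probability-weighted average of the squared distances from the mean, and the standard deviation is its square root. The sketch below uses a fair six-sided die as an assumed example; the function names are illustrative.

```python
import math

# Hypothetical example: PMF of a fair six-sided die.
pmf = {face: 1 / 6 for face in range(1, 7)}

def mean(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - mean)^2 * P(X = x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

var = variance(pmf)
std = math.sqrt(var)
print(round(var, 4), round(std, 4))  # 2.9167 1.7078
```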

Probability Density Function (PDF):

A continuous random variable takes on an uncountably infinite number of possible values. For a discrete random variable X that takes on a finite or countably infinite number of possible values, we determined P(X = x) for all of the possible values of X, and called it the probability mass function (“p.m.f.”).

→ For continuous random variables, the probability that X takes on any particular value x is 0. That is, finding P(X = x) for a continuous random variable X is not going to work. Instead, we’ll need to find the probability that X falls in some interval (a, b), that is, we’ll need to find P(a < X < b). We’ll do that using a probability density function (“PDF”).
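A rough numerical sketch of this idea follows. The density f(x) = 2x on [0, 1] is a made-up example, and `prob_between` is a hypothetical helper that approximates the area under the curve with the midpoint rule.

```python
def pdf(x):
    """Hypothetical density used only for illustration: f(x) = 2x on [0, 1], else 0."""
    return 2 * x if 0 <= x <= 1 else 0.0

def prob_between(a, b, steps=10_000):
    """Approximate P(a < X < b) as the area under pdf on (a, b) via the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

print(round(prob_between(0.0, 1.0), 6))  # total area: 1.0
print(round(prob_between(0.2, 0.5), 6))  # 0.5**2 - 0.2**2 = 0.21

# Shrinking the interval around x = 0.5 drives the probability toward 0.
for half_width in (0.1, 0.01, 0.001):
    print(half_width, prob_between(0.5 - half_width, 0.5 + half_width))
```

The last loop shows why P(X = x) is 0 for a continuous variable: as the interval around a single point shrinks, so does the area above it.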

Bernoulli Trials

Binomial Distribution
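As a minimal sketch, the binomial distribution gives the probability of k successes in n independent Bernoulli trials that each succeed with probability p. The coin-toss numbers and the name `binomial_pmf` below are assumptions for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for the number of successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 tosses of a fair coin.
n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(3, n, p))  # 120 / 1024 = 0.1171875
print(sum(probs))             # the probabilities over k = 0..n add up to 1
```

Setting n = 1 reduces this to a single Bernoulli trial, where P(X = 1) = p and P(X = 0) = 1 − p.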

Example:

Example 2:

Example 3:

Example 4:

Poisson Distribution

You can approximate a Binomial distribution with a Poisson distribution when the number of trials n is large and the success probability p is small, using λ = np.

You can use the Poisson distribution to model the number of events that happen in a fixed interval of time, assuming the events occur independently at a constant average rate.
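A short sketch of the Poisson probability mass function, P(X = k) = λ^k e^(−λ) / k!, is shown below. The help-desk scenario and the rate of 4 calls per hour are invented for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate of lam per interval."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical example: a help desk receives 4 calls per hour on average.
lam = 4
p_no_calls = poisson_pmf(0, lam)
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))

print(round(p_no_calls, 4))     # 0.0183
print(round(p_at_most_two, 4))  # 0.2381
```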

Example 1:

Example 2:

Example 3:

Now we can use the Poisson distribution to approximate the Binomial distribution.
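The comparison below is a sketch of how close the two distributions get when n is large and p is small, with λ = np. The values n = 1000 and p = 0.003 are assumptions picked for illustration, not numbers from the examples above.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) events when the average number of events is lam."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical example: 1000 trials, each succeeding with probability 0.003.
n, p = 1000, 0.003
lam = n * p  # the Poisson rate that matches the binomial mean

for k in range(6):
    b = binomial_pmf(k, n, p)
    q = poisson_pmf(k, lam)
    print(f"k={k}  binomial={b:.5f}  poisson={q:.5f}  diff={abs(b - q):.5f}")

# Largest pointwise gap over k = 0..20, where almost all the probability lies.
max_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print(max_diff)
```

The approximation is attractive because the Poisson formula needs only λ, while the exact binomial formula needs both n and p and involves large binomial coefficients.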

For a continuous random variable X, the analogue of a histogram is a continuous curve (the probability density function) and it is our primary tool in finding probabilities related to the variable.

As with the histogram for a random variable with a finite number of values, the total area under the curve equals 1.

Normal Probability Distribution

The normal distribution is defined for continuous random variables.

The normal distribution refers to a family of continuous probability distributions described by the normal equation. Probabilities correspond to areas under the curve and are calculated over intervals rather than for specific values of the random variable.
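The normal equation, f(x) = exp(−(x − µ)² / (2σ²)) / (σ√(2π)), can be sketched in a few lines. The parameter values (mean 100, standard deviation 15) and the helper names are assumptions for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x), written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Hypothetical example: mean 100, standard deviation 15.
mu, sigma = 100, 15

# Probability of an interval = area under the curve = difference of CDF values.
p_interval = normal_cdf(115, mu, sigma) - normal_cdf(85, mu, sigma)
print(round(p_interval, 4))  # 0.6827
```

Note that `normal_pdf` returns a density, not a probability; only areas over intervals, such as `p_interval`, are probabilities.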

The normal distribution is the most important probability distribution in statistics because it fits many natural phenomena.

For example, heights, blood pressure, measurement error, and IQ scores approximately follow the normal distribution. It is also known as the Gaussian distribution and the bell curve.

The normal distribution is a probability function that describes how the values of a variable are distributed. It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely.

Recap of Probability Distributions — Discrete Probability Distribution

The graph of the normal distribution depends on two factors: the mean and the standard deviation.

The mean of the distribution determines the location of the center of the graph, and the standard deviation determines the height and width of the graph.

All normal distributions look like a symmetric, bell-shaped curve, as shown below.

Smaller standard deviation

Bigger standard deviation

→ Integration (Integral calculus) is used to calculate the area under the curve.
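The normal density has no elementary antiderivative, so in practice the integral is evaluated numerically. The sketch below uses a simple midpoint rule; the helper `area` and the step count are assumptions, and real code would normally call a library CDF instead.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area(a, b, mu=0.0, sigma=1.0, steps=20_000):
    """Midpoint-rule approximation of the integral of normal_pdf from a to b."""
    width = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

print(round(area(-10, 10), 6))  # 1.0 (nearly all the area lies within 10 sigma)
print(round(area(-10, 0), 6))   # 0.5 (half the area lies below the mean)
```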

Properties of a Normal Distribution Curve:

All Normal Curves have the same general bell shape.
The curve is symmetric with respect to a vertical line that passes through the peak of the curve.
The curve is centered at the mean µ which coincides with the median and the mode and is located at the point beneath the peak of the curve.
The area under the curve is always 1.
The curve is completely determined by the mean µ and the standard deviation σ. For the same mean, µ, a smaller value of σ gives a taller and narrower curve, whereas a larger value of σ gives a flatter curve.
The area under the curve to the right of the mean is 0.5 and the area under the curve to the left of the mean is 0.5.
The empirical rule (68%, 95%, 99.7%) for mound shaped data applies to variables with normal distributions. For example, approximately 95% of the measurements will fall within 2 standard deviations of the mean, i.e. within the interval (µ − 2σ, µ + 2σ).
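Several of these properties can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the two standard deviations (1 and 2) are arbitrary values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters: same mean, two different standard deviations.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# Smaller sigma gives a taller peak at the mean.
print(round(narrow.pdf(0), 4), round(wide.pdf(0), 4))  # 0.3989 0.1995

# Half of the area lies on each side of the mean.
print(wide.cdf(0))  # 0.5

# Empirical rule: area within k standard deviations of the mean.
within = {}
for k in (1, 2, 3):
    low = wide.mean - k * wide.stdev
    high = wide.mean + k * wide.stdev
    within[k] = wide.cdf(high) - wide.cdf(low)
    print(k, round(within[k], 4))  # 0.6827, 0.9545, 0.9973
```

The 68%, 95%, and 99.7% figures are the same for every normal distribution, whatever its mean and standard deviation.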

Example 1:

→ We need to calculate the area under the curve to the left of 50 in.

Standard Normal Distribution

The standard Normal curve is the normal curve with mean µ = 0 and standard deviation σ = 1.

How to do Normal Distributions Calculations (statistics.laerd.com): a guide to how to do calculations involving the standard normal distribution.

If X is a normal random variable with mean µ and standard deviation σ, then the random variable Z defined by

Z = (X − µ) / σ

has a standard normal distribution.

The value of Z gives the number of standard deviations between X and the mean µ (negative values are values below the mean, positive values are values above the mean).
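A minimal sketch of standardizing a value, using the usual z-score formula z = (x − µ) / σ. The mean of 70 and standard deviation of 4 are made-up numbers for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations between x and the mean mu."""
    return (x - mu) / sigma

# Hypothetical example: mean 70, standard deviation 4.
print(z_score(78, 70, 4))  # 2.0  (two standard deviations above the mean)
print(z_score(64, 70, 4))  # -1.5 (one and a half standard deviations below)
```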

Empirical rule for the standard normal distribution:

If data has a normal distribution with µ = 0, σ = 1, we have the following empirical rule:

Approximately 68% of the measurements will fall within 1 standard deviation of the mean or equivalently in the interval (−1, 1).
Approximately 95% of the measurements will fall within 2 standard deviations of the mean or equivalently in the interval (−2, 2).
Approximately 99.7% of the measurements (essentially all) will fall within 3 standard deviations of the mean, or equivalently in the interval (−3, 3).

We do not have a table for every normal random variable (there are infinitely many of them!). So we will convert problems about general normal random variables to problems about the standard normal random variable by standardizing: we convert all relevant values of the general normal random variable to z-scores, and then calculate probabilities of these z-scores from a standard normal table (or using a calculator).

The standard normal distribution table (Z-score table) provides the probability that a standard normal random variable Z, with mean equal to 0 and variance equal to 1, is less than or equal to z.
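In code, the table lookup can be replaced by a call to the standard normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `phi` and the worked problem (mean 50, standard deviation 5) are assumptions for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def phi(z):
    """P(Z <= z): the value a standard normal table lists for z."""
    return standard_normal.cdf(z)

print(round(phi(1.0), 4))   # 0.8413
print(round(phi(1.96), 4))  # 0.975
print(round(phi(-1.0), 4))  # 0.1587

# Hypothetical problem: X is normal with mean 50 and standard deviation 5.
# Find P(45 < X < 60) by converting both endpoints to z-scores.
mu, sigma = 50, 5
z_low, z_high = (45 - mu) / sigma, (60 - mu) / sigma  # -1.0 and 2.0
p_between = phi(z_high) - phi(z_low)
print(round(p_between, 4))  # 0.8186
```

Because the table gives P(Z ≤ z), the probability of an interval is the difference of two table values, and P(Z > z) is 1 − P(Z ≤ z).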

https://www3.nd.edu/~dgalvin1/10120/10120_S16/Topic20_8p7_Galvin.pdf