Machine Learning Primal: Normal Distribution

Mihir Parulekar
Dec 3, 2018 · 5 min read

In data science, and even in machine learning, you will see one distribution so many times that dealing with it will become so normal to you that you will call it the “Normal Distribution”. (Apparently that is what statisticians have done.) This is an introductory article, and by the end of it you will have a broad idea of what the normal distribution is and how to perform basic operations on it.

What is Normal distribution?

The following formula gives the pdf (probability density function) of the normal distribution:

f(x) = (1 / (σ√(2π))) · exp(-(x - μ)² / (2σ²))

Normal Distribution PDF
Shape of PDF & CDF

The mean of the normal distribution is μ (mu) and its standard deviation is σ (sigma).

If you don’t know what a pdf is, in simple terms it gives you the “relative likelihood of a continuous random variable taking that value”. For the normal distribution, the pdf is the familiar bell-shaped curve.

The CDF (Cumulative Distribution Function) is nothing but the integral of the pdf from -∞ up to a given value. For the standard normal distribution (mean 0, std 1), it is written as Φ(z), which is the probability that a normally distributed random variable is less than or equal to z.

CDF of a: Φ(a)
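To make the link between the pdf and the CDF concrete, here is a minimal sketch in plain Python. The function names (`normal_pdf`, `normal_cdf`, `integrate_pdf`) are my own, and the CDF uses the standard closed form based on the error function `math.erf`. The last function integrates the pdf numerically with the trapezoid rule, only to show that the result matches the CDF.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of Normal(mu, sigma^2) at x, straight from the pdf formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def integrate_pdf(upper, mu=0.0, sigma=1.0, steps=20_000):
    """Trapezoid-rule integral of the pdf from mu - 10*sigma up to `upper`.

    The lower limit stands in for minus infinity; the area below it is negligible.
    """
    lower = mu - 10.0 * sigma
    h = (upper - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(upper, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h, mu, sigma)
    return total * h

peak = normal_pdf(0.0)          # height of the bell curve at the mean
area_to_one = integrate_pdf(1.0)
cdf_at_one = normal_cdf(1.0)

print(f"pdf at the mean:        {peak:.4f}")
print(f"integral of pdf up to 1: {area_to_one:.4f}")
print(f"CDF at 1:                {cdf_at_one:.4f}")
```

The two last numbers should agree to several decimal places, which is all that "the CDF is the integral of the pdf" means in practice.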

In general, the mean μ gives you the central value (where the normal distribution pdf is at its peak), while the standard deviation (from here on I will refer to it as std or σ) gives you the spread around the mean. If the std is large, our sample has a large spread, while if the std is small, the sample is distributed very closely around the mean.

To get a clear idea of μ (mean) and σ (standard deviation), look at the following diagrams.

No standard deviation

If the standard deviation is zero, every fish is exactly the same size, equal to μ. You can see that the distribution has no spread at all: it is just a straight vertical line.

Std = 10px

When we increase the standard deviation, we can see that the size of the fish varies around the mean. A few of them have got bigger, while a few of them have shrunk.

When we increase the std further, you can see that the variance in fish size also increases. You can also see that the distribution spreads out and becomes much flatter.

std = 20 px
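The fish diagrams can be imitated with a small simulation. This is only a sketch: the mean size of 100 px is a hypothetical value I picked for illustration (the diagrams only give the std values), and `random.gauss` from the standard library draws the sizes.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
MEAN_SIZE = 100.0   # hypothetical mean fish size in px (not from the article)

def sample_sizes(std, n=10_000):
    """Draw n fish sizes from Normal(MEAN_SIZE, std^2)."""
    return [random.gauss(MEAN_SIZE, std) for _ in range(n)]

spreads = {}
for std in (0, 10, 20):
    sizes = sample_sizes(std)
    spreads[std] = statistics.pstdev(sizes)  # spread actually observed in the sample
    print(f"std={std:>2}: smallest={min(sizes):6.1f}  "
          f"largest={max(sizes):6.1f}  sample std={spreads[std]:5.2f}")
```

With std 0 every fish comes out at exactly the mean size, and the gap between the smallest and largest fish widens as the std grows.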

Standard Normal:

What does the normal distribution look like with mean zero and std one? Putting μ = 0 and σ = 1 into the pdf gives:

φ(z) = (1 / √(2π)) · exp(-z² / 2)

Standard Normal (just put mean = 0 and sigma = 1)

For the standard normal, the CDF Φ(z) is nothing but the probability that a normally distributed random variable Z, with mean equal to 0 and variance equal to 1, is less than or equal to z.

These values are computed numerically, and we can look them up in a z-score table.
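If you have Python at hand, you do not need a printed table. As a sketch, `statistics.NormalDist` from the standard library (Python 3.8+) can reproduce z-table entries; the helper name `z_table_value` is my own, and the rounding to four decimals just mimics how tables are usually printed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def z_table_value(z):
    """Phi(z) rounded to four decimals, like a typical z-score table entry."""
    return round(standard_normal.cdf(z), 4)

for z in (0.0, 0.5, 1.0, 1.33, 1.96):
    print(f"Phi({z:.2f}) = {z_table_value(z):.4f}")
```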

Linear Transformation Property:

If X is a normal random variable and you apply a linear transformation to X, the new random variable is also normally distributed.

If X ∼ Normal(μ, σ²) and Y = aX + b (with a ≠ 0), then Y is also normally distributed, with parameters (aμ + b, a²σ²).
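The property above can be sanity-checked by simulation. In this sketch the values of μ, σ, a and b are arbitrary choices of mine. It compares the sample mean and variance of Y with aμ + b and a²σ², and also checks one probability against Φ(1). That is evidence, not a proof, that Y is normal.

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)            # fixed seed so the run is repeatable

mu, sigma = 5.0, 2.0      # arbitrary parameters for X (illustrative only)
a, b = 3.0, -4.0          # arbitrary linear transformation Y = aX + b

xs = [random.gauss(mu, sigma) for _ in range(20_000)]
ys = [a * x + b for x in xs]

expected_mean = a * mu + b            # 11.0
expected_var = a ** 2 * sigma ** 2    # 36.0

y_mean = statistics.fmean(ys)
y_var = statistics.pvariance(ys)

# If Y is Normal(11, 36), then P(Y <= mean + one std) should be close to Phi(1).
one_std_above = expected_mean + expected_var ** 0.5
frac_below = sum(y <= one_std_above for y in ys) / len(ys)

print(f"mean of Y: {y_mean:.2f} (expected {expected_mean:.2f})")
print(f"var of Y:  {y_var:.2f} (expected {expected_var:.2f})")
print(f"P(Y <= mean + std): {frac_below:.3f} (Phi(1) = {NormalDist().cdf(1.0):.3f})")
```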

Now, in problems involving the normal distribution, we generally have to find the probability that a random variable is less than a specific value. In interval problems, such as confidence intervals (during hypothesis testing), we have to find the area under the normal curve lying between two points. In such cases, integrating the pdf above directly is a challenging task.

Using the Linear Transformation Property above, we can instead convert any normal random variable X ∼ Normal(μ, σ²) into a standard normal one. Taking a = 1/σ and b = -μ/σ gives Z = (X - μ)/σ ∼ Normal(0, 1), so P(X ≤ x) = Φ(z), where z is given by the following formula:

z = (x - μ) / σ

For example, consider the following problem:

Suppose the wait for a reply from a bank manager is normally distributed with an average of 200 hours and a standard deviation of 75 hours. What fraction of these wait periods can be expected to last longer than 300 hours?

Using the above formula we will calculate z as:

z = (300–200)/75

z = 1.33

Φ(1.33) = 0.9082 = 90.82% (looked up in the z-score table)

answer = 1 - 0.9082 = 0.0918 ≈ 9.18% (about 9.12% if z is left unrounded at 1.3333)
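The same calculation can be done in a few lines of Python. This is a sketch using `statistics.NormalDist` (Python 3.8+). The variable names are mine, and it assumes, as the problem implicitly does, that wait times are normally distributed. It computes the tail probability twice: once with z rounded to two decimals as a table forces you to do, and once without rounding.

```python
from statistics import NormalDist

mean_wait = 200.0   # hours
std_wait = 75.0     # hours
threshold = 300.0   # hours

# Standardize: z = (x - mu) / sigma
z = (threshold - mean_wait) / std_wait
z_rounded = round(z, 2)   # a z-score table only has two decimals

# Table-style answer: P(Z > 1.33)
table_tail = 1.0 - NormalDist().cdf(z_rounded)

# Unrounded answer, working directly with Normal(200, 75^2)
exact_tail = 1.0 - NormalDist(mean_wait, std_wait).cdf(threshold)

print(f"z = {z:.4f} (rounded: {z_rounded})")
print(f"P(wait > 300h), table-style: {table_tail:.2%}")
print(f"P(wait > 300h), unrounded:   {exact_tail:.2%}")
```

The small gap between the two results comes purely from rounding z before the lookup.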

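One important thing about z-score tables is that many of them only list positive values of z. So to find Φ for a negative value, you can use the following property:

Φ(a) + Φ(-a) = 1, that is, Φ(-a) = 1 - Φ(a)

This holds because the bell curve is symmetric around zero: the area to the left of -a equals the area to the right of a. See the following diagram and you will understand:

Φ(a) + Φ(-a) = 1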
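Here is a short sketch of how that symmetry is used in practice. The helper `phi_from_positive_table` is a hypothetical name of mine: it only ever evaluates Φ at non-negative arguments, the way you would with a positive-only table, and `statistics.NormalDist` stands in for the table itself.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal CDF, standing in for the z-table

def phi_from_positive_table(z):
    """Return Phi(z) while only ever looking up non-negative values."""
    if z >= 0:
        return phi(z)
    return 1.0 - phi(-z)   # symmetry: Phi(-a) = 1 - Phi(a)

for a in (0.5, 1.33, 2.0):
    total = phi(a) + phi(-a)
    print(f"a={a}: Phi(a) + Phi(-a) = {total:.6f}, "
          f"Phi(-a) via symmetry = {phi_from_positive_table(-a):.4f}")
```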

Central Limit Theorem:

In very loose terms, the Central Limit Theorem says that the sum of a large number of independent samples is approximately normally distributed. (There are a few conditions attached.)

The well-known book “All of Statistics” states the Central Limit Theorem as “For large sample size sample average has approximately normal distribution”

In simpler terms, when you add independent random variables, their properly normalized sum tends toward a normal distribution (informally a “bell curve”), even if the original variables themselves are not normally distributed.

This is the reason normal distributions occur so often. Many phenomena, such as the height, weight, and lifespan of individuals, resemble bell-shaped curves. If you give this concept some more thought, you will understand the logic.
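The Central Limit Theorem is easy to see in a simulation. In this sketch I chose the exponential distribution with rate 1 (mean 1, std 1) as the starting point because it is strongly skewed and clearly not normal; that choice, the sample sizes, and the function names are all my own. For a symmetric bell curve, exactly half of the standardized averages should fall below zero, so the code tracks how close that fraction gets to 0.5 as the sample size grows.

```python
import random

random.seed(42)   # fixed seed so the run is repeatable

def standardized_mean(n):
    """Average n Exponential(1) draws, then center and scale the average.

    Exponential(1) has mean 1 and std 1, so the average of n draws
    has mean 1 and std 1 / sqrt(n).
    """
    avg = sum(random.expovariate(1.0) for _ in range(n)) / n
    return (avg - 1.0) / (1.0 / n ** 0.5)

def fraction_below_zero(n, trials=5_000):
    """Share of standardized averages that land below 0."""
    hits = sum(standardized_mean(n) <= 0.0 for _ in range(trials))
    return hits / trials

results = {n: fraction_below_zero(n) for n in (1, 5, 50)}

for n, frac in results.items():
    print(f"n={n:>2}: fraction below 0 = {frac:.3f} (a normal curve would give 0.500)")
```

For a single draw the fraction is far from one half because the exponential distribution is lopsided, and it moves toward 0.5 as more draws are averaged.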

The normal distribution is a very interesting and very vast topic. In this article, I tried to give a brief, introductory overview. I hope you enjoyed it.

Thanks for reading! :)