Inferential Statistics 101 - Part 2

Shweta Doshi
Published in GreyAtom
Apr 2, 2018 · 7 min read

Normal Distribution

Joey: Hi Chandler, Good morning. How you doing?

Chandler: Hi Joey, Good morning. I am fine. How about you?

Joey: I am fine, buddy. Yesterday you were telling me about the normal distribution, right? What is so special about it?

Chandler: I think you are now clear about what a random variable is and how it differs from a regular variable (please revise the sampling distribution of the sample mean blog if you are not sure about the concept of random variables). There are two types of random variables.

  • Discrete random variable: A variable that can take only a countable number of values. For example, tossing a coin, where there are only two possible outcomes (head and tail), or throwing a die, where the outcome can be any number between 1 and 6.
  • Continuous random variable: A variable that can take infinitely many values in an interval between two specified values. For example, the height of a man, which could take any value between 140 cm and 220 cm (ideally!), or the amount of rainfall in a city, which could generally take any value between 0 cm and 60 cm.

The normal distribution, also called the Gaussian distribution, is a common probability distribution for representing continuous random variables. Its probability density function (PDF) is given by the following expression:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

It is a function of two parameters, the mean (μ) and the variance (σ²). Any question about a normal distribution can be answered if we know its mean and variance. These two parameters determine the position and the shape of the distribution. Figure A shows normal distributions with different means and variances.

Figure A
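For readers who want to reproduce a plot like Figure A, here is a minimal Python sketch using scipy and matplotlib; the specific means and variances are arbitrary choices, not necessarily the ones used in the figure.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-10, 10, 500)
# Arbitrary (mean, standard deviation) pairs chosen for illustration
for mu, sigma in [(0, 1), (0, 2), (2, 1)]:
    plt.plot(x, norm.pdf(x, loc=mu, scale=sigma), label=f"mu={mu}, sigma={sigma}")
plt.legend()
plt.title("Normal PDFs with different means and variances")
plt.show()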

If we are dealing with a discrete probability distribution, we can simply sum the relevant probabilities. For example, the probability of getting a number less than 3 on a roll of a die is obtained by adding the probability of obtaining 1 and the probability of obtaining 2, which is 1/6 + 1/6 = 1/3. For a continuous probability distribution, however, we integrate the density to compute the required probability. Here, a probability is given by an area under the curve, and the total area under the curve is 1. Let us take an example to see how this probability computation is done.

Example: Assume that the weight of a Spanish water dog is normally distributed with mean 3500 g and standard deviation 500 g. To answer the question “What is the probability that a randomly selected Spanish water dog weighs less than 3100 g?”, we follow the procedure given below:

The required probability is given by the shaded region in the figure below.

One way to calculate the area of the shaded region is to integrate the PDF of the normal distribution over the interval of interest, which is −∞ to 3100 in this case.
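One way to carry out this integration numerically is sketched below, using scipy.integrate.quad with the PDF written out by hand; the mean 3500 g and standard deviation 500 g are taken from the example, and this is only an illustrative sketch, not the only approach.

import numpy as np
from scipy.integrate import quad

mu, sigma = 3500, 500  # parameters from the example

def normal_pdf(x):
    # PDF of N(mu, sigma^2), written out explicitly
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Area under the curve from -infinity to 3100
prob, _ = quad(normal_pdf, -np.inf, 3100)
print(prob)  # roughly 0.2119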

Some of the interesting properties of the normal distribution are as follows:

  • The mean, median and mode of a normal distribution are always equal.
  • It is a bell-shaped curve (sometimes also referred to as the bell curve!) and is symmetric about the mean (μ), so exactly fifty percent of the data lies to the left of the mean and the remaining fifty percent lies to the right.
  • The total area under the curve is 1.
  • It is unimodal in nature i.e. there is only one mode.
  • The range of the normal distribution is from –∞ to +∞ (Unbounded).
  • If 100 data points are sampled from a normal distribution, then roughly 68 of them fall within μ ± σ, about 95 fall within μ ± 2σ, and nearly all (about 99.7%) fall within μ ± 3σ, as shown in the figure below (see also the small simulation sketch right after this list).
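Here is a small simulation sketch that checks these proportions empirically; the choice of mean 0 and standard deviation 1 is arbitrary, since the rule holds for any normal distribution.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0, 1  # arbitrary choice; the proportions hold for any mu and sigma
samples = rng.normal(mu, sigma, 100_000)

for k in (1, 2, 3):
    within = np.mean(np.abs(samples - mu) <= k * sigma)
    print(f"within mu +/- {k} sigma: {within:.3f}")
# Expected output: roughly 0.683, 0.954 and 0.997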

Joey: Okay, I understand what a normal distribution is and how to use it. Is there any specific reason why statisticians mention the normal distribution so often?

Chandler: Yes, Joey. There are many reasons that make the normal distribution so special. Let me tell you the most important ones:

  • It is very important because of the central limit theorem (please revisit the central limit theorem blog to know more about it), which says that if we add up a sufficiently large number of independent random variables, the sum is approximately normally distributed, irrespective of the distribution of the original random variables.
  • Many statistical tools are built under the assumption of normality. This assumption ensures tractability and helps us obtain closed-form solutions in many cases.
  • A linear combination of independent normal random variables is itself normally distributed. For example, if X1 and X2 are two independent normal random variables and Z = X1 + X2, then Z also follows a normal distribution.
  • The entire distribution can be described by only two parameters μ and σ2.
  • It makes the math much simpler. For example, when estimating its parameters, taking the logarithm of the PDF and differentiating it results in equations that are easy to solve analytically.
  • Many distributions can be approximated by a normal distribution under certain conditions:
  • The binomial distribution is approximately normal for large n when p is not too close to 0 or 1, where n and p are the parameters of the binomial distribution (a quick numerical check follows this list).
  • The Poisson distribution is approximately normal for large values of λ, where λ is the rate parameter.
  • The chi-square distribution is approximately normal for large k, where k is the degrees of freedom.
  • The Student’s t-distribution is approximately normal for large ν, where ν is the degrees of freedom.
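As an illustrative check of the binomial case, the sketch below compares an exact binomial probability with its normal approximation; the values of n and p are arbitrary choices, not something specified in the text.

import numpy as np
from scipy.stats import binom, norm

n, p = 1000, 0.3  # arbitrary illustrative parameters
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

# P(X <= 310): exact binomial vs. normal approximation (with continuity correction)
print(binom.cdf(310, n, p))
print(norm.cdf(310.5, loc=mu, scale=sigma))

For parameters like these, the two printed values should be close.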

These are some of the reasons why the normal distribution holds a special place in statistics.

Joey: Wow, that’s amazing, Chandler. Now I can really appreciate the power of the central limit theorem and the normal distribution in statistics. But I have heard people use the term standard normal distribution instead of normal distribution. Are they the same?

Chandler: The standard normal distribution is a normal distribution with mean 0 and variance 1. Let X be a random variable that follows a normal distribution with mean μ and variance σ², which is denoted by

X ~ N(μ, σ²)

Then the standard normal random variable Z is denoted by

Z ~ N(0, 1)

A normal random variable X can be transformed into a standard normal random variable Z using the following equation:

Z = (X − μ) / σ

From this equation we can see that the standard normal distribution is simply a shifted and rescaled version of the original normal distribution.
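A minimal sketch of this standardization, reusing the Spanish water dog parameters from the earlier example, shows that working with X directly or with Z gives the same probability.

from scipy.stats import norm

mu, sigma = 3500, 500   # Spanish water dog example from above
x = 3100
z = (x - mu) / sigma    # -0.8

print(norm(mu, sigma).cdf(x))  # probability using X ~ N(3500, 500^2)
print(norm(0, 1).cdf(z))       # same probability using Z ~ N(0, 1)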

Standard normal tables are generally used for computing the required probabilities when people deal with normal distributions.

Joey: Errr… can you elaborate on that?

Chandler: Sure, let’s take the same problem discussed above. To find the required probability, we need to find the area under the curve to the left of 3100, as discussed. It can be done using a standard normal table, as outlined below:

First, we find the Z value using the expression

Z = (X − μ) / σ

Here X = 3100, μ = 3500 and σ = 500, which results in a Z value of −0.80. Once we have the Z value, we can look up the probability in the standard normal table. Since the value starts with −0.8, we go to the row for −0.8; the next digits are 00, so we go to the column for .00. The intersection gives 0.21186, which is our required probability (please refer to the table below).

Joey: Great. But is there any other way to compute this probability without using the table? It would become really cumbersome and annoying if we had to refer to the table every time.

Chandler: Yes, that’s a very valid point. We can compute it using a programming language such as R or Python.

Given below is a snippet of Python code for computing this probability:

from scipy.stats import norm

# P(weight < 3100) for weight ~ N(mean=3500, sd=500)
dog_weight = norm(3500, 500)
print(dog_weight.cdf(3100))  # approximately 0.2119
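Running this prints approximately 0.2119, which matches the value obtained from the standard normal table above.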

Joey: Awesome, Chandler. I am really getting a hold of this subject now. Let’s discuss more later; I have to take my son to an audition today.

The author of this blog is Balaji P, who is pursuing a PhD in reinforcement learning at IIT Madras.

Quora: www.quora.com/profile/Balaji-Pitchai-Kannu
