An Introduction to Normal Distribution using Python

Sonik Mishra
7 min read · Dec 6, 2018


Statistics, as defined by the American Statistical Association (ASA), “is the science of learning from data, and of measuring, controlling and communicating uncertainty.”

Statistics as a science is not something recent.

You can skip the following section if you are already familiar with the history of statistics, or if you find history less exciting.

In fact, statistics is often dated back to 1662, when John Graunt, along with William Petty, developed early human statistical and census methods. The mathematical foundations drew heavily on probability theory, developed in the 16th and 17th centuries by Gerolamo Cardano, Pierre de Fermat and Blaise Pascal. Jakob Bernoulli’s Ars Conjectandi (published posthumously in 1713) and Abraham de Moivre’s The Doctrine of Chances (1718) treated statistics as a separate branch of mathematics. In his book, Bernoulli introduced the idea of representing complete certainty as one and probability as a number between zero and one. Later, the Oxford scholar Francis Ysidro Edgeworth’s first paper on statistics (1883) explored the law of error (later known as the Gaussian/normal distribution), and his Methods of Statistics (1885) introduced an early version of the t-distribution. Francis Galton is credited as one of the principal founders of statistical theory. His contributions included concepts of descriptive statistics, i.e. standard deviation, correlation and regression, and their application to the study of a variety of human characteristics such as height, weight and eyelash length, among many others. He found that many of these could be fitted to a normal curve. Galton published a paper in Nature (1907) on the usefulness of the median, rather amusingly concerned with the accuracy of 787 guesses of the weight of an ox at a country fair. The actual weight was 1,208 pounds; the median guess was 1,198! The guesses themselves were markedly non-normally distributed. Now, when you see a herd of oxen, don’t just think how mean, go median!

Most statistical methods came to recent prominence in data science, including artificial intelligence, due to significant enhancements in computing power. The mathematical methods have existed since the early 1900s, as you read earlier, but applying statistics to real-time industrial datasets required much more computing power than a mere 10,000 TFLOPS. The infographic from Experts Exchange draws a much better analogy!

PS: You can download the files used in this blog from here. For a basic hands-on introduction to Python, you can refer to a previous post.

Coming back to data: it raises many questions, like “What is the expected value, or a range of expected values?” or “What does the data look like?”. To understand data, we first need to transform it to reveal the underlying information, which we can then use. The calculations made to summarize raw data into the mean, median, mode and measures of spread are called descriptive statistics, which many of you will be familiar with. I have linked a Medium post in case you wish to revise the basic concepts.

However, in most cases we have to work with finite underlying data, often with missing data points. For example, it is impossible to measure the height of every human being on earth to derive an exact metric of, say, the mean height by gender for this planet; we would need 7.7 bn data points! Here inferential statistics comes to our aid, which is just a fancy name for quantifying properties of a population based on a small subset of observed data called a sample. It combines the concepts developed way back in the 1700s by Bernoulli and friends with the computing power of today (we even have the Tianhe-2).

While there is a plethora of theorems in inferential statistics, we will cover the widely applied normal distribution.

Introduction to ‘the’ Normal Distribution

Normal distribution is perhaps the most frequently used statistical term after the mean. Abraham de Moivre, the 18th-century mathematician, was also a consultant to gamblers (yes, you read that right!) and was often called upon to make frequency distribution curves for coin flips and dice rolls. He was quite intrigued by the shape of the frequency distribution curve for coin flips, which approached a smooth curve as the number of flips increased. All he needed to come up with was a mathematical expression for that curve (which he did). It is now known as the bell curve, the Gaussian distribution or the normal curve. The bell curve is applied to almost everything these days, from your HR performance appraisal to molecular physics. Other everyday examples are heights of people, sizes of manufactured goods, astronomical measurements and examination scores, among many others.

The probability density function for a normal distribution with mean (or expected value) of 𝛍 and standard deviation of 𝛔 is given by:

f(x | μ, σ) = (1 / √(2πσ²)) · exp[−(x − μ)² / (2σ²)]

Let’s try to graph this normal distribution function in Python, and import a few libraries that we shall need in later posts in this series.
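Here is a minimal sketch of such a plot, assuming numpy, scipy and matplotlib are installed (the standard normal parameters below are illustrative):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Illustrative parameters: mean 0 and standard deviation 1 (the standard normal)
mu, sigma = 0, 1

# Evaluate the PDF on a grid of x values spanning ±4 standard deviations
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 500)
pdf = norm.pdf(x, loc=mu, scale=sigma)

# Plot the bell-shaped curve
plt.plot(x, pdf)
plt.title("Normal Distribution (mu = 0, sigma = 1)")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()
```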

Graph of Normal Distribution

The output maps out the familiar bell-shaped curve.

Properties of the Normal Distribution

  • Mean = Median = Mode
  • The curve is symmetric about the center, i.e. 50% of the values are less than the mean (𝜇) and 50% of the values are greater than the mean (𝜇)
  • The total area under the curve is 1
  • 68% of values are within 1 standard deviation of the mean, 𝜇±𝜎
  • 95% of values are within 2 standard deviations of the mean, 𝜇±2𝜎
  • 99.7% of values are within 3 standard deviations of the mean, 𝜇±3𝜎

To find the probability of a value occurring within a range of a normal distribution, we can integrate the function, which is nothing but the area under the curve. Let’s test it out by integrating the area under the curve (using the quad function in Python) for ±2𝜎 on either side of the mean (𝜇). Here we have taken the mean 𝜇 = 0 and 𝜎 = 1, which is better known as the standard normal distribution.

Integration: ∫[−2, 2] f(x) dx

where f(x) = (1 / √(2πσ²)) · exp[−(x − μ)² / (2σ²)]; μ = 0, σ = 1

Integration of the Standard Normal Distribution Function: ∫[−2, 2] f(x) dx
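A minimal sketch of that integration using scipy’s quad (the helper function name is mine, chosen for illustration):

```python
import numpy as np
from scipy.integrate import quad

def normal_pdf(x, mu=0, sigma=1):
    """Probability density function of N(mu, sigma)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Area under the standard normal curve between -2 and +2 (i.e. within 2 standard deviations)
area, error_estimate = quad(normal_pdf, -2, 2)
print(area)  # ~0.9545
```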

The output is 0.9544997361036417 ≈ 95.4%, which says that approximately 95% of the values lie within 2 standard deviations of the mean. You can try changing the limits of integration to ±1 or ±3 to verify the proportion of values lying within 𝜇±𝜎 and 𝜇±3𝜎.

The Standard Normal Distribution

  • It’s a normal distribution with 𝜇 = 0 and 𝜎 = 1, commonly represented by 𝑁(0,1)
  • We can convert any normal distribution to the standard normal distribution 𝑁(0,1) by transforming each data point 𝑥 into z = (𝑥 − 𝜇)/𝜎
  • This process is called standardizing, and the value of z is called a z-score
Standardization of Normal Distribution
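As a minimal sketch of that standardization in Python (the sample values below are made up for illustration):

```python
import numpy as np

# A few illustrative data points (e.g. weights in kg)
x = np.array([45.0, 52.0, 60.0, 68.0, 75.0])

mu = x.mean()      # sample mean
sigma = x.std()    # sample standard deviation

# z-scores: how many standard deviations each point lies from the mean
z = (x - mu) / sigma
print(z)
```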

We can then calculate the probability from its probability density function (PDF) by integrating the function to find the area under the curve.

For example, let X be a random variable representing the weight (in kg) of a randomly selected citizen of a country. Assume that X is normally distributed with a mean of 60 kg and a standard deviation of 15 kg, represented by 𝑁(60, 15).

Now we want to calculate the probability that a randomly selected person has a weight below 50 kg.

  • Let’s start by calculating the z-score: z = (𝑥 − 𝜇)/𝜎 = (50 − 60)/15 = −0.667. So x = 50 for 𝑁(60, 15) is equivalent to z = −0.667 for 𝑁(0, 1), the standard normal distribution
  • The probability P(Z ≤ z) = 𝛷(z). We get this probability value, which is nothing but the area under the curve, from the linked normal distribution table (or rather, standard normal table)
  • The area under the curve is found by the following integration, performed under the hood:

Area = ∫(−∞, 50] f(x) dx

where f(x) = (1 / √(2πσ²)) · exp[−(x − μ)² / (2σ²)]; μ = 60, σ = 15

Probability(randomly selected person has weight ≤ 50 kg)
  • From the table, P(Z ≤ −0.67) = 1 − P(Z ≤ 0.67) = 1 − 0.74857 = 0.25143
  • In the standard normal table, the integer part and first decimal of Z are read from the row header, and the second decimal is read from the column header
  • Therefore, the probability that a randomly selected citizen will have a weight less than 50 kg is 25.14%
  • Now let us do the same calculation using the cumulative distribution function available in scientific Python (scipy), as sketched below
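A minimal sketch using scipy.stats.norm, with the same 𝑁(60, 15) parameters as above:

```python
from scipy.stats import norm

# P(X <= 50) for X ~ N(mu=60, sigma=15), via the cumulative distribution function
p = norm.cdf(50, loc=60, scale=15)
print(f"Probability that a randomly selected citizen will have a weight less than 50 kg is {p:.2%}")
```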
  • The output will be something like below.
Probability that a randomly selected citizen will have a weight less than 50 kg is 25.25%
  • As you have probably guessed, the small difference between the two results comes from rounding the z-score to two decimal places for the table lookup, whereas Python works with the exact value
  • DIY: Find the probability that a randomly selected citizen will have a weight between 50 and 75 kg, using both the standard normal table and Python. Hint: you have cumulative areas; you will need to find the net area lying between the two values. The answer should be around 59%
Area = ∫[50, 75] f(x) dx
  • You can use the linked simulator to check your answers. Python does it in a jiffy; however, it’s important to know what goes on under the hood.
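If you want to verify your DIY answer in Python, here is a short sketch, again assuming the same 𝑁(60, 15):

```python
from scipy.stats import norm

# P(50 <= X <= 75) = P(X <= 75) - P(X <= 50): the net area between the two values
p = norm.cdf(75, loc=60, scale=15) - norm.cdf(50, loc=60, scale=15)
print(f"{p:.2%}")  # roughly 59%
```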

Hope you found this post useful. Do share your views in the comments section. In the next post, we will cover some important applications of the normal distribution, namely the Central Limit Theorem (CLT) and hypothesis testing using sample statistics and sampling distributions.


Sonik Mishra

Artificially Intelligent | Finance Professional | IIM Indore | NIT Rourkela