Why is Normal Distribution Bell Shaped?

Rishi Sharma
Jan 12, 2019


…..What is this article about?

If you are into Data Science and Machine Learning, it is very likely that you have come across the term Normal Distribution. This article will give you an intuition about the origin of the normal distribution and about why the distribution looks like a bell-shaped curve. The logic will be explained using a random experiment of tossing a coin. Although the derivation is a little "unconventional", it makes the explanation easier to follow.

…..Prerequisites

· Basic Statistics

· Probability Theory Basics

…..What is the Normal Distribution? (Optional)

The Normal Distribution is a bell-shaped curve that looks like the figure shown on the left. For a person who is new to statistics, these graphs won't ring any bell (even though they are bell shaped :P). If you are already familiar with the concept of the Normal Distribution, you can skip this section. To explain what these curves are and what they represent, let's conduct a random experiment.

Let's say we are given a data set that contains the percentage of marks scored by 1000 students. These are quantitative values ranging from 0.00% to 100.00%, which means the marks are continuous and could take any value, such as 92.72%, 64.25% or 9.34%. To summarize these 1000 samples, we can group them by creating slots for the marks obtained by the students: marks from 0 to 10 go in slot 1, marks greater than 10 and up to 20 go in slot 2, marks greater than 20 and up to 30 go in slot 3, and so on. This process of creating slots is called binning. Binning is just a fancier word for grouping: we put our values into slots and count how many values lie within each slot. In other words, we are about to make a histogram from the frequency distribution table shown below.
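The slotting rule can be sketched in a few lines of Python. This is a minimal sketch, assuming marks are floats between 0 and 100, that slot k holds marks greater than 10(k-1) and up to 10k, and that a mark of exactly 0 goes in slot 1. The function name `bin_marks` and the sample marks are hypothetical, not taken from the article's data set.

```python
import math

def bin_marks(marks, width=10, top=100):
    """Count how many marks fall into each slot of the given width.

    Slot 1 holds marks in [0, width]; slot k (k > 1) holds marks in
    ((k - 1) * width, k * width]. Marks are assumed to lie in [0, top].
    """
    n_slots = math.ceil(top / width)
    counts = [0] * n_slots
    for mark in marks:
        if not 0 <= mark <= top:
            raise ValueError(f"mark out of range: {mark}")
        # ceil() puts a mark sitting exactly on a boundary (e.g. 10.0)
        # into the lower slot; max() sends a mark of 0 to slot 1.
        slot = max(math.ceil(mark / width), 1)
        counts[slot - 1] += 1
    return counts

# Hypothetical marks, including the example values from the text.
sample = [0.0, 10.0, 10.01, 92.72, 64.25, 9.34, 100.0]
counts = bin_marks(sample)
for k, c in enumerate(counts, start=1):
    print(f"slot {k:2d}: {'#' * c}")
```

Run on a real set of 1000 marks, the rows of `#` characters form a sideways text version of the frequency table, which is all a histogram is.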

If we plot the histogram of this data, a crude version of the normal distribution's shape starts to appear. The figure shown below is the histogram of our data.

You may say that the shape looks like a normal distribution only because we chose our data that way. However, if we repeated the experiment with other groups of students, we would expect to see a similar pattern again. Similar bell-shaped histograms show up in many natural and social science measurements, such as the heights of the people living in a city, the salaries people earn in a state, or the prices of the mobile phone handsets people buy. Many of these resemble a normal distribution: not exactly, but close enough. (Not every quantity does, though; salaries, for example, are usually skewed, with a long tail towards high incomes.) Keep in mind that we are not concerned with the individual quantitative values (i.e. the height, salary, price, etc.) but with the frequencies/counts of the bins in which those values lie. So, for the purposes of this article, the normal distribution is best thought of as the shape that such a histogram takes.

…..Why the Bell Shape?

Now let’s try to find out why the normal distribution is bell shaped and what causes the hump in the middle. Before we proceed, I would like to point out again that this is a rather “unconventional” approach to the explanation.

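I will make use of the Bernoulli distribution to find out how the hump in the middle originates. A single coin toss follows a Bernoulli distribution; counting the heads over many repeated tosses gives a binomial distribution, which for a large number of tosses is closely approximated by the normal distribution. Let's say we are conducting an experiment with a coin and noting down its outcomes. We also have the privilege of repeating the experiment as many times as we like.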
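For reference, these are the standard formulas behind the coin experiment, assuming heads is recorded as 1 and tails as 0, with p the probability of heads (p = 1/2 for a fair coin). The symbols X_i, S_n and p are introduced here for illustration; they are not the article's own notation.

```latex
\begin{aligned}
P(X_i = 1) &= p, \qquad P(X_i = 0) = 1 - p && \text{(Bernoulli, one toss)} \\
S_n &= X_1 + X_2 + \cdots + X_n && \text{(number of heads in } n \text{ tosses)} \\
P(S_n = k) &= \binom{n}{k}\, p^k (1 - p)^{n - k} && \text{(binomial)} \\
P(S_n = k) &= \binom{n}{k} \frac{1}{2^n} && \text{(fair coin, } p = \tfrac{1}{2}\text{)}
\end{aligned}
```

The factor C(n, k) counts how many different orderings contain exactly k heads. For example, n = 2 and k = 1 give 2/4 = 0.5, and n = 3 and k = 2 give 3/8 = 0.375.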

Let's start by conducting the experiment once. Let the variable n denote the number of times we repeat the experiment (i.e. the number of coin flips). For n = 1, the coin has two outcomes: heads (H) or tails (T). Both are equally likely, with a probability of 0.5 (50 percent) each. If we plot the probability of each outcome, we get the histogram shown.

Now let's increase the number of coin flips to 2, i.e. n = 2. With two coin flips, the number of outcomes also increases. For one flip we had two outcomes, heads or tails. For two flips we have four: both heads (HH), both tails (TT), first heads then tails (HT), and first tails then heads (TH), each with probability 0.25 (0.5 × 0.5), as shown in the table.
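Listing every sequence by hand gets tedious as n grows, so here is a small sketch that enumerates all 2^n ordered outcomes of n fair coin flips with `itertools.product` and gives each one probability (1/2)^n. The function name `sequence_table` is hypothetical; exact fractions are used so the probabilities match the tables (0.5, 0.25, ...) without rounding.

```python
from fractions import Fraction
from itertools import product

def sequence_table(n):
    """Map every ordered sequence of n fair coin flips to its probability."""
    p = Fraction(1, 2) ** n  # every sequence is equally likely
    return {"".join(seq): p for seq in product("HT", repeat=n)}

for n in (1, 2):
    print(f"n = {n}")
    for outcome, prob in sequence_table(n).items():
        print(f"  {outcome}: {float(prob)}")
```

For n = 2 this lists HH, HT, TH and TT with probability 0.25 each, matching the table.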

Notice that HT and TH contain the same result, one head and one tail; the only thing that differs is the order in which we get them. Hence we can club these two outcomes into a single outcome, HT, with a combined probability of 0.5 (0.25 + 0.25).

So the new outcome table is as shown, along with the outcome vs probability distribution. We can clearly see a hump starting to form: HH and TT keep a probability of 0.25 each, while the clubbed middle outcome HT has 0.5. It is the clubbing of similar outcomes that increases the probability of the middle outcomes, and this is what generates the hump.
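The clubbing step is simply a grouping: sequences that contain the same number of heads are merged and their probabilities added. A minimal sketch, assuming a fair coin; `club_outcomes` is a hypothetical name, and each group is labelled by its number of heads rather than by a representative sequence such as HT.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def club_outcomes(n):
    """Group the 2**n flip sequences by number of heads and add up probabilities."""
    p = Fraction(1, 2) ** n
    grouped = defaultdict(Fraction)  # Fraction() == 0
    for seq in product("HT", repeat=n):
        grouped[seq.count("H")] += p  # order is ignored; only the head count matters
    # Most heads first, to match the HH, HT, TT order of the table.
    return dict(sorted(grouped.items(), reverse=True))

for heads, prob in club_outcomes(2).items():
    print(f"heads = {heads}: {float(prob)}")
```

The middle group receives 0.5 because two of the four sequences land in it, while each extreme keeps a single sequence. That imbalance is the start of the hump.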

Now let's increase the repetition count to 3 (n = 3). We now have a total of 8 outcomes (2 × 2 × 2), each with probability 0.125, as shown with their respective probabilities. Notice that HHT, HTH and THH are the same outcome, differing only in sequence. Similarly, TTH, THT and HTT are the same. These two groups can be reduced to two individual outcomes, HHT and TTH, each with the sum of its group's probabilities (3 × 0.125 = 0.375). The new outcome table, along with the probability distribution of the outcomes, now becomes as follows.
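Counting the sequences in each group does not require listing them: the number of length-n sequences with k heads is the binomial coefficient C(n, k), available in Python 3.8+ as `math.comb`. The sketch below, assuming a fair coin, builds the grouped table from that count and checks it against brute-force enumeration for n = 3. The helper names `grouped_probabilities` and `brute_force` are hypothetical.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

def grouped_probabilities(n):
    """P(k heads) for n fair flips, from the number of sequences with k heads."""
    return {k: Fraction(math.comb(n, k), 2 ** n) for k in range(n, -1, -1)}

def brute_force(n):
    """Same table, obtained by listing all 2**n sequences and grouping them."""
    counts = Counter(seq.count("H") for seq in product("HT", repeat=n))
    return {k: Fraction(counts[k], 2 ** n) for k in range(n, -1, -1)}

table = grouped_probabilities(3)
assert table == brute_force(3)
for k, prob in table.items():
    print(f"heads = {k}: {float(prob)}")
```

Here the HHT group corresponds to 2 heads and the TTH group to 1 head, each with probability 3/8 = 0.375, while HHH and TTT keep 1/8 = 0.125 each.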

We can again see a hump in the middle. In fact, as we keep increasing n, the number of outcomes keeps growing (2^n sequences for n flips), and many of those outcomes are the same if we ignore the order in which they occur. The groups in the middle, with roughly equal numbers of heads and tails, contain by far the most sequences, so their probabilities grow relative to the extremes, and this is what generates the hump. For large n, the distribution of the number of heads (a binomial distribution) is very well approximated by the normal distribution. The histograms below show that as n increases, the bell shape becomes more and more pronounced.
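To see the bell shape emerge numerically, the sketch below compares P(k heads) for n fair flips with the density of a normal distribution that has the same mean, n/2, and the same standard deviation, sqrt(n)/2. Matching mean and standard deviation this way is the standard normal approximation to the binomial distribution (the de Moivre-Laplace theorem). The function names and the chosen values of n are illustrative assumptions; `math.comb` needs Python 3.8+.

```python
import math

def binomial_pmf(n, k):
    """Probability of exactly k heads in n fair coin flips."""
    return math.comb(n, k) / 2 ** n

def normal_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd**2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def max_gap(n):
    """Largest absolute difference between the binomial pmf and the matching normal curve."""
    mean, sd = n / 2, math.sqrt(n) / 2
    return max(abs(binomial_pmf(n, k) - normal_pdf(k, mean, sd)) for k in range(n + 1))

# The gap shrinks as the number of flips grows.
for n in (4, 16, 64, 256):
    print(f"n = {n:3d}: largest gap = {max_gap(n):.6f}")

# Text histogram for n = 10: the bars already trace out a bell.
for k in range(11):
    print(f"{k:2d} | {'#' * round(200 * binomial_pmf(10, k))}")
```

The absolute gap is used only because it is simple to read; any reasonable distance between the two curves shows the same trend for these values of n.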
