Central Limit Theorem In Statistics

Farhan Tanvir
Published in Open Physics Class · Nov 21, 2021

In this article I will discuss the Central Limit Theorem intuitively, with examples, and also give an intuition behind the central limit theorem formula.

Suppose we have a large population and we know the probability distribution of their daily income. According to the statistics, 40% of the people earn $100 per day, 20% earn $200 per day and the remaining 40% earn $300 per day. The following picture shows the probability distribution of the daily income of the population.

Clearly, this is not a normal probability distribution. The expected value of the probability distribution is the average daily income (the mean) of all people in the population. By the definition of the expected value (each outcome multiplied by its probability, summed over all outcomes), the average daily income of the population is

μ = Σ x*p(x)

μ = 0.4*100 + 0.2*200 + 0.4*300 = 200
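The calculation above can be checked with a few lines of Python. This is a minimal sketch: the dictionary `income_dist` and the helper names are illustrative choices, not part of any library. It also computes the population standard deviation σ, which appears later in the central limit theorem formula.

```python
# Population distribution of daily income: dollars -> probability (from the example above).
income_dist = {100: 0.4, 200: 0.2, 300: 0.4}

def population_mean(dist):
    """Expected value: sum of x * p(x) over all outcomes."""
    return sum(x * p for x, p in dist.items())

def population_std(dist):
    """Standard deviation: square root of the sum of (x - mu)^2 * p(x)."""
    mu = population_mean(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return variance ** 0.5

mu = population_mean(income_dist)
sigma = population_std(income_dist)
print(f"mu = {mu:.1f}, sigma = {sigma:.2f}")  # mu = 200.0, sigma = 89.44
```

The variance here is 0.4*(100-200)^2 + 0.2*0 + 0.4*(300-200)^2 = 8000, so σ = √8000 ≈ 89.44 dollars.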

So, on average each person has a daily income of $200.

Suppose we take a sample of n = 5 people and average their daily incomes. You might think that because 40% of the population earn $100 per day, exactly 40% of the people in the sample will earn $100 per day:

40% of 5 people = 5*0.4 = 2

So, 2 people should have a daily income of $100. Similarly, 20% (= 1 person) will earn $200 per day and the remaining 40% (= 2 people) will earn $300 per day. But this need not be the case. Maybe all 5 people will earn $100 per day, or we may get 3 people earning $200 and 2 earning $300.
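To see how rarely a sample of 5 matches the population proportions exactly, we can use the multinomial formula. This is a minimal sketch, assuming the 5 people are drawn independently (a reasonable model for a large population); `multinomial_prob` is a hypothetical helper name.

```python
from math import factorial

def multinomial_prob(counts, probs):
    """Probability of getting exactly counts[i] draws of outcome i in
    sum(counts) independent draws, where outcome i has probability probs[i]."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # n! / (c1! * c2! * ...), stays an exact integer
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

# Outcomes ordered as ($100, $200, $300) with probabilities (0.4, 0.2, 0.4).
probs = (0.4, 0.2, 0.4)
print(f"P(2, 1, 2) = {multinomial_prob((2, 1, 2), probs):.4f}")  # the "proportional" sample
print(f"P(5, 0, 0) = {multinomial_prob((5, 0, 0), probs):.4f}")  # everyone earns $100
```

The first probability is 30 * 0.4^2 * 0.2 * 0.4^2 = 0.1536, so only about 15% of random samples have exactly two $100 earners, one $200 earner and two $300 earners. The chance that all five earn $100 is 0.4^5 ≈ 0.0102.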

Suppose we take a sample of 5 people from the population and get the following daily incomes:

$300, $200, $200, $100, $300

The average of their daily income is (300+200+200+100+300)/5 = 220.

Earlier we calculated the population mean, the average daily income of the whole population, as 200. Our sample mean of 220 is not equal to the population mean, but the two values are close. Suppose we take another sample of 5 people and average their daily income; this time we get a mean of 180. If we keep taking samples and calculating the mean of each one, we will get many different sample means. The lowest possible value is 100, when all 5 people earn $100 per day, and the highest is 300, when all of them earn $300 per day. So every sample mean we get will lie between 100 and 300.
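The sampling step can be simulated with Python's standard `random` module. This is a sketch, not the simulator used for the figures in this article; the seed (42) and the helper names `draw_sample` and `sample_mean` are arbitrary choices, so the means you see will differ from the 220 and 180 above.

```python
import random

# Same population distribution as above: daily income in dollars and its probability.
incomes = [100, 200, 300]
weights = [0.4, 0.2, 0.4]

def draw_sample(n, rng):
    """Draw n independent daily incomes from the population distribution."""
    return rng.choices(incomes, weights=weights, k=n)

def sample_mean(sample):
    return sum(sample) / len(sample)

rng = random.Random(42)  # fixed seed so the run is reproducible
for _ in range(5):
    s = draw_sample(5, rng)
    print(s, "-> mean", sample_mean(s))

print(sample_mean([300, 200, 200, 100, 300]))  # the sample from the text: 220.0
```

Every printed mean falls between 100 and 300, because each individual income does.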

Suppose we repeat this process 100 times, that is, we take 100 samples of 5 people and calculate the mean of each. Then we build a probability distribution of these sample means. The following picture shows the result of a real experiment using a simulator: I took 100 samples of 5 people, calculated the mean daily income of each sample, and plotted the means as a probability distribution:

This figure looks similar to a normal probability distribution, though it is not one. This distribution is called the sampling distribution of the sample mean, because it is constructed from sample means. Notice in the figure that only a few samples had a mean of 100 or 300. That means it is rare for all 5 people to earn $100 a day, or for all 5 to earn $300. Since only 40% of the population earn $100 a day, the chance that 5 randomly selected people all earn $100 is 0.4^5 ≈ 0.01, about 1%, and the same holds for $300. In most cases we got a sample mean close to the actual population mean of 200. That is why the distribution looks like a normal probability distribution.
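Because the population has only three possible incomes, the sampling distribution of the mean for n = 5 can also be computed exactly instead of simulated. The sketch below does this by repeatedly convolving the single-person distribution with itself; `exact_mean_distribution` is a hypothetical helper name, and independent draws are assumed.

```python
from collections import defaultdict

income_dist = {100: 0.4, 200: 0.2, 300: 0.4}

def exact_mean_distribution(dist, n):
    """Exact distribution of the mean of n independent draws, built by
    convolving the single-draw distribution with itself n times."""
    sums = {0: 1.0}  # distribution of the running total after 0 draws
    for _ in range(n):
        nxt = defaultdict(float)
        for total, p_total in sums.items():
            for x, p_x in dist.items():
                nxt[total + x] += p_total * p_x
        sums = nxt
    # Divide each total by n to turn it into a sample mean.
    return {total / n: p for total, p in sorted(sums.items())}

for mean, p in exact_mean_distribution(income_dist, 5).items():
    print(f"mean {mean:6.1f}: P = {p:.5f}")
```

The sample mean can only take the 11 values 100, 120, ..., 300, and P(mean = 100) = P(mean = 300) = 0.4^5 = 0.01024. A distribution on 11 discrete points is bell-shaped here but cannot be exactly normal.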

Now we will repeat this process 10000 times instead of 100 times. That is, we take a sample of 5 people, repeat this 10000 times, and construct a probability distribution of the sample means. The following picture is the result of a real experiment:

This figure looks even more like a normal probability distribution, even though the population distribution we saw first is not normal at all. Taking more samples gives a smoother and more accurate picture of the sampling distribution of the sample mean for samples of 5 people. To bring that distribution itself closer to a normal curve, we need to increase the sample size.
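A rough, text-only version of this experiment is sketched below. The helper names and the seed are hypothetical choices, and the figures in this article come from the author's own simulator, not from this code. Running it with `repetitions=100` and then `repetitions=10_000` shows the histogram settling into a stable shape as the number of repetitions grows, while the sample size stays at 5.

```python
import random
from collections import Counter

incomes = [100, 200, 300]
weights = [0.4, 0.2, 0.4]

def simulate_sample_means(n, repetitions, seed=0):
    """Take `repetitions` random samples of size n and return their means."""
    rng = random.Random(seed)
    return [sum(rng.choices(incomes, weights=weights, k=n)) / n
            for _ in range(repetitions)]

def text_histogram(values, width=50):
    """Print a crude horizontal bar chart of how often each value occurs."""
    counts = Counter(values)
    peak = max(counts.values())
    for value in sorted(counts):
        bar = "#" * round(width * counts[value] / peak)
        print(f"{value:6.1f} | {bar} {counts[value]}")

text_histogram(simulate_sample_means(n=5, repetitions=10_000))
```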

Now we will increase the sample size. Instead of 5 people per sample, we will take samples of 10 people and do the same experiment as before. The following figure shows the result with sample size = 10, repeated 10000 times.

This looks very close to a normal probability distribution. The distribution is also narrower than the previous ones.

Now we will do the same experiment with sample size = 100.

This is almost indistinguishable from a normal probability distribution. You may have noticed that the mean of the distribution, μₛ, is equal (up to random variation in the simulation) to the population mean μ = 200 that we calculated earlier.

Also notice that the distribution curve is very narrow compared to the previous figures. This is because as the sample size increases, the sample means cluster more closely around the population mean. It is almost impossible to find a sample of 100 people who all earn $100 a day, so we did not find any sample with a mean of 100 or 300. In other words, as we increase the sample size, the standard deviation of the sample means decreases.

The standard deviation of the sampling distribution of the sample mean, σₛ, decreases as the sample size increases.
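The same experiment can be repeated for several sample sizes to watch the spread shrink. This is a sketch using only the standard library; the seed, the 10,000 repetitions and the helper name `simulate_sample_means` are assumptions, so the exact numbers depend on the random draws.

```python
import random
import statistics

incomes = [100, 200, 300]
weights = [0.4, 0.2, 0.4]

def simulate_sample_means(n, repetitions, seed=0):
    """Take `repetitions` random samples of size n and return their means."""
    rng = random.Random(seed)
    return [sum(rng.choices(incomes, weights=weights, k=n)) / n
            for _ in range(repetitions)]

results = {}
for n in (5, 10, 100):
    means = simulate_sample_means(n, repetitions=10_000)
    # Store (mean of the sample means, standard deviation of the sample means).
    results[n] = (statistics.mean(means), statistics.pstdev(means))
    print(f"n = {n:3d}: mean = {results[n][0]:7.2f}, std = {results[n][1]:6.2f}, "
          f"range = [{min(means):.1f}, {max(means):.1f}]")
```

For every sample size the mean of the sample means stays near 200, while the standard deviation and the observed range shrink as n grows; with n = 100 the extreme values 100 and 300 never appear.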

We started with a probability distribution that is not normal at all, and we ended up with a normal probability distribution.

From the above discussion we can draw the following conclusions:

  1. If we start with a probability distribution that is not normal, repeatedly take samples from the population and compute the mean of each sample, the sample means form a distribution that looks approximately normal. The more samples we take, the more closely the histogram of sample means matches this sampling distribution.
  2. As we increase the sample size, the distribution of sample means becomes closer to a normal probability distribution. The mean of this distribution, μₛ, equals the population mean μ, and its standard deviation becomes smaller as the sample size increases.

The Central Limit Theorem

The central limit theorem states that if you have a population with mean μ and finite standard deviation σ and take sufficiently large independent random samples from it, then the distribution of the sample means will be approximately normal. This holds regardless of whether the source population is normal or skewed, provided the sample size is sufficiently large.

The standard deviation of the sampling distribution of the sample mean, σₛ, increases if the population standard deviation σ increases, and decreases if the sample size is increased. I will not prove the following formula for σₛ, but if you have read this far you already have an intuition for it.

The standard deviation of the sampling distribution of the sample mean is

σₛ = σ/√n

From this formula you can verify that as the sample size n increases, the standard deviation of the sampling distribution of the sample mean decreases.
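Plugging the income example into σₛ = σ/√n makes this concrete. This is a minimal sketch: σ = √8000 comes from the example distribution (variance 0.4*(100-200)^2 + 0.2*0 + 0.4*(300-200)^2 = 8000), and the helper name `standard_error` reflects the common name for σ/√n, the standard error of the mean.

```python
import math

# Population standard deviation of daily income in the example: sqrt(8000) ≈ 89.44
sigma = math.sqrt(8000)

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (5, 10, 100):
    print(f"n = {n:3d}: sigma_s = {standard_error(sigma, n):.2f}")
# n =   5: sigma_s = 40.00
# n =  10: sigma_s = 28.28
# n = 100: sigma_s = 8.94
```

Quadrupling the sample size halves σₛ, and going from n = 5 to n = 100 shrinks it by a factor of √20 ≈ 4.5, which matches the much narrower curve seen for samples of 100 people.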
