Central Limit Theorem: One-Pager

Paddy
Published in Analytics Vidhya
2 min read · Mar 6, 2021


This article can serve as a gentle introduction for beginners or a quick refresher for experienced readers.

Inferential statistics uses the sample data that is available to make inferences about the population. The Central Limit Theorem (CLT) is what lets us justify this step. Below are the properties of the CLT:

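  1. Mean of the sample means (μ_X̄) = population mean (μ). Since we usually don't or can't calculate the population mean, the sample mean serves as its estimate
  2. Standard deviation of the sampling distribution (the standard error) = σ/√n, where σ is the population standard deviation and n is the sample size. The sampling distribution is a different topic altogether; for now, think of it as the distribution of the means obtained from many samples
  3. For n > 30, the sampling distribution becomes approximately a normal distribution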

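In theory (and in practice), when the sample size is above 30, the sample means follow an approximately normal curve, even if the population itself is not normally distributed. The threshold of 30 is a rule of thumb rather than a hard limit.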
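The three properties can be checked with a small simulation. This is only a sketch under stated assumptions: the population is made up (an exponential, right-skewed distribution with a mean of about 35 minutes), and the names `population`, `sample_means` and `share_within` are illustrative, not part of any library.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical, right-skewed population of 20,000 commute times (minutes).
# The exponential shape is an assumption chosen to be clearly non-normal.
population = [random.expovariate(1 / 35) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 100              # size of each sample (above the n > 30 rule of thumb)
num_samples = 2_000  # number of samples drawn

# One mean per sample: together they form the sampling distribution.
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)   # property 1: close to pop_mean
observed_se = statistics.stdev(sample_means)     # property 2: close to sigma / sqrt(n)
expected_se = pop_sd / n ** 0.5

# Property 3: for a normal curve, about 95% of values lie within 1.96 SDs.
share_within = sum(
    abs(m - mean_of_means) <= 1.96 * observed_se for m in sample_means
) / num_samples

print(f"population mean {pop_mean:.2f}, mean of sample means {mean_of_means:.2f}")
print(f"expected SE {expected_se:.2f}, observed SE {observed_se:.2f}")
print(f"share of sample means within 1.96 SE: {share_within:.3f}")
```

Even though the individual commute times here are heavily skewed, the sample means should cluster symmetrically around the population mean, with a spread close to σ/√n.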

Let's apply the CLT to an example:

Say there are 20,000 employees working in an organisation, and we would like to calculate their average commute time. It is practically impossible to measure this for everyone, so we can calculate it for a small random sample, say 100 employees, and infer the value for the population.

The average commute time for these 100 employees is 35 minutes. We can assume the population mean should be close to 35 minutes, that is

population mean = 35 ± error value

This range, the sample mean plus or minus the error value, is called the confidence interval. The formula for calculating the confidence interval is

Confidence interval = X̄ ± z × S / √n

Sample mean = 35 (X̄)

Sample standard deviation = 9 (S)

n = 100 (sample size)

We need a z-score, which depends on the confidence level we choose, say 95%.

Since the interval is two-tailed, the cumulative probability to look up is 0.95 + (1 − 0.95)/2 = 0.975

The z-score for 0.975 is 1.96
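Instead of reading the value from a z-table, the lookup can be sketched with Python's standard library. `NormalDist().inv_cdf` returns the z-score for a given cumulative probability; the variable names are only illustrative.

```python
from statistics import NormalDist

confidence = 0.95

# Two-tailed: the leftover 5% is split evenly between both tails,
# so we look up the point with 97.5% of the area to its left.
cumulative = confidence + (1 - confidence) / 2

# Inverse CDF of the standard normal distribution (mean 0, SD 1).
z = NormalDist().inv_cdf(cumulative)

print(round(z, 2))  # 1.96
```

Changing `confidence` to 0.90 or 0.99 gives the z-score for those levels in the same way.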

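Applying these values in the formula, we get a margin of error of 1.96 × 9/√100 = 1.764, so the interval is

35 − 1.764, 35 + 1.764, which means

the average commute time of all employees is estimated, with 95% confidence, to be between 33.24 and 36.76 minutes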
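The whole calculation can be put into one small helper. This is a minimal sketch, assuming the interval is the sample mean plus or minus z × S/√n with S the sample standard deviation; `confidence_interval` is a hypothetical function name, not a library call.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Return (low, high) for a two-tailed, z-based confidence interval."""
    # z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)
    # Margin of error = z * standard error of the mean.
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values from the commute-time example: mean 35, SD 9, 100 employees.
low, high = confidence_interval(35, 9, 100)
print(f"{low:.2f} to {high:.2f}")  # 33.24 to 36.76
```

With these inputs the margin of error is about 1.96 × 0.9 = 1.764 minutes. A larger sample shrinks the interval, because the standard error falls with √n.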

Continue reading about Hypothesis Testing
