Law of Large Numbers vs Central Limit Theorem

Pankaj Agarwal · Analytics Vidhya · Mar 16, 2020
Collecting samples and analysing them are fundamental aspects of statistics [Photo: Pop & Zebra on Unsplash]

Prerequisite — This blog assumes the reader has some basic familiarity with the Central Limit Theorem (CLT). If you are new to this topic, here is a short and beautiful primer on the CLT. The aim of this blog is to compare the Central Limit Theorem and the Law of Large Numbers, and to justify their similarities and differences. It also highlights the fact that the CLT only provides information about the shape of the sampling distribution; the mean and variance follow from the linearity of expectation.

In statistics, two of the most important but difficult-to-understand concepts are the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). These form the basis of the popular hypothesis-testing framework. Key decisions in any internet-based company (e.g. e-commerce, food delivery, OTT) are usually backed by an A/B test, which involves hypothesis testing.

In the practical world, it's impossible to analyse an entire population. Hence, we resort to sampling from the population and analysing the sample, trying to draw conclusions about the population from it.

As per Wikipedia, the law of large numbers and the central limit theorem are partial solutions to a general problem: “What is the limiting behaviour of the sample mean (S_n) as the sample size (n) approaches infinity?”

We set up the problem by defining Population Distribution, Sampling Distribution and deriving some statistics about the sample:

Population Distribution:

We start with a theoretical population. It can be of any shape: discrete (say Bernoulli, Poisson, etc.) or continuous (say exponential, uniform, etc.). Let the mean and variance of this distribution be µ and σ².

Sampling Distribution:

Now we select n samples from this population independently (in statistical parlance, we select n iid samples) and average them. Let's call this average Y. It is a random variable because we can draw many such samples, i.e. many instances of Y; we can repeat this procedure infinitely many times. The distribution of Y is called the Sampling Distribution. Then —

Defining the sample formally using random variables: let X₁, X₂, …, Xₙ be iid random variables, each with mean µ and variance σ². Then the sample mean is

Y = (X₁ + X₂ + … + Xₙ) / n

Let’s try to find two important characteristics of this random variable i.e. the Expectation and Variance.

From the linearity of expectation, we get —

E[Y] = (E[X₁] + E[X₂] + … + E[Xₙ]) / n = nµ / n = µ

The formula for the expectation holds true even if the Xᵢ's are not independent. However, the formula for the variance below holds only when the Xᵢ's are independent (more precisely, uncorrelated).

The variance of Y follows from the Bienaymé formula — the variance of a sum of independent random variables is the sum of their variances:

Var(Y) = Var(X₁ + X₂ + … + Xₙ) / n² = nσ² / n² = σ² / n
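These two formulas can be sanity-checked numerically. The sketch below assumes a hypothetical exponential(1) population (so µ = 1 and σ² = 1) and draws many instances of Y for n = 50; the empirical mean of Y should land near µ and its empirical variance near σ²/n = 1/50 = 0.02:

```python
import random

random.seed(42)

n = 50          # sample size
trials = 20000  # number of instances of Y we draw

# Hypothetical population: exponential with rate 1, so µ = 1 and σ² = 1.
ys = []
for _ in range(trials):
    sample = [random.expovariate(1.0) for _ in range(n)]
    ys.append(sum(sample) / n)  # one instance of the sample mean Y

mean_y = sum(ys) / trials
var_y = sum((y - mean_y) ** 2 for y in ys) / trials

print(f"E[Y]  ≈ {mean_y:.3f}   (theory: µ = 1)")
print(f"Var(Y) ≈ {var_y:.4f}  (theory: σ²/n = 0.02)")
```

Any other population with known µ and σ² would work just as well; only the two theoretical targets change.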

Hence, we know the expected value and variance of our new random variable, but we don't know anything about its shape yet. This is where the CLT kicks in.

The Central Limit Theorem states that:

The sampling distribution is approximately normally distributed if the sample size is large enough (say n > 30). This can be observed easily using a Monte Carlo simulation.
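One such Monte Carlo check, as a minimal sketch: take a heavily right-skewed population (exponential(1), whose skewness is 2) and watch the skewness of the sampling distribution shrink toward 0 — the hallmark of a bell curve — as n grows. Theory predicts the skewness of Y falls like 2/√n:

```python
import random

random.seed(0)

def sample_means(pop_draw, n, trials=10000):
    """Draw `trials` samples of size n and return their sample means."""
    return [sum(pop_draw() for _ in range(n)) / n for _ in range(trials)]

def skewness(xs):
    """Sample skewness: third central moment over variance^1.5."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / var ** 1.5

# Heavily right-skewed population: exponential(1), skewness = 2.
pop = lambda: random.expovariate(1.0)

skews = {n: skewness(sample_means(pop, n)) for n in (2, 30, 200)}
for n, s in sorted(skews.items()):
    print(f"n = {n:>3}  skewness of sampling distribution ≈ {s:.2f}")
```

The skewness drops sharply between n = 2 and n = 30, which is why "n > 30" is a common rule of thumb, though how fast the approximation kicks in depends on how skewed the population is.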

[Figure (source: Wikipedia) — the CLT summarized in one picture: the sampling distribution becomes bell-shaped as n grows, whatever the shape of the population.]

Three cases arise depending on the population distribution and sample size:-

Case — 1.) If the population distribution is normal, even a sample size of 2 results in a normal sampling distribution. The sum (and hence the average) of any number of independent normal random variables is exactly normally distributed. This can be proved in many ways, one of them being convolution.
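This exactness for tiny n can be illustrated numerically. The sketch below assumes a hypothetical N(10, 3²) population and n = 2, and compares the empirical quartiles of Y against the quartiles of the theoretical N(µ, σ²/n) distribution:

```python
import random
import statistics

random.seed(1)

# Hypothetical normal population: µ = 10, σ = 3; tiny sample size n = 2.
mu, sigma, n = 10.0, 3.0, 2
trials = 50000

# Sampling distribution of the mean for n = 2 normal draws.
ys = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(trials)]

# Theory: Y ~ N(µ, σ²/n) exactly, even for n = 2.
ref = statistics.NormalDist(mu, sigma / n ** 0.5)

emp = statistics.quantiles(ys, n=4)                  # empirical quartiles
theo = [ref.inv_cdf(p) for p in (0.25, 0.50, 0.75)]  # theoretical quartiles
for e, t in zip(emp, theo):
    print(f"empirical quartile {e:.2f}  vs  theoretical {t:.2f}")
```

The empirical and theoretical quartiles agree to within Monte Carlo noise — no "n > 30" needed when the population itself is normal.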

Case — 2.) If the population distribution is not normal and the sample size is large (say n > 30), then the resulting sampling distribution is approximately normal.

Case — 3.) If the sample size is small (less than ~30), the normal approximation can be poor. In practice, when the population standard deviation is unknown and has to be estimated from the sample, the standardized sample mean is better modeled by Student's t-distribution than by a normal distribution (strictly speaking, the t-distribution result assumes the population itself is roughly normal).

The Law of Large Numbers states that:

A single instance of the sample mean (Y) of size n tends to be closer and closer to the population mean µ as n → ∞.

Since we already know the shape from the CLT and the mean and variance of Y from the linearity of expectation, the LLN may feel redundant. We can easily see that any instance of Y lies on a bell-shaped curve, and as we increase n the curve becomes thinner and thinner. Thus as n → ∞, any instance of Y will be approximately equal to µ.
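The "thinner and thinner curve" intuition can be watched directly by tracking one running sample mean as n grows. The sketch below assumes a hypothetical uniform [0, 1] population (so µ = 0.5) and prints the gap between the running mean and µ at a few checkpoints:

```python
import random

random.seed(7)

# Hypothetical population: uniform on [0, 1], so µ = 0.5.
mu = 0.5
running_sum = 0.0
checkpoints = {}
for i in range(1, 100001):
    running_sum += random.random()
    if i in (10, 100, 10000, 100000):
        checkpoints[i] = running_sum / i  # sample mean after i draws

for n, y in sorted(checkpoints.items()):
    # The gap from µ typically shrinks (roughly like 1/√n) as n grows.
    print(f"n = {n:>6}  sample mean = {y:.4f}  |gap from µ| = {abs(y - mu):.4f}")
```

Any single run can wobble, but the σ/√n scaling from the variance formula guarantees the gap shrinks in probability — which is exactly the LLN.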

Conclusion:

The LLN and the CLT both describe the approximate behaviour of the sample mean. The CLT gives the approximate shape of its distribution; the linearity of expectation gives the expected mean and variance of the sampling distribution; and the LLN simply says that the value of the sample mean gets closer and closer to the population mean as n becomes large.

Note — There are many different versions of the CLT and the LLN, but for general understanding purposes I have omitted the jargon. Hope this article clarifies some of your doubts.
