Introduction to Statistics for Data Science

Advanced Level — Learning about Central Limit Theorem

Super Albert
Nov 9, 2018 · 5 min read

Although the Central Limit Theorem could be covered as part of the “Advanced Level — The Fundamentals of Inferential Statistics with Probability Distributions” post, I believe this theorem deserves a post of its own!

The first step of every statistical analysis you will perform is to determine whether the dataset you are dealing with is a population or a sample. As you might recall, a population is a collection of all items of interest in your study whereas a sample is a subset of data points from that population. Let’s take a short refresher!

Populations and Samples

SuperDataScience Statistics For Business Analytics A-Z

Central Limit Theorem

It is often described as one of the most important theorems in Statistics, and it can be very powerful when assessing real-world problems! The Central Limit Theorem states that “the sampling distribution will look like a normal distribution regardless of the population you are analyzing”. More precisely, this holds for the sampling distribution of the sample mean, provided the samples are reasonably large and the population has a finite variance.
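For readers who like notation, here is a sketch of the classical form of the theorem. The symbols are my own additions, not from the course material: I assume independent, identically distributed observations X_1, ..., X_n drawn from a population with finite mean \mu and finite variance \sigma^2.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\sqrt{n}\,\left(\bar{X}_n - \mu\right)}{\sigma}
\;\xrightarrow{\;d\;}\;
\mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

In words: for large n, the sample mean is approximately normal with mean \mu and standard deviation \sigma / \sqrt{n}.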

Sampling Distribution

As we’ve seen, you take a sample to estimate the parameters of the whole population. However, a single sample will not always give you an accurate estimate of the population’s true parameters.

Instead of taking a single sample, what if we take several samples from our population? For each sample, we’ll calculate the mean. In the end, we’ll have several estimates of the mean, which we can then plot on a chart.

SuperDataScience Statistics For Business Analytics A-Z

This will be called the sampling distribution of the sample mean.
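To make this concrete, here is a minimal sketch in Python of how a sampling distribution of the sample mean could be built. Everything in it is an assumption for illustration: the skewed (exponential) population, the sample size of 30, the 2,000 samples, and the helper name `sampling_distribution_of_mean` are all made up by me and are not part of the course.

```python
import random
import statistics


def sampling_distribution_of_mean(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical, deliberately skewed population: exponential with a mean of about 10.
_population_rng = random.Random(42)
population = [_population_rng.expovariate(1 / 10) for _ in range(100_000)]

# One mean per sample; this list is the (empirical) sampling distribution.
sample_means = sampling_distribution_of_mean(
    population, sample_size=30, num_samples=2_000
)

print("population mean:      ", round(statistics.fmean(population), 2))
print("mean of sample means: ", round(statistics.fmean(sample_means), 2))
print("population spread:    ", round(statistics.pstdev(population), 2))
print("spread of sample means:", round(statistics.stdev(sample_means), 2))
```

If you plot `sample_means` as a histogram, you should see a roughly bell-shaped curve even though the population itself is strongly skewed.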

Central Limit Theorem — Intuition

Let’s learn by looking at an example. Imagine we wanted to see the distribution of the heights of every male in the Portuguese population.

First we take several samples (the heights of different men) from our population, and for each sample group we calculate its mean. For example, we might have groups where the mean height is 176 cm, others with 182 cm, others with 172 cm, and so on. We then plot the distribution of these sample means. The following picture depicts the distribution of each of our samples, with the X mark in each one showing its mean value.

SuperDataScience Statistics For Business Analytics A-Z

You can see that although your sample means (red X marks) might fall at the extremes of the overall distribution, most of them tend to be closer to the center.

In the end, the distribution of all these sample means will be approximately normal. Take a look at the last chart, which shows the distribution of all the sample means.

Even with only 5 samples you can already see that most of the means tend to concentrate towards the center of the sampling distribution. And amazingly, in the end, the mean of the sampling distribution will match the mean of the original population.
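Here is a small sketch of that last claim using the heights example. The numbers are hypothetical: I simulate a population of heights with a mean of 175 cm and a standard deviation of 7 cm (values I made up, not real data for Portugal), and `mean_of_sample_means` is just an illustrative helper.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of heights in cm (mean 175 and sd 7 are made-up values).
heights = [rng.gauss(175, 7) for _ in range(50_000)]
population_mean = statistics.fmean(heights)


def mean_of_sample_means(num_samples, sample_size=40):
    """Take num_samples samples of men and average their sample means."""
    means = [
        statistics.fmean(rng.sample(heights, sample_size))
        for _ in range(num_samples)
    ]
    return statistics.fmean(means)


for num_samples in (5, 50, 5_000):
    gap = abs(mean_of_sample_means(num_samples) - population_mean)
    print(f"{num_samples:>5} samples: gap to population mean = {gap:.3f} cm")
```

With only a handful of samples the gap can still be noticeable, but it tends to shrink towards zero as the number of samples grows.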

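So there are two certainties with the Central Limit Theorem: the sampling distribution of the mean will be approximately normal, and its mean will equal the mean of the population.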

Let’s look at a more visual example with this GIF. The population here already has a normal distribution, but later you can try other shapes and even draw your own. We start by taking samples of size n = 5 from the population and calculating the mean of each one. As you increase the number of samples of size n = 5, you see that the distribution of the means starts to take the shape of a normal distribution. When we repeat the process thousands of times, we get a normal distribution with a mean equal to the population’s mean. Note that taking more samples only makes this shape smoother; it is a larger sample size n that makes the normal distribution narrower.

Now you try it! You can literally draw your own distribution on this website using your mouse, like the one you see here.

So if we keep taking samples, calculate the mean of each one, and plot those means as a sampling distribution, we’ll obtain a normal distribution centered on the population mean.
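The effect of the sample size on the width of the sampling distribution can be checked with a short simulation. This is only a sketch under my own assumptions: a hypothetical flat (uniform) population between 0 and 100, sample sizes of 5, 20 and 80, and 3,000 samples for each size. It compares the observed spread of the sample means with the standard result sigma / sqrt(n).

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical flat population: values spread uniformly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(50_000)]
sigma = statistics.pstdev(population)


def spread_of_sample_means(sample_size, num_samples=3_000):
    """Standard deviation of the means of num_samples samples of the given size."""
    means = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)


results = {}
for n in (5, 20, 80):
    results[n] = spread_of_sample_means(n)
    expected = sigma / math.sqrt(n)
    print(f"n={n:>2}: observed spread={results[n]:.2f}, sigma/sqrt(n)={expected:.2f}")
```

Each time the sample size is multiplied by four, the spread of the sample means should roughly halve.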

Access it here:


The Making Of… a Data Scientist

Welcome to “The Making of… a Data Scientist”. This is my personal blog with all I’ve been learning so far about this wonderful field! Hope you can get something useful for your path as well!
