Data Science 101: Explain the Central Limit Theorem to a five-year-old

Mark Stent
3 min read · Mar 29, 2023

Understand this essential statistical concept…in its simplest terms.

Photo by Patrick Fore on Unsplash

Even seasoned professionals forget the basics, and I often find it useful to go over things I have forgotten or not used in a long time. So in my ‘Data Science 101’ series, I aim to cover simpler topics that are very useful!

The central limit theorem is a really important concept in the world of statistics. It says that if you take lots of random groups of numbers and calculate the average of each group (add the numbers up and divide by how many there are), those averages will usually line up in a bell-shaped curve, as long as each group is reasonably big. That means most averages will be in the middle, and fewer will be really high or low.
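Here is a minimal sketch of that idea in Python, using only the standard library. The lopsided (exponential) source of numbers, the group size of 50, and the helper name `sample_means` are all assumptions chosen for illustration. The point is that the averages cluster around the middle even though the raw numbers do not look like a bell at all.

```python
import random
import statistics


def sample_means(draw, sample_size, num_samples, seed=0):
    """Return the averages of `num_samples` random groups of `sample_size` numbers."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


# Assumed source of numbers: exponential, so most values are small and a few are large.
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=50, num_samples=2000)

center = statistics.fmean(means)   # where the averages pile up
spread = statistics.stdev(means)   # how far they typically stray from the centre
share_near_middle = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"centre of the averages: {center:.2f}")
print(f"spread of the averages: {spread:.2f}")
print(f"share within one spread of the centre: {share_near_middle:.0%}")
```

For a bell-shaped curve, roughly two thirds of the values sit within one standard deviation of the centre, which is what the last line is checking.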

For example, let’s say you have a big jar of jelly beans. You can grab a handful, count how many jelly beans you got, and write down the number. Then you put the beans back, shake the jar, grab another handful, and write down that number too. You can keep doing this repeatedly, and you’ll get a bunch of different numbers. If you take all those numbers, add them together, and divide by how many numbers you wrote down, you’ll get the average number of jelly beans in a handful.

If you repeat that whole experiment many times, you might expect some of the averages to be really big or really small, but most of them will be somewhere in the middle. That’s because of the central limit theorem!
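The jelly bean experiment can be sketched as a small simulation. This assumes each count is one handful holding anywhere from 5 to 15 beans (a made-up range), and that one "experiment" is the average of 30 handfuls. It prints a rough text histogram so the pile-up in the middle is visible.

```python
import random
from collections import Counter

rng = random.Random(42)


def average_handful(num_handfuls=30):
    """Average bean count over several handfuls (5 to 15 beans each, an assumed range)."""
    counts = [rng.randint(5, 15) for _ in range(num_handfuls)]
    return sum(counts) / len(counts)


# Repeat the whole experiment many times and bucket the averages to the nearest 0.5.
averages = [average_handful() for _ in range(1000)]
buckets = Counter(round(a * 2) / 2 for a in averages)

# One '#' per 10 experiments that landed in that bucket.
for value in sorted(buckets):
    print(f"{value:5.1f} | {'#' * (buckets[value] // 10)}")
```

The longest bars should appear around 10 beans, with the bars shrinking on either side.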

Another example is flipping a coin. When you flip a coin, it can land on heads or tails. Say you score 1 for heads and 0 for tails. If you flip it a lot of times, add up all the scores, and divide by the total number of flips, you’ll get an average that is probably close to 0.5, right in the middle, because it’s just as likely to get heads as it is to get tails.
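A quick way to see this is to simulate the flips, assuming heads counts as 1 and tails as 0 and that the coin is fair. The function name `heads_share` and the flip counts are just illustrative choices.

```python
import random

rng = random.Random(7)


def heads_share(num_flips):
    """Flip a fair coin `num_flips` times (heads = 1, tails = 0) and return the average."""
    flips = [rng.randint(0, 1) for _ in range(num_flips)]
    return sum(flips) / num_flips


# More flips should pull the average closer to 0.5.
results = {n: heads_share(n) for n in (10, 100, 10_000)}
for n, share in results.items():
    print(f"{n:>6} flips -> average {share:.3f}")
```

With only 10 flips the average can wander quite far from 0.5; with 10,000 it rarely does.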

In the real world, this is used in election polling. Pollsters ask a random sample of voters to estimate how the whole population will vote in an election. If the sample is random and big enough, the central limit theorem tells them how far off their estimate is likely to be (the ‘margin of error’), so they can make reasonably accurate predictions about the election outcome.
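As a rough sketch of how a pollster might use this, the snippet below simulates one poll and computes the usual 95% margin of error from the normal approximation. The true support of 52%, the sample size of 1,000, and the function name `margin_of_error` are assumptions for illustration; a real poll would not know the true figure and would need extra care around who gets sampled.

```python
import math
import random


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a polled share (normal approximation)."""
    return z * math.sqrt(share * (1 - share) / sample_size)


TRUE_SUPPORT = 0.52  # assumed for the simulation; unknown in a real poll
SAMPLE_SIZE = 1000

rng = random.Random(2023)
# Each simulated voter says "yes" with probability TRUE_SUPPORT.
votes = [rng.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE)]

estimate = sum(votes) / SAMPLE_SIZE
moe = margin_of_error(estimate, SAMPLE_SIZE)

print(f"estimated support: {estimate:.1%} plus or minus {moe:.1%}")
```

With 1,000 voters the margin comes out at roughly three percentage points, which is why poll results are often quoted with that kind of caveat.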

So the central limit theorem is a way to understand how adding up lots of different numbers and calculating their average can give you a predictable result.

Simple!!

If you would like a really good and MUCH more technical explanation of the Central Limit Theorem, please check out Suarav’s article:

Mark Stent

Data scientist by day, music producer by night. When I'm not nerding out over math and AI, you can find me lifting weights or solving Rubik's cubes.