“The Law of Large Numbers” vs “The Law of TRULY Large Numbers”

Chamodi Adikaram
Zero Point Zero Five
May 22, 2017

At a glance, you might not see much difference between these two names apart from the word “truly” in the second one. In reality, that single word makes a huge difference when it comes to research.

First, let me explain the law of large numbers. It is one of the most fundamental principles in the discipline of mathematics, especially in the area of probability.

In any given study, a researcher cannot observe the whole population. Instead, the researcher takes a fraction of the population as study subjects. For example, to study the correlation between exercise and cholesterol level, he or she might choose subjects from the researcher’s own patients, from some institute, or from a specific area that is convenient. In other words, to study the general correlation between these two factors in the whole population, a particular segment is extracted. Based on whatever results this study yields, the research group will then draw conclusions about the correlation between exercise and cholesterol level in the whole population, regardless of institute or region. Because covering the whole population is nearly impossible, researchers try to make the test subjects as diverse as possible with regard to factors such as demographics.

It seems obvious that the more subjects a researcher can include in a sample, the more closely the sample will reflect the whole population. This, in essence, is the law of large numbers.

Britannica defines it as follows:

“Law of large numbers, in statistics, the theorem that, as the number of identically distributed, randomly generated variables increases, their sample mean (average) approaches their theoretical mean.”
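To see this definition in action, here is a minimal Python sketch that assumes a fair six-sided die (theoretical mean 3.5) and a fixed random seed. The die, the seed and the function name sample_mean_of_die_rolls are illustrative choices, not part of the original. As the number of rolls grows, the gap between the sample mean and 3.5 tends to shrink.

```python
import random


def sample_mean_of_die_rolls(n: int, rng: random.Random) -> float:
    """Roll a fair six-sided die n times and return the sample mean."""
    total = sum(rng.randint(1, 6) for _ in range(n))
    return total / n


THEORETICAL_MEAN = 3.5  # (1 + 2 + 3 + 4 + 5 + 6) / 6

rng = random.Random(42)  # fixed seed so the run is reproducible
for n in (10, 100, 10_000, 100_000):
    mean = sample_mean_of_die_rolls(n, rng)
    gap = abs(mean - THEORETICAL_MEAN)
    print(f"n={n:>7,}  sample mean={mean:.4f}  gap from 3.5={gap:.4f}")
```

Small samples can land well away from 3.5; by 100,000 rolls the sample mean is typically within a few thousandths of it.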

Here is a much simpler definition I found:

“A ‘law of large numbers’ is one of several theorems expressing the idea that as the number of trials of a random process increases, the percentage difference between the expected and actual values goes to zero.” (source: http://mathworld.wolfram.com/LawofLargeNumbers.html)
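Note that this definition talks about the percentage difference, and that detail matters. The sketch below is a hypothetical fair-coin experiment (heads_gap is an illustrative name, and the seed is arbitrary) that tracks both the percentage gap and the absolute gap between the number of heads and the expected n/2. The percentage gap heads toward zero, while the absolute gap typically keeps growing, roughly in proportion to the square root of n.

```python
import random


def heads_gap(n: int, rng: random.Random):
    """Flip a fair coin n times.

    Returns (absolute gap, percentage gap) between the actual number of
    heads and the expected number of heads, n / 2.
    """
    heads = sum(rng.random() < 0.5 for _ in range(n))
    expected = n / 2
    absolute = abs(heads - expected)
    percentage = 100 * absolute / expected
    return absolute, percentage


rng = random.Random(7)  # arbitrary fixed seed for reproducibility
for n in (100, 10_000, 1_000_000):
    absolute, percentage = heads_gap(n, rng)
    print(f"n={n:>9,}  |heads - n/2|={absolute:>7.0f}  percentage gap={percentage:.3f}%")
```

A shrinking percentage gap alongside a growing absolute gap is a first hint of the “truly large numbers” side of the story: in a big enough pool, the raw count of unusual outcomes can be large even when their share is tiny.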

In my opinion, the “decline effect” is closely related to the law of large numbers. (I shall write a separate article on the decline effect; until then, if you are eager to learn more, visit http://www.newyorker.com/magazine/2010/12/13/the-truth-wears-off )

Now let’s move on to the “law of truly large numbers”, which is, in effect, a statistician’s definition of coincidence.

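This law tells us that with a large enough sample, even very unlikely events become likely to happen somewhere: the larger the pool of data, and the more comparisons we make, the more spurious correlations and coincidences it will show.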
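Here is a small Python sketch of the spurious-correlation part, under the assumption that “larger” means more variables being compared against one outcome. Every number is pure random noise (hypothetical data; pearson and strongest_spurious_correlation are illustrative helpers), so any correlation found is spurious by construction. Even so, the more unrelated variables we test, the stronger the best-looking correlation tends to be.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def strongest_spurious_correlation(num_variables, num_observations, rng):
    """Correlate one pure-noise 'outcome' with many unrelated pure-noise
    'predictors' and return the largest absolute correlation found."""
    outcome = [rng.gauss(0, 1) for _ in range(num_observations)]
    best = 0.0
    for _ in range(num_variables):
        predictor = [rng.gauss(0, 1) for _ in range(num_observations)]
        best = max(best, abs(pearson(outcome, predictor)))
    return best


rng = random.Random(2017)  # arbitrary fixed seed
for k in (5, 50, 500, 5_000):
    r = strongest_spurious_correlation(k, num_observations=20, rng=rng)
    print(f"{k:>5} unrelated variables -> strongest |r| = {r:.2f}")
```

With only 20 observations per variable, checking thousands of variables almost guarantees that some of them look strongly related to the outcome, even though none of them are.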

Here is a simple example:
There are studies that claim a correlation between stork sightings and population increase (Box, Hunter and Hunter, 1978, in Statistics for Experimenters; see also https://web.stanford.edu/class/hrp259/2007/regression/storke.pdf). (Sorry to break it to you, but storks ain’t gonna bring you babies; making babies takes a whole different procedure.)

As funny as it sounds, this is actual data. But such correlations are quite deceiving unless studied carefully. These results can be explained simply: in more densely populated areas there are more people to actually see the storks, so you get more sightings, and there are also more people having babies. But there are much weirder coincidences than storks and birth rates.
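That explanation can be checked with a small simulation. This is a sketch using made-up numbers: the 200 regions, the population range and the rates 0.01 and 0.012 are assumptions for illustration, not the Box, Hunter and Hunter data, and simulate_regions is a hypothetical helper. Population drives both sightings and births, yet storks and births are otherwise unrelated. The raw counts correlate strongly, while per-capita rates (dividing by population) show essentially no correlation.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_regions(num_regions, rng):
    """Hypothetical regions in which population size drives both stork
    sightings and births, while storks and births are otherwise unrelated."""
    regions = []
    for _ in range(num_regions):
        population = rng.uniform(1_000, 100_000)
        # More people means more eyes to spot storks...
        sightings = population * 0.01 * rng.uniform(0.5, 1.5)
        # ...and, independently, more people having babies.
        births = population * 0.012 * rng.uniform(0.5, 1.5)
        regions.append((population, sightings, births))
    return regions


rng = random.Random(1978)  # arbitrary fixed seed
regions = simulate_regions(200, rng)
sightings = [s for _, s, _ in regions]
births = [b for _, _, b in regions]
raw_r = pearson(sightings, births)
# Dividing by population removes the shared driver (the confounder).
per_capita_r = pearson([s / p for p, s, _ in regions],
                       [b / p for p, _, b in regions])
print(f"sightings vs births:                       r = {raw_r:.2f}")
print(f"per-capita sightings vs per-capita births: r = {per_capita_r:.2f}")
```

The design choice here is to make population the only link between the two counts, so any correlation in the raw numbers can only come from that shared driver.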

One of my favourite examples is lost wedding rings found inside fish. There are many stories of lost rings coming back to their owners inside a fish they were about to eat, and the number of such stories is surprisingly high.

Here is how this works:

Suppose 100 people lost their wedding rings and 2 of them found their rings inside a fish. The number is 2 and the percentage is 2%. Now suppose 1,000 people lost their rings and 20 found them inside a fish. The percentage is still 2%, but the number has grown to 20, which is quite noticeable. As the number of people who lose their rings increases, the number who find them inside fish increases too. What makes these cases stand out is that only the incidents where a ring turns up inside a fish get noticed and retold, which is why we call them coincidences. If you actually counted everyone in the population who lost a ring, you would find that number remarkably high compared with the small fraction who found theirs in a fish.
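The ring arithmetic can be written out as a short Python sketch. The 2% rate comes from the example above; the one-in-a-million rate and the ten million ring losers are hypothetical numbers chosen only to show scale, and expected_cases and chance_of_at_least_one are illustrative names. The second function uses the standard formula 1 - (1 - p)^n for the chance that an event with probability p happens at least once in n independent tries.

```python
def expected_cases(people: int, rate: float) -> float:
    """Expected number of 'ring found in a fish' stories among `people`
    ring losers, if each one independently has probability `rate`."""
    return people * rate


def chance_of_at_least_one(people: int, rate: float) -> float:
    """Probability that at least one of `people` independent ring losers
    gets the ring back inside a fish: 1 - (1 - rate) ** people."""
    return 1 - (1 - rate) ** people


# The 2% rate from the example: the share stays fixed, the count grows.
for people in (100, 1_000, 100_000):
    print(f"{people:>7,} lost rings -> {expected_cases(people, 0.02):,.0f} found in fish")

# A hypothetical one-in-a-million rate over ten million ring losers.
print(f"{expected_cases(10_000_000, 1e-6):.0f} expected stories")
print(f"P(at least one) = {chance_of_at_least_one(10_000_000, 1e-6):.5f}")
```

Even at one in a million, ten million ring losers give about ten expected fish stories, and at least one such story is practically certain.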

(Here are a few more interesting facts about coincidence: https://www.youtube.com/watch?v=sJw3L_D0vD4&t=12s)

That, in layman’s terms, is the contrast between these two laws.

As a researcher, if you want to make use of the law of large numbers while staying away from the pitfalls of the law of truly large numbers, my suggestion is to gain domain knowledge so that you can apply prior knowledge in your study model. Sometimes you simply can’t let the data speak for themselves, because sometimes they talk nonsense. ;)
