Inhaling data

Martin Vetterli
Published in Digital Stories
Dec 18, 2018

Where we finally understand the link between Nobel prizes and chocolate, and between correlation and causation

Photo by Charisse Kenion on Unsplash

Back in high school I often used to hitchhike (you know, that free ride-sharing method from before Uber). Once, a salesman for a tobacco company picked me up, and sure enough, we started arguing about whether smoking cigarettes causes lung cancer. His argument was simple: in Switzerland, the number of babies per family and the number of storks had both declined since the Second World War. So does that mean storks bring babies?

Of course not, and in much the same way, he saw no link between the rise in tobacco consumption and the rise in lung cancer. It was only a correlation, he argued, not a cause-and-effect relationship. As absurd as it sounds, he was making a valid argument. A correlation between two phenomena, such as the number of babies and the number of storks, does not imply that there is a causal relationship.

A few years ago a doctor published a tongue-in-cheek article with similar logic. He showed a clear correlation between a country’s level of chocolate consumption and its number of Nobel prizes! In his data, countries with higher chocolate consumption had more Nobel Laureates. For example, chocolate consumption in Switzerland is about 12 kilograms per person per year, and Switzerland had about 33 Nobel Laureates at the time of the analysis. In contrast, countries like France or Italy, which had only 5–10 Nobel Laureates each, consumed less than 5 kilograms of chocolate per person per year.

Clearly, the doctor's article was ironic: a simple correlation between two phenomena does not prove causation, since two events can occur together without any cause-and-effect relationship (such as storks and babies). However, one cannot completely rule out that chocolate influences intelligence, even if it could just as well be the other way around (meaning that having many Nobel Laureates in a country leads to high chocolate consumption). Causality can thus be one explanation of an observed correlation.

But it’s not the only one. In the case discussed above, it is much more probable that both phenomena, chocolate consumption and Nobel prizes, are simply influenced by a third factor, such as the wealth of a country. That would explain the correlation without any direct causal link between the two.
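To see how a third factor can produce a correlation all by itself, here is a small simulation sketch. Everything in it is made up for illustration: the variables `wealth`, `chocolate` and `nobel` are random numbers, not real country data, and the assumption that wealth feeds equally into both quantities is mine. By construction, chocolate and Nobel prizes never influence each other, yet they end up correlated at roughly 0.5. Once the part explained by wealth is subtracted from each, the correlation all but disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000                # hypothetical number of simulated "countries"

# Assumption: wealth drives both quantities; neither affects the other.
wealth = [rng.gauss(0, 1) for _ in range(n)]
chocolate = [w + rng.gauss(0, 1) for w in wealth]
nobel = [w + rng.gauss(0, 1) for w in wealth]

# Correlation as observed, ignoring wealth.
raw = pearson(chocolate, nobel)

# Correlation after removing what wealth explains from both variables.
adjusted = pearson(residuals(chocolate, wealth), residuals(nobel, wealth))

print(f"chocolate vs. Nobel prizes:       {raw:.2f}")
print(f"same, after accounting for wealth: {adjusted:.2f}")
```

In real data the third factor is rarely handed to us this neatly, which is why an observed correlation alone cannot settle the question.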

So what about cigarettes and lung cancer? Today we know that cigarette smoke contains a plethora of carcinogens. These are chemical substances that damage the cells inside the lung, and by doing so increase the risk of lung cancer. Thus, for cigarettes and lung cancer, a clear causal link was found. But before this was known, it was just a pure correlation, too. And in fact, Ronald Fisher, one of the most famous statisticians of the 20th century, argued against a causal link (exactly like the salesman in the car), and all this while calmly smoking his pipe.
