# Is everything correlated to everything?

## A note on the word ‘significant’.

First let me clarify for the lay person that ‘significant’ is often used in a misleading way by scientists and journalists when communicating with the public. In plain English it means consequential, of considerable importance, as in ‘his salary was raised, but not by a significant amount’. But in statistics as an academic discipline, it means nothing of the kind. Rather it means, roughly, ‘a result this strong would be unlikely to turn up by chance alone’. The ‘significance level’ is the threshold for that judgement: a result counts as significant when the probability of getting something at least as extreme, if nothing but chance were at work, falls below it (conventionally 5%). The double meaning causes the public to get completely the wrong end of the stick.

Thus the scientist or journalist means (supposedly) only that the results would be unlikely to arise by chance alone, while the public thinks they mean that the results tell us something important. Is it deliberate deception? I don’t know. Tell me what you think in the comments, or hit me up on Twitter (@bartshmatthew).

The above xkcd cartoon shows what, as far as I know, no newspaper has ever done: let the reader in on the fact that ‘significant’ is being used in an unfamiliar sense.

## How everything is correlated to everything.

The xkcd cartoon ignores sample size. I’d argue that had the fictional scientists used large enough samples they would have got statistically significant correlations for every colour of jelly bean.

It seems to me that everything is statistically correlated to everything else, either positively or negatively, and if I am right about this, a lot of statements made by scientists (especially in the human sciences where a correlation is the only evidence) are essentially worthless.

It seems odd to me that when a correlation is found, the fact that a large sample (i.e. many subjects) was used is mentioned approvingly, as if this makes the finding more consequential. In fact, for a given statistical significance level, the larger the sample, the *weaker* the effect (or correlation) needs to be in order to reach it.
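To put rough numbers on that, here is a minimal sketch of how the smallest correlation that still counts as ‘significant’ shrinks as the sample grows. It assumes a two-sided test at the 5% level and uses the Fisher z approximation rather than an exact test; the function name `smallest_significant_r` is my own invention for illustration.

```python
from math import sqrt, tanh
from statistics import NormalDist


def smallest_significant_r(n: int, alpha: float = 0.05) -> float:
    """Approximate smallest |r| that is significant at level alpha (two-sided).

    Assumption: the Fisher z-transformation, under which atanh(r) is roughly
    normal with standard error 1 / sqrt(n - 3) when the true correlation is zero.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return tanh(z_crit / sqrt(n - 3))


# Print the threshold for a few illustrative sample sizes.
for n in (30, 1_000, 100_000, 10_000_000):
    print(f"n = {n:>10,}: |r| >= {smallest_significant_r(n):.4f}")
```

Under these assumptions the threshold falls from roughly 0.36 with 30 subjects to about 0.06 with 1,000, and to below 0.001 with ten million. A ‘significant’ finding from a huge sample may therefore describe an almost non-existent relationship.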

It seems to me that with a large enough sample, you can find a statistical link (i.e. correlation), either positive or negative, between any two macroscopic phenomena on planet Earth, and possibly beyond.

For example, it seems to me that if you had a big enough sample size, you could show that people who were given a certain vaccine rather than a placebo were subject to a different probability of dying of *any* cause. Vaccinated people surely have a different chance of dying by suicide, by an accident, by murder, by a heart attack, by cancer, by a stroke, and so on for every way of dying. To get a statistically significant result showing any of those, you only need a large enough sample size.
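How large is ‘large enough’? The sketch below uses the standard textbook approximation for comparing two proportions to estimate how many subjects each arm of such a trial would need. The death rates are made up purely for illustration, the 80% power figure is a conventional assumption rather than anything from a real study, and `subjects_per_arm` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_arm(p_placebo: float, p_vaccine: float,
                     alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate subjects needed per arm to detect p_placebo != p_vaccine.

    Assumption: the usual normal-approximation formula for a two-sided
    comparison of two independent proportions with equal arm sizes.
    """
    if p_placebo == p_vaccine:
        raise ValueError("no difference to detect: no sample is large enough")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_placebo * (1 - p_placebo) + p_vaccine * (1 - p_vaccine)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_placebo - p_vaccine) ** 2)


# Made-up rates: 1.00% of the placebo group and 1.01% of the vaccinated group
# die of some particular cause during the follow-up period.
print(f"{subjects_per_arm(0.0100, 0.0101):,} subjects per arm")
```

With these invented rates the answer comes out on the order of fifteen million people per arm. That is impractical for most trials, but the number is finite, which is the point: any non-zero difference, however trivial, becomes ‘significant’ once the sample is big enough.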

A large sample size is rightly approved of, it seems to me, when *no* correlation is found, not when one is found. A null result from a large sample tells you that the correlation, which surely exists, must be tiny.