The larger the study, the greater the chance of nonsense

Cecile Janssens · Published in Press Pause · 4 min read · Jan 26, 2020


Photo by chuttersnap on Unsplash

The media response to the controversial genetic study of sexual orientation had barely died down when the renowned journal Nature Communications published a new study, this one about the genetic influence on income. It was another large-scale analysis of data from the same UK Biobank, this time with 285,000 participants, and it produced similarly unsurprising results: DNA also has minimal influence on income differences.

I often don’t understand why mega-studies receive so much attention. Large studies may seem credible, but their credibility is not self-evident. Studies with tens or hundreds of thousands of participants, such as the UK Biobank and the US Framingham Heart Study, and registries such as the IBM MarketScan claims database, are valuable data sources, and their size is a blessing for statistical analysis. But size alone does not guarantee meaningful results.

Every study is a simplified snapshot of a complex reality. A dataset is a sample that researchers use to make statements about the population from which it was drawn. This simplification can introduce both random and systematic errors. A larger study reduces random error, but not systematic error.
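To make the distinction concrete, here is a minimal simulation sketch (not from the article; the numbers and the miscalibrated-measurement scenario are invented for illustration): as the sample grows, the statistical uncertainty of the estimate shrinks, but a systematic bias in how the data were collected stays exactly where it was.

```python
# Illustrative sketch: random error shrinks with sample size, systematic error does not.
# The true mean, the bias, and the spread are hypothetical values chosen for the example.
import numpy as np

rng = np.random.default_rng(42)
true_mean = 100.0          # the population value we are trying to estimate
measurement_bias = 5.0     # hypothetical systematic error, e.g., a miscalibrated instrument

for n in (100, 10_000, 1_000_000):
    # Every measurement is shifted upward by the same bias, regardless of sample size.
    sample = rng.normal(loc=true_mean + measurement_bias, scale=15.0, size=n)
    estimate = sample.mean()
    std_error = sample.std(ddof=1) / np.sqrt(n)
    print(f"n={n:>9,}  estimate={estimate:7.2f}  standard error={std_error:.3f}")

# The standard error drops toward zero as n grows (more precision),
# but every estimate stays roughly 5 units above the true mean of 100:
# the systematic error does not average away, it only becomes more precisely wrong.
```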

Statistical precision

Random errors occur randomly. You find them in one sample but not in another…


Cecile Janssens | Professor of epidemiology | Emory University, Atlanta USA | Writes about (genetic) prediction, critical thinking, evidence, and lack thereof.