Hard Problems in Data Science: Replication

In the first of four informal discussion sessions on Hard Problems in Data Science, Professor Daniel Lakens from TU Eindhoven and Professor Arjen Witteloostuijn from Tilburg University debated ‘Replication, falsification and better ways of doing science’.

Tackling the replication crisis

The replication of experiments is an essential part of scientific research in which significant theories are based on reproducible results. But are they reproducible? Already a long-standing problem, the replication crisis came to a head several years ago in the field of psychology. Professor Lakens believes we should ‘look at a different field — such as psychology — and see how we tried to fix things, as a valuable learning for future studies.’

How much data do we need?

Lakens criticizes those who say they don’t care about effect sizes in their research because, he says, ‘If we don’t plan for the change we want to see, hypotheses become unfalsifiable. It’s essential that you ask yourself: What is the smallest effect size of interest?’ If you don’t, Lakens explains, you’ll never have enough data. ‘By making choices about which effects you can study, you’ll have enough data if it fits your study.’
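To make the link between the smallest effect size of interest and the amount of data concrete, here is a minimal sketch of a sample-size calculation. It is not taken from the session. It assumes a two-sided comparison of two group means, a standardized effect size (Cohen's d), and the common normal approximation n = 2 * ((z_alpha + z_power) / d)^2 per group. The function name `n_per_group` and the default values of 5% significance and 80% power are illustrative choices.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group
    comparison of means, using the normal approximation.

    d     -- smallest effect size of interest (Cohen's d, assumed > 0)
    alpha -- significance level (illustrative default)
    power -- desired statistical power (illustrative default)
    """
    if d <= 0:
        raise ValueError("the smallest effect size of interest must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    # The smaller the effect you still care about, the more data you need.
    for d in (0.8, 0.5, 0.2, 0.1):
        print(f"d = {d}: about {n_per_group(d)} participants per group")
```

As the smallest effect size of interest shrinks towards zero, the required sample grows without bound, which is one way to read the remark that without such a choice you will never have enough data. An exact calculation based on the t distribution gives slightly larger numbers.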

Can data science help in solving the replication crisis?

Professor Witteloostuijn is clear: replication issues plague all social sciences. He believes the practices of researchers themselves, as well as those of universities and journals, are to blame. First, there are the journals that insist on publishing only groundbreaking findings. Second, there are the universities, with incentives based on publication activity. And third, there are the researchers themselves who, under pressure to produce significant results, might come to a point where sloppy research practices are the only way out.

Witteloostuijn believes that data science could help solve the replication crisis by providing researchers with a more dynamic environment in which, importantly, the publication of results is no longer dictated by the quarterly deadlines of leading journals. Witteloostuijn cites four other major benefits of data science for the social sciences: more power, easier replication, sequential sampling and endless repositories.

But Witteloostuijn also stresses the pitfalls, including:

  • Blind data-mining — In such a vast sea of data it’s easy to want to jump in as quickly as possible, without taking the time to reflect on what it is you’re looking for.
  • Artificial correlations — Do we really know what’s happening or are we just clustering data?
  • Garbage in, garbage out — Bad data gives bad results.
  • Data-merging difficulties — There’s a huge amount of data that is not properly labeled or not labeled at all.
  • Accessibility — A vast amount of data is lost simply because we don’t know how to access it or, as is the case in the social sciences, because it is not open.

JADS — taking on a pioneering role?

Both Witteloostuijn and Lakens agree that JADS could play a pioneering role in helping scientific research change for the better. How? Not by limiting research, but by making the process more transparent. By classifying what happened when, instead of being totally stuck on ‘significant’, as is now the case with journal editors. ‘To be able to do so, it’s imperative that JADS embraces the adage: A good theory is a quantified theory,’ Lakens stresses.

Witteloostuijn would like to see JADS take the lead in establishing new research guidelines, based on open science, open access data and data mining guidelines. Witteloostuijn believes ‘JADS is in a unique position to set up a replication lab and theory center with a data handling team that can put in place accountability mechanisms.’

Professor Daniel Lakens is Associate Professor at the Department of Human-Technology Interaction at TU Eindhoven. He publishes on how to design and interpret studies, applied statistics and reward structures in science.

Professor Arjen Witteloostuijn from Tilburg University specializes in business, economics, public administration, psychology and sociology. He regularly advises businesses and government. He is a member of the Royal Netherlands Academy of Arts and Sciences.