Is Psychology Facing a Reproducibility Crisis?

Courtney Ells
Psyc 406–2016
Image: https://www.timeshighereducation.com/news/psychology-reproducibility-crisis-overstated-new-report-claims

In recent years, psychology, like many other scientific disciplines, has come under fire for an apparent lack of reproducibility of published results, including those in what are commonly referred to as the “high-impact journals”. This was brought into the limelight last year by Brian Nosek and the Open Science Collaboration’s paper “Estimating the Reproducibility of Psychological Science”, published in Science in August of 2015. Not only did this publication represent a colossal undertaking involving more than 250 researchers and 100 studies, but some of its findings were not easily swallowed by the scientific community, particularly those studying psychology.

Firstly, this paper showed substantial declines in both the effect size and the significance of results between the original studies and their replications. The authors reported that “Ninety-seven percent of original studies had significant results (P<0.05). Thirty-six percent of replications had significant results”. The declines in effect size were a little less staggering, but still not a positive outcome for the field. They also found that surprising effects, and effects that were more challenging to study, were less reproducible, and that the characteristics or expertise of the original and replication teams had little bearing on replication success. Overall, the best predictor of replication success was the strength of the original evidence, such as its P-value or effect size. One of their most controversial findings, though, was that “reproducibility was stronger in studies and journals representing cognitive psychology than social psychology topics”.

Many of these conclusions have been explained or contextualized further by other experts in the field. For example, the finding that cognitive science experiments were twice as likely to replicate as social psychology experiments may relate directly to their research designs and contexts. Cognitive experiments tend to deal with very repeatable phenomena and use within-subject designs, whereas social psychology tends to examine factors that vary substantially between people and to use between-subject designs. Both of these factors could contribute to cognitive experiments showing a much larger overall effect size.
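As a rough illustration of that last point (a minimal sketch of my own, with made-up numbers, not anything taken from the Open Science Collaboration report): simulating the very same underlying effect under a within-subject design and a between-subject design shows how the within-subject version produces larger standardized effects and far more “significant” replications, simply because each person’s stable individual differences cancel out when they serve as their own control.

```python
# Illustrative simulation only: one hypothetical effect measured two ways.
import numpy as np
from scipy import stats

rng = np.random.default_rng(406)

n = 30             # participants per study (hypothetical)
true_effect = 0.3  # raw shift produced by the manipulation
person_sd = 1.0    # stable individual differences between people
noise_sd = 0.5     # trial-to-trial measurement noise

def within_subject_p():
    # Each person is measured in both conditions, so their baseline cancels out.
    baseline = rng.normal(0, person_sd, n)
    control = baseline + rng.normal(0, noise_sd, n)
    treatment = baseline + true_effect + rng.normal(0, noise_sd, n)
    return stats.ttest_rel(treatment, control).pvalue

def between_subject_p():
    # Different people in each condition, so individual differences add noise.
    control = rng.normal(0, person_sd, n) + rng.normal(0, noise_sd, n)
    treatment = rng.normal(true_effect, person_sd, n) + rng.normal(0, noise_sd, n)
    return stats.ttest_ind(treatment, control).pvalue

reps = 2000
within_hits = np.mean([within_subject_p() < 0.05 for _ in range(reps)])
between_hits = np.mean([between_subject_p() < 0.05 for _ in range(reps)])
print(f"within-subject studies significant:  {within_hits:.0%}")
print(f"between-subject studies significant: {between_hits:.0%}")
```

With these (arbitrary) parameters, the within-subject studies reach P<0.05 far more often than the between-subject ones, even though the underlying effect is identical, which is the gist of the design-based explanation above.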

Writing in The New York Times, Jason Mitchell also asked “what percentage of studies should replicate”? If the intent of science is innovation, shouldn’t some studies turn out to be incorrect or in need of modification? In that case, is 36% a good rate or not, and how do we go about determining this?

After this paper was published, opinionated scientists fell into one of two camps. One side backed the study, pursuing many of the suggestions made in its discussion section and pushing for more transparent practices within the discipline. The other was, understandably, up in arms: this paper, along with other recent allegations, challenged not only their field of study but also their livelihood. In recent months the debate has remained heated, but it has settled enough for many informative, insightful conclusions to be drawn from both sides, which I think could serve only to reinvigorate and further establish the field of research psychology.

“Our problems are not small and they will not be remedied by small fixes. Our problems are systemic and they are at the core of how we conduct our science.”

Michael Inzlicht, Professor of Psychology, University of Toronto.

Many scientists are now thinking critically about what can be done to revolutionize and revive the way we “do” science. One major push is toward the public pre-registration of research plans. This forces scientists to state their hypotheses and testing plans upfront, making it much more apparent if any after-the-fact data manipulation or cherry-picking has taken place. Other suggestions have included using larger samples (possibly via joint studies), describing methods more clearly, and increasing transparency by making use of open-source databases.

Though still controversial, Nosek’s report and the “reproducibility debate” seem to be fuelling a positive step forward not only for psychology, but for many other scientific disciplines that are following suit. By increasing the clarity of reporting and focusing on transparency, it seems possible that we can maintain the innovative intent of scientific research while also improving its reliability.

This would have direct implications for testing, needing to be considered from the earliest stages of test design. With the advent of pre-registration of research plans, hypotheses and even test designs and analysis plans would be outlined in advance, leaving little room for post-study modification. It could also be extremely useful for test designers to have access to open-source databases containing not only other experimenters’ tests and items, but also their statistical designs and procedures. Though we could be facing the likelihood of having to report more negative results, instead of the current push for only positive ones, I believe all of these modifications are constructive steps toward reporting science that is both inventive and, more importantly, reliable.

Sources and Further Insight on the Subject:
