Image adapted from: Student (1908). The probable error of a mean. Biometrika, 6(1), 1–25. doi: 10.2307/2331554

Scenes from the Replication Crisis

by John Borghi

Oct 9, 2016 · 19 min read

“The value for which p = 0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant.”

This will become a point of contention in quantitative methods classes and psychology research papers for nearly a century, but Fisher’s p-value does not translate to a probability that the null hypothesis is true. To return to our practical example, a p-value of 0.05 does not indicate that there is a five percent chance that the fertilization method has no effect on grass yield. Rather, assuming the fertilization method has no effect, the probability of Fisher and colleagues obtaining yield measurements at least as extreme as the ones they observed is just five percent. Fisher is careful to tell his colleagues that the 0.05 p-value does not mean there is a 95% chance that the fertilization method has an effect on the growth of grass. It just means their observed yield measurements would be unlikely if the method had no effect.
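To make the arithmetic behind Fisher’s “1.96 or nearly 2” concrete, here is a minimal sketch in Python. It assumes a standard normal distribution for the test statistic; the function names are hypothetical and chosen only for illustration. It shows the p-value as a tail probability computed under the null hypothesis, not as the probability that the null hypothesis is true.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Probability, assuming the null is true, of a deviation at least
    |z| standard deviations from the mean in either direction."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def critical_z(alpha: float = 0.05) -> float:
    """Deviation (in standard deviations) whose two-sided p-value is alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


print(round(critical_z(0.05), 2))         # 1.96
print(round(two_sided_p_value(1.96), 3))  # 0.05
print(round(two_sided_p_value(2.0), 4))   # 0.0455
```

Rounding 1.96 up to 2, as Fisher suggests for convenience, makes the cutoff slightly stricter than 1 in 20.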

“It would be unfair to close with the impression that the malpractices discussed here are the private domain of psychology. A few minutes of browsing through experimental journals in biology, chemistry, medicine, physiology, or sociology show that the same usages are widespread through other sciences.”

Working independently from Sterling, Jacob Cohen, a quantitative psychologist at New York University, turns his attention to power analysis. Recall that, within Neyman and Pearson’s framework, power refers to the probability of a test correctly rejecting a false null hypothesis. Power is derived from three factors: sample size (the number of data points collected in a study), effect size (the magnitude of the phenomenon being studied), and the significance criterion (e.g. p<0.05). Combing through almost 80 articles published in a high-profile psychology journal, Cohen determines that, while many show “statistically significant” results, they are almost universally underpowered.
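The relationship between those three factors and power can be sketched in a few lines of Python. This is not Cohen’s own procedure; it is a rough normal approximation for a two-sided comparison of two equally sized groups, with the effect size expressed as a standardized mean difference. The function name and the example values are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group test (normal approximation).

    effect_size is a standardized mean difference; n_per_group is the
    number of observations in each of the two groups.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2.0)
    # Chance of landing beyond either critical value.
    upper = 1.0 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


# Same effect size and criterion, different sample sizes.
for n in (20, 64, 200):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, a study with 20 participants per group detects a standardized difference of 0.5 only a little more than a third of the time, while roughly 64 per group are needed to bring power near 0.8. A study can therefore report a “significant” result and still have been unlikely to find it.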

“When my jaw was clenched and my brows down, I tried not to be angry but it just fit the position. I’m not in any angry mood but I found my thoughts wandering to things that made me angry, which is sort of silly I guess. I knew I was in an experiment and knew I had no reason to feel that way, but I just lost control.”

Laird revisits this experiment throughout his career, using the manipulation of facial expressions to separate research participants into two groups: those who define their emotional experience based on personal cues and those who define it based on situational cues. His lab will eventually extend their investigation to measures of pain tolerance and physiological measures like heart rate and blood pressure.

“The present results suggest that there is room to improve reproducibility in psychology. Any temptation to interpret these results as a defeat for psychology, or science more generally, must contend with the fact that this project demonstrates science behaving as it should.”

As influential psychological models like ego-depletion also get swept up in the wave of failed replications, conversations about psychology in both the scholarly and popular press increasingly reference p-hacking, “false positive psychology”, and “researcher degrees of freedom”. Though these terms are not strictly synonymous, they all describe behaviors that increase the probability of erroneous results entering the psychological literature. After eighty years of debate about the utility of Fisher’s p<0.05 criterion, the effects of Sterling’s “publication policy”, and the overall integrity of psychological research, the trauma of the ongoing “replication crisis” causes the research community to begin re-evaluating its methods, priorities, and assumptions.
