New paper: “Why most of psychology is statistically unfalsifiable”

Just submitted with Daniël Lakens

Richard D. Morey
Oct 16, 2016

Daniël Lakens and I have been working on a new paper [pdf] that serves as a comment on the Reproducibility Project: Psychology (RP:P) and Patil, Peng, and Leek's (2016) use of prediction intervals to analyze its results. Our point of view on the RP:P echoes Etz and Vandekerckhove (2016): neither the original studies nor the replications were, on the whole, particularly informative.

We differ from Etz and Vandekerckhove in that we use a straightforward classical statistical analysis of differences between the studies, and we reveal that for most of the study pairs, even very large differences between the two results 1) cannot be detected by the design, and 2) cannot be rejected in light of the data. The reason is, essentially, that the resolution of the findings is simply lacking. Even high-powered replications, by themselves, will not help to assess the robustness of the psychological literature, because the original studies are so imprecise that one cannot call them into question with a new set of results.
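To make the kind of comparison concrete, here is a rough sketch in Python (my illustration, not the paper's code) of a classical test of the difference between an original result and its replication, assuming the effect sizes are correlations compared on the Fisher-z scale. The sample sizes and correlations in the example are hypothetical, but they show how wide the confidence interval for the difference remains at typical RP:P sample sizes.

```python
import numpy as np
from scipy import stats

def difference_test(r_orig, n_orig, r_rep, n_rep, alpha=0.05):
    """Classical test of the difference between two independent correlations.

    Both correlations are Fisher z-transformed; the standard error of the
    difference is sqrt(1/(n_orig - 3) + 1/(n_rep - 3)). Returns the z
    statistic, the two-sided p value, and a (1 - alpha) confidence interval
    for the difference on the z scale.
    """
    z_orig, z_rep = np.arctanh(r_orig), np.arctanh(r_rep)
    se_diff = np.sqrt(1 / (n_orig - 3) + 1 / (n_rep - 3))
    diff = z_orig - z_rep
    z_stat = diff / se_diff
    p = 2 * stats.norm.sf(abs(z_stat))
    crit = stats.norm.ppf(1 - alpha / 2)
    ci = (diff - crit * se_diff, diff + crit * se_diff)
    return z_stat, p, ci

# Hypothetical original/replication pair with modest sample sizes
z_stat, p, ci = difference_test(r_orig=0.40, n_orig=30, r_rep=0.10, n_rep=80)
print(f"z = {z_stat:.2f}, p = {p:.3f}, 95% CI for difference: "
      f"({ci[0]:.2f}, {ci[1]:.2f})")
```

The point is not the particular test but the width of that interval: with sample sizes like these, differences anywhere from near zero to very large remain consistent with the data.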

In light of this fact, all the discussion of moderators as a possible explanation for failures to replicate is over-interpreting noise. There might be differences between the studies. These differences might be small, and they might be large. For the vast majority of the studies, we just don’t know.


This has dramatic implications for the cumulative nature of science, because the logic of learning from a replication, and then asking whether moderators can account for any difference, is the same logic we use when learning from any two studies that are not replications of one another. Do two studies seem to show different patterns of results? Is one significant and the other not? Have you ever written a discussion section that explains such differences? As with the RP:P, if sample sizes are small there will often be large observed differences between studies, even when the true difference is zero or small. I suspect, and I know others do too, that much of the theorizing that happens in psychological science is interpreting noise.
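A quick simulation (my own illustration, not an analysis from the paper) makes the point about noise concrete: both studies below estimate exactly the same true correlation, yet with small samples the observed effect sizes often differ by amounts that would tempt anyone to start theorizing about moderators.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_observed_difference(rho, n1, n2, n_sims=10_000):
    """Absolute difference between two sample correlations when both
    studies estimate the SAME true correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    diffs = np.empty(n_sims)
    for i in range(n_sims):
        x1 = rng.multivariate_normal([0.0, 0.0], cov, size=n1)
        x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n2)
        r1 = np.corrcoef(x1, rowvar=False)[0, 1]
        r2 = np.corrcoef(x2, rowvar=False)[0, 1]
        diffs[i] = abs(r1 - r2)
    return diffs

# Hypothetical pair of small studies with an identical true effect
diffs = simulate_observed_difference(rho=0.30, n1=25, n2=25)
print("Median |r1 - r2|:", round(np.median(diffs), 2))
print("Share of pairs differing by more than 0.2:", round(np.mean(diffs > 0.2), 2))
```

Even with no true difference at all, observed differences of 0.2 or more between the correlations are routine at these sample sizes.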


The ultimate culprits are publication bias and common misconceptions about statistical power, both of which we address in the paper. We also suggest a way of powering future experiments: power your experiment such that you, or someone else, could conduct a similarly-sized experiment and have high power to detect an interesting difference from your result. We need to stop treating studies as one-offs, to be interpreted once in light of the original authors' hypotheses and then set aside; that mindset does not support cumulative science.
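As a minimal sketch of what that recommendation could look like (my illustration, assuming correlation effect sizes on the Fisher-z scale, two studies of equal size, and a two-sided alpha of .05; the "interesting difference" of 0.3 is purely hypothetical), one can solve for the per-study sample size at which a same-sized follow-up would have high power to detect a specified difference from the original result.

```python
import numpy as np
from scipy import stats

def power_for_difference(n, delta_z, alpha=0.05):
    """Power of a same-sized follow-up study to detect a true difference of
    delta_z (Fisher-z scale) from the original. With n participants in each
    study, the standard error of the difference is sqrt(2 / (n - 3))."""
    se_diff = np.sqrt(2 / (n - 3))
    crit = stats.norm.ppf(1 - alpha / 2)
    ncp = delta_z / se_diff  # expected z statistic of the difference test
    return stats.norm.sf(crit - ncp) + stats.norm.cdf(-crit - ncp)

def n_for_difference(delta_z, target_power=0.80, alpha=0.05):
    """Smallest per-study n at which a same-sized follow-up reaches
    target_power for detecting a difference of delta_z."""
    n = 4
    while power_for_difference(n, delta_z, alpha) < target_power:
        n += 1
    return n

# Hypothetical planning question: what n lets a later study detect a
# Fisher-z difference of 0.3 from mine with 80% power?
print(n_for_difference(delta_z=0.3))
```

The exact numbers depend on the effect-size metric and on what counts as an interesting difference; the paper spells out the reasoning, and this is only one simple way to operationalize it.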

Other things to note:

Let us know what you think! We're on Twitter as @richarddmorey and @lakens, or you can contact me via the email address in the manuscript.
