Emptying psychology’s file drawer

Alexandra Lautarescu
Published in psychphdpathway
5 min read · Oct 20, 2020

Psychological research, at its core, is motivated by a desire to increase our understanding of ourselves and the world we live in. As researchers, we embark on a quest for knowledge and focus our efforts on finding the answer to a particular question. If the answer is “yes” (in its many guises: “our hypothesis was supported”, “results were significant”, “p < .05”), we rejoice. We are good scientists. We are worthy.

However, when our results are “not significant” or “null” (p>.05), the outcome is not always straightforward. So what are the options?

[Image: multicoloured closed drawers. Image by bs-matsunaga from Pixabay]

Give up on publishing the analysis?

This is perhaps the most common response, and is often referred to as the “file-drawer problem”. In psychology and psychiatry, publications are heavily biased towards “statistically significant” results: positive findings make up over 90% of publications, and a majority of researchers admit to not reporting all of their results. This means that the published literature suggests much stronger evidence for findings than exists in reality. To understand how severe publication bias really is, we can look at a case where the publication of a study does not depend on the “significance” of the results: Registered Reports. In a study of 113 Registered Reports, about half of all findings were negative and did not support the authors’ hypotheses.

When examining the published literature in a certain field, if it’s “too good to be true”, maybe we should take a minute to consider all the potential file drawers full of unpublished findings. Do we still feel so strongly about a particular hypothesis?

There are many reasons for a null result, including the absence of a true effect (i.e. the evidence contradicts the hypothesis), insufficient statistical power (i.e. the true effect may be smaller than the study was designed to detect), or random variation (i.e. numbers are noisy). Despite this, researchers hide away null results as if they were a personal failure.
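To see how easily low power alone produces null results, here is a minimal simulation. This is my own sketch, not from the original post; the effect size and sample size are assumptions chosen purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect = 0.2    # a small but real effect (Cohen's d); assumed for illustration
n_per_group = 30     # a common sample size in psychology studies
n_studies = 10_000

significant = 0
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05:
        significant += 1

# With d = 0.2 and n = 30 per group, power is only ~12%: nearly 9 in 10
# of these studies return a null result even though the effect is real.
print(f"Proportion of 'significant' studies: {significant / n_studies:.2f}")
```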

By only publishing positive results, it becomes impossible to draw meaningful conclusions from the literature, and the credibility of the scientific enterprise is compromised. Researchers outside of our inner circle (i.e. those who do not know that this particular experiment does not produce “good” results) are bound to repeat the same work, and perhaps hide it away in their own file drawers. It becomes a vicious self-sustaining cycle.
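Extending the sketch above makes the distortion concrete: if only “significant” studies reach the journals, the published effect sizes are systematically inflated. Again, this is my own illustrative simulation, with assumed numbers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.2    # same assumed small true effect as above
n = 30
all_effects, published_effects = [], []

for _ in range(10_000):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    # observed standardised effect size (Cohen's d)
    pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
    d = (treated.mean() - control.mean()) / pooled_sd
    _, p = stats.ttest_ind(treated, control)
    all_effects.append(d)
    if p < 0.05:                  # only "positive" results reach the journal
        published_effects.append(d)

print(f"True effect:                {true_effect:.2f}")
print(f"Mean effect over all runs:  {np.mean(all_effects):.2f}")        # ~0.20
print(f"Mean effect when published: {np.mean(published_effects):.2f}")  # ~0.60: inflated ~3x
```

A meta-analysis of only the “published” runs would conclude the effect is about three times its true size.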

If nothing else, we owe it to our participants to publish all results. In my own case, when a hopeful participant asks whether there are any studies suggesting that prenatal depression may not impact a baby’s brain development, I do not have an answer. I do not know the whole truth. Studies overwhelmingly say it does, but I cannot help but wonder how full this particular file drawer is.

Present your result as “trending towards significance”?

Matthew Hankins has collated a list of phrases that researchers have used to present values of p > .05 as trends towards significance. Some of my favourites include “teetering on the brink of significance (p=.06)”, “only slightly non-significant (p=.073)”, and “barely escaped statistical significance (p=.07)”.

Arbitrarily moving the goalpost once we have seen the results betrays a complete misunderstanding of p values and invalidates the point of having a threshold to begin with. Before describing a result as trending towards significance, we need to ask ourselves whether we have ever described a result (e.g. p=.04) as trending towards non-significance. If not, then we have our answer.

[Image: a group of cartoon figures trying to push an arrow upwards. Image by Peggy und Marco Lachmann-Anke from Pixabay]

Try to get that p value to drop under .05?

Numbers are noisy, and we are guaranteed to find “significant” results if we look hard enough. In theory, most of us understand that this is bad. However, in practice, researchers employ a wide range of strategies to get the p value to drop under the desired threshold and make their work “publishable”.
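How guaranteed is “guaranteed”? Here is a toy simulation of my own (the numbers of variables and subjects are arbitrary assumptions): test 20 pure-noise variables against a pure-noise outcome, and the chance that at least one comes out “significant” is about two in three.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_simulations, n_variables, n_subjects = 5_000, 20, 50
at_least_one = 0

for _ in range(n_simulations):
    outcome = rng.normal(size=n_subjects)                     # pure noise
    predictors = rng.normal(size=(n_variables, n_subjects))   # 20 noise variables
    pvals = [stats.pearsonr(x, outcome)[1] for x in predictors]
    if min(pvals) < 0.05:
        at_least_one += 1

# Expected: 1 - 0.95**20 = 0.64, i.e. roughly two chances in three of
# finding a "publishable" correlation in pure noise.
print(f"P(at least one p < .05): {at_least_one / n_simulations:.2f}")
```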

According to a poll, 3 out of 4 researchers reported deciding to collect further data after finding “unsatisfactory” results, and almost half reported excluding data points from the analysis after peeking at the results. There are many other examples of questionable research practices that are beyond the scope of this blog post, but I recommend reading Stuart Ritchie’s book “Science Fictions”.
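That first practice, peeking and then collecting more data, is worth simulating. This sketch (mine, not from the poll; the starting sample, number of peeks, and batch size are assumptions) shows that optional stopping inflates the false-positive rate even when there is no true effect at all:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments = 5_000
false_positives = 0

for _ in range(n_experiments):
    a = list(rng.normal(0, 1, 20))   # start with 20 per group; NO true effect
    b = list(rng.normal(0, 1, 20))
    for _ in range(10):              # peek up to 10 times, adding 5 per group each time
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:                 # stop as soon as the result is "significant"
            false_positives += 1
            break
        a.extend(rng.normal(0, 1, 5))
        b.extend(rng.normal(0, 1, 5))

# The nominal false-positive rate is 5%; with this kind of optional
# stopping it climbs to roughly two to three times that.
print(f"False-positive rate: {false_positives / n_experiments:.2f}")
```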

[Photo by Thom Holmes on Unsplash]

Publish your null results

Before arguing for the publication of null results, it is important to acknowledge that individual scientists cannot be faulted for choosing not to do so. The incentive structure in academia is broken: rather than rewarding the quality of one’s work, we reward having a large number of publications that “tell a story”, and we fetishise high-impact journals. However, the research culture is slowly changing. Journals and funders are beginning to express their commitment to fostering change and reducing barriers. For example, researchers are increasingly required by funders to publish all of their results, and some research institutes even reward scientists with cash prizes for publishing null results.

That being said, anything that a researcher chooses to do with their null results (other than the three options above) is a win. The most straightforward option is publishing null results on a preprint server such as PsyArXiv, to ensure that the results are available to the (scientific) community and the work is appropriately recognised.

The perhaps more difficult option is to publish null results as a peer-reviewed paper. While a number of journals have been created with the sole purpose of publishing null results (e.g. the Journal of Negative Results in Biomedicine), researchers have understandably been reluctant to submit their work to them. In recent years, an ever-increasing number of “mainstream” journals (e.g. BMC Psychology, Nature Human Behaviour, PLoS One) have started to specify that they welcome submissions regardless of results, and some even conduct results-blind peer review. There are still concerns about null results having to jump through more hoops prior to acceptance, but at least now the option is on the table.

Lastly, if neither of these options is appropriate (e.g. a supervisor opposes preprints), a researcher can do their part by ensuring that they always report all the analyses they have conducted (e.g. in the supplement) and/or by publishing null results as part of a multi-study paper.

Knowing all this, would you trust a journal that publishes mixed results more, or less, than one that doesn’t? How about a researcher?


Alexandra Lautarescu, psychphdpathway
PhD student in Perinatal Imaging and Health. Writing about reproducibility and open science.