Resisting proposed reforms to research practices in psychology: some reflections on researchers’ rationales

Sam Parsons
9 min read · Mar 27, 2018


a.k.a. how many r's can I fit into a title? Or: my PhD viva is later and I have needed a break from looking at my own thesis these past few days.

Psychological research has the potential to answer the questions we ask about how the mind works, why people act the way they do, how we can combat emotional disorders, and so on. However, the replication crisis, and what may become a measurement/reliability crisis, have demonstrated (to those of us who have taken this on board) that we simply cannot trust large swathes of the published literature.

The million-dollar question is: how do we improve research practices and prevent further crises?

Many reforms to psychology research practices have been proposed. Some have the potential to revolutionise the way we publish and review research, whereas others amount to including a few extra details in the manuscript.

Washburn and colleagues report data from over a thousand social and personality researchers who offered their perspectives on four proposed reforms to psychology research practices. The percentages of respondents who have engaged in each reform are below:

  • report effect sizes — 99%
  • conduct power analyses — 87%
  • make data publicly available — 56%
  • preregister studies — 27%

The more impressive feature of the study is that these researchers were also asked whether it is acceptable not to adopt the proposed reforms. Below are the percentages of respondents who indicated that not engaging in each practice is acceptable:

  • report effect sizes — 36%
  • conduct power analyses — 60%
  • make data publicly available — 70%
  • preregister studies — 85%

These recommendations are some of the least radical proposals for improving research practices out there. They are good and valuable on first principles, rather than some revolutionary aim to overturn the academic world. Yet these figures suggest that many researchers find it acceptable not to act on these proposed reforms.

The rationales put forward for not adopting the reforms are illuminating. Some are frustrating because they are so obviously wrong to those of us engaged in improving psychological research. Some point (as the authors also suggest) to the need for more training and support in enacting these reforms. So, as a start, I have selected a few responses for each proposed reform that I thought were worthy of discussion. For each reform, the first response I discuss is the most common rationale for not engaging in that practice.

report effect sizes

Effect sizes are redundant with other reported statistics or should be calculated by readers (e.g., all the information is there for readers to calculate effect sizes)

If all of the information is available for the reader to calculate effect sizes, then purely from a clear-writing perspective, it will be easier for the reader to understand the results if you simply provide the effect sizes for them. Readers want to understand your results, not detour through a bit of maths first.

Perhaps there are some instances in which reporting effect sizes is not helpful (I would genuinely love examples of this). Until I see a solid example, it seems to me that reporting the effect size adds useful information to descriptions of the results.
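To make that bit of maths concrete, here is the kind of conversion this rationale pushes onto every reader: recovering Cohen's d for an independent-samples t test from the reported t value and group sizes. This is a sketch with made-up numbers, not anything from the Washburn et al. data.

```python
import math

def cohens_d_from_t(t, n1, n2):
    """Cohen's d for an independent-samples t test, recovered from t and the two group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# e.g. a paper reports t(62) = 2.10 with 32 participants per group
print(round(cohens_d_from_t(2.10, 32, 32), 2))  # ~0.53
```

It is a one-line calculation, which is exactly why authors should just report it rather than leaving it to readers.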

Effect sizes are not important, are uninformative, or are irrelevant to the field or specific research questions (e.g., psychologists are interested in process, not intervention)

In some analyses I agree that effect sizes are not the focus; perhaps model fit is the essential statistic of importance. I would like to see specific examples, though. If you are using standard Null Hypothesis Significance Testing (NHST) and report p values as your main criterion of ‘evidence’, then omitting the effect size renders the p value close to meaningless and uninterpretable.

Effect-size reporting is not a standard, norm, or journal requirement (e.g., some journals, reviewers, and editors do not ask for effect sizes)

This was the rationale given by almost 16% of respondents. The journal doesn't require effect sizes. This is an indefensible position from a research-quality standpoint. By extension, you could publish a garbage paper in a fake journal and it would be just as valid, because there is no formal requirement to report the results fully.

This response (which also appears for other reforms) offers no rationale for why the reform might not apply in some situations, nor any argument against its benefits. It suggests, clearly and plainly, that the holder of such an opinion values publishing papers for reasons other than actually doing science and providing knowledge.

conduct power analyses

There may be no basis for estimating effect size, power analyses are not needed for exploratory research (e.g., effect sizes are not known in the case of new research paradigms)

Let's acknowledge first that most studies are underpowered, and that this can undermine both the results and their replicability (e.g. for neuroscience, see Button et al.). In the case of exploratory research, there will feasibly be a smallest effect size that researchers are interested in finding. This would then be the effect size to power the study for (Daniel Lakens has discussed this on several occasions, e.g. here and here). Even when you do not know what the effect size of interest will be, you should want your study to be designed and powered so that you can detect effects in the first place.

Power analyses are not needed when researchers have experience or can guess the appropriate sample size without formal analyses (e.g., in programmatic research, researchers know how big the sample should be)

One of the best rules of thumb I have heard is that rules of thumb are often wrong. This happens frequently in power calculations. Say you want to test the difference between two conditions with a basic t test, assuming a medium effect of d = 0.5 and 80% power: how many participants should you recruit? Somewhere from the void as I wrote that sentence I could hear “20 per cell”. G*Power returns a total sample of 128, yes, 64 participants per cell. Twenty participants per cell gives you about 34% power.
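For anyone who wants to check those numbers without opening G*Power, here is the same calculation sketched in Python with statsmodels (R's pwr package, or any power calculator, gives the same answer):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Participants per group for an independent-samples t test,
# d = 0.5, alpha = .05, 80% power
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # ~64 per group, i.e. 128 in total

# Power actually achieved by the "20 per cell" rule of thumb
power_20 = analysis.power(effect_size=0.5, nobs1=20, alpha=0.05)
print(round(power_20, 2))  # ~0.34
```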

Take-home message: unless you have experience running formal power analyses, you probably don't have the experience needed to not run formal power analyses.

Limited resources or other constraints on increasing sample sizes make power analyses unnecessary (e.g., sample size may be limited by the size of the participant pool)

This is perhaps a counterargument to the overly simplistic approach of just increasing sample sizes (e.g. see Smith and Little). If you can only recruit 20 participants, your power will still be limited for certain analyses; limited access to participants does not change that fact. What is needed is an alternative method or a different analytic strategy, neither of which is covered by this response.

make data publicly available

Making data publicly available raises issues concerning participants’ confidentiality (e.g., participants’ identity needs to be protected when data are sensitive)

This is completely legitimate in my opinion. It does, however, only apply to a subset of research, and it cannot be extended to fields in which there is no chance that the data could be used to identify participants. For example, much of cognitive psychology could share anonymised data with no risk of breaking confidentiality.
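As a minimal sketch of what that can look like in practice (the file and column names here are hypothetical, and real datasets may need more careful de-identification than this):

```python
import pandas as pd

df = pd.read_csv("raw_data.csv")

# Drop columns that could directly identify participants
shareable = df.drop(columns=["name", "email", "date_of_birth"], errors="ignore")

# Replace the original participant code with an arbitrary anonymous ID
shareable["participant"] = pd.factorize(shareable["participant"])[0] + 1

shareable.to_csv("open_data.csv", index=False)
```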

Researchers will share data upon request (e.g., data do not need to be publicly available to be open to sharing)

Sorry to break it to those who genuinely believe this is the case, but it is simply wrong. Studies have shown (e.g. Stodden et al.) that the likelihood that researchers will share data upon request is unacceptably low. Why unacceptably? Because it is often a requirement of publication that the data will be shared on request; even when this is mandated, researchers are not doing it. It is true that data do not have to be publicly available to be shared, but posting them publicly makes it a hell of a lot more likely that others will actually be able to access them.

This proposed research-practice reform is not needed (e.g., not sharing data is always okay)

This proposed research-practice reform does not actually improve science (e.g., data quality cannot be inferred from data posted online)

I'm going to put these two together, because they both give me the rage. Sharing data allows external checks of the validity of results. Sharing data enables other researchers to test alternative explanations of results. Sharing data makes it easy to spot an error; you may not like that, but the scientific record benefits from it. Not sharing data is wasteful; sharing it could save another researcher the effort of collecting an entire sample unnecessarily. Sharing data allows meta-analyses to be conducted using that dataset. In short, open data offers many possibilities that will help improve science.

preregister studies

This is my favourite proposed reform; the responses cover such a wide range of reactions to this, and similar, initiatives to improve research practices.

Preregistration is not needed for exploratory research (e.g., it is not needed for pilot studies, descriptive research, or secondary data analysis)

Just how exploratory are we talking here? Is exploration just a case of running an extra analysis to follow up on a potentially interesting finding, or is it p-hacking and HARKing until significant results (p < .05) are found?

First, preregistration is a tool to distinguish between exploratory and hypothesis-testing analyses. It is therefore useful when a study contains both hypothesis-generating and hypothesis-testing elements: it allows us to tell the two apart, an exceedingly useful and transparent improvement on the usual closed-door approach to research.

I might concede that if a study is purely exploratory, no preregistration should be required. However, psychology has a habit of reporting exploratory studies as if the significant effects observed were hypothesised from the start. This hindsight bias can be eliminated with preregistration, as we have a record of which hypotheses and analyses were planned. It sounds amazingly obvious, but you can only test a hypothesis statistically if you specify that hypothesis beforehand.

There is no current requirement for preregistration, it does not increase validity (e.g., there is no incentive for preregistration, it is not common practice)

This old chestnut again. Just because something is not required does not mean that it would not improve the quality of research.

We do need better incentives to preregister, ones more aligned with robust research practices. Here's a small one: it makes the research process quicker. In my experience, having a fully structured analysis plan before data collection was completed made my data analysis quick and straightforward.

Preregistration is not needed if researchers are honest (e.g., preregistration creates an atmosphere of distrust, most researchers are trustworthy)

This assumes that preregistration aims to reduce outright fraud. Most researchers are trustworthy, but we can easily fool ourselves into thinking that an exploratory result was what we hypothesised all along. Suddenly that result is the key finding reported in the paper, and the bias begins anew. Preregistration helps prevent researchers from fooling themselves. Current publishing and incentive structures in psychology research do not reward robust science; in fact, they actively promote these biases in favour of novel and sexy findings, however weak the evidence. If the current structures support biased reasoning, we should not assume that researchers are free of these biases.

Researchers need to be critical; we are taught that from day one. Recasting this argument as ‘we should all trust each other’ misses the point entirely. I will always trust a preregistered study more, because the statistics the conclusions rest on were specified in advance and it is clear which aspects are exploratory. Otherwise, however trustworthy the researchers on a particular study are, there is still a chance that they have fooled themselves.

There are no resources for preregistration, researchers do not know how to preregister studies (e.g., guidelines are unclear, journals do not provide a way to preregister studies)

OK, so there are lots of these. We do need better training. I am pushing to develop courses for Oxford undergraduates that cover all of these issues, and to improve methods and stats teaching more generally.

For anybody who needs to find these resources, here is a quick link dump. These are only a few resources, and a quick Google search will find many more; there is an abundance of information on preregistration out there.

  • How Preregistration Helped Improve Our Research: An Interview with Preregistration Challenge Awardees
  • The preregistration revolution
  • The promise of pre-registration in psychological research
  • journals that offer registered reports
  • pre-registrations in social psych — a discussion and suggested template

Your thoughts?

There are lots of researchers engaged with these reforms. If we can harness some of the energy currently being spent on debating alpha thresholds and Bayes Factors vs p-values, we might be able to improve teaching and training on these issues.

The objections to these proposals are genuine ones that need to be addressed if we want wider acceptance and adoption of reforms to our research practices. It is not enough to point out that some of these objections are misguided; we also need to provide better education on the benefits. It is important that we articulate that the proposed reforms are ultimately beneficial to us all, because the quality of psychological science will improve as a result.

I want to hear others' thoughts on this paper and the responses to proposed reforms in psychology research. Do you agree or disagree? What have I missed? What else can we do to give others the same moment of realisation many of us have had: that research practices need to improve?
