Reflections on pre-registration: Part I

Kate Nussenbaum
9 min read · Nov 22, 2017


Courtesy of xkcd.com

I just submitted a pre-registration for the first experiment I plan to run as part of my Ph.D. I would post the link to it, but I decided to embargo it for the maximum time allowed — 4 years — or until the resulting paper is published.

As someone who procrastinates endlessly by reading Twitter, I have read a lot about the benefits of pre-registration over the past year or so, which is why I ultimately decided to try it out. But I haven’t read much from the student perspective about what the process is like, so I decided to write up a quick list of reflections that may be useful to others as they consider whether to go down this path.

I’m labeling this blog post “Part I” because I intend to reflect again after data collection to see if my thoughts and feelings have shifted at all.

First, some background, which you may want to skip if you are already familiar with these issues: About 10ish years ago, lots of people started to recognize that many findings in psychology were not replicable. I won’t go into the gory details of the many reasons why this was (and is) the case, but one problem that plagues psychology (and many other sciences) is the number of “researcher degrees of freedom” that often exist in analysis pipelines. To put it (over)simply, many researchers (including myself) often rely on null-hypothesis significance testing. I’m not going to get into the pros and cons of this approach (though it’s something I hope to think about throughout my Ph.D.), but the basic idea is that you run tests to determine the probability of observing your results if the effect you hypothesized were not real. Conventionally, the approach has been to use a .05 threshold — you can deem your findings “significant” if there is less than a 5% chance of them emerging in the absence of a real effect.
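As a concrete (if toy) illustration of what such a test computes, here is a permutation-style significance test on entirely made-up recall scores for two hypothetical groups. The group names and all numbers are invented for illustration; a permutation test is used here as a simple stand-in for the usual t-test because it makes the logic of a p-value explicit:

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical word-recall scores (made-up numbers, purely illustrative)
cookie   = [12, 15, 11, 14, 13, 16, 12, 15]
broccoli = [11, 13, 12, 10, 14, 12, 11, 13]

observed = mean(cookie) - mean(broccoli)

# Permutation test: if group labels were meaningless (the null hypothesis),
# how often would a random relabeling produce a difference at least this
# large? That proportion is the p-value.
pooled = cookie + broccoli
n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = mean(pooled[:8]) - mean(pooled[8:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / n_perm
print(f"observed difference: {observed:.2f}, p = {p_value:.3f}")
```

Under the conventional threshold, you would call the difference "significant" only if `p_value` came out below .05.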

One of the main problems with this approach is that researchers often have vague hypotheses that can be tested in many ways. It’s not very hard to come up with 20 different tests that all get at the same general idea. For example, I might want to see if there is an effect of eating cookies on individuals’ ability to remember a list of words. I might run 20 participants in a cookie group and 20 participants in a broccoli control group. I run a t-test and find no significant difference between the groups in the number of words remembered. But then I start thinking, wait a minute, three people in the cookie group only ate half a cookie each! That’s barely anything! That’s not what I predicted! Let’s re-run the test without their data. But still, there’s no effect. As I’m thinking about this, I realize my list of words contains two words that are longer and maybe more difficult to learn than the others. That could be obscuring the effect! So I remove those words from the analyses and try again. And then my hypothesized effect emerges!

What this anecdote is meant to illustrate is that the more tests I run on my data, the better my chances of getting a statistically significant result. This means I can essentially p-hack my way to a significant, but meaningless, finding, even if everything I’m doing feels somewhat reasonable at the time. That last part is critical. I completely trust myself not to commit fraud and make up data. But I worry that I might be quite talented at deluding myself into thinking that analyzing my data 100 different ways is reasonable, particularly because there often is not one best way to analyze a data set.
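How quickly do the chances improve? A quick simulation makes the point (a sketch, assuming 20 fully independent tests at the .05 threshold, which real re-analyses of the same data only approximate):

```python
import random

random.seed(0)

def fake_test() -> bool:
    """One 'test' when no real effect exists: significant ~5% of the time."""
    return random.random() < 0.05

# Probability that at least one of 20 independent tests at alpha = .05
# comes out 'significant' under the null.
n_sim = 20_000
hits = sum(any(fake_test() for _ in range(20)) for _ in range(n_sim))
rate = hits / n_sim
print(f"chance of at least 1 false positive across 20 tests: {rate:.2f}")
# Analytically: 1 - 0.95**20, about 0.64
```

So with 20 shots at the data, a "significant" finding is closer to a coin flip than to a 1-in-20 event, even when nothing is there.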

And that is true — there are reasonable justifications for analyzing the same set of data in lots of different ways. But the minute you start looking at your data, it becomes harder and harder to tease apart whether you are justifying an analysis approach because you truly think it’s the best way to look at things or whether you are justifying an analysis approach because you want that intoxicating excitement of discovering something cool.

That’s where pre-registration comes in. You think about the best way to analyze your data before you collect it, and you post your time-stamped plan online so you feel pressure from everyone on the Internet — as well as the well-intentioned, optimistic version of yourself in the past—to stick to it. Writing out your plan ahead of time makes it a lot harder to delude yourself into thinking your p-hacking is a legitimate way to unearth meaningful truth. So with this goal in mind, I spent the past few weeks writing up the pre-registration for my first study.

Here’s what I did and what I learned:

  1. I approached my advisor about the idea of pre-registering my study. I am super lucky because I have an advisor who was very open to the idea of pre-registration and happy to discuss all the logistical and scientific issues that arose, but also not dogmatic about any one approach. She had never pre-registered a study before, so going through the process was (and still is) a learning experience for both of us — in many ways, the fact that neither of us had done it before made us more inclined to carefully consider everything we’re doing and amplified the value of the experience. I don’t have good advice for students whose mentors don’t want to go through the process (sorry). Even though I currently believe pre-registration is the right way to do science, I’m still anxious about it potentially hindering my ability to make a cool discovery, and I can’t imagine having to deal with both that anxiety and the fear of disappointing an advisor who didn’t want to pre-register in the first place. I am very grateful that the lab I am in made it easy and exciting for me to do this, particularly because I didn’t start thinking that much about these issues until after I applied to grad school, so it wasn’t something I asked about when I interviewed.
  2. I used the Preregistration Challenge template from the Open Science Framework, though I did not enter the challenge because I didn’t want to wait the two business days for it to get reviewed and I noticed many target journals I might want to publish in were not on their approved list. It also seemed like the challenge applied to papers published before December 2018, and I didn’t want to feel sad later about not meeting that deadline.
  3. Their template asks you to specify all the measures you will collect and all the analyses you will run. Rather than only thinking about hypotheses, it forces you to think very carefully about how you are going to test them. This makes it easier to identify and remedy any confounds in the design, as well as to consider any other information that will be worthwhile to collect, or things that you need to make sure are in place before you start collecting data (e.g., IRB submissions).
  4. It’s efficient (mostly). As I started to plan all possible analyses I will want to run, I encountered a couple scenarios for which I needed to learn more about statistics. For example, I plan to analyze my data using mixed-effects models, but in the past, I have had trouble with non-convergence. If I weren’t pre-registering, I would be tempted to simply deal with this issue when it came up later, after I had collected my data. But since I needed to specify the exact procedure I will follow, I had to read up on modeling issues much earlier. Rather than slowing my project down, I think this sped things up, since there are always some logistical issues that delay the beginning of projects but don’t require very much work. For example, before I could start testing, I had to a.) wait for a lab IRB amendment to go through (so I can share my data!) and b.) wait for the touch-screen monitor we ordered to arrive. We only realized that we needed the touch screen after I had designed and programmed my task. That meant that had I not been working on thinking through and programming all my analyses, I would have had one to two weeks of not making that much progress on this project. Of course, when you have several projects going at once, this is less of a concern, but as a first-year Ph.D. student, most of my efforts are currently directed toward this one study.
  5. That said, there are some things involved in pre-registering that slow down projects. For me, this mostly came down to thinking through all the “what if” scenarios. For example, in my study, I plan to test age differences in learning. I am IQ-testing participants, in part to ensure that there are no IQ differences between age groups that could be confounded with my age effects. Although I am designing my recruitment procedures to mitigate potential differences between groups, there is still the possibility that age-related differences in IQ will emerge in my sample. If I were not pre-registering, I would have just crossed my fingers, and figured out how to deal with these IQ differences if they did in fact show up. But since I had to specify all analyses, I had to figure out how to deal with that before I even started collecting data. This required me to think through statistical tests that I hope to not have to run. In my case, this wasn’t a huge deal, but I can imagine some designs where there are a ton of “what if” scenarios that take up a lot of time. In the long run, thinking through these things is probably useful anyway (yay, learning!) but sometimes it can be frustrating if it’s what stands between you and starting to collect data.
  6. You have to find a balance between specificity and feasibility, and this balance isn’t always 100% clear. That’s when having a thoughtful and supportive advisor is incredibly helpful (I mean, it’s always incredibly helpful.) Pre-registering a vague analysis plan is not worthwhile. But I also ran into some problems when I tried to write out my whole analysis script (though I’m glad I did this because now it’s mostly done). With some of my statistical models, I may have to add or remove random slopes and correlations so that they converge. It seemed easier to write out what steps I will follow in order to systematically do this (i.e. I’m still totally restricting my researcher degrees of freedom) than to write out every possible model I might run. That said, attempting to write out the entire analysis script and run it on pilot data that won’t be part of my final sample was incredibly useful. It forced me to ensure everything was being recorded correctly, and it made me set up my whole organizational system so I know where data and scripts will be stored.
  7. It feels as if my process from this point on is much more straightforward than it has been in the past. We may need to run follow-up studies to clarify the interpretation of some of our findings, but I’m not anticipating agonizing over whether an effect is emerging in the data or not. Of course, I’ll be super sad if I run my study with the specified sample size and see the dreaded p = .06. But at least I won’t be tempted to add more people to my sample and stop when I see my effect emerge — I know my pre-registration is going public in 4 years, and so the shame of doing bad science would be too great. And also, I don’t want to do bad science. But honestly, the public shame is sometimes the greater motivator.
  8. You can pre-register without believing that pre-registration is the panacea for all bad science. There are a lot of reforms being discussed in the scientific community (and by that I mean on Twitter). Using statistical methods that are more robust to experimenter choice, publishing all studies as registered reports, verifying all results in a second, independent data-set, etc. are all things that might be better than pre-registration and render it unnecessary. Sometimes it seems like you have to be 100% in the “pre-registration is the best thing ever and all studies that aren’t pre-registered are garbage” camp or 100% in the “pre-registration won’t solve any problems” camp. I’m not in either camp. It seemed to me like pre-registration would increase the chances that any findings I eventually publish reveal something meaningful about human thought and behavior, and that’s why I want to be in science in the first place — to discover some sort of useful truth about what it means to be human. So I pre-registered. But I’m open to the idea that my opinions might change as I learn more.
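The optional-stopping temptation in point 7 is easy to quantify with a toy simulation. This is a hedged sketch, not anything from my actual pre-registration: made-up normal data with no real effect, a z-test approximation with known variance, and a "peek" at the p-value after every batch of ten participants:

```python
import math
import random

random.seed(2)

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic (normal approximation)."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def peeking_study(batches: int = 10, batch_size: int = 10) -> bool:
    """Simulate one study under the null, adding participants in batches
    and peeking at the p-value after each batch. Returns True if any
    peek reaches p < .05 (i.e., the researcher stops and 'finds' an effect)."""
    data = []
    for _ in range(batches):
        data += [random.gauss(0, 1) for _ in range(batch_size)]
        n = len(data)
        z = (sum(data) / n) * math.sqrt(n)  # true sd is 1 by construction
        if two_sided_p(z) < 0.05:
            return True
    return False

n_sim = 2_000
rate = sum(peeking_study() for _ in range(n_sim)) / n_sim
print(f"false-positive rate with optional stopping: {rate:.2f}")
```

The rate lands well above the nominal .05: peeking and stopping when the effect "emerges" quietly inflates the Type I error rate, which is exactly what committing to a sample size in advance prevents.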


Graduate student in developmental cognitive neuroscience; occasional blogger. @katenuss