Bunking, debunking, and discovery

My speaking visit to Union College a couple of months ago was so generative, and gave me lots of ideas for research. I love it when that happens.

One of the many provocative ideas I heard came from a researcher who argued that debunking false findings in social psychology is more valuable right now than generating new findings.

That’s likely to be an unpopular idea in the field. I sympathize with it, though it strikes me as too broad. First, I think it depends on the quality of the research. Second, I think it’s hard to know what’s most valuable (and to whom) at the time we engage in the work. Whether — and to what extent — our work benefits society is best adjudicated decades after the fact. This applies to both debunking and new empirical findings.

That said, I agree that debunking false findings is undervalued. A common attitude in response to replication attempts, for example, is something like “why don’t they just do their own original research instead of tearing down someone else’s?” This attitude is problematic for a number of reasons. One issue is the assumption that “original research” is particularly valuable right now. This assumption is questionable given the generally low quality of present-day social psychology research.

As far as we can tell, most of the findings reported by social psychologists are false. It’s a remarkable situation, to put it mildly. I never anticipated this. A recent attempt to replicate 55 social psychology findings published in top journals failed to replicate 75 percent of them. Other efforts have shown a similar pattern of non-replicability.

By all the evidence we have to date, I think we have to conclude that most social psychology findings are false. That’s what the data is telling us. A failure to replicate is not decisive evidence that a finding is false, but in most cases the replication attempts are higher quality studies than the originals, with larger and often more representative samples. There is no reason to grant greater epistemic standing to an original study over a replication attempt, and there are many reasons to do the opposite.

A recent paper (PDF) by Rohrer, Pashler, and Harris illustrates why we might be overvaluing original research. They attempted to replicate the findings reported by Caruso, Vohs, Baxter, and Waytz (2013). Caruso et al. claimed that momentarily exposing people to background images of money on a computer screen before an experiment made them endorse free market systems and “social inequality” more than people who were not exposed to the money image.

Rohrer, Pashler, and Harris re-ran four of the five studies from the Caruso paper, using much larger samples. They even ran one of the studies three times. They found none of the effects reported by Caruso et al. Participants exposed to the money pictures didn’t endorse free markets any more or less than participants who never saw the money.

In discussing their resounding failure to replicate with the Caruso team, they discovered a shocking fact: the Caruso team had conducted nine studies, not five, but they only reported the five that confirmed their desired hypotheses. In their paper, they failed to disclose the other four studies that didn’t go their way. They also failed to disclose two outcome variables that were meant to test their hypotheses, but which showed no effects. So out of eleven tests, five supported their hypotheses and six did not — they only reported the five. Readers would have no idea that there were six other tests that showed no results. (For details, see the Rohrer et al. paper.)

That kind of malpractice needs to be purged from the field immediately if we want to be taken seriously as a science. We absolutely cannot withhold results like that. Failure to disclose null results undermines the validity of the statistical methods we use to report significant results. (For more on how this works, see Daniel Lakens’ work.) If this malpractice is widespread in the field, there would be little reason for anyone to believe social psychology findings, irrespective of any other issues — it would be irrational and mathematically incompetent to believe published findings in that case.
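The arithmetic behind this point is easy to make concrete. A minimal simulation sketch, with illustrative parameters of my own choosing (eleven independent tests per paper, as in the Caruso case, and no true effect anywhere): if each test uses the conventional .05 threshold, the chance that a paper turns up at least one spurious “significant” result is far higher than 5 percent.

```python
import math
import random


def simulate_familywise_rate(n_papers=5_000, tests_per_paper=11,
                             n_per_group=15, alpha=0.05, seed=1):
    """Simulate 'papers' that each run several independent two-group tests
    on data with NO true effect, and count how often at least one test
    comes out 'significant' -- i.e., how often selective reporting could
    manufacture a positive finding out of pure noise."""
    rng = random.Random(seed)
    papers_with_a_hit = 0
    for _ in range(n_papers):
        significant = 0
        for _ in range(tests_per_paper):
            # Both groups are drawn from the SAME distribution,
            # so any observed "effect" is sampling noise.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            mean_diff = sum(a) / n_per_group - sum(b) / n_per_group
            se = math.sqrt(2 / n_per_group)  # known sd = 1, so a z-test
            # Two-sided p-value from the normal distribution.
            p = math.erfc(abs(mean_diff / se) / math.sqrt(2))
            if p < alpha:
                significant += 1
        if significant > 0:
            papers_with_a_hit += 1
    return papers_with_a_hit / n_papers


if __name__ == "__main__":
    rate = simulate_familywise_rate()
    print(f"Chance of >= 1 spurious 'finding' per paper: {rate:.2f}")
    print(f"Analytic value, 1 - 0.95**11: {1 - 0.95 ** 11:.2f}")
```

The simulated rate lands near the analytic value of roughly 43 percent: run eleven honest tests of nothing, and you have close to a coin flip’s chance of getting something publishable, provided you hide the rest. That is exactly why a reported p-value is uninterpretable without full disclosure of every test that was run.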

Another factor that could explain why most findings don’t replicate is that we’re using invalid samples — most commonly, college kids from one university in one country (and only those taking Intro to Psychology, adding further skew). This was never a justifiable practice, and it’s strange that it’s persisted so long. College kids from a single campus are artificially homogeneous on many variables that could interact with the variables researchers are studying. This creates a synthetic low-variability and low-noise context for our analyses, making us more likely to find an “effect” that isn’t real in humans as such. We can’t credibly make claims about human nature from such unrepresentative and skewed samples. This is obvious to everyone outside of social psychology, and it has a serious impact on our credibility as a field, so I hope we can move forward quickly here. Research based entirely on college kids should be valued much less than debunking false findings — in that case, I agree wholeheartedly with the Union professor.

Notably, Study 1 of the Caruso et al. paper was based on 30 college students at the University of Chicago. Thirty. I’m deeply confused why any scientific journal — or any scientist — would be interested in claims based on samples of 30 people, much less 30 college kids. Note that there were two groups in that study, so there were only 15 people per group. I don’t think we’ll be able to explain to any authority why we publish stuff like this. We need to be serious.
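How underpowered is 15 people per group? A quick back-of-the-envelope sketch, using the standard normal approximation for a two-sided, two-sample test at α = .05 (the effect sizes below are my illustrative choices, following Cohen’s conventional small/medium/large benchmarks, not figures from the Caruso paper):

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def power_two_group(d, n_per_group, alpha_z=1.96):
    """Approximate power of a two-sided, two-sample test to detect a true
    standardized mean difference d, via the normal approximation."""
    shift = d * math.sqrt(n_per_group / 2)
    # Probability the test statistic lands beyond either critical value.
    return phi(shift - alpha_z) + phi(-shift - alpha_z)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):  # Cohen's small, medium, large
        print(f"d = {d}: n = 15/group -> power {power_two_group(d, 15):.2f}")
```

Under these assumptions, with 15 per group the power is roughly 8 percent for a small effect, under 30 percent for a medium one, and only around 60 percent even for a large effect. A design that would likely miss a large true effect — and whose significant results are therefore disproportionately flukes or overestimates — should not be the basis for claims about human nature.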

In looking at social psychology faculty job postings recently, I was surprised that none of them mentioned replicability, or methodological quality factors. They might mention a particular topic of research that they prize, but say nothing about replicable research. I don’t think recent events have fully sunk in yet — most findings are false. That may be hard to swallow, but reality is what it is and it doesn’t accommodate our wishes. A lot more research is going to fail to replicate. Many of the things we believe, many of the findings we have in textbooks, will turn out to be incorrect. Much of the research that dazzles us right now is going to burn us by 2020. This is what the evidence tells us. And I’d argue that research based on college kids, synthetic or ideological variables, play money, and so forth is most vulnerable. (I’ll explain what I mean by “synthetic” variables in a new paper.) We need to be much less gullible with respect to quick-and-dirty “college kids in front of computers” research.

I think reform is urgently needed because I think there’s a significant risk that the field will be defunded within the next few years. I urge social psychologists to take this risk seriously. In the US, we have a funding monoculture that is largely dependent on a couple of US government agencies. If most of our findings are false, I think policymakers will question why taxpayers should fund our work. There is also strong evidence (PDF) that the field discriminates against non-leftists and conservatives — this alone may prove disastrous for us, especially since the field has taken no action to prevent such discrimination. (Read about my own experience of being mistaken for a conservative here and here.) Moreover, we know that some research is politically biased in ways that undermine the validity of reported findings. This would be another, albeit smaller, source of false findings, added to the factors I discussed above. (See my and my colleagues’ recent paper on the political bias of the field and its impact on research.)

In the context we find ourselves in, I think integrity should be highly valued in its own right. Debunking false findings should be seen as an act of grace and nobility, of scientific seriousness and integrity, not as party-spoiling or encroachment. The bread-and-butter business of any science should be discovering new aspects and facets of reality. Debunking false claims gets us closer to carving nature at its joints. It clears the path, saves us from wasteful detours, brightens the room a bit. Debunking doesn’t give me the same thrill as discovery — whether being a debunker myself or reading someone else’s. There’s nothing like reporting or reading about a new, positive finding about human nature. I keep several issues of top social psychology journals in my car at all times — if there’s any lull in the day, be it lunch, a dentist’s office, or a long traffic light, I read journals and look for cool new research.

The tragedy is that I can’t read new findings the same way anymore. If I’m being honest with myself, I can’t assume they’re true with much confidence, not given current methods and practices. I don’t know of a method that can identify which studies are likely to replicate. I have no idea how many researchers suppress null results, or who they are. I’m much more reluctant to write about some cool new finding in a public sphere like Medium or a blog post — I don’t want to be the guy who tells you something that isn’t true, something that you might use to make life decisions, hiring decisions, relationship decisions, etc. To social psychologists: to get to a place where we can be mostly confident in social psychology findings, I think we need to act fast. In the Navy, we called it clean-up duty or bilge duty. We should understand that we’re in a special context where debunking is especially valuable, and original research should meet basic standards that would satisfy what we already know about sampling and hypothesis testing, and what a Congressional committee would expect of people with PhDs.

José L. Duarte recently earned a PhD in Social Psychology at Arizona State University. You can email him at jose.duarte@asu.edu.