The Future of Forecasting — Highlights from the BITSS Workshop on Forecasting Social Science Research Results

This post, written by Nicholas Otis, a second-year PhD student in Health Economics at UC Berkeley, is cross-posted to the BITSS Blog.

Researchers are increasingly collecting forecasts of social science research results. For example, researchers have recently integrated predictions into studies examining questions such as: What are the long-term effects of a community-driven development intervention in Sierra Leone? How replicable are the results of an experiment? And how influential are experimental design decisions?

Collecting forecasts before you know results can be useful for several reasons. It can help researchers avoid hindsight bias, clarify how much new information an experiment provides, and motivate the publication of null results that challenge expert predictions. Forecasting may also help in selecting interventions when information is limited on what treatments are likely to work in a particular context.

Following the 2018 BITSS Annual Meeting, a group of social science researchers convened to review a growing body of empirical evidence on forecasts of social science results, and to discuss developing an online platform to streamline forecast data collection across the social sciences. Below is a brief overview of the research presented. Slides from each presentation can also be found on the OSF here.

Forecasting experimental replication and stability

  • Colin Camerer discussed work on the ongoing replication crisis in the social sciences. Using prediction markets and simple prediction polls, Camerer et al. find that the research community can predict which studies are likely to replicate.
  • Devin Pope presented a project with Stefano DellaVigna that examined forecasts of the stability of experimental results across a number of experimental design decisions (e.g., changing the behavioral task or varying the culture, geography, or demographics). Results from their sample of behavioral science and replication experts suggest that the correlation between predicted and observed experimental stability is weak.

Forecasting in psychology and political science

  • Don Moore presented findings from the Good Judgment Project, a large, multiyear, government-sponsored geopolitical-event forecasting tournament that has served as a basis for much of the psychological forecasting literature. His work highlighted tradeoffs in prediction accuracy between prediction polls and prediction markets.
  • Gareth Nellis presented a study in which policymakers attending a conference were shown a random subset of results from the first round of the Metaketa project, to examine how they update their beliefs in response to a stand-alone study versus a meta-analysis. Exposure to a meta-analysis of several studies led policymakers to update their beliefs about the likely results of the remaining studies.

Forecasting in development economics

  • Kate Casey and colleagues collected forecasts of the long-run impacts of a community-driven development intervention in Sierra Leone from OECD and local Sierra Leonean academics, policymakers, and students. In general, Sierra Leoneans overestimated experimental effects while OECD academics underestimated their impact.
  • Eva Vivalt and Aidan Coville collected predictions of experimental results from policymakers at a number of international development workshops, examining how they update their beliefs after being exposed to new information. They find evidence of variance neglect (the tendency to ignore the uncertainty in experimental estimates when updating), and that people update more on results that are better than their priors.

The workshop concluded with a discussion of a centralized forecast collection platform. Much like the AEA’s social science registry, the platform could allow investigators to post time-stamped project summaries and streamline the collection of forecasts while protecting forecaster anonymity.

A centralized platform would also preempt emerging concerns created by the growing number of researchers independently collecting forecasts. For example, if many research teams independently contact the same high-profile experts, potential respondents' willingness to complete forecasts could quickly erode. A centralized platform could ensure that willing forecasters are not contacted too often. It could also help researchers build systematic evidence on the conditions under which people make more accurate forecasts, with the eventual goal of minimizing survey error and improving forecast accuracy.

Over the next couple of years, Stefano DellaVigna, Eva Vivalt, and I, in collaboration with BITSS and others, will begin developing and piloting such a platform for use by the wider research community. If you’re interested in helping pilot the platform or have any questions, feel free to reach out to Eva Vivalt (eva.vivalt@gmail.com).