The Benefits of Massively-Scaling Platform Research and Accountability

Platform-based social change will continue to be risky and dangerous without progress on evaluation and governance.
Image CC-BY-SA-4.0 by Sygmoral via Wikimedia Commons

Because online platforms observe and intervene in the lives of billions of people, many have come to expect that they should address enduring social problems, including terrorism [1], hate speech [2], suicide [3], police brutality [4], eating disorders [5], and human trafficking [6].

Social interventions often involve complex assumptions about individual psychology, collective behavior, architectures in platform design, and the behavior of algorithms that learn and adapt to human activity. If these factors can be orchestrated wisely, they offer powerful opportunities for social change.

Yet platform-based social change will remain a risky, dangerous endeavor without progress on two fundamental challenges: evaluation and governance. First, without systematic evidence about the outcomes of platform interventions, policymakers risk increasing harms rather than reducing social ills (see “The Obligation To Experiment”). At the scale and breadth of global human society where platforms intervene, estimating policy outcomes would require a thousandfold increase in research over what is currently available publicly. Second, as platforms discover powerful interventions to direct our moral and political behavior, that power must itself be governed.

After testing, many of the most lauded platform interventions have been shown to have damaging side effects or even to increase the behavior they were intended to reduce. In 2014, 17 years after the invention of the “downvote” in comment discussions, researchers showed that each vote of disapproval in political discussions causes people to behave worse over time [7]. Elsewhere, four years after Instagram’s algorithmic efforts to obfuscate self-harm, researchers discovered that the company’s policy may have increased participation in harmful communities [8]. After Wikipedia introduced powerful vandalism detection systems, researchers discovered that these systems had caused a dramatic decline in participation over the next six years. In all three cases, the systemic harms from these harm-reduction strategies were not observed until after they had affected tens to hundreds of millions of people for years [9].

Scaling Policy Evaluation Online

Better causal research would allow platform policymakers to predict the likely outcomes of an intervention on average [10]. Because platforms are designed to intervene, monitor, and process data about billions of people, they already possess the potential to make these evaluations [11]. Policymakers can choose from a rich palette of research techniques for piloting interventions, including randomized trials and post hoc quasi-experiments [12]. Where policy goals are difficult to measure, qualitative field experiments allow powerful inferences on the outcomes of an intervention [13].
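To make the idea of predicting an intervention's average outcome concrete, here is a minimal sketch of how a randomized trial yields such an estimate. Everything in it is an assumption for illustration: the users are simulated, and the baseline rate, the effect size, and the function names `simulate_trial` and `average_treatment_effect` are hypothetical rather than drawn from any platform or from the studies cited here.

```python
import math
import random
import statistics


def simulate_trial(n_users, baseline_rate, effect, seed=0):
    """Randomly assign hypothetical users to control (0) or treatment (1)
    and record whether each one shows the target behavior (1) or not (0)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_users):
        treated = rng.random() < 0.5  # coin-flip assignment
        rate = baseline_rate + (effect if treated else 0.0)
        outcome = 1 if rng.random() < rate else 0
        records.append((int(treated), outcome))
    return records


def average_treatment_effect(records):
    """Difference in mean outcomes between treatment and control,
    with a normal-approximation 95% confidence interval."""
    treated = [y for t, y in records if t == 1]
    control = [y for t, y in records if t == 0]
    estimate = statistics.mean(treated) - statistics.mean(control)
    std_err = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return estimate, (estimate - 1.96 * std_err, estimate + 1.96 * std_err)


# Assumed numbers: 10% of users show the harmful behavior at baseline,
# and the intervention lowers that rate by 2 percentage points.
records = simulate_trial(n_users=20_000, baseline_rate=0.10, effect=-0.02, seed=42)
estimate, (low, high) = average_treatment_effect(records)
print(f"estimated effect: {estimate:+.4f} (95% CI {low:+.4f} to {high:+.4f})")
```

Because assignment is random, the two groups differ on average only in whether they received the intervention, so the difference in means estimates its average effect. An interval that sits entirely above zero for a harmful behavior would flag the kind of backfire described earlier.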

The breadth and scale of platform power place new demands on the scale of research. Because the nature of problems such as hate speech and suicide varies across regions and cultures, it is likely that the most effective interventions will vary as well. While platforms do possess the ability to tailor interventions to context, this tailoring would require hundreds of new studies per policy. Platforms have developed infrastructures to scale research in sales and marketing, conducting up to hundreds of randomized trials per day, which adds up to tens of thousands of studies every year per platform [14]. Yet mass evaluation of policy on a similar scale has never been attempted. By my rough estimation, fewer than a dozen field experiments have ever been published on public interest uses of platform policies.
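The arithmetic behind these scale comparisons can be sketched in a few lines. The figures are illustrative assumptions, not reported numbers: 100 trials per day stands in for the lower end of "hundreds", and the region and language counts are hypothetical.

```python
# Assumed lower-end value for "hundreds of randomized trials per day".
TRIALS_PER_DAY = 100
DAYS_PER_YEAR = 365

marketing_trials_per_year = TRIALS_PER_DAY * DAYS_PER_YEAR

# Hypothetical tailoring scenario: one policy evaluated separately in each
# combination of region and language where the problem may look different.
REGIONS = 50    # assumed
LANGUAGES = 6   # assumed
studies_per_policy = REGIONS * LANGUAGES

# Days of existing experiment capacity needed to evaluate that one policy.
days_of_capacity = studies_per_policy / TRIALS_PER_DAY

print(marketing_trials_per_year)  # 36500
print(studies_per_policy)         # 300
print(days_of_capacity)           # 3.0
```

Under these assumptions, a fully tailored evaluation of one policy would take about three days of the experimentation capacity that a platform already devotes to sales and marketing.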

Governing Platform Research

If platforms can genuinely provide effective mechanisms for shaping the behavior of billions of people, we need methods to govern the use of that power. Policy research findings could play an important role in holding platform policymakers accountable. In The Open Society and Its Enemies, Karl Popper imagined the role of social experiments in democratic and authoritarian societies. In closed societies, argued Popper, unaccountable policy evaluators could shape human life in secret and toward ends that the public could not influence. Popper proposed the “open society” as an alternative — a society in which the goals and means of social policy are subject to democratic processes, and public knowledge of research guides the public to reject harmful or ineffective uses of power [15].

Local community policymaking — a model with 40 years of history on the internet — may offer a powerful opportunity to achieve needed research scales along with the public accountability of an open society. Across the social web, community moderators, sysops, group admins, and other local policymakers already do substantial work to support and govern millions of people online [16]. In a series of pilot studies I led recently, the CivilServant project offered communities the ability to conduct evaluations on the effects of their local policies, share findings openly, and replicate one another’s research.

In just a few months, these early studies and associated community deliberations have shaped decisions affecting tens of millions of people and spread policy ideas to over a hundred other communities. If communities continue to embrace this model, they have the potential to generate thousands of transparent, accountable policy experiments in platform governance each year.

Whether platform operators or local communities create and enact policies governing the lives of billions of connected people, responsible use of platform power requires new models of dramatically scaled research and democratic participation. Platforms already possess the potential for both, but neither is very common. Until reliable methods to evaluate and govern platform power become commonplace, our efforts to reduce social ills through platforms present large-scale, unestimated risks to society, even as we search for the benefits.

I wrote this essay for a collection of Perspectives on Harmful Speech Online published yesterday by the Berkman Klein Center for Internet and Society. It builds on the MIT dissertation on Governing Human and Machine Behavior in an Experimenting Society that I defended in May 2017 (thesis) (video).


1 — Natasha Lomas, “Twitter Nixed 635k+ Terrorism Accounts Between Mid-2015 and End of 2016,” TechCrunch, March 21, 2017.

2 — Danielle Keats Citron and Helen L. Norton, “Intermediaries and Hate Speech: Fostering Digital Citizenship for Our Information Age,” Boston University Law Review, 91 (2011): 1435.

3 — Rachel Metz, “Facebook Live’s New Suicide-Prevention Tools Come with Good Intentions but Many Questions,” MIT Technology Review, March 1, 2017.

4 — Caroline O’Donovan, “Nextdoor Rolls Out Product Fix It Hopes Will Stem Racial Profiling,” BuzzFeed, August 24, 2016.

5 — Stevie Chancellor, Jessica Annette Pater, Trustin Clear, Eric Gilbert and Munmun De Choudhury, “#thyghgapp: Instagram Content Moderation and Lexical Variation in Pro-Eating Disorder Communities,” (paper presented at the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, San Francisco, USA, February 27–March 2, 2016).

6 — Mitali Thakor and danah boyd, “Networked Trafficking: Reflections on Technology and the Anti-Trafficking Movement,” Dialectical Anthropology, 37, no. 2 (2013): 277–290.

7 — Justin Cheng, Cristian Danescu-Niculescu-Mizil and Jure Leskovec, “How Community Feedback Shapes User Behavior,” (paper presented at ICWSM 2014, Ann Arbor, USA, June 2–4, 2014).

8 — Chancellor et al., “#thyghgapp: Instagram Content Moderation and Lexical Variation in Pro-Eating Disorder Communities.”

9 — Aaron Halfaker, R. Stuart Geiger, Jonathan T. Morgan and John Riedl, “The Rise and Decline of an Open Collaboration System: How Wikipedia’s Reaction to Popularity Is Causing Its Decline,” American Behavioral Scientist, 57, no. 5 (2013).

10 — Donald T. Campbell, “Reforms as Experiments,” American Psychologist, 24, no. 4 (1969): 409.

11 — David Lazer, Alex (Sandy) Pentland, Lada Adamic et al., “Life in the Network: The Coming Age of Computational Social Science,” Science, 323, no. 5915 (2009): 721–723.

12 — Joshua Angrist and Jörn-Steffen Pischke, “The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con out of Econometrics,” National Bureau of Economic Research, Working Paper No. 15794 (March 2010).

13 — Elizabeth Levy Paluck, “The Promising Integration of Qualitative Methods and Field Experiments,” The Annals of the American Academy of Political and Social Science, 628, no. 1 (2010): 59–71.

14 — Ron Kohavi, Alex Deng, Brian Frasca, Toby Walker, Ya Xu and Nils Pohlmann, “Online Controlled Experiments at Large Scale,” (paper presented at the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, USA, August 11–14, 2013).

15 — Karl Popper, The Open Society and Its Enemies, (Abingdon, Oxon: Routledge, 1945).

16 — J. Nathan Matias, “The Civic Labor of Online Moderators,” (paper presented at the Internet Politics and Policy conference, Oxford, United Kingdom, 2016).