Stop using MTurk for research

Jul 31, 2019

With the support of Y Combinator, we want to change how behavioral research on the internet is done.

The way we conduct psychological and behavioral research is changing profoundly.

In 2005, fewer than 5% of psychology research studies were conducted online; by 2015, that number had risen to 50%. Similarly, others predict that in the coming years, nearly half of all cognitive science articles will involve online samples. [1] Where is the remaining 50% of research being run? Apparently, most of it still takes place offline, primarily at colleges and universities. [2] Given current trends, much of that is likely to move online, too.

Around 38% of online studies are run on crowdsourcing platforms, most prominently Amazon’s Mechanical Turk (MTurk), which alone accounts for about 35% [ibid.]. A quick Google Scholar search for “Mechanical Turk” returns about 1,000 results for 2010, ~7,000 for 2015, and ~11,000 for 2018.

Clearly, online data collection is booming, and we need to ask ourselves what this means for science.

The Problem

After spending more than five years working in the research space, I see at least two classes of problems with online sampling: ethical problems and data quality problems, and the two are often related. Given how popular online sampling has become, I keep asking myself: how should data be collected online? What premises are we willing to accept, and what is unacceptable?

Let’s start with ethics.

It’s no secret that the median wage for MTurk workers is $2 per hour. It’s also no secret that MTurkers are often treated like a commodity rather than as real people: they get randomly kicked out of studies, their concerns aren’t taken seriously, they get unfairly rejected by researchers, and they’re left to fend for themselves when problems arise. If you want a deeper understanding of how morally dubious the situation is, I recommend The Atlantic’s piece describing the online hell of Amazon’s Mechanical Turk. None of this would be allowed in lab studies, and yet when it comes to MTurk, Institutional Review Boards (IRBs) seem to turn a blind eye. 😟 It’s crazy to me that so much of our science relies on MTurk as a source of data.

Then there are issues around data integrity.

Last year, the quality of data collected via MTurk suddenly plummeted. Researchers found that the percentage of poor-quality respondents in MTurk surveys had increased drastically between 2013 and 2018; in some cases, up to 25% of respondents were found to be suspicious or fraudulent.

To top it off, a bot panic followed, leaving the research community wondering: is it bots, or bot-assisted humans, who are completing our online studies? Temporary workarounds exist in the form of additional software layered on top of MTurk, such as TurkPrime or Positly. However, “bot-gate” raises fundamental questions about whether MTurk can be relied upon as a data source in online research.

There’s also evidence that MTurkers are expert survey takers: they’re often familiar with common experimental paradigms and have learned how to get around attention checks. In the research world, this is called “participant nonnaivety,” and studies have found that nonnaive participants can reduce effect sizes. Unfortunately, because MTurk wasn’t designed with research in mind, there is no mechanism for allocating studies evenly across the MTurk worker pool.

Relying on an unregulated platform that doesn’t invest in its “workers” (= participants) and their well-being is bound to backfire eventually, not just for the “workers” themselves but for all the other stakeholders too: researchers, decision makers, businesses, investors, and society. If the data we base big decisions on is flawed, we’re in trouble. This is not a narrow, niche problem for the scientific community; it’s a problem that will affect all of us sooner rather than later.

All of the above raises serious questions about MTurk’s suitability for research. Researchers should ask themselves whether they can morally, scientifically, and societally justify their continued use of MTurk.

Since all of the above problems are solvable, surely there must be a better way to do online research with people?

The Solution

Thankfully, multiple new ways of collecting online samples are emerging. Our own startup, Prolific (www.prolific.co), is one of them, and we like to think of it as the scientifically rigorous alternative to MTurk. [3]

For the past 5 years, we’ve been developing Prolific, initially part-time and alongside our PhDs. Prolific lets you launch your survey or online experiment to 70,000+ trusted participants in Europe and North America. We don’t provide experimental software; for that, we recommend ambitious behavioral scientists turn to Gorilla. What we do make sure of is that our platform is ethical and has high data integrity. Prolific is as fast as MTurk, but without the drawbacks. Let me explain.

First, we cultivate an atmosphere of trust and respect on Prolific. We require researchers to pay participants at least $6.50 an hour, because we believe that everyone’s time is valuable. Our platform is designed so that prescreening is decoupled from the studies themselves. This means that participants never get kicked out of studies, and it minimizes the chance of dishonest responding. Our support team is also always on hand to mediate when disputes occur.

Second, we verify and monitor participants so you can be confident that you are collecting high-quality psychological and behavioral data. Please see this blog post if you’d like to know how we do this.

Whatever your target demographics are, you can probably find them via Prolific. For example, you can filter for students vs. professionals, Democrats vs. Republicans, old vs. young people, different ethnicities, people with health problems, Brexit voters, and many more! And… 🥁🥁🥁 …as of last week, you can now collect nationally representative samples at the click of a button!

Our goal is to create a platform and an environment where incentives are aligned, people feel treated like real humans, and disputes are resolved fairly. At the end of the day, this will lead to high quality research and more data-driven, robust decisions in society.

Anyone can sign up for free as a participant on Prolific and start earning a little extra cash. To be clear, Prolific is not intended as anybody’s main source of income. It’s a platform that connects researchers with research participants on a casual, non-committed basis. Prolific is compliant with the EU’s General Data Protection Regulation (GDPR), and participants can choose to opt out of studies at any time.

The bottom line is this:

On Prolific, you need not worry about bots or sweatshops, because we’re building a community of people who trust each other. Prolific is built by researchers for researchers. Data quality, reliability, and trust will always be our priorities.


OK, so Prolific helps you find research participants on the internet. What’s the big deal? Why is this important, and why should anyone care?

The Bigger Picture

Prolific has seen 150–200% year-on-year organic growth in users and revenue, without any marketing or sales effort and without having raised any external capital (with one exception: we were part of Oxford University’s startup incubator program). As first-time founders, we’ve faced a steep learning curve. 🤯 But we’re still here, and we’re only getting started.

The fundamental problem we hope to help solve is access to quality psychological and behavioral data. We think that individuals, businesses, and governments alike would benefit from better access to quality data when making decisions about people.

For example, what could we do to best curb climate change? What’s the best way to change unhealthy habits? How can we reduce hate crime and political polarization? How can we live happier lives?

The stakes are high, and behavioral research can help us find better answers to these kinds of questions.

We’re excited to announce that Prolific has joined Y Combinator’s Summer 2019 (S19) batch to work on these fundamental problems! 😃🥳 With the support of Y Combinator, we want to change how behavioral research is done online. To learn about our journey from Oxford to Y Combinator, read this piece.

Our bigger vision for Prolific is to build tech infrastructure that powers behavioral research on the internet.

We have lots of ideas on how one might do this. We want to:

improve longitudinal functionality
improve p2p messaging
launch a survey builder
build tools that enable better data sharing (to reduce research redundancy and waste)
offer integrations with other tools and software
improve and automate data quality checks
improve participant diversity and representativeness
improve incentives for researchers and participants
launch group accounts for large organizations

and so much more. If you have any questions, feedback or ideas for us, don’t hesitate to send them our way!

PS. Here’s a cheeky lil bonus for those who have made it all the way to the end. Get $100 off your first study on Prolific if you spend $250 or more. Just sign up via this referral link. 😀💪

Hope you enjoyed this post.
Much love from San Francisco ♥️

Katia & Team Prolific

References

[1] Stewart, N., Chandler, J., & Paolacci, G. (2017). Crowdsourcing samples in cognitive science. Trends in Cognitive Sciences, 21(10), 736–748.

[2] Anderson, C. A., Allen, J. J., Plante, C., Quigley-McBride, A., Lovett, A., & Rokkum, J. N. (2019). The MTurkification of social and personality psychology. Personality and Social Psychology Bulletin, 45(6), 842–850.

[3] Peer, E., Brandimarte, L., Samat, S., & Acquisti, A. (2017). Beyond the Turk: Alternative platforms for crowdsourcing behavioral research. Journal of Experimental Social Psychology, 70, 153–163.

Originally published at https://blog.prolific.co on July 31, 2019.