Understanding the Demographics of Amazon Mechanical Turk

Researchers use data science methods to measure the population size on online crowdsourcing platforms

Data science research depends, of course, on datasets, but compiling and labeling high-quality datasets often requires painstaking work and human expertise. Amazon Mechanical Turk (Amazon MTurk), first launched for public use in 2005, was built to facilitate data collection and curation at scale. Registered participants (known as workers) are hired globally as independent contractors to complete micro-tasks such as tagging, data verification, transcription, and rating. Since its launch, Amazon MTurk’s user base, both task requesters and workers, has grown tremendously. Amazon bills its platform as “Human intelligence through an API,” providing “Access [to] a global, on-demand, 24x7 workforce.” But who are these workers, and how many of them are there? To find out, Panos Ipeirotis, Professor of Information, Operations and Management Sciences, and Data Science; Djellel Difallah, Moore-Sloan Data Science Fellow; and Elena Filatova of the City University of New York developed, in their recent publication, a generic methodology that can also be applied to other platforms “in understanding the dynamics and demographics of the underlying user population.”

Difallah et al. ran a survey for three years and collected over 85K responses from more than 40K unique workers. To estimate the size of the crowd population, the researchers used statistical techniques such as capture-recapture, which was originally developed in ecology to estimate wildlife populations. Their results and datasets, which are publicly available, are significant because they highlight the importance of accounting for population demographics and for each worker’s propensity to participate in a given type of work; failing to do so introduces bias through a lack of demographic variety. Additionally, because there have been conflicting claims about the size of the worker population, data scientists may be hesitant to use the platform without verification that it offers enough unique workers and adequate demographic representation. Their findings indicate that upwards of 100K workers were available during 2015–2018, and they note that “[the propensity of] participation of the workers in the platform follows a heavy-tailed distribution, and at any given time there are more than 2K active workers.” Further findings show that the workforce has a high rate of turnover, but that new arrivals balance out departures.
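As a rough illustration of the capture-recapture idea, one can treat two survey rounds as two “captures,” with workers who answer both rounds counted as “recaptures.” The Python sketch below implements the textbook Lincoln-Petersen estimator and Chapman’s bias-corrected variant. These are assumptions chosen for illustration, not necessarily the exact model Difallah et al. used, and the helper names and sample counts are hypothetical. Both estimators assume a closed population in which every worker is equally likely to be captured.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Textbook two-sample capture-recapture estimate: N ~ n1 * n2 / m."""
    if m == 0:
        raise ValueError("no overlap between samples; estimate is undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant; still defined when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def estimate_population(round1: set[str], round2: set[str]) -> float:
    """Hypothetical helper: treat two survey rounds of worker IDs as captures."""
    n1, n2 = len(round1), len(round2)
    m = len(round1 & round2)  # workers who responded in both rounds
    return lincoln_petersen(n1, n2, m)


# Hypothetical counts, not figures from the study.
print(lincoln_petersen(1000, 1200, 300))  # 4000.0
print(round(chapman(1000, 1200, 300), 2))
```

With 1,000 workers in the first round, 1,200 in the second, and 300 in both, the Lincoln-Petersen estimate is 1000 × 1200 / 300 = 4,000 workers. The equal-catchability assumption is exactly what a heavy-tailed participation propensity violates, which is why a propensity-aware model is needed on a platform like Amazon MTurk.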

Amazon Mechanical Turk itself has provided little analysis of its workers beyond stating, as of August 2017, that researchers can “Access more than 500,000 Workers from 190 countries.” Unlike previous demographic studies of Amazon MTurk, Difallah et al. account for each worker’s propensity to complete a given task and then use this model to infer hidden selection biases. Specific findings include that the gender split, while roughly even overall, varies by country of origin. Amazon MTurk workers are also typically younger than the general population, are roughly evenly divided in terms of marital status, and report below-average household incomes.
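To see why unequal propensities matter, consider a simplified model (an assumption for illustration, not the authors’ model) in which each worker appears in each of two survey rounds independently, with their own probability p. Plugging expected counts into the Lincoln-Petersen formula gives (Σp)² / Σp², which by the Cauchy-Schwarz inequality never exceeds the true population N and equals N only when all propensities are equal. The Python sketch below, using a hypothetical helper `expected_lp_estimate` and made-up propensities, shows how a skewed mix of very active and occasional workers makes a naive estimate undercount the population.

```python
import math


def expected_lp_estimate(propensities: list[float]) -> float:
    """Lincoln-Petersen estimate computed from *expected* counts.

    Simplifying assumption: worker i appears in each of two independent
    survey rounds with probability propensities[i]. Then
    E[n1] = E[n2] = sum(p) and E[m] = sum(p**2).
    """
    s1 = math.fsum(propensities)
    s2 = math.fsum(p * p for p in propensities)
    return s1 * s1 / s2


N = 1000
# Every worker equally likely to respond.
equal = [0.1] * N
# Hypothetical skew: half very active workers, half occasional ones.
skewed = [0.5] * (N // 2) + [0.02] * (N // 2)

print(round(expected_lp_estimate(equal)))   # 1000
print(round(expected_lp_estimate(skewed)))  # 540
```

In the skewed case the naive estimate is about 540 workers even though 1,000 exist, because frequent respondents are “recaptured” far more often than occasional ones. Modeling each worker’s propensity explicitly is what allows this hidden selection bias to be corrected.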

By Sabrina de Silva