This post describes the guidelines we follow for ethical pricing and management of crowdsourced work at the Allen Institute for Artificial Intelligence (AI2). We’ve made these guidelines public as a resource for the AI community.
We welcome your feedback on how to improve the guidelines at email@example.com.
Determining rates for crowdsourcing tasks
US crowdworkers ($8.50+/hour)
It may be helpful to target US-based crowdworkers for your data annotations, for example when running English natural language generation tasks that require fluency. For US workers, an ethical minimum hourly wage for crowdsourcing work is $8.50/hour, which is the population-weighted average of minimum wages across the United States. These range from $7.25/hour (the federal minimum wage) to $12.00/hour (the Washington state minimum wage).
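A population-weighted average like the one behind the $8.50 figure can be computed as follows. This is a minimal sketch: the function name is hypothetical, and the two regions and their populations in the example are made-up placeholders, not actual state data, so the result is illustrative only.

```python
def population_weighted_average(wages_and_populations):
    """Average hourly minimum wage, weighting each region by its population.

    wages_and_populations: iterable of (hourly_wage, population) pairs.
    """
    pairs = list(wages_and_populations)
    total_population = sum(pop for _, pop in pairs)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    return sum(wage * pop for wage, pop in pairs) / total_population


# Hypothetical placeholder data, NOT real state figures:
# region A pays the $7.25 federal minimum, region B pays $12.00.
example = [(7.25, 3_000_000), (12.00, 1_000_000)]
print(population_weighted_average(example))  # 8.4375
```

Because region A has three times the population, the result sits much closer to $7.25 than a simple average of the two wages ($9.625) would.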
Non-US-based crowdworkers (case specific)
Some crowdsourcing tasks can easily be performed by a global crowd. To target a global audience for your tasks, exclude workers in high-wage countries. Because median daily per capita income at purchasing power parity is significantly lower in developing countries (please see the table in the Appendix for a snapshot comparison), the recommended minimum ethical hourly wage for these workers is $3/hour to $4/hour.
For example, in a June 2018 crowdsourcing task run by AI2 and targeted at global crowdworkers from developing countries (80% of the work was completed by people in Venezuela), we set a median hourly compensation of $4.58/hour and received a 4/5 rating from crowdworkers on pay, indicating a high level of satisfaction. Note: the median daily wage per capita is ~$8 in Venezuela, compared to ~$60 in the United States.
How to set prices
Once you’ve determined the appropriate hourly rate for the worker demographic you’re targeting, follow these steps to determine the per-task pay rate:
- Determine the average time per task: complete your own tasks to see how long they take, and ask a volunteer or colleague to try them to see how long they take a naive participant. You can also run a small pilot of 50–100 tasks and use whatever analytics your crowdsourcing platform provides to measure the median completion time and the resulting hourly pay rate.
- Calculate how many tasks can be completed per hour, divide your hourly rate by that number to get the per-task pay rate, and round up to account for the crowdworker learning curve. You can always adjust the rate in later rounds.
- State upfront in your task description how long each task takes on average, so crowdworkers know what to expect.
- Monitor crowdworker satisfaction and adjust per-task compensation if needed.
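The arithmetic in these steps can be sketched in Python. This is a minimal illustration under stated assumptions, not AI2’s tooling: the function names, the integer-cents units, and the `buffer_pct` learning-curve margin are hypothetical, and the pilot timings in the example are invented. It prices each task at the median pilot time and always rounds up to the next whole cent.

```python
import math
import statistics
from fractions import Fraction


def per_task_pay_cents(hourly_rate_cents: int, pilot_seconds: list[float],
                       buffer_pct: int = 0) -> int:
    """Per-task pay (in cents) that meets the hourly target at the median pilot time.

    buffer_pct is a hypothetical extra margin for the learning curve; the
    result is always rounded *up* to the next whole cent.
    """
    median_s = Fraction(statistics.median(pilot_seconds))
    # hourly rate / tasks per hour == hourly rate * seconds per task / 3600
    raw_cents = Fraction(hourly_rate_cents) * median_s / 3600
    raw_cents *= Fraction(100 + buffer_pct, 100)
    return math.ceil(raw_cents)


def effective_hourly_cents(pay_cents: int, median_seconds: float) -> float:
    """Hourly earnings for a worker who takes the median time on every task."""
    return pay_cents * 3600 / median_seconds


pilot_seconds = [95, 110, 120, 130, 150]        # invented pilot timings
pay = per_task_pay_cents(850, pilot_seconds)    # $8.50/hour target
print(pay)                                      # 29
print(effective_hourly_cents(pay, statistics.median(pilot_seconds)))  # 870.0
print(per_task_pay_cents(850, pilot_seconds, buffer_pct=10))          # 32
```

Working in integer cents with `Fraction` avoids floating-point artifacts that could otherwise round an exact price up by an extra cent. At the median pace, $0.29 per task works out to $8.70/hour; workers slower than the median earn less, which is one reason to round up and keep monitoring.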
It is important not to leak any personal information about crowdworkers. If at all possible, do not collect any personal information (this includes worker IDs, names, detailed demographic information, etc.). Make sure not to include any information about crowdworkers in the data generated by your tasks.
Note: On Amazon Mechanical Turk (AMT), worker IDs are often linked to an individual’s public Amazon profile and should be considered personal/sensitive information. If you need to retain IDs for workers, replace their worker IDs with unique IDs that you generate yourself.
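A minimal sketch of this replacement, assuming your results are rows (dicts) with a `WorkerId` field as in AMT result data; the class and method names are hypothetical. Random IDs from Python’s `secrets` module are used rather than a hash of the worker ID, because anyone who knows a worker ID could recompute an unkeyed hash and re-identify the worker.

```python
import secrets


class WorkerPseudonymizer:
    """Replaces platform worker IDs with random IDs generated locally.

    The mapping itself still links pseudonyms to real worker IDs, so it
    must be stored privately and never released with the data.
    """

    def __init__(self) -> None:
        self._mapping: dict[str, str] = {}
        self._used: set[str] = set()

    def pseudonym(self, worker_id: str) -> str:
        """Return a stable random ID for worker_id, creating one on first use."""
        if worker_id not in self._mapping:
            new_id = "worker-" + secrets.token_hex(8)
            while new_id in self._used:  # collisions are very unlikely, but check anyway
                new_id = "worker-" + secrets.token_hex(8)
            self._used.add(new_id)
            self._mapping[worker_id] = new_id
        return self._mapping[worker_id]

    def scrub(self, record: dict, id_field: str = "WorkerId") -> dict:
        """Return a copy of record with the worker ID replaced by its pseudonym."""
        cleaned = dict(record)
        cleaned[id_field] = self.pseudonym(record[id_field])
        return cleaned


p = WorkerPseudonymizer()
row = {"WorkerId": "A1EXAMPLEID", "label": "positive"}  # hypothetical worker ID
print(p.scrub(row))  # WorkerId replaced by a random "worker-…" ID
```

The same worker always maps to the same pseudonym within one mapping, so per-worker analyses (such as agreement rates) still work on the released data.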
It’s important to be upfront and transparent with crowdworkers about what they can expect from your tasks. Some examples include:
- Conditions for being rejected or blocked from the task.
- How long a task might take.
- Things that can vary from task to task, such as wait times, whether workers need to be paired together, or whether the task length can vary.
Additionally, workers will often email you about a task. If your task runs for only a short period (an hour or so), it’s useful to respond to workers’ emails immediately, since they’ll often ask for clarification while they are doing the task. If you anticipate issues while running a task (for example, if it’s the first run with real crowdworkers), it’s also a good idea to have someone familiar with the task available and watching the appropriate email inbox.
On Amazon Mechanical Turk, rejecting a worker’s HITs is one of the most harmful actions a requester can take, and mass-rejecting HITs is a sure way to earn a bad reputation among workers. Beyond the lost earnings on the rejected work, a rejection lowers the worker’s HIT approval rate, which can limit their ability to claim and perform future HITs. Reserve rejection for workers who are clearly abusing your task in some way, not for work that is merely poor.
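One way to build this into a review script is to approve by default, reject only submissions that a person has explicitly flagged as clear abuse, and halt instead of mass-rejecting. This is a hypothetical sketch: `Submission`, `clear_abuse`, and the 5% `max_reject_fraction` threshold are illustrative choices, not AMT features. The resulting ID lists could then be passed to boto3’s MTurk client (`approve_assignment` / `reject_assignment`), which this sketch does not call.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    assignment_id: str
    clear_abuse: bool  # hypothetical flag, set only after a person reviews the work


def review_batch(subs: list[Submission], max_reject_fraction: float = 0.05):
    """Split a batch into (approve_ids, reject_ids); approval is the default.

    Anything not flagged as clear abuse, including low-quality work, is
    approved. If more than max_reject_fraction of the batch is flagged,
    raise instead of rejecting: a spike like that may point to a problem
    with the task or its instructions rather than with the workers.
    """
    approve_ids = [s.assignment_id for s in subs if not s.clear_abuse]
    reject_ids = [s.assignment_id for s in subs if s.clear_abuse]
    if subs and len(reject_ids) / len(subs) > max_reject_fraction:
        raise RuntimeError(
            f"{len(reject_ids)}/{len(subs)} submissions flagged; "
            "review the task before rejecting anything"
        )
    return approve_ids, reject_ids


batch = [Submission(f"a{i}", clear_abuse=(i == 0)) for i in range(20)]
approved, rejected = review_batch(batch)
print(len(approved), rejected)  # 19 ['a0']
```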
Design best practices
A good annotation interface makes your task easy to understand and perform, ensuring you’re getting the most out of your time and the crowdworkers’ time. Remember these general guidelines:
Intuitive and clear design
Try to stick to standard UI features and conventions that you’re familiar with, since workers are likely to be familiar with them as well. Instructions are always good to have, but most people should be able to tell what each part of the interface does without referring to them. To test this, have a colleague unfamiliar with your task try out the interface without instructions and observe where they struggle to navigate it.
The interface shouldn’t hang or lag behind the user enough to disrupt the workflow. Response times of ~0.1s feel instantaneous, ~1.0s is enough to break a worker’s train of thought, and after ~5–10s workers will likely start doing other things.
Annotation tools don’t need to meet a particularly high bar of engineering rigor, but at the very least they should be maintainable by you or your team and flexible enough to allow changes to design or functionality without a rewrite.
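For the server side of an annotation tool, one hypothetical way to keep an eye on these thresholds is to time each request handler and log anything slower than ~0.1s. This sketch only measures time spent inside the Python function; network and browser rendering time, which workers also feel, are not captured. The decorator name, threshold constants, and example handler are assumptions.

```python
import functools
import logging
import time

INSTANT_S = 0.1     # ~0.1s feels instantaneous
FLOW_BREAK_S = 1.0  # ~1s is enough to break a worker's train of thought


def track_latency(func):
    """Record how long each call takes and log calls that exceed the thresholds."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            wrapper.timings.append(elapsed)
            if elapsed >= FLOW_BREAK_S:
                logging.warning("%s took %.2fs (may break the worker's flow)",
                                func.__name__, elapsed)
            elif elapsed > INSTANT_S:
                logging.info("%s took %.2fs (noticeable delay)", func.__name__, elapsed)

    wrapper.timings = []  # per-call durations in seconds
    return wrapper


@track_latency
def save_annotation(item_id, label):  # hypothetical handler
    return {"item": item_id, "label": label, "saved": True}


save_annotation(7, "positive")
print(save_annotation.timings)
```

Keeping the raw timings makes it easy to check percentiles after a pilot run, since a handler that is usually fast but occasionally slow can still break a worker’s flow.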