New Way to Measure Crowdsourcing Bias in Machine Learning
An overview of how to use counterfactual fairness to quantify the social bias of crowd workers
Crowdsourcing is widely used in machine learning as an efficient way to annotate datasets. Platforms like Amazon Mechanical Turk allow researchers to collect data or outsource the task of labelling training data to individuals all over the world.
However, crowdsourced datasets often contain significant social biases, such as gender or racial preferences and prejudices. Algorithms trained on these datasets then produce biased decisions as well.
In this short paper, researchers from Stony Brook University and IBM Research propose a novel method to quantify bias in crowd workers:
One Line Summary
Integrating counterfactuals into the crowdsourcing process is a new way to measure crowd workers’ bias and make the machine learning pipeline fairer (at the preprocessing stage).
Motivation and Background
Crowdsourcing is used a ton in ML to label training datasets. However, crowd workers’ biases can become embedded in the datasets they label, and the algorithms trained on those datasets inherit that bias.
Previous research on measuring social biases in crowd workers has centered around self-reported surveys or Implicit Association Tests (IATs), but both methods are distinct from the labelling task itself.
Furthermore, both methods may make crowd workers aware that they’re being judged and hence trigger social desirability bias — they may answer queries in the way they believe is more socially acceptable rather than choosing what they genuinely believe.
A better method, this paper proposes, is to use the idea of counterfactual fairness.
What is Counterfactual Fairness?
The word “counterfactual” refers to statements or situations that did not happen — “If I had arrived there on time…”, “If I had bought that instead…”.
In explainable machine learning, counterfactuals represent the same idea. For an individual, their counterfactual is the same individual in a world where their sensitive attribute has been changed.
Then, a machine learning model is fair under counterfactual fairness if it produces the same prediction for both an individual and its counterfactual.
For example, suppose we have a model predicting whether an individual will receive a bank loan or not. Let us choose the sensitive attribute (usually a demographic group) here to be whether a person has curly hair or not, and keep other features the same (or similar). Then, a counterfactually fair model would produce the same decision for a person with curly hair and for a person with straight hair — that is, they both receive the loan, or neither of them do.
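The loan example above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical model and feature names (`income`, `hair`), not the paper's implementation:

```python
def is_counterfactually_fair(model, individual, sensitive_attr, alternative):
    """Check whether a model gives the same prediction for an individual
    and their counterfactual (same features, sensitive attribute flipped)."""
    counterfactual = dict(individual)
    counterfactual[sensitive_attr] = alternative
    return model(individual) == model(counterfactual)

# Hypothetical loan model that (unfairly) conditions on hair type:
biased_model = lambda person: person["income"] > 50000 and person["hair"] == "curly"
applicant = {"income": 60000, "hair": "curly"}

is_counterfactually_fair(biased_model, applicant, "hair", "straight")  # → False
```

A model that ignores the sensitive attribute entirely (e.g. one that only looks at `income`) would return `True` here, since the individual and their counterfactual receive the same decision.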
In this case, a crowd worker is considered fair if they provide the same label for any query and its counterfactual.
Specifically, a counterfactual query could be generated by replacing the demographic with an alternative option. For example, to measure binary gender bias, we could simply flip the gender in the same statement so that the crowd worker must evaluate both “Women are such hypocrites” and “Men are such hypocrites”.
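For a statement-labelling task, such a counterfactual query could be generated with a simple term swap. Here is a deliberately simplistic word-level sketch (the paper does not prescribe an implementation, and real text would need more careful handling):

```python
def gender_counterfactual(statement):
    """Flip binary-gendered terms in a statement to produce its
    counterfactual query. Naive word-level swap, for illustration only."""
    swaps = {"women": "men", "men": "women", "she": "he", "he": "she"}
    words = []
    for w in statement.split():
        core = w.strip(".,!?")
        if core.lower() in swaps:
            swapped = swaps[core.lower()]
            if core[0].isupper():
                swapped = swapped.capitalize()
            words.append(w.replace(core, swapped))
        else:
            words.append(w)
    return " ".join(words)

gender_counterfactual("Women are such hypocrites")  # → "Men are such hypocrites"
```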
This paper presents a simplified counterfactual view, where one only changes the sensitive attribute and keeps all other features constant (or perturbed with low levels of noise).
In the paper’s recidivism prediction example, the sensitive attribute is race. In the query (Q), the race is “Black”; in the counterfactual query (CQ), the race is “White”. Using each query’s data points, a crowd worker would predict the likelihood of the person committing another crime within 2 years.
Once that has been done, we calculate the mean absolute difference between the labels/outputs a worker provided across all pairs of queries and counterfactual queries. A higher score indicates more inherent bias.
Then, a threshold can be set to filter out biased crowd workers.
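The scoring and filtering steps above can be sketched as follows. The variable names and the threshold value are my own; the paper only specifies the mean-absolute-difference score and a threshold:

```python
def bias_score(labels, cf_labels):
    """Mean absolute difference between a worker's labels on queries
    and on their matched counterfactual queries."""
    assert len(labels) == len(cf_labels)
    return sum(abs(a - b) for a, b in zip(labels, cf_labels)) / len(labels)

def filter_workers(worker_labels, threshold=0.1):
    """Keep workers whose counterfactual bias score falls below the threshold."""
    return [worker for worker, (q, cq) in worker_labels.items()
            if bias_score(q, cq) < threshold]

# Worker A labels query/counterfactual pairs almost identically; worker B does not.
workers = {"A": ([0.8, 0.3], [0.8, 0.35]),
           "B": ([0.9, 0.2], [0.4, 0.7])}

filter_workers(workers)  # → ["A"]
```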
In theory, this approach works better than previous methods for measuring crowd workers’ biases, because counterfactual queries appear in the same format as any regular query, so crowd workers won’t realize they are being evaluated.
My Final Thoughts
This paper presents a new method for measuring biases of crowd workers based on counterfactual fairness that I find really promising. I’m excited to see their empirical results in the future and how they compare to results from other fairness metrics.
For more information, check out the original paper on arXiv here.
Bhavya Ghai, Q. Vera Liao, Yunfeng Zhang, and Klaus Mueller. “Measuring Social Biases of Crowd Workers using Counterfactual Queries”, CHI 2020 Workshop on Fair and Responsible AI.
Thank you for reading! Subscribe to read more byte-sized pieces about research, resources, and issues related to fair and ethical AI.
Catherine Yeo is a CS undergraduate at Harvard interested in AI/ML/NLP, fairness and ethics, and everything related. Feel free to suggest ideas or say hi to her on Twitter.