Crowdsourcing Impacts

Julia Barnett
Published in Technically Social · Jul 29, 2022

How can we utilize the crowd to learn more about the impacts of algorithms used in society?

Algorithms are starting to assist in decision making in all realms of society. Whether it's seemingly small effects, such as curating your Spotify playlists or Netflix feeds, or much larger implications in federal government decision making or aiding your doctors, these algorithms are becoming more pervasive in our lives every day. This is not inherently good or bad, but algorithms have ethical weight, and it is important to understand how they can impact us when they are used in decision making.

One way to brainstorm different algorithmic impacts could be to leverage crowdsourcing. Tapping into a diverse crowd could help advance anticipatory ethics — the idea of guiding technology development in more humane ways — by helping developers think about impacts they might not have otherwise considered.

How could this work?

To explore this idea of crowdsourcing impacts, we analyzed over 2,500 societal impact reviews from Amazon Mechanical Turk crowdworkers. Each worker was shown a description of an algorithmic decision making (ADM) tool and asked to “explain to what degree the algorithm described has the potential to create negative impacts in society,” in addition to rating their answer on a 1–5 scale (5 being extremely positive and 1 being extremely negative impact). We analyzed this set of impact evaluations by the crowd and came away with three central areas of insight:

  1. The topics themselves — what did the crowdworkers uncover?
  2. Cognitive diversity — how does asking more crowdworkers affect the number of topics discovered per document?
  3. Impacts in relation to each other — which topics are co-occurring?
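Before diving in, it may help to picture the shape of the underlying data. Below is a minimal sketch of what one review record might look like, along with a simple aggregation; the field names and helper are hypothetical illustrations, not our actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ImpactReview:
    """One crowdworker's societal impact review of an ADM tool (hypothetical schema)."""
    tool_id: str      # which ADM tool description the worker saw
    worker_id: str    # anonymized crowdworker identifier
    explanation: str  # free-text answer to the impact question
    rating: int       # 1 (extremely negative) to 5 (extremely positive)

def mean_rating_per_tool(reviews):
    """Average the 1-5 ratings for each ADM tool across its reviews."""
    totals = defaultdict(lambda: [0, 0])  # tool_id -> [rating sum, review count]
    for r in reviews:
        totals[r.tool_id][0] += r.rating
        totals[r.tool_id][1] += 1
    return {tool: s / n for tool, (s, n) in totals.items()}
```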

Topics Discovered

There were four main themes within the topics described in the reviews:

  • Valence — the general evaluation of how good or bad the ADM was for society: good, bad, or neutral, with a strong skew toward neutral and positive evaluations.
  • Societal Domains — various areas and aspects of society that the proposed ADM could affect. In order of prevalence in our dataset, they were: Environment, Healthcare (non-COVID), COVID, Infrastructure, Fraud Prevention, Public Safety, Education, Mental Health, and Children.
  • Impact Types — the different ways or dimensions through which impacts could manifest. In order of prevalence they were: Efficiency, Financial Costs, Decision Making, Sustainability, Risk Assessment, and Large Scope or Magnitude.
  • Algorithm Concerns — the more abstract algorithmic concerns of bias, harmful results, and privacy.

Almost all documents had topics discussed from each of these four larger categories, provided enough crowdworkers were asked.
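For use in the sketches that follow, this taxonomy can be written down as a simple mapping. The Python spellings of the labels are mine, but the categories and topics are exactly those listed above.

```python
# Topic taxonomy uncovered in the crowdsourced reviews (labels from the lists above).
TOPIC_CATEGORIES = {
    "valence": ["positive", "negative", "neutral"],
    "societal_domains": [
        "environment", "healthcare (non-COVID)", "COVID", "infrastructure",
        "fraud prevention", "public safety", "education", "mental health", "children",
    ],
    "impact_types": [
        "efficiency", "financial costs", "decision making",
        "sustainability", "risk assessment", "large scope or magnitude",
    ],
    "algorithm_concerns": ["bias", "harmful results", "privacy"],
}
```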

Cognitive Diversity

The more crowdworkers we asked about an algorithmic decision making tool, the more topics we discovered (we asked five per tool). Grouping by topic type (valence, societal domains, impact types, and algorithm concerns), you can see that as the number of reviews increased, more topics were uncovered.

The relationship between the number of reviewers asked and the number of topics discovered was not linear. Valence declined quickest, plateauing at 1.5–2 valence topics per document as reviewers were added; societal domains and impact types declined at a similar rate, both ending at 2.6 topics per document after 5 reviewers; algorithm concerns, however, increased steadily with each additional reviewer. This topic type may benefit from employing more crowdworkers.
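One way to compute such an accumulation curve, as a rough sketch: treat each reviewer's annotated review as a set of topics, then average the number of unique topics uncovered over random subsets of k reviewers. The function below assumes topic annotations already exist; it is illustrative, not our exact analysis code.

```python
import random
from statistics import mean

def topic_accumulation(doc_reviews, max_reviewers=5, n_samples=200):
    """Average number of unique topics uncovered per document as a
    function of the number of reviewers consulted.

    doc_reviews: dict mapping document id -> list of topic sets,
                 one set per reviewer.
    """
    curve = []
    for k in range(1, max_reviewers + 1):
        per_doc = []
        for topic_sets in doc_reviews.values():
            if len(topic_sets) < k:
                continue
            # Average over random k-subsets so reviewer order doesn't bias the count.
            sizes = [len(set().union(*random.sample(topic_sets, k)))
                     for _ in range(n_samples)]
            per_doc.append(mean(sizes))
        curve.append(mean(per_doc))
    return curve  # curve[k-1] == avg unique topics per document with k reviewers
```

Running this separately on the topics from each category of TOPIC_CATEGORIES would yield the per-category curves described above.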

Topics in Relation to Each Other

Finally, we looked at the topics in relation to each other — how are these topics co-occurring? We define a co-occurrence as each time two topics were uncovered by reviewers in the same document. For example, an algorithm designed to determine whether certain factories were at elevated risk for inspection scored highly on risk assessment and public safety, which were presumably top of mind for the algorithm designer. However, this ADM also scored highly on large scope, mental health, and healthcare (non-COVID), because some reviewers brought up that if it causes factories to shut down unnecessarily, workers could be out of jobs, suffer poor mental health, and risk losing insurance.

This heat map of topic co-occurrences shows a standardized scale of the co-occurrences, with the median mapped to zero and the bins above and below mapped to evenly spaced quantiles. The topic types with the greatest rate of co-occurrence are impact types (e.g., decision making, efficiency) and algorithm concerns (e.g., privacy and harmful effects).
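Concretely, the underlying computation can be sketched as follows. This is one plausible implementation of the counting and quantile binning; our actual analysis may differ in details.

```python
from collections import Counter
from itertools import combinations
import numpy as np

def cooccurrence_counts(doc_topics):
    """Count how often each pair of topics was uncovered in the same document.

    doc_topics: dict mapping document id -> set of all topics its reviewers raised.
    """
    counts = Counter()
    for topics in doc_topics.values():
        for pair in combinations(sorted(topics), 2):
            counts[pair] += 1
    return counts

def quantile_standardize(values):
    """Map raw counts onto a standardized scale: the median lands near zero,
    with values binned into evenly spaced quantiles above and below it."""
    vals = np.asarray(values, dtype=float)
    if len(vals) < 2:
        return np.zeros(len(vals))
    ranks = vals.argsort().argsort()         # rank each value from 0 to n-1
    return 2 * ranks / (len(vals) - 1) - 1   # rescale to [-1, 1], median near 0
```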

The second heat map shows the difference in these relative co-occurrence scores between documents with at least one positive or negative valence topic, which helps show the relative interaction of topics with respect to the valence dimension. For example, individuals perceived most topics’ interactions with education more negatively than they did interactions with any of the other impact areas, whereas there was a stronger positive valence when topics occurred with COVID or healthcare.

Some topics, like fraud prevention and financial costs, showed notable dispersion across positive and negative valence depending on what other topic was discussed. For instance, when fraud prevention was discussed in tandem with education or large scope, the document was more likely to be evaluated with a negative valence. However, when it was discussed in light of COVID or healthcare, it tended to appear in documents evaluated with a positive valence. This indicates that the intersection of topics has an important effect on the societal evaluation of algorithms.
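That valence split can be sketched by building on the helpers above; the "positive" and "negative" valence labels here are the hypothetical ones from the taxonomy mapping, not necessarily our exact coding scheme.

```python
def valence_split_difference(doc_topics, doc_valences):
    """Difference in standardized co-occurrence between documents with at
    least one positive valence topic and those with at least one negative.

    doc_valences: dict mapping document id -> set of valence topics raised.
    """
    pos = {d: t for d, t in doc_topics.items() if "positive" in doc_valences[d]}
    neg = {d: t for d, t in doc_topics.items() if "negative" in doc_valences[d]}
    pos_counts, neg_counts = cooccurrence_counts(pos), cooccurrence_counts(neg)
    pairs = sorted(set(pos_counts) | set(neg_counts))
    pos_std = quantile_standardize([pos_counts[p] for p in pairs])
    neg_std = quantile_standardize([neg_counts[p] for p in pairs])
    # Positive entries: the pair co-occurs relatively more in positively
    # evaluated documents; negative entries: relatively more in negative ones.
    return dict(zip(pairs, pos_std - neg_std))
```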

In sum…

Crowdsourcing can be an effective means to leverage cognitive diversity to anticipate the impacts of different algorithmic decision-making tools. We uncovered four main types of topics: valence, societal domains, impact types, and algorithm concerns. The more reviewers you ask, the more topics you will uncover, and there are useful trends in how those topics co-occur.

For future work, it would be useful to learn how reframing the question (to what degree does this ADM have a negative societal impact?) would affect the answers crowdworkers give, as well as changing the crowdworkers themselves: beyond laypeople, maybe mix in experts from tangential fields, or policy makers, or journalists. And even though we didn’t test it here, we believe this process would be most useful at an early stage of algorithm development, well before deployment, when there is still time to make substantial changes to the algorithm.

Please check out our full paper for more technical details:

J. Barnett and N. Diakopoulos. Crowdsourcing Impacts: Exploring the Utility of Crowds for Anticipating Societal Impacts of Algorithmic Decision Making. Proceedings of the Conference on AI, Ethics, and Society (AIES). August, 2022. [PDF]
