If We Want Platforms to Think Beyond Engagement, We Have to Know What We Want Instead

[Image: a toy claw machine hangs above a pile of emojis]

By Claire Leibowicz (PAI), Connie Moon Sehat (Hacks/Hackers), Adriana Stephan (PAI), Jonathan Stray (UC Berkeley CHAI)

Which values should the algorithms driving online content recommendations promote?

What we view on social media, news aggregators, and online marketplaces is largely dictated by algorithms known as recommender systems. These systems are driven by user behaviors such as clicking, liking, or sharing content — collectively known as user engagement. Relying on user engagement metrics can result in the proliferation of misinformation, abusive or extremist content, and addictive use that can negatively affect both individuals and societies. Other values beyond individual user engagement could surely be taken into account to improve online spaces. Alternatives include principles such as safety, well-being, agency, justice, and even emotions such as awe and inspiration.

But how to decide among them? And who gets to decide which values are important?

As part of a broader effort to publish a multidisciplinary paper bridging human values and recommender systems, the Partnership on AI (PAI) convened members of its AI and Media Integrity Program (drawn from both Partner organizations and beyond) working in industry, media/journalism, civil society, and academia to consider this topic. PAI asked meeting participants to complete a survey about what values they believed were most important for recommender system-based platforms to attend to. After receiving 29 responses, PAI facilitators then broke participants into groups to discuss the results.

This exercise was not intended to authoritatively determine which human values are appropriate to promote online; insights from a 29-person survey and conversation are by no means statistically significant. However, participants in the AI and Media Integrity Program bring broad experience in handling thorny ethical questions of AI from around the world and diverse professional perspectives including those of computer scientists, journalists, and human rights defenders. With input from these stakeholders, the opportunities for addressing the challenges of recommender systems through participatory input became a little clearer.

Including more stakeholders in the development of recommender systems (beyond just those creating them) will be a critical step towards developing greater understanding of what values should be promoted online. Even then, the best approach to incorporating these perspectives requires further study, as revealed by this survey exercise.

There has been a lot of recent research regarding human values and ethics in AI. We began developing the values survey through assessment of different values mentioned previously in recommender system design literature. The BBC, IEEE, and Berkman Klein Center have all published useful resources on values and AI that informed our survey, and we also relied on foundational human rights documents. Further, efforts like New Public have helped promote a broader evaluation of how to create better online spaces.

Ultimately, we selected 31 values for inclusion in the survey, including everything from agency to duty to self-expression to labor.

Creating the list was a major challenge: the values needed enough specificity and nuance to disentangle overlapping and vague concepts. How, for example, should one differentiate precisely among values such as connection, empathy, belonging, and tolerance? In light of this, we aimed for an intermediate level of specificity — not simply “do good,” but not as specific as the precise metrics used by product teams, either. We also leaned heavily on previous literature.

Six types of recommender systems were selected for respondents to consider: social media (e.g., TikTok, Facebook), streaming media (e.g., Spotify, Netflix), news aggregators (e.g., Google News), online shopping (e.g., Amazon’s marketplace), video sharing (e.g., YouTube), and targeted advertising. Respondents were first asked to narrow the scope of their values evaluation by choosing the one type of recommender system they were “most concerned about” from the six options. The survey then asked, “How important is it for the platform you are most concerned about to attend to each of the values?” which respondents answered using a five-point numeric scale. The survey hinted that values might be in tension with one another: “We know that different values might conflict. That’s OK!” read the instructions.
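
To make the setup concrete, here is a minimal, hypothetical sketch (in Python) of how one survey response might be represented: the respondent’s sector, the one platform type they chose, and a 1-to-5 importance rating for each value. The names and numbers are ours for illustration, not PAI’s actual instrument or data.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: field names, platform labels, and example ratings
# are illustrative, not PAI's survey instrument or data.
PLATFORM_TYPES = [
    "social media", "streaming media", "news aggregators",
    "online shopping", "video sharing", "targeted advertising",
]

@dataclass
class SurveyResponse:
    sector: str                 # e.g., "industry", "civil society", "academia"
    platform_type: str          # the one system the respondent is "most concerned about"
    ratings: dict[str, int] = field(default_factory=dict)  # value name -> importance, 1..5

example = SurveyResponse(
    sector="civil society",
    platform_type="social media",
    ratings={"accuracy": 5, "privacy": 5, "safety & security": 4, "awe": 2},
)
assert example.platform_type in PLATFORM_TYPES
assert all(1 <= score <= 5 for score in example.ratings.values())
```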

Given that a single list of 31 values seemed longer than a survey respondent could easily digest, we presented the values in four groups. These groups loosely reflect ongoing cultural, psychological, and sociological research on values and value tensions (the ways individual and social values can conflict): the relationship of individuals to society, interdependent or community values, personal development and interests, and moral and ethical social order.

This survey has a number of limitations that were clear from the beginning, beyond the most salient one: its sample size. Despite the inclusion of interdependent values that are significant in non-Western cultural research, the survey was inevitably shaped by a mostly U.S. perspective. While PAI’s membership includes organizations from non-Western regions, the bulk of survey respondents came from the U.S. Still, the exercise provided broader and more systematic insight into value priorities than our small author group could offer. In addition, PAI gained valuable perspectives from its own membership about desired research and policy directions.

After participants took the survey, we broke them into three groups to discuss their responses and to offer feedback. Two thematic challenges for pinpointing values became clear: how to negotiate value tensions and how to provide adequate context for this negotiation.

The challenge of negotiating values was, for discussants, a question of ethical trade-offs.

How, for example, does one approach the trade-off between giving people the agency to personalize what they see on social media versus promoting diversity? What is the trade-off for someone else’s diminished safety versus liberty when an individual decides what is for their own good? One solution offered was for platforms to operate on the “do no harm” principle by prioritizing major problems and promoting other values only when harm was completely mitigated. Value trade-offs also connected to questions regarding system design. Many participants agreed that there should be transparency around who decides which values are important and what guiding principles platforms use to negotiate them.

We also learned a little more about what kinds of context may be needed to prioritize values.

Participants recognized the same geographical limitations of the survey that we did: Both the list of values and the opinions of the survey respondents cannot be generalized globally. They also asked questions about the scope of the survey’s own aspirations: Are we ranking the most important problems for systems today, or describing the values of a hypothetical system as we’d like it to be? For that matter, are we ranking based on what the ideal version of freedom of expression would look like around the world? Some, noting the ways that the survey alternated between focusing on platform power and user agency, asked if the questions took for granted that platforms actually have the power to affect things that are potentially only within the user’s power to change, such as attending to one’s physical health.

Twenty-nine responses do not offer statistically significant insights. However, taken as a qualitative supplement to our own considerations, and alongside facilitated conversation about the survey, even these few responses provided interesting avenues to explore for future work.

Value prioritization may be different depending upon the type of recommendation product.

Figure 1. Venn diagram highlighting values rated as very important for two types of recommender systems.

Consistent with viewpoints expressed in group discussion, survey respondents rated values differently depending on the platform product they selected. Only two of the six product types received enough responses for meaningful assessment: news aggregators and social media. On such a small set of data, median scores provided a way to roughly generalize priorities. Several values were rated as very important for both news aggregators and social media recommenders. For both types of recommender systems, accessibility & inclusiveness, accuracy, accountability, privacy, and tolerance & constructive discourse were ranked with the highest median score of 5.0 (Figure 1). For news aggregators, justice & fairness and equality & equity were also ranked with the highest median score. For social media, the values of safety & security, mental health, control, agency & autonomy, and transparency & explainability were also ranked as such. Happiness & well-being and connection were rated as least worthy of attention for news aggregators, while inspiration and awe were of lesser importance for social media platforms.
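
To make the kind of analysis behind Figure 1 concrete, here is a minimal sketch that computes per-value median ratings for each platform type and keeps the values whose median meets a cutoff (the post reports a top median of 5.0). It assumes response records shaped like the earlier sketch and is not the authors’ actual analysis code.

```python
from collections import defaultdict
from statistics import median

# Illustrative analysis sketch, assuming response records shaped like the
# SurveyResponse example above; not the authors' actual analysis code.
def top_values_by_platform(responses, cutoff=5.0):
    """For each platform type, return the values whose median rating meets the cutoff."""
    scores = defaultdict(lambda: defaultdict(list))  # platform -> value -> [ratings]
    for response in responses:
        for value_name, rating in response.ratings.items():
            scores[response.platform_type][value_name].append(rating)

    top = {}
    for platform, per_value in scores.items():
        medians = {name: median(ratings) for name, ratings in per_value.items()}
        top[platform] = sorted(name for name, m in medians.items() if m >= cutoff)
    return top
```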

Figure 2. Visualization depicting trends in median scores for the 31 values, by sector.

Value prioritization may differ according to where you work.

Participants from different sectors evaluated values’ priority differently (Figure 2). Only the value of accountability appeared across all sectors among the highest prioritizations, perhaps understandable given the self-selecting ethical AI interests of PAI participants. People from industry tended to think that every value was worthy of at least some attention. Civil society respondents rated a larger portion of the 31 values as most important, which makes negotiating trade-offs more difficult. Academic responses lay somewhere in between.

Simple value prioritizations may result in values competing with one another.

The responses revealed values that are potentially in tension with one another. For example, civil society members rated safety and agency highly. When it comes to users who may have an unhealthy relationship to dieting, how might one promote the value of safety (“Users should feel safe using the platform. There should be a low prevalence of harmful user outcomes.”) alongside agency (“The platform should help users achieve their goals.”)? Negotiating tensions like these in a transparent way will require tools outside of surveys, such as multistakeholder efforts to produce technical definitions, taxonomies and datasets, and/or standards.

Surprises within value prioritization may suggest avenues to better understand tensions and trade-offs between individual and community values.

Freedom of expression, even for this U.S.-centric bunch, was not rated as among the highest priorities. (Interestingly, while freedom of expression did not number among the highest priorities, accountability did for participants from all sectors.) This ranking may reflect participant sensitivity to the fact that freedom of expression may not be valued as highly across the world in contrast to other values such as privacy and safety. At the same time, we can ask questions such as why community & belonging is ranked lower for social media, an ostensibly social platform, than for news aggregators. Understanding the answer to this question, and others stemming from value tensions, may help provide a path toward value negotiations and realization in recommender systems.

In the end, we learned a bit more about how to help larger groups of people offer input into values for recommender systems. Equally desirable values for individuals and society may conflict and require ethical trade-offs, which means that the more that can be done to contextualize participants’ understanding and approach, the better. Bringing specificity to the broader conversation by addressing different types of recommender systems individually might help better align values and algorithms, since they have different needs and characteristics. Further, since we identified that different sectors and cultures might bring with them differing opinions on which values to prioritize algorithmically, special attention must be placed on incorporating input from these groups in inclusive and comprehensive ways.

There are a number of possibilities for more comprehensive value-definition efforts. This work surveyed AI experts, but we would also want extensive research with recommender system users, both qualitative interviews and large quantitative surveys (as the BBC has recently done). More ambitiously, it may be possible to involve multiple stakeholders directly in the development of value-sensitive optimization metrics, as the WeBuildAI project has demonstrated.

In the short term, PAI is contributing to a multidisciplinary paper describing the opportunities and challenges in aligning human values and recommender systems, which will help ground the field. This work will include proposed metrics, in place of user engagement, for designing recommender systems that promote different value priorities. Through these efforts, participants in the AI and Media Integrity Program can continue learning how to create more participatory design processes for the technologies that affect platform users all around the world.

Acknowledgements: The development of this survey was led by the post’s authors as well as Parisa Assar (Meta), Alon Halevy (Meta), Sara Johansen (Stanford), Lianne Kerlin (BBC), Polina Proutskova (BBC), and Spandana Singh (New America’s Open Technology Institute).

The authors of this Medium post, listed alphabetically at the top, contributed equally to its creation.
