Responsible AI on-the-ground in 2023. Hopes and provocations for 2024.

Lauren Wilcox
11 min read · Dec 30, 2023


The year started strong. I was able to grow the Technology, AI, Society, & Culture (TASC) team in Google Research with the addition of Renee Shelby, Christina Harrington, and Andrea Parker (visiting in the Atlanta office) and secured intern positions for the team. We began hosting esteemed guests (most in person!) for TASC discussions (e.g., Lauren Klein and later in the year, Mary Gray and Chinasa Okolo). We kicked off research collaborations with academic faculty and TASC welcomed Avery Mack, Akshita Jha, Jaemarie Solyst, and Jay Wang as interns (Jay in collaboration with PAIR).

I was so inspired, all year, by everyone on the team and our collaborators: in addition to shaping processes and making product impact, folks in TASC published papers at top computing conferences, many recognized with awards [1], released datasets [2], ran influential workshops internally and externally [3], and gave impactful talks and keynotes [4] and community site visits [5].

Throughout the year, the team balanced RAI-related contributions to Bard, RAI for model evaluations [6], and original research submissions to top-tier computing conferences and venues, leading work at the intersection of AI/ML and global cultures [7].

I contributed to spinning up a company-wide GenAI product inclusion effort, created pathways to connect the team’s work to efforts across the company, and shaped the recruitment and study design for a large-scale, international Bard pilot and evaluation (with Shaun Kane contributing accessibility expertise).

It was great to see research from Robin Brewer’s time in TASC (Brewer et al., Gadiraju et al.) featured in her profile in Popular Science. I’m grateful to Merrie Morris for sponsoring the visiting roles that led Robin and Andrea to join TASC, which also enabled us to publish work together, with Fernando Diaz, on the ethics of AI in health consultations. Fernando recently started his associate professorship at Carnegie Mellon University, where he is continuing to advance recommender systems research (we were so happy he could still spend a percentage of his time with us!). These systems — particularly when curating content like music, literature, and film — play an important role in shaping how cultural content is consumed and how creator incentives are defined. In collaboration with Professor Georgina Born at University College London and postdocs Andres Ferraro and Gustavo Ferreira at Mila, Fernando has been studying how concepts from public service media (PSM) can be used to align algorithmic curation in recommender systems with normative values [8].

The rest of this post describes personal reflections based on a selection of my research, external engagements, and my new role building the responsible AI function for a large company.

Winter, Spring, and Summer

Our work to broaden participation in AI research, development, and ongoing evaluation gained traction. Qiaosi (Chelsea) Wang published work at ACM CHI 2023 from her internship with me, illustrating three emerging responsible AI practices.

Qiaosi (Chelsea) Wang et al.’s ACM CHI 2023 paper on Designing Responsible AI, presented by Michael Madaio at a podium; the slide shown is titled “Building and Reinforcing an RAI Lens.”

Among other contributions, this work brings us closer to understanding organizational factors that shape design decisions, and discusses nuances of stakeholder and user involvement in the face of potentially hazardous experiences, based on fieldwork and interviews. It was nice to see it highlighted as an editor’s choice selection.

At ACM FAccT 2023, I got to participate in two sessions focused on end users’ and affected stakeholders’ participation in various phases of AI/ML development, testing, and auditing, and potential risks associated with different models of engagement (with a follow-up workshop on community-collaborative approaches held at ACM CSCW 2023).

FAccT 2023 panel: participants seated at a long table with Wesley Deng (CMU) as moderator, and panelist names and profile photos shown on the screen. Organizers and participants include Shivani Kapania (Google), Ken Holstein (CMU), Motahhare Eslami (CMU), Lauren Wilcox (Google Research), Su Lin Blodgett (Microsoft Research Montreal), Danaë Metaxa (University of Pennsylvania), Nicholas Diakopoulos (Northwestern University), and Karrie Karahalios (UIUC).
Moderator Wes Deng leading our FAccT 2023 panel, “User Engagement in Algorithm Testing and Auditing: Exploring Opportunities and Tensions between Practitioners and End Users” with panelists Christo Wilson, Shubhanshu Mishra, Karrie Karahalios, Nicholas Diakopoulos, Su Lin Blodgett and Lauren Wilcox (not pictured: Shivani Kapania, Ken Holstein, Motahhare Eslami)
FAccT 2023 panel on Community-Collaborative Approaches to Computing Research: panelists Sucheta and Lauren on stage, with fellow panelists and moderators seated around them and shown on screen.
Our FAccT 2023 panel on “Community-collaborative visions for computing research.” Pictured: Sucheta Ghoshal, Lauren Wilcox, Angela D.R. Smith, Marisol Wong-Villacres, Emily Tseng, and Sheena Erete. Not pictured: Calvin A. Liang, Akeiylah DeWitt, and Yasmine Kotturi.

I was also honored to be invited to serve, along with Remi Denton, on Partnership on AI’s Global Task Force for Inclusive AI.

We developed and launched a community-based research playbook for internal use at Google (Eric Corbett, Sheena Erete, and Remi Denton in TASC with collaborators across Responsible AI including Jamila Smith-Loud, Courtney Heldreth, and Ned Cooper). I was also delighted to work alongside Technology & Society and University Relations leaders to create a new category for Google’s Award for Inclusion Research (AIR) program, focused on collective and society-centered AI. The hallmark of funded projects in this program is the inclusion of impacted stakeholders throughout.

Of course, I still engage with areas of health AI related to equity, transparency, and safety. I was happy to weigh in on this year’s Pew Research Center panel on the Best and Worst Changes in Digital Life by 2035. Responding to questions about the potential beneficial uses of AI, I discussed opportunities for increased access to health information, certain types of diagnostics, telemedicine (noting objections to certain forms of multimodal data collection), and last-mile infrastructure. But many of the problems dogging the field won’t be solved by these technologies alone; they will only improve our collective wellbeing when they support rather than replace human interactions, and when they are coupled with innovations that create the conditions for health and remove structural barriers to formal and informal care.

Many discussions on safety in the AI R&D community raise issues that the health field has grappled with for decades: from reliance on human-in-the-loop (HITL) methods to monitor algorithmic decision-making, to mechanisms for participatory adverse event reporting, to conceptualizing and communicating risk. These are also set against a backdrop of barriers to representative research, lack of access to resources for large swaths of the global population, and structural forms of discrimination that laid the groundwork long ago for ongoing health inequity and precarity.

I sought to add perspectives on these topics at health AI events throughout the year (I’m happy to share my panel notes or talk slides upon request):

  • My February AAAI workshop keynote on AI for Assistance in At-Home Tasks discussed how care tasks are situated in complex social contexts, how this adds nuance to assessing task performance, and how to think about alignment and policies with respect to existing social and technical ecosystems. I discussed some of our work assessing challenges of informal caregivers, which offers provocations for AI/ML development, and also highlighted ethnography by James Wright on the integration of robots into eldercare.
  • In March, our AAAS panel on Societal Bias in Data, Machine Learning, Health, and Science discussed implicit tensions between (current) values in AI and health, how ML can mitigate or reinforce structural racism and sexism, and the potential for new problems of societal bias that ML systems could create. Dylan Hadfield-Menell asked fantastic questions as moderator, and in response to the question “To what extent is the persistence of harmful societal bias in healthcare applications of AI/ML due to a lack of tools to address the problem?” I mentioned work by Richmond Wong, Michael Madaio, and Nick Merrill on How Toolkits Envision the Work of AI Ethics.
Marzyeh, Irene and Judy grouped together, pointing to the sign for our AAAS panel on Societal Bias in Data, Machine Learning, Health, and Science and smiling
Getting ready for our AAAS panel on Societal Bias in Data, Machine Learning, Health, and Science. Pictured: Marzyeh Ghassemi, Irene Chen, Judy Gichoya. Not pictured: Lauren Wilcox and Dylan Hadfield-Menell.
  • My talk for the Columbia Department of Biomedical Informatics in April gave an overview of a few co-authored studies on AI in healthcare, but focused primarily on work published at CHI 2023 the following month, documenting how trans and nonbinary people in three countries experienced AI technology, how their algorithmic experiences produce specific harms to health and well-being, and strategies to evolve our approaches to including trans and nonbinary people in research and development. This work received a best paper award.
  • In May, I was honored to serve on a panel at the 2023 Symposium on Artificial Intelligence for Learning Health Systems (SAIL) on “tales from the trenches.” We discussed what it means to go from training and evaluating models in R&D environments to real-world settings. I highlighted lessons learned on the importance of understanding the sociocultural contexts of use, and refining protocols for prospective studies. The panel also discussed data management challenges, stressing the need for high-quality, well-documented data and considerations for secure and ethical data use (further reflections below).
Image of the SAIL 2023 panel, with moderator Mark Michalski at the podium and panelists Nigam Shah, Lauren Wilcox, Karley Yoder, and Erin Palm sitting in a row at a long table before the audience.
SAIL 2023 panel with Moderator Mark Michalski and panelists Nigam Shah, Lauren Wilcox, Karley Yoder and Erin Palm.
  • Moving into the summer, it was also an honor to participate in the University of Maryland’s 2023 Summer Roundtable: Addressing Systemic and Structural Racism to Improve Safety, Quality, and Trustworthiness, with co-panelists Andrea Parker, Avriel Epps-Darling, and Katie Shilton. We discussed the ways in which racial and gender bias and disparities manifest in digital health and information technology, key ethical considerations when designing digital health technologies to ensure that they are inclusive and representative of all racial and ethnic groups, and lessons learned from both effective and ineffective attempts to do so.
  • We published our ACM TOCHI Special Issue on Human-Centered AI… in the Wild. Tariq Andersen, Francisco Nunes, Enrico Coiera, Yvonne Rogers, and I introduced the audience to varied ways that ‘human-centered’ might be defined, through the application of different lenses on AI/ML development, deployment, and evaluation.
  • Our paper, AI Consent Futures: A Case Study on Voice Data Collection with Clinicians, received a best paper honorable mention and methods recognition at ACM CSCW. The paper challenges current narratives of AI as almost entirely assistive for health use cases, and highlights the gravity of decisions relating to data collection and the AI implementation process. We looked specifically at the collection of voice data for interacting with and training models for documentation use cases, and the ways in which the act of collecting voice data to enable these technologies brings with it risks related to trust, service eligibility, legal implications, informed consent, workflow disruption, privacy, and how we evaluate accuracy.

Moving into the fall, I was honored to give a keynote at the Nordic AI Meet conference, speak on its panel on AI safety, and speak at the University of Copenhagen’s Confronting Data series. My talks highlighted the importance of interdisciplinary and community-based approaches to AI R&D through a range of examples, from applications of general purpose foundation models, to health-specific use cases.

Nordic AI Meet events in Copenhagen, Denmark. Left: Giving keynote, “Toward Participatory Approaches to Responsible AI.” Right: Panel on Safe, Trustworthy & Inclusive AI with panelists (pictured L to R) Lauren Wilcox, Nitin Sawhney, Elisa Barney Smith, Martin Gebster, and Karolina Drobotowicz, moderated by Ajay Vishwanath.

Fall into Winter

August was a big month as I prepared to transition to a new role at another company. Ramping up at eBay, I’ve been fortunate to bring experts together across the company to lay critical groundwork and infrastructure for company-wide AI governance. We solidified our AI principles, refined our governance structure and processes, set technical RAI priorities and advanced work on them, and began the work to systematically establish our position on AI risk.

Standing on stage, speaking to a large room of Generation Alpha attendees seated at round tables. The slide title shown above me is “Responsible AI,” with Girls Leadership Academy Meetup signage around it, including signs reading “Dream Big” and “Take Action.”

I also had a blast speaking at the Girls Leadership Academy Meetup event. We welcomed Mohammad Tahaei to the team, whose work on privacy and RAI is impacting the field.

Hopes and Provocations for 2024

Reflecting on this year and looking ahead to 2024, I’ve noticed that critical issues in AI are often framed in dichotomous terms, particularly when they concern governance. Suggestions that we must strike a balance between responsibility and AI progress rest on an assumption that may itself be impeding progress: that responsibility is not innovative, but a counteracting force to creativity and innovation. Take data safety and AI progress: we can develop competencies and methods that better support data privacy, security, and a variety of forms of governance that advance AI experiences, rather than seeing these as being in tension with innovation.

Similarly, when we talk about safety as being in tension with or holding back progress in AI, we could be stuck in ways of thinking that don’t set us up for the current moment. Importantly, delivering safer and more meaningful experiences for more people is progress. What would it mean to think about safety and progress beyond a balancing act between two competing interests? At the Workshop on Sociotechnical AI Safety at Stanford University last month, I got to hear a range of perspectives on this topic, including the ways in which approaches to defining, assessing, and mitigating risk remain ad hoc, and the systematic safety methods and innovations that are still in early stages.

Broadening our understanding of what societal risks actually mean in the context of AI is also vital. Many safety efforts — and conversations around them — focus on mitigations local to development and initial deployment. But risk is co-produced: AI can both add to and amplify existing forces that erode safety or create risky conditions for people, so we need to go beyond viewing safety as a property of technology and its development alone.

Our Western imagination of risk, especially in the US, has a particular positionality that ties risk to notions of self-reliance and success in the face of danger. Risk narratives are deeply ingrained in American culture, which views risk as something to be mastered, controlled, and transformed into opportunity. We need to better understand global perspectives and recognize unpredictable and often inequitable experiences of risk. I hope we can shift the discourse from a mindset of ‘mastering’ risk to one that orients toward the complex, emergent nature of these challenges, and how they vary across different societies and communities.

This year was thrilling, meaningful, and also difficult. Layoffs early in the year affected many of my colleagues directly and had reverberating impacts on our community. Transitioning away from a team of people whom I admire and love was hard. I am fortunate to have good friends and family who provide a support network and to be part of an awesome community of colleagues and mentors who inspire me and lift my spirits each day.

References / Check these out!

1. Best Papers and Honorable Mentions: e.g., 1, 2, 3, 4
2. Datasets: e.g., goo.gle/seegull
3. Workshops: e.g., Rida Qadri et al.’s EC3V@CVPR, Mark Díaz et al.’s FAccT session on Crowdsourced Data
4. Talks: e.g., Remi Denton’s talk on AI Safety for Partnership on AI and Northeastern University, Vinod Prabhakaran’s many talks and panels, e.g., at the GPAI summit and NeurIPS, Andrew Zaldivar at Arize, Ding Wang’s CMU talk, Cindy Bennett’s a11y NYC and Columbia DBMI talks, Michael Madaio’s AAAI keynote, Rida Qadri’s AIIoT keynote, Sunipa Dev’s Indo ML keynote.
5. Site visits: e.g., Mark Díaz’s and Renee Shelby’s visit to the Fort Peck Tribes, at the invitation of Diane Korngiebel and Kenny Smoker.
6. Model evaluations: e.g., Vinod Prabhakaran, Sunipa Dev, Mark Díaz, and Renee Shelby’s contributions to PaLM and the PaLM 2 technical report
7. AI/ML & Global Cultures: e.g., 1, 2, 3
8. Fernando Diaz and collaborators developed a quantitative metric to measure the degree to which a recommender system promotes commonality, a PSM principle supporting shared cultural experience and diversity. This work is currently under revision for ACM Transactions on Recommender Systems and has also been presented as part of the Knight Institute Symposium on Algorithmic Amplification and Society.
