Studying Up: Reorienting the field of algorithmic fairness around issues of power
On January 28, 2020, I presented a paper at the ACM FAccT* conference. Below is the write-up of that presentation. You can find the paper here: https://dl.acm.org/doi/abs/10.1145/3351095.3372859
Einstein is often credited with saying that if he had an hour to solve a problem and his life depended on the solution, he would spend the first 55 minutes determining the proper question to ask.
As academics, so much of our power lies in how we frame the problems we aim to solve, in formulating the right question.
Yet, the academic community dedicated to the pursuit of “fair” algorithmic systems has not taken enough time to develop the right set of questions in pursuit of this goal. In spite of our best efforts, data scientists still lack the methodological and conceptual tools necessary to grapple with key epistemological and normative aspects of their work. As a result, data scientists tend to uncritically inherit dominant modes of seeing and understanding the world when conceiving of their projects. In doing so, they reproduce ideas which normalize social hierarchies and legitimize violence against marginalized groups.
In our paper we challenge data scientists to move beyond these default modes of operating in favor of “studying up.”
In this talk, I’ll:
- Introduce the concept of “studying up” by providing a brief history of the idea as it was first introduced in the field of anthropology in the late 1960s.
- Then I’ll draw parallels between this conversation in anthropology and contemporary debates we are having in the FAccT* community today.
- Finally, I’ll illustrate what we might gain from “studying up” as data scientists by walking through a case study from my own work building algorithmic interventions in the criminal legal system. Through this case study, I hope to identify some of the key challenges data scientists face when studying up, in the hopes that we can expand the set of issues we grapple with in the pursuit of fair algorithmic systems.
The call to “study up” in anthropology
The idea of “studying up” was first introduced by the anthropologist Laura Nader in the late 1960s, at a time when her discipline was grappling with a number of high-profile controversies that called into question the post-war model of the social sciences as an objective and value-neutral enterprise.
Nader argued that anthropologists had, for too long, harbored a myopic predilection for studying “the downtrodden and the marginalized,” people who were conceptualized as members of isolated cultures in distant lands — whose material reality was disconnected from larger structural forces of domination and power.
Nader pushed her fellow anthropologists to reframe their traditional subjects of inquiry in terms of relationships that extend “beyond the ghetto,” to include powerful institutions and cultures.
Nader’s use of the term “ghetto” was intentional — a ghetto is defined as an isolated place, segregated from larger social structures. The call to study up was a call to contextualize traditional field sites in terms of their relationships to broader institutions and cultures of power, rather than as isolated social alcoves.
For example, in order to understand poverty, Nader suggested we might study banks and colonial administrations, or networks of white collar crime, all of which create the preconditions necessary for specific marginalized and peripheral subcultures to emerge in the first place.
By “studying up,” Nader meant that anthropologists should expand their fields of inquiry to include the “study of the colonizers rather than the colonized, the culture of power rather than the culture of the powerless, the culture of affluence rather than the culture of poverty.”
“Studying up” in Data Science
There are a number of parallels we can draw between the challenges that anthropologists were facing in the mid-twentieth century and the issues we face today in the pursuit of ethical algorithmic systems.
Like the anthropologists of the 1960s and ’70s, data scientists today have been confronted by a series of high-profile controversies that illustrate the ways their work is intertwined with larger political and social struggles. These controversies have given rise to a community of researchers under the rubric of “fair, accountable, and transparent” algorithms.
Within this community, the default tendency is for data scientists to cast their gaze “downward,” to focus on the relatively poor and powerless factions of society. This tendency is particularly widespread amongst projects which self-identify as AI or ML “for social good.”
Data scientists tend to study disadvantage in a one-dimensional way, divorced from the broader social conditions which produce complementary systems of advantage and privilege.
This downward orientation holds widespread appeal, but it creates discursive ghettos around marginalized populations via statistical discourse, in ways that disconnect their plights from structural forms of oppression.
It’s what happens when we create crime hotspot maps that focus only on “street crimes” while completely ignoring white collar crime.
It’s what happens when we use arrest data to predict an individual’s criminal propensities, rather than to measure systemic bias in policing practices.
The tendency to frame outcomes in individual terms, disconnected from larger structural forces, stems from the extremely asymmetrical power relations on which data science is often based, where those with power and privilege are frequently the only ones with the resources necessary to collect and analyze data. But as D’Ignazio and Klein have argued, these gatekeepers “have overwhelming incentives to ignore the problem, precisely because addressing it poses a threat to their dominance.”
In this way, today’s data scientists have much in common with the anthropologists of the ’60s and ’70s, who struggled to contextualize their field sites in terms of larger structural forces through which power and domination were exercised to maintain the status quo.
Nader’s mandate to “study up” was a call for her colleagues to deal directly with issues of power and domination in their work. It’s time for a similar provocation to be made within the field of data science.
Data science projects which reorient their work to “study up” could lay the foundation for more robust forms of accountability and deeper understandings of the structural factors that produce undesirable social outcomes via algorithmic systems.
As Nader argued, “If one’s pivot point is around those who have responsibility by virtue of being delegated power, then the questions change.”
Re-orienting the field of data science to “study up” requires us to ask different questions.
But studying up is not easy. It requires us to develop a critical reflex when presented with opportunities to build models based on data collected by powerful institutions.
Case study: Bail Reform
Pretrial risk assessment has become one of the prototypical examples we use to grapple with the ethical stakes of contemporary algorithmic systems. But the ethical debate regarding these tools has uncritically accepted the premise that the best way to address mass pretrial incarceration is by modeling and forecasting the risk of some of the most marginalized and disempowered factions of American society — people facing prosecution and the threat of pretrial detention.
This framing prevails even though it is judges who are ultimately responsible for making the bail decisions that lead to mass pretrial incarceration. Current efforts to render pretrial risk assessments more “fair” elide more fundamental questions about whom to study, what to predict, and whose behavior to influence.
Rather than engage in the fairness debate on its current terms, we sought to redefine the problem space by reorienting our work “up.” We engaged with a group of community organizers working at the forefront of bail reform and began to develop a new vision for how data science could support a better approach. Through those conversations, it became clear that missing from current discussions is a sense of judges’ agency and accountability.
Rather than focus on predicting the likelihood of “pretrial failure” for a defendant, we would try to understand why American judges send so many people to jail, in spite of state and federal laws protecting against excessive pretrial detention. “Studying up” in the area of bail reform would require us to surface insights regarding past, present and future trends in the way judges make bail decisions.
It was not immediately apparent what the most effective approach might be for accessing data to support our work. We were unsure about which organizations to approach for collaboration, and it was unclear how we might go about acquiring data that would shed light on judges’ behavior.
We ultimately decided to approach this problem from three different angles:
- Insider approach — we negotiated with the government of a mid-sized state to gain access to data that would help us analyze and intervene on trends in individual judges’ decisions over time.
- Outsider approach — we supported grassroots efforts to collect data that the state was not willing to share, in order to build public accountability campaigns around bail reform.
- Speculative approach — we designed a speculative risk assessment tool that “flipped the script” by subjecting those in power to the statistical methods typically reserved for the poor and marginalized, in order to make intuitive the various ways that the methods and discourse surrounding risk assessment are limited and stigmatizing.
One of the biggest challenges we faced was regarding access to data about our research subjects. We grappled with a number of fascinating ethical questions as we negotiated access to state data and forged collaborations with outside organizations in order to generate data that we couldn’t access otherwise.
Negotiating insider access required us to thoughtfully navigate requests to conduct studies that we ultimately declined to carry out because we thought they had the potential to do more harm than good. Even after we navigated those tricky negotiations, our insider access proved precarious. After several months of negotiation, our access was revoked when a high level official deemed our work to be too politically risky.
Our outsider approach also proved difficult. Grassroots data collection is very labor-intensive, and the quality of the data we were able to gather this way was poor.
Given these challenges, we decided to pursue a more speculative practice that would help to denaturalize the assumptions undergirding pretrial risk assessment, by making judges the focus of predictive modeling. Rather than predict the behavior of pretrial defendants, we used data we gathered from our prior efforts to develop a judicial risk assessment tool which predicts the behavior of judges.
Our model generates a “failure to adhere” score based on a prediction of whether a judge will fail to adhere to the U.S. Constitution by imposing unaffordable bail without due process of law. The algorithm exceeds the accuracy and ROC scores of pretrial risk assessments currently in use. Mainstream risk assessments achieve accuracy and ROC scores of around 65%. Our risk assessment achieves 80% accuracy and a 79% ROC score when generating our alternative “FTA” prediction — of a judge’s likelihood of “failing to adhere.”
Like existing risk assessments, our model is driven by demographic information. Just as age is the most predictive variable in most pretrial risk assessments, age is also the most predictive variable in determining the likelihood of a judge illegally incarcerating someone pretrial.
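The write-up doesn’t include the model itself, so as a rough illustration of the general technique it describes (a binary classifier over judge-level demographic features, evaluated with accuracy and ROC AUC), here is a minimal sketch on synthetic data. Every feature name, coefficient, and number below is invented for illustration and does not reflect the authors’ actual data or pipeline.

```python
# Hypothetical sketch of a "failure to adhere" (FTA) classifier for judges.
# All data here is synthetic; the age-dependence of the label is fabricated
# purely so the example has learnable signal.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic judge-level features: age, years on the bench, marital status.
age = rng.integers(35, 75, n)
tenure = rng.integers(0, 30, n)
married = rng.integers(0, 2, n)
X = np.column_stack([age, tenure, married])

# Synthetic label: 1 if the judge imposed unaffordable bail without due
# process ("failed to adhere"); made age-dependent for illustration only.
p = 1 / (1 + np.exp(-0.08 * (age - 55)))
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Evaluate with the same metrics the talk cites: accuracy and ROC (AUC).
acc = accuracy_score(y_te, model.predict(X_te))
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"accuracy={acc:.2f}, ROC AUC={auc:.2f}")
```

On real data, the same evaluation loop would report the kind of accuracy and ROC figures quoted above; the point of the exercise is the choice of prediction target, not the modeling machinery.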
Judges might balk at the notion that we could predict their future behavior from demographic information such as their age or marital status. They would find this especially problematic if we frame that behavior as criminal — failing to adhere to the Constitution is a serious offense.
But this algorithm is not intended for practical use. Indeed, many of the technical and ethical problems with defendant risk assessments also apply to our judicial risk assessment. But as a thought exercise and a counter-narrative, a risk assessment that “looks up” can help us to see these technologies in a new light. As Ruha Benjamin argues, “by deliberately and inventively upsetting the status quo in this manner, analysts can better understand and expose the many forms of discrimination embedded in and enabled by technology.”
Our goal in developing a judicial risk assessment was to render as intuitive the various ways that the methods and discourse surrounding risk assessment are limited and stigmatizing, by subjecting those in power to the very socio-technical processes which are typically reserved for only the poor and marginalized.
The field of algorithmic fairness needs to study up. Research in the field will be limited and distorted if it uncritically accepts the data and problem framings offered by powerful institutions.
But studying up will not be easy. By turning the algorithmic lens on the powerful, researchers will face new ethical grey zones related to access, trust, and permission. A deeply inquisitive study of algorithmic ethics will require creativity and resourcefulness. To generate insights into algorithmic equity and fairness, we must look beyond existing data and their accompanying interpretations, and imagine what the data might become when we cast the algorithmic lens on the powerful.