It’s February 2018, and a small group of Harvard researchers are standing on a stage in New Orleans unveiling a system that automates the detection of so-called gang crimes.

A wave of concern ripples through the audience: Data on criminal activity is notoriously unreliable and often subject to manipulation and misclassification. In one widely reported case, a California state database of “gang” members was found to contain at least 42 babies.

The Harvard researchers didn’t appear to have questioned the integrity of their source data and hadn’t thought through the unintended consequences of implementing a system like this. They hadn’t acknowledged the problem of racial profiling or of damaging innocent people who are wrongly identified. In fact, they hadn’t considered any ethical obligations at all.

When asked how the tool would be used, one of the project’s computer scientists confessed he didn’t know.

“I’m just an engineer,” he said.

This kind of response is a cop-out, and the audience knew it.

If you have the temerity to insert your work into a political issue that, by and large, doesn’t immediately affect your life, you should also be prepared to accept the consequences — or, at the very least, answer a few hard questions.

To the credit of advocates, journalists, and academic researchers, issues of fairness and bias in the algorithmic and data-driven systems that govern our lives have recently taken on an increasing sense of urgency. This heightened awareness allowed people to challenge this work immediately, whether in the room or online.

It’s a conversation that is essential for constructively scrutinizing technology and the role we want it to play in our world.

But the problem here isn’t only one of biased datasets, unfair algorithms, or unintended consequences. It’s also indicative of a more persistent problem: researchers actively reproducing ideas that damage vulnerable communities and reinforce current injustices.

Even if the Harvard team’s proposed system for identifying gang violence is never implemented, hasn’t a kind of damage already been done? Wasn’t their project an act of cultural violence in itself?

Is “Data Violence” Really a Common Problem?

In 2015, a black developer in New York discovered that Google’s algorithmic photo recognition software had tagged pictures of him and his friends as gorillas.

The same year, Facebook auto-suspended Native Americans for using their real names, and in 2016, facial recognition was found to struggle to read black faces.

Software in airport body scanners has flagged transgender bodies as threats for years. In 2017, Google Translate took gender-neutral pronouns in Turkish and converted them to gendered pronouns in English — with startlingly biased results.

“Violence” might seem like a dramatic way to talk about these accidents of engineering and the processes of gathering data and using algorithms to interpret it. Yet just like physical violence in the real world, this kind of “data violence” (a term inspired by Dean Spade’s concept of administrative violence) occurs as the result of choices that implicitly and explicitly lead to harmful or even fatal outcomes.

Those choices are built on assumptions and prejudices about people, intimately weaving them into processes and results that reinforce biases and, worse, make them seem natural or given.

Take the experience of being a woman and having to constantly push back against rigid stereotypes and aggressive objectification.

Writer and novelist Kate Zambreno describes these biases as “ghosts,” a violent haunting of our true reality. “A return to these old roles that we play, that we didn’t even originate. All the ghosts of the past. Ghosts that aren’t even our ghosts.”

Structural bias is reinforced by the stereotypes fed to us in novels, films, and a pervasive cultural narrative that shapes the lives of real women every day, as Zambreno describes. This extends to the data and automated systems that now mediate our lives as well. Our viewing and shopping habits, our health and fitness tracking, and our financial information all conspire to create a “data double” of ourselves, produced about us by third parties and standing in for us on data-driven systems and platforms.

These fabrications don’t emerge de novo, disconnected from history or social context. Rather, they often pick up and unwittingly spit out a tangled mess of historical conditions and current realities.

Search engines are a prime example of how data and algorithms can conspire to amplify racist and sexist biases. The academic Safiya Umoja Noble threw these messy entanglements into sharp relief in her book Algorithms of Oppression. Google Search, she explains, has a history of offering up pages of porn for women from particular racial or ethnic groups, and especially black women. Google has also served up ads for criminal background checks alongside search results for African American–sounding names, as former Federal Trade Commission CTO Latanya Sweeney discovered.

“These search engine results for women whose identities are already maligned in the media, such as Black women and girls, only further debase and erode efforts for social, political, and economic recognition and justice,” Noble says.

These kinds of cultural harms go well beyond search results. Sociologist Rena Bivens has shown how the gender categories employed by platforms like Facebook can inflict symbolic violences against transgender and nonbinary users in ways that may never be made obvious to users.

This is yet another cultural aggression, enabled by the language we use, the media we create, and the products and services we build.

But more than simply reflecting problematic social attitudes, these systems reinforce and amplify them. As with Zambreno’s heroines, users have to actively struggle against society’s ghosts: longstanding and harmful conceptions of individual and group identities that haunt our data and are reproduced by and through algorithms.

Are Engineers Asking the Right Questions?

Discussion of all these cases quickly veers toward fairness, accountability, and transparency. A kind of forensic instinct kicks in: How, technically, did this happen? How, technically, can we prevent it from happening again?

Many of those involved are engineers, researchers, or computer scientists who view these as primarily technical issues, and therefore focus on technical solutions such as algorithm auditing. But if we continually ignore the social dimensions of a problem in favor of technical solutions, we risk trapping ourselves in an endless game of technical whack-a-mole.

“It’s almost never possible to evaluate the utility of an algorithm by looking at the code or measuring it against a mathematical formula,” computational social scientist J. Nathan Matias points out. “To evaluate the risks or benefits of an algorithm, we need to study its impact in people’s lives, whether in controlled lab conditions or in the wider world.”

A lot of the discussion of how to deal with data and discrimination has focused on the legal framework of U.S. anti-discrimination law.

In many cases, this framing is useful. We know that despite promises to the contrary, the increasing use of big data and algorithmic systems by government and corporations is worsening inequality and decreasing social and economic opportunity for the most vulnerable people in society. Cathy O’Neil’s bestselling 2016 book, Weapons of Math Destruction, describes how biases in algorithms used in hiring or evaluation can cost people jobs, and how opaque and possibly illegal criteria may lead an automated risk-assessment system to unfairly deny some people access to bank loans.

Yet U.S. law has significant shortcomings when it comes to discrimination, because it tends to focus on individual “bad actors” who act in discriminatory ways while failing to account for the structural conditions that maintain inequalities. The law is also only equipped to handle discriminatory acts after they’ve occurred, rather than intervening in proactive or progressive ways.

And in the real world, discrimination is more often the result, not the cause, of stubborn stereotypes and pernicious social biases that permeate everything, underwriting our senses of self and others. Lost opportunities, such as that job or that bank loan, are the end result of larger social and cultural processes that U.S. anti-discrimination law isn’t really designed to address or prevent.

This Is What Institutional Prejudice Looks Like in the Digital Age

At the turn of the millennium, the U.S. Department of Agriculture deployed an automated system to detect fraud in the Supplemental Nutrition Assistance Program, a federal aid program that provides food-buying assistance to people on low incomes. In 2002, that algorithm inadvertently banned multiple Somali markets from accepting benefits.

The system apparently had not been designed to culturally accommodate the food-buying patterns of this particular immigrant community. It flagged consecutive bulk purchases as fraudulent, rather than acknowledging the local practice of negotiating bulk orders of things like rice and meat in advance, and then sharing rides to the store with other families to pick up the prepaid orders.

One of the shop owners, struggling to keep his market open as a result of this seemingly faceless decision, remarked at the time, “People keep coming in here and saying, ‘What should I do now?’”

Beyond the obvious harm of some Somali communities being denied access to their local food markets, the ban also reinforced harmful cultural stereotypes about immigrants, the welfare system, and fraud. It did so whether the system or its designers meant to or not.

The term “cultural violence” was first developed by the veteran sociologist Johan Galtung. “Cultural violence makes direct and structural violence look, even feel, right — at least not wrong,” he wrote in 1990.

Neither distributional nor representational forms of harm can survive without a cultural backdrop that enables them. Pernicious racist or ethnocentric ideas, such as the claim that entire countries or regions are “shitholes,” perpetuate violence by justifying extant inequalities, supporting destructive policy, or rationalizing physical harm.

And that is the crime we commit when, as researchers and engineers and data scientists, we fail to think not only about the consequences of our work, but also about our assumptions, our categories, and our position relative to the subjects of the data we work with.

Reinforcing Anti-Immigrant Rhetoric

Less than a month before that Harvard computer scientist awkwardly defended his work on an automated gang crime system, President Donald Trump delivered his first State of the Union address.

During the address, Trump fixated, as he had done many times during his campaign, on “gang crimes” committed by MS-13, a youth gang that operates in the United States and Central America. He asked Congress to “close the deadly loopholes that have allowed MS-13 and other criminal gangs to break into our country.”

But while MS-13 is a significant problem in El Salvador, Honduras, and Guatemala, the group’s U.S. presence is small, and it isn’t clear that its U.S.-based members are primarily immigrants.

Trump has repeatedly used the group to incite fear among his audience, exploiting a small number of gruesome crimes to drum up support for his sweeping anti-immigration policies. These policies have already produced chaotic travel bans, led to increased raids by Immigration and Customs Enforcement (ICE) officers, and left DREAMers in legislative limbo.

By comparison, the 2020 U.S. decennial census seems far less sensational and an unlikely target for anti-immigrant sentiment.

A vital tool for government and democracy, the census determines how seats in the House of Representatives are distributed and how hundreds of billions of dollars in federal funds will be allocated to key public services.

Yet the census has always been a lightning rod for political interests, wrote the distinguished history professor Margo Anderson in her 1988 book, The American Census. “Issues of race and region, growth and decline, equity and justice, have been fought out in census politics over the centuries.”

The 2020 census is no exception. In late 2017, Jeff Sessions’ Justice Department wrote to the Census Bureau pushing for an additional question regarding citizenship, claiming it wanted to aid in the enforcement of the Voting Rights Act. Numerous stakeholders, from civil rights groups to state attorneys general, have decried the request, rightfully pointing out that a citizenship question is likely to depress response rates among Latinx and other groups, especially given the current administration’s rhetoric and policies.

Despite the pushback — and despite the fact that adding a question to a survey instrument without thoroughly studying its possible impact is just plain bad science — Commerce Secretary Wilbur Ross overruled Census Bureau officials in late March and confirmed that the citizenship question will be added. As a result, stakeholders fear the 2020 census will have fewer responses and therefore less representation from an already politically vulnerable group.

And this was the backdrop against which the Harvard researchers chose to present their work: fearmongering rhetoric, anti-immigrant policy, and increased targeting of marginalized communities by immigration and law enforcement. Through their choice of data and the way they framed the question, they inadvertently lent support to the kind of harmful, anti-immigrant representations already damaging Latinx communities across the country.

They might not actively endorse the current administration’s policies personally, but they’ve still made those ideas feel, as Galtung put it, “not wrong.”

How We Can Fix This

Where should we start repairing these systems and the culture that produces them?

As scholar Virginia Eubanks, author of the book Automating Inequality, captures it, “If there is to be an alternative, we must build it purposefully, brick by brick and byte by byte.”

Some researchers and technologists are understandably calling for more holistic thinking about the kinds of systems and tools that are being built. The academic Kate Crawford, who specializes in the social impact of data systems, asks audiences, “What kind of world do we want to live in?” Princeton computer scientist Arvind Narayanan argues, “It’s not enough to ask if code executes correctly. We also need to ask if it makes society better or worse.”

The sentiment is a good one — we should certainly work to make the world better for lots of people and not just a few. But this kind of call to action risks hollowness if it doesn’t focus on those people being hurt by discriminatory systems. As author Mandy Henk noted, our subjects aren’t some amorphous “they.” Instead, our discussions need to be grounded in the political and cultural contexts that make these people vulnerable or marginalized in the first place.

Others have suggested that both the collection of data and the design of data-driven systems must become more transparent, which will help ensure that everything is fair and representative. There are certainly cases where datasheets for AI or more robust peer review might be a good idea, but imagining ethical paths forward for research, big data, and algorithms means going beyond technocratic solutions. We have to work much harder to exercise patience, empathy, and humility in how we conceive of the lives and experiences of our data subjects.

To begin, engineers and data scientists are far more likely to create balanced and representational products if they themselves are from a diverse range of backgrounds. “If we don’t have diversity in our set of researchers, we are not going to address problems that are faced by the majority of people in the world,” said Microsoft research scientist Timnit Gebru. “When problems don’t affect us, we don’t think they’re that important, and we might not even know what these problems are because we’re not interacting with the people who are experiencing them.”

Engaging with humanities and social science projects can also help. To that end, it is welcome news that the Social Science Research Council recently announced that it’s partnering with Facebook to give independent researchers access to the company’s data. It’s a potentially huge step toward gaining a better understanding of how data-intensive social platforms like Facebook shape our world — and the lives of people in it.

Yet engaging other kinds of research isn’t enough. Social scientists have had their own share of high-profile controversies, from failing to secure the privacy of research subjects to experimenting with people’s emotions without their consent. But many others have succeeded in meaningfully focusing on the lives and experiences of their subjects, from indigenous and border communities’ use of fitness trackers and the impact of data collection on marginalized groups in major U.S. cities to online privacy and intimate partner abuse. Our engagement, then, must be well-considered and put the needs and vulnerabilities of specific groups first.

It is important to ask who might be damaged by a certain set of assumptions and a certain algorithmic solution. We should also ask: Who does this protect? Who cannot be a target of this set of data? If I am shielding someone, what does that say about my system?

Most important, engineers and data scientists need to listen to and engage with affected communities.

They should understand the struggles and histories of vulnerable communities and be ready to challenge their own assumptions.

They should offer support and resources — not opinions.

They should support legislation, causes, and organizations that improve lives without making them increasingly dependent on data-intensive systems of tracking and control.

They should not draw on the lives and experiences of their subjects without contributing something in return.

Ultimately, we need empathy and thoughtfulness in the design of algorithms and data science if we are to change the damaging cultural narratives that reinforce injustice and inequality for vulnerable people.

Because it’s never “just” an algorithm. And there’s no such thing as “just” an engineer.