Beyond Bias: Contextualizing “Ethical AI” Within the History of Exploitation and Innovation in Medical Research
On December 14, 2019, I gave an invited talk at the “Fair ML for Health” workshop at NeurIPS. Below is a write-up of that talk. You can watch the talk here: https://slideslive.com/38922104/fair-ml-in-healthcare-3?t=3848s
It’s time for us to move beyond “bias” as the anchor point for our efforts to build ethical and fair algorithms.
A couple of weeks ago a world-renowned behavioral economist named Sendhil Mullainathan wrote an op-ed in the New York Times, arguing that algorithms offer us the unique opportunity to both efficiently surface and correct for biases that are deeply ingrained in our society. Mullainathan has built his career on conducting “bias audits” like this blockbuster study from 2003, which revealed significant racial biases in the U.S. job market. The author argued that today’s data-rich world enables us to carry out similar studies much more efficiently than we ever could have dreamed two decades ago. He compared his early studies to his more recent experience auditing a medical triage algorithm for bias, whereby he and his colleagues were able to both identify bias within the algorithm and then significantly reduce that bias by removing a problematic variable from the model. This example, he argued, illustrated the powerful potential of algorithms to help us overcome biases that are woven into the fabric of our society.
I am so sick of this argument, but it will persist as long as we continue to ignore the broader set of issues related to the ways our algorithmic innovations are inextricably linked to relationships of power and exploitation.
Everywhere I look, academic researchers and large tech companies are debating the best strategies for identifying bias in their algorithms and cleansing them of it. One of the most prevalent methods we’ve developed for rendering our algorithms more “fair” is to embrace diversity and inclusion as the solution to bias: if we can include more people in the data we use to build our algorithmic systems, then those algorithms will be able to serve a more diverse set of people in the future.
Diversity and inclusion seem like particularly important values to uphold in the context of medical algorithms, like the one that Sendhil and his colleagues recently audited. Medical researchers have historically excluded women and people of color from their clinical trials. Medical professionals exhibit a number of troubling racial and gender biases when administering treatment. Perhaps algorithms could offer us the opportunity to course-correct for these biases in order to make medical care more inclusive. In fact, I’m inspired by some of the work that’s being done to use machine learning to challenge widespread misconceptions in medicine, like this recent study by my colleague from MIT, which reveals that there are no real differences in the way men and women experience and exhibit the symptoms of heart attacks.
At the same time, a growing number of people are beginning to push back on the idea that diversity and inclusion are the primary values we should uphold in the pursuit of “ethical AI.” At a recent workshop at Harvard titled “Please Don’t Include Us,” scholars argued that, for many people, inclusion in algorithmic systems will ultimately result in the development of systems designed to surveil, criminalize and control them.
This perspective is based on a critical insight — one population’s consumer technology is often another population’s carceral technology. Facial recognition is a great case in point. For many people, facial recognition is experienced primarily as a consumer good, one that makes it easier for some of us to open our iPhones without using our hands. For others, facial recognition software is primarily experienced as an extension of law enforcement, used to profile and criminalize them as they go about their daily lives. As Nabil Hassein argues,
“The reality for the foreseeable future is that the people who control and deploy facial recognition technology at any consequential scale will predominantly be our oppressors. Why should we desire our faces to be legible for efficient automated processing by systems of their design?”
Moreover, underlying inclusion in algorithmic systems like facial recognition are often coercive relationships of data extraction. For example, Google recently came under fire for hiring contractors to collect 3D scans of dark-skinned faces by offering $5 gift cards to homeless people of color in Atlanta. Google undertook this effort partly in response to calls for them to build more inclusive data sets. Yet, in the process of trying to ensure that their Pixel would perform well on dark-skinned faces, Google revealed a set of deeper issues regarding the ways that inclusion is often rooted in fundamentally extractive relationships between large tech companies and extremely vulnerable populations.
Ethical frameworks that posit diversity and inclusion as the solution to the narrow problem of bias are fundamentally limited, because they neglect important questions about who benefits and who suffers as a result of the innovations undergirding our shiny new technologies. In order to grapple with these issues, we must unpack the social conditions that make innovation possible in the first place.
The field of medical research provides a compelling example of the ways we might expand the conversation regarding what it means to build “ethical AI.” As scholar Britt Rusert has argued, medical innovations have routinely benefited from the systematic withdrawal of resources from vulnerable populations. She points to the Tuskegee syphilis experiment as a classic example of the ways medical research has often extracted data from vulnerable communities under the guise of care. Rusert argues that “researchers drew on perceptions of primitiveness and timelessness of black southern life to obscure the exploitations of the experiments” that they carried out. These researchers relied on the racialization of space to authorize the enclosure of experimental populations in “rural laboratories,” where they could be surveilled and controlled for the sake of medical experimentation. In the process, black lives were sacrificed to develop life-saving diagnostic tools and treatments that established the U.S. as a global leader in cutting-edge biotech.
How do the failures of Tuskegee relate to the challenges we face today in building “algorithms for social good”?
Let’s look at a couple of quick examples.
There is growing enthusiasm around the potential to use machine learning to analyze vocal biomarkers in order to build more accurate diagnostic tools for a variety of physical and mental illnesses, such as Parkinson’s disease. It’s being touted as the cutting edge of preventative medicine. At the same time, others are repackaging vocal biomarkers as a means of vetting potential terrorists at the U.S. border, arguing that such data can be used to predict whether or not someone poses a “risk” to U.S. national security with up to 97% accuracy. This carceral application of vocal biomarker analysis has been debunked as junk science, but it’s still being peddled to government officials as a means of minimizing the bias of flawed human decision makers reviewing asylum cases, and thus rendering the process more “fair.” Again, one person’s life-saving consumer technology is another’s carceral technology.
Another interesting example comes in the form of a recent announcement that the Gates Foundation made in partnership with the NIH, whereby they plan to invest hundreds of millions of dollars to develop gene-based therapies for sickle cell and HIV. While the initiative has been characterized as a means of bridging the gap in access to these cutting-edge treatments, some researchers have expressed concern that this investment doesn’t prioritize getting life-saving treatments to the patients who need them most.
The vast majority of sickle cell and HIV patients reside in sub-Saharan Africa, where well-established treatments are still inaccessible to a large portion of the population. Should we be spending millions of dollars to develop data-intensive treatments that may or may not work down the line, or should we divert that money towards bridging the gap between the haves and the have-nots for the treatments that already exist? As Sarah Hamid pointed out in a tweet responding to this announcement,
I bring up these examples so that we can begin to identify the parallels between the ethical failures of our past and the challenges we face today as we develop algorithms to inform high-stakes decisions. I hope that by drawing these lines of connection we can begin to move beyond bias as the primary framing of the ethical stakes of our work and begin to grapple with the various ways that our technological innovations both benefit from and exacerbate deep inequalities and injustices in our society.
As the builders of algorithmic systems, it is our responsibility to identify and actively resist harmful applications of our work. We must begin to unpack the ways our data-intensive innovations benefit from the systematic exploitation and criminalization of vulnerable communities. We must focus on building structures of accountability around the algorithms we produce, to ensure that the people who often serve as the bedrock of our algorithmic innovations also benefit from them. We must move the conversation beyond “bias” if we have any hope for our innovations to make a positive impact on the world.