GDI Deep Read: Algorithmic Fairness

Good Data Initiative
15 min read · Jan 13, 2021

Welcome to 2021! Over the holidays, our team at GDI has been reflecting on the increasing impact of algorithms on everyday decision processes, especially given current world events. Most media coverage discussing algorithmic biases has focused on social and cultural factors — yet technical biases also play a critical, yet largely invisible role. In this third GDI Deep Read, Research Analyst Anna Sappington reflects on her own experiences with machine learning to explore the nuanced concept of ‘algorithmic fairness’ as a way to prevent both socially and technically embedded bias within real-world tools.

In 2019, self-proclaimed “techno-sociologist” Zeynep Tufekci stood on stage at one of the largest machine learning conferences, sponsored by the likes of Facebook, Google, and Amazon, and in front of hundreds of tech employees stated:

“I probably won’t be invited back for saying this… but organize, insist on having a voice, and refuse to build unethical technology.”

Tufekci was directly asking tech employees to become scientists for responsibility, using their collective voice and freedom of speech to refuse to build unethical algorithms and instead build alternatives.

I had skipped out on a week of school during my senior year at MIT to attend this conference. Sitting in the fifth row, hearing Tufekci’s call to action felt like lightning striking a metal rod somewhere inside me. I vowed then and there to adhere to an algorithmic Hippocratic Oath: to do no harm and to advocate for fair and ethical technology.

At first, putting this oath into practice felt straightforward. When helping teach and develop the curriculum for Inspirit AI’s Introduction to AI camps, we ensured ethics and AI for social good became central tenets of every lesson. It became more complicated, however, when I was later faced with the exact scenario Tufekci alluded to in her talk: a class in my master’s program assigned a problem set that utilized the popular and racist Boston Housing Dataset.

While I took Tufekci’s advice and refused to build a model that would perpetuate systemic racism, navigating speaking up to a large university department and trying to prove the assignment would result in biased models made me reflect on two questions: What is algorithmic fairness, and what can we do to fix it?

What is algorithmic fairness?

Why is algorithmic bias important?

These days, algorithms are as ubiquitous as infrastructure. And just as bias can be baked into physical infrastructure by either deliberate maliciousness or thoughtless omission, it enters algorithmic infrastructure in these same ways. Dr. Hannah Fry, a senior lecturer at University College London and author of Hello World: How to Be Human in the Age of the Machine, gives the example of racist bridges to explain how everyday infrastructure can be biased. On the route to Long Island’s Jones Beach, 1920s urban planner Robert Moses constructed a series of extraordinarily low bridges to filter people on and off the highway. The reasoning behind this strange design was that the bridges would preserve access to Jones Beach for wealthy, white Americans who could afford private cars while excluding those who relied on tall buses for transportation.

In the modern age, algorithms facilitate everything from our social media feeds to search engine results, satellite navigation to entertainment recommendation systems; they are as much a part of our modern infrastructure as bridges, buildings, and factories. They’re even inside our hospitals, courtrooms, and cars, all the while garnering the hidden power to slowly and subtly change our thoughts and actions. From criminal recidivism algorithms falsely predicting Black defendants as more likely to commit future crimes to biased medical algorithms favoring white people for health care programs, biased algorithmic infrastructure can erect barriers as formidable as the racist Jones Beach bridges.

Where does algorithmic bias come from?

A prerequisite to attempting to define algorithmic fairness is an understanding of the possible entry points for bias within the algorithm development process. Recent literature has identified several root causes of such unfairness¹:

  • Existing bias within datasets used for training a machine learning algorithm, such as historically biased human decisions, biased device measurements, or erroneous reports. Machine learning algorithms will ‘learn’ to replicate such biases.
  • Missing data, such as missing values or sample/selection biases, which result in datasets that are not representative of the target population.
  • Algorithm optimization objectives that aim to minimize overall prediction error, thereby benefitting majority subgroups over minority subgroups.
  • Proxy attributes for sensitive attributes within training data. Sensitive attributes differentiate privileged and underprivileged groups (e.g. race, gender, sexuality, age) and are typically not legitimate for use in decision-making. Proxy attributes are non-sensitive attributes that can be exploited to derive sensitive attributes (e.g. SAT scores, zip code, health care costs); using these proxy attributes, an algorithm can learn to make decisions based on sensitive attributes.

Despite a growing awareness that algorithms can be as flawed as their human designers, humans have a tendency to overtrust that which they do not understand. In the case of algorithms, this is exemplified by our blind trust that the rank order of Google search results does indeed surface the most relevant and trustworthy sources for our query.

Paradoxically, once we know an algorithm can make mistakes (e.g. Citymapper estimating a much longer travel time than expected), we tend to overreact and dismiss the algorithm altogether, reverting to our own flawed judgement. This tendency to view algorithms in black-and-white — as either omnipotent masters or useless junk — presents a challenge when working with technology. As Hannah Fry instructs in her book, we need to both acknowledge our own flaws and question our gut reactions while simultaneously examining algorithms more carefully and not placing them on a pedestal.

How do we define “fairness” and what are the implications of our choice of definition?

When it comes to examining the biases of a particular algorithm more closely, we must first define a way to measure algorithmic fairness. It is important to note that even this initial step is nuanced and complex. For one, any single-number metric will necessarily fail to capture the full story of the data it is summarizing. Reporting the average of a column of numbers, for example, is inherently a simplification of that data. It is therefore crucial that we pick metrics that tell the right story. Additionally, the No Free Lunch Theorem of mathematicians David Wolpert and William G. Macready demonstrates that there is an inherent tradeoff in optimizing for one metric over another.

In the legal domain, discrimination can assume one of two definitions:

  • Disparate treatment: intentionally treating an individual differently based on sensitive attributes such as race, religion, gender, or sexuality (direct discrimination); or
  • Disparate impact: negatively affecting those with a particular sensitive attribute more than others, even if by a seemingly neutral policy (indirect discrimination)

Notably, algorithms trained on data that do not include sensitive attributes (like race or gender) may not produce the direct discrimination of disparate treatment, but they may induce disparate impact (i.e., indirect discrimination).

Although not fully comprehensive, a recent review of current research on algorithmic fairness highlights five metrics as the most popular approaches to measuring it:

Option 1: Disparate impact.

A measure of the legal notion of disparate impact (i.e., indirect discrimination). Mathematically, this metric ensures the positive prediction rates across subgroups of the data are similar by requiring a high ratio between the positive prediction rates of subgroups. In other words, this metric equalizes the positive outcome rates between sensitive attribute subgroups².

The intuition behind this is best illustrated through an example. A positive prediction could represent being hired for a job; this metric would require the proportion of accepted applicants to be similar across subgroups such as applicant gender. Accepting 48% of female applicants and 52% of male applicants would result in a ratio of 0.923, which is higher than 0.250, the ratio if instead 20% of female applicants and 80% of male applicants were hired.
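As a minimal sketch (not from the original article), the ratio can be computed directly from the two subgroups’ positive prediction rates; the hiring numbers below are the ones used above.

```python
def disparate_impact_ratio(rate_unprivileged: float, rate_privileged: float) -> float:
    """Ratio of positive prediction rates (underprivileged subgroup over privileged).
    Values near 1.0 indicate similar rates; the '80% rule' flags ratios below 0.8."""
    return rate_unprivileged / rate_privileged

# Hiring example from the text: 48% of female vs. 52% of male applicants accepted.
print(disparate_impact_ratio(0.48, 0.52))  # ~0.923
print(disparate_impact_ratio(0.20, 0.80))  # 0.25
```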

Option 2: Demographic parity.

Also known as statistical parity, this metric is similar to disparate impact but takes into account the difference in positive prediction rates across subgroups rather than the ratio. Once again, this metric equalizes the positive outcome rates between sensitive attribute subgroups.

Here, extending the job hiring example from above, a lower value for demographic parity indicates more similar hiring rates across the subgroups of a sensitive attribute (e.g. applicant gender) and therefore more fairness. Accepting 48% of female applicants and 52% of male applicants results in a difference of 4%, which is lower than 60%, the difference if instead 20% of female applicants and 80% of male applicants were hired.
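The corresponding sketch for demographic parity, again using the same hiring numbers, simply replaces the ratio with an absolute difference:

```python
def demographic_parity_difference(rate_a: float, rate_b: float) -> float:
    """Absolute difference in positive prediction rates between two subgroups;
    lower values indicate more similar treatment."""
    return abs(rate_a - rate_b)

print(demographic_parity_difference(0.48, 0.52))  # ~0.04
print(demographic_parity_difference(0.20, 0.80))  # ~0.60
```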

A disadvantage of these two metrics is that an algorithm that makes perfectly accurate predictions may be labeled unfair if the underlying rates of positive labels are significantly different between two subgroups (e.g. men and women have significantly different breast cancer rates due to biology). A third metric, that of equalized odds, was designed to overcome this disadvantage.

Option 3: Equalized odds.

This metric computes the difference in the false positive rates (FPR) and true positive rates (TPR) between two subgroups and requires the absolute difference between subgroups’ FPRs and TPRs to be upper bounded by some small constant. In other words, this metric ensures that, for a sensitive attribute subgroup, neither the difference in the rate of incorrect positive predictions nor the difference in the rate of correct positive predictions is ever larger than some small value we choose.

For instance, in an algorithm predicting whether or not a patient has breast cancer, a false positive would be an instance of the algorithm predicting a patient has breast cancer when they are healthy and a true positive would be a prediction of breast cancer when they do have breast cancer. Equalized odds would ensure that the difference in the FPRs and TPRs for male and female patients is small.
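A minimal sketch of this check, assuming hypothetical binary arrays y_true and y_pred and a boolean subgroup mask (none of these come from the article):

```python
import numpy as np

def fpr_tpr(y_true: np.ndarray, y_pred: np.ndarray):
    """Return (FPR, TPR) for binary labels and predictions."""
    fpr = np.mean(y_pred[y_true == 0])  # share of true negatives predicted positive
    tpr = np.mean(y_pred[y_true == 1])  # share of true positives predicted positive
    return fpr, tpr

def equalized_odds_gaps(y_true, y_pred, group):
    """Absolute FPR and TPR gaps between the two subgroups defined by a boolean mask."""
    fpr_a, tpr_a = fpr_tpr(y_true[group], y_pred[group])
    fpr_b, tpr_b = fpr_tpr(y_true[~group], y_pred[~group])
    return abs(fpr_a - fpr_b), abs(tpr_a - tpr_b)

# Equalized odds holds (up to a chosen tolerance) when both gaps are small.
```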

Equalized odds effectively demonstrated bias in the COMPAS algorithm, used by the U.S. criminal justice system to predict recidivism. Although the accuracy of the algorithm was similar for both African Americans and Caucasians, the odds were different. COMPAS incorrectly predicted future criminality (its false positive rate) for African Americans at twice the rate it did for white Americans, and it significantly underestimated the future criminality of white Americans (its false negative rate). In sum, COMPAS’s predictions failed differently for Black defendants.

Note that with equalized odds, a perfectly accurate prediction algorithm will necessarily satisfy the equalized odds constraints. However, as with any metric that relies on the ground truth data, there is an inherent assumption that the ground truth data are representative and unbiased, which may or may not hold.

Option 4: Equal opportunity.

This metric is similar to equalized odds, but only considers the difference in TPRs, requiring they be similar across sensitive attribute subgroups. Here, we are only equalizing the rates of correct positive predictions (e.g. predictions of breast cancer in a patient who truly does have breast cancer) between subgroups (e.g. males and females).

A drawback of equal opportunity is that, in equalizing only one rate (the TPR), disparities in other error rates may increase. It can also suffer when base rates differ between subgroups, as with disparate impact and demographic parity.
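For completeness, a sketch of equal opportunity mirrors the equalized odds check above but keeps only the TPR term (again, the arrays are hypothetical):

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, group) -> float:
    """Absolute TPR difference between the two subgroups defined by a boolean mask."""
    tpr = lambda yt, yp: np.mean(yp[yt == 1])
    return abs(tpr(y_true[group], y_pred[group]) - tpr(y_true[~group], y_pred[~group]))
```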

Thus far, each metric discussed evaluates fairness in terms of groups. When evaluating the fairness of an algorithm, one might also consider fairness on an individual level with a metric such as individual fairness.

Option 5: Individual fairness.

This metric ensures similar individuals will be treated similarly. Specifically, if two individuals are similar, it requires the difference between the rates of predicting a specific label for those individuals, given their sensitive and nonsensitive attributes, to be less than some small value we choose.
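One common way to formalise this (a sketch, not the article’s own definition) is a Lipschitz-style condition: the gap between two individuals’ predicted scores should be bounded by their distance under a task-specific similarity metric. The distance function and constant below are assumptions that must be chosen for the task at hand.

```python
import numpy as np

def individually_fair(score_fn, x1: np.ndarray, x2: np.ndarray,
                      distance_fn, lipschitz_const: float = 1.0) -> bool:
    """Check |f(x1) - f(x2)| <= L * d(x1, x2): similar individuals (small d)
    must receive similar scores. distance_fn and lipschitz_const are
    task-specific choices, not prescribed by any single standard."""
    return abs(score_fn(x1) - score_fn(x2)) <= lipschitz_const * distance_fn(x1, x2)
```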

Choosing which metric to optimize for is made more complicated by the fact that it is not mathematically possible to satisfy several notions of fairness simultaneously. Additionally, there is an inherent tradeoff between accuracy (how correct predictions are) and fairness: to achieve a higher degree of fairness, we often must compromise accuracy. Building fair algorithms may mean pursuing models that are the most fair with the least reduction in accuracy, or otherwise switching our notion of utility altogether.

How do we fix it?

Where can we intervene to make an algorithm more fair?

Now that we have an idea as to what algorithmic fairness might mean and an understanding that even defining “fairness” is nuanced and complicated, how do we begin to fix an algorithm we’ve identified as biased? Just as there were multiple entry points for algorithmic bias, there are three major categories of opportunities for intervention.

  1. Pre-process. Pre-process bias correction involves altering the training data before they are used to train the algorithm. For instance, the ground truth labels of the data could be modified or reweighted to make the dataset more balanced. Alternatively, one could modify the features of the data such that the distributions for different sensitive attribute subgroups are more similar, making it harder for the algorithm to differentiate between historically privileged and underprivileged subgroups. Tuning parameters that trade off between fairness and accuracy can also be employed to learn fairer data representations.
  2. In-process. In-process bias correction involves modifying the way the algorithm is trained so as to encourage it to learn a fairer solution. In-process interventions include adding a regularization term to the objective function that penalizes mutual information between sensitive attributes and the classifier’s predictions; in other words, changing the “goal” of the algorithm such that the best solution is one where predictions are not primarily based on sensitive attributes. Additionally, a tuning parameter could be added to the objective function to balance fairness and accuracy when looking for the best solution, rather than optimizing for accuracy alone (a minimal sketch of this idea follows this list). Other in-process modifications involve adding constraints that the final model must satisfy (e.g. equalized odds, disparate impact).
  3. Post-process. Post-process bias correction involves processing the output of the model so as to make its decisions more fair. Examples of post-process bias correction include changing some decisions to promote equalized odds, setting different decision boundary thresholds for different subgroups so as to maximize accuracy and minimize demographic parity, or building separate, decoupled algorithms.
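To make the in-process idea concrete, here is a minimal sketch (in PyTorch, on synthetic stand-in data; it is not any specific published method) of adding a demographic-parity penalty to a classifier’s loss, with a tuning parameter that trades off fairness against accuracy:

```python
import torch
import torch.nn as nn

# Synthetic stand-in data: features X, binary labels y, binary sensitive attribute a.
torch.manual_seed(0)
n, d = 1000, 5
X = torch.randn(n, d)
a = (torch.rand(n) < 0.5).long()
y = (X[:, 0] + 0.5 * a + 0.1 * torch.randn(n) > 0).float()  # labels correlated with a

model = nn.Linear(d, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
fairness_weight = 1.0  # tuning parameter: 0 = accuracy only, larger = more parity

for _ in range(200):
    optimizer.zero_grad()
    logits = model(X).squeeze(-1)
    scores = torch.sigmoid(logits)
    # Demographic-parity penalty: gap in average predicted positive rate by subgroup.
    parity_gap = (scores[a == 1].mean() - scores[a == 0].mean()).abs()
    loss = bce(logits, y) + fairness_weight * parity_gap
    loss.backward()
    optimizer.step()
```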

What does intervention look like in practice?

Since these three categories of interventions might be made more intuitive through an example, I’ve picked the Natural Language Processing (NLP) application of creating fair word embeddings to illustrate different measures that could be taken to address gender bias in embeddings.

The origin story of biases in word embeddings is a fascinating and metaphorically-resonant one that I highly recommend reading more about. The abridged backstory is this: For a computer algorithm to understand language, words need to be converted into numbers. By using vast amounts of text as training data (e.g. Wikipedia), a machine learning algorithm can learn to create numerical vectors for each word based on the context in which it tends to appear (these vectors are the word embeddings!). What’s special about the embeddings is that because they are built using the context of words, they capture semantic meaning within their numbers (e.g. “woman” + “king” − “man” = “queen”, where the words in quotes are numeric word embeddings).
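As a minimal sketch of this arithmetic, assuming the gensim library and its pretrained GloVe vectors (my choice of example, not mentioned in the article):

```python
import gensim.downloader as api

# Pretrained 100-dimensional GloVe word embeddings (downloaded on first use).
vectors = api.load("glove-wiki-gigaword-100")

# "king" - "man" + "woman" ≈ ?
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```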

The bias within these learned embeddings stems from the fact that language and culture are intimately intertwined, which is of course reflected in the training data. Embeddings learn to reflect the gendered biases of society, such that “woman” + “doctor” − “man” = “nurse,” for example. Or “man” − “woman” = “computer programmer” − “homemaker,” implying that the embeddings learn programmers to be fundamentally male and homemakers female.

Here are examples of ways to correct for gendered bias in word embeddings according to the three categories above:

  1. Pre-process. One could imagine changing the data used to learn word embeddings (i.e. the Wikipedia text corpus) to remove some of its gender biases. For instance, one could replace gendered pronouns with neutral ones or duplicate every instance of a gendered noun with one of the opposite gender. Another approach is to remove the articles from the data that most influence the existence of gendered bias in the output.
  2. In-process. Gender-Neutral Global Vectors (GN-GloVe) trains the word embeddings with a modified “goal” that forces the sensitive attribute (e.g., gender) to be represented in a particular dimension of the embedding, which then can be ignored later on to debias the embeddings.
  3. Post-process. A post-process approach called hard debiasing also makes use of the idea of a gender direction. This direction is determined by a set of words indicative of gender (e.g., “he”, “she”). It also defines neutral words (e.g., occupations). It then removes the projection of the neutral words onto the gender direction by re-embedding them, which effectively makes the gender bias of the neutral words zero (a minimal sketch of this neutralize step follows this list).
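Here is a minimal numpy sketch of the neutralize step described in the last item; random vectors stand in for real embeddings, and the full hard-debiasing procedure also includes an additional “equalize” step for explicitly gendered word pairs.

```python
import numpy as np

def neutralize(word_vec: np.ndarray, gender_direction: np.ndarray) -> np.ndarray:
    """Remove the component of a gender-neutral word's embedding that lies along
    the gender direction, so its projection onto that direction becomes zero."""
    g = gender_direction / np.linalg.norm(gender_direction)
    return word_vec - np.dot(word_vec, g) * g

# Illustration with random stand-in vectors; in practice the gender direction is
# estimated from definitional pairs such as "he" - "she".
rng = np.random.default_rng(0)
gender_direction = rng.normal(size=100)
occupation_vec = rng.normal(size=100)
debiased = neutralize(occupation_vec, gender_direction)
print(np.dot(debiased, gender_direction))  # ~0 after neutralizing
```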

NLP is just one of many subfields in which algorithmic fairness has been explored. For further information on approaches to fair visual descriptions, adversarial learning, recommender systems, causal learning, and more, I recommend starting with this review.

Which method should we choose?

Deciding which method to use is heavily dependent on the situation. The data type, availability of sensitive attribute information, model type, desired definition of fairness, and societal context surrounding the task, among other factors, mean there is no standardized procedure for combating bias. Each intervention has its advantages and challenges.

For instance, pre-process methods can be used with any algorithm because they focus solely on the dataset used for training. Modifying the data has its drawbacks, however; it can reduce the explainability of the results and increase their uncertainty. Post-process methods can also be used with any algorithm, though the literature reports them as being less effective because they are applied so late in the pipeline. With post-process methods, biases such as disparate impact may be easier to remove fully, though this is not always the desired measure.

In particular, some could view it as discriminatory to deliberately reduce the accuracy of predictions for some individuals in order to compensate for others (this challenge ties into similar debates around affirmative action processes). Post-process methods can also be contentious because they require a decision-maker to intervene at the end of the algorithm’s pipeline. This is in contrast to in-process methods, which instead explicitly impose the required fairness/accuracy tradeoff in the algorithm’s objective. However, in-process methods can be more challenging to implement and less accessible than the other two categories of methods.

Conclusion

If you walk away having learned only two things from this essay, let it be these: algorithmic fairness is a complex, nuanced topic and it is vitally important we take action to address biases when designing, training, and implementing algorithms that will impact human lives. Here, I’ve tried to give an intuitive overview of where we stand in terms of technical approaches to addressing algorithmic fairness. However, this is only one piece of the puzzle.

For one, we need legal and policy-based regulations incentivizing industry and academia to evaluate the fairness of their algorithms and make changes where needed. In addition, education around computer science — or any technical subject, for that matter — should incorporate ethics as a central tenet of its instruction so as to teach ethical intuition from the start. We also must do better at diversifying the voices in this space, especially in leadership, and at listening to those voices. It cannot and should not be white heterosexual men leading the field of algorithmic fairness (nor the field as a whole).

And finally, it is important to realize these biases are not specific to algorithms; they reflect the biases of society, be it through the data collected or the algorithm’s creators. Advocating and taking action for an equal and inclusive world, no matter the domain, will have a ripple effect we desperately need.

Key takeaways (TL;DR):

  • Algorithms are infrastructure; just like infrastructure, they can be biased (intentionally or unintentionally) and cause harm.
  • We need to address this bias so as not to perpetuate biases into the new digital infrastructure we’re building.
  • “Fairness” as a concept is nuanced and complicated.
  • There are promising technical approaches to debiasing data and algorithms, though challenges still remain.
  • These technical solutions are only one piece of the puzzle; actively working towards societal equity and justice in all domains is required.

About the Author: Anna Sappington, MSc

Anna Sappington is a Marshall Scholar and MPhil student studying Genomic Medicine at the University of Cambridge. Her current research focuses on the intersections of computational biology, machine learning, and genomics. Anna previously interned at the National Institutes of Health, was named a 2019 Goldwater Scholar, and served as an AI Instructor for Inspirit AI. Prior to her MPhil, she received a BS in Computer Science and Molecular Biology from MIT and an MSc in Machine Learning from University College London. Anna has published several peer-reviewed articles (including a cell type atlas for the retina) and is a current research analyst at GDI.

[1] Sources from this literature include papers by Chouldechova and Roth (2018), Martínez-Plumed et al. (2019), and Pessach and Shmueli (2020)

[2] Note that the positive prediction rate for the ‘underprivileged’ subgroup is the numerator of this ratio; thus the higher the ratio, the higher the positive prediction rate for the ‘underprivileged’ subgroup. Disparate impact is related to the “80% rule” in disparate impact law, which “requires that the acceptance rate for any race, sex, or ethnic group be at least 80% of the rate for the group with the highest rate.”
