Identifying & Understanding Algorithmic Bias
Introduction
With the Digital Revolution well under way, we are interacting with algorithms on a daily basis. These algorithms make everyday decisions for us such as which posts we see as we scroll through social media and which websites we browse when searching for information online. They also help us make significant decisions such as who is eligible for a bank loan, who gets parole, and who will get interviewed for a job at a company.
For both small and significant decisions, algorithms are ever-present, and for many people they are a cause for worry. After all, what if these algorithms, which are often kept behind closed doors at the companies that created them, are not treating people or information equitably? This question drives the worry that algorithms may be perpetuating existing divisions along lines such as gender, race, and political ideology. However, before considering the implications of algorithms in all of these areas, it is important to pinpoint exactly what “fair”, or unbiased, treatment would look like.
Defining Bias
There are currently two competing definitions of algorithmic bias when it comes to making predictions about people: equality across groups and equality across classifications. The first definition means that an algorithm’s error rate should be equal across different groups (e.g., the algorithm should misclassify women at the same rate as it misclassifies men). The second definition means that the algorithm’s prediction should have the same meaning across different groups (e.g., a risk score of 7 should correspond to the same risk level regardless of the individual’s race).
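To make the two definitions concrete, here is a minimal sketch of how each could be checked. The arrays, group labels, and the “predict positive if score ≥ 2” rule are all invented for illustration; they are not drawn from any real system.

```python
import numpy as np

def error_rate_by_group(y_true, y_pred, group):
    """Equality across groups: is the misclassification rate the same in every group?"""
    return {g: float(np.mean(y_pred[group == g] != y_true[group == g]))
            for g in np.unique(group)}

def outcome_rate_by_score(y_true, scores, group):
    """Equality across classifications: does a given score correspond to the
    same observed outcome rate in every group?"""
    return {(g, s): float(np.mean(y_true[(group == g) & (scores == s)]))
            for g in np.unique(group)
            for s in np.unique(scores)}

# Invented toy data: two groups, risk scores 1-3, outcomes 0/1,
# and a simple "predict positive if score >= 2" rule.
group  = np.array(["A"] * 6 + ["B"] * 6)
scores = np.array([1, 1, 2, 2, 3, 3] * 2)
y_true = np.array([0, 0, 0, 1, 1, 1,   0, 1, 1, 1, 1, 1])
y_pred = (scores >= 2).astype(int)

print(error_rate_by_group(y_true, y_pred, group))
print(outcome_rate_by_score(y_true, scores, group))
```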
Both of these definitions seem entirely logical and consistent with each other, but it is mathematically impossible to satisfy both at once when two populations (races, genders, etc.) have different base rates (source). In other words, if two groups already have different rates of some outcome (paying back a loan, committing a crime, getting hired, etc.) because of existing societal divisions, then the algorithm cannot satisfy both definitions of equality at the same time.
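One way to see the tension concretely is through the confusion-matrix identity relating a classifier’s false positive rate (FPR) to its base rate, positive predictive value (PPV), and false negative rate (FNR). In the sketch below the PPV and FNR values are invented and simply held equal for both groups (so scores “mean the same thing”), which forces the error rates to differ whenever the base rates differ.

```python
# FPR is fully determined by base rate p, PPV, and FNR:
#   FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)
def forced_fpr(base_rate, ppv, fnr):
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)

ppv, fnr = 0.7, 0.3  # held equal across groups: equality across classifications
print(forced_fpr(0.5, ppv, fnr))  # group with higher base rate -> FPR = 0.30
print(forced_fpr(0.3, ppv, fnr))  # group with lower base rate  -> FPR ~ 0.13
```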
In light of this mathematical fact, the more workable, long-term definition of bias against which systems should be tested is equality across classifications. This approach yields simpler systems whose outputs carry a consistent meaning, rather than a conditional meaning that depends on which group the system is making a prediction for.
Case Study: The COMPAS Parole Algorithm
One of the central cases that encapsulates the debate around algorithmic bias is the COMPAS parole algorithm. Using a series of inputs, none of which is race, COMPAS assigns defendants a risk score that estimates their probability of committing another crime if they are given parole. ProPublica found that the algorithm made more errors for African American men, placing more of them who did not go on to reoffend in the “high risk” category (thereby denying them parole) than it did Caucasian men. However, ProPublica also reported that for each risk score, an equal proportion of African Americans and Caucasians reoffended. Thus the algorithm was equal across classifications, but not equal across groups (source).
While ProPublica’s concerns are certainly valid, the COMPAS algorithm should not be altered to be equal across groups. If it were, a judge reading a COMPAS report would have to interpret it differently based on the race of the individual, since each score would now mean something different for different races. This runs against a basic tenet of equality: that people should be treated the same regardless of race.
However, something still needs to be done to address the fact that the algorithm makes more errors for one group than another. Rather than changing the algorithm, we should focus on changing the circumstances that cause its biases in the first place. For example, African Americans may have a higher measured rate of reoffending because predominantly African American areas are policed more heavily and some police officers may be biased against African Americans. Giving police officers different training or reorganizing how police are distributed across neighborhoods are measures that might help bring the base rate of reoffending for African Americans closer to that of Caucasians. This would mitigate the discrepancy in the error rate between the two races.
Case Study: Amazon’s Hiring Algorithm
Another modern example of computerized systems exhibiting bias while making critical decisions about human lives is in hiring. A couple of months ago, Reuters released a report revealing that Amazon had attempted to build an automated resume-ranking system to help fill its technical roles but ultimately abandoned it because the system did not treat women’s resumes fairly. It actively penalized resumes that included gendered words such as “women’s” and favored resumes containing words more frequently used by men.
Of course, Amazon was not actively trying to build a system that favored men over women; it happened because the model learned from historical hiring data dominated by men, and in technical roles that gender gap makes it impossible for the algorithm to assign scores that are both equal across classifications and equal across groups. Having recognized this, Amazon acted in the right manner by disbanding the project and using a reduced version of the algorithm to automate basic tasks that do not directly impact the recruiting process.
This case highlights another approach to resolving problems with using algorithms in decisions about people: reducing the algorithm’s role until it can be built in a way in which inequality across groups is negligible. In this case, any hiring algorithm for technical positions will not be equal across gender groups (i.e., it will score qualified women lower more often than it does qualified men) because it is learning from existing biases in hiring practices, as the toy sketch below illustrates. By significantly reducing the algorithm’s impact on actual human beings, hiring still adheres to the principle of equality across classifications while limiting the impact of existing divisions in our social structures.
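The sketch below illustrates only the mechanism, not Amazon’s actual system: it trains a simple logistic regression on synthetic resumes whose historical hiring labels are biased against a gendered word. The model dutifully learns a negative weight for that word, even though the word says nothing about skill.

```python
# Toy illustration (synthetic data, not Amazon's system) of a resume model
# learning to penalize a gendered word from historically biased hiring labels.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
resumes, hired = [], []
for _ in range(2000):
    mentions_womens = rng.random() < 0.5   # e.g. "women's chess club captain"
    skilled = rng.random() < 0.5           # actual qualification, independent of gender
    text = "python sql leadership"
    if mentions_womens:
        text += " womens"
    if skilled:
        text += " distributed systems"
    # Historically biased labels: equally skilled candidates were hired
    # less often when the resume contained the gendered word.
    p_hire = (0.7 if skilled else 0.2) - (0.15 if mentions_womens else 0.0)
    resumes.append(text)
    hired.append(int(rng.random() < p_hire))

vec = CountVectorizer()
X = vec.fit_transform(resumes)
model = LogisticRegression(max_iter=1000).fit(X, hired)

weights = dict(zip(vec.get_feature_names_out(), model.coef_[0]))
print("learned weight for 'womens':      %+.2f" % weights["womens"])       # negative
print("learned weight for 'distributed': %+.2f" % weights["distributed"])  # positive
```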
What Does This Mean for Automating Human Decisions?
The COMPAS and Amazon cases both demonstrate that we can keep the philosophical ideals of equality as our standard for judging whether a machine is biased, as long as we take measures to limit the algorithms’ impact on real people and work to address the social factors that lead to inequality across groups.
As a society, we need to realize that our algorithms are not born pure. They are always a reflection of the choices we have made, whether that is the data they are trained on, the operating assumptions their engineers made, or existing societal norms as a whole. When companies like Amazon or public entities such as prisons decide to incorporate algorithms into their decision-making processes, they need to do so with caution, because even if the outputs mean the same thing for all groups, they may nevertheless perpetuate divisions that should be eradicated. Thankfully, the technology is not yet at a point where these decisions can be completely automated, so as the technology grows, society still has time to grow with it.
Also posted at https://mdb.dev/ethics-in-technology/defining-algorithmic-bias/