Why Do Machine Learning Algorithms Exhibit Bias?

Machine learning algorithms use data to build analytical models that predict outcomes based on previous data. Some applications of machine learning include algorithms that predict recidivism or assess hireability. Ideally these algorithms would be free of human biases such as gender or racial bias, but unfortunately this is not the case. Whether a machine learning algorithm is fair depends on many things, including whether the data is correct and up-to-date, whether the selection of data is fair and representative, and whether the data reflects historical or current bias.

Machine learning algorithms that use incorrect or outdated data have the potential to exhibit bias. The White House report Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights explains how this could happen with the example of an algorithm that calculates the fastest route between two points:

In the “fastest route” example, this [bias] could occur if, for instance, the algorithmic system does not update bus or train schedules regularly. Even if the system works perfectly in other respects, the resulting directions could again discourage use of public transportation and disadvantage those who have no viable alternatives, such as many lower-income commuters and residents. (White House Report p. 8)

Moritz Hardt does not address incorrect or outdated data in How big data is unfair, but in Big Data’s Disparate Impact, Solon Barocas and Andrew D. Selbst note that:

The individual records that a company maintains about a person might have serious mistakes, the records of the entire protected class of which this person is a member might also have similar mistakes at a higher rate than other groups, and the entire set of records may fail to reflect members of protected classes in accurate proportion to others. (Barocas & Selbst p. 684)

The selection of data also has a huge bearing on the fairness of an algorithm. The White House report points out, for example, that if you were to collect data only from iPhone users, an algorithm trained on that data would only benefit people who could afford an iPhone. This same concern was expressed by Solon Barocas and Andrew D. Selbst. They gave the real-life example of Street Bump, an app that detects potholes on the road and notifies the city accordingly. They went on to explain:

In particular, systematic differences in smartphone ownership will very likely result in the underreporting of road problems in the poorer communities where protected groups disproportionately congregate. If the city were to rely on this data to determine where it should direct its resources, it would only further underserve these communities. (Barocas & Selbst p. 685)

Ideally we would select data from sources that lead to fair and equal algorithms, but that is not always practical or possible. As Moritz Hardt points out in How big data is unfair, minority groups by definition contribute less data than the majority group. This means that algorithms will often suit the majority and be less accurate for minority groups. One example of this sort of bias appeared in the “Nymwars”, where algorithms were tasked with determining whether a given name was fake. The algorithms were most likely trained on data consisting mostly of typical white names, so when unique and cultural names were run through them, those names were more likely to be flagged as fake. Examples like this show that differences in sample size cannot be ignored when attempting to create an unbiased algorithm, as the sketch below illustrates.
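To make Hardt’s sample-size point concrete, here is a minimal sketch of my own (not taken from any of the cited sources), assuming scikit-learn is available; the synthetic data, group sizes, and model choice are all invented for illustration. Two groups follow different patterns, the majority supplies 95% of the training data, and a single shared model ends up accurate for the majority while misclassifying the minority:

```python
# A minimal sketch of the sample-size effect; all data here is synthetic
# and chosen purely to illustrate the point.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(size, flip):
    """Synthetic group where the label depends on one feature; the
    relationship is reversed for the minority group (flip=True)."""
    x = rng.normal(size=(size, 1))
    y = (x[:, 0] > 0).astype(int)
    return x, (1 - y) if flip else y

# 950 majority examples versus 50 minority examples
x_maj, y_maj = make_group(950, flip=False)
x_min, y_min = make_group(50, flip=True)

# One shared model is fit to everyone at once
model = LogisticRegression().fit(
    np.vstack([x_maj, x_min]), np.concatenate([y_maj, y_min]))

print("majority accuracy:", model.score(x_maj, y_maj))  # close to 1.0
print("minority accuracy:", model.score(x_min, y_min))  # close to 0.0
```

Because the majority group dominates the training set, the model’s errors pile up almost entirely on the minority group, which is exactly the pattern Hardt describes.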

We should also be cautious of historical and current bias when selecting data. Practices like redlining, which was used to segregate neighborhoods, and past hiring records in which certain groups were underrepresented, embed bias in the data itself. Both the White House report and the Barocas and Selbst article point out that if an algorithm were trained on past employment data, it would mostly select white men, because the algorithm would see them as better candidates for “fitting into workplace culture”. The Barocas and Selbst article also points out an example of current bias:

Professor Latanya Sweeney discovered in a study that found that Google queries for black-sounding names were more likely to return contextual (i.e., key-word triggered) advertisements for arrest records than those for white-sounding names. (Barocas & Selbst p. 682-683)

Bias has corrupted many different algorithms in many ways. One example of a corrupted algorithm is Microsoft’s Twitter chatbot “Tay”. Tay started to post racist and anti-Semitic tweets after mimicking some of the users who interacted with it. This is an example of poor foresight from the developers and poor data selection. By using those users’ comments as its training data, “Tay” learned from a small selection of users, so its tweet vocabulary reflected the thoughts of a few and not the many. “Tay” is an example of what can happen with bad data selection.

The recidivism algorithm COMPAS is another example of an algorithm exhibiting bias. Among defendants who did not actually re-offend, COMPAS categorized black offenders as high risk at roughly twice the rate of white offenders. This bias may stem from both historical and current biases present in the data, as there is, and long has been, a disparity between white and black arrest rates. COMPAS could also be using data such as zip code as a proxy for race. Either way, COMPAS is another example of how bias can corrupt an algorithm.
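To show how a proxy can smuggle a protected attribute back into a model, here is a small hypothetical sketch; the feature names and numbers are made up, and this is not how COMPAS actually works. Race is never given to the model, but a neighborhood indicator that correlates with race lets the risk scores differ by race anyway:

```python
# Hypothetical sketch of a proxy variable; not COMPAS's actual model or data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000

race = rng.integers(0, 2, n)              # protected attribute, NOT a feature
# A neighborhood indicator (think zip code) that matches race 90% of the time
neighborhood = np.where(rng.random(n) < 0.9, race, 1 - race)
# Historical outcomes that were themselves biased against group 1
rearrested = rng.binomial(1, 0.3 + 0.2 * race)

# The model only ever sees the neighborhood feature
model = LogisticRegression().fit(neighborhood.reshape(-1, 1), rearrested)
risk = model.predict_proba(neighborhood.reshape(-1, 1))[:, 1]

# Predicted risk still differs by race, because the neighborhood feature
# carries nearly the same information as the attribute that was removed
print("mean predicted risk, group 0:", round(risk[race == 0].mean(), 3))
print("mean predicted risk, group 1:", round(risk[race == 1].mean(), 3))
```

Dropping the protected attribute is therefore not enough on its own; correlated features have to be examined as well.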

Another example of bias in an algorithm comes from Winterlight Labs, a Toronto-based startup. Their algorithm analyzes speech to screen for neurological diseases such as Alzheimer’s disease, Parkinson’s, and multiple sclerosis. The problem was that the algorithm only worked well for English speakers with a particular Canadian dialect. The bias comes from a biased selection of data: if the algorithm only learned from speakers of a certain language and dialect, the only people who would benefit from it are people who speak like those in the training data. This puts many groups at a major disadvantage and shows how important unbiased training data is.

Yet another example of bias in a machine learning algorithm is the algorithm used to judge an international beauty contest in 2016. The algorithm, called “Beauty.AI”, came to associate beauty with skin tone and seemingly picked winners on the basis of race. This disparity most likely stems from the selection of data the algorithm used to measure beauty, as well as from bias in that data itself. If the algorithm learned from groups who prefer lighter skin tones, it is no surprise that it would exhibit the same preference. The flaws of this algorithm further show the importance of fair data selection.

One more example of bias in machine learning is Amazon’s facial recognition software “Rekognition”. This software incorrectly matched people of color to criminal mugshots at a far greater rate than their white counterparts. Just as with COMPAS, the data selection and the historical bias in the underlying data are most likely the reason for this disparity.

Georgia State’s machine learning algorithm, the Graduation and Progression Success (GPS) Advising program, was praised in the White House report as an example of big data being used to drive student success, but that success may come with hidden consequences. Even though the algorithm has helped raise graduation rates, the data it uses may act as proxies for race and gender. The algorithm takes into account which classes a student has previously taken, and because people of color are often given fewer resources to succeed than whites, for example through underfunded school districts, the algorithm could disproportionately steer people of color toward easier and less financially rewarding classes. This would lead to a disparity in jobs between races and further exhibit bias. The GPS program seems to be designed with good intentions, but bias like this cannot be ignored.

Bias is hard to avoid when creating machine learning algorithms, but there have been cases where algorithms have actually helped remove bias from decisions normally made by humans. In a 2016 MIT News article, Larry Hardesty describes how a group from MIT and a group from Georgia Tech created algorithms that use statistical methods able to tolerate corrupted and biased data. The algorithms use robust statistics techniques, such as estimating means and identifying outliers, to resist data corruption. Hopefully more algorithms like these will be able to limit the corruption found in data and lead to fairer algorithms in the future.
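The code below is a toy one-dimensional illustration of that robust-statistics idea, not the MIT or Georgia Tech algorithms themselves (which work in much higher dimensions): a plain mean is dragged far from the truth by a small fraction of corrupted values, while the median, or a mean taken after discarding outliers, stays close.

```python
# Toy illustration of robust estimation; the numbers are invented and this
# is not the algorithm described in the MIT News article.
import numpy as np

rng = np.random.default_rng(42)

clean = rng.normal(loc=5.0, scale=1.0, size=950)   # true mean is 5.0
corrupted = np.full(50, 100.0)                     # 5% adversarial junk
data = np.concatenate([clean, corrupted])

plain_mean = data.mean()                           # pulled toward 100
robust_median = np.median(data)                    # barely affected

# Simple outlier rejection: drop points far from the median, as measured
# by the median absolute deviation (MAD), then average what remains
mad = np.median(np.abs(data - robust_median))
kept = data[np.abs(data - robust_median) < 3 * mad]
trimmed_mean = kept.mean()

print(f"plain mean:   {plain_mean:.2f}")   # roughly 9.75
print(f"median:       {robust_median:.2f}")  # close to 5
print(f"trimmed mean: {trimmed_mean:.2f}")   # close to 5
```

Even though only 5% of the values are corrupted, the plain mean lands near 9.75 instead of 5, while the two robust estimates stay near the true value.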

The takeaway from all of this is to be aware of bias in data and of how biased data leads to biased algorithms. Since machine learning is the way of the future, we need to make an effort to keep our biases out of that future.