How You Can Accidentally Make Discriminatory AI

Reid Blackman, Ph.D.
Published in Product AI · Apr 13, 2021 · 3 min read

Amazon receives hundreds of thousands of resumes a year. At some point, it’s just not possible to have every one of them read by a person. To that end, Amazon decided to create an ML model that would “read” the resumes and determine which individuals are interview-worthy and which are not.

Just as with all ML development, Amazon needed to provide their algorithm with examples to learn from, aka training data. In this case, they thought, 'We should train our algorithm to make the kinds of decisions our hiring managers have been making for the past several years.' So they took the resumes received over those years, labelled each one as either 'led to an interview' or 'did not lead to an interview', and from all that data the algorithm "learned" what an "interview-worthy" resume looks like.
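To make the setup concrete, here is a minimal sketch of that kind of training loop. This is not Amazon's actual system; the resumes, labels, and the choice of scikit-learn are invented purely for illustration.

```python
# Minimal sketch (not Amazon's actual system): a resume screener trained
# on historical hiring decisions. All data here is an invented placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Historical resumes, labelled by what hiring managers actually did:
# 1 = led to an interview, 0 = did not lead to an interview.
resumes = [
    "Software engineer, captain of chess club, executed product launches",
    "Data analyst, NCAA Women's Basketball, project coordination",
    "Backend developer, led migration to microservices",
    "QA tester, women's coding society organizer",
]
led_to_interview = [1, 0, 1, 0]  # whatever the humans decided in the past

# The model learns whatever patterns separate the two classes --
# including any bias baked into those historical decisions.
screener = make_pipeline(TfidfVectorizer(), LogisticRegression())
screener.fit(resumes, led_to_interview)

# Scoring a new resume reproduces the learned (possibly biased) pattern.
print(screener.predict(["Team captain, executed three major releases"]))
```

The point of the sketch is simply that the model never sees an instruction like "prefer men"; it only sees past decisions, and it faithfully reproduces whatever pattern those decisions contain.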

It turned out that one of the patterns the algorithm learned was something like, "We don't hire women here". As a result, it started red-flagging resumes it determined likely belonged to women. For instance, if a resume listed "NCAA Women's Basketball": red flag, interview denied.

Why was this pattern in the data? For a variety of reasons, likely including: biased or discriminatory hiring managers, a history of women being discouraged from applying to jobs in STEM in the first place, men exaggerating on their resumes more than women, and so on. Notice that the engineers, the data scientists, and the product manager didn't consciously manifest any biased attitudes towards women. The bias crept in because they unintentionally trained on a data set that reflected a history of favoring men over women.

You might think, 'This is an easy problem to solve. Just tell the algorithm to ignore words like "women" and "woman".' In fact, the team did just that. But the algorithm discovered other patterns, e.g., "We tend to interview people whose resumes include words like 'execute' and 'captain'." As it turns out, those words are used more often by men than by women. Using the word "execute" turned out to be a proxy for "man", and so in favoring that word when evaluating resumes, the model favored men. This is not easy to stop: lots of variables correlate with demographic data (e.g., zip codes in the United States correlate with race).
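A quick way to see the proxy problem is to check whether the "neutral" features that remain after scrubbing the obvious words can still predict gender. The sketch below uses entirely synthetic data and made-up word frequencies; it only illustrates the mechanism, not Amazon's data.

```python
# Sketch of the proxy problem, on invented data: even after deleting the
# explicit words "woman"/"women", the remaining word features still let a
# model recover the applicant's gender, so bias can flow through them.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
gender = rng.integers(0, 2, n)          # 0 = woman, 1 = man (synthetic)

# Word-usage features with no explicit gender terms, but whose frequencies
# differ by group in this toy data (e.g., "execute", "captain").
uses_execute = rng.binomial(1, np.where(gender == 1, 0.6, 0.3))
uses_captain = rng.binomial(1, np.where(gender == 1, 0.5, 0.2))
X = np.column_stack([uses_execute, uses_captain])

# If gender is predictable well above 50% from the remaining features,
# those features are proxies -- removing the obvious words wasn't enough.
scores = cross_val_score(LogisticRegression(), X, gender, cv=5)
print(f"Gender recoverable from 'neutral' words: {scores.mean():.0%} accuracy")
```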

After two years of trying and failing to de-bias their ML model, and to their great credit, Amazon halted the project.

While de-biasing ML models is ultimately an organizational goal, the first line of defense is the engineers, data scientists, and product managers who create these products. It will become increasingly important that they are aware of the issue and capable of executing bias-mitigation strategies.
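As one example of what that first line of defense can look like, here is a simple pre-ship check: compare the model's selection rates across groups and flag large gaps, in the spirit of the "four-fifths rule" used in US employment-discrimination analysis. The predictions and group labels below are hypothetical, and this is a sketch of one basic check, not a complete bias-mitigation strategy.

```python
# A simple fairness smoke test on hypothetical model output:
# compare interview-selection rates by group and warn on large gaps.
def selection_rate(predictions, group, value):
    picked = [p for p, g in zip(predictions, group) if g == value]
    return sum(picked) / len(picked)

predictions = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]   # 1 = model says "interview"
group       = ["m", "f", "m", "m", "f", "m", "f", "f", "m", "f"]

rate_m = selection_rate(predictions, group, "m")
rate_f = selection_rate(predictions, group, "f")
ratio = min(rate_m, rate_f) / max(rate_m, rate_f)

print(f"selection rate (m): {rate_m:.0%}, (f): {rate_f:.0%}, ratio: {ratio:.2f}")
if ratio < 0.8:  # the four-fifths heuristic
    print("Warning: possible disparate impact -- investigate before shipping.")
```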

Reid Blackman, Ph.D.

Philosophy professor turned (business+tech) ethics consultant