THE AMAZONIANS WILL NOT SAVE US FROM FML — FAILED MACHINE LEARNING
Reuters broke the story: sources shared that Amazon had built a hiring tool with significant bias against women. If a candidate’s resume contained the word “women’s” or the names of certain women’s colleges, the candidate was ranked lower. Here was a case of “FML” (Failed Machine Learning) that had the potential to harm women’s job prospects while violating anti-discrimination laws. Concerned Amazonians emerged from rivers of data to provide evidence of bias contamination and of efforts towards containment. In this case, the company recognized its failed machine learning and acted appropriately by halting the project. But the contamination extends beyond Amazon’s internal hiring AI tool. Authors like Cathy O’Neil, Virginia Eubanks, and Safiya Noble remind us in their books that weapons of math destruction are automating inequality through algorithms of oppression, which are increasingly used not just by hiring departments but also by law enforcement.
The Algorithmic Justice League has found gender and racial bias in another Amazon AI tool, one that is being sold to police. Self-regulation alone cannot contain the dangers of failed machine learning that masks harmful discrimination under the guise of machine neutrality.
Amazon’s choice to scrap its internal hiring tool provides some pre-emptive defense against gender-based discrimination lawsuits, and it also warns us of the vulnerabilities and limitations of artificial intelligence. Machine learning techniques, which have emerged as a leading industry approach to artificial intelligence, find patterns in data. Companies invest in finding patterns that give them a competitive advantage, like locating top talent or saving time sorting resumes.
However, data is not necessarily neutral. When machines learn from historic hiring practices, they can reinforce past inequalities instead of overcoming them. The sexist hiring managers and discriminatory recruitment methods of the tech industry are replaced by a faceless AI tool that underestimates women and denies them economic opportunity with data-driven precision. Unsurprisingly, data-driven targeting interests not only hiring managers; police and military agencies are also increasingly drawn to these tools, which can amplify discrimination and be readily weaponized.
AMAZON REKOGNITION: EXTERNAL AI TOOL HAS 16% GENDER ERROR GAP AND IS 11% WORSE ON PEOPLE OF COLOR
Amazon’s FML problem is not limited to an internal hiring tool. The Algorithmic Justice League uncovered gender and skin type bias in an external AI tool from the company called Amazon Rekognition. The technology can analyze faces in images or videos and attempt to identify a unique individual or glean demographic information like gender. To evaluate how well Amazon Rekognition determines the gender of a face, we used the same methodology that revealed substantial gender and skin type bias in AI services sold by Microsoft, IBM, and Face++ (Megvii). Like its peers, Amazon fell short.
We found that for the task of gender classification, Amazon Rekognition performed better on men’s faces than on women’s faces. The error rate on men’s faces was less than 1%; the error rate on women’s faces was 17%. Ironically, this gap of roughly 16 percentage points mirrors the gender pay gap in the United States, which the Pew Research Center estimates at 18%.
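An error gap like this one is simply the difference between error rates computed separately for each group, rather than a single accuracy figure for everyone. The Python sketch below illustrates that arithmetic. The record format, the function names, and every count in it are hypothetical, invented only so that the rates resemble those reported above; they are not the study’s data or its evaluation code.

```python
from collections import Counter

# Hypothetical record format: (group, true_label, predicted_label).
Record = tuple[str, str, str]


def error_rates_by_group(records: list[Record]) -> dict[str, float]:
    """Return the fraction of misclassified faces within each group."""
    totals: Counter[str] = Counter()
    errors: Counter[str] = Counter()
    for group, truth, predicted in records:
        totals[group] += 1
        if predicted != truth:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def error_gap_points(rates: dict[str, float], worse: str, better: str) -> float:
    """Difference between two groups' error rates, in percentage points."""
    return (rates[worse] - rates[better]) * 100


# Invented counts for illustration only: 1,000 faces per group.
records: list[Record] = (
    [("men", "male", "male")] * 992
    + [("men", "male", "female")] * 8
    + [("women", "female", "female")] * 830
    + [("women", "female", "male")] * 170
)

rates = error_rates_by_group(records)
gap = error_gap_points(rates, "women", "men")
print(f"men: {rates['men']:.1%}, women: {rates['women']:.1%}, gap: {gap:.1f} points")
# prints: men: 0.8%, women: 17.0%, gap: 16.2 points
```

Swapping the group key, for example to lighter and darker skin types, gives the same kind of comparison for skin type. A real audit would also need a benchmark that is balanced across the groups being compared, so that each group’s error rate rests on enough faces.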
An ACLU investigation revealed that the company sells Rekognition to law enforcement agencies across the United States. Given the long history of racial bias in policing and the use of skin color as a racial marker, we also evaluated performance on skin type.
We found that Amazon Rekognition performed better on lighter-skinned faces than on darker-skinned faces, with an 11% difference in error rate, exhibiting vulnerabilities that could disproportionately impact communities of color. We sent the results in a letter to Jeff Bezos and received no response.
Amazon continues selling Rekognition to police departments, even when confronted with these error rates and with petitions and letters from over 150,000 individuals, nearly 70 organizations, 19 concerned shareholders, and over 400 Amazon employees, who have raised their voices and continue to urge the company to stop equipping law enforcement with facial analysis technology. When ACLU Northern California showed that Rekognition erroneously identified 28 members of Congress as criminals, the company dismissed the findings on technicalities, sidestepping deeper questions about the societal harms posed by the technology.
AMAZON HAS NO EXCUSE TO ACT WITH NEGLIGENCE
Choosing to sell AI systems that can easily be abused regardless of their accuracy, and that have also been shown to be technically immature, is consequential and irresponsible.
Both accurate and inaccurate use of facial analysis technology to identify a specific individual (facial recognition) or to assess an attribute of a person (gender classification or ethnic classification) can lead to violations of civil liberties.
Inaccuracies in facial recognition technology can result in an innocent person being misidentified as a criminal and subjected to unwarranted police scrutiny. This is not a hypothetical situation. Big Brother Watch UK released the Face-Off report, highlighting false positive match rates of over 90% for facial recognition technology deployed by the Metropolitan Police. According to the same report, Scotland Yard’s system matched two innocent women with men. During the summer, the UK press shared the story of a young black man who was misidentified by facial recognition technology and humiliated in public. Big Brother Watch is now pursuing legal action against the lawless use of facial recognition in the UK.
Even if these tools reach some accuracy thresholds, they can still be abused and enlisted to create a camera-ready surveillance state. Facial analysis technology can be developed not only to recognize an individual’s unique biometric signature but also to learn soft biometrics like age and gender. Facial analysis technology that can somewhat accurately determine demographic or phenotypic attributes can be used to profile individuals, leaving certain groups more vulnerable to unjustified stops.
An Intercept investigation reported that IBM used secret surveillance footage from the NYPD and equipped the law enforcement agency with tools to search for people in video by hair color, skin tone, and facial hair. Such capabilities raise concerns about the automation of racial profiling by police. Calls to halt or regulate facial recognition technology need to contend with a broader set of facial analysis capabilities that go beyond identifying unique individuals.
UNLIKE PEERS, AMAZON HAS NOT SHOWN PUBLIC ACTIONS TO ADDRESS AI BIAS AND LACKS EXTERNAL AI ETHICS PRINCIPLES
Finding substantial gender and skin type bias in Amazon Rekognition was not surprising, as the issue is now publicly recognized by the industry. But we still uncovered a few surprises. In exploratory tests, Amazon Rekognition failed to correctly classify the face of Oprah Winfrey.
Amazon’s failure to share basic ethical principles around AI or any bias mitigation steps marks a significant departure from peers that outperformed the company on the gender bias gap. In our evaluation of Amazon Rekognition, we find that it performs worse than its direct competitors IBM and Microsoft did in the original Gender Shades study. Both IBM and Microsoft acknowledged that the bias issues revealed in that study were indicative of industry-wide challenges.
After working to make technical improvements, Microsoft announced a new system that, according to its own undisclosed evaluation data, performed 20 times better on one category. This also means that before the self-reported improvements, its system performed 20 times worse on that category, reminding us yet again that no company is immune from bias. Companies are also not immune to pressure, which we must continue to apply to demand ethical and responsible use of AI.
After Microsoft faced public and employee backlash over its $19.4 million contract with ICE this summer, Microsoft President Brad Smith called for government regulation to set guidelines around facial recognition technology. Internal pressure through employee activism at Google led the company not to renew its Project Maven contract and to drop its bid for Project JEDI, two Department of Defense contracts that seek to integrate commercial AI technology into the US military.
If Amazon secures the $10 billion Department of Defense JEDI contract, failed machine learning will be part of the deal, which necessitates safeguards and mechanisms for accountability. Like its internal hiring tool, Amazon Rekognition presents another case of failed machine learning with real-world consequences. The caution Amazon showed when developing AI tools for its internal hiring operations should extend to the external AI tools it is selling to police departments and is eager to develop for military applications.
The United States still has no federal laws governing the use of facial analysis technology, making it easy for companies like Amazon to sell questionable AI products. Given what we know, there should be a moratorium on the use of facial analysis technology for policing. Communities, not companies, should determine whether and how this technology is used by law enforcement. We don’t have to start from scratch. The Community Control Over Police Surveillance (CCOPS) model provides a community-centric framework that can help curb the proliferation of dangerous and failed machine learning packaged as advanced artificial intelligence.
Joy Buolamwini is the founder of the Algorithmic Justice League, which uses art and research to illuminate the social implications of artificial intelligence. Her MIT thesis, Gender Shades, uncovered the largest gender and phenotypic disparities in commercially sold AI products. She is a Rhodes Scholar, a Fulbright Fellow, and an MIT Technology Review 35 Under 35 honoree who holds three academic degrees.
She is pursuing a PhD focused on participatory AI at the MIT Media Lab’s Center for Civic Media.