Lady Justice holding a very simple algorithm — if x < y then smite_with_righteousness_justice [Modified from]

Recognizing Discrimination in Algorithms

We shape our tools and thereafter our tools shape us — Marshall McLuhan

Sebastien Dery

--

#tldr Any algorithm can, and often will, reproduce the biases inherent in the data it’s trained on. The problem takes many forms and is open to intentional manipulation. Hiding our decision processes inside black boxes clearly won’t do. The question of “What can we do?” is not an easy one and will require the voices of many stakeholders. Here are a few eye-openers, and questions to ask yourself and your colleagues, to help you on your journey.

Machine Learning and algorithms have become increasingly accessible to anyone with time and motivation on their hands, thanks to an open and democratized computer science community. In my last post we discussed the implications of using those tools blindly in the context of human lives. We saw that widespread disregard for an adequate understanding of the causal mechanisms behind data leads to consequential biases and discrimination in many aspects of civic life (job placement, justice, healthcare, etc.). This post focuses on equipping you to recognize some of these instances, and offers ideas to steer the conversation in your social circle toward constructive reflection.

Different Types of Evil

Perception is linked to expectation. Students tend to score lower on IQ tests when teachers expect them to perform poorly; medical treatment can elicit placebo effects that influence pain; expectation of a specific visual cue facilitates perception of that object but hinders perception of objects from a different category. It’s an evolutionary curse: it allows efficient generalization of the world around us but simultaneously creates tremendous blind spots in various situations.

As it is, bias and discrimination come in many forms, sometimes intentional and sometimes not, and only the most common are self-evident. If we are not aware of their different shapes and sizes, we risk not recognizing them for what they are. This section summarizes a few manifestations of discrimination, along with additional concerns to look out for in your machine learning applications. It also aims to provide some points of discussion to steer the general conversation toward constructive ground.

Illustration of some forms of discrimination. Each circle is an individual drawn from a population. The color encodes whether the individual belongs to a protected class (i.e. a property of individuals that we deem should not be used to discriminate, such as gender, race, or religious belief).
  1. Explicit discrimination: Members of a protected class are explicitly given a worse outcome than non-members, e.g. through an explicit threshold on the protected attribute.
  2. Redlining: “The practice of arbitrarily denying or limiting financial services to specific neighborhoods, generally because its residents are people of color or are poor.”
  3. Disproportionate cut: A variant of redlining where the protected class need not be a majority of the redlined population; its share need only be disproportionate compared to the population as a whole.
  4. Redundant encoding: A protected class can be encoded in non-protected attributes, so removing the original variable (e.g. sex or race) does not prevent an algorithm from discriminating against it (see the sketch after this list).
  5. Self-fulfilling prophecy: Deliberately choosing “bad apple” members of a population to build a negative “track record” that justifies discriminatory actions (e.g. labeling the whole population as bad).
  6. Reverse tokenism: In the case of banking, the idea of denying a highly creditworthy person a loan so the bank can refute charges of discrimination against specific classes.
  7. Historical bias: Data is not constructed in a vacuum; if not carefully accounted for, it will propagate the biases inherent in the society it originates from.
  8. Acquisition bias: Most datasets come to life out of a messy process. When the acquisition methodology is forgotten and lost along the way, we are left with a limited understanding of statistical concerns such as randomized sampling or proper population coverage.
  9. Resource imbalance: Individuals with more resources can better prepare for certain social processes. This is particularly self-evident and poignant in legal cases, where the quality and number of defense lawyers are a determining factor in the outcome.
  10. External circumstances: Factors in your life that you had nothing to do with but that statistically predispose you to a certain outcome (e.g. coming from a bad neighborhood).
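
To make the “redundant encoding” item concrete, here is one way to probe your own data for it: train a simple model to predict the protected attribute from the remaining features. If that probe performs well, dropping the protected column buys you nothing. This is a minimal sketch; the DataFrame and the “protected” column name are hypothetical placeholders for your own data.

```python
# Sketch: probing for redundant encoding (item 4 above).
# Assumes a pandas DataFrame `df` with numeric feature columns and a
# binary protected attribute in a column named "protected"; both are
# hypothetical, substitute your own names.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def redundant_encoding_score(df: pd.DataFrame, protected: str = "protected") -> float:
    """AUC of predicting the protected attribute from all other features.

    Around 0.5: the remaining features carry little information about
    the protected class. Near 1.0: the protected class is redundantly
    encoded, and removing the column does not hide it from a model.
    """
    X = df.drop(columns=[protected])
    y = df[protected]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1])
```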

What can you do?

During one of his presentations, Abe Gong proposed a series of questions (to which I’ve added a few of mine) to ask your peers as a way to steer the conversation forward and raise awareness. These are suggestions, of course, and I would love to hear your own thoughts in the comment section.

1. Are you measuring what you think you’re measuring?

This is the biggest point of failure in most Machine Learning applications, and one that is almost by design. You’ll find the saying in most introductory classes: “ML is about finding patterns based upon the given relationship between inputs and outputs”. It’s a seductively simple concept, but pursuing an understanding of any real-world phenomenon is harder than it looks (e.g. spurious correlations). There are all sorts of things we cannot readily measure about a person (e.g. capacity for self-control or delayed gratification), even less so with access only to their online profile and activity.

To obtain some measure of accuracy, practitioners are invited, if not encouraged, to use any data available to construct a discriminative feature space. Anything is up for grabs, but in that process we rarely reach a level of understanding that lets us conclusively say anything about anything. The Machine is meant to do the learning, not us.

If it’s ever to really be useful to society (and not only the market that leverages it), we need to:

  1. Use caution when slapping labels on sets of features that merely correlate well with the desired outcome, especially if they seem to imply causality (see the sketch below);
  2. Base our decision-making on those properties for which a sufficiently detailed causal mechanism is known, and deemed ethically neutral.
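
On the first point, it is worth seeing just how cheap a “good correlation” is. The sketch below uses purely synthetic data in which every feature is noise by construction; scanning enough candidates still turns up one that correlates respectably with the outcome, and that is the number you would be tempted to slap a label on.

```python
# Sketch: why "correlates well with the outcome" is weak evidence.
# All data here is noise by construction, yet the best of many
# candidate features still shows a respectable-looking correlation.
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 100, 500

X = rng.normal(size=(n_samples, n_features))  # noise "features"
y = rng.normal(size=n_samples)                # noise "outcome"

# Correlation of each feature with the outcome.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
best = np.argmax(np.abs(corrs))
print(f"feature {best} correlates at r = {corrs[best]:+.2f} with pure noise")
# With 500 tries at 100 samples, |r| around 0.3 is routine: a value
# that would look meaningful if you only ever saw the winning feature.
```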

2. Are the statistics solid?

Take a leaf out of NASA’s IV&V Program: if the statistics matter (i.e. most of the time), it’s crucial to have someone check your work, repeatedly, and ask the right questions. Assumptions behind models are rarely articulated, let alone defended. Is the technique robust to some departures from the model assumptions? What reason is there to believe that the model assumptions are true for the situation being studied?
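
One cheap sanity check along these lines: re-estimate whatever statistic your conclusion rests on under resampling, and see whether the conclusion survives. A minimal percentile-bootstrap sketch follows; it is no substitute for articulating and defending your assumptions, and `estimate` and the sample data are placeholders.

```python
# Sketch: a cheap robustness probe. Recompute a statistic on bootstrap
# resamples and check whether the conclusion holds across the interval.
import numpy as np

def bootstrap_ci(data: np.ndarray, estimate, n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0):
    """Percentile bootstrap confidence interval for estimate(data)."""
    rng = np.random.default_rng(seed)
    stats = np.array([
        estimate(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Toy usage: is the mean of a small, skewed sample really above zero?
sample = np.random.default_rng(1).exponential(1.0, size=50) - 0.8
print(bootstrap_ci(sample, np.mean))  # interval straddling 0 => hold your fire
```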

3. Did you investigate Statistical parity?

There is (to my humble knowledge) no off-the-shelf method or toolbox that will readily combat biases and fight discrimination off the battle scene of your new predictive model for you. One interesting approach, statistical parity, measures the difference between the probability that a random individual drawn from a protected class (e.g. female) is labeled 1 and the probability that a random individual from the complement class is. When that difference is small, the classifier is said to have “statistical parity”. This measure can help discern certain types of discrimination introduced within your model, but it is not robust to all the evils previously described (particularly those related to willful discrimination). When there is bias in the data, optimizing for accuracy favors encoding that bias. This often means that achieving fairness on a project forces a trade-off between high accuracy (on a biased dataset) and low statistical disparity (given a set of rules imposed by the fairness you desire).
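
The measure itself is only a few lines. A minimal sketch, assuming `y_pred` holds your model’s 0/1 decisions and `protected` flags membership in the protected class (both names are illustrative):

```python
# Sketch: statistical parity difference, as defined above.
import numpy as np

def statistical_parity_difference(y_pred: np.ndarray,
                                  protected: np.ndarray) -> float:
    """P(label = 1 | protected) - P(label = 1 | complement class).

    Values near 0 indicate statistical parity; a large magnitude means
    one group receives positive labels at a very different rate.
    """
    protected = protected.astype(bool)
    return y_pred[protected].mean() - y_pred[~protected].mean()

# Toy usage with made-up decisions and class membership:
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0])
group  = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0])
print(statistical_parity_difference(y_pred, group))
# -0.2: the protected class is labeled 1 at a 20-point lower rate.
```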

4. Have you considered Fairness Auditing?

Oh, did I just say a bad word? Nobody likes a tax audit, of course, so why the hell would I even suggest such a thing? For all the good reasons you can think of (a few here). Not every machine learning application needs to sit under the implacable microscope of justice, but most should go through an honest internal verification and validation. How does your system behave in cases of biased data? Not only does the answer inform you on an ethical level, it also serves your company’s interest to know and act on this information: knowledge is power.
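
An internal audit can start as humbly as breaking every headline metric down by group before someone outside does it for you. A minimal sketch, assuming a pandas DataFrame with hypothetical columns `group`, `y_true`, and `y_pred`:

```python
# Sketch: a per-group fairness audit table. Column names are
# hypothetical; adapt them to your own data.
import pandas as pd

def audit_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Per-group sample size, positive rate, accuracy, and FPR."""
    def metrics(g: pd.DataFrame) -> pd.Series:
        negatives = g[g["y_true"] == 0]
        return pd.Series({
            "n": len(g),
            "positive_rate": g["y_pred"].mean(),
            "accuracy": (g["y_true"] == g["y_pred"]).mean(),
            "false_positive_rate": (
                negatives["y_pred"].mean() if len(negatives) else float("nan")
            ),
        })
    return df.groupby("group").apply(metrics)

# A gap between rows of this table is the conversation starter: it
# shows where the model's mistakes and favors are concentrated.
```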

5. Are those changes in power healthy?

This question invites you to go beyond who the variables are and consider what the social equation of your system is. Is it tipping the balance toward a minority? Does your ranking implicitly favor large corporations over local businesses? Are those things positive for your local community or for society at large? These questions are not meant to lead you to some form of moral judgment or a single answer; in fact, a good rule of thumb is that if the answer is unanimously “yes”, some more thinking needs to be done. The goal is to engage as many individuals as possible and amass as much information as possible about what people think and want, toward forming a clear picture of the situation. Ethics and morality have changed throughout history.

6. How can we mitigate harms?

It’s great to see serious conversations happening around this topic, yet most of us will feel ill-equipped to recognize, let alone change, the insidious practices we inadvertently introduce into our work. Not everything needs to rest on the developer’s shoulders; this conversation goes beyond the scope of your computer and should happen at all levels of management. In light of the consequences you’ve unearthed, ask your colleagues what you can do as a practitioner. There is no one-size-fits-all solution yet; we should be asking for help.

In Summary

  • Bias and discrimination come in many forms
  • We need to encourage the highest level of ethics as professionals
  • Good Data Science requires due process regarding your data

A robot may not injure a human being or, through inaction, allow a human being to come to harm. Isaac Asimov proposed this as the first law of robotics. I do wonder how far we’ve come in that quest. If we’re to pursue and gain from increased automation, whether through humanoid robots or virtual agents, crucial aspects of responsibility, transparency, auditability, incorruptibility, and predictability are destined to be intricately tied to how we perform the Data Science of the future.

Reiterating a point from my previous post: “Criteria applied to humans performing social functions should be considered just as applicable in algorithms intended to replace human judgment” — Bostrom. Anything short of this risks facilitating and perpetuating a considerable number of our own vices in tools we so dearly want to believe will make everyone’s lives better.
