Let us end algorithmic discrimination

It is possible to regain human control over the decision-making power of predictive algorithms and end their discriminatory effects on underprivileged people.

Henrik Chulu
Techfestival 2018

--

Academic researchers and investigative reporters have shown in case after case that when organizations hand over decisions to predictive algorithms, it can lead to outcomes that are biased against gender, race, and class, reproducing inequalities already present in society.

To illustrate the potential real-world harms of predictive algorithms in practice, imagine yourself as an African American woman searching for a new job. As you type your keywords into Google, the search engine shows you a series of relevant job ads. But as Carnegie Mellon professor Anupam Datta and colleagues have shown, Google shows ads for prestigious job opportunities nearly six times as often to men as to women.

This is one subtle way that algorithmic advertising perpetuates an already existing gender divide in the job market. Now, imagine that you still land an interview for a job that you are fully qualified for. At the interview, your prospective employer asks about your criminal history. You are puzzled. He says that a Google search for your name brought up an ad linking to your arrest record.

As Harvard professor Latanya Sweeney has demonstrated in a study of web searches for more than 2,000 racially associated names, searches for African-American-sounding names were 25 percent more likely to be served with ads suggestive of an arrest record.

Even so, you explain that you have never been arrested. After this algorithmically generated misunderstanding has been cleared up, you get the job. Congratulations! Now you have the economic means to buy a house so you start looking for one. You use Facebook as a starting point and your searches are immediately met with relevant targeted advertising.

However, for some reason you see no ads for houses in the prestigious up-and-coming neighborhood that you intend to move into. This is likely because Facebook has let advertisers exclude users from seeing ads based on their “ethnic affinity”, a thinly veiled proxy for race and one of the tens of thousands of categories into which Facebook’s predictive algorithms sort its users.

This hypothetical account illustrates how advertising driven by predictive algorithms encodes stereotypes and biases, with real discriminatory consequences for end users. But algorithmically aided racial and gender discrimination is far from being a problem confined to online advertising.

Algorithmic decision-making is put to work in many areas that affect the lives of millions of people. Beyond advertising and the labor and housing markets, it is used in policing, the criminal justice system, the education system, and the insurance market. In all of these areas, decisions made with the help of predictive algorithms can have discriminatory and harmful impacts.

In her book Weapons of Math Destruction, mathematician Cathy O’Neil describes how these and other fields get distorted by algorithmic biases. As a catch-all term for this effect, she coins the phrase that gives the book its title. She defines weapons of math destruction (WMDs) by three main characteristics: opacity, damage, and scalability.

First, it is almost never transparent how the algorithms arrive at their predictions, making it nearly impossible to appeal or seek recourse against the unfair decisions they assist. Second, real harm is caused to real people, and this harm is reinforced in feedback loops in which the algorithm becomes even more biased with new input. Lastly, because an algorithm can be applied broadly across society, the damage scales to broad swaths of the population.

Predictive algorithms are biased because we are

The reason algorithms come to have social biases is not ill will on the part of their developers. It is an effect of how machine learning works at the most fundamental level.

The type of predictive algorithm most widely in use today, so-called supervised machine learning, works by recognizing patterns of relations in big data sets. But to do this, the algorithm has to be trained by feeding it massive amounts of historical training data. Using this data, the algorithm constructs a model of the world that is then used to infer similar patterns in new sets of data.
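To make this concrete, here is a minimal sketch of that workflow, assuming scikit-learn and a small, entirely hypothetical table of historical hiring decisions; the library, feature names, and numbers are illustrative assumptions, not taken from any real system.

```python
# A minimal sketch of supervised machine learning: train on historical
# data, then apply the learned model to new data. All data is hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical training data: each row is a past applicant described by
# [years_of_experience, education_level]; the label records whether
# that applicant was hired (1) or rejected (0).
X_train = np.array([[1, 2], [3, 3], [5, 1], [7, 3], [2, 1], [8, 2]])
y_train = np.array([0, 1, 0, 1, 0, 1])

# Training: the algorithm builds a model of the patterns in the historical data.
model = LogisticRegression().fit(X_train, y_train)

# Inference: the model is used to infer the same patterns in new data.
X_new = np.array([[4, 2], [6, 3]])
print(model.predict(X_new))        # predicted hire/reject decisions
print(model.predict_proba(X_new))  # predicted probabilities
```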

If this training data is biased by discrimination already taking place in the world it represents, that bias gets reproduced in the algorithm’s model. Whether it stems from conscious discrimination or from unconscious bias, the result is the same: the decision-making algorithm inherits and perpetuates the inequalities of the society in which it was created.

Even if the developers have done their best to rid the training data of sensitive categories like race, bias may show up in the model, because these categories can be inferred from other data. For example, in a highly segregated society, race can often be inferred from geographical data.
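A toy simulation can show how this happens. The sketch below assumes scikit-learn and NumPy and uses purely synthetic data in which a hypothetical postal code stands in for the sensitive attribute; it is an illustration, not a model of any real system.

```python
# A toy simulation, on synthetic data, of how a sensitive attribute can
# leak back in through a proxy: the model never sees the protected group,
# but in a segregated setting the postal code carries nearly the same
# information and reconstructs the historical bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Hypothetical protected attribute, never given to the model.
group = rng.integers(0, 2, size=n)

# In a highly segregated society, postal code correlates strongly with group.
postal_code = np.where(rng.random(n) < 0.9, group, 1 - group)

# Historical decisions biased against group 1: equally qualified applicants
# from group 1 were hired far less often.
qualified = rng.random(n) < 0.5
hired = qualified & ((group == 0) | (rng.random(n) < 0.4))

# Train only on seemingly neutral features: qualification and postal code.
X = np.column_stack([qualified.astype(int), postal_code])
model = LogisticRegression().fit(X, hired.astype(int))

# A strongly negative weight on postal_code shows the bias was inherited.
print(dict(zip(["qualified", "postal_code"], model.coef_[0])))
```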

“Garbage in, garbage out” is an old saying in computer science. When it comes to predictive algorithms, the same is true: bias in, bias out.

Black boxes inside of black boxes

Because these seemingly neutral algorithms are used to make decisions about the lives of real people, their impact can be unfair or even damaging if they contain hidden biases.

Living in a democratic society, we expect public institutions to be fair, transparent, and accountable. We expect decisions that affect our lives to be bound by the rule of law, making it possible to demand explanations, and to appeal and seek redress for decisions based on faulty premises.

However, more and more administrative decisions that affect everyday life are made using algorithms that are ‘black boxes’: systems where we can observe what goes in and what comes out, but not what happens in between.

The opacity of these algorithms is both organizational and technical.

Organizations will often resist giving the public insight into their inner workings unless compelled by law to provide open records on request. And even then, the developers of algorithms often treat them as trade secrets, making the way they work unavailable to the public.

On the technical side, many machine learning algorithms are so convoluted that even their developers have difficulties explaining how they actually work.

If the public has no way of gaining insight into the way algorithmic decisions are made, there is no way of pointing out the origins of their biases. In turn, this makes it impossible to hold organizations accountable for the unfair and unequal impact of their decisions.

Unless we take steps to improve the transparency of algorithmic decision-making, algorithms will continue to drive social inequality by reproducing the biases and discrimination already entrenched in society.

How to achieve equality before the algorithm

There are three main avenues available for countering discrimination by algorithms.

At the most basic level, the lack of diversity in the tech sector is one of the reasons that algorithmic bias gets to influence decisions. Most developers of these systems belong to groups that rarely experience gender or racial discrimination themselves. Increasing diversity in the tech sector will unearth some of the unconscious biases and assumptions that otherwise go unchecked in the development of technology.

By itself, however, increasing diversity in the tech sector is not enough to make sure that algorithmically assisted decisions are fair. Legal liabilities and sanctions are equally necessary.

While it is impossible to regulate algorithms so as to make them unbiased and fair, it is possible to counter the harms of the biased decisions they produce. This can be done by taking a legal approach called disparate impact theory. This legal concept aims at mitigating the effects of discrimination against certain groups, regardless of whether such discrimination is intended.
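As a rough illustration of how disparate impact can be measured in practice, the sketch below compares the rates of favorable outcomes between two groups; the four-fifths threshold comes from US employment guidance, and the outcome data is entirely hypothetical.

```python
# A rough sketch of one common way to measure disparate impact: compare
# the rate of favorable outcomes between a protected group and a reference
# group. A ratio below 0.8 (the "four-fifths rule" from US employment
# guidance) is a widely used warning sign. The outcome data is hypothetical.
def disparate_impact_ratio(protected_outcomes, reference_outcomes):
    """Ratio of favorable-outcome rates; 1 = favorable, 0 = unfavorable."""
    protected_rate = sum(protected_outcomes) / len(protected_outcomes)
    reference_rate = sum(reference_outcomes) / len(reference_outcomes)
    return protected_rate / reference_rate

# Hypothetical outcomes of an algorithmically assisted hiring decision.
protected_group = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]  # 30% favorable
reference_group = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 70% favorable

ratio = disparate_impact_ratio(protected_group, reference_group)
print(f"Disparate impact ratio: {ratio:.2f}")  # 0.43, well below 0.8
```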

Finally, and maybe most importantly, it is necessary to break open the black boxes and develop standards for algorithmic transparency in order to work effectively against their biases and resulting discrimination.

Eight steps toward algorithmic transparency

Robert Brauneis and Ellen P. Goodman are law professors at George Washington University and Rutgers University, respectively. In a 2017 paper, they outline eight kinds of information that developers must make available in order for the public to have proper insight into the predictive algorithms that affect their lives:

  • The predictive goals of the algorithm and the problem it is meant to solve.
  • The training data considered relevant to reach the predictive goals.
  • The training data excluded and the reasons for excluding it.
  • The actual predictions of the algorithm as opposed to its predictive goals.
  • The analytical techniques used to discover patterns in the data.
  • Other policy choices encoded in the algorithm besides data exclusion.
  • Validation studies or audits of the algorithm after implementation.
  • A plain language explanation of how the algorithm makes predictions.

Brauneis and Goodman’s eight criteria can be used as a step-by-step guide for algorithm developers to move toward a degree of transparency adequate for a democratic society.

Algorithms are deployed to make predictions as a means to solve a problem. As an algorithm is developed, a number of different types of data may be considered relevant, while readily available data may also be excluded, for example because of poor data quality or privacy risks. And after the algorithm has been trained, its actual predictions may differ from the original predictive goals. Disclosing these conditions of success and the way they are achieved allows the public to assess whether the algorithm lives up to its purpose.

In this process, only a limited number of techniques from statistical analysis are actually employed to find patterns in the data sets. And a key policy decision in this analysis is how to weight false negatives against false positives, that is, the choice of erring on the side of either missing real patterns or seeing patterns where there are none. The choices of technique and weighting are both relevant public insights.
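To show how such a weighting can be encoded directly in a model, here is a brief sketch using scikit-learn’s class_weight parameter on synthetic data; the library choice and the numbers are assumptions made for illustration.

```python
# A brief sketch, on synthetic data, of how the weighting of false negatives
# against false positives is a policy choice encoded in the model, here via
# scikit-learn's class_weight parameter.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

for weights in (None, {0: 1, 1: 5}):
    # Weighting the positive class five times as heavily tells the model
    # that missing a real case (false negative) is worse than a false alarm
    # (false positive), so the trade-off shifts accordingly.
    model = LogisticRegression(class_weight=weights).fit(X, y)
    tn, fp, fn, tp = confusion_matrix(y, model.predict(X)).ravel()
    print(f"class_weight={weights}: false positives={fp}, false negatives={fn}")
```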

Validation, the effort of making sure the algorithm works as intended, is important for discovering errors and biases in its predictions. Alternatively, audits can be performed by a third party. Making either, or preferably both, available will help give algorithmic decisions legitimacy.

Lastly, explaining in lay terms how an algorithm reaches its predictions is key to public insight. If it is too complicated to explain how the algorithm works, disclosing this fact is equally important so the public knows that the algorithm is a black box.
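One possible way to put the eight disclosures into practice, sketched below, is for a development team to keep them as a structured record alongside the code so they can be published as a single plain-language document. The fields mirror the eight criteria, but the record format and all example entries are hypothetical, not a standard proposed by Brauneis and Goodman.

```python
# A sketch of a structured disclosure record covering the eight criteria.
# All field values below are hypothetical examples.
from dataclasses import dataclass

@dataclass
class TransparencyDisclosure:
    predictive_goal: str                # the problem the algorithm is meant to solve
    training_data_included: list[str]   # data considered relevant
    training_data_excluded: list[str]   # excluded data and the reasons why
    actual_predictions: str             # what is actually predicted, vs. the goal
    analytical_techniques: list[str]    # techniques used to find patterns
    other_policy_choices: list[str]     # e.g. weighting of false positives/negatives
    validation_and_audits: list[str]    # validation studies or third-party audits
    plain_language_explanation: str     # how the algorithm makes predictions

disclosure = TransparencyDisclosure(
    predictive_goal="Rank buildings by likelihood of fire-code violations",
    training_data_included=["past inspection results", "building age", "permit history"],
    training_data_excluded=["tenant names (privacy)", "free-text complaints (poor quality)"],
    actual_predictions="A risk score per building, not a confirmed violation",
    analytical_techniques=["gradient-boosted decision trees"],
    other_policy_choices=["false negatives weighted more heavily than false positives"],
    validation_and_audits=["annual third-party audit comparing scores to outcomes"],
    plain_language_explanation="Buildings resembling those with past violations rank higher.",
)
print(disclosure.predictive_goal)
```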

Is open source code necessary for transparency?

The black-boxed character of many predictive algorithms contrasts starkly with the open nature of much else in the world of technology, such as the open standards of the internet and the world wide web and the free and open source software running on most of their infrastructure.

And while having open access to the source code of predictive algorithms would make public audits easier, according to Brauneis and Goodman this is neither necessary nor sufficient to render algorithmic decision-making transparent. As they show, documenting the various steps where bias can sneak in is a more effective way to begin getting disparate impacts under control and to start combating widespread algorithmic discrimination.

To work towards transparency, organizations that use predictive algorithms to aid decision-making should make the eight kinds of information openly available to the public, because even if researchers have access to the source code, they will not necessarily find the sources of possible discrimination.

In a world where the decision-making power of human beings is handed over to predictive algorithms, transparency about the choices made by their developers is required in order to hold organizations accountable. If we do not open the black boxes of predictive algorithms, then the organizations that employ them, and as a consequence most of society, become black boxes themselves.

--