Understanding unfair bias and product consequences in tech: Learning by doing

Aoife Spengeman
Wellcome Data
Jun 28, 2019

I joined Wellcome Data Labs as a user researcher in November 2018. Since then I have been learning with my team what it means to bring data science, ethics, social science, and user-centred design together to create usable and fair products.

For 2019, we set out on an experiment to embed ethical thinking into product development. From the start, our main approach was to learn by doing: to consider the ethical implications of our actions as we go, with the whole team involved. In this blog post I outline what my team and I have learned so far.

About our product: We are developing an online service to track and analyse the reach of research in the policy documents of major global organisations. It allows users to see where research has been cited in policy documents, and to analyse the contents of those documents. For more information, check out our open GitHub repo here.

What do we mean by bias?

When I first started thinking about ethics and machine learning, I noticed that the word ‘bias’ was everywhere. When ‘bias’ and ‘algorithm’ are used close together in a sentence, it instils in some people a sense of fear that the world will soon be taken over by human-hating robots. Others consider how the social injustices which affect them in the real world are likely to affect them digitally. For data scientists, it is often simply a technical term that implies an oversimplified model that leads to high error rates.

Bias is an over-used and under-defined word, and this is why I am specifically going to focus on unfair bias in this blog post.

What I mean by unfair bias is a disproportionate weight or inclination in favour of or against a certain characteristic, person or group, which leads to an unfair outcome. This can exist within or outside of the algorithm.

Learning #1: Identifying which biases are unfair is not straightforward

Our team began thinking about the ethics of our work by discussing steps of the machine learning process. I innocently expected to find evidence of unfair bias by breaking down the steps of the data science process and discussing the data inputs, training data, and algorithmic decisions.

Breaking down the steps of our algorithm

But this was just the starting point. I soon realised that all we ended up with was a long list of potentially unfair biases.

Here are some examples:

  • We knew that our training data was based on research publications from the year 2017 only — whether or not this translates into bias is something that needs directed algorithmic investigation.
  • In the same way, we were concerned that the training data might be disproportionately based on publications from the ‘Golden Triangle’ universities like Oxford and Cambridge — does this mean the algorithm is less accurate for other universities?
  • We discussed the impact of false negatives — in other words, the research citations we don’t pick up on — is there a trend that indicates an unfair bias in the characteristics of the citations we miss?

Where do you go from there? The next task was for our data scientists to do a round of tests and algorithmic review on a subset of suspected biases, which we will publish soon.
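To give a flavour of what those tests might involve, here is a minimal sketch (not our actual test code) of a subgroup comparison, assuming a labelled evaluation set with hypothetical column names:

```python
# A minimal sketch of a subgroup recall check, assuming a labelled evaluation
# set with hypothetical column names ("institution_group", "publication_year",
# "cited", "predicted_cited"). Our real tests may look quite different.
import pandas as pd

def recall_by_group(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Share of true citations the model finds, for each value of group_col."""
    found = df[df["cited"] & df["predicted_cited"]].groupby(group_col).size()
    actual = df[df["cited"]].groupby(group_col).size()
    return (found / actual).fillna(0.0)

evaluation = pd.read_csv("labelled_citations.csv")  # hypothetical evaluation set
print(recall_by_group(evaluation, "institution_group"))  # e.g. Golden Triangle vs others
print(recall_by_group(evaluation, "publication_year"))   # e.g. 2017 vs other years
```

A large gap in recall between groups would flag a suspected bias for closer investigation; on its own, it would not tell us whether that bias is unfair.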

As highlighted in Arvind Narayanan’s excellent tutorial on the 21 definitions of fairness, bias in algorithms is inevitable because there will always be trade-offs that result in biases. So even if we have algorithmic proof that a bias exists, how do we judge that it is unfair? This is the most difficult part. It depends on at least two interrelated things:

1) a consensus on what fairness is; and

2) a reasonable understanding of the social context and communities in which the algorithm will exist.

Additionally, the assumptions around which the algorithm is designed may be based on a range of very common cognitive and social biases. For example, have we, as a medical and science funding organisation, fallen victim to ingroup bias by choosing academic scientific publications as our training data? Are we overly optimistic in believing that our product is going to benefit people and make people’s lives easier?

From going through this experience, here’s what we have learned about machine learning and bias:

1. Unfair bias can be obvious, but overlooked

Some biases in algorithms are clearly unfair and the negative impact is obvious, though no less shocking for it. For example, Amazon Rekognition, a facial analysis tool, was found to have a 31.4% failure rate for darker-skinned females and a 0% failure rate for lighter-skinned males. Arguably this happened because the people who made it didn’t spend enough time considering what could cause harm. Testing for accuracy parity across gender and race is known to be important, yet it still often fails to be prioritised.
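As a rough illustration of what such a parity check involves, the sketch below breaks accuracy down by intersecting groups; the dataset and column names are hypothetical, not Rekognition’s.

```python
# A minimal sketch of an intersectional accuracy check, assuming a labelled
# evaluation set with hypothetical column names ("gender", "skin_type",
# "label", "prediction").
import pandas as pd

results = pd.read_csv("face_analysis_eval.csv")  # hypothetical evaluation set
results["correct"] = results["label"] == results["prediction"]

# Accuracy for every gender x skin-type combination; large gaps are a warning sign.
print(results.groupby(["gender", "skin_type"])["correct"].mean())
```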

2. Unfair bias is sometimes nuanced and needs an interdisciplinary approach

Unfair bias is not just about accuracy rates. It can be found in various aspects of the algorithm, such as the trade-off between false positives and false negatives, which rests on assumptions about what is important and what is not. It can even be caused by attributes in the data that act as proxies for sensitive characteristics such as race and gender. For example, someone’s home address may correlate with their race, or their job role may correlate with their gender.
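To make the proxy point concrete, here is a rough sketch of how one might test whether an apparently neutral attribute is strongly associated with a sensitive one. The column names are hypothetical, and a real proxy analysis would need far more care than a single association statistic.

```python
# A minimal sketch of a proxy check: Cramer's V measures how strongly two
# categorical columns are associated (0 = none, 1 = perfect). Column names
# ("postcode_area", "ethnicity") and the dataset are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(df: pd.DataFrame, feature: str, sensitive: str) -> float:
    table = pd.crosstab(df[feature], df[sensitive])
    chi2, _, _, _ = chi2_contingency(table)
    n = table.to_numpy().sum()
    return (chi2 / (n * (min(table.shape) - 1))) ** 0.5

people = pd.read_csv("people.csv")  # hypothetical dataset
print(cramers_v(people, "postcode_area", "ethnicity"))  # a high value suggests a proxy
```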

Criteria for fairness also have an impact on whether or not unfair bias is found in an algorithm. Take Northpointe’s COMPAS algorithm, which rated defendants on their likelihood of reoffending: the algorithm’s false positive rate for black defendants was twice the rate for white defendants. But it wasn’t Northpointe who found this out — this analysis was done by ProPublica. Northpointe took ‘predictive parity’ as its criterion for fairness, which only checks that, among the people the algorithm flags as likely to reoffend, the proportion who actually do is the same across groups.
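The gap between those two views of fairness is easy to reproduce with toy numbers. The sketch below uses entirely made-up data (not COMPAS) in which both groups get the same positive predictive value, so predictive parity is satisfied, while the false positive rates differ sharply:

```python
# A toy illustration (made-up data, not COMPAS) of how predictions can satisfy
# predictive parity while having very different false positive rates.
import pandas as pd

def rates_by_group(df: pd.DataFrame) -> pd.DataFrame:
    out = {}
    for group, g in df.groupby("group"):
        predicted, actual = g["high_risk"], g["reoffended"]
        out[group] = {
            # Predictive parity compares this across groups.
            "positive_predictive_value": (predicted & actual).sum() / predicted.sum(),
            # ProPublica's analysis compared this across groups.
            "false_positive_rate": (predicted & ~actual).sum() / (~actual).sum(),
        }
    return pd.DataFrame(out).T

toy = pd.DataFrame({
    "group":      ["A"] * 10 + ["B"] * 10,
    "high_risk":  [1, 1, 1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "reoffended": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
}).astype({"high_risk": bool, "reoffended": bool})

print(rates_by_group(toy))  # same PPV (0.75) in both groups, FPR ~0.14 vs 0.5
```

Depending on which of those numbers you treat as the test of fairness, the same set of predictions looks fair or unfair.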

To know which unfair biases you are looking for, you need a diversity of perspectives to help teams think outside the box and anticipate how their product will impact people’s lives. That is why the interdisciplinarity of our team is so important to us, with social scientists, data scientists and technologists working side by side.

Our team’s introduction to sociotechnical analysis by Dr Alex Mankoo

Learning #2: Looking for ‘algorithmic bias’ alone will not solve problems

There is a lot of hype associated with algorithms and bias in machine learning, so it is easy to forget that the very idea behind the product may be problematic. When examining our machine learning process we realised that many of our concerns had little to do with the data science process itself. Taking an interdisciplinary approach, we considered how the product is communicated and marketed, how users interpret the results, and whether we are serving the research and funding community by making it easier to measure the reach of research, or whether we are perpetuating the already existing unfairness of the research citation system. These matters have little to do with data science itself, and more to do with the product’s goals and implementation plan.

So we have started to consider ethics at a product level.

In January we carried out a workshop to consider the ways in which our algorithms might be used as products. We thought about cases of abuse and misuse, and unintended consequences. This creative exercise got us thinking about what we can do to try to mitigate unintended harms that may result from our product.

We work closely with a social scientist who helps us think about our product from a sociotechnical perspective — thinking beyond our target users, and instead understanding how technology shapes and is shaped by society.

While thinking about fairness in machine learning is important, let’s remember that responsible technology development as a whole is what will determine whether machine learning is used for good or for harm.

Learning #3: We need just enough time and space

Not every organisation has the resources to do a tonne of applied research on ethical risks. Most organisations don’t (yet) have a culture of questioning the long-term impact of the product they are working towards. So the question for our team was: how much thinking do we need to do? That question remains open, but I have learned some things.

As Doteveryone’s research on responsible tech found, “there are gaps within development cycles and processes for catching problems a technology product or service may cause once out in the real world.” If it’s not routine, it will likely get de-prioritised. There are always more pressing issues than thinking about edge cases that have a negative impact. In the same way that the long-term benefits of fitness and healthy eating are best achieved as part of a regular schedule, thinking about the longer-term impact of our products should be a team habit so it doesn’t get forgotten.

I experimented with a couple of options with the team: ad hoc discussions after stand-up, one-off workshops, virtual online discussions… but I found it difficult to build momentum and identify actions. As a new initiative we plan to have monthly meetings to think about ethical impact. We haven’t yet defined what exactly we will cover in these meetings, but we will experiment with different formats. We’ll report back when we have some feedback on that!

Space doesn’t have to be physical either, which is why I have recently delved into the well-established world of GitHub. As explained in Danil’s blog post earlier this year, the rhythm of ethical analysis is often misaligned with the speed of technology development. By bringing ethics to where the algorithmic development is happening, we bring the two closer together, making it meaningful and actionable. As a reluctant GitHub user, I have taken the plunge into pull requests and issues, dodged some scary merge conflicts, and as a team we are working on a workflow that makes questions about fairness and unintended consequences an integral part of product development.

Machine learning is rapidly becoming mainstream, which is why it is critical to make ethics a normal part of product development. We will be blogging regularly about how we are doing this so follow our Medium channel to stay updated!

References:

Alexandra Chouldechova (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments: https://arxiv.org/pdf/1703.00056.pdf

Colin Priest (2019). How Do You Define Unfair Bias in AI?: https://blog.datarobot.com/how-do-you-define-unfair-bias-in-ai

Cassie Kozyrkov (2019). What is AI Bias?: https://towardsdatascience.com/what-is-ai-bias-6606a3bcb814

Libby Kinsey (2019). The road to AI is paved with good intentions: https://medium.com/digital-catapult/the-road-to-ai-is-paved-with-good-intentions-87870eb609e2

Demand (2017). Understanding Bias in Algorithmic Design: https://medium.com/impact-engineered/understanding-bias-in-algorithmic-design-db9847103b6e

Rachel Courtland (2018). Bias detectives: the researchers striving to make algorithms fair: https://www.nature.com/articles/d41586-018-05469-3

Sam Brown (2019). An Agile approach to designing for the consequences of technology: https://medium.com/doteveryone/an-agile-approach-to-designing-for-the-consequences-of-technology-18a229de763b

ProPublica (2016). Machine Bias: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Aoife Spengeman is a UX researcher at Wellcome Trust Data Labs, thinking about ethics in data science, human-centred design, and best UX research and design practices.