ML bias: intro, risks and solutions to discriminatory predictive models

Anna Via
The Glovo Tech Blog
May 31, 2021


Photo by Anete Lusina @ Pexels

Machine Learning (ML) models and automated data-driven decisions impact our lives much more than we might expect. What’s more, they pose many risks to us and those around us, and ML bias is just one example: there have been many proven cases of predictive models that discriminated based on gender, race and other sensitive personal attributes. This article aims to provide a concise introduction to ML bias: it gives some real-world examples of ML solutions that discriminated against certain groups, discusses the main risks this poses to society, and proposes some steps teams and companies can take to start avoiding it.

How predictive models impact your life

We live in a world where we believe all information is within our grasp from our smartphones, laptops and smart TVs, and we are convinced that this gives us access to a world of infinite possibilities. Sadly, this is not the case. A surprisingly large number of automated decisions based on data and ML models filter which slice of the Internet’s information and possibilities each individual gets to see (a phenomenon known as Filter Bubbles [1]). These predictive models are being used to decide things like:

  • what ads you see
  • what news you get recommended
  • what results are shown to you when searching for something on the Internet
  • what posts from your contacts you see in your social networks
  • what films, series or documentaries you see in video on demand platforms

You could think this is okay, since this way you get more relevant content without having to waste time searching for it, but bear in mind that it does not come without pitfalls. To start with, the information you are getting is biased, that is, “with a tendency to lean in a certain direction”, which means you are seeing certain types of content and not seeing others. One consequence of this is, for example, people only seeing politically like-minded posts or news, which can influence elections and can boost radicalized or conspiracy-related content. But the ML bias inherent in some models impacts aspects of our lives ranging from the tiniest everyday details to our entire future:

  • what job offers you get to see
  • university admissions
  • product price and availability
  • whether you get a loan and at what interest rate
  • whether you get insurance for your car and at what price
  • bail amounts and the length of sentences in trials

So what are these “models” that affect our lives to such an extent and how are they built? A model is nothing more than a process that uses historical information to make predictions about the future.

The problem is that these processes learn from past and historical information, in which many types of discrimination coexisted, and so they learn to discriminate against the same situations or groups of people that have been discriminated against historically.

For example, in a world where women are underrepresented in tech roles (only 17% of ICT specialists are women) and have lower salaries than men (women in the ICT sector earn 19% less than men), an ML model learning from this data can easily learn that being a “woman” is a “bad characteristic” of a person in terms of deserving a tech job or a promotion.
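To make this concrete, here is a minimal sketch on purely synthetic data (the numbers, features and column choices are illustrative, not real hiring data), showing how a standard classifier trained on a historically biased promotion process simply reproduces that bias:

```python
# Minimal synthetic sketch: a model trained on biased historical promotion
# decisions learns to penalise the group that was historically disadvantaged.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 10_000

# 17% women, mirroring the ICT representation figure mentioned above
is_woman = rng.random(n) < 0.17
experience = rng.normal(5, 2, n).clip(min=0)

# Historical labels: equally experienced women were promoted less often
p_promoted = 0.3 + 0.05 * experience - 0.15 * is_woman
promoted = (rng.random(n) < p_promoted).astype(int)

# Train a standard classifier on this biased history
X = np.column_stack([is_woman.astype(int), experience])
model = LogisticRegression().fit(X, promoted)

print("coefficient for 'is_woman':", model.coef_[0][0])  # negative: being a woman lowers the score
print("P(promotion | woman, 5y exp):", model.predict_proba([[1, 5]])[0, 1])
print("P(promotion | man, 5y exp):  ", model.predict_proba([[0, 5]])[0, 1])
```

The specific algorithm is not the point: any model optimizing for accuracy on this history would pick up the same pattern.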

Real world examples of biased ML models

Let’s look at some real examples of ML models gone wrong that were brought to light by the press and researchers, so we can better understand the topic and the consequences it can cause.

Women are a segment of the population that has been historically marginalized, and because of that, there are many cases of gender-discriminating ML bias:

  • Amazon scrapped an internal ML recruiting tool after discovering it penalized résumés containing the word “women’s” (as in “women’s chess club captain”), because it had learned from a decade of male-dominated hiring data.
  • Apple Card’s credit limit algorithm was investigated after reports of women being offered far lower limits than their husbands, despite shared finances.

Discrimination by race is another important form of discrimination in society, and as such, we can also find worrisome race-discrimination ML examples:

  • The COMPAS recidivism-risk tool used in US courts was found by ProPublica to incorrectly flag Black defendants as high risk at a much higher rate than white defendants.
  • Commercial facial recognition systems have been shown to have far higher error rates for darker-skinned women than for lighter-skinned men [5].

The examples above clearly show the threat ML bias poses to society, as it in fact automates discrimination: historically marginalized groups can now be automatically discriminated against by algorithms, potentially deepening their marginalization.

These considerations should strongly urge companies and society to start caring more about this effect and to prevent the use of ML models that could be biased, for three main reasons:

  • Ethics: there are some clear ethical concerns that should prevent companies and society from propagating bias and therefore discrimination in society.
  • Public opinion: there is a lot of controversy around this topic, and a proven case of a company owning or using a discriminatory model can result in very bad publicity (as can be seen in the examples listed above).
  • Legal consequences: depending on the country, there are specific laws tackling the issue, such as the GDPR, which requires that automated decisions do not discriminate based on race, gender, political opinion, health and other sensitive attributes, and which heavily restricts the use of such variables as inputs to predictive models. Other regulatory compliance rules also exist, with special impact on specific industries such as banking or insurance, to prevent discrimination based on certain sensitive personal attributes.

What can you and your team do about it?

Up to this point we have discussed the impact, risks and importance of ML bias, so what remains is to understand what we can do about it within our team or company. Heads up, this is not an easy problem to solve, but here are some ideas:

1. Diversity and awareness

  • Make sure the team building the ML model (or any tech solution, for that matter) is diverse: this helps ensure the solution accounts for diverse segments of the population.
  • Understand the potential bias in your data and how variables related to personal characteristics (like gender, age or race) are being used; a quick audit like the sketch below is a good starting point.
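As a starting point for that second point, a quick audit of the training data already reveals a lot. Below is a minimal sketch assuming a hypothetical dataset with “gender” and “hired” columns (the data and column names are purely illustrative):

```python
import pandas as pd

# Tiny illustrative dataset; in practice this would be your historical training data.
df = pd.DataFrame({
    "gender": ["woman", "man", "man", "woman", "man", "man", "man", "woman"],
    "hired":  [0, 1, 1, 0, 1, 0, 1, 1],
})

# How represented is each group in the data?
print(df["gender"].value_counts(normalize=True))

# Do historical outcomes already differ per group?
print(df.groupby("gender")["hired"].mean())
```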

2. Data treatment and analysis

  • Deleting gender/race/… variables is usually not enough: make sure they cannot be predicted from the other available variables (see the proxy-check sketch after this list)
  • Analyze how predictions and errors differ across segments of the population
  • Consider using simpler (and easier to explain and understand) solutions
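For the first point above, one practical check is to see whether the deleted sensitive attribute can be reconstructed from the remaining features: if it can, proxies exist and dropping the column changes little. Here is a hedged sketch on synthetic data, where a single correlated feature acts as the proxy (all names and numbers are illustrative):

```python
# Proxy check: can the remaining features predict the sensitive attribute we deleted?
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5_000
is_woman = rng.random(n) < 0.5

# The model no longer sees 'gender', but one feature is strongly correlated with it
features = pd.DataFrame({
    "years_experience": rng.normal(5, 2, n),
    "years_career_break": rng.normal(1.5, 0.5, n) * is_woman,  # proxy for gender
})

proxy_model = RandomForestClassifier(n_estimators=100, random_state=0)
auc = cross_val_score(proxy_model, features, is_woman.astype(int),
                      cv=5, scoring="roc_auc").mean()

# AUC close to 0.5: hard to reconstruct; much higher: proxies exist and the model
# could still discriminate even without the sensitive column.
print(f"Gender predictable from the remaining features with AUC = {auc:.2f}")
```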

3. Use specialized tools

  • Build your own analyses on top of model explainability solutions: SHAP, LIME, partial dependence plots…
  • Use existing tools from ML vendors: Amazon SageMaker Clarify, IBM AI Fairness 360, Google’s What-If Tool, Microsoft’s Fairlearn… (a small Fairlearn example follows below)
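As a small illustration of what these tools provide, the sketch below uses Fairlearn’s MetricFrame to break a model’s accuracy and selection rate down by group; the toy arrays stand in for a real held-out set, its predictions and the corresponding sensitive attribute:

```python
# Hedged sketch: per-group evaluation with Fairlearn's MetricFrame.
import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

# Toy stand-ins for a held-out set: true outcomes, model predictions and
# the sensitive attribute (kept aside for evaluation only).
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
gender = np.array(["woman", "man", "woman", "man", "woman", "man", "woman", "man"])

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=gender,
)

print(mf.overall)       # metrics on the whole population
print(mf.by_group)      # the same metrics broken down per group
print(mf.difference())  # largest gap between groups, per metric
```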

It is worth noting that the main technology vendors (AWS, IBM, Google and Microsoft, as mentioned above) are all developing solutions to tackle ML bias and ensure fairness in the models that are produced. This is a clear indicator of how important this topic is becoming for companies worldwide and how much we will be hearing about ML bias and fairness in the years to come.

Why are we writing about this from Glovo?

This article is actually a follow-up to the first roundtable on ML bias and fairness, held this month at Glovo through Glow (the Glovo Women Employee Group). There, some other members of Data Science @ Glovo and I introduced the topic to the employees who joined, and we later engaged in an absorbing and challenging discussion.

This initiative to raise awareness, together with sharing articles on the topic through Slack and ambitious diversity goals in hiring, puts Glovo in a better position to identify potential cases where ML bias could be a problem, for example related to:

  • Content recommendation: such as recommending “healthier” food to women or “fast food” to low-income groups
  • Different treatment of couriers (riders) by gender or other characteristics
  • Fraud models accounting for personal characteristics

To wrap up, I would like to leave you with some interesting questions that came up during our roundtable, so you can keep thinking about the topic:

  • Where is the line between bias and personalized recommendations (e.g. is it acceptable to recommend healthier food to a woman when her own behavior shows she likes healthy food)?
  • How much accuracy do we lose by removing explanatory variables that would otherwise improve the model and its predictions?
  • Are biased models really worse than actual people making decisions with their own personal biases?
  • What social opportunity do companies have to correct existing bias in society through non-discriminatory models?

Thanks for reading!

References

[1] Eli Pariser, “Beware online filter bubbles” (TED talk): https://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles/transcript?language=en

[2] Cathy O’Neil, “Weapons of Math Destruction”.

[3] Article about the AI Incident Database, where one can explore real-world examples of ML gone wrong, such as gender or racial bias, with the goal of helping AI evolve by learning from past mistakes.

[4] fast.ai’s free online course on Practical Data Ethics, covering topics like disinformation, bias & fairness, privacy and surveillance.

[5] “Study finds gender and skin-type bias in commercial artificial-intelligence systems”, on the work of Joy Buolamwini, researcher in the MIT Media Lab’s Civic Media group.
