AI Exposes Gender Income Inequality

An AI model built on income survey data exposed more than what was previously thought about gender income inequality. Not only do female employees earn less on average than their male counterparts, but gender is also one of the key differentiators in determining one’s salary.

Gad Benram
The Startup
4 min read · Jan 26, 2021

Photo by Magnet.me on Unsplash

Israel’s Machine & Deep Learning community has conducted an extensive survey of employment trends and salaries. Omri Goldstein and Uri Eliabayev reported not only average salaries but also visualizations of two Decision Tree models built from the survey’s data (see here and here). The models clearly show how important gender is in determining the salary of a Data Scientist, a profession that has more than once been called “the most sexy of our time”.

It’s important to mention that the team that published the report didn’t intend to research income inequality, so the results should be interpreted with care.

The model presented in the report demonstrates that being “Female” indicates a lower salary. How low? For employees without a PhD, the model estimates 32,000 ILS for women, compared to 35,000–39,000 ILS for men. But average numbers don’t tell the whole story. To understand the significance of these results more deeply, it helps to explain a bit more about Decision Trees.

One of the two trained models. Female appears in the top right part of the plot. Origin: here

How Decision Trees Work

Like any AI model, Decision Trees are built using data; they don’t have any prior bias before being fed with data. The goal of a Decision Tree model is to generate a tree that determines the expected value (in this case the salary) by navigating between yes and no questions. In the example above, the expected salary for a person who has more than 6 years of experience and a PhD is 45,000 ILS (monthly).

Example of how a tree is built. Images by Pixaline from Pixabay. Read more here
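To make the idea of navigating between yes and no questions concrete, here is a minimal Python sketch. The layout of this toy tree, the names `TOY_TREE` and `predict_salary`, and every leaf value except the 45,000 ILS one mentioned above are assumptions made for illustration; they are not taken from the report's models.

```python
# A regression tree written out as nested yes/no questions.
# Only the 45,000 ILS leaf comes from the article; the tree layout and
# all other leaf values are made up for illustration.
TOY_TREE = {
    "question": lambda person: person["years_experience"] > 6,
    "yes": {
        "question": lambda person: person["has_phd"],
        "yes": 45_000,  # from the article: more than 6 years and a PhD
        "no": 40_000,   # hypothetical value
    },
    "no": 30_000,       # hypothetical value
}


def predict_salary(tree, person):
    """Answer questions from the root down until a leaf (a number) is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["yes"] if node["question"](person) else node["no"]
    return node


print(predict_salary(TOY_TREE, {"years_experience": 8, "has_phd": True}))
```

Each prediction is just the value stored in the leaf where the person ends up, which is why the order of the questions matters so much in what follows.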

How is it different from just calculating the average? Unfortunately, many publications show gaps between male and female average income. However, averaging sometimes blurs the details of the picture. Imagine a hypothetical case where both genders earn the same in every role but females don’t get promoted to senior roles. In this case the average salary of males will be higher, even though within the same role the genders show no difference. Additionally, averaging salaries by gender doesn’t answer questions like: what is more statistically important when predicting one’s salary, years of experience or education?
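The hypothetical promotion case above can be sketched in a few lines of Python. The records below are synthetic, and the role names and salary figures are invented purely to illustrate the effect; none of them come from the survey.

```python
from collections import defaultdict
from statistics import mean

# Synthetic records: (gender, role, monthly salary).
# Pay is identical within each role; only the share of seniors differs.
records = (
    [("male", "junior", 20_000)] * 5
    + [("male", "senior", 40_000)] * 5
    + [("female", "junior", 20_000)] * 8
    + [("female", "senior", 40_000)] * 2
)


def average_by(records, key):
    """Average salary per group, where key(gender, role) names the group."""
    groups = defaultdict(list)
    for gender, role, salary in records:
        groups[key(gender, role)].append(salary)
    return {group: mean(salaries) for group, salaries in groups.items()}


overall = average_by(records, lambda gender, role: gender)
per_role = average_by(records, lambda gender, role: (gender, role))

print(overall)   # averages by gender only
print(per_role)  # averages by gender and role
```

In this made-up data the overall averages differ (30,000 vs. 24,000), while the per-role averages are identical for both genders. The gap comes entirely from who holds the senior roles, which a single average cannot show.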

Answering this question is of course complicated. Some previous studies examine different angles of discrimination. In this case some light may be shed by AI “automatically”. When Decision Trees are built, the algorithms behind them try to split the population by the most informative features. If experience is “more important” than education, then it should appear higher in the tree (closer to the first node). See the example above and the link for a detailed explanation.
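Here is a rough sketch of how a tree might pick its first question, assuming the common variance-reduction criterion for regression trees. The report does not say which library or settings were used, so the criterion, the synthetic rows (salaries in thousands of ILS), and the helper `variance_reduction` are assumptions for illustration only.

```python
from statistics import pvariance

# Synthetic rows: (more than 6 years of experience, has PhD, salary in thousands).
rows = [
    (True, True, 45), (True, False, 41), (True, True, 44), (True, False, 40),
    (False, True, 31), (False, False, 28), (False, True, 32), (False, False, 29),
]


def variance_reduction(rows, feature_index):
    """How much the salary variance drops after splitting on one yes/no feature."""
    salaries = [row[-1] for row in rows]
    yes = [row[-1] for row in rows if row[feature_index]]
    no = [row[-1] for row in rows if not row[feature_index]]
    # Variance left after the split, weighted by the size of each side.
    remaining = (len(yes) * pvariance(yes) + len(no) * pvariance(no)) / len(rows)
    return pvariance(salaries) - remaining


scores = {
    "experience": variance_reduction(rows, 0),
    "phd": variance_reduction(rows, 1),
}
# The feature with the largest reduction would become the first node.
root = max(scores, key=scores.get)
print(scores, root)
```

In this toy data experience separates salaries far better than the PhD flag, so it would be asked first. A feature that shows up near the top of a real tree got there by winning this kind of comparison.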

Is gender really more important than education?

So how important is gender for estimating income? It sits pretty high up in the tree, according to the data collected by the publishers. This means that when trying to guess someone’s salary, the AI model found it more informative to ask for one’s gender than about some of their skills or background.

If you wish to feel how outrageous the results are, just imagine a technical recruiter who doesn’t care if you have a CS degree or 7 years of experience working for a corporation. One of the first questions they’d ask you is your gender.

A small comfort can be found in the fact that gender doesn’t appear high in all of the trained models, and some sub-branches don’t include it at all. This implies that in some slices, there is not much of a difference between male and female. However, when it does appear, the tree always points to male as having a higher expected salary.

Discrimination has been here before AI

A lot has been said and written about the fear of AI and the biases that it may generate when used irresponsibly. Nevertheless, this case demonstrates two points in my opinion:

  • AI does not create biases; it merely reflects and amplifies what we as a society create.
  • AI can be used to learn more about the true nature of discrimination.

In recent years several papers and projects that tackle the concept of responsible AI have been published. Some companies suffered painful public criticism after it was revealed that their AI models reflected biases against people of a certain gender or ethnicity. To avoid falling into these traps, the Data Science community is working on best practices for building AI models more carefully. But the simple fact is: biases are strongly present in the data because they reflect who we are as humans. Unfortunately.

Feel free to contact Gad Benram about this article.

Gad Benram
The Startup

Machine Learning Architect and Google Developer Expert.