Why sexist AI comes as no surprise

The ethics of artificial intelligence is one of the few topics that attract a lot of attention and where the discussion often gets very heated, very fast. A glance at the comments on an article recently published by The Washington Post, about robots trained on AI that became racist and sexist, is enough to show the range of opinions this matter attracts.

We could hastily delegate these ethics issues to the software engineers who are building AIs and who can even exercise agency over the potential use of the AI.

However, are social biases within the tech industry itself even being addressed?

Since 2011, Stack Overflow (SO) has conducted an annual survey that attempts to capture the current state of the professional and enthusiast programming community, covering questions on education, experience, favourite programming software and more.

To explore one of the most common biases that are deeply embedded in our society, I will focus on the demographic aspect of the survey. Specifically, I will consider gender representation and some of the main differences between genders within the coding realm.

Gender representation

Of the 83,439 people who responded to the 2021* survey, a whopping 89.7% identified as men, while only 4.9% identified as women and 3.1% as non-binary, genderqueer, or gender non-conforming.

This result is quite staggering, as it means that roughly 9 out of 10 people who participated in the survey are men!
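As a rough illustration of how shares like these can be derived from raw responses, here is a minimal sketch in plain Python. The function name `gender_shares` and the toy sample are made up for this example; they are not the survey data or the code behind the original analysis.

```python
from collections import Counter


def gender_shares(responses):
    """Return each answer's share of all responses, in percent (1 decimal)."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


# Made-up toy sample of 20 answers, NOT the Stack Overflow survey data.
sample = ["Man"] * 18 + ["Woman"] + ["Non-binary"]
shares = gender_shares(sample)
print(shares)
```

Whether entries without a disclosed gender are counted in `total` changes every percentage, so it is worth stating that choice whenever such figures are reported.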

The graph below offers an even more surprising insight into the likely origins of the tech industry’s gender-related biases.

Over the five-year period (2017–2021), the participation of female as well as NB-GQ-NC-TG (non-binary, genderqueer, gender non-conforming and transgender) coders in Stack Overflow’s annual developer survey has stagnated. The yearly changes cannot be described as significant increases or decreases in representation, as they oscillate within a range of ±1.5%.

The only significant yearly changes are in the participation of male survey respondents. However, inspecting the survey entries without a disclosed gender (entries with missing values or answered with ‘Prefer not to say’) reveals a direct correlation between the two groups.
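The year-over-year comparison described above could be sketched as follows. This assumes that the ±1.5% range refers to percentage-point changes in a group's share of respondents. The helper names and the placeholder shares are hypothetical and only show the mechanics; they are not results from the survey.

```python
def yearly_changes(share_by_year):
    """Map each year to its change in share versus the previous year,
    expressed in percentage points."""
    years = sorted(share_by_year)
    return {
        year: round(share_by_year[year] - share_by_year[prev], 1)
        for prev, year in zip(years, years[1:])
    }


def is_stagnating(share_by_year, threshold=1.5):
    """True if every yearly change stays within +/- threshold points."""
    changes = yearly_changes(share_by_year)
    return all(abs(change) <= threshold for change in changes.values())


# Placeholder shares in percent, NOT actual survey results.
placeholder = {2017: 5.0, 2018: 6.0, 2019: 5.5, 2020: 6.5, 2021: 5.0}
changes = yearly_changes(placeholder)
print(changes, is_stagnating(placeholder))
```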


The 2021 survey included two questions that can further contextualise the gender-related bias: ‘Do you consider yourself a member of the Stack Overflow community?’ and ‘Are you a member of any other online developer communities?’. The responses make it clear that men not only make up around 90% of participants, they also have consistently higher engagement in both Stack Overflow and other online developer communities. So, why exactly are female and NB-GQ-NC-TG developers less likely to be active members of SO?

Developing interest in coding

Another potential factor to explore is the age at which different genders start coding. The graph below shows the answers to the question ‘At what age did you write your first line of code or program?’ grouped by gender identities of the respondents and mapped to their age.

Across all age groups, male respondents most commonly started coding between the ages of 11 and 17. The second most common age range for developing an interest in coding amongst men is 18–24 years.

Among female survey respondents, the difference between starting to code at 11–17 years and at 18–24 years is not as pronounced.

These results indicate that, compared to men, women are less likely to start coding early in life. Interestingly, the results for non-binary, genderqueer, transgender and gender non-conforming respondents are more similar to those of men than to those of women.
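A comparison like the one above only works if the answers are normalised within each gender. Otherwise the much larger group of male respondents dominates every age bracket. The sketch below shows one way to do that in plain Python. The function name, the bracket labels and the toy rows are invented for illustration and do not reproduce the survey's figures.

```python
from collections import Counter, defaultdict


def first_code_distribution(rows):
    """rows: iterable of (gender, age_bracket) pairs.
    Returns {gender: {age_bracket: percent of that gender's answers}}."""
    counts = defaultdict(Counter)
    for gender, bracket in rows:
        counts[gender][bracket] += 1

    result = {}
    for gender, brackets in counts.items():
        total = sum(brackets.values())
        result[gender] = {
            bracket: round(100 * n / total, 1) for bracket, n in brackets.items()
        }
    return result


# Invented toy rows, NOT the survey data.
toy_rows = (
    [("Man", "11-17 years")] * 6
    + [("Man", "18-24 years")] * 4
    + [("Woman", "11-17 years")] * 5
    + [("Woman", "18-24 years")] * 5
)
dist = first_code_distribution(toy_rows)
print(dist)
```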

The questions discussed here are just some of the issues we should all be asking about, and be aware of, when considering the ethics of AI. It does not take much mental effort to think of some of the impacts that the lack of gender inclusion within the tech community might have on the technologies that same industry is developing.

How can we even consider building unbiased models if we choose to ignore the bare minimum requirements of equality, diversity and inclusion that the tech industry continues to struggle to meet?

More details on the analysis can be found here.

*The full data set is available on the Stack Overflow website. At the time of writing this article, the dataset for the 2021 survey hasn’t yet been made public.



Romana Popocovski

I’m a Business Intelligence Analyst who loves finding different ways of communicating data and information. Skills: Python, Power BI, Tableau