Invisible Women in the World of Data Science

Nicole Michaud
Dec 22, 2023


I stumbled across the book Invisible Women: Data Bias in a World Designed for Men by Caroline Criado-Perez and, as a woman going into the data science field, my interest was immediately piqued. Criado-Perez covers the ways data has become part of our everyday lives, the ways that data is almost always biased against women, and how, in turn, almost every aspect of our lives is affected. Although it was published in 2019, the book has only become more relevant since, and the topics it discusses deserve far wider awareness.

The lack of data, or the data gap, affects just about every aspect of women's lives, often in ways many people would never think of, possibly because they do not actively consider women at all. Throughout the book, Criado-Perez returns to the point that men are usually treated as the "default human". This idea of male universality goes hand-in-hand with the data gap: "The presumption that what is male is universal is a direct consequence of the gender data gap… But male universality is also a cause of the gender data gap" (24). Criado-Perez could not possibly cover every way this plays out; in many cases, we lack even the data that would reveal where our data is missing or inaccurate. In transportation, for example, the problem is compounded because even where sufficient data exists, it often has not been disaggregated by sex, so it tells us little about how men's and women's travel habits and needs differ. This affects women's everyday lives, including their ability to commute to a job or to reach childcare and other services. Similarly, for UK data measuring poverty by household, the most recent sex-disaggregated figures date from 1977, so we cannot know whether those trends still hold today. Some gaps have life-or-death consequences, such as car safety features designed with men in mind, or chemicals in everyday products that pose particular cancer risks to women (e.g., BPA). Criado-Perez cites a 2008 analysis of medical textbooks which found that "sex-specific information [was] absent even in sections on topics where sex differences have been established… and results from clinical trials were presented as valid for men and women even when women were excluded from the study" (197).
She set out to determine whether anything had changed since 2008 and, to her disappointment, found that the problem very much persisted. These textbooks are what new generations of health professionals are trained on, which means they are learning incomplete information about women. Other effects of the data gap are social and economic rather than medical, though still severe: the wage gap, for instance, or women being shut out of certain jobs by biased algorithms that screen resumes and CVs. City infrastructure and transportation routes are designed with men in mind, ignoring the fact that women, who do 75% of the world's unpaid labor (including childcare), have different travel patterns throughout the day. This is explored in the chapter ‘Can snow-clearing be sexist?’, and yes: because men and women travel differently over the course of a day, even something as mundane as snow-clearing can be sexist. In cases like this one, the problem is not even a lack of data; the data exists, but we are not using it to do anything.
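The transportation and snow-clearing examples both turn on whether data is broken down by sex. As a minimal illustration of why that matters, the Python sketch below uses a small, entirely hypothetical set of trip records (invented for this example, not figures from the book) and computes the share of commute trips twice: once pooled across everyone, and once disaggregated by sex. The function names `share` and `share_by_sex` are illustrative only.

```python
from collections import defaultdict

# Hypothetical (sex, trip purpose) records, invented for illustration only.
trips = [
    ("M", "commute"), ("M", "commute"), ("M", "commute"), ("M", "errand"),
    ("F", "commute"), ("F", "errand"), ("F", "errand"), ("F", "childcare"),
]

def share(records, purpose):
    """Fraction of records whose trip purpose equals `purpose`."""
    return sum(1 for _, p in records if p == purpose) / len(records)

def share_by_sex(records, purpose):
    """The same fraction, computed separately for each sex (sex-disaggregated)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return {sex: share(rows, purpose) for sex, rows in groups.items()}

pooled = share(trips, "commute")
split = share_by_sex(trips, "commute")

print(pooled)  # 0.5
print(split)   # {'M': 0.75, 'F': 0.25}
```

With these made-up records, the pooled figure says half of all trips are commutes, while the disaggregated figures show that 75% of men's trips but only 25% of women's trips are commutes. A single average describes neither group well, which is exactly what unsplit data hides.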

One area especially affected by the gender data gap is GDP, because the surveys and measurements it is built on leave out unpaid work. This may surprise some readers, but according to Criado-Perez,

“The failure to measure unpaid household services is the greatest gender data gap of all. Estimates suggest that unpaid care work could account for up to 50% of GDP of high-income countries, and as much as 80% of GDP in low-income countries” (242).

Excluding unpaid labor from GDP calculations effectively excludes women. "Like so many of the decisions to exclude women in the interests of simplicity, from architecture to medical research, this conclusion," says Criado-Perez regarding the exclusion of unpaid labor from GDP, "could only be reached in a culture that conceives of men as the default human and women as a niche aberration" (241). Women's paid labor also has a significant impact on GDP: periods of higher female employment are clearly reflected in positive GDP trends (246). Leaving women out for the sake of simplicity, then, produces inaccurate data that can end up complicating life for everyone.

For many of the problems she raises, Criado-Perez also describes places around the world that have found and implemented effective solutions, proof that solutions exist and that they work. The common lesson is that we first need accurate, sex-disaggregated data, and then policies informed by that data. The question then becomes how to get this data. In a perfect world we could close the data gap in every area of life at once. Unfortunately, gathering data and conducting research are costly and time-consuming, so it may make sense to prioritize the areas that most directly affect women's health and mortality. Criado-Perez suggests that another priority should be including women in disaster relief efforts, since women are far more likely to be harmed by natural disasters, which are likely to become more frequent as climate change progresses (300).

To fix the gender data gap, Criado-Perez argues, we must first close the female representation gap (318): with more women in positions of power, women are less likely to be overlooked when decisions are made. On the lack of women in government, she writes,

“There are substantial gender data gaps in government thinking, and the result is that governments produce male-biased policies that are harming women. The data gaps are in part a result of failing to collect data, but they are also in part a result of the male dominance of governments around the world” (265).

This is not a baseless claim: women in positions of authority "are more likely to speak about women's issues, as well as family policy, education, and care" (265). Women not only raise issues that might otherwise go unaddressed but also influence which laws get passed (266). Unfortunately, even democratic elections are biased against women candidates, so this, too, is a problem without a quick fix.

The consequences of the data gap are many and far-reaching. I have tried to show this by briefly touching on just a few of the seemingly endless examples Criado-Perez cites, in hopes of conveying how deep the problem runs. My purpose here, however, is not simply to restate consequences she has already documented. Instead, I want to examine, through the lens of data science, how these effects are perpetuated in a world so reliant on technology, and therefore on data. Data science largely revolves around building and using machine learning (ML) models, which puts it at the center of gender data bias. One definition describes ML as a subset of artificial intelligence (AI) that uses data and algorithms to imitate the way humans learn, gradually improving its accuracy. Data scientists also often work with high-volume, high-velocity data known as Big Data. Criado-Perez explains that Big Data is used to derive "Big Truths by Big Algorithms, using Big Computers" (XVI). In other words, the tasks we hand to these algorithms and datasets are increasingly consequential, and so are their effects on our lives.

From severe effects such as female mortality to more mundane ones such as the perpetuation of harmful stereotypes, technological products like machine learning algorithms are both fed by and contributing to unequal, inaccurate, gender-biased data: "With AI becoming so mainstream, cultural stereotypes that exist in the data are being perpetuated" (165). One stereotype that is easily disproven by the data we do have is what Criado-Perez calls "the myth of meritocracy". This is the widespread belief that where men are favored, for example in career recognition and advancement, it is because of merit: that they simply deserve it more than their female counterparts. This belief has been disproven time and again, and the evidence often shows the exact opposite: women who have shown more dedication and success in the same field are less likely to receive the same recognition or rewards (95).

Because of the way these technological products work, biases in the data they are fed can be amplified in the outputs they produce: "Machines aren't just reflecting our biases. Sometimes they are amplifying them– and by a significant amount" (166). A phrase common in data science and related fields sums this up: "garbage in, garbage out". Even something as seemingly benign as the widespread use of male pronouns to refer to all people will be picked up by a machine as a pattern. Machines lack humans' ability to use context to infer the intended meaning of such pronouns, but even if they had it, the evidence suggests that default masculine pronouns are often unclear even to humans (5). The effect can be even more pronounced in grammatically gendered languages such as Spanish, French, or German (6). In short, gender-neutral is not really gender-neutral, and language is only one example. As Big Data plays an ever larger role in our world, we risk being pushed further and further from equity and unity in nearly every area of our lives: "The introduction of Big Data into a world full of gender data gaps can magnify and accelerate already-existing discriminations" (136).
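To make the idea of amplification concrete, here is a deliberately oversimplified Python sketch. It assumes a toy "model" that just predicts the most common gender label seen for each context in its training data (a majority-class predictor, not how any particular real system works), and the training pairs are hypothetical. Even though women make up 60% of the "cooking" examples, the model labels every cooking example as a woman, turning a 60/40 skew in the data into 100/0 in the output.

```python
from collections import Counter, defaultdict

# Hypothetical (context, gender label) training pairs, invented for illustration.
training = (
    [("cooking", "woman")] * 6 + [("cooking", "man")] * 4
    + [("coding", "man")] * 7 + [("coding", "woman")] * 3
)

def fit_majority(pairs):
    """Toy 'model': remember the most frequent label for each context."""
    counts_by_context = defaultdict(Counter)
    for context, label in pairs:
        counts_by_context[context][label] += 1
    return {ctx: counts.most_common(1)[0][0] for ctx, counts in counts_by_context.items()}

def share(labels, target):
    """Fraction of labels equal to `target`."""
    return labels.count(target) / len(labels)

model = fit_majority(training)

# Compare the gender mix for "cooking" in the data vs. in the model's predictions.
cooking_in_data = [label for ctx, label in training if ctx == "cooking"]
cooking_predicted = [model[ctx] for ctx, _ in training if ctx == "cooking"]

print(share(cooking_in_data, "woman"))    # 0.6
print(share(cooking_predicted, "woman"))  # 1.0
```

Real models are far more complex, but the same pull toward the statistically dominant pattern is one way a moderate skew in the inputs can become a much larger skew in the outputs.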

Although the gender data gaps in our society are in many ways shrinking, thanks to people who have educated themselves about them and actively work against them, Invisible Women reaffirmed for me that this work is only just beginning. I believe that anyone who works with data, and perhaps everyone else too, needs to understand how deeply the gender data gap has permeated every aspect of women's lives. Criado-Perez sums up these effects:

“Our current approach to product design is disadvantaging women. It’s affecting our ability to do our jobs effectively — and sometimes to even get jobs in the first place. It’s affecting our health, and it’s affecting our safety. And perhaps worst of all, the evidence suggests that when it comes to algorithm-driven products, it’s making our world even more unequal” (167).

My hope is that in my data science career I will not only see positive change toward a more equitable, data-centered world but also be part of that change.

Source:

Criado-Perez, Caroline. Invisible Women: Data Bias in a World Designed for Men. Abrams Press, 2019.


Nicole Michaud

Currently a student in Flatiron School's Data Science Flex Bootcamp program. My posts will be about various data science topics throughout this program.