Beyond Pink and Blue: The Fundamentals and Collection of Unbiased Gender Data

Published in Fields Data, Jun 23, 2023

Have you ever been to a baby shower where a burst of pink confetti is let loose? Your brain immediately thinks ‘It’s a girl!’. But why do we associate pink with girls? It started at the beginning of the 20th century, when retailers sold “sex-appropriate” colored clothes, and it continued despite the Women’s Liberation Movement in the mid-’60s. Initially pink was for boys, as it was seen as a stronger and brighter color, but the tables turned when large department stores suggested otherwise. So perhaps it’s all just a strategic marketing ploy controlling your brain! Nonetheless, do you think we need colors to tell gender apart? Can we go beyond them?

I have limited experience with the subject, and this is a summary of what I learned from the course Gender Data 101 held by data.org.

What is Gender?

Gender isn’t only two colors representing Males and Females on a graphic, it is a whole spectrum. Here are definitions from The Trevor Project of some of the different terms relating to gender:

What is Gender Data?

Gender Data can be both quantitative and qualitative. Quantitative gender data can be, for example, sex-disaggregated statistics on income levels. Qualitative gender data can stem from interviews and participant observation of income-generating activities.
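To make “sex-disaggregated statistics” a little more concrete, here is a minimal Python sketch. The records, the field order and the function name `disaggregate_by_sex` are all made up for illustration; they do not come from the course or from any real survey.

```python
from collections import defaultdict
from statistics import median

# Hypothetical survey records: (respondent sex, monthly income).
# The values are invented purely for illustration.
records = [
    ("female", 120),
    ("male", 200),
    ("female", 180),
    ("male", 260),
    ("female", 150),
    ("male", 230),
]

def disaggregate_by_sex(rows):
    """Group incomes by sex and return the count and median income per group."""
    groups = defaultdict(list)
    for sex, income in rows:
        groups[sex].append(income)
    return {
        sex: {"n": len(incomes), "median_income": median(incomes)}
        for sex, incomes in groups.items()
    }

summary = disaggregate_by_sex(records)
for sex, stats in sorted(summary.items()):
    print(sex, stats)
```

Reporting one overall median would hide the gap between the two groups; splitting the same records by sex is what makes it visible.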

Influence of Language

Different languages display the change in gender of the noun ‘Chair’

If you are multilingual and speak the languages shown above, you will notice the gender of the noun “chair” changing from one language to another. In English it is neutral; in Spanish, French and Hindi it is feminine, whereas in German and Russian it is masculine. Does this affect your way of thinking about chairs in general? Studies show that it does for monolinguals who speak a gendered language.

For example, if you ask a monolingual Spanish speaker to personify the word “chair”, they are more likely to start off with a description of a woman. This is because the speaker believes that language shapes who we are. To some extent it does, but a bilingual speaker is more likely to say that gender is a formal property of the language and to personify the word “chair” to their own liking.

Tackling Biases

So how do we collect gender data without bias? This crucial step shapes the entire process.

There are different categories of bias. To name a few:

Sampling Bias: Choosing a sample that is not representative of the population being studied, i.e. participants are not randomly sampled.

Ex: Oxfam, for their resilience-building project, randomly picked the gender of the respondent, which meant picking any household decision-maker.

Non-response bias: Certain members of the population being studied do not participate in the data capture. There are many reasons for non-response bias.

Ex: Income levels of the population being studied could be underestimated as those working longer hours have a higher non-response rate. [ALNAP Discussion Series, 2014]

Response Bias: An external factor that can influence a respondent’s answers.

Ex: During the pandemic, Oxfam surveyed women affected by the presence of ISIS in Iraq using a Computer Assisted Telephone Interview (CATI) platform. In one instance, the entire interview was controlled by the husband who responded on his wife’s behalf.

Demand characteristics: Participants change their answers or behavior based on what they believe the study, or the person running it, expects of them.

Ex: If a superior is conducting a study of their employees, the employees are significantly more likely to give a biased response.

Social Desirability Response Bias: A respondent is influenced to provide answers that are socially desirable and are not necessarily genuine responses.

Ex: A mother may not want to tell the truth about how often her children have been fed if she believes that the truth would reflect poorly on her. [ALNAP Discussion Series, 2014]

Question Wording/Ordering Bias: A question is worded or ordered in such a way that it favours one response over another.

Ex: Consider a multi-sector questionnaire which includes questions on access to water and health care. If the question ‘What are your top three priority needs?’ is placed at the end of such a questionnaire, respondents are more likely to mention water, sanitation and hygiene (WASH) or health-related concerns. [ACAPS]
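As a rough illustration of the random respondent selection mentioned under Sampling Bias, here is a small Python sketch. It is not Oxfam's actual procedure: the household roster, the gender labels and the function `pick_respondents` are assumptions made up for this example.

```python
import random

# Hypothetical household roster: household id -> list of (name, gender) adults.
households = {
    "HH-01": [("A", "woman"), ("B", "man")],
    "HH-02": [("C", "woman"), ("D", "man"), ("E", "woman")],
    "HH-03": [("F", "man")],
}

def pick_respondents(roster, rng):
    """For each household, randomly pick a gender present in it,
    then randomly pick one adult of that gender."""
    chosen = {}
    for hh_id, adults in roster.items():
        # Sort so the draw does not depend on set ordering.
        genders = sorted({gender for _, gender in adults})
        target = rng.choice(genders)
        candidates = [person for person in adults if person[1] == target]
        chosen[hh_id] = rng.choice(candidates)
    return chosen

rng = random.Random(42)  # fixed seed so the draw can be reproduced
respondents = pick_respondents(households, rng)
for hh_id, person in respondents.items():
    print(hh_id, person)
```

The point of drawing the gender first is that the enumerator does not simply interview whoever answers the door, which would tend to over-represent one group.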

Where are they in the reports?

The term LGBTQIA+ is known to many, but how are LGBTQIA+ people included in the data collection process? Are only those who self-identify as such or who are sexually active counted? There are also individuals who engage in same-sex behaviors but identify as straight, and others who are asexual. Do we leave them out?

There are many initiatives which include Sexual Orientations, Gender Identities / Expressions and Sex Characteristics (SOGIESC) data collection. Here are two methods to do so as proposed by Edge Effect:

One Question Method

This method includes one question in which respondents are asked for their Gender Identity which includes options of Male, Female or Other. In this case, ‘Other’ is used to capture every other identity and if chosen, it is sometimes followed by further self-identification options to choose from or a free text space if their choice is not mentioned.

This method has limitations, however, as transgender individuals may not be visible in the data if they select the Male or Female gender that they identify with. Another limitation is that respondents may understand the options listed under ‘Other’ differently from how they were intended.

Two Question Method

This method has two questions, one on Biological Sex and one on Gender Identity. It overcomes the first limitation of the previous method, making transgender people visible in the data.

However, this wouldn’t work in countries where transgender people are not legally recognized, as they are unlikely to risk the consequences of revealing their identity.
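The difference between the two methods can be sketched in a few lines of Python. The responses, the field names `sex` and `gender_identity`, and the three helper functions are hypothetical and only meant to show what each method can and cannot see; they are not Edge Effect's implementation.

```python
from collections import Counter

# Hypothetical responses. Each respondent answers both questions of the
# Two Question Method; the labels are illustrative only.
responses = [
    {"sex": "female", "gender_identity": "female"},
    {"sex": "male", "gender_identity": "male"},
    {"sex": "male", "gender_identity": "female"},
    {"sex": "female", "gender_identity": "other"},
]

def one_question_tally(rows):
    """Counts as they would appear if only gender identity were asked."""
    return Counter(row["gender_identity"] for row in rows)

def two_question_tally(rows):
    """Counts of (sex, gender identity) pairs, keeping differing answers visible."""
    return Counter((row["sex"], row["gender_identity"]) for row in rows)

def differing_answers(rows):
    """Number of respondents whose two answers are not the same."""
    return sum(1 for row in rows if row["sex"] != row["gender_identity"])

print(one_question_tally(responses))
print(two_question_tally(responses))
print(differing_answers(responses))
```

With one question, the third respondent is simply counted as female. With two questions, the pair of answers stays in the data. Given the risk described above, such paired answers are sensitive and may not be safe to collect in every context.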

The Colonialism of Data

Are there hurdles that enumerators or participants face during data collection? How can they overcome these?

During North-South collaborations, data collectors from the global South are predominantly seen as merely a “guide” to the data collection process. They are often hurried to meet deadlines, made to travel dangerous routes and in some cases even asked to go against their own community. Despite this, they don’t receive the credit they deserve and remain largely invisible in the process. In reality, they probably know the data better and should be involved in the analysis phase. The Bukavu Series is a blog project that touches on several such themes.

“Dangers, threats and ethics of perceived espionage”

Researchers who collect data by themselves may not always acquire accurate data, as many respondents don’t answer truthfully due to preconceived notions about the data collector. Respondents are told the information collected is for research, but the majority of them may not even understand what “research” is. Hence, a disguise is recommended.

A course participant shared their experience on this:

While doing field research in South India, they hit a roadblock, as all the interviews lined up with members of a specific group were suddenly cancelled. After about a week of inquiries, it became clear that someone had raised a concern that the researcher might be a spy for either the Indian intelligence service or for Pakistan, and that the group members did not want to proceed with the interviews because they feared for their safety.

Now, a disguise could’ve helped my fellow course participant avoid this catastrophe. A disguise can give the participants as well as the data collector a sense of safety. It also improves the quality of the data collected as people are more willing to provide answers.

Note that this strategy has downsides and can be ethically wrong. Think about the rights you have as a researcher when interacting within a complicated field, and consider how these risks can be limited for the participants as well as for yourself. This article by GIC tells how Congolese researchers who posed as students from a Congolese university were soon found out and then mistaken for spies!

Golden Nuggets

In the words of Science Proves There are More than Two Human Sexes: “Sex isn’t that straightforward”!
Gender Data is data disaggregated by sex and reflects gender issues, including roles, relations, and inequalities.
Language can lead to gender stereotyping.
Biased data can distort the later steps of a project.
Try to include people with diverse SOGIESC during data collection.
Give space to those who often remain invisible in the production of knowledge.

What’s next?

This was only the tip of the iceberg! You can further explore other methods and best practices for collecting gender data. However, it’s possible that you’re not an enumerator or even someone in charge of the design of data collection. In that case, stay tuned for the second part of this blog series on how to analyze gender data!

I am very grateful to the course organizers who provided such insightful and valuable content which has helped me write this blog!


Fields Data is a humanitarian data-preparedness organisation leveraging local expertise to mitigate the effects of disasters.