
Why we need diverse quantitative data analysts
It’s important that the people analyzing data reflect the people who donate their data.
Let’s not talk about quantitative data, let’s talk about quantitative data analysts. We tend to think of ourselves as objective actors using the most appropriate method to analyze the data we are given. When we analyze data, however, we are not only analyzing them through the methods we’ve learned, but through our own cultural lens and worldview. It’s unavoidable, but pretending we’re objective actors is problematic and can create issues in data interpretation and analysis.
I’ve been caught out by this before. While cleaning some data from our study on cash for health in humanitarian settings in Pakistan and Cameroon, I came across something odd in the open-ended responses. Respondents were asked to indicate if they had any chronic health conditions, and if they said “Other condition,” were then asked to fill it in. Some respondents from Pakistan wrote “guilty” as their other condition. Now, I’ve seen my fair share of odd and interesting responses in open-ended questions, but I couldn’t help but laugh and wonder what was going on. I was ready to write these off as mischievous responses and exclude them from our analysis, but when I spoke to my boss she instantly knew what was going on. “Guilty” wasn’t some mischievous response — it was a transliteration of the Urdu word for tumor or lump. Getting my boss to examine the data prevented me from making a mistake. Although my boss and I both have a passion for statistics and research and can talk about them for hours, we have different lived experiences. I can’t speak Urdu, but my boss can.
My own lived experiences have helped resolve apparent data issues before. In a previous job, I worked on a study where respondents were asked to draw their social and cultural identities. My colleagues thought one respondent had drawn a dragon. There’s a large Albanian population in my neighborhood, where I grew up seeing the Albanian flag, so when I saw the drawing, I quickly corrected them: it was the two-headed eagle from the Albanian flag, likely indicating that the respondent was proudly presenting their Albanian identity.
Why couldn’t I recognize my own bias when looking at the “guilty” responses? Well, it’s much easier to point out someone else’s ignorance than to confront our own. But as people who analyze data, we must turn our critical lens on ourselves. We know that data are inherently biased in some form and that not everyone will interpret questions in the same way, no matter how hard we try as researchers to make that the case. People are a conglomeration of thoughts, experiences, and identities, so we should expect people to answer questions as such.
When we get those gems in our data, we need to think through them. It’s possible that they have a meaning beyond our understanding. We shouldn’t be quick to write them off as outliers or problem cases that should be removed from the data before analysis (let’s be honest — it’s usually very clear when there’s a clerical error or mischievous response). Rather, we should consult other researchers and staff who might be able to lend their insights. For example, at the IRC, we constantly check with staff from our respective country offices or staff with context-specific knowledge when we see something in the data that we don’t quite understand.
This same process applies to how we decide what to collect in the first place and how to analyze our data. First, we need to make sure we are even asking the right questions and collecting the right variables. All too often, when we fail to measure meaningful identities, like head of household status, or when we measure them incorrectly, we risk landing on the wrong interpretation of our findings. Sometimes quantitative data analysts only plan to calculate treatment effects or examine descriptive statistics for a full sample, but this hides any differential effects or patterns for subgroups by assuming everyone comes from the same “population” with similar experiences (for more on bias in model construction, refer to Cathy O’Neil’s Weapons of Math Destruction). Instead, we should collect and disaggregate our data by important and relevant subgroups such as gender, race, residency status, and disability status. Previous IRC research examining the impact of adding gender dialogues to Village Savings and Loans Associations (VSLAs) in the Ivory Coast estimated treatment effects on intimate partner violence by child marriage status. Although the team found no impact on intimate partner violence for women married as child brides, they detected magnified effects for women married as adults (see Falb et al., 2018).
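The point about disaggregation can be made concrete with a small sketch. The data below are entirely hypothetical (not from the IRC study cited above), and the subgroup labels are illustrative assumptions; the sketch simply shows how a pooled treatment-effect estimate can mask an effect that exists only in one subgroup:

```python
# Hypothetical illustration: a pooled treatment effect can mask
# heterogeneity that disaggregation by subgroup reveals.
# Each record is (subgroup, treated_flag, outcome).
records = [
    ("married_as_adult", 1, 0.20), ("married_as_adult", 0, 0.50),
    ("married_as_adult", 1, 0.25), ("married_as_adult", 0, 0.55),
    ("married_as_child", 1, 0.50), ("married_as_child", 0, 0.50),
    ("married_as_child", 1, 0.55), ("married_as_child", 0, 0.55),
]

def mean(values):
    return sum(values) / len(values)

def treatment_effect(rows):
    """Simple difference in mean outcomes: treated minus control."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

# Pooled estimate over the full sample averages the subgroups together...
overall = treatment_effect(records)

# ...while disaggregating shows the effect is concentrated in one group.
by_group = {
    g: treatment_effect([r for r in records if r[0] == g])
    for g in {g for g, _, _ in records}
}
```

Here the pooled estimate understates the effect for one subgroup and overstates it for the other; in real analyses the same disaggregation is usually done with an interaction term in a regression model rather than raw mean differences.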
So next time you come across something odd in your data, think about your experiences and ignorance and reach out to a colleague who can look things over. Conversely, lend your insights to others’ research and help them make sense of their data. If you’re in a hiring position, think about the lived experiences of your team and what’s missing. It’s important that the people analyzing data reflect the people who donate their data. We need people of different races, ethnicities, classes, genders, disabilities, religions, ages, and sexual orientations to analyze data. We need to stop pretending we are objective actors. As a field, we are behind our qualitative peers in understanding the role we play in our research (read Malterud, 2001 for more on understanding and evaluating researchers’ objectivity and how worldviews influence research).
We are people, therefore we are subjective in everything we do, including data analysis. Our positions as quantitative data analysts or researchers do not exempt us. Bringing a wider variety of experiences to the table is the only way to counteract the biases we hold and accurately analyze and represent the data we work with.
