Thinking like a Data Feminist

Katherine Beame
IHME Tech
Feb 18, 2021 · 11 min read
Data visualization displaying the wage gap across gender and race/ethnicity, by Mona Chalabi (https://monachalabi.com/)

Data Feminism, by Catherine D’Ignazio and Lauren F. Klein, is a revelatory book that I read with a group of my co-workers in 2019. It was one of the first books to draw my attention to the higher concepts of data science and data visualization. Over the past two years, it has made me think more critically and idealistically about the data science work that I both encounter and create.

Data Feminism is both the book’s title and a concept built around an understanding of feminism that goes beyond sex inequality. This definition of feminism holds that power in our society is not equally distributed among groups (delineated by sex, race, ethnicity, disability, gender and sexual identity, and socio-economic background, to name a few) and that those power inequalities create oppressive systems that harm us all. Data Feminism brings this definition of feminism to the business of data science, calling on practitioners to account for the unequal power structures we live in when collecting data and creating data visualizations, and to name those inequalities in our work. The aim is co-liberation, achieved by challenging and changing the existing structures of power.

I read Data Feminism as part of a series of readings on data storytelling’s basic principles and higher concepts with my team’s affinity group. It was jarring to go from the other data visualization books, which largely dealt with aesthetics, to a book about how social power structures influence data science. Each data visualization concept has its place in learning: the other books taught me to present data beautifully, but Data Feminism gave me the gift of understanding that presenting data aesthetically is not enough. The current systems and best practices of data science and visualization do not take underlying social structures into account.

There are seven interconnected principles of Data Feminism that inform active ways to restructure how you think about data science.

1. Examine power

Consider what data you should be studying and portraying, as well as how to portray it. Data Feminism compels you to look at your work through the lens of power inequalities and ask yourself, “Why is the data science being done? For whose interests and goals?”

2. Challenge power

Challenge the structures of power in society with your work. Create work that highlights issues for groups that hold less power in society and critique the power structures as they currently stand. Ensure no unintentional harm is inflicted by your outcomes and practices. Ask yourself, “Am I reinforcing existing power dynamics? Does my work empower underrepresented groups?”

3. Elevate emotion and embodiment

Acknowledge that humans are behind datasets and that creations based on that data will impact those humans in unequal ways. Emotive representations of those human-filled datasets help to acknowledge that data is more than just numbers. Emotion and empathy have an important place in data science. Ask yourself, “Does my work obscure or elevate the human side of the data?”

4. Rethink binaries and hierarchies

Current structures of society and the inherent fact that binaries are often used in data collection means that many groups are not visible in data because they are not counted. Rethink how data collection occurs and how data is displayed while taking a group’s consent, personal safety, cultural dignity, and historical context into consideration. Ask yourself, “Who is missing from this data and why?”

5. Embrace pluralism

The best and most complete knowledge comes from bringing in as many perspectives as possible. Center the perspectives of those whom the data is about, those that are local, and those whose opinions society does not already elevate. Ask yourself, “Whose input is going into the decisions of how the data is being used and displayed?”

6. Consider context

Data is collected by humans under particular conditions and thus is never neutral or objective. Rather than clinging to an ideal of “unbiased data,” data science must accept that there is no such thing as unbiased creation. Data science should instead be clear about bias and use the context of the work to highlight the impacts of societal power structures on it. Only through this contextualizing can data truly aim to be ethical or accurate. Ask yourself, “Does my work adequately communicate the context in which the data was collected and is being presented?”

7. Make labor visible

Just as we aim to illustrate the power imbalances in society at large, we must think about who is credited in our work. The work of many individuals integral to data science, from data collection to tabulation, is obscured and undervalued. Ask yourself, “Is there unacknowledged work that I am not crediting?”

Data Feminism in practice:

I came to software development and data science as a fairly new entrant in 2016, when I started working as a software developer on the data visualization team at the University of Washington’s Institute for Health Metrics and Evaluation. Part of my excitement was joining an organization whose mission is to save lives and reduce disease by elucidating global health problems. My team contributes to that mission by disseminating data visualizations that inform policy makers about the health status of their constituents. Our goals are in line with data feminist principles, but we have a long way to go to fully incorporate all of the concepts of Data Feminism into our work. There are a number of ways in which we have started to implement the seven principles, and many where we still need improvement.


1. Examine power

We have historically worked within our organization to produce large- and medium-sized exploratory tools that showcase the breadth of outcomes our research teams have accumulated and created. These tools let those outside the organization view health data without having to read lengthy papers.

We have made a dedicated effort to prioritize accessibility in our tools, whether that is color-blind-safe palettes or checking and maintaining our tools for easier screen-reader use. For data visualizations, this process is by no means perfect, and many of our tools still do not meet our accessibility goals, but making accessibility a part of our workflow will let our future tools become more and more accessible.
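One concrete, automatable piece of this kind of accessibility work is checking color contrast. As a rough illustration (this is a generic sketch of the WCAG 2.1 contrast-ratio formula, not our team’s actual tooling), the contrast between a foreground and a background color can be computed directly from their relative luminance:

```python
# Sketch: WCAG 2.1 contrast ratio between two sRGB colors (channels in 0-1).

def _linearize(c):
    """Linearize one sRGB channel per the WCAG relative-luminance formula."""
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) tuple with channels in [0, 1]."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """WCAG contrast ratio, from 1:1 (identical) up to 21:1 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# Black text on a white background gives the maximum ratio of 21:1.
print(round(contrast_ratio((0, 0, 0), (1, 1, 1)), 1))  # 21.0
```

WCAG AA requires at least 4.5:1 for normal text, so running a check like this over a tool’s palette can flag hard-to-read label/background pairs before release.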

2. Challenge power

We, as a visualization team and as an organization, try to be “impartial.” We want our tools and data to do the most good, and for that reason we try not to antagonize countries or organizations by displaying data prescriptively. But it is disingenuous to claim that we have no bias, that we are completely impartial, or that the world is equal and thus all results carry the same consequences for everyone. At times, this stance has led us to cut controversial results or to use intentionally vague language within our tools.

3. Elevate emotion and embodiment

As a team, a big part of our focus has been making our tools more usable and emotive. While exploratory tools have many benefits, they also have significant downsides: they are more difficult to understand, and they state data as fact without contextualizing it within society or our organization’s biases. Rather than solely focusing on tools that show the breadth of research outcomes, we have tried to create more contained, story-based visualizations. These tools, like the Child Mortality tool, do not simply state facts but, through images and language, try to instill the data with the gravity of the experiences of the children represented.

IHME’s Child mortality tool (https://vizhub.healthdata.org/child-mortality) uses the pictograph of a child to illustrate child deaths rather than an impersonal bar chart.

There are regulations on the level of data detail we can display. Even in the tools that display our most granular data, we enforce privacy limits on how much detail is shown. This protects the privacy of individual health records, and it reflects our aim to provide local-level information without burdening or endangering individuals.

Local burden of disease: HIV tool (https://vizhub.healthdata.org/lbd/hiv). Geospatial display of hovered data point in Sudan showing years of prevalence of HIV.

4. Rethink binaries and hierarchies

Because our team does not do the research or decide research topics, we do not have much control over which dimensions of the data are present in the associated visualizations. We are not part of the data collection process, nor do we create the hierarchies and categories that make up the datasets. However, as the research teams continue to expand and we as an institute are able to take in more information, we are adding more dimensions to our data and thus able to reveal more important aspects of societal inequality in our visualizations. For example, we have been adding useful granularity by mapping health at the scale of local governments, and we are working to separate health data across racial and ethnic groups. With the inclusion of this data, we have a greater responsibility to think critically about how we display it. We must provide context for why the categories in our datasets exist, because those categorizations can have consequences both for individuals within categories and for those excluded or misrepresented by our categorizations.

5. Embrace pluralism

Throughout my time on the data visualization team, I have found that each new team member brings a new and interesting perspective on how we can do our work better. Through this, we have made not just technical improvements to our codebases and software stacks but also improvements to design, process, and vision.

We are now developing a practice of running certain tools through a human-centered design process that incorporates feedback from individuals outside our team and organization at many points during creation.

6. Consider context

Though not a new aspect of our work, we have always tried to emphasize the uncertainty inherent in the data results we display. While displaying uncertainty is notoriously difficult in data visualization, it has always been important to us to be transparent about our data modeling processes and to recognize the inherent uncertainties in our final results.

COVID-19 tool showing the daily infections and testing line graph with uncertainty bands. (https://covid19.healthdata.org/global?view=infections-testing&tab=trend&test=infections)
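Uncertainty bands like the ones in that graph are typically derived from many model draws per time point: the band edges are low and high percentiles of the draws, plotted around a central estimate. A minimal sketch of that idea (illustrative only — this is not IHME’s actual modeling pipeline, and the simulated draws below are made up):

```python
import random
import statistics

def uncertainty_interval(draws, lower=2.5, upper=97.5):
    """Return (lo, mean, hi) for one time point from a list of model draws."""
    s = sorted(draws)
    def pct(p):
        # Nearest-rank percentile; adequate for a sketch.
        k = round(p / 100 * (len(s) - 1))
        return s[min(max(k, 0), len(s) - 1)]
    return pct(lower), statistics.fmean(s), pct(upper)

# Simulated draws for a single day: 1,000 samples from a hypothetical
# posterior centered at 100 with standard deviation 10.
random.seed(0)
draws = [random.gauss(100.0, 10.0) for _ in range(1000)]
lo, mean, hi = uncertainty_interval(draws)
# For N(100, 10), lo/mean/hi land near 80, 100, and 120 respectively.
```

Computing (lo, hi) for every time point and shading between them around the mean line (e.g. with matplotlib’s fill_between) yields the familiar uncertainty ribbon.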

7. Make labor visible

To try to further elucidate how the research teams go through the process of data creation, some of our tools contain flow charts and text explaining our results. Our COVID-19 tool links to updated and detailed descriptions of our models and how they have changed over time.

COVID-19 tool’s header with links to sections on healthdata.org/covid/ that discuss FAQs, publications, etc. (http://www.healthdata.org/covid/)

Our mortality tool includes, within the tool itself, a rather complicated description of how the data is procured and processed to produce the results. The flowchart appears even before you are able to view the results of that process. A Frequently Asked Questions page within the tool helps to further untangle the complicated data.

Mortality tool flowchart describing the data creation process on the left and mortality tool FAQ on the right. (https://vizhub.healthdata.org/mortality/)

Incorporating Data Feminism principles into our work is not as straightforward as applying display or design principles that come with rules, but the seven principles of Data Feminism give us the questions we need to ask ourselves when starting each new project. The decisions about how to display data are not cut and dried. Context, societal power structures, our own biases, privacy: these all contribute to a complicated world, and no visualization or organization will flawlessly incorporate and balance all of these forces. Data Feminism is about striving for an idealistic goal while understanding that you cannot perfectly attain it, because you are an individual formed by a world full of inequality and bias.

Commitments:

Here are some specific ways I want to integrate data feminist principles into my work. These are imperfect commitments, because I, and my implementation process, are works in progress. Going through each of the Data Feminism principles and creating a list of commitments has been helpful to me, and it is an exercise I would recommend in your own work.

  • Embrace pluralism: Work with hiring and outreach, trying to get more applications for positions from a wider candidate pool and examine my own biases towards particular candidates.
  • Embrace pluralism: Try to integrate a more collaborative work effort into inter-organizational research tools. The first step to integrating more voices into our tool design and creation process is to embrace the perspectives of those within our organization. We have a number of data visualization tools that are primarily used by our internal research teams. I would like to increase the amount of input that individuals on the research team have.
  • Embrace pluralism: Try to integrate a more collaborative work effort into public-facing visualizations. The institute has cultivated a global collaborator network with over 6,000 members from 154 countries who are the institute’s partners in research. Integrating input and feedback from this group into our tools is a huge underutilized resource for pluralism.
  • Design emotive tools: Use the data IHME makes public to create my own simpler, easier-to-understand data visualizations that point to societal inequities. During development, I will focus on understanding who I am making the visualization for, who it will affect, and whether or not the effect will be positive.
  • Challenge power: Investigate ways to reduce the all-knowing, “god-like,” falsely impartial presence of our exploratory visualization tools. This could mean displaying uncertainty in the data, adding explanatory text that allows for various interpretations of the data, or other design details.
  • Make labor visible: Strategize about what the hidden parts of labor are in my work at my organization. To fulfill this commitment, try to better understand how our research studies are selected and realized — from the inception of the idea, to how data dimensions are defined, to how data is processed, collated, and ultimately interpreted.
  • Accountability: Check in on my process and commitments regularly. Assess how well I am doing each month and record the issues that come up.

What is my context?
I am a cis, hetero, college-educated, white woman who grew up in Seattle and has been a software engineer at the Institute for Health Metrics and Evaluation for the past four years. My life has been predominantly healthy and I have no known disabilities. I had a financially stable upbringing with many of the privileges money can afford, like private school, university, study abroad, and varied travel and cultural opportunities. The organization where I work supports my self-learning, and I feel my opinions are listened to on my team. I am writing this article to try to articulate the impact that Data Feminism and the concepts within it have had on how I view my role and how I try to create data visualizations.

Sources:

D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. The MIT Press, 2020.

Chalabi, Mona. “Equal Pay Day: A Wage Gap Fact Check.” The Guardian, 2018.
