Gender Bias on Wikipedia: A pie chart showing that 7% of links from a set of 26 pages go to women’s biographies.

Wikipedia: Gender Bias

Erasing Her from HiStory

What does gender bias on Wikipedia pages look like? How quickly does it change?

OpenSexism
10 min read · Aug 5, 2022


Throughout history, women have been ignored and erased, their contributions unacknowledged or simply forgotten. Erasure leaves gaps in our understanding of the world, and our failure to acknowledge women’s work impedes careers.

We often speak of ‘systemic bias’ as if it were a dark thundercloud under which we must sit, but we should also acknowledge that our technologies shape this experience. Under a dark cloud we are drenched by rain, but we also created the umbrella. In fact, per Wikipedia, a woman, ‘Lu Ban’s wife,’ is credited with inventing this marvelous device.

Lu Ban’s wife is not named on her husband’s Wikipedia page. There is no link to a page about her life and work. Had there been one, I would have clicked. Wikipedia is famous for linking related topics, link after link, a ‘rabbit hole’ we can go down and discover… well, based on what I’ve observed: men.

Measuring the gap

For the past nine months, every Wednesday, I’ve measured the gender diversity of the links to biographies on a set of 26 Wikipedia pages, from Reality to Universe and Science to Justice. This set of pages currently receives over 1.3 million views a month, and each page appears on the first page of Google search results (several, including Mathematics and Ornithology, as the top result for the term). As of July 27, 2022, the share of links to women’s biographies from these pages stands at 7 percent.

To make this imbalance visible, I used an ingenious tool that identifies links to Wikipedia biographies, looks up the gender for each mentioned person, and returns tallies. Of the tool, its creator PAC2 writes:

“This simple quantitative approach to measure gender diversity is similar to many research projects on this theme in computational social sciences. David Doukhan is tracking women’s speaking time on the radio[2]. Antoine Mazières and his co-authors are computing the share of screen time with women in popular movies[3] and Gilles Bastin and his co-authors are computing gender frequency of people cited in French newspapers[4].”

PAC2 concludes: “I believe that measuring helps to raise awareness of the problem of gender diversity in Wikipedia articles.” I agree. A problem that is not quantified and made visible is impossible to solve. And the data I’ve collected each Wednesday indicate that this problem is not going away. Longitudinal data clearly reveal the inertia perpetuating a very deep structural bias, one that renders women and their work invisible.
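The counting step behind this kind of measurement can be sketched in a few lines of Python. This is a minimal illustration, not PAC2’s actual code: the link extraction and Wikidata lookup are stubbed out as a plain mapping, where in a real run each gender value would come from the linked biography’s Wikidata item (property P21, “sex or gender”).

```python
from collections import Counter

def gender_shares(gender_by_biography):
    """Tally linked biographies by gender and return each gender's share.

    `gender_by_biography` maps a linked biography's title to the gender
    recorded on its Wikidata item (property P21).
    """
    counts = Counter(gender_by_biography.values())
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

# Stub input: in practice these pairs would be built by extracting
# biography links from a page and querying Wikidata for each person.
links = {
    "Ada Lovelace": "female",
    "Alan Turing": "male",
    "Isaac Newton": "male",
    "Emmy Noether": "female",
    "Carl Friedrich Gauss": "male",
}
print(gender_shares(links))  # → {'female': 0.4, 'male': 0.6}
```

Run weekly over the same set of pages, tallies like these become the longitudinal series plotted below.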

Who’s linked and who’s visible?

“If you start at any given article on Wikipedia, you’re much less likely to eventually reach an article about a woman artist than you are about a male artist — and this was true for women across the board,” Isabelle Langrock, who studies feminist interventions in Wikipedia, has said. A focus on creating women’s biographies does not alone make those biographies visible.

Christoph Hube et al., who looked at how writers are represented on Wikipedia, found that only two of the more than fifty writers with at least 60 in-links from other writers are women (Jane Austen and Virginia Woolf).

“Hyperlinking policies assume that relevant information will be cited and linked regardless of gender. They do not take into account that studies show men cite men more often than they cite women, and men dominate the Wikipedia editing space,” writes Colleen Hartung. Not coincidentally, Brendan Luyt observed that reference lists often give precedence to “the feats of great men.”

“Even if the editorial community is substantially altered, if bibliographical imagination does not expand as well, issues of representation will likely remain unresolved,” Luyt writes.

When I looked at who’s linked, I found that the share of women is disturbingly low, and that only a combination of visibility and deliberate, targeted intervention made a difference:

The Wednesday Index: Number of linked mentions on a set of 26 Wikipedia pages by gender over time. Plots for each individual page are included at the end of this piece along with data

Addressing the gap

The small but noticeable uptick in links to women in July 2022 reflects the work of David Palfrey, who chose to tackle the gap on the Wednesday Index’s 26 pages to make the difficulties of addressing the imbalance more visible. Were it not for his intervention, the lines might well have remained flat.

Palfrey, active on Wikipedia as the editor Dsp13, highlights three conditions contributing to the extreme underrepresentation of women in the network of linked (and consequently more visible) biographies: women with Wikipedia pages are mentioned but not linked; women who are mentioned do not have a Wikipedia page and therefore cannot be linked; and women are simply not mentioned or cited at all. For example, Samuel Baltz, who has studied Wikipedia’s coverage of political scientists, notes that the first authors in the reference and suggested-reading sections of the Political Science page include 26 men and one woman, even though 30 percent of political scientists are women.
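A back-of-the-envelope check shows how unlikely such a skew is under parity with the field. Assuming the 30 percent figure and treating the 27 first authors as independent draws (a simplification), the probability of ending up with at most one woman is well under one in a thousand:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# 27 first authors; if they were drawn from a field that is 30 percent
# women, the chance of seeing at most one woman among them:
print(round(binom_cdf(1, 27, 0.30), 4))  # → 0.0008
```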

Palfrey suggests creating automated tools to help identify underlinked authors, as well as a talk-page template that reports a page’s citation gender statistics, with a call to address the imbalance. I agree, and believe that statistics capturing the underrepresentation and erasure of women also belong on the front of each page, alongside the suggested indicators that “help readers more quickly and accurately assess the quality of content.” A public-facing indicator would also make the problem more visible to subject-matter experts, who are needed to identify and include missing sources.

Currently, Palfrey is working with Women in Red (a WikiProject that has been instrumental in increasing the share of women’s biographies from 15.53% when the project started in 2014 to 19.30% today) to create Wikipedia pages for women whose work is highly cited. New York Times journalist Jennifer Schuessler, for example, has been cited on Wikipedia over 400 times but does not have her own page.

These new pages address gender bias along two dimensions: a content gap (usually measured as the share of women’s biographies) and a structural gap (the share of links to women’s biographies, which is currently not measured or tracked).

Why it matters

“Admittedly, article selection is a crucial process to define what is and what is not illuminated. However, it is only the beginning of a series of processes that make each article or collection of articles more or less visible,” write Pablo Beytía and Claudia Wagner, who do an excellent job breaking down the types of bias that affect women’s visibility on Wikipedia. What is exciting to me about PAC2’s tool is that it gives insight into dimensions of bias that are currently obscured because there is not yet a systematic way to measure and track them.

Gaps other than the underrepresentation of women’s biographies on Wikipedia, the top domain in search results, receive relatively little attention, yet they are critical, and they are amplified. When a reference to a scientific article is added to Wikipedia, for example, the visibility afforded generates more citations for that article, and any bias against a particular demographic is compounded.

“Inequalities within the structural properties of Wikipedia — the infobox and the hyperlink network — can have profound effects beyond the platform,” write Langrock and González-Bailón. Gendered inequities “can have large effects for information-seeking behavior across a range of digital platforms and devices.”

Jere Odell, Mairelys Lemus-Rojas, and Lucille Brys, who have looked at gender equity in publishing and Wikidata, highlight other harms: “When women authors are excluded from authorship and from the reference lists of subsequent works, their contributions to scholarly communication become less visible and less likely to be recognized and rewarded.”

Visibility and tools

Increasingly, writers, editors, and readers are recognizing these harms and working to better understand demographic diversity, to identify and communicate gaps, mitigate bias, and publish their progress. Dani Bassett and their team, who’ve studied citation inequities in neuroscience, have written tools to help authors examine inequities in their own reference lists. PAC2’s tool is useful for making Wikipedia’s link inequities visible; a tool like the one Bassett created could help make Wikipedia’s larger citation inequities visible, and galvanize change.

“Ultimately, we want to create a conversation that encourages people to actually stop and think about their citational politics, the choices that they’re making and the consequences of those choices,” writes Diana Kwon in “The rise of citational justice: how scholars are making references fairer.” The conversation is an important one.

Tools that help make inequities visible, that raise awareness of imbalances and erasures, and help contributors address these problems are urgently needed. These tools belong in Wikipedia’s reading and editing interfaces — where we can, very literally, change the system to better facilitate change. Indeed, only when the systems we build are aligned with individuals’ interventions will we cease perpetuating (and amplifying) the bias in a status quo that society is actively working to change.

Many thanks to David Palfrey for the comments and suggestions that improved this piece.


Works Cited

Asr, Fatemeh Torabi, Mohammad Mazraeh, Alexandre Lopes, Vasundhara Gautam, Junette Gonzales, Prashanth Rao, and Maite Taboada. “The gender gap tracker: Using natural language processing to measure gender bias in media.” PloS one 16, no. 1 (2021): e0245533.

Baltz, Samuel. “Reducing Bias in Wikipedia’s Coverage of Political Scientists.” PS: Political Science & Politics 55, no. 2 (2022): 439–444.

Beytía Reyes, Pablo, and Claudia Wagner. “Visibility layers: A framework for systematising the gender gap in Wikipedia content.” Internet Policy Review 11, no. 1 (2022): 1–22.

Dworkin, Jordan D., Kristin A. Linn, Erin G. Teich, Perry Zurn, Russell T. Shinohara, and Danielle S. Bassett. “The extent and drivers of gender imbalance in neuroscience reference lists.” Nature neuroscience 23, no. 8 (2020): 918–926.

“eLife Latest: July 2022 update on our actions to promote equity, diversity and inclusion.” Inside eLife. (2022).

Hartung, Colleen. Challenging bias against women academics in religion. Vol. 2. Atla Open Press, 2021.

Hu, Jane. “The Overwhelming Gender Bias in ‘New York Times Book Reviews’.” Pacific Standard (2017).

Hube, Christoph, Frank Fischer, Robert Jäschke, Gerhard Lauer, and Mads Rosendahl Thomsen. “World Literature According to Wikipedia: Introduction to a DBpedia-Based Framework.” arXiv preprint arXiv:1701.00991 (2017).

Kuznetsov, Andrew, Margeigh Novotny, Jessica Klein, Diego Saez-Trumper, and Aniket Kittur. “Templates and Trust-o-meters: Towards a widely deployable indicator of trust in Wikipedia.” In CHI Conference on Human Factors in Computing Systems, pp. 1–17. 2022.

Kwon, Diana. “The rise of citational justice: how scholars are making references fairer.” Nature 603, no. 7902 (2022): 568–571.

Ladyzhensky, Alina. “Bridging Wikipedia’s Gender Gap, One Article at a Time.” Annenberg School for Communication (2022).

Lafrance, Adrienne. “I analyzed a year of my reporting for gender bias (again).” The Atlantic (2016).

Langrock, Isabelle, and Sandra González-Bailón. “The Gender Divide in Wikipedia: Quantifying and Assessing the Impact of Two Feminist Interventions.” Journal of Communication 72, no. 3 (2022): 297–321.

Larivière, Vincent, Chaoqun Ni, Yves Gingras, Blaise Cronin, and Cassidy R. Sugimoto. “Bibliometrics: Global gender disparities in science.” Nature 504, no. 7479 (2013): 211–213.

Luyt, Brendan. “The inclusivity of Wikipedia and the drawing of expert boundaries: An examination of talk pages and reference lists.” Journal of the American Society for Information Science and Technology 63, no. 9 (2012): 1868–1878.

Luyt, Brendan. “Representation and the problem of bibliographic imagination on Wikipedia.” Journal of Documentation (2021).

Odell, Jere D., Mairelys Lemus-Rojas, and Lucille Brys. “Wikidata for Scholarly Communication Librarianship.” (2022).

PAC2. “Measuring gender diversity in Wikipedia articles.” The Signpost (2022). https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2022-05-29/In_focus

PAC2, Gender diversity tool: https://www.wikidata.org/wiki/User:PAC2/Gender_diversity

Palfrey, David. “Gendered link bias.” en.wikipedia.org/wiki/User:Dsp13/Gendered_link_bias

Royal Society of Chemistry. “Academic publishers collaborating in fight against bias announce key action on diversity data collection” (2022).

Science News Reckoning Team. “Some past Science News coverage was racist and sexist. We’re deeply sorry.” Science News (2022).

Temple, Emily. “‘It Isn’t Rocket Science’: ‘Tin House’ and ‘Granta’ Editors on How to Run a Publication That Isn’t Sexist.” Flavorwire (2013).

Thompson, Neil, and Douglas Hanley. “Science is shaped by wikipedia: Evidence from a randomized control trial.” (2018).

VIDA Count https://www.vidaweb.org/the-count/

Yagci, Nurce, Sebastian Sünkler, Helena Häußler, and Dirk Lewandowski. “A Comparison of Source Distribution and Result Overlap in Web Search Engines.” arXiv preprint arXiv:2207.07330 (2022).

Yong, Ed. “I spent two years trying to fix the gender imbalance in my stories.” The Atlantic 6 (2018).

THE WEDNESDAY INDEX: Individual pages

“Adult”
“Architecture”
“Art”
“Beauty”
“Government”
“Human body”
“Justice”
“Knowledge”
“Language”
“Life”
“Light”
“Mathematics”
“Medicine”
“Ornithology”
“Peace”
“Philosophy”
“Physics”
“Poetry”
“Political history of the world”
“Rainbow”
“Reality”
“Science”
“Society”
“Theatre”
“Time”
“Universe”
