Recording Gender: An Ethical Cataloging Conundrum

Nov 18, 2022

From a talk for a panel, “Authors, Authority, and Identity: Facilitating Self-Identification for Discovery and Inclusive Knowledge Production,” given for Harvard University Libraries on November 3, 2021. [Link to recording]

Abstract

Over the past decade, Resource Description and Access (RDA) introduced new cataloging instructions, models, and goals to the library profession, leading catalogers to record gender and gendered information about creators and contributors in library metadata. But for what purpose? And at what risk, and to whom? This talk will review the history of this practice and its use, discuss the risks and potential harm caused by recording gender in library metadata, as well as explore possible solutions through which to facilitate and promote diversity and inclusion in library collections through metadata.

Introduction

Thank you for the opportunity to speak about a topic that I have urged the field to examine for nearly a decade: how and why catalogers record an author or contributor’s gender in their name authority record. I am going to use my time today to look at the history of this practice and its use, discuss the risks and potential harm caused by recording gender in library metadata, and explore possible solutions through which to facilitate and promote diversity and inclusion in library collections through metadata.

For most of my career, I have kept one foot planted in practical library cataloging, and the other foot planted in the future of library metadata. I was trained as an AACR2 and DCRM(B) cataloger, but became an RDA and then BIBFRAME trainer. I catalog rare books, and have designed ontologies. I have been a BIBCO cataloger, and continue to be a NACO contributor. I have participated in test pilot projects through the PCC Task Group on Entity Management in NACO, and I have worked to make ethical and inclusive recommendations for catalogers to record gender in name authority records. Now I am at Bard College as the Systems and Metadata Librarian, where my feet have been brought back to the ground of cataloging reality.

Bard likes to say it is “a place to think,” so today I am presenting a think piece for you. This will be a summation of nearly a decade of research, work, and advocacy with the intention of creating a more inclusive catalog that respects the individual human dignity of the authors whose works we collect in our libraries. It will also offer my thoughts on a path forward as we consider how to adopt and implement the latest version of RDA. I will explore what I’m calling an ethical cataloging conundrum: how do we balance the necessary neutrality of the cataloging project with the unavoidable bias of the individual librarian? How do we facilitate discovery and exploration of the library collection, while rectifying the systemic oppression within our cataloging standards?

I choose to focus on describing gender because it is what I have written the most about and where my work has made the most impact so far. But my work is far from done, and there is a need to continue educating and advocating for change, and training on best practices. Also, if I were to try to tackle all of the systemic oppression in library cataloging standards, I would need an entire course! That is far more than this 25-minute talk can cover, so this talk will center on how catalogers record gender and gendered information in authority data and increasingly in bibliographic data as well.

Recording Gender in Library Data

How the hell did we get here?

In the decade that I have been a NACO cataloger, not once has an author actually wanted me to include their gender in their Name Authority Record when I asked them directly. Yet as catalogers who participate in the NACO program through the Program for Cooperative Cataloging (PCC), we are instructed to record as much information as we can in an author’s record, adhering to RDA and MARC. We are encouraged to create little biographical sketches, so that we can achieve the User Tasks outlined in the Library Reference Model (LRM) of the International Federation of Library Associations and Institutions (IFLA): Find, Identify, Select, Obtain, Explore.

The Functional Requirements for Bibliographic Records (FRBR) was published in 1998; the Functional Requirements for Authority Data (FRAD) was published in 2009; and the Functional Requirements for Subject Authority Data (FRSAD) was published in 2010. The “functional requirements” family is not necessarily made up of rules, but rather recommends data models that introduce entity-relationship modeling theory for information-systems design to library metadata. The Anglo-American Cataloguing Rules, Second Edition (AACR2) remained the primary cataloging code until 2010, when it was replaced by RDA. RDA follows the recommendations and entity-relationship models outlined in the FRBR family. In 2017, IFLA published the LRM, an approach that unifies and reconciles FRBR, FRAD, and FRSAD into a single model. RDA and its 3R Project aim to update the standard to adhere to this new IFLA model. We’ll see how that goes ;-)

Prior to RDA, the primary goal of creating authority records was the identification and disambiguation of authors in order to collocate their works under a single “heading” or “authorized access point.” An authority record typically consisted of the preferred name (the heading in the 1XX), variant forms of the name (tracing fields in the 4XX & 5XX), related works, and notes on sources where the data was found (often recorded in multiple 670 fields). Biographical or contextual information was not typically recorded until 2001 when the MARC Authority 678 tag was introduced, although I have found that catalogers will still get very creative with their 670 fields and record extensive biographical and personal information (about gender in particular) to justify a name change for an author. Since the introduction of FRAD (now LRM) and its codification in RDA, catalogers have been given formal elements with which to record all kinds of Personally Identifiable Information (PII). These are:

Name of the person
Date associated with the person
Title of the person
Fuller form of name
Other designation associated with the person
Gender
Place of birth
Place of death
Country associated with the person
Place of residence, etc.
Address of person
Affiliation
Language of person
Field of Activity of the person
Profession or occupation
Biographical Information
Identifier of Person

It is still not entirely clear how all the new elements were determined for FRAD and therefore RDA. The only insight into this is from a paper by Gordon Dunsire published in 2018 where he writes, “Why do content standards cover gender in the first place? RDA does it because FRAD did it; FRAD did it because established name authority control systems do it.” This essentially makes the case of “if it ain’t broke don’t fix it,” and then makes it part of international cataloging rules! What harm could there be?

Recording Gender in Practice

As was seen in early LC training for NACO RDA Authority Records, catalogers were coached to record gender and to make a “safe assumption” based on the author’s name and biographical information. And although RDA does not ask for or require dates associated with gender, MARC permits it and catalogers were initially encouraged to record date ranges for gender as well. Once RDA was fully adopted and implemented by the cataloging community, gender information was added to new and existing NARs in the LC Name Authority File. Catalogers added gender for long-deceased authors and for fictitious characters, and created extremely detailed gender information for trans* authors. There emerged an almost compulsive desire to complete all the fields in a record. But as I’ve said before: just because you can doesn’t mean that you should.

The early RDA rules restricted catalogers to the RDA Vocabulary Encoding Scheme (VES) for Gender, which only included three terms: Male, Female, and Unknown. I submitted an RDA FastTrack proposal at the CC:DA meeting during ALA Annual 2015 to add the term “Transgender” to the VES for Gender in RDA. That proposal was ultimately rejected, and replaced with another proposal by the Joint Steering Committee of RDA (now the RDA Steering Committee or RSC) to completely deprecate the VES for Gender in the content standard. That proposal was accepted in February of 2016. The current instruction for Gender in RDA states:

Gender is the gender with which a person identifies.

Take information on gender from any source.

Record the gender of the person, using an appropriate term in a language preferred by the agency creating the data. Select a term from a standard list, if available.

With this new instruction, PCC catalogers needed new guidelines on how to apply it, so PCC formed the Ad Hoc Task Group on Gender in Name Authority Records. I chaired this group with my esteemed colleagues Matthew Haugen, John Hostage, Nancy Sack and Adam Schiff. We submitted our report and set of recommendations to the PCC on October 4, 2016. The task group made recommendations for terminology to use when recording gender in name authority records, as well as encouraged catalogers to be more inclusive of gender diversity in the metadata, emphasizing self-identification and privacy when describing persons. At first our recommendations were well received, but PCC leadership wanted membership to have an opportunity to provide feedback so we issued a survey. The results of the survey were overwhelmingly positive as well, with a few insightful critiques specifically around issues of internationalization and gendered languages.

However, our recommendations have yet to be fully adopted and implemented. They have been enacted as best practice by the wider cataloging community, but not codified in LC’s DCM-Z1 or in RDA as a Policy Statement, since our recommendations were predicated on the approval of some new terms for the LC Demographic Group Terms thesaurus that we submitted in November and December of 2017. We are still waiting on those proposals to be approved.

Since the original report was published, language for gender has evolved extensively and additional controlled vocabularies have been approved to use in MARC, providing more options to accurately describe gender diversity. PCC also adopted new and bold strategic directions that center diversity and inclusion. As a result a new Advisory Committee on DEI was formed. This new committee has revived the Ad Hoc Task Group on Gender in NARs and charged it with creating an updated report. I am again chairing this group with a few new members to add to our expertise and lived experience. We hope to have a new report finished by ALA Midwinter 2022. Stay tuned.

Use Cases for Recording Gender

But why record gender? For what use? The use cases or scenarios touted as rationale for recording gender in NARs fall into two broad areas of the LRM User Tasks:

Find & Explore

These are mainly patron-side concerns and deal with identification and discovery. A user should be able to distinguish among persons of the same or similar name, or identify persons known by phrases, pseudonyms, initialisms, ambiguous names, or unfamiliar names. Searching could also be limited to a particular gender (e.g. female composers), therefore enhancing exploration and discovery.

A user can distinguish among persons of the same name within the context of their works. If one Jane Smith writes about horses and the other writes about chemistry, and I know that I am interested in horses — I can confidently deduce and find the Jane Smith that writes on that equestrian topic. I do not need to know her gender to do this.

As for limiting searches by the gender of the author, while that sounds like a very promising idea, we are far from that reality with our current MARC-based library catalog systems. They simply don’t use authority data in that way, so much so that catalogers now add gendered information into bibliographic records in the 386 field for Creator/Contributor Characteristics. I will reiterate: just because we can doesn’t mean we should. I don’t believe that this is the role of the library catalog. There is not enough data in bibliographic or authority records to make this kind of searching precise. PCC catalogers have only been recording gender for about a decade, and we have decades’ worth of NARs, hundreds of thousands of them, that would need to be updated to include gender information for that kind of searching to ever be accurate. Besides, while it would be pretty cool to find all the female poets from the 19th century in a library’s collection, it would be even cooler to find all the Black trans* poets from the 20th century. But we don’t record race or ethnicity in a consistent way, so the catalog fails here. Wanting a library catalog to do this kind of searching based on authority or bibliographic data is an impossible project. Let’s leave this kind of work to the encyclopedias and bibliographies. An alternative, however, could be to utilize linked data from sources such as Wikidata. Wikidata provides an open platform where anyone can contribute and edit information about anything, and it’s proving to be far more precise, flexible, and accessible than the LC Name Authority File. The goal of the catalog is different from that of Wikipedia and Wikidata, but I hope that one day we can utilize linked data to inform the catalog and do the kind of sophisticated searches that would make librarian dreams come true.

Identify & Select

These are primarily cataloger concerns and deal with disambiguation and accuracy. The argument is given that recording gender will help disambiguate authors with the same name, and identify persons known by phrases, pseudonyms, initialisms, ambiguous names, or unfamiliar names for the cataloger. The gender element provides contextual information that will facilitate selecting the correct authorized access point to record in a bibliographic record.

Catalogers are smart. They don’t need to know the gender of the author to identify and select the correct name for their cataloging. The argument has been given that gender is helpful for non-roman character transliteration. The PCC Ad Hoc Task Group on Gender in NARs interviewed CJK catalogers, and found that they really don’t use the gender element in this way, and that they are able to disambiguate authors of the same name primarily through other elements in the authority record and their works. So we don’t need gender for this reason.

Risks, Harm, and the Catalog’s Role

What risk or harm could result from recording gender in name authority records? And what role does the library catalog have with this kind of personal information?

PII

Many of the RDA elements for persons are considered Personally Identifiable Information (PII) according to the National Institute of Standards and Technology. These are: Fuller Form of Name, Address of Person, Date Associated with the Person, and Place of Birth. Other elements are PII to a lesser degree but are still potentially identifying: Name of the Person, Gender, and Affiliation (specifically if race or ethnicity is recorded, or the person’s school or workplace). Once this data is published as open linked data, we can’t control where that data goes or what is done with it. So catalogers have to think very carefully before we record something about a person. Before saving or updating a record in the LC NAF, we should all ask ourselves: is there anything in this record that could be risky or potentially harmful to this person if it is publicly available?
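The pre-save question above can be sketched as a small review check. This is only an illustration in Python, not an existing tool: the dictionary-based record, the function name `flag_for_review`, and the sample values are hypothetical, and the two groupings simply restate the lists given above.

```python
# Hypothetical sketch: flag recorded RDA person elements for human review
# before a record is saved. The groupings restate the two lists in the talk;
# they are not an official NIST or PCC classification table.

PII_ELEMENTS = {
    "Fuller form of name",
    "Address of person",
    "Date associated with the person",
    "Place of birth",
}

POTENTIAL_PII_ELEMENTS = {
    "Name of the person",
    "Gender",
    "Affiliation",
}


def flag_for_review(record: dict[str, str]) -> dict[str, list[str]]:
    """Return the recorded elements a cataloger should reconsider before saving."""
    # Only elements that actually carry a value count as recorded.
    recorded = {name for name, value in record.items() if value}
    return {
        "pii": sorted(recorded & PII_ELEMENTS),
        "potential_pii": sorted(recorded & POTENTIAL_PII_ELEMENTS),
    }


# Placeholder draft record; the values are invented for illustration.
draft = {
    "Name of the person": "Smith, Jane",
    "Place of birth": "Example City",
    "Gender": "",
    "Profession or occupation": "Chemists",
}
flags = flag_for_review(draft)
```

A check like this can only prompt the question. Whether a given value is actually risky still depends on the person and on what they have consented to share.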

This is a great opportunity to communicate directly with authors to get their consent about what information goes into their authority record. Most authors that I have reached out to via email are excited and delighted at the opportunity to work with librarians to make sure their authority record accurately and respectfully reflects their identities. It’s a wonderful opportunity for outreach and collaboration. In a small library like Bard this addition to the Name Authority workflow isn’t too onerous, but I could see it not scaling with the needs of production at larger institutions. But then you have to ask what is more important: production, or respecting authors’ identities? I think it’s well worth it to always choose respect and redesign the workflow accordingly.

Concerning Trans*

Gender identity, the vocabulary used to describe it, and the degree to which individuals are able to and choose to disclose it, are complex, contextual, personal, and subject to change over time and in different environments and jurisdictions. Trans* and non-binary individuals in particular are more likely to experience negative consequences (such as discrimination, psychological harm, and violence) as a result of being misgendered (when a gender identity is incorrectly imposed on them by someone else), “outed” (when their gender identity is disclosed by someone else without their consent), or “deadnamed” (when a given or birth name is disclosed or used without their consent), whether intentionally or not. As library data is increasingly opened and repurposed, there could be additional unforeseen and potentially irreversible ramifications for specifically recording gender information.

We should treat trans* authors with the same respect that we treat all authors who simply change their name at some point for whatever reason. There is no need to go into the how and why the name change was necessary. Just yesterday I had the opportunity to update the authority record for Bard’s own Lucy Sante who recently announced her transition and changed her name. I was able to cite her Wikipedia article as the source, and simply update her name.

The Role of the Catalog

I maintain a conservative view on the role of the library catalog and the function of the data held within it. Bibliographic data is essentially an inventory of a library’s collection. Authority data is for disambiguation and collocation. It is not the role of the cataloger to determine and record Personally Identifiable Information (PII) in authority and bibliographic data. Until we have open tools that allow for equitable accessibility and transparency of metadata — specifically meaning that we no longer use OCLC Connexion to create NARs and maintain the NAF and we move to Wikidata — I believe we should return to an AACR2 style of authority control. We need to ask ourselves what is the function of the NAF and why are we trying to replicate what is already being done in Wikidata?

Possible Solutions

There are two possible solutions to this cataloging conundrum. We can continue to record gender but do so as ethically and respectfully as possible, or we don’t record gender at all.

Record gender ethically and respectfully

This will essentially be a continuation of what we’ve been doing since the 2016 report was published.

Record information about gender as the person self-identifies and explicitly discloses, taking information from readily and publicly available sources.
Prefer using terms from a controlled vocabulary, such as LCDGT or MeSH, recording the source in subfield $2.
Take into account the following considerations:
Is there potential for this information to harm the [person] through outing or violating the right to privacy?
Is there an indication that the [person] consents to having this information shared publicly?
Will including this information help a library user in the search process?

If no suitable term is found in the LCDGT or MeSH, use a thesaurus like the Homosaurus. The Homosaurus is an international linked data thesaurus by and for the LGBTQ+ community. It has extensive coverage of gender-diverse terminology. The Homosaurus has an approved Subject Source Code for MARC: $2 homoit.
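The vocabulary preference described above can be sketched as a small lookup. This is a hedged illustration in Python, not a PCC tool: the function name `build_375`, the in-memory term lists, and the sample terms are placeholders. The source codes `lcdgt` and `mesh` are assumed here for those two vocabularies; only `homoit` is given above. The output is a display-style string for MARC field 375, with the term in subfield $a, the source in subfield $2, and `##` as an assumed display convention for blank indicators.

```python
from typing import Optional

# Preference order from the guidance above: LCDGT or MeSH first, then Homosaurus.
# "homoit" is the source code cited in the text; "lcdgt" and "mesh" are assumed.
VOCABULARY_ORDER = [("LCDGT", "lcdgt"), ("MeSH", "mesh"), ("Homosaurus", "homoit")]


def build_375(term: str, vocabularies: dict[str, set[str]], consented: bool) -> Optional[str]:
    """Return a display-form 375 field, or None when nothing should be recorded."""
    if not consented:
        # Self-identification and consent come first: record nothing without them.
        return None
    for name, source_code in VOCABULARY_ORDER:
        if term in vocabularies.get(name, set()):
            return f"375 ## $a {term} $2 {source_code}"
    # Term not found in any listed vocabulary; this sketch leaves that case
    # to the cataloger's judgment rather than emitting a field.
    return None


# Placeholder term lists; real vocabularies would be looked up, not hard-coded.
sample_vocabularies = {
    "LCDGT": {"Example term A"},
    "MeSH": set(),
    "Homosaurus": {"Example term B"},
}

field = build_375("Example term B", sample_vocabularies, consented=True)
```

Putting the consent check first is a deliberate choice in this sketch: it mirrors the considerations listed above, so that no vocabulary lookup happens for information the person has not chosen to disclose.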

If we continue to record gender in NARs, it is an opportunity to work directly with authors or publishers to get the necessary metadata about the person as they desire to be represented, seen, and recorded. By using the words that authors use to describe themselves, we can more accurately facilitate access to our diverse collections and amplify authors’ identities in their own voices. To do this will require an industry-wide workflow change where publishers and authors work together to create or provide authority data to libraries. If we thought vendor-provided bibliographic data was problematic, I can’t imagine what their authority data would look like. So, while this would be the ideal scenario, there must be reasons why we can’t utilize author questionnaires from publishers for library authority records. Unfortunately those reasons elude me. Perhaps we’ll find some answers today!

Don’t record gender at all

In some ways this is the “nuclear option.” If we can’t agree on how to record gender and for what purpose we are recording it, then why is it even necessary to record in the first place? I honestly don’t know if I entirely agree with this option, but it is the simplest solution. The risks far outweigh the “benefits” and we haven’t even seen the “benefits” come to fruition. As far as I know nothing is being done with gender metadata in library authority records to accomplish any kind of LRM User Tasks. We didn’t have this feature 10 years ago, and we still haven’t done anything with that data.

Arguably, if we’re going to take a stand for DEI, no longer recording gender and removing the 375 field from existing NARs is a concrete one. It would dismantle the gender binary in library data completely and allow authors to be represented as individuals and have their works stand for themselves. The intellectual creation, the work of an author, has little if anything to do with their gender. Our conceptions of gender are simply in relation to our cultural constructions of normality. If we remove the notions of gender, the author is a human who created something in this world that deserves description and access. It is our role as catalogers to provide that description in a respectful and neutral way to ensure equitable access.

Conclusion

I admit to having a very traditional view of the library catalog. In 1876, Charles Ammi Cutter outlined his objects of a dictionary catalog, and stated that a catalog is essentially an inventory of what is held in a library, and it should allow a user to find things based on a given title, author, or subject. As librarians we can’t define everything, and besides, defining everything is a constraint in and of itself.

It is a conundrum. On one hand I do see the benefits of being able to search for authors by their gender, race/ethnicity, religion, time active, occupation, etc., but I don’t really think that is the function of the catalog. Leave that to the bibliographies or encyclopedias. Perhaps this is a feature we can have to enhance the catalog through the integration of Wikidata, but I don’t believe that it should also be the function of the bibliographic and authority data as well. Our data doesn’t have to do everything, but if anything it shouldn’t continue to cause harm.

Why do I care about this one little element so much? Because it’s about actually rectifying 150 years of oppression. Doing something that we can actually do to be more inclusive. And it affects my community, myself, my peers and my colleagues. Because I am not a static thing, but an always evolving being. Though I identify one way now, it may not mean that I will always identify that way. If this pandemic has taught us all one thing, it is that we cannot predict the future. And I certainly don’t want my past forever etched in my authority record, if I ever have one ;-)

Moral of the story: If you kick the hornet’s nest, you’ll be asked to chair the committee.

Follow up

Since this talk was given, the PCC Ad Hoc Task Group on Recording Gender in Name Authority Records issued its revised report that was accepted and adopted by the PCC. The report instructs catalogers to no longer record gender in name authority records — we went for the nuclear option. A new committee has been formed to answer outstanding questions about recording gendered information in bibliographic records.