Improving data and policies to support LGBTQ+ people in STEM

Shane Coffield
Published in SciTech Forefront
Nov 3, 2023

Shane Coffield, Kolin Clark, Anna Dye, Colbie Chinowsky, Briana Niblick, Marco Reggiani, Bryce Hughes, Alfredo Carpineti, Randall Hughes, Lauren Crawford, LeManuel Bitsóí

As growing attention is placed on demographic data collection and diversity, equity, & inclusion (DEI) in STEM, the authors of this report have been working together across the US and UK to build a knowledge base on LGBTQ+ representation in STEM. Through this collaboration, they reveal key needs and recommendations for universities, workplaces, professional societies, funding bodies, and government agencies working to build STEM environments where LGBTQ+ people can thrive.

EXECUTIVE SUMMARY

  • Growing research shows that STEM environments are often hostile to LGBTQ+ people, leading to underrepresentation of these communities.
  • Data on LGBTQ+ representation and experiences is still sparse. Demographic information on sexual orientation and gender identity (SOGI) is not yet widely collected in the way that other categories like race/ethnicity and sex are collected.
  • The lack of data makes it difficult to track LGBTQ+ representation and progress over time — particularly for specific groups within the diverse LGBTQ+ umbrella — let alone to track the effectiveness of policy interventions aimed at improving inclusion in STEM.
  • Increased SOGI data collection (e.g., by universities and science agencies) is urgently needed. However, data collection itself is not a cure for the challenges and barriers that LGBTQ+ people face. It should be approached thoughtfully, as a tool to help drive policy change that cultivates equity, cultural transformation, and radical inclusion of LGBTQ+ people across disciplines.

PROPOSED FRAMEWORK

The topic of supporting LGBTQ+ people in STEM demonstrates how science and policy can be interwoven (Figure 1). Development of LGBTQ+ inclusive STEM policies (“policy for science”) should be based on a scientific knowledge base — both quantitative and qualitative — about these demographics in STEM (“science for policy”). Further, that knowledge base should be built on ethical and rigorous research practices, with special attention to trust, privacy, and data security (“policy for science”). Most importantly, this entire process should be iterative and improve over time, through co-production with LGBTQ+ communities in all stages.

Figure 1 is a diagram with 4 boxes arranged vertically, describing the interconnected science and policy components of the article. Starting at the bottom, a box describes research policies and best practices. This leads upward to a box about the scientific knowledge base, including SOGI data and LGBTQ+ experiences. The knowledge base leads upward to developing and revising STEM policies, which then leads to the ultimate goal at the top: building and sustaining inclusive STEM environments.
Figure 1: Conceptual framework for collecting and leveraging data to build LGBTQ+ inclusive STEM environments

BACKGROUND

The current knowledge base

Recent years have seen a growing knowledge base on the challenges that LGBTQ+ people (lesbian, gay, bisexual, transgender, queer/questioning, and other sexual and gender minorities, such as intersex and asexual people) face in STEM fields. STEM fields in particular can be hostile because of cultural norms of masculinity and of avoiding topics seen as personal or political (e.g., Hughes, 2018).

LGBTQ+ people, particularly trans individuals, often consider leaving their STEM jobs (Institute of Physics, Royal Astronomical Society, and Royal Society of Chemistry, 2019) and are more likely to face career limitations, harassment, and professional devaluation (Cech & Waidzunas, 2021).

Moreover, it is critical to emphasize that LGBTQ+ communities are extremely diverse, with oppression compounding uniquely for LGBTQ+ people of color, people with disabilities, and members of other historically underrepresented and marginalized groups (e.g., Gonzales & Duran, 2023; Cech, 2022; Miller & Downey, 2020).

Gaps in the knowledge base

Although research on LGBTQ+ populations in STEM has been increasing, this research is still relatively new. One reason LGBTQ+ populations are so understudied is the way people are included and represented in data collection through demographic categories. Unlike race/ethnicity or age, sexual orientation and gender identity (SOGI) are not (yet) consistently collected on large-scale quantitative surveys such as the US Census or the National Science Foundation Survey of Earned Doctorates.

Qualitative studies are important for understanding lived experience (i.e., perspectives, personal identities, and histories; see HHS’s explainer) as it relates to people’s pathways through STEM fields and institutions. Approaches like private structured interviews, focus groups, and open-ended surveys sometimes collect fully intersectional demographic information (e.g., SOGI, race, gender). Current research, however, does not always report participants’ overlapping identities in detail, because of small numbers of participants, a lack of intersectional approaches, or ethical concerns. Further, qualitative research is often undervalued in policy discussions because it is perceived as lacking rigor or broad applicability to populations.

The lack of data leaves significant gaps in understanding when and why people leave STEM, including disparities across STEM subfields and the experiences of specific groups within LGBTQ+ communities (e.g., transgender or non-binary individuals and LGBTQ+ people of color). Moreover, the lack of demographic data on LGBTQ+ representation makes it harder to develop data-driven policies, implement retention strategies, and measure progress on diversity, equity, and inclusion in STEM.

Social and political climates

The lack of SOGI data collection and consideration is directly related to the broader political and social contexts in the US, UK, and beyond. In our experience advocating for increased SOGI data collection, for example, we found that institutions and governments have often actively resisted it. This resistance can stem from external and internal pressures, such as the politicization of LGBTQ+ identities and efforts to vilify LGBTQ+ people or undermine the validity of their identities. The problem is particularly evident in the 500+ U.S. state bills targeting trans youth, trans healthcare, and discussion of LGBTQ+ issues in the education system (see ACLU bill tracker). Similarly hostile political climates continue to develop in other countries such as the UK; see, for example, the rise in hate crimes and anti-trans political rhetoric. This context makes SOGI data collection more challenging, given LGBTQ+ people’s justified distrust of their institutions and their safety concerns about potential data misuse.

Governmental progress on data collection

On the positive side, pressure and momentum have been building, with many institutions and agencies finally beginning to recognize the value and utility of collecting SOGI data. Most notably, the US Office of Science and Technology Policy and the National Science and Technology Council have recently mandated the creation of SOGI data action plans across agencies in their Federal Evidence Agenda on LGBTQI+ Equity. Beyond this, there have been no wide-scale requirements to include SOGI in governmental data collection in the US or UK.

Education agencies in both the US and UK have been collecting SOGI data in large surveys for the past decade (e.g., US National Center for Education Statistics; UK Higher Education Statistics Agency), but they have not been given the analytical resources, guidance, or imperative to use this data for policy implementation.

Beyond the educational system, census agencies are the key collectors of population-scale demographic data. Such population-level data is a critical benchmark for assessing LGBTQ+ representation in specific organizations or settings. In the UK, the English and Welsh census (ONS sexual orientation and gender identity) and the Scottish Census (NRS sexual orientation and gender identity) have included SOGI items, representing a watershed moment for national measurement of LGBTQ+ populations.

Conversely, the US lags behind. In the absence of a mandate or authorization to collect SOGI data, the US Census and American Community Survey (the federal government’s two largest surveys) measure only same-sex partner cohabitation, which is an inadequate measure of sexual orientation. Notably, gender identity is not collected at all. On the bright side, the US Census Bureau has added SOGI questions to the Household Pulse Survey and recently requested approval to incorporate SOGI items into the American Community Survey, marking a major step forward for large-scale SOGI data collection in the US.

Finally, in the context of assessing representation in STEM, some major governmental science funders, including the US National Science Foundation (NSF) and UK Research and Innovation (UKRI), have not yet collected SOGI data in their relevant surveys. The lack of data from these government bodies is a major barrier and creates a circular problem. In the case of NSF, the lack of data prevents LGBTQ+ identities from being classified as underrepresented or marginalized. This in turn removes the justification for including SOGI in demographic surveys and prevents the allocation and use of federal resources to address underrepresentation or marginalization. However, the NSF has recently committed to including SOGI questions in its surveys by 2026.

Moving forward with intention

While the increasing momentum around SOGI data collection offers promise for the future of LGBTQ+ inclusion in STEM, it raises key questions about who gets counted and which social categories are considered in intersectional analyses. Many surveys, for example, lack the statistical power or structure to adequately collect information on the most marginalized groups within “LGBTQ+”, especially trans and non-binary individuals.

As more surveys and data collection efforts are designed, it is essential that they be structured to enable analyses that identify disparities for people who experience intersecting systems of oppression, most notably LGBTQ+ people of color. Data collection and analysis that are not intersectional and inclusive risk perpetuating power structures that reproduce oppression and exclusion in STEM.

In this double-edged context of growing momentum around SOGI data collection and increasingly hostile political climates, we provide a set of recommendations for institutions embarking on SOGI data collection. These recommendations address both the data itself and the broader goal of cultivating STEM ecosystems where LGBTQ+ people can thrive.

This report summarizes the findings of ongoing collaborations between US and UK stakeholders, and serves as an expansion of the Wilton Park report titled “Data for Retention: Addressing under-representation of LGBT+ minorities in STEM.”

THE NATURE OF SOGI DATA

SOGI data is demographic information about sexual orientation (e.g., lesbian, gay, bisexual, pansexual, asexual, etc.), gender identity (e.g., man, woman, non-binary, transgender, genderfluid, genderqueer, etc.) and sometimes biological sex (e.g., male, female, intersex). These identities are complex, overlapping, and can evolve over time, requiring thoughtful consideration in their collection and analysis. Universities, professional societies, workplaces, funding agencies, and other government agencies embarking on data collection need to recognize SOGI data as:

Relational. Demographic data can be messy because it relies on self-reported identities that may be in flux. Individuals have a deep and sometimes complicated relationship with their own identities, and that relationship can change depending on the context of the survey and their current personal circumstances. Inaccuracies in responses can be minimized when respondents clearly understand why SOGI items are included, how the data will be used, and how it will be kept safe.

Temporal. As queer and trans identities are rooted in breaking down fixed binaries of gender and sexual orientation, the labels individuals use to describe themselves may be fluid and change over time. The dynamic nature of these identities and of the language describing them can make longitudinal data collection and analysis more challenging. However, treating individual surveys as snapshots of the community and analyzing shifts in populations over time can provide invaluable information about the dynamic nature of identity. This fluidity should be viewed as a benefit of collecting the data rather than a barrier to it.

Contextual. Disclosure of SOGI identities will vary with the context of the survey, the institution administering it, and the individual’s trust in that institution. Clear communication and institutional trust should therefore be emphasized to support accurate data collection. However, results should not be seen as a fixed or complete representation of the entire LGBTQ+ community, as some individuals may not feel safe or able to disclose their identities in a given context.

Political. LGBTQ+ communities continue to experience discrimination, oppression, and marginalization, including ideological attacks from society, politicians, media figures, and anti-LGBTQ+ organizations. It is therefore important to safeguard SOGI data from politicization and misuse. Low numbers of people identifying as LGBTQ+ in surveys (which may reflect temporal and/or contextual factors) might affect the resources allocated to LGBTQ+ people and distort how LGBTQ+ communities are seen. An in-depth risk assessment should be conducted to ensure that SOGI data is used for policy and advocacy. It must not be used for nefarious purposes that further stigmatize, traumatize, marginalize, oppress, and discriminate against the community. SOGI data should be used only to serve the community, never to harm it further. At the same time, the political nature of SOGI data should not be used as a justification for not collecting it, particularly given the cultural tendency of STEM fields to brand themselves as apolitical.

Sensitive. The safety of LGBTQ+ individuals should be at the forefront of any data collection. The format and wording of SOGI items should encourage participation rather than risk inflicting harm on the community. Proper care should be taken, in consultation with the community, to minimize harm in data collection. Storage and preservation of SOGI data are also of utmost importance. Data should be kept confidential and anonymized under strict protocols to prevent the “outing” of individuals, which can expose them to direct or indirect harm. This risk should not, however, become an excuse for institutions not to collect SOGI information. Instead, institutions should be more intentional, respectful, and thorough in making demographic data collection secure and more valuable.

A conceptual diagram arranged as five circles describing SOGI data. It summarizes the above text paragraphs, including SOGI data as temporal, relational, sensitive, contextual, and political
Figure 2: Five characteristics of SOGI data
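
One way to act on the storage point under Sensitive is to keep SOGI responses in a separate table from direct identifiers such as names and email addresses. The two tables are then linked only by a keyed pseudonym. The sketch below is a minimal illustration under stated assumptions: the field names and the in-memory key are hypothetical, and it uses HMAC-SHA256 from the Python standard library. This is pseudonymization rather than full anonymization, because anyone holding both the key and the identity table can re-link the records.

```python
import hashlib
import hmac
import secrets

# Hypothetical key, generated here for illustration only. In practice it would
# be held in a key-management system, apart from both tables below.
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonym(person_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Derive a stable pseudonym from a direct identifier with HMAC-SHA256."""
    return hmac.new(key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


def split_record(record: dict, key: bytes = PSEUDONYM_KEY) -> tuple[dict, dict]:
    """Split one survey record into an identity row and a SOGI row.

    The rows share only the keyed pseudonym, so the SOGI table on its own
    does not show who gave which answer. Field names are hypothetical.
    """
    pid = pseudonym(record["email"], key)
    identity_row = {"pseudonym": pid, "email": record["email"], "name": record["name"]}
    sogi_row = {
        "pseudonym": pid,
        # None records a skipped or declined item rather than a guess.
        "gender_identity": record.get("gender_identity"),
        "sexual_orientation": record.get("sexual_orientation"),
    }
    return identity_row, sogi_row


identity, sogi = split_record({
    "email": "respondent@example.org",
    "name": "A. Respondent",
    "gender_identity": ["Non-binary"],
    "sexual_orientation": None,
})
print(sorted(sogi))  # ['gender_identity', 'pseudonym', 'sexual_orientation']
```

In a real deployment, the key and the two tables would be held by different people or systems with separate access controls. The split alone does not prevent re-identification through very small groups, which is why the small sample sizes flagged under recommendation 7 need separate protection.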

What SOGI data collection can do

Meaningful SOGI data collection is urgently needed in STEM fields and institutions to address systemic discrimination and oppression. Collection of demographic data in ways that are fully intersectional and inclusive is an important step toward (1) asserting the legitimacy of LGBTQ+ identities and highlighting the presence of LGBTQ+ people, (2) understanding when and why LGBTQ+ people are entering and leaving STEM fields or changing STEM career paths, (3) effectively allocating resources to address disparities, and (4) developing data-driven policies to improve STEM environments.

What it cannot do

SOGI data collection, like other forms of demographic data collection, is one part of a multi-faceted approach to the challenges facing minoritized and underrepresented communities in STEM. The data itself does not create trust or change policies. On the contrary, improper collection and analysis can cause harm, lead to more groups feeling excluded, or exacerbate inequities. As summarized by Kevin Guyan in Queer Data,

“A society with more data about LGBTQ people is not necessarily a society that is better for LGBTQ people.”

In order to achieve equity and authentic inclusion, data collection must be purposeful, inclusive, longitudinal, secure, and followed by critical analysis and transformative actions to remedy social inequities.

RECOMMENDATIONS

Institutions and agencies should begin collecting SOGI data as one part of a holistic approach to improving diversity, equity, and inclusion (Figure 1). They should first consider the existing knowledge base and their own capacity for change, followed by policy development alongside institutional data collection, with LGBTQ+ communities involved throughout the process.

1. Advance societal education and awareness on the issues. LGBTQ+ oppression should be understood as one facet of how white, straight, cisgender male-dominated Western and colonialist ideologies have historically underpinned modern scientific institutions. Addressing LGBTQ+ underrepresentation and marginalization in STEM must involve understanding intersectionality and the interlocking systems of oppression and privilege.

This work must be rooted in anti-racism and center communities of color to ensure culturally relevant approaches. Institutions should recognize the current societal realities, which include violence toward LGBTQ+ communities of color and attempts to delegitimize or leverage LGBTQ+ identities, particularly trans identities, for political gain.

Data collection can be well intentioned and still be used to harm queer and trans people in certain political or regional contexts (e.g., state audits of gender-affirming care information in Florida). Queer theory can provide a useful lens for understanding the issues and the social and historical context shaping LGBTQ+ participation in STEM. The existing literature and data on LGBTQ+ experiences should also be considered before asking more of the communities (additional reading and resources are provided in our Resources document).

2. Conduct an internal audit of available resources and capacity for change. Prior to collecting SOGI data, an institution should have the resources to accurately and safely collect sensitive personal information, analyze it, distribute the results, and enact policy interventions. This process requires the full backing of the institution. That includes engagement from information technology and institutional research staff, communications departments, and various levels of leadership, as well as compensation for LGBTQ+ community partners (see #5). Resources should also include accountability mechanisms to ensure that data collection protects those at risk and leads to action as necessary.

3. Make changes based on what is already known. Existing research has already identified some critical short-term policy needs for LGBTQ+ inclusion in STEM. Needs include widespread access to single-user or gender-inclusive restrooms, respect for pronouns and name changes, anti-harassment policies with consequences, inclusive travel policies that are sensitive to logistical needs and geographic risks, gender-affirming healthcare policies, allyship training, mentoring and networking resources, funding for LGBTQ+ employee resource groups (ERGs), resource centers, and other supportive measures. Institutions and agencies should also fund continued research on LGBTQ+ representation and experiences in STEM.

Such actions can benefit all groups, especially where multiple forms of discrimination and marginalization exist. Perhaps most importantly, data collection is not necessary to begin these critical changes that can improve the daily life and success of LGBTQ+ individuals (see Resources document).

4. Design data collection with purpose. Organizations should be honest about their motivations for SOGI data collection — whether that be for advancements, productivity, health outcomes, or moral imperative. Avoid data collection for its own sake, especially considering the risks and costs involved for LGBTQ+ communities. Leadership should approach collection efforts with intention to empower structural change of institutions, to minimize harm, and to leverage data as evidence.

Data collection should be seen as a tool for diagnosing systemic inequities and reshaping institutional cultures, rather than a tool for “fixing” oppressed and marginalized people. For example, policies built around increasing the resilience of LGBTQ+ people in STEM, while well intentioned, place the onus on the individual rather than acknowledging the system that required them to cultivate this resilience in the first place.

Organizations should consider which survey questions and methodologies are needed. For example, the question of “sex assigned at birth” is a topic of disagreement among statisticians and may not be necessary in most circumstances where gender is the primary demographic of interest. If quantitative data is desired, surveys must have sufficient sample size and statistical power to answer the question being asked. When sample sizes are insufficient, disaggregation and other techniques, if done safely, can help identify the needs of smaller subgroups of the community. That in turn can encourage more participation in data collection.
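
To make “sufficient sample size and statistical power” concrete, the sketch below uses the standard normal-approximation formula for comparing two proportions with a two-sided test. An example would be a hypothetical retention rate among LGBTQ+ respondents versus other respondents. The rates, the 5% significance level, and the 80% power target are illustrative assumptions, not figures from this report.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate respondents needed per group to detect p1 != p2 (two-sided)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under the null hypothesis
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def power_two_proportions(p1: float, p2: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of the same test when each group has n respondents."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    z = (
        abs(p1 - p2) * math.sqrt(n) - z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    ) / math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return _STD_NORMAL.cdf(z)


# Hypothetical retention rates: 80% in one group vs 70% in another.
print(n_per_group(0.80, 0.70))
print(round(power_two_proportions(0.80, 0.70, n=30), 2))
```

Under these assumptions, the formula calls for roughly 300 respondents per group. A subgroup of 30 gives power far below the conventional 80% target. This is one reason small subgroups, such as trans and non-binary respondents at a single institution, often cannot be analyzed reliably on their own.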

5. Involve LGBTQ+ communities in the entire process. Thoughtful co-creation and co-production are key to building capacity, developing trust, and building relationships with communities — both those being consulted and those being surveyed. Given the broader historical context of racism and colonization upon which Western institutions and science were built, LGBTQ+ communities of color should be engaged to ensure culturally relevant approaches. Addressing LGBTQ+ retention in STEM should prioritize intersectionality and include voices from various groups such as women, trans, intersex, and asexual individuals, people with disabilities, and a mix of socioeconomic backgrounds.

Co-creation and co-production should mean co-ownership: equal partnership, from question design to analysis to policy design, with room to learn from mistakes, correct or recalibrate, and improve over time. Data should be disseminated to, and be usable by, LGBTQ+ and other researchers in ways that are ethical and just and that safeguard the confidentiality and anonymity of participants. Co-creation and co-production also require adequate resources for administrative costs and for compensating LGBTQ+ people for their time and effort. This avoids shifting onto marginalized communities the uncompensated burden of addressing systemic oppression, commonly known as “invisible labor.”

Effort should also be focused on redistribution of power and resources by critically addressing power imbalances that exist between institutions, researchers, and underrepresented and marginalized communities. Clear accountability mechanisms should be implemented to ensure that promises made to the communities are fulfilled.

6. Employ rigorous methodologies. SOGI data collection and analysis require balancing scientific and methodological rigor with flexibility and up-to-date approaches to queer identities and the language used to describe them. Rigor includes:
- transparency of data processing;
- mixed qualitative and quantitative methods;
- reproducibility;
- data interoperability;
- sufficient statistical power to provide insight from the collected data;
- relevant techniques such as relational data analysis and cluster analysis for multidimensional demographic data.

Statisticians and data analysts should have both technical and cultural competence, and analysis teams should ideally include staff who identify as LGBTQ+.

7. Ensure privacy and security. Small sample sizes and the sensitive nature of SOGI data require particular attention to individuals’ anonymity and safety. The current social, cultural, and political contexts pose serious risks to LGBTQ+ communities, including violence toward trans people and LGBTQ+ communities of color in particular. As such, collection of SOGI data requires investments in secure data systems, training of information technology and research staff, and institutional commitment to ethical and equitable use of data.
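
One common disclosure-control practice for small samples is small-cell suppression, though this report does not prescribe it. Counts below a minimum threshold are withheld before results are published. The sketch below assumes a hypothetical threshold of 10 and hypothetical counts. Real statistical disclosure control usually involves further rules, for example across the rows and columns of cross-tabulations.

```python
from typing import Dict, Union

SUPPRESSED = "<suppressed>"


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = 10
) -> Dict[str, Union[int, str]]:
    """Withhold counts below `threshold` before publishing a frequency table.

    Primary suppression hides every cell under the threshold (zero counts
    included, a conservative choice). If exactly one cell is hidden, the
    smallest remaining cell is hidden too (secondary suppression). This
    prevents the hidden value from being recovered by subtracting the
    published cells from a published total.
    """
    published: Dict[str, Union[int, str]] = dict(counts)
    small = [category for category, n in counts.items() if n < threshold]
    for category in small:
        published[category] = SUPPRESSED
    if len(small) == 1:
        remaining = [category for category in counts if category not in small]
        if remaining:
            next_smallest = min(remaining, key=lambda category: counts[category])
            published[next_smallest] = SUPPRESSED
    return published


# Hypothetical counts for one department, for illustration only.
example_counts = {
    "Asexual": 4,
    "Bisexual": 37,
    "Gay": 22,
    "Lesbian": 15,
    "Another identity not listed": 12,
}
published = suppress_small_cells(example_counts)
print(published)
```

In this example, the single small cell forces a second cell to be withheld as well. Choosing the threshold itself is a policy decision that should be made with the community and with the institution's data-protection staff.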

8. Use inclusive and flexible language in survey items. Harmonization should be prioritized over standardization. Harmonization involves a flexible yet consistent structure for designing survey items and analyzing results, whereas standardization often employs fixed survey language and rigid analysis of results. General guidelines for SOGI survey harmonization include:
- listing response options alphabetically rather than putting socially privileged identities first by default;
- replacing the ostracizing option “other” with “another identity not listed” or a write-in field;
- making all questions optional, potentially with a “decline to answer” option;
- using checkboxes so respondents can select all options that apply.

Finally, because terminology is constantly evolving, it is paramount to avoid outdated or offensive terms, which can retraumatize respondents.
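
The item-level guidelines above can be encoded directly in survey tooling. The sketch below is a hypothetical data structure, not a published standard. It:
- sorts identity options alphabetically;
- appends a write-in option and a decline option in place of "other";
- keeps the item optional;
- validates select-all-that-apply responses.

The option labels are illustrative only. They would need to be co-designed with LGBTQ+ communities and kept up to date.

```python
from dataclasses import dataclass
from typing import List, Set

WRITE_IN = "Another identity not listed (please specify)"
DECLINE = "Decline to answer"


@dataclass
class SogiItem:
    """A hypothetical select-all-that-apply SOGI survey item."""

    prompt: str
    identities: List[str]
    optional: bool = True  # SOGI items should never be mandatory

    def options(self) -> List[str]:
        """Identities in case-insensitive alphabetical order, then write-in and decline."""
        return sorted(self.identities, key=str.casefold) + [WRITE_IN, DECLINE]

    def validate(self, selected: Set[str], write_in: str = "") -> List[str]:
        """Return a list of problems with a response; an empty list means valid."""
        problems: List[str] = []
        if not selected:
            # Skipping the item is always allowed when it is optional.
            if not self.optional:
                problems.append("a response is required")
            return problems
        unknown = selected - set(self.options())
        if unknown:
            problems.append(f"unknown options: {sorted(unknown)}")
        if DECLINE in selected and len(selected) > 1:
            problems.append("'Decline to answer' cannot be combined with other options")
        if write_in.strip() and WRITE_IN not in selected:
            problems.append("write-in text given without selecting the write-in option")
        return problems


gender_item = SogiItem(
    prompt="Which of the following describe your gender identity? (Select all that apply.)",
    identities=["Woman", "Non-binary", "Man", "Genderqueer", "Transgender", "Genderfluid"],
)
print(gender_item.options())
print(gender_item.validate({"Non-binary", "Transgender"}))  # []
```

Because responses are stored as sets, a respondent can select several identities at once. The validator rejects only combinations that are logically inconsistent, such as declining and answering at the same time.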

Conclusion: Cultivation of diverse, equitable, and inclusive STEM environments would benefit from improved SOGI data collection which is built on trust, privacy, and community involvement. However, data itself will not change the STEM ecosystem; data should not be collected for its own sake, nor should lack of data be used as an excuse for inaction. Rather, data should be used to inform transformative policies and create STEM environments where LGBTQ+ people can thrive.

ACKNOWLEDGMENTS

The collaborations and ongoing work underlying these recommendations have been made possible by the National Science Policy Network, UK Science and Innovation Network, the Wilton Park event on Data for LGBTQ+ Retention in STEM, and the Royal Society of Chemistry. Kolin Clark and Shane Coffield contributed equally as lead co-authors of this analysis.

Author affiliations

Shane Coffield 1,2,3; Kolin Clark 3,4; Anna Dye 3,5; Colbie Chinowsky 3,6; Briana Niblick 7; Marco Reggiani 8; Bryce Hughes 9; Alfredo Carpineti 10; Randall Hughes 11; Lauren Crawford 12; LeManuel Bitsóí 13

1 Earth System Science Interdisciplinary Center, University of Maryland, College Park; 2 NASA Goddard Space Flight Center; 3 National Science Policy Network; 4 Washington University in St. Louis School of Medicine; 5 North Carolina State University, Department of Plant and Microbial Biology; 6 Vanderbilt University, Department of Cell and Developmental Biology; 7 U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC; 8 University of Strathclyde, Department of Civil and Environmental Engineering; 9 Montana State University, Department of Education; 10 Pride in STEM; 11 Coastal Sustainability Institute, Northeastern University; 12 Royal Society of Chemistry; 13 Brandeis University, Office of Diversity, Equity, and Inclusion

Progress Pride Flag, credit Shutterstock

Shane Coffield
SciTech Forefront

Postdoc researcher at the University of Maryland / NASA Goddard Space Flight Center. Interests include Earth, environment, science policy, DEI in STEM