Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications
Scholarly text is often laden with jargon, or specialized language that can facilitate efficient communication within fields but hinder understanding for outsiders. Jargon naturally evolves so that researchers and scholars can convey meaning succinctly, but it can be a barrier between fields, and between scientists and the general public.
For example, words such as junction, diode, and bias are specific to the field of optoelectronics, as shown in the figure above. In particular, bias is overloaded with different meanings, or senses, across fields, as it can refer to social discrimination, statistical misestimation, or electric currents. In our paper, we use a natural language processing (NLP) approach called word sense induction to disentangle words’ senses, and show that they can be as specialized as field-specific word types. We define jargon as both discipline-specific words and discipline-specific meanings. See our Findings of ACL 2023 paper for a detailed description of how we operationalize and validate our measure of jargon.
We measure jargon in English abstracts across three hundred fields of study from the Semantic Scholar Open Research Corpus (S2ORC). We find that while the biological sciences use very distinctive word types, such as names of molecules and chemicals, subfields in math, technology, physics, and economics tend to reuse existing words with specialized meanings. For example, mathematicians repurpose common words such as power, pole, union, surface, and origin.
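To make the idea of a "discipline-specific word" more concrete, here is a minimal sketch of one way to score how strongly a word is associated with a field. It assumes normalized pointwise mutual information (NPMI) as the association score and uses a tiny made-up corpus; this post does not spell out the scoring function, so the function name `npmi_scores`, its parameters, and the toy tokens are all illustrative assumptions. See the paper for how the measure is actually operationalized.

```python
import math
from collections import Counter


def npmi_scores(field_tokens, all_tokens):
    """Score each word in one field by NPMI between the word and the field.

    field_tokens: tokens from abstracts in a single field (hypothetical input).
    all_tokens:   tokens from abstracts in every field, including that field.
    Scores range from -1 to 1; higher means more field-specific.
    """
    field_counts = Counter(field_tokens)
    total_counts = Counter(all_tokens)
    n_total = len(all_tokens)
    p_field = len(field_tokens) / n_total

    scores = {}
    for word, count_in_field in field_counts.items():
        p_joint = count_in_field / n_total      # P(word, field)
        p_word = total_counts[word] / n_total   # P(word)
        if p_joint == 1.0:
            # Degenerate corpus with a single word and a single field.
            scores[word] = 0.0
            continue
        pmi = math.log(p_joint / (p_word * p_field))
        scores[word] = pmi / -math.log(p_joint)
    return scores


# Toy example (made-up tokens, for illustration only).
optoelectronics = ["diode", "bias", "diode", "the"]
other_fields = ["the", "bias", "the", "model"]
scores = npmi_scores(optoelectronics, optoelectronics + other_fields)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")
```

In this toy corpus, diode scores highest because it appears only in the one field, while bias scores near zero because it is spread evenly across fields. To score discipline-specific meanings rather than word types, the same calculation could in principle be applied to (word, sense) pairs, where the sense labels would come from a separate word sense induction step that is not shown here.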
We connect these measurements of scholarly jargon to two key social implications, to showcase the utility of our metrics for “science of science” research and computational sociolinguistics, which is the study of how social factors relate to language.
First, we measure audience design, or whether scholars decrease their use of jargon depending on who they write for. We find that most fields reduce jargon when publishing in general-purpose, multidisciplinary journals such as Nature, but some fields do so more than others. For example, in the above figure, computer science adjusts its published content based on venue more than medicine and biology do. A possible explanation is that general-purpose venues have a history of being led and dominated by the biological and physical sciences.¹ So although “general-purpose” venues may intend to serve all of science,² some fields are expected to adapt their language more than others.
Second, we examine how discipline-specific language is associated with two distinct measures of scientific success: citation counts and interdisciplinary impact. Interdisciplinary impact measures the diversity of fields that cite a paper. We ran separate regression models for each field, to see how the relationship between jargon and success may differ across them. Although the direction of correlation between jargon and citation rates varies, jargon is nearly always negatively correlated with interdisciplinary impact.³
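As an illustration of what "diversity of fields that cite a paper" could look like in practice, the sketch below computes Shannon entropy over the fields of a paper's citing papers. This is an assumption made for illustration: the post does not say which diversity index is used, and the function name `citing_field_entropy` and the example field labels are hypothetical.

```python
import math
from collections import Counter


def citing_field_entropy(citing_fields):
    """Shannon entropy (in nats) of the fields that cite a paper.

    citing_fields: one field label per citing paper (hypothetical input).
    Returns 0.0 when all citations come from one field or there are none;
    larger values mean citations are spread more evenly across more fields.
    """
    counts = Counter(citing_fields)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)
    return entropy


# Toy example (made-up citation data, for illustration only).
narrow_paper = ["biology", "biology", "biology", "biology"]
broad_paper = ["biology", "physics", "economics", "sociology"]
print(citing_field_entropy(narrow_paper))
print(citing_field_entropy(broad_paper))
```

Under this sketch, a paper cited only within its own field gets a score of zero, and a paper cited evenly by four fields gets the maximum possible score for four fields, log(4). A score like this could then serve as the outcome variable in a per-field regression against a jargon measure.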
Taken together, our findings suggest that some fields reduce their jargon less than others in general-purpose venues, and that retaining jargon may impede interdisciplinary communication. This opens an opportunity to reconsider abstract-writing norms, especially for venues that intend to bridge disciplines.
[1] PLOS One’s founding letter and Nature’s initial launch of Scientific Reports are two examples of general-purpose venues’ origins.
[2] For example, see Nature’s “Aim and Scope”.
[3] Our study is not causal, but provides a path forward for future studies around the effects of jargon on interdisciplinary connections.