The State of NLP Literature: Part IIIb

Impact — Examining Citations by Area of Research, Academic Age, & Gender

This series of posts presents a diachronic analysis of the ACL Anthology, or, as I like to think of it, making sense of NLP literature through pictures.

Thanks for your interest. Here are links to Part I (Size and Demographics), Part II (Areas of Research), and Part IIIa: Impact (Most Cited Papers and Aggregate Citation Metrics by Time, Paper Type, and Venue) in case you missed them. Each post is presented as a series of questions and answers. Feel free to read any of the posts and jump to questions you might be interested in.

In Part IIIa, we examined citations across time spans, paper types, and venues. Here, we continue the examination of citations, this time across three other dimensions:

  • Areas of research;
  • Academic age (number of years one has been publishing); and
  • Gender

Motivation: There are good reasons to study citations across each of these dimensions including, but not limited to, the following:

  • Areas of research: To better understand research contributions in the context of the area where the contribution is made.
  • Academic age: To better understand how the challenges faced by researchers at various stages of their career may impact the citations of their papers. For example: How well cited are first-time NLP authors? At what academic age, on average, do citations peak?
  • Gender: To better understand the extent to which systematic biases (explicit and implicit) pervasive in society and scientific publishing impact author citations.

Some of these aspects of study may seem controversial, so it is worth addressing that first. The goal here is not to perpetuate stereotypes about age, gender, or even areas of research. The history of scientific discovery is awash with examples of bad science that tried to erroneously show that one group of people is “better” than another, with devastating consequences.

Rather, this work is in support of studies that examine the inequities and biases faced by different demographic groups. Unless we measure differences in outcomes such as scientific productivity and impact across demographic groups, we will not fully know the extent to which these inequities and biases impact our scientific community, and we cannot track the effectiveness of measures to make our universities, research labs, and conferences more inclusive, equitable, and fair.

Some quick notes before we jump in:

  • Data: The analyses presented below are based on information about the papers taken directly from the ACL Anthology (AA) (as of June 2019) and on citation information extracted from Google Scholar (also as of June 2019). We extracted citation information from the Google Scholar profiles of authors who had a Google Scholar profile page and had published at least three papers in the ACL Anthology. This yielded citation information for about 75% of the papers (33,051 out of 44,896). We will refer to this subset of the ACL Anthology papers as AA’. All citation analysis below is on AA’.
  • Interactive Visualizations and Anonymity: The visualizations I am developing for this work (using Tableau) are interactive — so one can hover, click to select and filter, move sliders, etc. However, I am not currently able to publish the interactive visualizations in a way that can be anonymized. Since I want to be able to anonymize public posts about this work as per the ACL guidelines, I include here relevant screenshots. The visualizations and data will be available once the work is published in a peer-reviewed conference. During the relevant anonymity period, this post and the associated paper will be anonymized.
  • See the About the NLP Scholar Project page for Acknowledgments, caveats, limitations, ethical considerations, and related work.
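To make the Data note concrete, here is a minimal sketch of how a subset like AA’ could be assembled. It assumes that each author's AA papers are already known and that an eligible author's Google Scholar profile supplies citations for all of them. The real matching of profile entries to Anthology papers is messier and is not shown. `Author`, `build_aa_prime`, and `coverage` are hypothetical names, not the project's code.

```python
from dataclasses import dataclass, field

@dataclass
class Author:
    # Hypothetical record; field names are illustrative only.
    name: str
    has_gs_profile: bool                            # has a Google Scholar profile page
    aa_paper_ids: set = field(default_factory=set)  # the author's AA papers (assumed known)

def build_aa_prime(authors, all_paper_ids, min_aa_papers=3):
    """Union of AA papers covered by eligible authors (profile + at least 3 AA papers).

    Assumption: an eligible author's profile supplies citations for all of their AA papers.
    """
    covered = set()
    for author in authors:
        if author.has_gs_profile and len(author.aa_paper_ids) >= min_aa_papers:
            covered |= author.aa_paper_ids & all_paper_ids
    return covered

def coverage(subset, all_paper_ids):
    """Fraction of all AA papers for which citation information was obtained."""
    return len(subset) / len(all_paper_ids)

# With the counts reported above: 33,051 of 44,896 papers.
print(round(33_051 / 44_896, 3))  # 0.736
```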

Papers (most pertinent to this post):

  • Gender Gap in Natural Language Processing Research: Disparities in Authorship and Citations. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020). July 2020. Seattle, USA.
  • Examining Citations of Natural Language Processing Literature. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020). July 2020. Seattle, USA.
  • The State of NLP Literature: A Diachronic Analysis of the ACL Anthology. Saif M. Mohammad. arXiv preprint arXiv:1911.03562. November 2019.

See full list of associated papers in the About Page.

Citations to Papers by Areas of Research

Q1. What is the average number of citations of AA’ papers that have machine translation in the title? What about papers that have the term sentiment analysis or word representations?

A. Different areas of research within NLP enjoy varying amounts of attention. In Part II, we looked at the relative popularity of various areas over time — estimated through the number of paper titles that had corresponding terms. (You may also want to see the discussion on the use of paper title terms to sample papers from various, possibly overlapping, areas.) The figure below shows the top 50 title bigrams ordered by decreasing number of total citations.

Only those bigrams that occur in at least 30 AA’ papers (published between 1965 and 2016) are considered. (The papers from 2017 and later are not included, to allow for at least 2.5 years for the papers to accumulate citations.)

Discussion: The graph shows that the bigram machine translation occurred in 1,659 papers that together accrued more than 93k citations. These papers have on average 68.8 citations, and their median is 14 citations. Not all machine translation (MT) papers have machine translation in the title. However, this set of 1,659 papers is arguably a representative enough sample of MT papers, so its average and median can serve as estimates for MT in general. Second in the list are papers with statistical machine in the title, most commonly from the phrase statistical machine translation. One expects considerable overlap between the papers with machine translation and those with statistical machine in the title. However, machine translation likely covers a broader range of research, including work before statistical MT was introduced, neural MT, and MT evaluation.

There are fewer papers with sentiment analysis in the title (356), but these have acquired citations at a higher average (104) than both machine translation and statistical machine. The bigram automatic evaluation jumps out because of its high average citations (337). Some of the neural-related bigrams have high median citations, for example, neural machine (49) and convolutional neural (40.5).

Below is the list of the top 50 bigrams ordered by average citations:

Discussion: Observe the wide variety of topics covered by this list. In some ways that is reassuring for the health of the field as a whole; however, this list does not show which areas are not receiving sufficient attention. It is less clear to me how to highlight those, as simply showing the bottom 50 bigrams by average citations is not meaningful.

Also note that this is not in any way an endorsement to write papers with these high-citation bigrams in the title. Doing so is of course no guarantee of receiving a large number of citations :)
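The bigram statistics in Q1 use title bigrams, papers published from 1965 to 2016, and at least 30 papers per bigram. They can be approximated with a short sketch like the one below. The regex tokenizer and the `(title, year, citations)` input format are my assumptions for illustration; the actual title processing may differ. Passing `sort_by="average"` gives the ordering by average citations instead of by total citations.

```python
import re
import statistics
from collections import defaultdict

def title_bigrams(title):
    """Set of lower-cased adjacent word pairs in a title (a deliberately simple tokenizer)."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {f"{a} {b}" for a, b in zip(words, words[1:])}

def bigram_citation_stats(papers, min_papers=30, first_year=1965, last_year=2016,
                          sort_by="total"):
    """papers: iterable of (title, year, citations); returns one row per qualifying bigram."""
    cites_by_bigram = defaultdict(list)
    for title, year, citations in papers:
        if first_year <= year <= last_year:      # leave time for citations to accumulate
            for bigram in title_bigrams(title):  # a set, so each paper counts once per bigram
                cites_by_bigram[bigram].append(citations)
    rows = [
        {
            "bigram": bigram,
            "papers": len(cites),
            "total": sum(cites),
            "average": statistics.fmean(cites),
            "median": statistics.median(cites),
        }
        for bigram, cites in cites_by_bigram.items()
        if len(cites) >= min_papers
    ]
    rows.sort(key=lambda row: row[sort_by], reverse=True)
    return rows

toy = [
    ("Statistical Machine Translation", 2005, 10),
    ("Neural Machine Translation", 2015, 30),
    ("Sentiment Analysis of Tweets", 2012, 50),
    ("Machine Translation Evaluation", 2017, 100),  # after 2016, so excluded
]
for row in bigram_citation_stats(toy, min_papers=2):
    print(row)
```

With this toy input, only machine translation reaches the lowered threshold of two papers. The 2017 paper is dropped by the year filter.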

Correlation of Academic Age with Citations

We introduced NLP academic age in Part I. (We examined whether, in terms of its authors, NLP is growing older or younger.) We defined NLP academic age as the number of years one has been publishing in AA. So if this is the first year one has published in AA, then their NLP academic age is 1. If one published their first AA paper in 2001 and their latest AA paper in 2018, then their academic age when publishing that paper (in 2018) is taken to be 18. Here we examine whether NLP academic age impacts citations.
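Put as arithmetic, the definition counts the publication year itself: academic age equals the publication year minus the year of one's first AA paper, plus 1. For the example above, that is 2018 - 2001 + 1 = 18. Here is a minimal sketch; the function names are hypothetical.

```python
def nlp_academic_age(publication_year, first_year):
    """NLP academic age when a paper is published; equals 1 in the year of one's first AA paper."""
    if publication_year < first_year:
        raise ValueError("publication year precedes the author's first AA paper")
    return publication_year - first_year + 1

def first_aa_year(publication_years):
    """Year of an author's first AA paper, given all of their AA publication years."""
    return min(publication_years)

years = [2001, 2005, 2018]
print(nlp_academic_age(2018, first_aa_year(years)))  # 18
print(nlp_academic_age(2001, first_aa_year(years)))  # 1
```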

The analyses are done in terms of the academic age of the first author; however, similar analyses can be done for the last author and all authors. (There are limitations to each of these analyses though as discussed further below.)

First author is a privileged position in the author list, as it is usually reserved for the researcher who has done the most work and writing. The first author is also usually the main driver of the project, although their mentor or advisor may also be a significant driver. Sometimes multiple authors are marked as first authors in the paper, but the current analysis simply takes the first author from the author list. In many academic communities, the last author position is reserved for the most senior or mentoring researcher. However, in non-university research labs and in large collaboration projects, the meaning of the last author position is less clear. (Personally, I prefer author names ordered by the amount of work done.)

Examining all authors is trickier, as one has to decide how to credit the citations to the possibly multiple authors. It might also not be a clear indicator of differences across gender, since a large number of AA papers have both male and female authors.

Q2. How does the NLP academic age of the first author correlate with the amount of citations? Are first-year authors less cited than those with more experience?

A. The figure below shows various aggregate citation statistics corresponding to various academic ages. To produce the graph I put each paper in a bin corresponding to the academic age of the first author when the paper was published. For example, if the first author of a paper had an academic age of 3 when that paper was published, then the paper goes in bin 3. I then calculate #papers, #citations, median citations, and average citations for each bin. The full table is available in the appendix at the bottom. For the figure below, I further group the bins 10 to 14, 15 to 19, 20 to 34, and 35 to 50. These groupings are done to avoid clutter, and also because many of the higher age bins have a low number of papers.

Discussion: Observe that the number of papers where the first author has academic age 1 is much larger than the number of papers in any other bin. This is largely because a large number of authors in AA have written exactly one paper as first author. Also, about 60% of the authors in AA (17,874 out of the 29,941 authors) have written exactly one paper (regardless of author position).

The curves for the average and median citations have a slight inverted-U shape. The relatively low average and median citations in year 1 (37.26 and 10, respectively) indicate that being new to the field has some negative impact on citations. The average increases steadily from year 1 to year 4, but the median is already at its highest point by year 2. One might say that years 2 to 14 are the period of steady and high citations. From year 15 onwards, there is a steady decline in citations. I would not draw too many conclusions from the averages of the 35 to 50 bin, because of the small number of papers. There seems to be a peak in average citations at age 7; however, there is no corresponding peak in the median. Thus the peak in the average might be due to an increase in the number of very highly cited papers.
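The binning described in the answer to Q2 can be sketched as follows. The sketch assumes that each paper has already been reduced to a pair of (first-author academic age, citations). The input format and helper names are hypothetical. The groups 10 to 14, 15 to 19, 20 to 34, and 35 to 50 are the ones used for the figure.

```python
import statistics
from collections import defaultdict

# Coarser groups used in the figure; ages 1 to 9 keep their own bins.
AGE_GROUPS = [(10, 14), (15, 19), (20, 34), (35, 50)]

def age_bin(age):
    """Label of the bin that a first-author academic age falls into."""
    for lo, hi in AGE_GROUPS:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return str(age)  # ages 1-9 (and anything outside the groups) stay as single-year bins

def citation_stats_by_age(papers):
    """papers: iterable of (first_author_academic_age, citations) pairs."""
    bins = defaultdict(list)
    for age, citations in papers:
        bins[age_bin(age)].append(citations)
    return {
        label: {
            "papers": len(cites),
            "citations": sum(cites),
            "median": statistics.median(cites),
            "average": statistics.fmean(cites),
        }
        for label, cites in bins.items()
    }

stats = citation_stats_by_age([(1, 5), (1, 15), (2, 20), (12, 40), (13, 60)])
print(stats["10-14"])  # {'papers': 2, 'citations': 100, 'median': 50.0, 'average': 50.0}
```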

Citations to Papers by First Author Gender

As noted in Part I, neither ACL nor the ACL Anthology has recorded demographic information for the vast majority of the authors. Thus we use the same setup discussed in Part I to determine gender. We use the United States Social Security Administration database of names and genders of newborns to identify 55,133 first names that are strongly associated with females (probability ≥99%) and 29,873 first names that are strongly associated with males (probability ≥99%). The approach is not meant to be perfect, but it is a useful approximation in the absence of true gender information. Unfortunately, we do not have information for other gender identities. See the About the NLP Scholar Project page for a list of caveats and limitations.
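Here is a minimal sketch of the ≥99% rule. It assumes that the SSA records have already been aggregated into total counts of newborns recorded as female and as male for each first name. How the yearly files are combined is not described here, so that step is an assumption. The function name and the counts in the example are made up for illustration.

```python
def classify_first_name(female_count, male_count, threshold=0.99):
    """Label a first name from counts of newborns recorded as female and as male.

    Returns "female" or "male" only if at least `threshold` of the babies with
    that name were recorded with that gender; otherwise returns "unknown".
    """
    total = female_count + male_count
    if total == 0:
        return "unknown"  # name not in the database
    if female_count / total >= threshold:
        return "female"
    if male_count / total >= threshold:
        return "male"
    return "unknown"

print(classify_first_name(995, 5))  # female
print(classify_first_name(60, 40))  # unknown
```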

Q3. On average, are women cited less than men?

A. Yes, on average, female first author papers have received markedly fewer citations than male first author papers (36.4 compared to 52.4). The difference in median is smaller (11 compared to 13). See figure below:

Discussion: The large difference in averages and the smaller difference in medians suggest that there are markedly more very heavily cited male first-author papers than female first-author papers.

The gender-unknown category, which here largely consists of authors with Chinese-origin names and names that are less strongly associated with one gender, has a slightly higher average, but the same median, as authors with female-associated first names.
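The pattern noted in the Discussion above (a large gap in averages but a small gap in medians) is what a handful of very highly cited papers produces. The toy numbers below are hypothetical and are not AA’ data. Replacing the largest of seven citation counts with an outlier moves the mean from about 12.7 to 137, while the median stays at 11.

```python
import statistics

def mean_and_median(citations):
    """Average and median citation count of a group of papers."""
    return statistics.fmean(citations), statistics.median(citations)

# Hypothetical toy citation counts, not AA' data.
typical = [2, 5, 8, 11, 13, 20, 30]
with_outlier = typical[:-1] + [900]  # replace the largest count with one very highly cited paper

print(mean_and_median(typical))       # mean about 12.7, median 11
print(mean_and_median(with_outlier))  # mean 137.0, median 11
```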

The differences in citations, or citation gap, across genders may:

  • vary by period of time
  • vary due to confounding factors such as academic age and areas of research

We explore these next.

Q4. How has the citation gap across genders changed over the years?

A. The graph below shows the citation statistics across four time periods:

Discussion: Observe that female first authors have always been a minority in the history of ACL. However, on average, their papers from the early years (1965 to 1989) received markedly more citations than those of male first authors from the same period. The graph shows that this changed in the 1990s, when male first-author papers obtained markedly more citations on average. The citation gap narrowed considerably in the 2000s, and the 2010–2016 period saw a further slight reduction in the citation gap.

It is also interesting to note that the gender-unknown category has almost closed the gap with male authors in the most recent time period. Further, the proportion of gender-unknown authors has increased over the years. This is arguably an indication of better representation of authors from around the world in recent years. (Nonetheless, as indicated in Part I, there is still plenty to be done to promote greater inclusion of authors from Africa and South America.)
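The per-period comparison in Q4 can be sketched as below. The period boundaries follow the figure: 1965 to 1989, the 1990s, the 2000s, and 2010 to 2016. The input format, the helper names, and the choice to report the gap as the male average minus the female average are assumptions for illustration.

```python
import statistics
from collections import defaultdict

def period(year):
    """Map a publication year to the periods used in the figure (None if outside 1965-2016)."""
    if 1965 <= year <= 1989:
        return "1965-1989"
    if 1990 <= year <= 1999:
        return "1990s"
    if 2000 <= year <= 2009:
        return "2000s"
    if 2010 <= year <= 2016:
        return "2010-2016"
    return None

def citation_gap_by_period(papers):
    """papers: iterable of (year, first_author_gender, citations).

    Returns {period: male average minus female average}. Positive values mean that
    male first-author papers were cited more on average in that period.
    """
    cites = defaultdict(list)
    for year, gender, citations in papers:
        p = period(year)
        if p is not None:
            cites[(p, gender)].append(citations)
    periods = {p for p, _ in cites}
    gaps = {}
    for p in periods:
        if (p, "male") in cites and (p, "female") in cites:
            gaps[p] = statistics.fmean(cites[(p, "male")]) - statistics.fmean(cites[(p, "female")])
    return gaps

toy = [(1980, "female", 100), (1985, "male", 40), (1995, "male", 80),
       (1995, "female", 30), (2018, "male", 5)]  # hypothetical papers; 2018 is outside the window
print(citation_gap_by_period(toy))
```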

Q5. How have citations varied by gender and academic age? Are women less cited simply because there is a greater proportion of new-to-NLP female first authors than new-to-NLP male first authors?

A. See figure below with citation statistics broken down by gender and academic age. (This figure is similar to the academic age graph seen earlier, except that it shows separate average and median lines for female, male, and unknown gender first authors.)

Discussion: The graphs show that female first authors consistently receive fewer citations than male first authors for the first fifteen years. The trend is inverted, with a small citation gap, in the 15th to 34th year period.

Q6. Is the citation gap common across the vast majority of areas of research within NLP? Is the gap simply because more women work in areas that receive low numbers of citations (regardless of gender)?

A. The figure below shows the most cited areas of research along with citation statistics split by gender of the first authors of corresponding papers. (This figure is similar to the areas of research graph seen earlier in the post, except that it shows separate citation statistics for the genders.) Note that the figure includes rows for only those bigram and gender pairs with at least 30 AA’ papers (published between 1965 and 2016). Thus for some of the bigrams certain gender entries are not shown.

Discussion: Numbers for an additional 32 areas are shown in the Appendix. Observe that women received higher average citations than men in only about 12% (7 of the top 59) of the most cited areas of research. These are: sentiment analysis, information extraction, document summarization, spoken dialogue, cross lingual (research), dialogue systems, and language generation. (Of course, note that some of the 59 areas, as estimated using title term bigrams, overlap. Also, I did not include large scale in the list above because the difference in averages is very small and it is not really an area of research.)

Thus, the citation gap is common across a majority of the high-citation areas within NLP.
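Here is a sketch of the per-area comparison in Q6. It assumes that each paper has been expanded into one `(bigram, first_author_gender, citations)` row per title bigram. As in the figure, a (bigram, gender) cell is kept only if it has at least 30 papers. The sketch compares averages only for bigrams where both the female and the male cells survive, which is an assumption about how the 59 areas were counted. The function name is hypothetical.

```python
import statistics
from collections import defaultdict

def areas_where_women_lead(rows, min_papers=30):
    """rows: iterable of (bigram, first_author_gender, citations).

    Returns (bigrams where the female average exceeds the male average,
             bigrams where both genders had enough papers to compare).
    """
    groups = defaultdict(list)
    for bigram, gender, citations in rows:
        groups[(bigram, gender)].append(citations)
    # Keep only (bigram, gender) cells with at least min_papers papers.
    kept = {key: cites for key, cites in groups.items() if len(cites) >= min_papers}
    female_higher, compared = [], []
    for bigram in sorted({b for b, _ in kept}):
        female, male = kept.get((bigram, "female")), kept.get((bigram, "male"))
        if female and male:
            compared.append(bigram)
            if statistics.fmean(female) > statistics.fmean(male):
                female_higher.append(bigram)
    return female_higher, compared

toy = [
    ("sentiment analysis", "female", 50), ("sentiment analysis", "male", 40),
    ("machine translation", "female", 20), ("machine translation", "male", 60),
    ("parsing", "male", 10),  # no female cell, so not compared
]
print(areas_where_women_lead(toy, min_papers=1))
# (['sentiment analysis'], ['machine translation', 'sentiment analysis'])
```

For the figures reported in Q6, 7 / 59 ≈ 0.119, which is the roughly 12% quoted in the text.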

Future Work

  • Identify areas of research using word embeddings and topic modeling.
  • Identify low-citation areas of research.
  • Repeat the analyses above for last authors and also for all authors.
  • Identify new and emerging areas of research. Identify demographic representation in these areas.
  • Determine citation gaps across other dimensions such as race, region, and income.
  • Determine forces that impact (increase or decrease) citation gaps.


Appendix

The figure below shows the entries that follow those shown above for the last question (Q6): 32 more areas, along with citation statistics split by the gender of the first authors of the corresponding papers.

The table below shows the full set of NLP academic age bins and their citation statistics:

The next post:
- NLP Scholar: An Interactive Visual Explorer for the ACL Anthology

Other posts in the series:
- Part I: Size and Demographics
- Part II: Areas of Research (Examining Title Terms)
- Part IIIa: Impact (Most Cited Papers and Aggregate Citation Metrics by Time, Paper Type, and Venue)
- About the NLP Scholar Project: Acknowledgments, caveats, limitations, ethical considerations, and related work

Paper: The State of NLP Literature: A Diachronic Analysis of the ACL Anthology. 2019.

Saif M. Mohammad
Twitter: @saifmmohammad

Project Homepage:

See the About NLP Scholar page for a list of caveats, ethical considerations, related work, and acknowledgments.

Saif is a Senior Research Scientist at the National Research Council Canada. His interests are in NLP, especially emotions, creativity, and fairness in language.