The State of NLP Literature: Part IIIa
Impact — Most Cited Papers and Aggregate Citation Metrics by Time, Paper Type, and Venue
This series of posts presents a diachronic analysis of the ACL Anthology —
Or, as I like to think of it, making sense of NLP Literature through pictures.
Research articles can have impact in a number of ways — pushing the state of the art, answering crucial questions, finding practical solutions that directly help people, making a new generation of potential scientists excited about a field of study, and more. As scientists, it seems attractive to measure scientific impact quantitatively, and this is particularly appealing to governments and funding agencies; however, it should be noted that individual measures of research impact are limited in scope — they measure only some kinds of contributions.
The most commonly used metrics of research impact are derived from citations. A citation of a scholarly article is an explicit reference to that article. Citations serve many functions. However, a simplifying assumption is that, regardless of the reason for citation, every citation counts as credit to the influence or impact of the cited work. Thus several citation-based metrics have emerged over the years, including the number of citations, average citations, h-index, relative citation ratio, and impact factor.
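As an illustration, the h-index is simple to compute from a list of per-paper citation counts; the other count-based metrics above are defined analogously. A minimal sketch, using made-up citation counts:

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h

# Hypothetical citation counts for one author's seven papers:
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3
```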
It is not always clear why some papers get lots of citations and others do not.
One can argue that highly cited papers have captured the imagination of the field: perhaps because they were particularly creative, opened up a new area of research, pushed the state of the art by a substantial degree, tested compelling hypotheses, or produced useful datasets, among other things.
Note, however, that the number of citations is not always a reflection of the quality or importance of a piece of work. Note also that there are systematic biases that prevent certain kinds of papers from accruing citations, especially when the contributions of a piece of work are atypical, not easily quantified, or in an area where the number of scientific publications is low. Further, the citation process can be abused, for example, through egregious self-citation.
Nonetheless, given the immense volume of scientific literature, the relative ease with which one can track citations using services such as Google Scholar and Semantic Scholar, and given the lack of other easily applicable and effective metrics, citation analysis is an imperfect but useful window into research impact.
In this post, we examine citations of NLP papers — specifically papers in the ACL Anthology (AA). We focus on two aspects:
- Most cited papers: We already looked a little at the most cited papers in various areas of research in Part II. Here we begin by looking at the most cited papers overall and in various time spans. We will then look at the most cited papers by paper type (long, short, demo, etc.) and venue (ACL, LREC, etc.). Perhaps these make interesting reading lists. Perhaps they also lead to a qualitative understanding of the kinds of AA papers that have received lots of citations. (I must confess, though, that I find inspiration from all quarters — whether they are more or less cited papers.)
- Aggregate citation metrics by time span, paper type, and venue: Access to citation information allows us to calculate aggregate citation metrics such as average and median citations of papers published in different time periods, published in different venues, etc. These can help answer questions such as: on average, how well cited are papers published in the 1990s? on average, how many citations does a short paper get? how many citations does a long paper get? how many citations for a workshop paper? etc.
In the next post, Part IIIb, we will examine citations by area of research, gender, and academic age. See list of references at the bottom for discussions on measuring impact beyond citations.
Data: The analyses presented below are based on information about the papers taken directly from AA (as of June 2019) and citation information extracted from Google Scholar (as of June 2019). We extracted citation information from Google Scholar profiles of authors who had a Google Scholar Profile page and had published at least three papers in the ACL Anthology. This yielded citation information for about 75% of the papers (33,051 out of the 44,896 papers). We will refer to this subset of the ACL Anthology papers as AA’. All citation analysis below is on AA’.
Some quick notes before we jump in:
- Paper: The State of NLP Literature: A Diachronic Analysis of the ACL Anthology. 2019.
- Caveats and Ethical Considerations: This is work in progress and is not meant to be a complete or comprehensive view of the AA literature.
See: About the NLP Scholar Project: Acknowledgments, caveats, limitations, ethical considerations, and related work.
- Interactive Visualizations, Data, and Anonymity: The visualizations I am developing for this work (using Tableau) are interactive — so one can hover, click to select and filter, move sliders, etc. However, I am not currently able to publish the interactive visualizations in a way that can be anonymized. Since I want to be able to anonymize public posts about this work as per the ACL guidelines, I include here relevant screenshots. The visualizations and data will be available once the work is published in a peer-reviewed conference. During the relevant anonymity period, this post and the associated paper will be anonymized.
#Citations and Most Cited Papers
Q. How many citations have the AA’ papers received? How is that distributed among the papers published in various decades?
A. ~1.2 million citations (as of June 2019)
Below is a timeline where each year has a bar with height corresponding to the number of citations received by papers published in that year. Further, the bar has colored fragments corresponding to each of the papers and the height of a fragment (paper) is proportional to the number of citations it has received. Thus it is easy to spot the papers that received a large number of citations, and the years when the published papers received a large number of citations.
Hovering over individual papers reveals an information box showing the paper title, authors, year of publication, publication venue, and #citations:
Discussion: With time, not only has the number of papers grown, but so has the number of high-citation papers. We see a marked jump in the 1990s over the previous decades, but the 2000s are the most notable in terms of the high number of citations. The 2010s papers will likely surpass the 2000s papers in the years to come.
Q. What are the most cited papers in AA’?
A. The most cited papers in the AA’ are shown below.
Discussion: We see that the top-tier conference papers (green) are some of the most cited papers in AA’. There are a notable number of journal papers (dark green) in the most cited list as well, but very few demo (purple) and workshop (orange) papers.
In the interactive visualizations (to be released later), one can click on the url to be taken directly to the paper’s landing page on the ACL Anthology website. That page includes links to meta information, the pdf, and associated files such as videos and appendices. There will also be functionality to download the lists. Alas, copying the lists from the screenshots shown here is not easy.
Q. What are the most cited AA’ journal papers? What are the most cited AA’ workshop papers? What are the most cited AA’ shared task papers? What are the most cited AA’ demo papers? What are the most cited tutorials?
A. See the keynote slides embedded below. (Click on the navigation button on the center right of the image to change paper type, or better yet, first click on the icon at the bottom right of the image to go full screen. Use right and left arrow keys to navigate.)
Discussion: Machine translation papers are well-represented in many of these lists, but especially in the system demo papers list. Toolkits such as MT evaluation suites, NLTK, Stanford CoreNLP, WordNet::Similarity, and OpenNMT have highly cited demo or workshop papers.
The shared task papers list is dominated by task description papers (papers by task organizers describing the data and task), especially for sentiment analysis tasks. However, the list also includes papers by top-performing systems in these shared tasks, such as the NRC-Canada, HeidelTime, and UKP papers.
Q. What are the most cited AA’ papers from individual venues such as ACL, CL journal, TACL, EMNLP, LREC, etc.?
A. See below.
Q. What are the most cited papers in the last decade?
A. Below are the most cited AA’ papers in the 2010s:
Q. What are the most cited AA’ papers from the earlier periods?
A. See below:
Discussion: The early period (1965–1989) list includes papers focused on grammar and linguistic structure. The 1990s list has papers addressing many different NLP problems with statistical approaches. Papers on MT and sentiment analysis are frequent in the 2000s list. The 2010s are dominated by papers on word embeddings and neural representations.
Average Citations by Time Span
Q. How many citations did the papers published between 1990 and 1994 receive? What is the average number of citations that a paper published between 1990 and 1994 has received? What are the numbers for other time spans?
A. Total citations for papers published between 1990 and 1994: ~92k
Average citations for papers published between 1990 and 1994: 94.3
Discussion: The early 1990s were an interesting period for NLP with the use of data from the World Wide Web and technologies from speech processing. This was the period with the highest average citations per paper, closely followed by the 1965–1969 and 1995–1999 periods. The 2000–2004 period is notable for:
- a markedly larger number of citations than the previous decades
- third highest average number of citations
The drop off in the average citations for recent 5-year spans is largely because they have not had as much time to collect citations.
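The per-span totals and averages above come down to a simple filter-and-aggregate over (year, citation count) pairs. A sketch, using hypothetical data (the real analysis runs over the AA’ citation records):

```python
from statistics import mean

def span_stats(papers, start, end):
    """Total and average citations for papers published in [start, end]."""
    cites = [c for year, c in papers if start <= year <= end]
    if not cites:
        return 0, 0.0
    return sum(cites), mean(cites)

# Hypothetical (publication year, citations) pairs:
papers = [(1990, 120), (1992, 40), (1993, 80), (2001, 30)]
total, avg = span_stats(papers, 1990, 1994)
print(total, avg)  # -> 240 80
```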
Aggregate Citation Statistics, by Paper Type and Venue
Q. What is the average number of citations received by different types of papers: main conference papers, workshop papers, student research papers, shared task papers, and system demonstration papers?
A. In this analysis, we include only those AA’ papers that were published in 2016 or earlier (to allow for at least 2.5 years to collect citations). There are 26,949 such papers.
The graph below shows the average citations by paper type when considering papers published 1965–2016:
Average citations considering papers published: 2010–2016:
Median citations considering papers published 1965–2016:
Median citations considering papers published 2010–2016:
Discussion: Journal papers have much higher average and median citations than other papers, but the gap between them and top-tier conferences is markedly reduced when considering papers published since 2010.
System demo papers have the third highest average citations; however, shared task papers have the third highest median citations. The popularity of shared tasks and the general importance given to beating the state of the art (SOTA) seems to have grown in recent years — something that has come under criticism.
It is interesting to note that in terms of citations, workshop papers are doing somewhat better than the conferences that are not top tier.
Finally, the citation numbers for tutorials show that even though a small number of tutorials are well cited, a majority receive one citation or none. This is in contrast to system demo papers, whose average and median citations are higher than or comparable to those of workshop papers.
Throughout the analyses in this article, we see that median citation numbers are markedly lower than average citation numbers. This is particularly telling. It shows that while there are some very highly cited papers, a majority of the papers obtain far fewer citations — and when considering papers other than journals and top-tier conferences, the number of citations is frequently lower than ten.
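This average/median gap is exactly what one expects from a heavy-tailed distribution. A toy illustration with made-up citation counts:

```python
from statistics import mean, median

# Made-up citation counts for ten papers: one heavily cited paper
# pulls the mean far above the median.
citations = [0, 1, 2, 3, 4, 5, 7, 9, 12, 800]
print(mean(citations))    # -> 84.3
print(median(citations))  # -> 4.5
```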
Q. What are the average number of citations received by the long and short ACL main conference papers, respectively?
A. Short papers were introduced at ACL in 2003. Since then, ACL has been by far the venue with the most short papers (compared to other venues). So we compare long and short papers published at ACL since 2003 to determine their average citations. Once again, we limit the papers to those published until 2016 to allow the papers time to collect citations.
Discussion: On average, long papers get almost three times as many citations as short papers. The median for long papers, however, is only two and a half times that of short papers. This difference might be because some very heavily cited long papers push up the average for long papers.
Q. Which venue has publications with the highest average number of citations? What is the average number of citations for ACL and EMNLP papers? What is this average for other venues? What are the average citations for workshop papers, system demonstration papers, and shared task papers?
A. CL journal has the highest average citations per paper. Below are the average citations for AA’ papers published between 1965 and 2016, grouped by venue and paper type:
Average citations for papers published between 2010 and 2016, grouped by venue and paper type:
Median citations for papers published between 1965 and 2016, grouped by venue and paper type:
Median citations for papers published between 2010 and 2016, grouped by venue and paper type:
Discussion: In terms of citations, TACL papers have not been as successful as EMNLP and ACL papers; however, CL journal (the more traditional journal paper venue) has the highest average and median paper citations (by a large margin). This gap has narrowed for papers published since 2010.
When considering papers published between 2010 and 2016, the system demonstration papers, the SemEval shared task papers, and the non-SemEval shared task papers have notably high average citations (surpassing those of EACL and COLING); however, their median citations are lower. This is likely because some heavily cited papers have pushed the average up. Nonetheless, it is interesting to note how, in terms of citations, demo and shared task papers have surpassed many conferences and even become competitive with some top-tier conferences such as EACL and COLING.
Q. What percent of the AA’ papers that were published in 2016 or earlier are cited more than 1000 times? How many more than 10 times? How many papers are cited 0 times?
A. Google Scholar invented the i-10 index as another measure of author research impact. It stands for the number of papers by an author that received ten or more citations. (Ten here is somewhat arbitrary, but reasonable.) Similar to that, one can look at the impact of AA’ as a whole and the impact of various subsets of AA’ through the number of papers in various citation bins.
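Both the i-10 index and the citation bins used in the graphs below are straightforward to compute from per-paper counts. A sketch, with bin edges chosen to match the 0, 1–9, 10–99, 100–999, and 1000+ bins used in this analysis:

```python
from bisect import bisect_right
from collections import Counter

BIN_EDGES = (1, 10, 100, 1000)
BIN_LABELS = ("0", "1-9", "10-99", "100-999", "1000+")

def citation_bins(citations):
    """Count how many papers fall into each citation bin."""
    counts = Counter(BIN_LABELS[bisect_right(BIN_EDGES, c)] for c in citations)
    return {label: counts.get(label, 0) for label in BIN_LABELS}

def i10_index(citations):
    """Number of papers with ten or more citations."""
    return sum(c >= 10 for c in citations)

# Hypothetical per-paper citation counts:
papers = [0, 3, 14, 14, 250, 1200]
print(citation_bins(papers))  # -> {'0': 1, '1-9': 1, '10-99': 2, '100-999': 1, '1000+': 1}
print(i10_index(papers))      # -> 4
```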
See graph below for the percentage of AA’ papers in various citation bins:
Discussion: About 56% of the papers are cited ten or more times. 6.4% of the papers are never cited. Note also that some portion of the 1–9 bin likely includes papers that only received self-citations.
Q. What are the percentages of papers in various citation bins, when looking at papers from specific time spans?
A. See the keynote slides embedded below. (Click on the navigation button on the center right of the image, or first click on the icon at the bottom right of the image to go full screen. Then use right and left arrow keys to navigate.)
2016Jan–2016Dec represents all papers published in 2016. We can examine this set of papers to see how well cited papers are after 2.5 years.
Discussion: It is interesting that the percentage of papers with 0 citations is rather steady (between 7.4% and 8.7%) for the 1965–1989, 1990–1999, and 2010–2016 periods. The majority of the papers lie in the 10 to 99 citations bin, for all except the recent periods (2010–2016 and 2016Jan–2016Dec). With time, the recent period should also have the majority of the papers in the 10 to 99 citations bin.
The numbers for the 2016Jan–2016Dec papers show that after 2.5 years, about 89% of the papers have at least one citation and about 33% of the papers have ten or more citations.
Q. What are the citation bin percentages for individual venues and paper types?
A. See graphs below:
Below are the numbers for papers not part of the main conference:
Discussion: Observe that 70 to 80% of the papers in journals and top-tier conferences have ten or more citations. The percentages are markedly lower (between 30 and 70%) for the other conferences shown above, and even lower for some other conferences (not shown above).
CL Journal is particularly notable for having the largest percentage of papers with 100 or more citations. Its somewhat high percentage of papers that are never cited (4.3%) is likely because some book reviews from earlier years are not explicitly marked in the CL journal, and thus they were not removed from the analysis. Also, letters to editors, which are more common in the CL journal, often obtain 0 citations.
CL, EMNLP, and ACL have the best track record for accepting papers that have gone on to receive 1000 or more citations.
*Sem, the semantics conference, seems to have a notably lower percentage of high-citation papers, even though it has fairly competitive acceptance rates.
If one considers raw numbers rather than percentages of papers with at least ten citations (i-10 index), then LREC is particularly notable for the large number of papers it accepts that have gone on to obtain ten or more citations (~1600).
Thus, by producing a large number of moderate-to-high citation papers, and introducing many first-time authors, LREC is one of the notable (yet perhaps undervalued) engines of impact on NLP.
About 50% of the SemEval shared task papers, about 46% of the non-SemEval shared task papers, about 47% of the workshop papers, and about 43% of the demo papers received ten or more citations.
Future work includes:
- Analyze NLP papers that are published outside of the ACL Anthology.
- Analyze the differences in the number of citations in various areas of research within NLP.
- Analyze the differences in number of citations of papers by men and women (controlling for area of research, academic age, etc.).
- Measure impact in other ways beyond citations.
Measuring Research Impact beyond Citations
- Measuring impact in the humanities: Learning from accountability and economics in a contemporary history of cultural value
- Measuring the societal impact of research
- Measuring scientific impact beyond academia: An assessment of existing impact metrics and proposed improvements
- It’s time to update our understanding of scientific impact
- Measuring Scientific Impact Beyond Citation Counts
- Scientometrics 2.0: New metrics of scholarly impact on the social Web
Other Work on Citations Analysis
Other posts in the series:
- Part I: Size and Demographics
- Part II: Areas of Research (Examining Title Terms)
- About the NLP Scholar Project: Acknowledgments, caveats, limitations, ethical considerations, and related work
- NLP Scholar: An Interactive Visual Explorer for the ACL Anthology (coming soon)
Project Email: email@example.com
Twitter hashtag: #nlpscholar
See the About NLP Scholar page for a list of caveats, ethical considerations, related work, and acknowledgments.