As we develop new features for Semantic Scholar, our academic search and discovery engine, we often find ourselves asking, “What is the latest applicable research or state of the art?” We need to answer this question early in a project, both to assess the novelty of a specific approach and to ensure that our AI research builds on and extends existing work by the research community.
Last year we started a project to classify citations by their intent, in an effort to make it easier for our users to understand why an author cited another paper. In addition to brainstorming our own ideas, we assessed the current state of research with a literature review that took two approaches:
Approach #1 — Start with a Known Paper
We knew from our experience developing the concept of highly influential citations for Semantic Scholar that citation classification is a problem that has received attention in the research community. By starting with the Identifying Meaningful Citations paper (Valenzuela et al. 2015), which describes our concept of highly influential citations, we were able to use the paper’s citations to identify multiple promising papers. These included one key paper on citation classification (Jurgens et al. 2016); we ended up referencing its more recent published version (Jurgens et al. 2018) and using it as a data source during the initial development of our citation intent classification model.
Approach #2 — Start with a Keyword Search
Another approach that we used during our literature review was to search for papers by keyword. Searching for “citation classification” and focusing on the top NLP venues (e.g. EMNLP) surfaced a number of additional papers that we ended up referencing in our research, including Teufel et al. (2006).
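The keyword-plus-venue filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Semantic Scholar search works: the `Paper` record and the `keyword_search` helper are hypothetical names, matching is a naive check of query terms against titles, and the small in-memory list is built from the titles, venues, and years in this post's reference list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical minimal record for one search result."""
    title: str
    venue: str
    year: int


# Records taken from the reference list at the end of this post.
PAPERS = [
    Paper("Identifying Meaningful Citations",
          "AAAI Workshop: Scholarly Big Data", 2015),
    Paper("Citation Classification for Behavioral Analysis of a Scientific Field",
          "CoRR", 2016),
    Paper("Measuring the Evolution of a Scientific Field through Citation Frames",
          "Transactions of the Association for Computational Linguistics", 2018),
    Paper("Automatic classification of citation function", "EMNLP", 2006),
    Paper("Deep context of citations using machine-learning models in "
          "scholarly full-text articles", "Scientometrics", 2018),
]


def keyword_search(papers, query, venues=None):
    """Return papers whose title contains every query term (case-insensitive).

    If `venues` is given, keep only papers published in one of those venues.
    Results are sorted newest first.
    """
    terms = query.lower().split()
    allowed = {v.lower() for v in venues} if venues else None
    hits = [
        p for p in papers
        if all(term in p.title.lower() for term in terms)
        and (allowed is None or p.venue.lower() in allowed)
    ]
    return sorted(hits, key=lambda p: p.year, reverse=True)


if __name__ == "__main__":
    # Search by keyword, then narrow to a top NLP venue.
    for paper in keyword_search(PAPERS, "citation classification", venues=["EMNLP"]):
        print(paper.year, paper.venue, paper.title)
```

Title-only matching is deliberately crude. A real search engine also ranks on abstracts, citations, and other signals, so treat this as a way to reason about the filtering step rather than as a replacement for it.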
In addition to reviewing individual search results, it’s always helpful to look at each paper’s citations to see which new papers have been published that extend highly cited existing research. Interestingly, when we recently replicated this literature review approach, we found at least one new paper (Hassan et al. 2018) that appears to further extend some of the research that we referenced for our project.
Results of Our Work
Our citation intent classification project resulted in a research paper (Cohan et al. 2019) that has been accepted at NAACL-HLT 2019 and a publicly available dataset that is five times larger than existing datasets. We also released a new feature, now available on Semantic Scholar, that surfaces citation classifications as method, background, or result extension tags, making it easier for researchers to navigate and discover research while browsing our citation graph.
We encourage you to try it out and let us know what you think!
Valenzuela, Marco et al. “Identifying Meaningful Citations.” AAAI Workshop: Scholarly Big Data (2015).
Jurgens, David et al. “Citation Classification for Behavioral Analysis of a Scientific Field.” CoRR abs/1609.00435 (2016): n. pag.
Jurgens, David et al. “Measuring the Evolution of a Scientific Field through Citation Frames.” Transactions of the Association for Computational Linguistics 6 (2018): 391–406.
Teufel, Simone et al. “Automatic classification of citation function.” EMNLP (2006).
Hassan, Saeed-Ul et al. “Deep context of citations using machine-learning models in scholarly full-text articles.” Scientometrics 117 (2018): 1645–1662.
Cohan, Arman et al. “Structural Scaffolds for Citation Intent Classification in Scientific Publications.” NAACL-HLT (2019).