Two minutes NLP — Quick Intro to Coreference Resolution with NeuralCoref

Mentions, Word Embeddings, NeuralCoref, and SpaCy

Published in NLPlanet · 2 min read · Jan 6, 2022

Coreference resolution aims to find the expressions in a text that refer to the same real-world entity, group them into clusters, and optionally replace ambiguous expressions (such as pronouns) with the entity they refer to. It is an important step for many higher-level NLP tasks such as document summarization, question answering, and information extraction.

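Here is an example of what coreference resolution does. In the text "My sister has a dog. She loves him.", the pronoun "She" refers to "My sister" and "him" refers to "a dog".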

Example of Coreference Resolution. Image by the author.

A typical coreference resolution algorithm goes like this:

1. Extract the spans of words that potentially refer to real-world entities. These spans are called mentions.
2. Compute a set of features for each mention and for each pair of mentions. This is commonly done by averaging the word embeddings of the mention and of its adjacent words, so that the features also capture context.
3. Feed these features into machine learning models that find the most likely antecedent for each mention (if there is one).
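The three steps above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not how NeuralCoref works internally: the 2-d word vectors are invented, the mentions are given by hand instead of being extracted, and a cosine similarity with a fixed threshold stands in for the trained model that scores mention pairs. The names TOY_EMBEDDINGS, mention_features, and resolve are hypothetical.

```python
import math

# Toy 2-d word vectors invented for this sketch; real systems use pretrained embeddings.
TOY_EMBEDDINGS = {
    "my": [0.2, 0.0], "sister": [1.0, 0.0], "has": [0.1, 0.1],
    "a": [0.0, 0.2], "dog": [0.0, 1.0],
    "she": [1.0, 0.1], "loves": [0.1, 0.1], "him": [0.1, 1.0],
}


def mention_features(tokens, span, window=1):
    """Average the vectors of the mention's words and of `window` neighbours on each side."""
    start, end = span
    lo, hi = max(0, start - window), min(len(tokens), end + window)
    # Tokens without a vector (here: punctuation) are simply skipped.
    vectors = [TOY_EMBEDDINGS[t] for t in tokens[lo:hi] if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, used here as a stand-in for a learned pair-scoring model."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def resolve(tokens, mentions, threshold=0.8):
    """Link each mention to its best-scoring earlier mention, or to None."""
    feats = [mention_features(tokens, span) for span in mentions]
    links = {}
    for i, span in enumerate(mentions):
        best, best_score = None, threshold
        for j in range(i):  # only earlier mentions can be antecedents
            score = cosine(feats[i], feats[j])
            if score > best_score:
                best, best_score = mentions[j], score
        links[span] = best
    return links


tokens = "my sister has a dog . she loves him .".split()
# Step 1 (mention extraction) is done by hand: (start, end) token spans.
mentions = [(0, 2), (3, 5), (6, 7), (8, 9)]
links = resolve(tokens, mentions)
for mention, antecedent in links.items():
    text = " ".join(tokens[mention[0]:mention[1]])
    target = " ".join(tokens[antecedent[0]:antecedent[1]]) if antecedent else None
    print(text, "->", target)
```

The threshold is what lets a mention have no antecedent at all, which matches the "if there is one" in step 3.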

One of the most popular Python libraries for coreference resolution is NeuralCoref.

NeuralCoref

NeuralCoref is a pipeline extension for spaCy 2.1+ which annotates and resolves coreference clusters using neural networks.

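Install the library with pip (pip install neuralcoref) and make sure you have a compatible spaCy 2.x version, since NeuralCoref is built against spaCy 2.1+. Remember to download a spaCy model for the English language (for example with python -m spacy download en).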

Installation of spaCy and NeuralCoref.

Then, import both spaCy and NeuralCoref in your code and add the latter to the spaCy parsing pipeline.

Adding NeuralCoref to spaCy parsing pipeline.
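A minimal sketch of this step, assuming NeuralCoref and a compatible spaCy 2.x release are installed and that the small English model is available under the name en_core_web_sm. The model name and the helper name load_coref_pipeline are assumptions for illustration; spacy.load and neuralcoref.add_to_pipe are the calls shown in the NeuralCoref README.

```python
def load_coref_pipeline(model_name="en_core_web_sm"):
    """Load an English spaCy model and append NeuralCoref to its pipeline.

    `model_name` is an assumption: use whichever English model you downloaded.
    """
    # Imported inside the function so that defining the helper
    # does not fail on machines where the libraries are missing.
    import spacy
    import neuralcoref

    nlp = spacy.load(model_name)
    # Registers NeuralCoref as the last component of the pipeline.
    neuralcoref.add_to_pipe(nlp)
    return nlp
```

Calling load_coref_pipeline() once and reusing the returned object is usually preferable, since loading a spaCy model is comparatively slow.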

Finally, parse a sentence with spaCy. NeuralCoref will automatically resolve the coreferences and annotate them as extension attributes on the spaCy Doc, Span, and Token objects, accessible under the ._. namespace.

Testing NeuralCoref.
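A small sketch of reading those annotations, assuming nlp is a spaCy pipeline to which NeuralCoref has already been added. The helper name describe_coreferences is hypothetical; the attributes doc._.has_coref, doc._.coref_clusters, and doc._.coref_resolved, as well as the main and mentions fields of each cluster, are the ones documented in the NeuralCoref README.

```python
def describe_coreferences(nlp, text):
    """Parse `text` and collect NeuralCoref's Doc-level annotations in a dict."""
    doc = nlp(text)
    clusters = []
    if doc._.has_coref:
        for cluster in doc._.coref_clusters:
            # `main` is the most representative mention of the cluster,
            # `mentions` lists every span that belongs to it.
            clusters.append((cluster.main.text, [m.text for m in cluster.mentions]))
    return {
        "has_coref": doc._.has_coref,
        "clusters": clusters,
        # Text with each mention replaced by its cluster's main mention;
        # fall back to the input when nothing was resolved.
        "resolved": doc._.coref_resolved if doc._.has_coref else text,
    }
```

A typical call would be describe_coreferences(nlp, "My sister has a dog. She loves him."). The same information is also exposed per Span and per Token (for instance span._.is_coref and token._.in_coref), which is handy when you need to know whether one specific word takes part in a coreference.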

Thank you for reading! If you are interested in learning more about NLP, remember to follow NLPlanet on Medium, LinkedIn, and Twitter!

Two minutes NLP related posts

Two minutes NLP — Relation Extraction with OpenNRE

Relation Extraction, Knowledge Graphs, Entities, and OpenNRE

Two minutes NLP — Easy document annotation with Wikipedia concepts

Semantic annotations, Wikification, Ontologies, and PageRank

Two minutes NLP — 11 word embeddings models you should know

TF-IDF, Word2Vec, GloVe, FastText, ELMO, CoVe, BERT, RoBERTa, etc.


Written by Fabio Chiusano