NLP-Entity Coreference Resolution

Sarang Mete
2 min read · Nov 1, 2022

Explore multiple libraries

Photo by Tim Mossholder on Unsplash

Definition:

Coreference resolution identifies when two or more expressions in a text refer to the same entity and links them together. The resolved entities may be people, places, organizations, or events.

Applications:

Machine translation, information extraction, question answering, and summarization.

Training:

Coreference resolution is commonly framed as a pairwise classification task: for each pair of mentions, a model decides whether both refer to the same entity. It typically requires a pre-processing pipeline comprising a variety of NLP tasks (e.g., tokenization, lemmatization, named entity recognition, part-of-speech tagging). We will not discuss training in this article. You can read about how Hugging Face developed neuralcoref here and here.
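To make the pairwise framing concrete, here is a minimal sketch of how labelled mention pairs could be built from known coreference clusters. The `mention_pairs` helper and the `(token_index, text)` mention format are hypothetical and chosen only for illustration; this is not how neuralcoref or coreferee prepare their training data.

```python
from itertools import combinations


def mention_pairs(clusters):
    """Turn coreference clusters into labelled mention pairs.

    clusters: list of clusters, each a list of (token_index, text) mentions
              (a hypothetical toy format).
    Returns a list of ((mention_a, mention_b), label), where label is 1 if
    both mentions belong to the same cluster and 0 otherwise.
    """
    # Remember which cluster each mention belongs to.
    cluster_of = {}
    for cluster_id, cluster in enumerate(clusters):
        for mention in cluster:
            cluster_of[mention] = cluster_id

    # Order mentions by their position in the text, then pair them up.
    mentions = sorted(cluster_of)
    return [
        ((a, b), int(cluster_of[a] == cluster_of[b]))
        for a, b in combinations(mentions, 2)
    ]


# Toy clusters for: "Tom was in Texas with his wife and he said that he liked it"
clusters = [
    [(0, "Tom"), (5, "his"), (8, "he"), (11, "he")],
    [(3, "Texas"), (13, "it")],
]
pairs = mention_pairs(clusters)
for (a, b), label in pairs[:5]:
    print(a, b, label)
```

A classifier trained on pairs like these would then predict the label for unseen mention pairs, and the positive pairs would be merged into chains.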

Libraries for application:

  1. huggingface/neuralcoref
  2. allennlp: it has some drawbacks, and a solution to these is described here
  3. Stanford CoreNLP
  4. coreferee: a mixture of neural networks and programmed rules

I’ve used coreferee.

Usage:

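import coreferee, spacy

nlp = spacy.load("en_core_web_lg")  # assumption: any English spaCy model supported by coreferee
nlp.add_pipe("coreferee")

text = "Tom was in Texas with his wife and he said that he liked it"
doc = nlp(text)
doc._.coref_chains.print()

0: Tom(0), his(5), he(8), he(11)
1: Texas(3), it(13)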

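In the example above, calling doc._.coref_chains.resolve() on ‘it’ returns ‘Texas’, while calling it on ‘Texas’ returns None, because ‘Texas’ is the original entity that the other mention in its chain refers to.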

# Basic logic to get resolved text:

1. Get all token words

2. Check resolution of each token

3. If it is None, the token is either the original entity or has nothing to resolve, so leave it unchanged

4. If it is not None, then replace token with resolved/original entity

5. Combine all tokens back to get resolved text

doc = nlp(text)
# Get the token list, keeping each token's trailing whitespace
tok_list = [token.text_with_ws for token in doc]
print(tok_list)
for index, token in enumerate(doc):
    # Check the resolution of each token
    resolved = doc._.coref_chains.resolve(token)
    if resolved is not None:
        # If it is not None, replace the token with the resolved/original entity
        tok_list[index] = "".join(t.text + " " for t in resolved)
resolved_text = "".join(tok_list)
print(resolved_text)

Tom was in Texas with Tom wife and Tom said that wife liked Texas
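The snippet above appends a space after every replacement, which can leave a trailing space at the end of the text or a stray space before punctuation. As a possible refinement, here is a minimal, library-free sketch that reuses whatever whitespace followed the original token. The `replace_mentions` helper and the `resolutions` dictionary are hypothetical; the dictionary is written by hand from the chains printed earlier and stands in for the results of `doc._.coref_chains.resolve()`.

```python
def replace_mentions(tokens_with_ws, resolutions):
    """Rebuild text, swapping resolved tokens for their referents.

    tokens_with_ws: token strings including trailing whitespace
                    (like spaCy's token.text_with_ws).
    resolutions:    dict {token_index: [referent strings]}; indices missing
                    from the dict are left unchanged (the "None" case).
    """
    out = []
    for index, tok in enumerate(tokens_with_ws):
        referents = resolutions.get(index)
        if referents is None:
            # Original entity or nothing to resolve: keep the token as-is.
            out.append(tok)
            continue
        # Keep whatever whitespace followed the original token.
        trailing_ws = tok[len(tok.rstrip()):]
        out.append(" ".join(referents) + trailing_ws)
    return "".join(out)


tokens = ["Tom ", "was ", "in ", "Texas ", "with ", "his ", "wife ", "and ",
          "he ", "said ", "that ", "he ", "liked ", "it"]
# Hand-written from the chains 0: Tom(0), his(5), he(8), he(11) and 1: Texas(3), it(13)
resolutions = {5: ["Tom"], 8: ["Tom"], 11: ["Tom"], 13: ["Texas"]}
print(replace_mentions(tokens, resolutions))
```

Because the mapping here is derived by hand from the printed chains, its result reflects that chain listing only; with coreferee itself, the mapping would be built by calling resolve() on each token.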

I’ve created a complete end-to-end project. You can refer to it here.

Image by Author

The main challenges I’ve solved in this project:

  1. Creating the processing logic for coreferee output
  2. Creating production-ready code

If you liked the article or have any suggestions/comments, please share them below!

Let’s connect and discuss on LinkedIn
