Named Entity Recognition (NER) with spaCy

Named Entity Recognition (NER) is an important facet of Natural Language Processing (NLP). With NER we can automatically extract entities (real-world objects such as people, organisations, locations and dates) from natural language text, which helps us derive more meaning from it.

NER can be used to build recommendation systems, quickly extract relevant information from large volumes of text, support customer service, and even catalogue text content.

Pre-trained models for many NLP problems, including Named Entity Recognition, are available from libraries such as NLTK and spaCy. This article is a gentle introduction to performing NER with spaCy.

It goes without saying that you will need to set up spaCy on your machine first. Please follow the installation instructions in the spaCy documentation to install it.
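Besides the library, you also need the model package itself. From the command line that is `python -m spacy download en_core_web_lg`; if you prefer to do it from Python (for example in a notebook), here is a minimal sketch. It assumes spaCy is already installed; `ensure_model` is a hypothetical helper name, while `spacy.util.is_package` and `spacy.cli.download` are functions that ship with spaCy.

```python
import spacy
from spacy.cli import download


def ensure_model(name: str = "en_core_web_lg") -> None:
    """Hypothetical helper: download a spaCy model package only if it is missing."""
    # is_package() checks whether a Python package with this name is installed
    if not spacy.util.is_package(name):
        # Same effect as running: python -m spacy download <name>
        download(name)
```

Calling `ensure_model()` once before `spacy.load("en_core_web_lg")` avoids an error on a fresh machine. Keep in mind that the large model is a download of several hundred megabytes.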

spaCy supports many languages to varying degrees; for a full list of available models, see the models page of the spaCy documentation.

Named Entities

For models trained on the OntoNotes 5 corpus, spaCy supports 18 entity types, among them PERSON, ORG (organisations), GPE (countries, cities, states), DATE and MONEY.
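You can look up what each label means with `spacy.explain`, which works without loading a model. The sketch below hard-codes the 18 OntoNotes 5 labels (an assumption mirroring spaCy's English NER scheme); with a model installed, `nlp.get_pipe("ner").labels` returns the labels that model actually predicts.

```python
import spacy

# The 18 entity labels of the OntoNotes 5 scheme used by spaCy's English models
ONTONOTES_LABELS = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
]

# spacy.explain() returns a short human-readable description of a label
label_descriptions = {label: spacy.explain(label) for label in ONTONOTES_LABELS}

for label, description in label_descriptions.items():
    print(f"{label:<12} {description}")
```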

Let’s take a look at an example. We will load the “en_core_web_lg” model for NER. At the time of writing, this model is an English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. It assigns word vectors, context-specific token vectors, POS tags, a dependency parse and named entities.

import spacy
nlp = spacy.load("en_core_web_lg")

Next, we pass some text to the model:

doc = nlp("Manchester United Football Club is a professional football club based in Manchester, England, established in 1878")

Calling nlp on the text returns a spacy.tokens.doc.Doc object, which you can iterate over token by token. Since we did not customise the pipeline, every component the model ships with (including the tagger, dependency parser and entity recognizer) was run on the text.
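To see that a Doc is simply a sequence of tokens, here is a minimal sketch using `spacy.blank("en")`, a tokenizer-only English pipeline that needs no download. This is an assumption made so the snippet runs anywhere: the blank pipeline performs no tagging, parsing or NER. With `en_core_web_lg` loaded, `nlp.pipe_names` would instead list the components that ran, and you can skip the ones you don't need with `spacy.load("en_core_web_lg", disable=["parser"])`.

```python
import spacy

# Tokenizer-only English pipeline: no trained components are included
nlp_blank = spacy.blank("en")
print(nlp_blank.pipe_names)  # an empty list, since only the tokenizer runs

doc = nlp_blank("Manchester United Football Club is a professional football club")

# Iterating over a Doc yields Token objects in order; token.i is the index in the Doc
for token in doc:
    print(token.i, token.text)

tokens = [token.text for token in doc]
```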

Now, let’s iterate through the named entities the model found, which are stored in doc.ents:

for ent in doc.ents:
    # Each entity is a Span; label_ is its entity type as a string
    print(ent.text, ent.label_)
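Each item in doc.ents is a Span, so besides text and label_ it also carries token and character offsets. The sketch below sets the entities by hand on a blank pipeline so it runs without a model; the labels are assigned manually for illustration and are not model output.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Manchester United was founded in 1878 in Manchester, England")

# Token indices: Manchester=0 United=1 was=2 founded=3 in=4 1878=5 in=6
#                Manchester=7 ,=8 England=9
# Span(doc, start, end, label=...) covers tokens start .. end-1
doc.ents = [
    Span(doc, 0, 2, label="ORG"),
    Span(doc, 5, 6, label="DATE"),
    Span(doc, 7, 8, label="GPE"),
    Span(doc, 9, 10, label="GPE"),
]

for ent in doc.ents:
    # start/end are token offsets; start_char/end_char are character offsets into doc.text
    print(ent.text, ent.label_, ent.start, ent.end, ent.start_char, ent.end_char)
```

The character offsets are what you would use to highlight an entity in the original string, since `doc.text[ent.start_char:ent.end_char]` gives back exactly `ent.text`.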

Extracting named entities from a news article

For this example, we will use an awesome library called newspaper to scrape a news article and perform NER on its content. The newspaper library provides a lot of functionality out of the box, such as summarising an article, and it also supports non-English languages.

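# newspaper is installed with: pip install newspaper3k
from newspaper import Article
import spacy
nlp = spacy.load("en_core_web_lg")
url = "https://techcrunch.com/2020/09/16/ios-14-is-now-available-to-download/"
# Download the page and extract the main article text
article = Article(url)
article.download()
article.parse()
# Run the pipeline on the article body and print each entity with its character offsets
doc = nlp(article.text)
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)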
Named Entities from the article
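A long article yields many repeated entities, so it is often handy to group them. The two helpers below are a hypothetical sketch (not part of spaCy or newspaper) that work on any Doc, including the one built from article.text. Here they are demonstrated on a small hand-labelled Doc, built from an explicit word list so the token indices are unambiguous and the snippet runs offline.

```python
from collections import Counter, defaultdict

import spacy
from spacy.tokens import Doc, Span


def entities_by_label(doc: Doc) -> dict:
    """Hypothetical helper: map each label to its distinct entity texts, in order of first appearance."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


def most_common_entities(doc: Doc, n: int = 5) -> list:
    """Hypothetical helper: the n most frequent (text, label) pairs in the Doc."""
    return Counter((ent.text, ent.label_) for ent in doc.ents).most_common(n)


# Demo Doc with manually assigned labels (not predicted by a model)
nlp = spacy.blank("en")
words = ["Manchester", "United", "was", "founded", "in", "Manchester", ".",
         "Manchester", "United", "is", "based", "in", "Manchester", ",", "England", "."]
spaces = [True, True, True, True, True, False, True,
          True, True, True, True, True, False, True, False, False]
demo = Doc(nlp.vocab, words=words, spaces=spaces)
demo.ents = [
    Span(demo, 0, 2, label="ORG"),
    Span(demo, 5, 6, label="GPE"),
    Span(demo, 7, 9, label="ORG"),
    Span(demo, 12, 13, label="GPE"),
    Span(demo, 14, 15, label="GPE"),
]

print(entities_by_label(demo))
print(most_common_entities(demo))
```

With the article Doc, `most_common_entities(doc)` gives a quick overview of who and what the story is mainly about.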

displacy

spaCy also provides a handy visualiser called displacy for highlighting the named entities in a text. You can use it like so:

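from spacy import displacy
doc = nlp("Manchester United was founded in 1878 as Newton Heath in Manchester, England")
# Starts a local web server (port 5000 by default) that renders the highlighted entities
displacy.serve(doc, style="ent")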
Named Entities visualised using displacy
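displacy.serve blocks until you stop it, which is awkward in scripts. A minimal alternative sketch uses displacy.render, which returns the markup as a string that you can save to a file. To keep the snippet runnable without a trained model, the entities are set by hand on a blank pipeline (an assumption; with en_core_web_lg you would just call nlp on the text), and the file name entities.html is arbitrary.

```python
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Manchester United was founded in 1878 as Newton Heath in Manchester, England")
# Hand-labelled entities so the sketch runs without a trained model
doc.ents = [
    Span(doc, 0, 2, label="ORG"),    # Manchester United
    Span(doc, 5, 6, label="DATE"),   # 1878
    Span(doc, 7, 9, label="ORG"),    # Newton Heath
    Span(doc, 10, 11, label="GPE"),  # Manchester
    Span(doc, 12, 13, label="GPE"),  # England
]

# render() returns the markup instead of starting a server;
# page=True wraps it in a complete HTML document
html = displacy.render(doc, style="ent", page=True, jupyter=False)
Path("entities.html").write_text(html, encoding="utf-8")
```

Opening entities.html in a browser shows the same highlighting as the served version.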

We can even use it to visualise the dependency parse tree of a text, like so:

from spacy import displacy
doc = nlp("Manchester United was founded in 1878 in Manchester, England")
displacy.serve(doc, style="dep")
Dependency Parse Tree visualised with displacy
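displacy can also draw a parse you supply yourself via manual=True, which is useful for sharing a tree without shipping a model. In this sketch the words, tags and arcs are hand-written for illustration and are not the output of en_core_web_lg; each arc joins two token indices with start smaller than end, and dir says which end the arrow points to (the dependent). The output file name parse.svg is arbitrary.

```python
from pathlib import Path

from spacy import displacy

# Hand-written parse of "Manchester United was founded" in displacy's manual format
parse = {
    "words": [
        {"text": "Manchester", "tag": "PROPN"},
        {"text": "United", "tag": "PROPN"},
        {"text": "was", "tag": "AUX"},
        {"text": "founded", "tag": "VERB"},
    ],
    "arcs": [
        # "dir": "left" means the arrow points at the start token, i.e. the dependent
        {"start": 0, "end": 1, "label": "compound", "dir": "left"},
        {"start": 1, "end": 3, "label": "nsubjpass", "dir": "left"},
        {"start": 2, "end": 3, "label": "auxpass", "dir": "left"},
    ],
}

# style="dep" produces SVG; manual=True tells displacy the input is a dict, not a Doc
svg = displacy.render(parse, style="dep", manual=True, jupyter=False)
Path("parse.svg").write_text(svg, encoding="utf-8")
```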
