Introduction to Named Entity Recognition

A tool that invariably comes in handy when we do Natural Language Processing tasks

Suvro Banerjee
Nov 14, 2018 · 8 min read

Introduction

In this article we will learn what Named Entity Recognition (NER) is. We will discuss some of its use cases and then evaluate a few standard Python libraries that let us get started quickly and solve the problems at hand.

What is Named Entity Recognition?

Named Entity Recognition, also known as entity extraction, classifies the named entities present in a text into pre-defined categories such as “individuals”, “companies”, “places”, “organizations”, “cities”, “dates” and “product terminologies”. It adds a wealth of semantic knowledge to your content and helps you quickly understand the subject of any given text.

A Few Use Cases of Named Entity Recognition

  • Classifying content for news providers
  • Efficient search across the brands
  • Customer support on Twitter

Source: https://www.paralleldots.com/named-entity-recognition

Standard Libraries for Named Entity Recognition

I will discuss three standard libraries that are widely used in Python to perform NER. I am sure there are many more, and I would encourage readers to add them in the comment section.

  1. Stanford NER
  2. spaCy
  3. NLTK

Stanford NER

pip install nltk
  • english.conll.4class.distsim.crf.ser.gz: Location, Person, Organization and Misc
Stanford Named Entity Recognition
Output of the Stanford NER tagger
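
The tagging step for this section can be sketched as below. This is a minimal sketch, not the author's original code: it assumes Java is installed and that the Stanford NER distribution has been downloaded and unzipped locally, and `model_path` and `jar_path` are hypothetical parameters that you would point at the classifier file (for example english.conll.4class.distsim.crf.ser.gz) and at stanford-ner.jar. It uses NLTK's `StanfordNERTagger` wrapper, which newer NLTK releases mark as deprecated. The `group_entities` helper is an illustrative addition that merges adjacent tokens sharing a tag into one phrase.

```python
from itertools import groupby


def tag_with_stanford(text, model_path, jar_path):
    """Tag tokens with NLTK's StanfordNERTagger wrapper (requires Java).

    model_path and jar_path are hypothetical local paths to the classifier
    file and to stanford-ner.jar from the Stanford NER download.
    """
    # Imported here so the helper below can be used without NLTK installed.
    from nltk.tag import StanfordNERTagger
    from nltk.tokenize import word_tokenize

    tagger = StanfordNERTagger(model_path, jar_path, encoding="utf8")
    return tagger.tag(word_tokenize(text))


def group_entities(tagged_tokens, outside_tag="O"):
    """Merge runs of adjacent tokens sharing a tag into (phrase, tag) pairs."""
    entities = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != outside_tag:
            entities.append((" ".join(token for token, _ in run), tag))
    return entities


# Hand-written sample in the (token, tag) shape the tagger returns;
# it is illustrative, not real tagger output.
sample = [("Theresa", "PERSON"), ("May", "PERSON"), ("spoke", "O"),
          ("in", "O"), ("London", "LOCATION")]
entities = group_entities(sample)
print(entities)
```

With real files in place, `group_entities(tag_with_stanford(article, model_path, jar_path))` would give the entity phrases for a whole article. Note that merging by tag alone cannot separate two different entities of the same type that sit directly next to each other.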

spaCy

spaCy NER
pip install spacy
python -m spacy download en_core_web_sm
spaCy NER
Output from spaCy NER
https://spacy.io/api/annotation#pos-tagging
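
A minimal sketch of entity extraction with spaCy follows. It is not the author's original code: it assumes the small English model `en_core_web_sm` has been downloaded, and the function names `extract_entities` and `count_labels` are hypothetical helpers introduced for illustration. The entities themselves come from `doc.ents`, where each span exposes its text and its label.

```python
from collections import Counter


def extract_entities(text, model_name="en_core_web_sm"):
    """Return (entity text, label) pairs found by a spaCy pipeline."""
    # Imported lazily so the counting helper works without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)  # assumes the model has been downloaded
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def count_labels(entities):
    """Count how many entities were found per label."""
    return Counter(label for _, label in entities)


# Hand-written pairs in the shape extract_entities returns; illustrative
# only, not real model output.
sample_entities = [("Asia-Pacific", "LOC"), ("Japan", "GPE"),
                   ("Theresa May", "PERSON"), ("European Union", "ORG"),
                   ("Europe", "LOC")]
label_counts = count_labels(sample_entities)
print(label_counts.most_common())
```

Calling `count_labels(extract_entities(article))` on a full article gives a quick overview of which entity types dominate the text.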

NLTK

NLTK NER
  1. Parts of Speech (POS) tagging
  2. Named Entity Recognition
pip install nltk
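import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

print('NLTK version: %s' % (nltk.__version__))

# One-time downloads of the tokenizer, POS tagger and NE chunker resources.
# Newer NLTK releases may ask for differently named resources; if so, the
# LookupError message names the one to download.
nltk.download('words')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('maxent_ne_chunker')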
article = '''
Asian shares skidded on Tuesday after a rout in tech stocks put Wall Street to the sword, while a
sharp drop in oil prices and political risks in Europe pushed the dollar to 16-month highs as investors dumped
riskier assets. MSCI’s broadest index of Asia-Pacific shares outside Japan dropped 1.7 percent to a 1-1/2
week trough, with Australian shares sinking 1.6 percent. Japan’s Nikkei dived 3.1 percent led by losses in
electric machinery makers and suppliers of Apple’s iphone parts. Sterling fell to $1.286 after three straight
sessions of losses took it to the lowest since Nov.1 as there were still considerable unresolved issues with the
European Union over Brexit, British Prime Minister Theresa May said on Monday.'''
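def fn_preprocess(art):
    # Split the text into word tokens, then attach a POS tag to each token.
    art = nltk.word_tokenize(art)
    art = nltk.pos_tag(art)
    return art

art_processed = fn_preprocess(article)
art_processed  # in a notebook, this displays the list of (token, tag) pairs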
Snapshot of Output (POS tagging) from the above code
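# Chunk the POS-tagged tokens into a tree with named-entity subtrees.
results = ne_chunk(art_processed)
# Print only the lines of the tree that contain noun-tagged tokens.
for x in str(results).split('\n'):
    if '/NN' in x:
        print(x)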
Snapshot of the output from the above code
pattern = 'NP: {<DT>?<JJ>*<NN>}'
cp = nltk.RegexpParser(pattern)
cs = cp.parse(art_processed)
print(cs)
Snapshot from the output from above
from nltk.chunk import conlltags2tree, tree2conlltags
from pprint import pprint

# Convert the chunk tree into (word, POS tag, IOB tag) triples.
iob_tagged = tree2conlltags(cs)
pprint(iob_tagged)
The snapshot of the output from the above code
for word, pos, ner in iob_tagged:
    print(word, pos, ner)
NER using NLTK
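
The (word, POS, IOB) triples printed above can be folded back into whole phrases. The sketch below is an illustrative addition, not part of the original walkthrough: `iob_to_chunks` is a hypothetical helper that starts a chunk at each `B-` tag, extends it on matching `I-` tags and closes it on `O`. The sample triples are hand-written in the shape `tree2conlltags` returns, so the code runs without NLTK.

```python
def iob_to_chunks(iob_triples):
    """Collect (word, pos, iob) triples into (chunk_label, phrase) pairs."""
    chunks = []
    current_label, current_words = None, []
    for word, _pos, iob in iob_triples:
        # "B-NP" -> ("B", "NP"); a bare "O" -> ("O", "")
        prefix, _, label = iob.partition("-")
        continues = prefix == "I" and label == current_label
        # Close the open chunk when this token does not continue it.
        if current_words and not continues:
            chunks.append((current_label, " ".join(current_words)))
            current_label, current_words = None, []
        if prefix == "B" or (prefix == "I" and not current_words):
            # Start a new chunk (a stray I- tag is treated as a start).
            current_label, current_words = label, [word]
        elif continues:
            current_words.append(word)
    if current_words:
        chunks.append((current_label, " ".join(current_words)))
    return chunks


# Hand-written triples for illustration, not real output from the article.
sample_iob = [("a", "DT", "B-NP"), ("sharp", "JJ", "I-NP"),
              ("drop", "NN", "I-NP"), ("in", "IN", "O"),
              ("oil", "NN", "B-NP")]
chunks = iob_to_chunks(sample_iob)
print(chunks)
```

On the article's data, `iob_to_chunks(iob_tagged)` would return the noun phrases matched by the `NP` pattern. The same helper works for named-entity IOB tags such as `B-PERSON`.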

What’s next?

So, we have just learnt what Named Entity Recognition is and how to use library APIs to solve generic problems with it. As next steps, you could:

  1. Build more sophisticated NER models (say, using Deep Learning) and evaluate how much better they perform.
  2. Take a task you encounter daily that deals with Natural Language, figure out a problem you want to solve, and then use everything you have learnt about NER to solve it.

Explore Science & Artificial Intelligence

Share interest in Science and explore AI through the principles of Machine/Statistical Learning, Mathematics and Computer Science.

Written by Suvro Banerjee

“All that is not given is lost” — Tagore | Founder of Explore Science & Artificial Intelligence | MSc. Econometrics from UB | Machine Learning Engineer
