NLP: Pretrained Named Entity Recognition (NER)
There is a good range of pre-trained Named Entity Recognition (NER) models provided by popular open-source NLP libraries (e.g. NLTK, Spacy, Stanford Core NLP) and some less well-known ones (e.g. Allen NLP, Flair, Polyglot, Deep Pavlov), as well as the odd free API (e.g. GATE). This tutorial tests out a few and discusses the different methods underlying them (from rules-based systems and CRFs to deep neural networks).
Pretrained NER models purposely keep entity types generic (e.g. Person, Date, Location, Geopolitical Entity, Organisation, etc.). However, the labelling notation and label names, as well as the number of labels a model can classify (some classify 3 types of entities, others 4, others 12, etc.), depend upon the public dataset used to train the model.
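Because each model names its labels differently, comparing their outputs usually means mapping the labels onto a shared set first. Here is a minimal sketch, assuming a hand-written mapping: `LABEL_MAP` and the helper names are illustrative and not part of any library.

```python
# Illustrative mapping only: these label names are common conventions
# (CoNLL-style PER/LOC/ORG, OntoNotes-style PERSON/GPE/ORG), not an
# exhaustive list for any particular library.
LABEL_MAP = {
    "PER": "PERSON", "PERSON": "PERSON",
    "ORG": "ORGANISATION", "ORGANIZATION": "ORGANISATION",
    "LOC": "LOCATION", "LOCATION": "LOCATION",
    "GPE": "LOCATION",
}

def normalise_label(label):
    """Strip an optional B-/I- prefix and map the label to a shared name."""
    if len(label) > 2 and label[1] == "-" and label[0] in "BI":
        label = label[2:]
    return LABEL_MAP.get(label.upper(), label.upper())

def normalise_entities(entities):
    """Normalise the label of every (text, label) pair in a collection."""
    return {(text, normalise_label(label)) for text, label in entities}

# Two hypothetical model outputs for the same sentence.
model_a = {("Barack Obama", "PERSON"), ("Hawaii", "GPE")}
model_b = {("Barack Obama", "I-PER"), ("Hawaii", "I-LOC")}
print(normalise_entities(model_a) == normalise_entities(model_b))
```

Folding GPE into LOCATION loses a distinction that some datasets draw, so keep the two apart if that difference matters for your use case.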
GATE
This first one is an API, but I have included it here as it is free to use. It is called ANNIE and is rules-based (although its rules work at different layers of abstraction along the NLP pipeline).
import json
import requests
url = "https://cloud-api.gate.ac.uk/process-document/annie-named-entity-recognizer"
headers = {'Content-Type': 'text/plain'}
# example_document is the plain-text string to annotate
response = requests.post(url, data=example_document, headers=headers).json()
print(json.dumps(response, indent=2))
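To give a feel for what "rules-based" means here, the sketch below layers an exact gazetteer lookup under a pattern rule that uses a title as evidence for a person name. It is a toy under stated assumptions: the gazetteer entries, the regular expression and the function name are all made up for illustration, and ANNIE's real gazetteers and grammars are far larger.

```python
import re

# Hypothetical gazetteer and rule, for illustration only; ANNIE's real
# resources are not reproduced here.
GAZETTEER = {
    "London": "Location",
    "University of Sheffield": "Organization",
}
TITLE_RULE = re.compile(
    r"\b(?:Mr|Mrs|Ms|Dr|Prof)\.?\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"
)

def rule_based_ner(text):
    """Return (start, end, text, label) tuples sorted by position."""
    found = []
    # Layer 1: exact gazetteer lookup.
    for phrase, label in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), text):
            found.append((m.start(), m.end(), m.group(), label))
    # Layer 2: a pattern rule that treats a title as contextual evidence.
    for m in TITLE_RULE.finditer(text):
        found.append((m.start(1), m.end(1), m.group(1), "Person"))
    return sorted(found)

text = "Dr. Jane Smith left London to join the University of Sheffield."
for span in rule_based_ner(text):
    print(span)
```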
NLTK
NLTK provides all the traditional NLP components needed to construct an NER pipeline similar to the one shown for GATE. The text is tokenised > the tokens are passed through a Part Of Speech (POS) tagger > a parser chunks the tokens based on their POS tags to find named entities.
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
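# Tokenise, POS-tag, then chunk; named-entity chunks are the subtrees that carry a label
tokens = nltk.word_tokenize(document)
tree = nltk.ne_chunk(nltk.pos_tag(tokens))
{(' '.join(c[0] for c in chunk), chunk.label()) for chunk in tree if hasattr(chunk, 'label')}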
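The chunking step can be illustrated without NLTK installed. The sketch below groups consecutive proper-noun tags into candidate entities. It is a deliberate simplification: NLTK's `ne_chunk` uses a trained classifier rather than this single rule, and `chunk_proper_nouns` is a hypothetical helper name.

```python
def chunk_proper_nouns(tagged_tokens):
    """Group consecutive NNP/NNPS tokens into candidate entity chunks."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag in ("NNP", "NNPS"):
            current.append(word)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

# Hand-written (word, POS tag) pairs in the shape nltk.pos_tag returns.
tagged = [("Tim", "NNP"), ("Cook", "NNP"), ("runs", "VBZ"),
          ("Apple", "NNP"), ("in", "IN"), ("California", "NNP"), (".", ".")]
print(chunk_proper_nouns(tagged))  # ['Tim Cook', 'Apple', 'California']
```

A rule like this finds the entity boundaries but says nothing about entity type, which is the part the trained chunker adds.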
Stanford Core NLP
Unlike the previous two NER models, Stanford Core NLP uses a probabilistic model called a Conditional Random Field (CRF). This was the state-of-the-art approach for a while (prior to more modern, deep learning NER models).
An older version of NLTK had an inbuilt wrapper which could access Stanford Core NLP and its pretrained models. Stanford Core NLP offers three pretrained NER models, containing 3, 4 and 7 entity types respectively.
!pip3 install nltk==3.2.4
!wget http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
!unzip stanford-ner-2015-04-20.zip
from nltk.tag.stanford import StanfordNERTagger
jar = "stanford-ner-2015-04-20/stanford-ner-3.5.2.jar"
model = "stanford-ner-2015-04-20/classifiers/"
st_3class = StanfordNERTagger(model + "english.all.3class.distsim.crf.ser.gz", jar, encoding='utf8')
st_4class = StanfordNERTagger(model + "english.conll.4class.distsim.crf.ser.gz", jar, encoding='utf8')
st_7class = StanfordNERTagger(model + "english.muc.7class.distsim.crf.ser.gz", jar, encoding='utf8')
st_3class.tag(document.split())
st_4class.tag(document.split())
st_7class.tag(document.split())
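What sets a linear-chain CRF apart from a per-token classifier is that it scores whole tag sequences, so a confident neighbour can pull an ambiguous token towards the same label. The sketch below shows Viterbi decoding over hand-picked scores. All numbers are invented for illustration (a trained CRF learns them from features), and this is not Stanford's implementation.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for a linear-chain model.

    emissions: list of {tag: score} dicts, one per token.
    transitions: {(prev_tag, tag): score}; missing pairs score 0.
    """
    # best[i][t] holds (score, path) of the best sequence ending in tag t.
    best = [{t: (emissions[0][t], [t]) for t in tags}]
    for em in emissions[1:]:
        column = {}
        for t in tags:
            score, path = max(
                (best[-1][p][0] + transitions.get((p, t), 0.0), best[-1][p][1])
                for p in tags
            )
            column[t] = (score + em[t], path + [t])
        best.append(column)
    return max(best[-1].values())[1]

tags = ["O", "PER"]
# Made-up scores for the tokens "Jane", "Smith", "spoke".
emissions = [
    {"O": 0.2, "PER": 1.0},
    {"O": 0.6, "PER": 0.5},   # ambiguous on its own, leaning towards O
    {"O": 1.0, "PER": 0.0},
]
transitions = {("PER", "PER"): 0.5, ("O", "PER"): -0.2}

greedy = [max(em, key=em.get) for em in emissions]
print(greedy)                                 # ['PER', 'O', 'O']
print(viterbi(emissions, transitions, tags))  # ['PER', 'PER', 'O']
```

The per-token choice labels "Smith" as O, while the sequence-level decoding keeps it inside the person name because the PER-to-PER transition is rewarded.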
Spacy
Spacy’s NER model is a simple classifier (e.g. a shallow feedforward neural network with a single hidden layer) that is made powerful using some clever feature engineering. Before the input features are fed into the classifier, a stack of weighted bloom embedding layers merges neighbouring features together. This gives each word a unique representation for each distinct context it is in.
!python3 -m spacy download en_core_web_lg
import spacy
sp_lg = spacy.load('en_core_web_lg')
{(ent.text.strip(), ent.label_) for ent in sp_lg(document).ents}
Polyglot
Unlike other NER models, Polyglot’s NER doesn’t rely on human-annotated training datasets.
“Successful approaches to address NER rely on supervised learning …they require human annotated datasets which are scarce.”
Rather, it uses huge unlabelled datasets (like Wikipedia) with automatically inferred entity labels (via features such as hyperlinks).
“We use the internal links embedded in Wikipedia articles to detect named entity mentions. When a link points to an article identified by Freebase as an entity article, we include the anchor text as a positive training example.”
By cleverly addressing the supervised learning labelling limitation, Polyglot has been able to leverage a massive multilingual corpus to train even a simple classifier (e.g. a feedforward neural network) to become a very robust, competitive NER model.
!pip3 install -U git+https://github.com/aboSamoor/polyglot.git@master
!polyglot download embeddings2.en ner2.en
from polyglot.text import Text
Text(document).entities
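The link-based labelling described in the quote above can be sketched as follows: scan wiki markup for internal links and keep the anchor text whenever the link target is a known entity article. `ENTITY_ARTICLES` is a hypothetical stand-in for the Freebase lookup, and the labels and sample text are made up for illustration.

```python
import re

# Hypothetical stand-in for the set of article titles that Freebase
# identifies as entities; the real mapping is not reproduced here.
ENTITY_ARTICLES = {"Paris": "LOC", "Marie Curie": "PER"}

# Matches [[Target]] and [[Target|anchor text]] wiki links.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def positive_examples(wikitext):
    """Yield (anchor_text, label) for links that point at entity articles."""
    for m in WIKI_LINK.finditer(wikitext):
        target = m.group(1).strip()
        anchor = (m.group(2) or target).strip()
        if target in ENTITY_ARTICLES:
            yield anchor, ENTITY_ARTICLES[target]

sample = "[[Marie Curie|Curie]] moved to [[Paris]] to study [[physics]]."
print(list(positive_examples(sample)))  # [('Curie', 'PER'), ('Paris', 'LOC')]
```

Note that the anchor text "Curie" is labelled even though it differs from the article title, which is how this approach picks up the varied ways an entity is mentioned.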
Flair
Flair provides two pre-trained NER models. The architecture is identical (a bi-LSTM on top of a word embedding layer), but each classifier was trained on a different NER dataset. Whereas Spacy adds a feature engineering step to encode the context of neighbouring words into its NER model at the word embedding layer, Flair uses a deep learning model to do such feature engineering implicitly (an LSTM remembers previous words which have appeared in the sentence).
!pip3 install flair
from flair.models import SequenceTagger
model = SequenceTagger.load('ner-ontonotes-fast')  # or .load('ner') for the other pre-trained model
from flair.data import Sentence
s = Sentence(document)
model.predict(s)
s.to_dict(tag_type='ner')
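To see why a bi-directional recurrent model gives each word context from both sides, the sketch below runs a recurrence left-to-right and right-to-left and pairs the two states at each position. It swaps the LSTM for a plain tanh recurrence over scalar inputs. The weights and the toy inputs are arbitrary assumptions, and nothing here reflects Flair's actual implementation.

```python
import math

def run_rnn(inputs, w_in=0.8, w_rec=0.5):
    """Simple Elman-style recurrence over scalar inputs (not an LSTM)."""
    state, states = 0.0, []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states

def bidirectional_states(inputs):
    """Pair each position's left-to-right and right-to-left state."""
    forward = run_rnn(inputs)
    backward = run_rnn(inputs[::-1])[::-1]
    return list(zip(forward, backward))

# Toy scalar "embeddings" for two sentences that differ only in the last word.
a = bidirectional_states([0.2, 0.9, 0.1])
b = bidirectional_states([0.2, 0.9, -0.7])
print(a[0][0] == b[0][0])  # True: the forward state of word 1 ignores later words
print(a[0][1] == b[0][1])  # False: the backward state of word 1 sees the change
```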
Deep Pavlov
DeepPavlov uses a slightly newer variant of Flair’s deep neural architecture, known as a hybrid Bi-LSTM-CRF model.
It is a bi-directional LSTM on top of word and character-level embedding layers (as before). However, it adds a Conditional Random Field (CRF) layer on top of the model's output.
!pip3 install deeppavlov
!python3 -m deeppavlov install ner_ontonotes
from deeppavlov import configs, build_model
deeppavlov_ner = build_model(configs.ner.ner_ontonotes, download=True)
deeppavlov_ner([sentence])
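Sequence taggers of this kind typically emit one tag per token in a BIO scheme (B- begins an entity, I- continues it, O is outside). The sketch below turns such output into entity spans. It assumes the model returns parallel lists of tokens and BIO tags, which is worth checking against the actual output. `bio_to_spans` and the sample data are illustrative.

```python
def bio_to_spans(tokens, tags):
    """Convert parallel token and BIO tag lists into (text, label) spans."""
    spans, words, label = [], [], None
    for token, tag in zip(tokens, tags):
        # A B- tag, or an I- tag whose type differs, starts a new entity.
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if tag == "O" or starts_new:
            if words:
                spans.append((" ".join(words), label))
            words, label = [], None
        if tag != "O":
            if not words:
                label = tag[2:]
            words.append(token)
    if words:
        spans.append((" ".join(words), label))
    return spans

tokens = ["Elon", "Musk", "founded", "SpaceX", "in", "2002"]
tags = ["B-PERSON", "I-PERSON", "O", "B-ORG", "O", "B-DATE"]
print(bio_to_spans(tokens, tags))
# [('Elon Musk', 'PERSON'), ('SpaceX', 'ORG'), ('2002', 'DATE')]
```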
Allen NLP
Allen NLP offers two NER models with different architectures. The smaller model uses a Gated Recurrent Unit (GRU) network to embed words at the character level and another GRU to encode phrases from GloVe word embeddings (like LSTMs, GRUs are able to remember sequences of words to add context to the representation).
The second, “fine-grained” NER model uses a bi-LSTM-CRF model (like DeepPavlov) but also includes a pre-trained bi-LSTM network known as “ELMo” (the state-of-the-art language model prior to BERT, GPT-2, etc.) as a better word embedding layer.
!pip3 install allennlp
from allennlp.predictors import Predictor
al = Predictor.from_path("https://s3-us-west-2.amazonaws.com/allennlp/models/fine-grained-ner-model-elmo-2018.12.21.tar.gz")
al.predict(sentence=document)
Don’t forget to check out the related NLP tutorial on Entity Grounding