Training Custom NER with SpaCy

Akshata G
Nov 17, 2020


spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython.

Named entity recognition (NER) is a sub-task of information extraction (IE) that locates and classifies named entities in a body of text into predefined categories. NER is also known as entity identification, entity chunking, and entity extraction. NER is used in many areas of Artificial Intelligence (AI), including Natural Language Processing (NLP) and Machine Learning. Features provided by spaCy include tokenization, part-of-speech (PoS) tagging, text classification, and named entity recognition.

Compared to NLTK, spaCy is generally faster and more accurate. It also offers access to larger word vectors and pipelines that are easier to customize.

NER workflow (figure).

Example output of NER (figure).
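To give a sense of what NER output looks like before any custom training, here is a minimal example using one of spaCy's pretrained English pipelines (this assumes the en_core_web_sm model has been downloaded; the install and download commands are shown in the next section):

import spacy

# load a pretrained English pipeline that includes an NER component
nlp = spacy.load('en_core_web_sm')

doc = nlp('Apple is looking at buying a U.K. startup for $1 billion')

# print each detected entity span with its predicted label (e.g. ORG, GPE, MONEY)
for ent in doc.ents:
    print(ent.text, ent.label_)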

Implementing spaCy NER in Python

Install spaCy using the following command. (Note: this tutorial uses the spaCy v2 training API; if pip installs spaCy v3 or later, pin an earlier version instead, for example pip install "spacy<3".)

pip install -U spacy

After installation, you also need to download a language model if you want to use spaCy's pretrained pipelines.
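For example, the small English model used in the quick demo above can be downloaded with:

python -m spacy download en_core_web_sm

The custom training below starts from a blank 'en' pipeline, so the pretrained model is optional for the rest of this walkthrough.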

First, we import the packages required for the custom training process.

from __future__ import unicode_literals, print_function
import plac  # command-line helper from spaCy's example scripts; not used in the snippets below
import random  # for shuffling the training data each iteration
from pathlib import Path  # for handling the model output directory
import spacy
from tqdm import tqdm  # progress bar for the training loop

Next, we create the training dataset for the NER model. Each example is a tuple of the raw text and a dictionary whose 'entities' value lists (start_char, end_char, label) spans, where the end offset is exclusive.

TRAIN_DATA = [('Who is chetan', {'entities': [(7, 13, 'PERSON')]}),
              ('Who is Kamal Khumar?', {'entities': [(7, 19, 'PERSON')]}),
              ('I like London and USA', {'entities': [(7, 13, 'LOC'), (18, 21, 'LOC')]})]
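As a quick sanity check (not part of the original script), you can slice each text with its annotated offsets to confirm that every (start, end, label) span lines up with the intended entity string:

# verify that each annotated character span matches its entity text
for text, annotations in TRAIN_DATA:
    for start, end, label in annotations['entities']:
        print(label, '->', repr(text[start:end]))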

Define the variables required for training: the base model to start from (None means start from a blank model), the output directory where the trained model will be saved (adjust the path to your machine), and the number of training iterations.

model = None
output_dir = Path("C:\\Users\\Aksha\\Documents\\ner")
n_iter = 100

Next, load an existing model if one was specified, or create a blank English model, and add the NER component to the pipeline with the create_pipe function.

# load the model
if model is not None:
    nlp = spacy.load(model)
    print("Loaded model '%s'" % model)
else:
    nlp = spacy.blank('en')
    print("Created blank 'en' model")

# set up the pipeline
if 'ner' not in nlp.pipe_names:
    ner = nlp.create_pipe('ner')
    nlp.add_pipe(ner, last=True)
else:
    ner = nlp.get_pipe('ner')
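Before training, register the entity labels that appear in the training data with the NER component; this step follows spaCy v2's example training script:

# add the labels from the training data to the NER component
for _, annotations in TRAIN_DATA:
    for ent in annotations.get('entities'):
        ner.add_label(ent[2])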

Train the model using the following code. The other pipeline components are disabled during training so that only the NER weights are updated.

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):  # only train NER
    optimizer = nlp.begin_training()
    for itn in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in tqdm(TRAIN_DATA):
            nlp.update(
                [text],
                [annotations],
                drop=0.5,
                sgd=optimizer,
                losses=losses)
        print(losses)

Output of the training loop. The NER loss generally decreases over the iterations:

100%|██████████| 3/3 [00:00<00:00, 32.00it/s]
{'ner': 10.165919601917267}
100%|██████████| 3/3 [00:00<00:00, 30.38it/s]
{'ner': 8.44960543513298}
100%|██████████| 3/3 [00:00<00:00, 28.11it/s]
{'ner': 7.798196479678154}
100%|██████████| 3/3 [00:00<00:00, 33.42it/s]
{'ner': 6.569828731939197}
100%|██████████| 3/3 [00:00<00:00, 29.20it/s]
{'ner': 6.784278305480257}

Test the trained model on the training examples:

for text, _ in TRAIN_DATA:
    doc = nlp(text)
    print('Entities', [(ent.text, ent.label_) for ent in doc.ents])

Save the model to the path stored in the output_dir variable.

if output_dir is not None:
    output_dir = Path(output_dir)
    if not output_dir.exists():
        output_dir.mkdir()
    nlp.to_disk(output_dir)
    print("Saved model to", output_dir)

Once the trained model is saved, you can load it back and run it using the following code.

print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
for text, _ in TRAIN_DATA:
    doc = nlp2(text)
    print('Entities', [(ent.text, ent.label_) for ent in doc.ents])
    print('Tokens', [(t.text, t.ent_type_, t.ent_iob) for t in doc])
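As a final check (not in the original walkthrough), you can run the reloaded pipeline on a sentence that is not in the training data; with such a tiny dataset the predictions will be rough, but it confirms the saved model loads and runs end to end:

# run the reloaded pipeline on an unseen sentence
doc = nlp2('Who is Akshata G?')
print('Entities', [(ent.text, ent.label_) for ent in doc.ents])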

Conclusion

I hope you now understand how to train your own custom NER model with spaCy. Thanks for reading!
