Top 3 Packages for Named Entity Recognition

Comparing SpaCy, NLTK and Flair — the top 3 NER models

By the time you finish reading this article, the amount of text on the internet will grow by over 2 million tweets, 15 papers in scientific journals, 4 Wikipedia articles, and countless news stories and blog posts.

To argue for the utility of such a seemingly infinite repository of text, the proponents of big data are quick to quip that “data is the new oil”. But even the richest oil reserves are of no use without an array of tools to extract and purify the oil.

Named entity recognition is one such tool in the arsenal of natural language processing that allows us to tame large and unstructured text corpora. In this article, we elaborate on the inner workings of NER and discuss three widely used packages — NLTK, SpaCy, and Flair.

Named Entity Recognition

Named Entity Recognition is a two-step process that helps in extracting useful insights from data. It involves identifying keywords, i.e. the named entities, and then categorising them into broader classes.

For instance, in the sentence “The world’s biggest chocolate cake will be made for this Christmas by the Royal cake house.”, “chocolate cake”, “Christmas”, and “Royal cake house” are the named entities. Classifying them as, say, “Food”, “Occasion”, and “Organisation” is the next step in NER.

The Process

The process involves two stages.

  1. Extraction of entities: In this step, the NER model scans the data and finds the words that can be treated as entities. IOB (Inside, Outside, Beginning) tagging can be used to mark the entities in the data. An NER model finds the named entities in the data based on the named entities known to it.
  2. Classification of entities: In this stage, the entities are categorised into predefined classes. The named entity “chocolate cake”, for instance, would be categorised as food. The effectiveness of the classification depends on the relevance of the training data: the more similar the training data is to the test data, the more effective the NER model will be.
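
To make IOB tagging concrete, here is a minimal pure-Python sketch that turns hand-picked entity spans into one tag per token. The function name `iob_tags`, the sample sentence, and the labels are assumptions made for illustration and do not come from any of the three packages.

```python
def iob_tags(tokens, entities):
    """Return one IOB tag per token.

    `entities` holds (start, end, label) token spans, with `end` exclusive.
    """
    tags = ["O"] * len(tokens)           # O = outside any entity
    for start, end, label in entities:
        tags[start] = f"B-{label}"       # B = beginning of an entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # I = inside the same entity
    return tags


tokens = "The Royal cake house will bake a chocolate cake this Christmas".split()
# Hypothetical annotations: token spans chosen by hand for this example.
entities = [(1, 4, "ORG"), (7, 9, "FOOD"), (10, 11, "OCCASION")]

for token, tag in zip(tokens, iob_tags(tokens, entities)):
    print(f"{token}\t{tag}")
```

The B- prefix marks where an entity starts, so two adjacent entities of the same class can still be told apart.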

Packages for NER

  1. SpaCy

SpaCy is a powerful natural language processing tool used to process large amounts of data. With support for over 64 languages and 63 trained pipelines for 19 languages, it is a handy tool in NLP.

It uses Bloom embeddings and residual CNNs to identify the named entities. Here is an example of NER performed using SpaCy.
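
A minimal sketch of NER with SpaCy, assuming the package and its small English pipeline `en_core_web_sm` are installed. The model name and both function names are choices made for this illustration.

```python
def entities_from_doc(doc):
    """Collect (text, label) pairs from the `ents` of a processed document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def extract_entities(text, model_name="en_core_web_sm"):
    """Run a pre-trained pipeline over `text` and return its named entities."""
    # Imported lazily: requires `pip install spacy` plus
    # `python -m spacy download en_core_web_sm` (assumed model).
    import spacy

    nlp = spacy.load(model_name)
    return entities_from_doc(nlp(text))
```

Calling `extract_entities("Maria flew from London to Chennai on Monday.")` returns a list of pairs such as a place name with its label. The exact entities and labels depend on the model that is loaded.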



2. NLTK

The Natural Language Toolkit (NLTK) is a set of libraries used for NLP. It is widely used in research and for educational purposes. Written in Python, it provides access to more than 50 text corpora across 7 languages.

One of the primary principles of NLTK, besides simplicity, consistency, and extensibility, is modularity: its components can be used independently. This is especially useful for tuning only specific parts of the pipeline, or for using third-party components in conjunction with the toolkit.

NLTK does NER in two steps. The first step is POS (part-of-speech) tagging, also called grammatical tagging, which is followed by chunking to extract the named entities. An example of NER performed in NLTK is given below.
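
A minimal sketch of the two steps, assuming NLTK is installed and the tokenizer, tagger, and chunker data have already been fetched with `nltk.download`. The function names are choices made for this illustration.

```python
def entities_from_tree(tree):
    """Collect (text, label) pairs from the chunk tree returned by ne_chunk."""
    entities = []
    for node in tree:
        # Named entities are subtrees; ordinary tokens are plain (word, tag) tuples.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities


def extract_entities(text):
    """Tokenise, POS-tag, and chunk `text`, then return its named entities."""
    import nltk  # imported lazily; the required NLTK data must be downloaded first

    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)   # step 1: part-of-speech tagging
    tree = nltk.ne_chunk(tagged)    # step 2: chunking into named entities
    return entities_from_tree(tree)
```

Unlike the other two packages, the caller has to walk the resulting tree to pull the entities out, which is what `entities_from_tree` does.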


3. Flair

Flair is a simple NLP framework built on top of PyTorch, a powerful deep learning framework. Claimed to support over 250 languages, it is very useful in training small models.

Its models are pre-trained on extremely large unlabelled text corpora. An example of NER performed in Flair is given below.
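
A minimal sketch of NER with Flair, assuming a recent Flair release in which spans expose `get_label` and in which the pre-trained English tagger is available under the name `"ner"`. The function names are choices made for this illustration.

```python
def entities_from_sentence(sentence, label_type="ner"):
    """Collect (text, label) pairs from a sentence that has been tagged."""
    return [
        (span.text, span.get_label(label_type).value)
        for span in sentence.get_spans(label_type)
    ]


def extract_entities(text, model_name="ner"):
    """Tag `text` with a pre-trained sequence tagger and return its entities."""
    # Imported lazily: requires `pip install flair`; the model is
    # downloaded on first use.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)
    sentence = Sentence(text)
    tagger.predict(sentence)  # annotates the sentence in place
    return entities_from_sentence(sentence)
```

Note that `predict` does not return the entities; it attaches them to the `Sentence` object, from which they are then read.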


Factors of Consideration

The best package for an application can be selected based on several important factors — the availability of pre-trained model tags, the format of training data, ease of use, speed of execution, evaluation metrics, and visualisation capabilities of the package.

Pre-trained model tags

Pre-trained model tags indicate the capability and depth of a package. The more tags a pre-trained model supports, the more insightful the results will be. The available pre-trained NLTK model is limited to 3 main tags, namely PERSON, ORGANISATION, and GPE, while the available SpaCy and Flair models can identify up to 18 tags.

Format of training data

The three packages have different formats in which they accept the training data.


The NLTK model for custom-named entity recognition can be developed with the help of the Stanford NER tagger, written in Java. The training data for NLTK looks like this:

Each document in the training data should be separated by a newline.
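
As an illustration of this layout, the sketch below writes two hand-labelled documents in a tab-separated, one-token-per-line form with a blank line between documents. This is an assumption about the column format the Stanford NER tagger expects; the sample sentences, labels, and the function name `to_tsv` are made up for this example.

```python
def to_tsv(documents):
    """Serialise documents as token<TAB>label lines, one blank line between documents."""
    blocks = [
        "\n".join(f"{token}\t{label}" for token, label in document)
        for document in documents
    ]
    return "\n\n".join(blocks) + "\n"


# Hypothetical training documents; "O" marks tokens outside any entity.
documents = [
    [("Maria", "PERSON"), ("works", "O"), ("at", "O"), ("Acme", "ORGANIZATION"), (".", "O")],
    [("London", "LOCATION"), ("is", "O"), ("rainy", "O"), (".", "O")],
]

print(to_tsv(documents))
```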


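The training data for SpaCy version 2 is a list of tuples. Each tuple contains a text and a dictionary of its entities, where each entity is given by a start index, an end index, and a label. Version 3 uses a different, binary format; refer to the SpaCy documentation for details.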

For example
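
The sketch below builds one training example in the SpaCy version 2 layout, computing the character offsets instead of counting them by hand. The helper name `annotate`, the sentence, and the labels are assumptions made for illustration.

```python
def annotate(text, labelled_phrases):
    """Build one (text, {"entities": [...]}) example from (phrase, label) pairs.

    Only the first occurrence of each phrase is annotated.
    """
    entities = []
    for phrase, label in labelled_phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in text")
        # End offsets are exclusive, as in Python slicing.
        entities.append((start, start + len(phrase), label))
    return (text, {"entities": entities})


TRAIN_DATA = [
    annotate(
        "Maria works at Acme in London.",
        [("Maria", "PERSON"), ("Acme", "ORG"), ("London", "GPE")],
    ),
]

print(TRAIN_DATA)
```

Offsets are character positions, not token positions, so an entity's text can always be recovered as `text[start:end]`.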


To train an NER model using Flair, the training data should be of the following format. Each sentence in the training data should be separated using newlines. Each row has two columns. The first column is the word and the second column is the BIO-annotated NER tag.

For example
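
The two-column layout can be sketched as follows, with one word and its BIO tag per row and a blank line after each sentence. The sentences, tags, and the function name `to_column_format` are assumptions made for illustration.

```python
def to_column_format(sentences):
    """Serialise tagged sentences as 'word tag' rows, with a blank line after each sentence."""
    lines = []
    for sentence in sentences:
        lines.extend(f"{word} {tag}" for word, tag in sentence)
        lines.append("")  # blank line marks the end of the sentence
    return "\n".join(lines)


# Hypothetical BIO-annotated sentences.
sentences = [
    [("Maria", "B-PER"), ("lives", "O"), ("in", "O"), ("New", "B-LOC"), ("York", "I-LOC")],
    [("Acme", "B-ORG"), ("hired", "O"), ("her", "O")],
]

print(to_column_format(sentences))
```

A file in this layout is typically loaded through Flair's `ColumnCorpus` with a column map such as `{0: "text", 1: "ner"}`; the exact arguments depend on the Flair version in use.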

The performance of trained models is better for SpaCy and Flair when compared to that of NLTK.

Ease of use

Some packages perform NER in a single step, while others require several. In NLTK, POS tagging must be done first, followed by chunking, which identifies the entities in the document. In SpaCy, the “ents” property of the doc object holds the named entities, whereas in Flair the predict function is used for the same purpose.


Speed of execution

Speed is very important when you’re dealing with large datasets, but there is a trade-off between speed and accuracy. The speed of a package is measured in WPS, words per second. Of the three packages, NLTK is found to be the slowest, while SpaCy beats Flair. So if the processing involves a very large dataset, SpaCy is the best option.
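
A rough way to measure WPS is to time a tagging function over a list of texts. In the sketch below, `capitalised_words` is a trivial stand-in for a real NER call, whitespace splitting is an assumed definition of a word, and both function names are hypothetical.

```python
import time


def words_per_second(tag_fn, texts):
    """Time `tag_fn` over `texts` and return whitespace-separated words per second."""
    total_words = sum(len(text.split()) for text in texts)
    start = time.perf_counter()
    for text in texts:
        tag_fn(text)
    elapsed = time.perf_counter() - start
    return total_words / elapsed if elapsed > 0 else float("inf")


def capitalised_words(text):
    """Stand-in tagger: treats every capitalised word as an entity."""
    return [word for word in text.split() if word[:1].isupper()]


texts = ["Maria flew from London to Chennai on Monday."] * 1000
print(f"{words_per_second(capitalised_words, texts):,.0f} words per second")
```

To compare packages fairly, load each model before the timer starts, so that only the tagging itself is measured.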


Visualisation

Visualising the entities is an effective way of understanding the data, and each of the three packages has its own method. In NLTK, the chunk tree that holds the entities can be drawn with its draw method. SpaCy has an inbuilt visualiser called Displacy, which can be used in a Jupyter Notebook or in a browser. Flair makes use of the package Visner for visualisation. Of the three, Displacy is the best option.

Evaluation metric

A model can be evaluated using precision, recall and F1 scores. Both NLTK and Flair have built-in functions to evaluate the model whereas SpaCy uses a scorer package for evaluation.
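
To show how the three scores relate, here is a minimal entity-level sketch that does not use any package's built-in scorer. An entity counts as correct only if its start, end, and label all match the gold annotation; the sample spans and the function name are assumptions made for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Compute entity-level precision, recall, and F1 from (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: the model finds all three entities but mislabels one.
gold = [(0, 5, "PERSON"), (15, 19, "ORG"), (23, 29, "GPE")]
predicted = [(0, 5, "PERSON"), (15, 19, "GPE"), (23, 29, "GPE")]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```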

The table below shows the precision, recall, F1 score, and accuracy score using the three packages for a CoNLL 2003 dataset:

Selection of package

Comparing the performance metrics shows that Flair tops the list, while SpaCy beats the other two in terms of speed.

Each package offers unique advantages. While it is clear that Flair and SpaCy perform better than NLTK, the choice between SpaCy and Flair depends on the nature of the dataset. For smaller datasets, Flair is a good option considering its accuracy, and the time taken will be almost the same as that of SpaCy. SpaCy trades some precision for speed.


Limitations

All three packages have limitations, and it is important to know them when selecting the ideal package for an application. NLTK is slower than the other two packages. It doesn’t employ neural networks, and it splits sentences without considering their semantics.

SpaCy is a large package; however, it is not very customisable and its internals are opaque.

Flair is better suited to smaller applications, and its lower speed makes SpaCy a better candidate for large datasets.






This is Quantrium’s official tech blog. A blog on how technology enables us to develop great software applications for our clients.

Maria Philna Aruja
