Training Custom NER Model Using Flair

Akash Chauhan · Published in TheCyPhy · 9 min read · May 3, 2020


If you are here, it is fair to assume that you have heard about Flair already. Flair is a PyTorch-based NLP library that lets you perform a plethora of NLP tasks like POS tagging, named entity recognition, text classification, etc. It achieves state-of-the-art performance, is super simple to use, and includes powerful embeddings like BERT and ELMo.

To start working with Flair, it is important to have PyTorch and Flair installed in your environment. Discussing Flair's other characteristics and all the ways it is amazing is beyond the scope of this blog, so let's get to the point.
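Both come down to a single pip command, since flair pulls in PyTorch as a dependency:

pip install flair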

Before starting this blog, I will assume that the readers are familiar with what named entity recognition is. There are a lot of definitions out there stating it to be a part of information extraction and whatnot. Still, to set the context, let's look through one of the very common examples. For a sentence, or technically speaking, a sequence of text,

"George Washington went to Washington"

we might be interested in some important information that the text contains. That could be “George Washington”, which is the name of a person, and “Washington”, which is a place or a geographical location; we call them entities. Now, to predict such entities, there are a lot of pre-trained models made available by the said NLP libraries. For the sake of this blog, let's look at what Flair has to offer. Flair has made available a good number of models across languages, which can be found here. For the sequence we have above, let's see what the model trained on CoNLL-03 (4-class NER) gives as output.
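A minimal sketch of that, assuming the pre-trained 4-class model published under the name 'ner' in Flair's model zoo:

from flair.data import Sentence
from flair.models import SequenceTagger

# load Flair's pre-trained 4-class NER tagger
tagger = SequenceTagger.load('ner')

# wrap the text in a Sentence object and predict
sentence = Sentence('George Washington went to Washington')
tagger.predict(sentence)
print(sentence.to_tagged_string())

This tags “George Washington” as a person and the second “Washington” as a location.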

Additionally, we can get a confidence score of each of the predicted entities:
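A sketch of how to pull those out, using the Span objects Flair attaches to the sentence:

# each predicted entity is a Span carrying the entity text, its tag and a confidence score
for entity in sentence.get_spans('ner'):
    print(entity.text, entity.tag, entity.score)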

All other functionality provided by Flair can be checked out in its wonderfully written documentation.

Coming to the main objective of this blog: training your own model using Flair to predict custom entities from text. Flair's documentation covers that in a brilliant way, but unlike spaCy, preparing the data that is to be fed into the model for training is not made very intuitive. Though you get access to a lot of datasets by creating a corpus object for the datasets available in flair.datasets, you can still load other sequence labeling datasets as mentioned here.

But why train your own sequence taggers? In my fairly young professional career, I came across problems where some sentences or textual sequences had to be extracted from an entire document. For instance, consider a contract, and not just any kind of contract, rather a lease contract. In such contracts, the “entities” like the names of the parties involved, the date of termination, and so on and so forth make up the information of interest. Now, imagine the scenario where these contracts are unstructured. If they were structured, we could just extract the said entities by writing some basic rules.

Now, to create a model that can find these entities in a given piece of text, we first need to create a training corpus. Flair has a particular structure in which it expects the corpus to be. As per its documentation, it looks like the following:
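It is a CoNLL-style, whitespace-separated layout along these lines (patterned on the example in Flair's documentation; the POS tags in the middle column are illustrative):

George N B-PER
Washington N I-PER
went V O
to P O
Washington N B-LOC

Sam N B-PER
Houston N I-PER
stayed V O
home N O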

Explanation of the above format:

  1. The sentences in the corpus are separated by an empty line.
  2. Each row (line) has three columns. The first column is the word, the second column is the corresponding POS tag, and the final column denotes the BIO-annotated NER tag.
  3. We need not have all three columns; say we need to train the model to just predict the NER tags, then we can omit the second column.

The important thing to note here is that the datasets available within flair.datasets, as well as the other sequence labeling datasets, come in the given format. Hence for our case, where we are training for custom entities, we need to prepare that dataset on our own. We will come back to it in a while, but first, let's have a quick look at how this corpus is read.

Data Creation:

flair.datasets has a class called ColumnCorpus with which we can create our corpus object. As you can see from the arguments of its __init__ method, the three .txt files, which correspond to the train, test and validation corpora, hold the data in the format we discussed above. columns is a dictionary where we define the columns in the text files.

Now that we know how to read the dataset, let's go back to creating it. Given that annotations are available in a certain format, the following piece of code can be modified as per the requirement. The code is written assuming the data is in the form of a pandas dataframe with two major columns: the first is the actual text, and the second is the annotation, a list of tuples where each tuple has two elements, the first being the annotated text and the second the corresponding label.

Ideally, the text should just be a sentence; if not (i.e. it is a paragraph), any sentence tokenizer could be used, for example spaCy's sentence tokenizer, to bring the text down to sentence level. Anyway, coming back to the code: the following will create a .txt file for the given data, and hence can be called thrice to create the train, test and validation data in the required input format.

create_data.py
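The original gist is not reproduced here, so below is a minimal sketch of what create_data.py could look like under the assumptions above (whitespace tokenization and the column names text and annotation are assumptions, not requirements):

import pandas as pd

def create_data(df: pd.DataFrame, filepath: str):
    # writes a dataframe with 'text' and 'annotation' columns
    # into a BIO-tagged, Flair-readable .txt file
    with open(filepath, 'w') as f:
        for _, row in df.iterrows():
            tokens = row['text'].split()   # naive whitespace tokenization
            tags = ['O'] * len(tokens)     # default: token is outside any entity
            for span_text, label in row['annotation']:
                span = span_text.split()
                # slide over the tokens to locate the annotated span
                for i in range(len(tokens) - len(span) + 1):
                    if tokens[i:i + len(span)] == span:
                        tags[i] = 'B-' + label
                        for j in range(1, len(span)):
                            tags[i + j] = 'I-' + label
                        break
            for token, tag in zip(tokens, tags):
                f.write(f'{token} {tag}\n')
            f.write('\n')                  # empty line separates sentences

Calling it thrice, e.g. create_data(train_df, 'path/to/data/train.txt'), produces the three files the corpus loader below expects.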

This will save the .txt file at the given path, and it would look like the following.
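For instance, for a hypothetical lease-contract sentence with a made-up TERMINATION_DATE label, the file contents could be:

The O
lease O
terminates O
on O
31 B-TERMINATION_DATE
December I-TERMINATION_DATE
2020 I-TERMINATION_DATE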

Reading the Corpus:

Finally, we are ready to load the corpus we have created and begin with the training. Let’s start with loading our corpus.

from flair.data import Corpus
from flair.datasets import ColumnCorpus

# define columns
columns = {0: 'text', 1: 'ner'}
# directory where the data resides
data_folder = 'path/to/data/'
# initializing the corpus
corpus: Corpus = ColumnCorpus(data_folder, columns,
                              train_file='train.txt',
                              test_file='test.txt',
                              dev_file='dev.txt')

Now that we have loaded our corpus, we can use the corpus object to get information about it, like:
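For example (both accessors are part of Flair's Corpus API):

# number of sentences in the training split
print(len(corpus.train))
# the first training sentence, rendered with its NER tags
print(corpus.train[0].to_tagged_string('ner'))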

Training:

Continuing further, the next thing is to define the tag type we want our model to be able to predict and to create the tag dictionary, which is just the set of all labels available in the corpus.

# tag to predict
tag_type = 'ner'
# make tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
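To sanity-check what went into it, the dictionary can be printed (idx2item is an attribute of Flair's Dictionary class):

# the label inventory the tagger will learn, e.g. O plus the B-/I- tags
print(tag_dictionary.idx2item)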

The next thing is to take care of the embeddings. The beauty of Flair lies in what all it lets you do with embeddings. You can choose from a bunch of pre-trained models to create embeddings, and even stack the said Flair embeddings with powerful BERT, ELMo, and whatnot using the StackedEmbeddings class. And obviously, you can train your own embeddings. Staying within the scope of this blog, let's move forward with the training. The details of embeddings are wonderfully documented here.

from flair.embeddings import TokenEmbeddings, WordEmbeddings, StackedEmbeddings
from typing import List

embedding_types: List[TokenEmbeddings] = [
    WordEmbeddings('glove'),
    # other embeddings can be stacked here
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

The next step is to initialize the SequenceTagger. Conceptually speaking, what is trained under the hood is a bi-directional LSTM. Flair also lets you pass a flag to use conditional random fields. Let's define the said tagger and look at the architecture too.

from flair.models import SequenceTagger

tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type=tag_type,
                                        use_crf=True)
print(tagger)

The print statement outputs the model architecture: the stacked word embeddings, the bi-directional LSTM, and the final linear layer.

Flair hasn't stopped being amazing just yet; all that's left to do now is to write exactly three more lines of code to create magic.

from flair.trainers import ModelTrainer

trainer: ModelTrainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/example-ner',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=150)

This sets our model up for training. Now there are a couple of things to take note of. Remember that while creating the corpus object, we passed the validation as well as the test data then and there? That's because Flair internally does a lot of things for you, while training and even post-training. It creates a new directory called resources in your current working directory, where you will find everything from the training logs and loss information to the predictions on the test set with confidence scores. Under the same directory, at 'resources/taggers/example-ner', our model will be saved.

Several files will be in the said directory after the completion of the training, including the saved model checkpoints and the training log. If one intends to, they can use these weights for visualization as per the documentation. The results (predictions) on the test set are available in a tab-separated format, which can be used to evaluate the model. Anyway, the performance metrics for each entity label are present at the end of training.log.

Lastly, we have a trained model, and we can now use it to predict the tags for a new sequence of text. That again can be done in a couple of lines, shown in the following snippet.

from flair.data import Sentence
from flair.models import SequenceTagger

# load the trained model
model = SequenceTagger.load('resources/taggers/example-ner/final-model.pt')

# create an example sentence
sentence = Sentence('I love Berlin')

# predict the tags
model.predict(sentence)
print(sentence.to_tagged_string())

Where we will get the output as:

I love Berlin <B-LOCATION>

Conclusion:

And that's how you train an NER model for custom entities using Flair. It is easy to use; just the data preparation was a bit tedious, unlike spaCy, where we have tools for that as well (PhraseMatcher, etc.). Flair is said to achieve better performance than spaCy for sequence tagging. It is good practice to first learn to train NER for custom entities with spaCy and then move to Flair. There are a lot of blogs out there for spaCy, and even the documentation is pretty rich. This blog captures the difference between spaCy and Flair NER very nicely.

While going through all this, I was in doubt at one place: where we pass the columns to the ColumnCorpus. Since the datasets already available in this format had POS tags, I was not sure whether the POS tag information plays any role in training, because if it did, the data preparation for our custom cases would become more complicated. But while looking this up, I got to a GitHub issue that clarified that no column other than the text is involved in the modeling process. Hence, we are good to go.

Endnote:

The motivation to write this blog came from the fact that I did not find much help for Flair out there and had to figure out everything from the documentation; hence it took more time to get a grip on than spaCy did.

Thank you for taking the time and reading this. If we haven’t met yet, find me at chauhanakash23.github.io
