A Practical Approach to Named Entity Recognition with State-of-the-Art Flair

punna111
Published in Analytics Vidhya
5 min read · Dec 12, 2019
NER in NLP

Today, let’s discuss one of the most popular use cases in NLP: NER (Named Entity Recognition). In this post we will walk through the practical usage of Flair, one of the state-of-the-art libraries for this task.

What is NER?

NER identifies entities such as organizations, locations, persons, and other named entities in a given text.

What are the use cases of NER?

Many business and real-world problems can be solved with NER:

  1. Classification becomes easier: we can quickly tell whether a document or text relates to a particular company, location, person, and so on.
  2. Hidden insights can be identified quickly from the entities present in large amounts of textual data.
  3. Entity identification is very important for information retrieval from text. With NER, the right information can be extracted for a given search query.
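
To make the retrieval use case concrete, here is a minimal sketch of an entity index that maps each entity to the documents mentioning it. The document ids and entity lists are hypothetical placeholders, not output from a real model.

```python
from collections import defaultdict

# Hypothetical NER output: document id -> list of (entity text, label) pairs.
# These values are illustrative only; in practice they would come from an NER model.
doc_entities = {
    "doc1": [("Microsoft", "ORG"), ("Redmond", "LOC")],
    "doc2": [("Elon Musk", "PER"), ("Tesla", "ORG")],
    "doc3": [("Tesla", "ORG"), ("New York", "LOC")],
}


def build_entity_index(doc_entities):
    """Map each (entity text, label) pair to the sorted list of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for text, label in entities:
            index[(text, label)].add(doc_id)
    return {key: sorted(ids) for key, ids in index.items()}


entity_index = build_entity_index(doc_entities)
print(entity_index[("Tesla", "ORG")])  # ['doc2', 'doc3']
```

A search for the organization "Tesla" can then be answered by a single dictionary lookup instead of scanning every document.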

Our Algorithm for today:

Today I am proposing the Flair framework for our NER task. It is one of the state-of-the-art libraries for NLP tasks and is built on PyTorch. The good thing about Flair NER is that it takes context into account. It was developed by the Zalando Research team. Let’s get started and see the results.

Installation of Flair

First we need to install the PyTorch framework. On the pytorch.org website, select the options that match your system and run the install command it generates.

I ran the command from the Anaconda command prompt in Administrator mode.

Once that installation is done, install Flair with the command below:

pip install flair
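
As a quick sanity check after installation, the sketch below reports whether the two packages can be found in the current environment. The helper name `installed` is my own; it uses only the Python standard library and does not load the packages themselves.

```python
import importlib.util


def installed(packages):
    """Return {package name: True/False} without importing the packages."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


status = installed(["torch", "flair"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If either package is reported as missing, rerun the corresponding install command in the same environment.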

Practical Usage of Flair NER

Let’s write a basic script to see the Flair NER output. The first two lines of code import the Flair classes, as shown below. The NER model itself is downloaded the first time it is loaded.

#import commands for flair NER
from flair.data import Sentence
from flair.models import SequenceTagger

Next, load the NER model as follows:

#Load NER Model
tagger = SequenceTagger.load('ner')

After that, let’s provide some sample text and pass it to a Sentence object, which splits the text into tokens.

#Sample text to run NER
text = 'Jackson is placed in Microsoft located in Redmond'
#passing text to sentence
sentence = Sentence(text)

Next comes the most important line, which identifies the entities in the sentence. The final two lines of code print the entities.

# Run NER on sentence to identify Entities
tagger.predict(sentence)
# print the entities with below command
for entity in sentence.get_spans('ner'):
    print(entity)

The print command gives the output below. As we can see, Jackson is tagged as a person (PER), Microsoft as an organization (ORG), and Redmond as a location (LOC).

PER-span [1]: "Jackson"
ORG-span [5]: "Microsoft"
LOC-span [8]: "Redmond"

With just eight lines of simple code, we are able to get the entities. Now let’s print the same sentence with its identified tags inline.

print(sentence.to_tagged_string())

Below is the output of the statement above.

Jackson <S-PER> is placed in Microsoft <S-ORG> located in Redmond <S-LOC>
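
The S- prefix in this output appears to follow the BIOES tagging scheme, where S marks a single-token entity and B, I, and E mark the beginning, inside, and end of a multi-token entity. Under that assumption, here is a minimal sketch that turns such a tagged string back into (text, label) pairs. The function name and the second input string are my own illustrations, not Flair output.

```python
def decode_tagged_string(tagged):
    """Extract (entity text, label) pairs from a string in which each tagged
    token is followed by a marker such as <S-PER> or <B-LOC>."""
    entities = []
    current_tokens, current_label = [], None
    previous_token = None
    for piece in tagged.split():
        is_tag = piece.startswith("<") and piece.endswith(">") and "-" in piece
        if not is_tag:
            previous_token = piece  # remember the token; its tag may follow
            continue
        if previous_token is None:
            continue  # a tag with no token before it; nothing to attach it to
        prefix, label = piece[1:-1].split("-", 1)
        if prefix == "S":  # single-token entity
            entities.append((previous_token, label))
            current_tokens, current_label = [], None
        elif prefix == "B":  # first token of a multi-token entity
            current_tokens, current_label = [previous_token], label
        elif prefix in ("I", "E") and current_label == label:
            current_tokens.append(previous_token)
            if prefix == "E":  # last token: emit the whole entity
                entities.append((" ".join(current_tokens), label))
                current_tokens, current_label = [], None
    return entities


print(decode_tagged_string(
    "Jackson <S-PER> is placed in Microsoft <S-ORG> located in Redmond <S-LOC>"
))
# Hypothetical input showing how a two-token entity would be marked:
print(decode_tagged_string("She flew to New <B-LOC> York <E-LOC> yesterday"))
```

In practice, sentence.get_spans('ner') already returns the grouped entities, so a decoder like this is mainly useful for understanding the tag format.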

Now let’s try one more sentence and see the output.

#Sample text
text1 = 'Redmond is coming to New York city'
#passing text to sentence
sentence = Sentence(text1)
# Run NER on sentence to identify Entities
tagger.predict(sentence)
# print the entities with below command
for entity in sentence.get_spans('ner'):
    print(entity)

The output is as follows:

PER-span [1]: "Redmond"
LOC-span [5,6]: "New York"

A bit tricky, right? Now Flair says Redmond is a person, whereas in the first example Redmond was a location. Flair NER is correct in both cases, because context plays the key role here. Flair NER bases its output on context, which is very important in data-driven projects. This also resolves word ambiguity to some extent, for example whether a word refers to a person or a location, or to a company or a person.
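
To spot such context-dependent words across many sentences, one option is to collect every label each surface form receives. The sketch below does this with plain Python, using the two results reported above as input; the function name is my own.

```python
from collections import defaultdict

# Entities reported for the two example sentences, as (text, label) pairs.
predictions = {
    "Jackson is placed in Microsoft located in Redmond": [
        ("Jackson", "PER"), ("Microsoft", "ORG"), ("Redmond", "LOC"),
    ],
    "Redmond is coming to New York city": [
        ("Redmond", "PER"), ("New York", "LOC"),
    ],
}


def context_dependent(predictions):
    """Return the surface forms that received more than one label."""
    labels_by_text = defaultdict(set)
    for entities in predictions.values():
        for text, label in entities:
            labels_by_text[text].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


print(context_dependent(predictions))  # {'Redmond': ['LOC', 'PER']}
```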

Flair Models

Flair provides various pretrained models; in this post we discuss only the NER models. The ner model is best run on a GPU, and a lighter ner-fast model is available for CPU. Flair also has multilingual models. The full list is in the Flair tagging tutorial linked in the references.
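
One way to choose between the two model names at run time is to check whether a GPU is visible. This is a minimal sketch: the helper names are my own, and the only library call it relies on is PyTorch's torch.cuda.is_available().

```python
def gpu_available():
    """Return True if PyTorch is installed and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_ner_model(has_gpu):
    """Choose between the two model names mentioned in this post."""
    return "ner" if has_gpu else "ner-fast"


model_name = pick_ner_model(gpu_available())
print(model_name)
# The chosen name would then be passed to SequenceTagger.load(model_name).
```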

Using Flair on Article Text

So far we have passed a single sentence and predicted its entities. Now let’s pass a whole paragraph and see the results.

I took the paragraph below from a Bloomberg article:

text2 = "During a heated deposition this past June, Elon Musk finally seemed to admit that his harshest critics were right. Since forcing through the controversial 2016 purchase of SolarCity Corp., the struggling solar sales-and-installation business he co-founded with his cousins, Tesla Inc.’s chief executive officer has faced almost-constant criticism: The move was called a catastrophe for Tesla, a $2 billion-plus bailout of a debt-saddled company of which Musk himself was chairman and the largest shareholder. Despite plummeting sales and substantial layoffs in the solar division under Tesla after the merger, Musk has fervently defended the SolarCity acquisition, once calling it “blindingly obvious” and a “no-brainer.”"

There is one additional step before passing the text above to Flair NER: splitting the paragraph into sentences. For this, Flair relies on a library called segtok, which is installed along with Flair. Now let’s write the code to split the paragraph.

#Import segtok library to split the paragraph into sentences
from segtok.segmenter import split_single
sentences = [Sentence(sent, use_tokenizer=True) for sent in split_single(text2)]

The line above passes the paragraph to the splitter and wraps each resulting sentence in a Sentence object, ready for entity prediction. Let’s predict and print the output with the code below.

#predicting entities
tagger.predict(sentences)
# print the entities with below command
for sent in sentences:
    for entity in sent.get_spans('ner'):
        print(entity)

Below is the output from Flair NER for the paragraph above, and it looks quite accurate. The token positions in brackets restart with each sentence, which is why some of them repeat.

PER-span [9,10]: "Elon Musk"
ORG-span [9,10]: "SolarCity Corp."
ORG-span [23,24]: "Tesla Inc."
ORG-span [42]: "Tesla"
PER-span [55]: "Musk"
ORG-span [12]: "Tesla"
PER-span [17]: "Musk"
ORG-span [22]: "SolarCity"
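
With several sentences, it is often useful to summarize the entities rather than read them one by one. The sketch below counts the mentions listed above using the standard library; the (label, text) pairs are copied by hand from that output rather than read from Flair objects.

```python
from collections import Counter

# (label, text) pairs copied from the output above.
spans = [
    ("PER", "Elon Musk"), ("ORG", "SolarCity Corp."), ("ORG", "Tesla Inc."),
    ("ORG", "Tesla"), ("PER", "Musk"), ("ORG", "Tesla"), ("PER", "Musk"),
    ("ORG", "SolarCity"),
]

# How often each exact mention occurs, and how many mentions each label has.
mention_counts = Counter(spans)
label_counts = Counter(label for label, _ in spans)

for (label, text), count in mention_counts.most_common(3):
    print(label, text, count)
print(dict(label_counts))
```

Note that "Tesla" and "Tesla Inc." are counted separately here, because the sketch compares exact strings and does not try to merge different names for the same entity.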

I have placed the sample code file on GitHub.

This is my first post on this blog. Please let me know if you liked it and share your thoughts; comments and feedback are welcome.

References:

PyTorch: https://pytorch.org/

GitHub link for Flair: https://github.com/zalandoresearch/flair

Flair NER Models: https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_2_TAGGING.md
