How to Use AllenNLP’s Pretrained NER Model in 2021

Wishing for better and more complete documentation!

Abhishek Verma
Apr 1 · 6 min read

One would hope that NLP libraries, at least, would keep their usage instructions up to date. Work is regularly going on in AllenNLP (its progenitor, the Allen Institute, is of Longformer fame), so it would do them good to keep the documentation updated as well.

Source: https://github.com/allenai/allennlp

Recently, I wanted to try out AllenNLP's pretrained NER model. But, lo and behold, it didn't work that easily, and after 4 LONG hours of running helter-skelter between GitHub and StackOverflow, I somehow got it to work.

This is the link to the documentation with code that won't work: https://docs.allennlp.org/v1.0.0rc3/tutorials/getting_started/using_pretrained_models/

Here is the code from the documentation you would want to run against your data:

from allennlp.predictors import Predictor

predictor = Predictor.from_path("https://allennlp.s3.amazonaws.com/models/ner-model-2018.04.26.tar.gz")
results = predictor.predict(sentence="Did Uriah honestly think he could beat The Legend of Zelda in under three hours?")
for word, tag in zip(results["words"], results["tags"]):
    print(f"{word}\t{tag}")

One would assume that simply installing allennlp would get this code running (using common sense, since the install step isn't stated in the documentation).

But instead, one is greeted by a cryptic error.

The solution I found for this was to install the allennlp-models package as well.

Apparently, they have separated the code from the models. A good application of separation of concerns, but it wasted a lot of time here: in other libraries, models load as and when needed, so you don't have to install a separate package, or at least the extra install step is stated in the documentation. But NO.
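Because of this split, both packages need to be importable before Predictor.from_path will work. A minimal sketch of checking that in the current environment (has_package is a helper name of mine, not part of AllenNLP; note that the pip package allennlp-models is imported as allennlp_models, with an underscore):

```python
import importlib.util

def has_package(name):
    """Return True if a package with this import name is importable."""
    return importlib.util.find_spec(name) is not None

# The pip package "allennlp-models" has the import name "allennlp_models".
for pkg in ("allennlp", "allennlp_models"):
    print(pkg, "installed" if has_package(pkg) else "missing: pip install " + pkg.replace("_", "-"))
```

If either line prints "missing", the cryptic error above is the likely result.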

And I wish the pain ended there, but no: right after installing the above library, you are greeted by another error on this journey.

Someone really loves their ConfigurationErrors, but after some time (a long time), one realizes this is not the root cause. Your attention should have been elsewhere. But, hey, one should always look at the error, RIGHT? Not here, sorry.

/usr/local/lib/python3.7/dist-packages/allennlp/data/token_indexers/token_characters_indexer.py:60: UserWarning: You are using the default value (0) of `min_padding_length`, which can cause some subtle bugs (more info see https://github.com/allenai/allennlp/issues/1954). Strongly recommend to set a value, usually the maximum size of the convolutional layer size when using CnnEncoder.
  UserWarning,
---------------------------------------------------------------------------
ConfigurationError                        Traceback (most recent call last)
<ipython-input-6-0cf51f5821d7> in <module>()
      1 from allennlp.predictors import Predictor
----> 2 predictor = Predictor.from_path("https://allennlp.s3.amazonaws.com/models/ner-model-2018.04.26.tar.gz")
      3 results = predictor.predict(sentence="Did Uriah honestly think he could beat The Legend of Zelda in under three hours?")
      4 for word, tag in zip(results["words"], results["tags"]):
      5     print(f"{word}\t{tag}")

10 frames
/usr/local/lib/python3.7/dist-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
    555     if not isinstance(params, Params):
    556         raise ConfigurationError(
--> 557         "from_params was passed a `params` object that was not a `Params`. This probably "
    558         "indicates malformed parameters in a configuration file, where something that "
    559         "should have been a dictionary was actually a list, or something else. "

ConfigurationError: from_params was passed a `params` object that was not a `Params`. This probably indicates malformed parameters in a configuration file, where something that should have been a dictionary was actually a list, or something else. This happened when constructing an object of type <class 'allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator'>.

This time, you had to look above the traceback and the error message itself.

There you would find an innocent-looking piece of information (in no way would one expect it to be the cause) which is the main culprit. And to this, I thought a simple pip install would solve it.

But no. Boy, was I wrong. Yes, terribly; I was terribly wrong, because I got the same error again. "Why?" I asked myself dejectedly.

I read the error again (takes a long breath) and found that meteor_score is not available in the version of nltk that got installed.

So, I finally ran pip install -U nltk.

This upgraded nltk to version 3.5, in which meteor_score is present.
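As a sanity check, the version requirement can be made concrete with a small comparison helper (needs_nltk_upgrade is a hypothetical function of mine, not part of nltk):

```python
def needs_nltk_upgrade(installed, required="3.5"):
    """True if the installed nltk version string is older than required.

    meteor_score was added in nltk 3.5, so e.g. the 3.2.5 that
    Colab ships by default is too old.
    """
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) < as_tuple(required)

print(needs_nltk_upgrade("3.2.5"))  # True: too old, lacks meteor_score
print(needs_nltk_upgrade("3.5"))    # False: new enough
```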

I ran the code again, but to my dismay, yet AGAIN, another error appeared.

One may think everything should be resolved by now. But NO. Why should things be so simple? After another 30 minutes, I found out the documentation was last updated 2 years ago, and I needed to replace the old model path with the latest one.

From this:

https://allennlp.s3.amazonaws.com/models/ner-model-2018.04.26.tar.gz

To this:

https://storage.googleapis.com/allennlp-public-models/ner-model-2020.02.10.tar.gz

The code finally ran completely. Phew!!!

So, this is the complete guide to running AllenNLP's NER model, for any other lamb like me headed to the slaughter. It is written in the context of Google Colab, since that is the best place to try these things out. You may have to resolve a few more dependencies if you are running this locally on your system, but hopefully a lot has already been solved for you.

The Documentation Which Should Have Been

First, install allennlp, allennlp-models and a newer version of nltk, since simply installing nltk will still give an error (it installs version 3.2.5, which doesn't have meteor_score).

! pip install allennlp-models
! pip install -U nltk

or

! pip install allennlp-models
! pip install nltk==3.5

After this, you need to replace the old model path with the new one. Here's the full example, with the path already changed, for you to copy directly.

from allennlp.predictors import Predictor

predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/ner-model-2020.02.10.tar.gz")
results = predictor.predict(sentence="Did Uriah honestly think he could beat The Legend of Zelda in under three hours?")
for word, tag in zip(results["words"], results["tags"]):
    print(f"{word}\t{tag}")

This will give you this SWEET output:

I could not interpret this output at first. Silly me, haha. But there was nothing about what it means in the documentation either, and I was a novice with NER outputs. Some minutes later, I found this.

For your ease, here’s what you need to know:

The human annotators used 9 classes (B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG, B-MISC, I-MISC and O) to indicate the Beginning of a named entity, the Inside of a named entity and the Outside of any named entity. Each B or I tag is followed by the corresponding named-entity category: for instance, B-PER indicates the beginning of a person name, I-PER indicates inside a person name, and so forth. LOC stands for Location, ORG for Organization and MISC for Miscellaneous.
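To turn those per-word tags into something usable, consecutive B-/I- tags can be collapsed into entity spans. A minimal sketch assuming the plain BIO tags described above (the example words and tags below are illustrative, not actual model output; the AllenNLP model may also emit U- and L- tags from the richer BILOU scheme, which this sketch does not handle):

```python
def extract_entities(words, tags):
    """Group BIO-tagged tokens into (entity_text, category) spans."""
    entities, current, category = [], [], None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):                 # beginning of a new entity
            if current:
                entities.append((" ".join(current), category))
            current, category = [word], tag[2:]
        elif tag.startswith("I-") and current:   # continuation of the entity
            current.append(word)
        else:                                    # "O" (outside any entity)
            if current:
                entities.append((" ".join(current), category))
            current, category = [], None
    if current:                                  # flush a trailing entity
        entities.append((" ".join(current), category))
    return entities

# Illustrative tags, not real model output:
words = ["Did", "Uriah", "think", "Zelda", "is", "in", "Japan", "?"]
tags  = ["O", "B-PER", "O", "B-MISC", "O", "O", "B-LOC", "O"]
print(extract_entities(words, tags))
# [('Uriah', 'PER'), ('Zelda', 'MISC'), ('Japan', 'LOC')]
```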

Conclusion

I once had a similar tryst with spaCy, but it was not this arduous. Still, this one was well worth writing up to save the time of my fellow data science practitioners.

I hope AllenNLP updates its documentation, since it would be a waste not to use such a great NLP library. Or that Google stops surfacing the outdated link; even if you search for something like "AllenNLP v1.5 NER model", you won't find a good link to follow. So here's to hoping nobody else's time gets wasted like mine.

Geek Culture
