Named Entity Recognition at your fingertips: A comparative study of various AutoNLP engines (Part 1)

Felix Laumann
Published in NeuralSpace
10 min read · Mar 30, 2022

Named Entity Recognition (NER) is a core component in many NLP and Information Retrieval (IR) applications including but not limited to question answering, summarization and machine translation. Overall, it plays an essential role in language understanding. To perform an action on a certain user query you not only need to understand the intent behind it but also need to extract and classify certain occurrences in a piece of text into pre-defined categories.

What are these categories? Happy you asked.

The categories can be thought of as the type of entities a NER model can extract. For example, it can be a name (of an organization, a person, a place, etc.), an address, account numbers, measurement parameters, percentages, and even domain-specific terms like names of chemicals, medicines, etc. Through this method, essentially any valuable information can be extracted from text.

Let us take an example:

If someone says “flights from Berlin to London”, the intent here is flight-search and entities are Berlin and London, which are of type city.

These entities can also be viewed at a more granular level: Berlin can be from-city and London can be to-city.

A domain-specific example could be:

“I need 8 paracetamol tablets”, where 8 is a number, paracetamol is medicine-component, and tablets is medicine-form.
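As a rough illustration, the result of parsing the flight query above could be represented as an intent plus a list of typed entities with character offsets. This is only a sketch: the class names `Entity` and `ParsedQuery` and the helper `make_entity` are hypothetical and do not correspond to the output format of any of the platforms discussed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:
    text: str    # surface string as it appears in the query
    label: str   # entity type, e.g. "from-city"
    start: int   # character offset where the entity begins (inclusive)
    end: int     # character offset where the entity ends (exclusive)


@dataclass(frozen=True)
class ParsedQuery:
    text: str
    intent: str
    entities: List[Entity]


def make_entity(text: str, surface: str, label: str) -> Entity:
    """Hypothetical helper: locate the first occurrence of `surface` in `text`."""
    start = text.index(surface)
    return Entity(surface, label, start, start + len(surface))


query = "flights from Berlin to London"
parsed = ParsedQuery(
    text=query,
    intent="flight-search",
    entities=[
        make_entity(query, "Berlin", "from-city"),
        make_entity(query, "London", "to-city"),
    ],
)

for entity in parsed.entities:
    print(entity.label, entity.text, entity.start, entity.end)
```

Keeping the character offsets alongside the label makes it possible to map each entity back to the exact position in the user's message.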

Why should I use a multilingual NER for my application? What are a few common use-cases?

Multilingual NER APIs can be industry-agnostic with a wide range of applications. Let us have a look at some of them.

#1 Powering Content Recommendations & Efficient Search Algorithms

Recommendation systems dominate how we discover new content and ideas in today’s world. News publishers, for example, use NER to extract entities from a particular article and recommend other articles that mention the most similar entities. Overall, this approach is used effectively to develop content recommendations for various media industry clients.
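The idea of recommending articles by shared entities can be sketched in a few lines. The example below assumes that each article has already been reduced to a set of extracted entity strings and uses Jaccard similarity to rank candidates. Both the similarity measure and the article identifiers are assumptions made for illustration; how publishers actually score similarity is not specified here.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of entities two articles have in common (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target: Set[str], candidates: Dict[str, Set[str]], top_k: int = 2) -> List[str]:
    """Return the ids of the top_k candidates with the most similar entity sets."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: jaccard(target, item[1]),
        reverse=True,
    )
    return [article_id for article_id, _ in ranked[:top_k]]


# Hypothetical entity sets, as an NER model might extract them.
current_article = {"Berlin", "Lufthansa", "Angela Merkel"}
other_articles = {
    "article-a": {"Berlin", "Lufthansa"},
    "article-b": {"London", "Heathrow"},
    "article-c": {"Berlin", "Angela Merkel", "Bundestag"},
}

print(recommend(current_article, other_articles))
```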

#2 Customer Support

There are a number of ways to make the handling of customer feedback smoother, and NER is one of them. One use case is using the extracted entities to categorize an enquiry and assign it to the relevant department within the organization.

#3 Machine Translation Systems

When it comes to machine translation (especially for lower-resource languages), named entities prove especially tricky because their translation is based on language-specific rules. If the named entities are extracted before the actual translation, the entire process becomes much more accurate.

#4 Efficient Semantic Annotation

Semantic annotation is the process of adding information to a document. This added information generally consists of named entities that can help machines understand the nuances of a textual document. NER systems can extract such annotations and relations, increasing the efficiency of machine-powered analysis. Automation of this kind is used to identify concepts and relations that are worth annotating.

How can AutoNLP Entity Extraction Engines help your application?

The need for time-consuming and expensive expert annotation hinders the creation of high-performance systems for most languages and domains. Also, given the lack of publicly available multilingual datasets for most low-resource languages, there is an emerging focus on tools for developing language-agnostic NER systems.

Are you a developer? Are you a researcher? Are you a project manager? Are you a student? Also, fret not if you don’t have any previous Machine Learning and Data Science knowledge.

What if you could train, evaluate, and deploy state-of-the-art NER models and easily integrate them with your application with just a few clicks, in any language that you like or wish to support?

Now that is why you need AutoNLP. Yes, it is that simple.

Who provides AutoNLP Entity Extraction Services?

We evaluate three AutoNLP Entity Extraction Service providers: NeuralSpace AutoNLP, Hugging Face AutoNLP and AWS Comprehend. Since our goal is to assess these platforms comprehensively in a multilingual setting, let us compare them in terms of supported languages.

NeuralSpace

Number of supported languages: 87 languages

  • Includes low-resource languages from the Indian subcontinent, Middle East Asia and Africa.

List of supported languages

Hugging Face

Number of supported languages: 16 languages

  • All high-resource languages

List of supported languages

AWS Comprehend

Number of supported languages: 6 languages

  • All high-resource languages

List of supported languages

Let the comparison battle begin!

In this comparison, we use 20 publicly available datasets and test them on all three AutoNLP Entity Extraction platforms, provided that the platform supports the language of the dataset. To maintain a fair comparison, we present two result tables. The first summarises the results for high-resource languages that are supported by all three platforms, while the second shows results for low-resource language datasets.

Here are the datasets that we have used for this comparison. All of them are available on Hugging Face.

Swedish

Finnish

Romanian

Chinese

Spanish

Dutch

Italian

Bengali

Swedish

Finnish

Hindi

Dutch

Arabic

German

French

Portuguese

Persian

Yoruba

Korean

Afrikaans

Polish

Turkish

Which metric is used to evaluate the NER platforms?

NER models are traditionally evaluated with classification metrics like precision, recall and F1 score. Developers and researchers choose between micro- and macro-averaging depending on whether every prediction should carry equal weight (micro, which lets the most frequent class labels dominate) or every class should carry equal weight (macro). These metrics are indeed useful to tune a NER system.
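To make the difference between the two averaging methods concrete, here is a small sketch that computes both from per-class counts of true positives, false positives and false negatives. The counts are made up for illustration and are not taken from our benchmark.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 written directly in terms of counts: 2*TP / (2*TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Pool the counts of all classes first, so frequent classes dominate."""
    tp = sum(counts[0] for counts in per_class.values())
    fp = sum(counts[1] for counts in per_class.values())
    fn = sum(counts[2] for counts in per_class.values())
    return f1_from_counts(tp, fp, fn)


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Average the per-class F1 scores, so every class counts equally."""
    scores = [f1_from_counts(*counts) for counts in per_class.values()]
    return sum(scores) / len(scores)


# Hypothetical counts: one frequent, well-predicted class and one rare, weak class.
example_counts = {
    "city": (90, 10, 10),
    "medicine-component": (1, 1, 3),
}

print(f"micro F1: {micro_f1(example_counts):.3f}")
print(f"macro F1: {macro_f1(example_counts):.3f}")
```

With these counts the micro-averaged score stays close to the score of the frequent class, while the macro-averaged score is pulled down noticeably by the rare class.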

However, this may not be the best approach, as a named entity can be made up of multiple tokens, so a full-entity accuracy would be desirable. Also, this simple schema of calculating an F1-score ignores the possibility of partial matches or other scenarios when the NER system gets the named-entity surface string correct but the type wrong. We might also want to evaluate these scenarios again at a full-entity level.

For this reason, we have used two unique metrics to report performance in this comparison, strict F1 score and partial F1 score.

For more information on why these metrics are more comprehensive than macro and micro-averaging F1 scores, we would like to direct readers to David Batista’s blog post.

Since Hugging Face’s AutoNLP and AWS Comprehend did not report these metrics, nor did they clearly specify how their F1 scores are computed, we ran inference on the respective test datasets and applied the nervaluate Python library to the predictions to generate the strict F1 score and partial F1 score results that we report in the following section.
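For intuition, the two metrics can be approximated with the simplified sketch below. It is not the nervaluate implementation and only loosely follows the SemEval-style definitions described in David Batista’s blog post: strict requires the same boundaries and the same type, while partial ignores the type and gives full credit for identical boundaries and half credit for overlapping ones. Spans are assumed to be unique `(label, start, end)` token ranges with an exclusive end, and the function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start token, end token), end exclusive


def _f1(score: float, n_pred: int, n_gold: int) -> float:
    """F1 from a (possibly fractional) match score and the two span counts."""
    if n_pred == 0 or n_gold == 0:
        return 0.0
    precision = score / n_pred
    recall = score / n_gold
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict_f1(gold: List[Span], pred: List[Span]) -> float:
    """A prediction counts only if boundaries and type both match exactly."""
    correct = len(set(gold) & set(pred))
    return _f1(correct, len(pred), len(gold))


def partial_f1(gold: List[Span], pred: List[Span]) -> float:
    """Type is ignored: exact boundaries score 1.0, overlapping ones 0.5."""
    unmatched_gold = [(start, end) for _, start, end in gold]
    leftovers = []
    score = 0.0
    # First pass: exact boundary matches.
    for _, start, end in pred:
        if (start, end) in unmatched_gold:
            unmatched_gold.remove((start, end))
            score += 1.0
        else:
            leftovers.append((start, end))
    # Second pass: remaining predictions that overlap an unmatched gold span.
    for start, end in leftovers:
        for gold_start, gold_end in unmatched_gold:
            if start < gold_end and gold_start < end:
                unmatched_gold.remove((gold_start, gold_end))
                score += 0.5
                break
    return _f1(score, len(pred), len(gold))


# Tokens: flights(0) from(1) Berlin(2) to(3) London(4)
gold_spans = [("city", 2, 3), ("city", 4, 5)]
wrong_type = [("city", 2, 3), ("person", 4, 5)]  # right boundaries, one wrong type

print(strict_f1(gold_spans, wrong_type), partial_f1(gold_spans, wrong_type))
```

In this example the prediction with the wrong type is penalised by the strict score but not by the partial score, which is exactly the kind of scenario a single token-level F1 score hides.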

Benchmarking Results

As mentioned above, we divide the results into two tables for a fair comparison. Table 1 shows the benchmarking results for high-resource languages such as French, German, Italian, Spanish, Hindi, Bengali, Arabic, Chinese, Finnish, Dutch and Portuguese. Both NeuralSpace’s AutoNLP and Hugging Face’s AutoNLP support these languages. AWS Comprehend, on the other hand, supports only six languages, and it cannot be trained on large datasets with more than 25 entities in total.

Overall, it can be seen that with much less market experience than its competitors and a brand-new solution, NeuralSpace’s AutoNLP achieved comparable results against other established providers across all tested languages.

Given that NeuralSpace’s AutoNLP supports 87 languages, many more than Hugging Face and AWS Comprehend, it is interesting to see how companies can use this language-agnostic yet powerful approach to scale their pipelines.

Comparison on high-resource languages

Comparison on low-resource languages

User scenarios and associated pricing plans

Let us take a specific scenario where developers from a mid-sized chatbot company would like to use AutoNLP entity extraction to specifically facilitate the intelligence of their chatbots in targeted domains like insurance and healthcare in a multilingual region. NER capabilities are one of the most important modules in building such conversational bots and dialogue systems.

Let us assume the following. The developers at the chatbot company:

  1. Want a throughput of 10 requests per second.
  2. Will make 500,000 API calls to parse user messages per month.
  3. Will train their AutoNLP multilingual NER using 5 training jobs.

We divide the pricing into three sub-parts, namely:

i) training costs

ii) deployment costs

iii) inference costs using APIs

i) Training costs:

In terms of training costs, NeuralSpace’s AutoNLP is currently priced at a fixed rate of $3 per training job while AWS Comprehend charges $3 per hour of training. Hugging Face AutoNLP doesn’t give a breakdown of the training cost, and only gives an estimate of the total cost according to the size of the dataset and the number of training jobs.

ii) Deployment costs:

Don’t be surprised by the deployment costs of AWS Comprehend. Let us explain.

Since our chosen company wants a throughput of 10 requests per second, and NeuralSpace’s AutoNLP promises a throughput of 5 requests per second for each deployed replica, we needed to deploy 2 replicas of the model on their platform. One replica denotes 1 instance of the model. Deploying 1 replica costs $0.5 per day, making the total cost $0.5 X 30 (number of days) X 2 (number of replicas), which is $30 for one month.

If we assume that every request has 90 characters on average, the required throughput of 10 requests per second corresponds to 900 characters per second.

AWS Comprehend charges $0.0005 per second for 1 Inference Unit (IU) and 1 IU promises a throughput of 100 characters per second. You can choose any number of IUs in the range of 1–10,000 depending upon the required demand. In our scenario here, we require a throughput of 900 characters per second. Consequently, we deployed 9 IUs of the model, which cost us $0.0005 (cost of 1 IU per second) X 9 (number of IUs deployed) X 2,629,746 (number of seconds in a month), which is about $11,833.86 for one month.

Hugging Face’s AutoNLP charges $1 per day for “pinning” the model, which essentially deploys the model for use. However, they do not provide any information regarding the throughput of the deployed model, and thus we only deployed 1 replica. Deployment costs are $1 (deployment cost per day) X 30 (number of days), which is $30 for one month on their platform.
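The deployment arithmetic for all three platforms can be reproduced with the short sketch below. It uses only the prices and throughput figures quoted in this section; the variable names are our own, and `Decimal` is used simply to avoid floating-point rounding in the dollar amounts.

```python
from decimal import Decimal
from math import ceil

DAYS_PER_MONTH = 30
SECONDS_PER_MONTH = 2_629_746

required_requests_per_second = 10
avg_chars_per_request = 90
required_chars_per_second = required_requests_per_second * avg_chars_per_request

# NeuralSpace: 5 requests per second per replica, $0.5 per replica per day.
neuralspace_replicas = ceil(required_requests_per_second / 5)
neuralspace_cost = Decimal("0.5") * DAYS_PER_MONTH * neuralspace_replicas

# AWS Comprehend: 100 characters per second per IU, $0.0005 per IU per second.
aws_inference_units = ceil(required_chars_per_second / 100)
aws_cost = Decimal("0.0005") * aws_inference_units * SECONDS_PER_MONTH

# Hugging Face: $1 per day for one pinned model (throughput not specified).
huggingface_cost = Decimal("1") * DAYS_PER_MONTH

print(f"NeuralSpace:    ${neuralspace_cost:,.2f} ({neuralspace_replicas} replicas)")
print(f"AWS Comprehend: ${aws_cost:,.2f} ({aws_inference_units} IUs)")
print(f"Hugging Face:   ${huggingface_cost:,.2f} (1 pinned model)")
```

Changing `required_requests_per_second` or `avg_chars_per_request` shows how quickly the per-second billing of Inference Units grows compared with the per-day pricing of the other two platforms.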

iii) Inference costs:

Since our scenario company expects to generate 500,000 API calls to parse user messages in one month, and NeuralSpace’s AutoNLP charges $0.007 per request, the total cost on their platform would be 500,000 (number of API calls) X $0.007, which is $3,500 for one month.

As discussed before, a single request can be assumed to have 90 characters on average. Thus, in total, the user would need to parse 500,000 (total API calls in a month) X 90 (average characters per API call) characters, which is 45 million characters.

On Hugging Face’s AutoNLP, parsing the first 1 million characters is free, and the user pays $10 per million characters after that. This makes the total inference cost 44 (millions of characters to be parsed after the free limit) X $10, which is $440 for one month.

On the other hand, AWS Comprehend charges $0.0001 for each unit to be parsed, where a single unit consists of 100 characters. Thus, to parse 450,000 units (45 million characters / 100), the total cost is 450,000 X $0.0001, which is $45 for one month. However, an important thing to note is that AWS charges for at least 3 units for every parse request, even if only 1 unit is actually sent for inference. Thus, in real-life scenarios, inference costs would most likely be higher than our above estimate.
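The inference estimates can be checked in the same way. The sketch below uses the per-request and per-character prices quoted above, with $0.0001 per AWS unit, which is the rate that the $45 total implies. The last figure is our own extrapolation of the 3-unit minimum to the scenario, assuming every one of the 500,000 requests is billed as 3 units. It is not a number quoted by AWS.

```python
from decimal import Decimal
from math import ceil

api_calls_per_month = 500_000
avg_chars_per_request = 90
total_chars = api_calls_per_month * avg_chars_per_request  # 45 million characters

# NeuralSpace: flat price per request.
neuralspace_cost = api_calls_per_month * Decimal("0.007")

# Hugging Face: first 1 million characters free, then $10 per million characters.
billable_millions = Decimal(max(0, total_chars - 1_000_000)) / 1_000_000
huggingface_cost = billable_millions * Decimal("10")

# AWS Comprehend: one unit is 100 characters.
aws_units = total_chars // 100
aws_cost = aws_units * Decimal("0.0001")

# Assumption: with a 3-unit minimum, each 90-character request is billed as 3 units.
units_per_request = max(3, ceil(avg_chars_per_request / 100))
aws_cost_with_minimum = api_calls_per_month * units_per_request * Decimal("0.0001")

print(f"NeuralSpace:    ${neuralspace_cost:,.2f}")
print(f"Hugging Face:   ${huggingface_cost:,.2f}")
print(f"AWS Comprehend: ${aws_cost:,.2f} (${aws_cost_with_minimum:,.2f} with the 3-unit minimum)")
```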

Overall user experience

While using all three platforms, we felt that the NeuralSpace user interface and CLI were the easiest to use thanks to their well-presented documentation, user tutorials and explainer videos on YouTube. The pricing calculator provided by NeuralSpace was also convenient to use, and its instant updates helped us stay aware of our current costs. NeuralSpace will very soon integrate a bulk data upload feature that will allow users to instantly add their datasets, similar to what is already possible in Hugging Face AutoNLP. The AWS Comprehend dataset upload feature worked a little differently and, unlike on NeuralSpace, no converter scripts were available.

We hope that this blog has provided you with insights and will help you to choose the best language-agnostic AutoNLP entity extraction engine for your specific use case.

At NeuralSpace, we will be happy to connect with you if you would like a demo or have any questions or feedback to help us improve the platform. We aim to provide a powerful resource to accelerate your pipelines and empower the next billion users to use the internet in the language and mode of their choice. Together we can contribute to the engineering and research community.

The NeuralSpace Platform is live, so test it and try it out yourself! Early sign-ups get $500 worth of credits. What are you waiting for?

Join the NeuralSpace Slack Community to connect with us, receive updates and discuss topics in NLP for low-resource languages with fellow developers and researchers.

Check out our Documentation to read more about the NeuralSpace Platform and its different Apps.
