Entity Recognition with NeuralSpace in 80+ Languages

Felix Laumann
NeuralSpace
Apr 7, 2022

What is Entity Recognition?

Named Entity Recognition (NER) is a core component of many NLP and Information Retrieval (IR) applications, including question answering, summarization and machine translation, and it plays an essential role in language understanding. To act on a user query, you need to understand the intent behind it and also extract specific spans of text and classify them into pre-defined categories.

What are these categories?

The categories are the types of entities an NER model can extract. An entity can be, for example, a name (of an organization, a person, a place…), an address, an account number, a measurement, a percentage, or a domain-specific term such as the name of a chemical or a medicine. In this way, essentially any valuable information can be extracted from text.

Let us take an example:

If someone says “flights from Berlin to London”, the intent here is flight-search and entities are Berlin and London, which are of type city.

These entities can also be labeled at a more granular level: Berlin can be from-city and London can be to-city.

A domain-specific example:

“I need 8 paracetamol tablets”, where 8 is a number, paracetamol is medicine-component, and tablets is medicine-form.
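To make the flight query above concrete, here is a minimal sketch of how an intent and its entities could be held in code. The `Entity` class, the `locate` helper and the dictionary layout are illustrative assumptions, not the response format of NeuralSpace's API; the character offsets are one common way to record where an entity sits in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the query
    label: str  # entity type, e.g. "from-city"
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity


def locate(query: str, surface: str, label: str) -> Entity:
    """Build an Entity from the first occurrence of `surface` in `query`."""
    start = query.index(surface)  # raises ValueError if the text is absent
    return Entity(surface, label, start, start + len(surface))


query = "flights from Berlin to London"
parsed = {
    "intent": "flight-search",
    "entities": [
        locate(query, "Berlin", "from-city"),
        locate(query, "London", "to-city"),
    ],
}

for entity in parsed["entities"]:
    print(entity.label, "->", query[entity.start:entity.end])
# from-city -> Berlin
# to-city -> London
```

Storing offsets rather than only the matched words keeps the result unambiguous when the same word appears more than once in a query.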

Features of NeuralSpace’s Entity Recognition Models:

  • Off-the-shelf Models: Use our pre-trained, production-ready models through APIs and integrate them into any application. Each language has a different set of pre-trained entities. Check them out in our Docs.
  • Language Support: 80+ languages supported
  • Entity Basket: 36 different entities can be extracted using our pre-trained models.
  • Train with AutoNLP (coming soon): Train your own NER models to extract custom entities using AutoNLP
  • Accelerate Dataset Creation with our Data Studio (coming soon): Our Data Studio is an in-browser text editor and annotator for creating datasets, equipped with handy utilities such as tagging entities by selecting words

AI Modeling Life Cycle

Just like other Apps on the NeuralSpace Platform, the Entity Extraction App will take care of the entire AI modeling lifecycle, which consists of:

  • Dataset preparation
  • Model training
  • Model deployment
  • Model feedback

Let us go through these steps one by one. You can upload your existing datasets through the new Import Datasets feature, or create your own dataset in the Data Studio, NeuralSpace’s data preparation and annotation tool, which is designed to make dataset creation and modification much simpler and faster. For example, when a dataset is required in more than one language, users can tag an entity in a sentence and have the corresponding word or phrase tagged automatically in the translated sentence.

Training a custom NER model with AutoNLP is as easy as clicking the Train with AutoNLP button once your dataset is uploaded and prepared in the Data Studio. Training completes within a couple of minutes, after which you can place your model in production. NeuralSpace’s in-house developed AutoMLOps feature allows you to use your custom-trained models with throughput rates of up to 30 requests per second. Just click on the Deploy button next to the trained model that achieved the best performance and let AutoMLOps handle the rest for you.

Once deployed, you can test your models and review their output on the Test model and Feedback pages, respectively. The Feedback page lets you browse everything that has passed through your models, and you can directly add sentences whose entities were extracted incorrectly back to your dataset. This starts a feedback-driven learning cycle: retrain your models regularly to keep them up to date. We recommend retraining once a week during the first two months after your model goes live, and once a month after that.
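The retraining cadence recommended above can be written down as a small scheduling helper. This is only a sketch: it assumes "two months" means 60 days and "a month" means 30 days, and `next_retrain` and its constants are illustrative names rather than anything provided by the NeuralSpace Platform.

```python
from datetime import date, timedelta

WEEKLY_PHASE = timedelta(days=60)  # assumption: "first two months" is about 60 days
WEEKLY = timedelta(days=7)
MONTHLY = timedelta(days=30)       # assumption: "once a month" is about 30 days


def next_retrain(deployed_on: date, last_retrained: date) -> date:
    """Return the next suggested retraining date for a live model."""
    in_weekly_phase = (last_retrained - deployed_on) < WEEKLY_PHASE
    return last_retrained + (WEEKLY if in_weekly_phase else MONTHLY)


go_live = date(2022, 4, 7)
print(next_retrain(go_live, date(2022, 4, 7)))  # 2022-04-14
print(next_retrain(go_live, date(2022, 7, 1)))  # 2022-07-31
```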

Use-Cases

NeuralSpace’s multilingual NER models are industry-agnostic with a wide range of applications. Below are some use-cases:

#1 Powering Content Recommendations & Efficient Search Algorithms

Recommendation systems dominate how we discover new content and ideas in today’s world. News publishers, for example, use NER to extract the entities from an article and then recommend other articles that mention similar entities. This approach is used to build content recommendations for various media outlets.
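One simple way to compare the entities of two articles is the Jaccard similarity of their entity sets. The sketch below assumes the entities have already been extracted for each article; the article titles, entity sets and the `recommend` function are made up for illustration, and a production recommender would likely weigh entities by type or frequency as well.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of entities two articles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(current: str, article_entities: dict[str, set[str]], top_k: int = 2) -> list[str]:
    """Rank the other articles by entity overlap with `current`."""
    target = article_entities[current]
    scored = [
        (jaccard(target, entities), title)
        for title, entities in article_entities.items()
        if title != current
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Articles with no shared entities are never recommended.
    return [title for score, title in ranked[:top_k] if score > 0]


# Hypothetical, already-extracted entities per article.
articles = {
    "Berlin airport expands": {"Berlin", "Lufthansa", "Germany"},
    "Lufthansa adds London route": {"Lufthansa", "London", "Berlin"},
    "Premier League roundup": {"London", "Arsenal"},
    "New vaccine trial": {"WHO", "Geneva"},
}

print(recommend("Berlin airport expands", articles))
# ['Lufthansa adds London route']
```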

#2 Customer Support

There are a number of ways to make customer feedback handling smoother, and NER is one of them. One use-case is to categorize each enquiry by its extracted entities and assign it to the relevant department within the organization.
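The routing idea can be sketched as a small rule table that maps entity types to departments. The labels, department names and the first-match-wins policy below are all assumptions chosen for illustration; a real deployment would define its own rules.

```python
# Hypothetical rules: (entity label, department). Earlier rules take priority.
ROUTING_RULES = [
    ("account-number", "billing"),
    ("medicine-component", "pharmacy"),
    ("city", "travel"),
]
DEFAULT_DEPARTMENT = "general-support"


def route(entity_labels: list[str]) -> str:
    """Pick a department from the entity labels found in an enquiry."""
    present = set(entity_labels)
    for label, department in ROUTING_RULES:
        if label in present:
            return department
    return DEFAULT_DEPARTMENT


print(route(["city", "number"]))          # travel
print(route(["city", "account-number"]))  # billing
print(route(["number"]))                  # general-support
```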

#3 Machine Translation Systems

In machine translation, especially for lower-resource languages, named entities are particularly tricky because their translation follows language-specific rules. If the named entities are extracted before the actual translation, the entire process becomes much more accurate.
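One possible way to use extracted entities in a translation pipeline is to replace them with placeholders before translating and put them back afterwards. The sketch below is an assumption about how this could be done, not a description of NeuralSpace's method: `mask_entities` and `restore_entities` are hypothetical helpers, the translated string is written by hand as a stand-in for a real translation call, and copying the entity back verbatim is a simplification, since some names need their own transliteration or translation.

```python
def mask_entities(text: str, spans: list[tuple[int, int]]) -> tuple[str, dict[str, str]]:
    """Replace each (start, end) span with a placeholder; spans must not overlap."""
    mapping: dict[str, str] = {}
    pieces: list[str] = []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        placeholder = f"__ENT{i}__"
        mapping[placeholder] = text[start:end]
        pieces.append(text[cursor:start])
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


def restore_entities(text: str, mapping: dict[str, str]) -> str:
    """Put the original entity text back in place of each placeholder."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


source = "flights from Berlin to London"
masked, mapping = mask_entities(source, [(13, 19), (23, 29)])
print(masked)  # flights from __ENT0__ to __ENT1__

# Stand-in for the output of a translation system (German), written by hand.
translated_masked = "Flüge von __ENT0__ nach __ENT1__"
print(restore_entities(translated_masked, mapping))  # Flüge von Berlin nach London
```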

#4 Efficient Semantic Annotation

Semantic annotation is the process of adding extra information to a document about concepts relevant to it. Named entities can help machines understand the nuances of a textual document better by providing this extra information.
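As a small illustration of semantic annotation, the sketch below writes entity types inline next to the words they describe, reusing the paracetamol sentence from earlier in this article. The `[text](label)` notation and the `annotate` function are arbitrary choices for this example, and the entity list is built by hand rather than produced by a model.

```python
def annotate(text: str, entities: list[tuple[int, int, str]]) -> str:
    """Wrap each (start, end, label) span as [text](label); spans must not overlap."""
    out: list[str] = []
    cursor = 0
    for start, end, label in sorted(entities):
        out.append(text[cursor:start])
        out.append(f"[{text[start:end]}]({label})")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "I need 8 paracetamol tablets"
labelled = [
    ("8", "number"),
    ("paracetamol", "medicine-component"),
    ("tablets", "medicine-form"),
]

# Hand-built entity spans, located by simple string search for this example.
entities = []
for surface, label in labelled:
    start = text.index(surface)
    entities.append((start, start + len(surface), label))

annotated = annotate(text, entities)
print(annotated)
# I need [8](number) [paracetamol](medicine-component) [tablets](medicine-form)
```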

Language Support

Basque (eu)

Belarusian (be)

Catalan (ca)

Croatian (hr)

Czech (cs)

Estonian (et)

Galician (gl)

Hungarian (hu)

Irish (ga)

Latin (la)

Latvian (lv)

Serbian (sr)

Slovak (sk)

Slovenian (sl)

Bulgarian (bg)

Armenian (hy)

Turkish (tr)

Ukrainian (uk)

Hebrew (he)

Kazakh (kk)

Maltese (mt)

Uighur (ug)

Finnish (fi)

Swedish (sv)

Indonesian (id)

Korean (ko)

Vietnamese (vi)

Afrikaans (af)

Hindi (hi)

Bengali (bn)

Telugu (te)

Tamil (ta)

Marathi (mr)

Urdu (ur)

Gujarati (gu)

Kannada (kn)

Malayalam (ml)

Assamese (as)

Punjabi (pa)

Persian (fa)

Arabic (ar)

Arabic (Egyptian) (arz)

Arabic (Levantine) (apc)

Arabic (Maghrebi) (ama)

Arabic (Mesopotamian) (acm)

Arabic (Kuwaiti) (akw)

Arabic (Sudanese) (apd)

Arabic (Gulf) (afb)

Greek (el)

Danish (da)

English (en)

Norwegian Bokmål (nb)

Chinese (zh)

Dutch (nl)

French (fr)

German (de)

Italian (it)

Japanese (ja)

Lithuanian (lt)

Polish (pl)

Portuguese (pt)

Romanian (ro)

Russian (ru)

Spanish (es)

Albanian (sq)

Aragonese (an)

Azerbaijani (az)

Bashkir (ba)

Bosnian (bs)

Breton (br)

Burmese (my)

Chechen (ce)

Chuvash (cv)

Georgian (ka)

Haitian (ht)

Icelandic (is)

Ido (io)

Javanese (jv)

Kirghiz (ky)

Luxembourgish (lb)

Macedonian (mk)

Malagasy (mg)

Malay (ms)

Nepali (ne)

Occitan (oc)

Sundanese (su)

Swahili (sw)

Tagalog (tl)

Tajik (tg)

Tatar (tt)

Uzbek (uz)

Volapük (vo)

Welsh (cy)

Yoruba (yo)

Multilingual/Code-Mixed (multilingual)

Different languages support a different set of entities. Check them out here.

Check out our Getting Started guide to learn how to use NeuralSpace’s Entity Recognition.

The NeuralSpace Platform is live. Test it and try it out yourself! Early sign-ups get $500 worth of credits, so what are you waiting for?

Join the NeuralSpace Slack Community to connect with us, ask questions and collaborate on exciting projects with other community members. Also, receive updates and discuss topics in NLP for low-resource languages with fellow developers and researchers.

Check out our Documentation to read more about the NeuralSpace Platform and its different Apps.

Happy NLP!
