NeuralSpace’s Building Blocks for NLP in low-resource languages

Felix Laumann
NeuralSpace
Feb 1, 2022

NeuralSpace brings you a no-code NLP Platform for low-resource languages spoken in India, the Middle East, Asia and Africa.

The NeuralSpace Platform comes with various language processing Apps that can help you recognize intents and entities in text, classify long or short text into categories, identify speakers in a given audio file, transcribe speech into text, and much more. The AI models that power the platform are state-of-the-art, quality assured and ready for you to customize or consume out-of-the-box.

If you find NLP concepts like BERT, lemmatization, or tokenization too complicated, or you are struggling with training, deploying, and scaling your NLP models, you are in the right place.

The NeuralSpace platform is powered by:

  • Language-agnostic AutoNLP, which trains state-of-the-art AI models on your own datasets at the click of a button, so you never have to work with complex machine learning models yourself.
  • AutoMLOps, which lets you deploy and scale the models you have trained on the platform to handle very large request volumes, without ever having to worry about deployment infrastructure or DevOps.

The NeuralSpace Platform currently consists of five Apps: Language Understanding, Machine Translation, Transliteration, Augmentation, and Language Detection. These Apps can be easily combined to build end-to-end text processing systems. Over the next few months, the team at NeuralSpace will add many more Apps with a focus on speech recognition.

The following sections describe each of these Apps in turn.

1. Language Understanding

The main purpose of Language Understanding is to identify the user’s intent and to extract relevant information (entities) from what they said (speech) or wrote (text), so that a relevant action can be performed. Entities can be anything from names, addresses, and account numbers to very domain-specific terms such as the names of chemicals or medicines. Sometimes it also predicts the user’s sentiment, which helps a chat or voice assistant respond in a more empathetic tone.

The App is commonly used in automatic chat or voice assistants: once the user’s intent is understood, it sends commands to perform a relevant action. It allows developers to build voice assistants that open up new possibilities for customer service teams. Projects created on the NeuralSpace Platform can range from automatically providing parcel tracking information (for a shipping company), to finding the relevant policies after a car accident (for an insurance company), to ordering dinner in a food delivery app.
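To make the intent-to-action step concrete, here is a minimal sketch of how an application might act on a parsed utterance. The `NLUResult` shape, the handler names, and the confidence threshold are all assumptions made for illustration; they are not the platform's actual response schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class NLUResult:
    """Hypothetical shape of a parsed utterance (not the platform's schema)."""
    intent: str
    confidence: float
    entities: Dict[str, str] = field(default_factory=dict)


def track_parcel(entities: Dict[str, str]) -> str:
    return f"Looking up parcel {entities.get('tracking_number', '<unknown>')}"


def order_food(entities: Dict[str, str]) -> str:
    return f"Ordering {entities.get('dish', '<unknown>')}"


# Map each intent name to the action it should trigger.
HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "track_parcel": track_parcel,
    "order_food": order_food,
}


def dispatch(result: NLUResult, min_confidence: float = 0.5) -> str:
    """Run the handler for the predicted intent, or fall back if unsure."""
    handler = HANDLERS.get(result.intent)
    if handler is None or result.confidence < min_confidence:
        return "Sorry, I did not understand that."
    return handler(result.entities)


example = NLUResult("track_parcel", 0.93, {"tracking_number": "AB123"})
print(dispatch(example))
```

Keeping the handlers in a plain dictionary means a new intent only needs a new function and one extra entry.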

Features:

  • Train with AutoNLP: Using our NLU APIs you can train your own AI models to predict intents as well as extract entities.
  • Language Support: Over 87 languages are supported including 21 Arabic dialects, 11 Indian languages, and various others spoken across Africa and Asia.
  • Low Data Requirements: Our models are extremely data efficient. You can start training your models with just 10 examples per intent.
  • Accelerate Dataset Creation with our Data Studio: Our Data Studio is an in-browser text editor for creating datasets, equipped with handy utility tools such as an entity marker.
  • Easy to Integrate and Scale: Scale or replicate your deployed models for higher availability and throughput and integrate them with your application through REST APIs.
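Before training, it can help to check that every intent reaches the 10-example starting point mentioned above. The following is a rough sketch under the assumption that a dataset is a list of `(text, intent)` pairs; that format and the function name are illustrative, not the platform's own.

```python
from collections import Counter
from typing import Dict, List, Tuple

MIN_EXAMPLES_PER_INTENT = 10  # the starting point quoted in the feature list


def underfilled_intents(
    examples: List[Tuple[str, str]], minimum: int = MIN_EXAMPLES_PER_INTENT
) -> Dict[str, int]:
    """Return {intent: count} for intents with fewer than `minimum` examples."""
    counts = Counter(intent for _text, intent in examples)
    return {intent: n for intent, n in counts.items() if n < minimum}


# Ten examples for one intent, only two for the other.
dataset = [(f"where is my parcel {i}", "track_parcel") for i in range(10)]
dataset += [("I want a pizza", "order_food"), ("get me some biryani", "order_food")]

print(underfilled_intents(dataset))
```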

You can train, deploy and use your first NLU model in under 5 minutes! Check out our tutorials here.

2. Machine Translation

Whether we are talking about subtitles, government documents or question papers for exams, all of them need to be translated into multiple languages. Manually translating documents at such a scale is not only expensive but also an extremely time-consuming process.

With the help of Machine Translation, you can drastically reduce the time it takes to translate documents.

Features:

  • State-of-the-art Models: Use our pre-trained state-of-the-art translation models through APIs and integrate them into any application.
  • Language Support: 108 languages are supported, with translation between any pair of them (any-to-any).
  • Train with AutoNLP (coming soon): Train your own use-case-specific translation models using AutoNLP.

3. Transliteration

For languages that don’t use the Latin script, e.g., Arabic, Hindi, Punjabi, Sinhala and many others spoken around the world, typing can be challenging because keyboards and keypads mostly default to Latin characters. That makes creating content in vernacular languages difficult. With NeuralSpace’s Transliteration App, you can create content in these languages using your Latin keypad. It also enables developers to match international databases (of products, for example) to customer queries in local languages that do not use the Latin script. For instance, when a customer in Sri Lanka searches for a new pair of Nike shoes through Amazon’s chat assistant, she types නයික් සපත්තු, and NeuralSpace’s Transliteration converts the query to Latin script so that Amazon’s English-language database can be searched for Nike shoes.

Features:

  • Off-the-shelf Models: Use our pre-trained production-grade models through APIs and integrate them into any application.
  • Language Support: Over 20 language pairs are supported including Arabic and 11 Indian languages.
  • Train Your Own with AutoNLP (coming soon): Using AutoNLP, improve or customize existing transliteration models with your own data.
  • Accelerate Dataset Creation with our Data Studio (coming soon): Equipped with handy utility tools, our Data Studio is an in-browser text editor for creating datasets.
  • Easy to Integrate and Scale (coming soon): Scale or replicate your deployed models for higher availability and throughput and integrate them with your application through REST APIs.

4. Augmentation

Any language processing task requires data, and we all wish we could generate data magically. Given a sentence, Augmentation can generate up to ten sentences that keep the intent of the original sentence intact. This helps you create datasets faster and makes language processing models more robust.

Features:

  • State-of-the-art Models: Use our state-of-the-art augmentation models through APIs and integrate them into any application.
  • Easy to Use: Simply pass your text through the API, and get up to 10 augmented sentences.
  • Language Support: Over 100 languages supported.
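As an illustration of how augmented sentences might feed back into a training set, the sketch below attaches each variant to the intent label of the sentence it came from. `stub_augment` is a placeholder with deliberately trivial templates, not the real augmentation model, and the `(text, intent)` dataset format is an assumption.

```python
from typing import Callable, List, Tuple

MAX_AUGMENTATIONS = 10  # upper bound quoted in the feature list


def stub_augment(text: str, n: int) -> List[str]:
    """Placeholder for a real augmentation call; variants are trivial."""
    templates = ["please {}", "{} please", "could you {}", "I want to {}"]
    return [t.format(text) for t in templates[:n]]


def expand_dataset(
    examples: List[Tuple[str, str]],
    augment: Callable[[str, int], List[str]],
    per_example: int = 3,
) -> List[Tuple[str, str]]:
    """Add augmented variants of each example, reusing its intent label."""
    n = min(per_example, MAX_AUGMENTATIONS)
    expanded = list(examples)
    seen = {text for text, _ in examples}
    for text, intent in examples:
        for variant in augment(text, n):
            if variant not in seen:  # skip duplicates
                seen.add(variant)
                expanded.append((variant, intent))
    return expanded


seed = [("track my parcel", "track_parcel")]
for text, intent in expand_dataset(seed, stub_augment, per_example=2):
    print(intent, "|", text)
```

It is usually worth reviewing augmented sentences by hand before training, because a paraphrase can occasionally drift away from the original intent.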

5. Language Detection

Language detection is often a prerequisite for NLP products and solutions. It helps you improve the user experience and pick language-specific NLP models to process what your users are saying. For example, a voice assistant can detect the language the user is speaking and respond in the same language; or an email automation agent can detect the language of an incoming email and pick a language-specific NLP model to classify whether the email is spam or not.

Features:

  • State-of-the-art Models: Use our pre-trained state-of-the-art language detection models through APIs and integrate them into any application.
  • Easy to Use: Simply pass the text through the API, and get top N predicted languages along with confidence scores.
  • Language Support: Over 150 languages supported.
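The email example above could be wired up roughly like this. The sketch assumes the detector's top-N output is available as a list of `(language code, confidence)` pairs; that shape, the threshold, and the toy per-language spam models are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Tuple[str, float]  # (language code, confidence score)


def pick_language(
    predictions: List[Prediction],
    supported: List[str],
    min_confidence: float = 0.5,
) -> str:
    """Return the most confident supported language, or 'und' if none fits."""
    for lang, score in sorted(predictions, key=lambda p: p[1], reverse=True):
        if lang in supported and score >= min_confidence:
            return lang
    return "und"  # ISO 639 code for "undetermined"


# Toy stand-ins for language-specific spam classifiers.
SPAM_MODELS: Dict[str, Callable[[str], str]] = {
    "en": lambda text: "spam" if "free money" in text.lower() else "not spam",
    "hi": lambda text: "not spam",
}


def classify_email(text: str, predictions: List[Prediction]) -> Tuple[str, str]:
    """Route the email to the model matching its detected language."""
    lang = pick_language(predictions, list(SPAM_MODELS))
    model = SPAM_MODELS.get(lang)
    if model is None:
        return lang, "needs manual review"
    return lang, model(text)


print(classify_email("Claim your FREE MONEY now", [("en", 0.91), ("hi", 0.05)]))
```

Falling back to manual review when no supported language clears the threshold avoids running a model on text it was never trained for.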

We understand that NLP for low-resource languages is extremely challenging, especially for those spoken in India, the Middle East, Asia and Africa. That is why the NeuralSpace Platform focuses on multilinguality and provides a no-code way to create data, to train, deploy and test models, and to connect these models with any other application you want.

Check out our Documentation for all the Apps and features of the NeuralSpace Platform.

Join the NeuralSpace Slack Community to receive updates and discuss NLP for low-resource languages with fellow developers.

Read more about the NeuralSpace Platform on neuralspace.ai.
