NeuralSpace beats Google, IBM and Amazon on NLU intent accuracy

Felix Laumann · Published in NeuralSpace · 4 min read · Nov 9, 2022

Introduction

Whether you are using chatbots, voicebots, or process automation engines, they are all powered by Natural Language Understanding (NLU). Its main purpose is to understand the user’s intent and extract relevant keywords (entities) from what they said or wrote, so that a relevant action can be performed.

NeuralSpace supports almost 100 languages, including many locally spoken (low-resource) languages across Asia, the Middle East and Africa. Google’s Dialogflow supports 96 languages, Amazon Lex supports 13, and IBM Watson only 9.

We benchmarked NeuralSpace’s intent classification accuracy against these three NLU providers, using Amazon Science’s MASSIVE dataset for the comparison.

Check out our blog on Evaluating NeuralSpace on the Amazon MASSIVE Dataset to learn more about the dataset and see our intent classification accuracy and entity recognition F1 scores for all 51 languages that the dataset is available in.

Results

The table below shows the intent accuracies of Google Dialogflow ES, IBM Watson, Amazon Lex and NeuralSpace’s Language Understanding service, on the Amazon MASSIVE dataset.

We compared the following languages: Arabic, English, French, German, Italian, Japanese, Korean, Portuguese, Spanish and Chinese.

As you can see in the table below, NeuralSpace performs a staggering 5% better than Dialogflow, 3% better than IBM Watson and 2.8% better than Amazon Lex, on average across all ten languages. We recorded greater accuracy than each of the three providers on every single language we tested, by as much as 9% (for German) and 7% (for French).

A quick note on how we measured intent accuracy: we divide the number of correctly classified sentences by the total number of test examples. So, the higher the accuracy, the better!
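In code, the metric amounts to the following (a minimal sketch; `predicted` and `actual` are hypothetical lists of intent labels, not part of any NeuralSpace API):

```python
def intent_accuracy(predicted, actual):
    """Fraction of test utterances whose predicted intent
    matches the ground-truth intent label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# 3 of 4 test utterances classified correctly → accuracy 0.75
acc = intent_accuracy(
    ["play_music", "set_alarm", "weather", "set_alarm"],
    ["play_music", "set_alarm", "weather", "play_music"],
)
```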

How we did it

NeuralSpace has always had the ambition to lead the market for multilingual NLP models (we are one of the Sample Vendors in the 2022 Gartner Hype Cycle for Natural Language Technologies) and to make them easily accessible to software developers through a simple no-code user interface, which provides a unique API key to access your trained models.

It’s no secret that NeuralSpace has built all of its models based on the famous transformer architecture, adapted and enhanced with multiple other models to fit the unique needs of every language. This is a particular advantage when models are trained on languages that generally suffer from small datasets, like almost any locally spoken language in Asia, the Middle East or Africa (low-resource languages).

We have also tried to make it as simple as possible for NLP practitioners to implement any transformer-based model, considering latency and scalability.

To train accurate models specific to any dataset, we came up with our proprietary language-agnostic AutoNLP: an algorithm that figures out which training pipeline, method, features, loss function and other hyperparameters will give the most accurate results on your unique dataset. We built AutoNLP to be highly data-efficient, because creating datasets is itself a challenging task.
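The actual AutoNLP algorithm is proprietary, but the core idea of searching over training configurations can be illustrated with a toy exhaustive sweep. Everything here is a made-up stand-in: `train_and_evaluate` and the search space are hypothetical, and real AutoNLP systems use far smarter, more data-efficient search than a brute-force grid.

```python
from itertools import product

def auto_search(train_and_evaluate, search_space):
    """Toy stand-in for an AutoNLP-style search: try each
    hyperparameter combination and keep the one that yields the
    best validation accuracy."""
    best_cfg, best_acc = None, float("-inf")
    for values in product(*search_space.values()):
        cfg = dict(zip(search_space, values))
        acc = train_and_evaluate(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc

# Hypothetical scoring function standing in for a real train/eval run.
score = lambda cfg: 0.9 - abs(cfg["lr"] - 0.01) - cfg["dropout"] / 10
space = {"lr": [0.1, 0.01, 0.001], "dropout": [0.1, 0.3]}
best_cfg, best_acc = auto_search(score, space)
```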

AutoNLP performs an automatic selection of the model architecture, which includes extracting features from a pre-trained transformer model in the selected language that has been trained on millions of generic sentences. After these features are extracted, they are passed on to a custom, or “head”, model that is unique for each user. This custom model is fine-tuned on the examples provided by the user, which enables our full model (pre-trained + head) to be trained accurately on small amounts of data in a short time.
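The frozen-encoder-plus-trainable-head pattern described above can be sketched in a few lines of NumPy. This is purely illustrative: the random projection stands in for a real pre-trained transformer, and the data is synthetic; only the head's weights are updated, mirroring the idea that the pre-trained features stay fixed while the per-user head is fine-tuned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained encoder (a real system would use
# a transformer trained on millions of sentences).
W_enc = rng.normal(size=(16, 8))

def encode(x):
    return np.tanh(x @ W_enc)  # "features" from the frozen model

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Tiny synthetic "user dataset": 30 utterances, 3 intents.
n_classes = 3
X = rng.normal(size=(30, 16))
y = rng.integers(0, n_classes, size=30)

# Trainable head: a simple softmax classifier on top of the features.
W_head = np.zeros((8, n_classes))
feats = encode(X)            # encoder is frozen: computed once
onehot = np.eye(n_classes)[y]
for _ in range(200):         # fine-tune only the head
    probs = softmax(feats @ W_head)
    grad = feats.T @ (probs - onehot) / len(X)
    W_head -= 0.5 * grad

train_acc = (softmax(feats @ W_head).argmax(axis=1) == y).mean()
```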

Our AutoNLP also controls the training duration so that models do not overfit the given, possibly small, datasets: training stops once no significant improvement can be achieved. This saves users unnecessary training time. Whenever a model is trained, the NeuralSpace platform lets users configure how many parallel training jobs to start.
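Stopping "once no significant improvement can be achieved" is commonly implemented as patience-based early stopping on a validation metric. The sketch below shows one standard way to do it (this is a generic technique, not NeuralSpace's actual logic; `patience` and `min_delta` are illustrative parameters):

```python
def early_stop_epoch(val_losses, patience=3, min_delta=1e-3):
    """Return the epoch at which training would stop: after `patience`
    consecutive epochs without an improvement of at least `min_delta`
    over the best validation loss seen so far."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: train to the end
```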

In this concurrent training process, users can train up to five models at the same time, even while additional examples are being uploaded. With NeuralSpace’s version control system in place, users can easily benchmark multiple versions of a model trained on an evolving dataset. Training on NeuralSpace is also optimized to run on machines suited to the job, for faster results.

When it comes to deploying transformer-based models, we have developed our proprietary AutoMLOps tool, which lets users deploy multiple copies of the same model in parallel to keep latency under any load at no more than 500 ms. AutoMLOps includes features like a load allocator that automatically routes each input message to the parallel copy with the lowest load at that moment.
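The load-allocator idea, routing each message to the replica with the fewest in-flight requests, can be sketched as a tiny least-load router. This is an illustrative toy, not the AutoMLOps implementation; the class and method names are made up for this example.

```python
class LoadAllocator:
    """Toy least-load router: each incoming message goes to the
    model replica currently serving the fewest in-flight requests."""

    def __init__(self, n_replicas):
        self.load = {rid: 0 for rid in range(n_replicas)}

    def route(self):
        # Pick the replica with the lowest in-flight count.
        rid = min(self.load, key=self.load.get)
        self.load[rid] += 1
        return rid

    def complete(self, rid):
        # Call when a replica finishes handling a request.
        self.load[rid] -= 1

alloc = LoadAllocator(3)
first_three = [alloc.route() for _ in range(3)]  # spreads across replicas
alloc.complete(1)                                # replica 1 frees up
next_replica = alloc.route()                     # goes back to replica 1
```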

Feel free to reach out to us or book a call directly if you’d like to talk with our team in more detail.

Or, you can try out NeuralSpace’s Language Understanding service now, for free.

Be sure to check out our Documentation to read more about the NeuralSpace Platform and its different services.

Happy NLP!
