Evaluating NeuralSpace on the Amazon MASSIVE Dataset

Felix Laumann
NeuralSpace
Apr 27, 2022

A million examples in 51 languages — the Amazon MASSIVE dataset.

Amazon Science recently published a promising dataset, together with open-source code and publicly released models, aligned with their vision of multilingual Natural Language Understanding (NLU). The researchers also provide examples of multilingual NLU modelling with strong baseline results, motivating practitioners not only to reproduce those results but also to push the state of the art in intent classification and entity recognition (or slot filling).

The Amazon MASSIVE dataset is meant to enable models to learn shared representations of utterances with the same intents, regardless of language, facilitating cross-linguistic training on NLU tasks. It also allows for adaptation to other NLP tasks such as machine translation, multilingual paraphrasing, new linguistic analyses of imperative morphologies, and more.
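To give a feel for the data, here is a minimal sketch of how one of the per-language files can be inspected, assuming the JSONL release format published by Amazon; the file path and field names (utt, annot_utt, intent, partition) follow that published format but should be checked against the version you download.

```python
import json
from collections import Counter

# Hypothetical path to one per-locale JSONL file from the MASSIVE release
# (e.g. Swahili); adjust to wherever you downloaded the dataset.
DATA_FILE = "data/sw-KE.jsonl"

examples = []
with open(DATA_FILE, encoding="utf-8") as f:
    for line in f:
        examples.append(json.loads(line))

# Each record carries the raw utterance, a slot-annotated utterance,
# an intent label and a train/dev/test partition (field names assumed
# from the published MASSIVE format).
train = [ex for ex in examples if ex.get("partition") == "train"]
intent_counts = Counter(ex["intent"] for ex in train)

print(f"{len(train)} training utterances, {len(intent_counts)} intents")
print("Most common intents:", intent_counts.most_common(5))
print("Sample annotated utterance:", train[0]["annot_utt"])
```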

We at NeuralSpace took this dataset and trained our proprietary Language Understanding models with AutoNLP on the no-code NeuralSpace Platform.

Below are the results. Bear in mind that they were generated without requiring any knowledge of deep learning, recurrent neural networks, let alone Transformers. The language-specific datasets for each of the 51 languages are now readily available to NeuralSpace users and can be imported with a single click in the new ‘Import’ tab.

Results:

Since our focus at NeuralSpace has always been on locally spoken low-resource languages, we also calculated average results separately for Indian languages (Kannada, Telugu, Urdu, Malayalam, Tamil, Bengali and Hindi), Arabic, and Swahili. For the Indian languages, we achieve an average Intent Accuracy of 77% and Strict and Partial F1 Scores of 89% and 92%, respectively. In Arabic, our results are even better, with an Intent Accuracy of 85% and Strict and Partial F1 Scores of 92% and 95%, respectively. In Swahili, we also achieve strong results, with an Intent Accuracy of 87%, a Strict F1 Score of 94% and a Partial F1 Score of 96%. The results are summarized below.
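For readers unfamiliar with the metrics quoted above, the sketch below illustrates one common way to compute them. NeuralSpace's exact implementation is not shown in this post, so treat the definitions here as assumptions: Intent Accuracy is the fraction of utterances with the correct intent, Strict F1 requires an entity's span and label to match exactly, and Partial F1 also credits overlapping spans with the same label.

```python
def intent_accuracy(gold_intents, pred_intents):
    # Fraction of utterances whose predicted intent matches the gold intent.
    correct = sum(g == p for g, p in zip(gold_intents, pred_intents))
    return correct / len(gold_intents)

def entity_f1(gold_spans, pred_spans, strict=True):
    """gold_spans / pred_spans: one list per utterance of (start, end, label) tuples.

    strict=True  -> span boundaries and label must match exactly.
    strict=False -> any overlapping span with the same label counts (a
                    simplified notion of 'partial' matching, assumed here).
    """
    tp = 0
    n_pred = sum(len(p) for p in pred_spans)
    n_gold = sum(len(g) for g in gold_spans)
    for gold, pred in zip(gold_spans, pred_spans):
        matched = set()
        for p in pred:
            for i, g in enumerate(gold):
                if i in matched:
                    continue
                if strict:
                    hit = p == g
                else:
                    overlap = p[0] < g[1] and g[0] < p[1]
                    hit = overlap and p[2] == g[2]
                if hit:
                    tp += 1
                    matched.add(i)
                    break
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

With gold and predicted spans in that format, entity_f1(gold, pred, strict=False) yields the partial score and strict=True the strict one; the partial score is always at least as high, which matches the pattern in the numbers above.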

If you haven’t yet, sign up on the NeuralSpace Platform and try it out for yourself! Early sign-ups get $500 worth of credits, so what are you waiting for?

Join the NeuralSpace Slack Community to connect with us, receive updates, and discuss topics in NLP for low-resource languages with fellow developers and researchers.

Check out our Documentation to read more about the NeuralSpace Platform and its different Apps.

Happy NLP!
