Natural Language Understanding Benchmarks — part 2

Jose Marcos
Melior.AI
Sep 20, 2018

In the previous post, we shared benchmarks comparing our recently developed NLU technology against the RASA and Snips alternatives. Building conversational technology obviously takes more than a good NLU intent-detection system, but intent detection is such an integral part of it that we decided to benchmark ours against even more alternatives.

As before, we used a collection of three open corpora suited to evaluating conversational interfaces. For training and benchmarking, we replicated the methodology of Evaluating Natural Language Understanding Services for Conversational Question Answering Systems, and the results for the commercial NLU alternatives are taken from that same study.
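For context on how numbers like these are produced, here is a minimal sketch in Python of scoring an intent-detection benchmark: a held-out set of labelled utterances is run through the service under test, and precision, recall and F1 are computed over the predictions. The `predict_intent` function, the example utterances and the macro averaging are illustrative assumptions, not our actual pipeline; to reproduce the published figures, follow the setup in the study cited below.

```python
# Minimal sketch of scoring an intent-detection benchmark.
from sklearn.metrics import precision_recall_fscore_support

def predict_intent(utterance: str) -> str:
    """Hypothetical stand-in for the NLU service under test
    (e.g. RASA, Snips, LUIS, Watson). Replace with a real API call."""
    return "FindConnection" if "how do i get" in utterance.lower() else "DepartureTime"

# Hypothetical held-out test set: (utterance, gold intent) pairs.
test_set = [
    ("when does the next train leave?", "DepartureTime"),
    ("how do i get to the main station?", "FindConnection"),
]

gold = [intent for _, intent in test_set]
pred = [predict_intent(utt) for utt, _ in test_set]

# Macro-averaged precision, recall and F1 across intents.
# The averaging choice is an assumption here; match the original
# study's setup to reproduce its exact numbers.
precision, recall, f1, _ = precision_recall_fscore_support(
    gold, pred, average="macro", zero_division=0
)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```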

Here are the results for the three data-sets and all the different services¹:

We are delighted to see our NLU scoring above all competitors on every metric in two out of the three data-sets!

A more in-depth analysis is to come, but for now this quick performance overview for intent detection is very promising!

These results use our raw models, without any improvement techniques. We look forward to challenging ourselves and seeing how much we can boost the performance of these models with a whole range of machine-learning optimization tricks.

We also have more data-sets for intent-detection and for slot-filling. (Don’t know what slot-filling is? You should definitely check out ‘part 3’ of our NLU benchmarks series! Watch this space, it’s coming soon!)

[1] The LUIS, API.ai and Watson metrics were obtained in 2017 by Braun et al. and published in Evaluating Natural Language Understanding Services for Conversational Question Answering Systems, part of the SIGDIAL 2017 proceedings.
