A look at state-of-the-art natural language understanding

By Dominik
Published in Cognigy.AI · 4 min read · Dec 19, 2018
Photo by Franki Chamaki on Unsplash

In this article we want to take a closer look at intent classification. At Cognigy it’s our mission to enable all devices and applications to intelligently communicate with their users via naturally spoken or written dialogue. With Cognigy 3.2 we released Cognigy NLU 2.0, our all-new natural language understanding technology.

What is intent classification?

Intent classification is the art of understanding what the user wants.
A customer asks to speak to a human supervisor. He might say something like “I want to talk to the manager”. With Cognigy’s new handover feature we could readily accommodate such a request.

How do we identify this intent in a natural conversation?

When you design a conversation you create a list of intents, each with a number of example sentences that represent what a user might say to express that intent. Based on this information the AI learns to classify the intent of any possible input sentence.
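To make the idea concrete, here is a minimal sketch of learning intents from example sentences. It uses a plain naive Bayes word-count model, which is an assumption chosen for brevity and not how Cognigy NLU 2.0 works internally. The intent names and example sentences are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical intents and example sentences, invented for illustration.
TRAINING_EXAMPLES = {
    "talk_to_human": [
        "I want to talk to the manager",
        "Can I speak to a human please",
        "Let me talk to a real person",
    ],
    "find_connection": [
        "How do I get from the airport to the main station",
        "I need a connection from Munich to Berlin",
        "What is the best route from here to the city center",
    ],
}


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def train(examples):
    """Count words per intent; this is all a naive Bayes model needs."""
    word_counts = {intent: Counter() for intent in examples}
    for intent, sentences in examples.items():
        for sentence in sentences:
            word_counts[intent].update(tokenize(sentence))
    vocabulary = set().union(*word_counts.values())
    total_sentences = sum(len(s) for s in examples.values())
    priors = {i: len(s) / total_sentences for i, s in examples.items()}
    return word_counts, vocabulary, priors


def classify(sentence, model):
    """Return the intent with the highest log-probability for the sentence."""
    word_counts, vocabulary, priors = model
    scores = {}
    for intent, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(priors[intent])
        for token in tokenize(sentence):
            if token in vocabulary:  # words never seen in training are skipped
                # Laplace smoothing avoids zero probabilities.
                score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        scores[intent] = score
    return max(scores, key=scores.get)


model = train(TRAINING_EXAMPLES)
print(classify("Could I speak with the manager", model))
print(classify("Is there a train from Hamburg to Cologne", model))
```

Neither of the two input sentences appears in the training examples, yet the word overlap is enough to separate them. A production system has to cope with far less overlap than this toy case.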

How to evaluate intent classification?

A good AI system is able to map new input sentences correctly to an intent even if it has never encountered them before. By creating a test set of unfamiliar sentences it is possible to benchmark and evaluate different chatbot and conversational AI platforms.
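As a rough sketch of such an evaluation, the snippet below compares the labels a system predicted for a held-out test set against the correct (gold) labels and computes precision, recall and F1 per intent. The function name and the tiny label lists are assumptions made up for illustration. They are not taken from the paper or from our evaluation scripts.

```python
from collections import Counter


def f1_per_intent(gold, predicted):
    """Return {intent: (precision, recall, f1)} for two parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_label, predicted_label in zip(gold, predicted):
        if gold_label == predicted_label:
            tp[gold_label] += 1
        else:
            fp[predicted_label] += 1  # predicted this intent, but it was wrong
            fn[gold_label] += 1       # missed the intent that was correct

    scores = {}
    for intent in sorted(set(gold) | set(predicted)):
        predicted_total = tp[intent] + fp[intent]
        gold_total = tp[intent] + fn[intent]
        precision = tp[intent] / predicted_total if predicted_total else 0.0
        recall = tp[intent] / gold_total if gold_total else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        scores[intent] = (precision, recall, f1)
    return scores


# Hypothetical labels for five unseen test sentences.
gold = ["find_connection", "find_connection",
        "departure_time", "departure_time", "departure_time"]
predicted = ["find_connection", "departure_time",
             "departure_time", "departure_time", "find_connection"]

scores = f1_per_intent(gold, predicted)
for intent, (precision, recall, f1) in scores.items():
    print(f"{intent}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```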

In 2017, computer scientists from the Technical University of Munich published the paper “Evaluating Natural Language Understanding Services for Conversational Question Answering Systems”, which has become the most authoritative benchmark in the field.

What do we evaluate?

The paper presents three data sets, or corpora: Chatbot, Ask Ubuntu and Web Applications. The data is sourced from real-life users and has been manually labeled by the researchers. Each data set is more complicated and harder for the machine to classify than the one before:

  1. Chatbot
    This set is based on data from a mobile transportation chatbot. It has two intents:
    - users ask for a transport connection from A to B
    - users ask for the arrival or departure time of the next train or bus
    With 100 sentences, the set of training examples is relatively large and highly repetitive.
  2. Ask Ubuntu
    The data set has been scraped from Ask Ubuntu, the Stack Exchange Q&A forum for Ubuntu users. For example, one intent revolves around setting up a printer. With 53 example sentences it has about half as many as Chatbot, spread over a larger number of intents (5).
  3. Web Applications
    Like Ask Ubuntu, this data set is based on a Stack Exchange Q&A forum, in this case Web Applications. It is designed to be the most challenging, with only 30 example sentences and a total of 8 intents.
Overview of Chatbot, Ask Ubuntu, Web Application data sets

With Cognigy NLU 2.0 we’re pleased to achieve the first near-human-level performance on the most challenging Web Applications data set as well, with a noticeable lead over competing platforms.

F1-Score on Web Applications

Overall, Cognigy NLU 2.0 achieves state-of-the-art performance:

F1-Score by data set and overall after micro-averaging
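For readers unfamiliar with micro-averaging, the sketch below shows the general idea: instead of averaging the F1-scores of the individual data sets, the true positives, false positives and false negatives are summed across all data sets first, and a single F1-score is computed from the totals. The corpus names and counts are invented placeholders and not our benchmark results.

```python
def f1_from_counts(tp, fp, fn):
    """Compute F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    return 2 * precision * recall / denominator if denominator else 0.0


def micro_average_f1(counts_by_corpus):
    """Pool the counts of all corpora, then compute one overall F1."""
    tp = sum(c["tp"] for c in counts_by_corpus.values())
    fp = sum(c["fp"] for c in counts_by_corpus.values())
    fn = sum(c["fn"] for c in counts_by_corpus.values())
    return f1_from_counts(tp, fp, fn)


def macro_average_f1(counts_by_corpus):
    """For comparison: the unweighted mean of the per-corpus F1-scores."""
    f1_scores = [f1_from_counts(**c) for c in counts_by_corpus.values()]
    return sum(f1_scores) / len(f1_scores)


# Invented placeholder counts, NOT results from the benchmark.
counts_by_corpus = {
    "corpus_a": {"tp": 95, "fp": 5, "fn": 5},
    "corpus_b": {"tp": 90, "fp": 10, "fn": 10},
    "corpus_c": {"tp": 40, "fp": 10, "fn": 10},
}

micro_f1 = micro_average_f1(counts_by_corpus)
macro_f1 = macro_average_f1(counts_by_corpus)
print(f"micro-averaged F1: {micro_f1:.3f}")
print(f"macro-averaged F1: {macro_f1:.3f}")
```

With micro-averaging, larger data sets carry more weight in the overall score. In the placeholder numbers above, the small and difficult third corpus pulls the macro average down more than the micro average.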

So what?

Progress in intent classification continues to be rapid. At the time “Evaluating Natural Language Understanding Services” was published in 2017, Microsoft had a distinct advantage, besting competitors by more than 20%. This gap has all but closed. We expect the industry to match the new benchmark set here on the most challenging Web Applications data set in a similar time frame.

With state-of-the-art intent classification we have made an advanced machine learning problem a commodity. We are very far, however, from human-level understanding and intelligence. If any academics are listening: we definitely need more independent, realistic and challenging research benchmarks to move the field forward.

What’s next?

If you ask me what the next big thing is, the answer would have to be one- or zero-shot learning: the ability to display learned behavior based on a single piece of information, or none at all. What if we could simply describe the intent to the machine in one or two succinct sentences?

a user asks for a transport connection from A to B

a user asks for information on how to reset his password

The machine predicts such intents perfectly well when armed with 15–30 example sentences. An educated human, however, would fully understand the meaning behind a one-sentence description like the ones above and predict the associated intent with perfect accuracy. This qualitative difference is still huge and it’s staring us right in the face. An embarrassment to any AI practitioner. Well, we’re working on it! With Cognigy NLU 2.0 we took a baby step forward, and the gap is closing.

If you liked this post please subscribe to our blog and get in touch for a demo!

Evaluation scripts and Cognigy Flows are available on GitHub.
