Creating a Better Benchmark for NLU

John Ball
Pat Inc
Sep 6, 2019 · 10 min read


Aiming at the target is the best way to hit it. An NLU benchmark needs to aim at the right target: NLU in conversation. Search is NOT language.

An NLU benchmark should drive progress in conversational NLP, making it as accurate as mathematics on a computer.

My SuperGLUE benchmark article notes that the consortium's tasks neither ask questions in language nor generate answers in language. SuperGLUE is more a test of search than of natural language understanding (NLU), which could explain the limitations we observe in conversational AI built on technology that keeps improving on the GLUE benchmark.

I was immediately asked what a benchmark for natural language understanding should look like.

A benchmark for natural language processing (NLP), which comprises NLU and natural language generation (NLG), should test language, not knowledge. What's the difference?

Language allows communication to take place by leveraging shared information in context during conversation. Knowledge, detailed experience of particular topics, is important in discourse, but NLU tests should focus on language while introducing knowledge only as a means of extending context. Put another way, we can use language to talk with people about topics we know nothing about, and learn through the process.
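
To make that concrete, here is a minimal sketch in Python of what one conversational test item could look like under this view: the question is asked in language, the answer must be produced in language, and scoring targets meaning in context rather than retrieval. The class name, fields, example dialogue, and scorer below are hypothetical illustrations, not a proposed specification or an actual Pat Inc benchmark.

```python
from dataclasses import dataclass


@dataclass
class ConversationalTestItem:
    """One hypothetical benchmark item: a short dialogue plus a follow-up
    question whose answer depends on context shared earlier in the exchange."""
    dialogue: list[str]            # conversational turns that establish context
    question: str                  # asked in natural language
    acceptable_answers: list[str]  # answers in natural language, judged on meaning


item = ConversationalTestItem(
    dialogue=[
        "A: I left my umbrella on the train this morning.",
        "B: That's bad luck. It's meant to pour all afternoon.",
    ],
    question="Why is losing the umbrella a problem today?",
    acceptable_answers=[
        "Because it is going to rain heavily this afternoon.",
        "Heavy rain is forecast, so the speaker will get wet without it.",
    ],
)


def score(candidate: str, item: ConversationalTestItem) -> bool:
    """Placeholder scorer. A real conversational benchmark would judge whether
    the candidate expresses the same meaning in context (for example, via human
    raters), not whether it matches a string or retrieves a passage."""
    return candidate.strip().lower() in (a.lower() for a in item.acceptable_answers)


print(score("Because it is going to rain heavily this afternoon.", item))  # True
```

The point of the sketch is the shape of the task, not the toy scorer: no topic expertise is needed to answer, only the ability to track what the conversation has established.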

Getting NLU right enables knowledge to be entered into conversation naturally, but…
