Understanding words to understand language

SAP Conversational AI
Chatbots Developers
Apr 26, 2016

At Recast.AI, we use Natural Language Processing (NLP) to enrich input from users, and context is an important part of the process. When we handle a user’s request, we decide whether or not it matches an intent, which is the general meaning of a sentence. To do that, we need to understand what the user is saying by analyzing the context and the meaning of each word.

One day, I came across a sentence our software couldn’t categorize correctly. It was a simple sentence, only 7 words:

“The workers at the plant were overworked.”

Our program at the time couldn’t work out the meaning of plant: it failed over and over to read it as an industrial facility.

After a few searches, we found lots of papers on the problem, and from there we began to explore one of the oldest unresolved issues of NLP.

Word Sense Disambiguation

The task of Word Sense Disambiguation (WSD) consists of selecting the best sense for an occurrence of a word in a given context. In our case, we had to find the correct sense for the word plant.

This issue was first identified in the early days of Machine Translation (roughly the 1940s), and it is still unresolved today.

Researchers have tried various approaches, from rule-based to probabilistic systems, by way of fine-grained knowledge bases and automated knowledge extraction. None of these efforts has been enough to solve the problem.

To give you an idea, let me show you a list of algorithms and Machine Learning techniques used in WSD:

A sample list of WSD algorithms and Machine Learning techniques

Every solution has its pros and cons, but the state of the art in the Supervised Machine Learning field achieves more than 90% accuracy.

Our goal was to find the best compromise between simplicity and efficiency for our first implementation.

Flora, Stratagem or Building?

The underlying problem is that a single word can have very different meanings. Finding and selecting the right meaning for a given context is decisive for understanding natural language.

This is where WordNet comes in. WordNet is a lexical database for English. It groups words into synonym sets, provides short definitions and examples, and records the relationships between those groups.

The different meanings of plant in WordNet

WordNet allows us to retrieve synonyms, antonyms, and different forms of a word. For each of these, we get a definition.
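To make that structure concrete, here is a minimal sketch of a WordNet-like lookup in Python. The Sense class, the INVENTORY dictionary, the sense identifiers and the definitions are all hand-written for illustration. They are assumptions, not WordNet's actual data or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One meaning of a word: a synonym set plus a short definition."""
    sense_id: str
    synonyms: tuple
    definition: str
    antonyms: tuple = ()


# Hypothetical, hand-written entries; a real system would query WordNet.
INVENTORY = {
    "plant": [
        Sense("plant.building", ("plant", "works", "industrial plant"),
              "buildings where workers carry on industrial labor"),
        Sense("plant.flora", ("plant", "flora"),
              "a living organism that grows in soil and cannot move"),
        Sense("plant.stratagem", ("plant",),
              "something placed secretly to deceive someone"),
    ],
}


def senses_of(word):
    """Return every known sense of `word` (empty list if unknown)."""
    return INVENTORY.get(word.lower(), [])


def synonyms_of(word, sense_id):
    """Return the synonyms of one specific sense, excluding the word itself."""
    for sense in senses_of(word):
        if sense.sense_id == sense_id:
            return [s for s in sense.synonyms if s != word.lower()]
    return []


if __name__ == "__main__":
    for sense in senses_of("plant"):
        print(sense.sense_id, "->", sense.definition)
```

The point of the sketch is that synonyms only make sense once a sense has been chosen: synonyms_of("plant", "plant.building") and synonyms_of("plant", "plant.flora") return different lists.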

Once we have collected all the definitions of each word in the sentence, we use the Lesk algorithm to compare their similarity.

The Lesk algorithm (Michael E. Lesk, 1986) is based on the assumption that words in a sentence will share a common topic. The idea is simple: we take all the possible definitions of each word in a sentence and select the definitions that overlap the most.
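The overlap idea can be sketched in a few lines of Python. This is a simplified variant, not our production code: it disambiguates a single target word by counting the words its definitions share with the definitions of the other words in the sentence, rather than choosing senses for every word jointly. The GLOSSES table, the STOPWORDS set and the function names are made up for the example; real definitions would come from WordNet.

```python
import re

# Hypothetical definitions, written by hand for this example.
GLOSSES = {
    "plant": {
        "building": "buildings where workers carry on industrial labor",
        "flora": "a living organism that grows in soil and cannot move",
        "stratagem": "something placed secretly to deceive someone",
    },
    "workers": {
        "employee": "people employed to do industrial or manual labor",
        "insect": "a sterile member of an insect colony such as ants or bees",
    },
    "overworked": {
        "exhausted": "made to do too much labor",
    },
}

# Small, assumed stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "and", "as", "at", "do", "in", "of", "on", "or",
             "such", "that", "the", "to", "too", "were", "where"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(target, sentence):
    """Pick the sense of `target` whose definition overlaps the context most.

    Returns (best_sense, scores), where scores maps each sense to its overlap.
    """
    context = tokenize(sentence) - {target}
    # The context bag holds the context words and all of their definitions.
    context_bag = set(context)
    for word in context:
        for gloss in GLOSSES.get(word, {}).values():
            context_bag |= tokenize(gloss)
    scores = {sense: len(tokenize(gloss) & context_bag)
              for sense, gloss in GLOSSES[target].items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    print(disambiguate("plant", "The workers at the plant were overworked."))
```

With these toy definitions, the "building" sense wins for our sentence because its definition shares workers, industrial and labor with the context. The result depends entirely on how the definitions are worded, which is the main weakness of overlap-based approaches.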

We can represent this in a simple diagram:

A representation of the Lesk algorithm

What’s really interesting about this diagram is that it shows the similarities between the Lesk algorithm and the way our brain works.

By using a knowledge base and this crafted “brain”, we can now select the best sense for the word plant in the context of our sentence, allowing us to generate its synonyms, antonyms and so on.

Word Sense Disambiguation determines the sense of a word from its context. It allows us to perform automatic translation, and even the first stages of language generation.

Today, bots and AIs have proven they’re on the way to understanding language, but the next step — and most important one — is for them to be able to answer us by themselves!

Paul RENVOISE — Recast.AI

This post was originally published on our blog. If you enjoyed it, you might also like: UI and AI: the next interface is conversation
