Quality Metrics for NLU/Chatbot Training Data / Part 2: Embeddings

Florian Treml
Published in Analytics Vidhya
Sep 15, 2020


What are Embeddings? What are similarity, cohesion and separation?

UPDATE 2020/11/01: Botium’s free plan is live! With Botium Box Mini you will be able to:

  • use multiple chatbot technologies
  • set up test automation in a few minutes
  • enjoy a new improved user interface
  • get the benefits of a hosted, free service

Take it for a test drive

This article series provides an introduction to important quality metrics for your NLU engine and your chatbot training data. We will focus on practical usage of the introduced metrics, not on the mathematical and statistical background — I will add links to other articles for this purpose.

This is part 2 of the Quality Metrics for NLU/Chatbot Training Data series of articles.

For this article series, you should have an understanding of what NLU and NLP are, and be familiar with the vocabulary involved (intent, entity, utterance) and the concepts (intent resolution, intent confidence, user examples).

What are Embeddings?

Embeddings are a type of word or sentence representation that allows words or sentences with similar meaning to have a similar representation.

While this sounds complex, the concept is easy to understand with the help of this scatter chart and an example:

  • each colored dot represents a word or a sentence
  • the shorter the distance between two dots, the more similar the words or sentences are (in this case, semantically)
  • the higher the distance, the less similar they are
2D visualization of word embeddings

As an example:

  • “I’d like to order a drink”
  • “I want iced coffee”
  • “not interested”

The first two sentences will be rather close in the embedding space, while the third one will appear distant from both of them.
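"Close" and "distant" need a number to be useful. A common choice for comparing embeddings is cosine similarity; the article does not say which measure Botium uses, so treat that as an assumption. The sketch below uses hypothetical 3-dimensional toy vectors standing in for the three example sentences (a real encoder produces much longer vectors):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional toy vectors, NOT real encoder output.
drink = [0.9, 0.1, 0.0]    # "I'd like to order a drink"
coffee = [0.8, 0.2, 0.1]   # "I want iced coffee"
decline = [0.0, 0.1, 0.9]  # "not interested"

print(f"drink vs coffee:  {cosine_similarity(drink, coffee):.3f}")
print(f"drink vs decline: {cosine_similarity(drink, decline):.3f}")
```

With these toy vectors, the first pair scores close to 1 and the second close to 0, mirroring the intuition above.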

Mathematically speaking, an embedding is a vector in an n-dimensional space. The higher n, the more capacity the space has to encode complex concepts. Mapping natural language into an n-dimensional space in a way that preserves semantic similarity is not a trivial task. Fortunately, ready-to-use models are available for the most widely spoken languages, for example the Universal Sentence Encoder developed by Google, which maps text to 512-dimensional vectors.

An encoder is a neural network that takes an input (here, a word or sentence) and outputs a feature vector (more generally, a tensor): a point in n-dimensional space.

Reducing these n-dimensional vectors to a 2D representation that can be plotted on a flat scatter chart is done with a dimensionality reduction technique such as Principal Component Analysis (PCA).
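A minimal sketch of that reduction step, assuming NumPy is available: PCA computed via a singular value decomposition of the mean-centered data, keeping the two directions with the most variance. The input here is hypothetical random data with 512 dimensions (the Universal Sentence Encoder's output size), not real embeddings, and this is not Botium's actual code.

```python
import numpy as np


def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are the principal directions, sorted by explained variance (descending).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # shape: (number of vectors, 2)


# Hypothetical stand-ins for sentence embeddings: 10 random 512-dimensional vectors.
rng = np.random.default_rng(seed=0)
embeddings = rng.normal(size=(10, 512))

points = pca_2d(embeddings)
for x, y in points:
    print(f"({x:+.2f}, {y:+.2f})")
```

Each resulting (x, y) pair is one dot on the scatter chart. Some distance information is inevitably lost when going from 512 dimensions down to 2, so the chart is a visual aid; any similarity metrics should be computed on the full vectors.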

Using Embeddings for Training Data Analysis

When training an NLU engine for chatbots, you typically have labeled training data available: a list of intents, each with a couple of training phrases. Our tool of choice for demonstrating a sample data analysis workflow is Botium Box.

Botium first generates semantic embeddings of the training phrases using the Universal Sentence Encoder module and visualizes them on a 2D map. From the pairwise similarities between training phrases, it then computes how far apart the intents are from each other (separation) and the average similarity of phrases within each intent (cohesion). This approach helps identify training phrases that might confuse your chatbot because they are too close to each other in the embedding space.

Utterance Similarity

Training phrases in different intents that have a high similarity value can confuse the NLU engine and could cause user input to be routed to the wrong intent.
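The check described above can be sketched as: compare every phrase with every phrase of every other intent and report the pairs above a threshold. This is a hedged illustration, not Botium's implementation; it assumes cosine similarity, and the intent names, vectors, the function `confusing_pairs` and the threshold of 0.9 are all hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def confusing_pairs(
    intents: Dict[str, List[Tuple[str, Sequence[float]]]], threshold: float
) -> List[Tuple[float, str, str, str, str]]:
    """Cross-intent phrase pairs with similarity >= threshold, most similar first."""
    # Flatten into (intent, phrase, vector) so every phrase can be compared with every other one.
    flat = [(intent, phrase, vec) for intent, items in intents.items() for phrase, vec in items]
    results = []
    for (intent_a, phrase_a, vec_a), (intent_b, phrase_b, vec_b) in combinations(flat, 2):
        if intent_a == intent_b:
            continue  # only overlap between *different* intents is flagged here
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= threshold:
            results.append((sim, intent_a, phrase_a, intent_b, phrase_b))
    return sorted(results, reverse=True)


# Hypothetical intents and toy vectors, NOT real encoder output.
intents = {
    "order_drink": [("I'd like to order a drink", [0.9, 0.1, 0.0]),
                    ("I want iced coffee", [0.8, 0.2, 0.1])],
    "order_food": [("I'd like to order a sandwich", [0.85, 0.15, 0.05])],
    "decline": [("not interested", [0.0, 0.1, 0.9])],
}

for sim, intent_a, phrase_a, intent_b, phrase_b in confusing_pairs(intents, threshold=0.9):
    print(f"{sim:.3f}  [{intent_a}] {phrase_a!r}  <->  [{intent_b}] {phrase_b!r}")
```

Both drink phrases get flagged against the sandwich phrase, since "order a drink" and "order a sandwich" share most of their wording and meaning.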

Utterance similarity

Intent Separation

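For each pair of intents, the average distance between the training phrases of one intent and the training phrases of the other is shown. The higher the separation, the easier it should be for the NLU engine to tell the two intents apart.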
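A minimal sketch of intent separation, assuming distance is defined as 1 minus cosine similarity (the article only says "distance", so this definition is an assumption). The intents and vectors are hypothetical toy data.

```python
import math
from itertools import product
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def intent_separation(intent_a: List[Sequence[float]], intent_b: List[Sequence[float]]) -> float:
    """Average distance (assumed: 1 - cosine similarity) over all cross-intent phrase pairs."""
    distances = [1.0 - cosine_similarity(a, b) for a, b in product(intent_a, intent_b)]
    return sum(distances) / len(distances)


# Hypothetical toy vectors, one list per intent.
order_drink = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
order_food = [[0.85, 0.15, 0.05]]
decline = [[0.0, 0.1, 0.9]]

print(f"order_drink vs order_food: {intent_separation(order_drink, order_food):.3f}")
print(f"order_drink vs decline:    {intent_separation(order_drink, decline):.3f}")
```

Here the drink and food intents come out with a separation near zero, while the drink and decline intents are far apart.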

Intent separation

Intent Cohesion

Cohesion is the average similarity value between each pair of training phrases within the same intent, computed separately for each intent. The higher the cohesion value, the more consistent (and generally the better) the intent's training phrases.
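Following that definition, cohesion can be sketched as the mean pairwise similarity inside one intent. As before, cosine similarity and the toy vectors are assumptions for illustration; raising an error for intents with fewer than two phrases is a design choice of this sketch, since there are no pairs to average.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def intent_cohesion(phrases: List[Sequence[float]]) -> float:
    """Average cosine similarity over every pair of training phrases within one intent."""
    if len(phrases) < 2:
        raise ValueError("cohesion needs at least two training phrases")
    similarities = [cosine_similarity(a, b) for a, b in combinations(phrases, 2)]
    return sum(similarities) / len(similarities)


# Hypothetical toy vectors: one consistent intent, one with unrelated phrases mixed in.
consistent = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.85, 0.15, 0.05]]
mixed = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.1, 0.9, 0.1]]

print(f"consistent intent: {intent_cohesion(consistent):.3f}")
print(f"mixed intent:      {intent_cohesion(mixed):.3f}")
```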

Intent cohesion

Improve Chatbot Training Phrases

To improve the quality of the training phrases for your intents, consider the following approaches:

  • Find the phrases in different intents with high similarity in the Utterance Similarity table, and change or remove them
  • For intents with low cohesion, add more meaningful training phrases
  • For intent pairs with low separation, investigate the training phrases of both intents

Give Botium Box a test drive today: start with the free Community Edition. We are happy to hear from you if you find it useful!

Looking for contributors

Please take part in the Botium community to bring chatbots forward! By contributing, you help increase the quality of chatbots worldwide, which leads to greater end-user acceptance and, in turn, brings your own chatbot forward! Start here




Co-Founder and CTO Botium🤓 — Guitarist 🎸 — 3xFather 🐣