Deep Dive: NLP optimization for Chatbots

Shashikant Jha
5 min read · Aug 21, 2019


As mentioned in my previous article, any conversational solution has two components: NLP, which is the core or the brain, and the conversational model, which is how your bot interacts with users.

Today we’re going to see how we can optimize the NLP configuration to get better results. Before we go further, let’s look at a few important terms:

  1. Sweet Spot: — In ML terms, the sweet spot is the point in training where the model is at peak accuracy. We can draw a parallel with chatbots here: the sweet spot refers to the number of training phrases or utterances per intent at which the bot reaches maximum accuracy.
  2. Under-training: — When the training data supplied to the chatbot is insufficient, the result is under-training. It can lead the chatbot to give inaccurate results, in particular hitting the fallback more often.
  3. Over-training: — Over-training occurs when more data than required is supplied, or when there is a large imbalance between the training data supplied to different intents.

Now that we know the basic terms, let’s take a deep dive into NLP optimization.

  1. Number of variations for an intent — This varies from framework to framework and also depends on the type of bot being created.
    E.g. — If you are creating a bot on Dialogflow, you might want 20+ variations per intent if you’re using Hybrid Mode, or 50+ variations per intent if you’re using ML Only Mode.
    Note: These are just minimum numbers and should in no way be treated as optimal.
  2. Rule based bot or ML only — This is one of the common questions you’ll run into when building a bot. Whether it should follow a rule-based approach or depend solely on ML cannot be answered in a straightforward way. It’s advisable to start with a rule-based approach, giving your application time to gather some data, and then analyze that data before switching to ML mode. Keep in mind, however, that once you switch to ML-only mode you’ll need an extensive training data set to train your bot.
  3. Quality and quantity of training data — Training data plays a central role in NLP optimization. Let’s look at it in terms of both quantity and quality.
    — Quantity —
    The quantity required to train the bot depends on the number of scenarios your bot handles and how many of those scenarios take a conversational input. E.g. — If for a scenario the bot accepts only numbers, you don’t need an extensive data set to train it; for “Please tell me the number of pizzas you’d like to order”, the bot expects a numerical input, with or without a surrounding statement, so you can simply train the bot to accept all inputs and extract the numerical value via an entity or parameter.
    However, scenarios that require a conversational input need a large training data set. E.g. — If the bot asks “What can I do for you?” and there are three scenarios it can help with, say make tea, order pizza, and play music, then all three must have a sizeable data set to be trained on, as the user can ask for any of these options in any number of ways.
    — Quality —
    The quality of the training data is another extremely important factor in bot performance. Quality here means covering the different ways a user can ask a question: if there are three such ways, all three must appear in the data set. Another important point is the difference in the number of records per label (scenario); the gap between any two scenarios should not be greater than 30%. E.g. — For three scenarios A, B and C with 100, 72 and 80 records respectively, the data set has 252 records in total, of which A holds roughly 40%, B roughly 29% and C roughly 32%. The gap between the largest and the smallest scenario (100 vs. 72 records) is 28%, which stays under 30%, so this is a balanced, quality data set. A small sketch after this list shows how you might check these counts programmatically.
  4. Classification threshold — The classification threshold (also called the ML classification threshold or confidence score) is a very important parameter when building your bot. To summarize, 0.7 (or 70% confidence) is generally considered an ideal threshold, but this is not always true and varies by use case and implementation. A short sketch of how this threshold gates intent matching appears after this list.
  5. Annotations — Annotations play a critical role in training any chatbot. If you’re using a chatbot framework, chances are high that there is a cap on how many training phrases or user utterances you can supply to an intent, and that cap may not be enough for your training data. Say you have 5,000 different samples for an intent (training phrases for an intent) and you’re using Dialogflow: you’d never be able to add all 5,000 training phrases directly, as Dialogflow caps the training phrases for an individual intent at 2,000.
    So how do you overcome this limitation? That’s where annotations come in.
    E.g. — “I want a banana”. Consider this your training phrase, and suppose you have multiple training phrases like it with different fruits, such as mango, apple, etc. One way to train the bot is to put all of these training phrases into the intent as they are, which works but quickly eats into your limit of training phrases. Another way is to annotate, i.e. put a training phrase like “I want a fruit” in the intent, where fruit is an entity containing all the synonyms and fruits you want this particular intent to capture. This way you reduce the number of training phrases passed to an intent and still capture all the relevant variations.
    You can even experiment with a full annotation model, where you annotate every phrase/word in the training phrase to effectively produce millions of variations from just one question and one training phrase. A sketch of how one annotated phrase expands into many literal variations appears after this list.
  6. Use of suggestive queries is also recommended when you’re trying to optimize NLP. Suggestive queries are those where users don’t state the exact reason or express themselves clearly.
    E.g. — “I want to change my address on the account” is a straightforward query where the user expresses the intent clearly. If the user just says “I have relocated”, however, the query translates differently for different use cases; read alongside the former query, the latter gives you a sense that the user might want to change the address on their account.
    The latter query in this example is what is known as a suggestive statement/query, and some such examples are needed in the training data to capture similar situations.
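To make the quantity and quality checks from point 3 concrete, here is a minimal sketch in plain Python (no chatbot framework assumed) that flags intents with too few training phrases and measures the gap between the largest and smallest intent. The minimum of 20 phrases, the 30% cap and the intent names are illustrative figures taken from the discussion above, not part of any framework’s API.

```python
# A minimal sketch (plain Python, no chatbot framework assumed) of the two
# data checks discussed above: a minimum number of training phrases per
# intent and the ~30% cap on the gap between the largest and smallest intent.
# The intent names and counts are illustrative; they mirror the 100/72/80
# example from the Quality section.

MIN_PHRASES_PER_INTENT = 20   # e.g. the Dialogflow Hybrid Mode minimum mentioned above
MAX_GAP = 0.30                # allowed gap between the largest and smallest intent

# Number of training phrases currently supplied to each intent.
counts = {"make_tea": 100, "order_pizza": 72, "play_music": 80}

# Check 1: every intent needs at least the minimum number of phrases.
for intent, n in counts.items():
    if n < MIN_PHRASES_PER_INTENT:
        print(f"under-trained intent '{intent}': only {n} phrases")

# Check 2: the gap between the biggest and smallest intent should stay under ~30%.
largest, smallest = max(counts.values()), min(counts.values())
gap = (largest - smallest) / largest
print(f"gap between largest and smallest intent: {gap:.0%}")
if gap > MAX_GAP:
    print("training data is imbalanced; add phrases to the smaller intents")
else:
    print("intent sizes look balanced")
```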
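The classification threshold from point 4 can be pictured as a simple gate on the confidence scores your NLP engine returns. The sketch below uses a hypothetical dictionary of intent-to-score values; 0.7 is the commonly cited default discussed above and should be tuned per use case rather than treated as fixed.

```python
# A minimal sketch of how a classification threshold gates intent matching.
# `intent_scores` stands in for whatever confidence scores your NLP engine
# returns; 0.7 is the commonly cited default discussed above, not a fixed rule.

CONFIDENCE_THRESHOLD = 0.7

def route(intent_scores: dict) -> str:
    """Return the best-scoring intent, or fall back when confidence is too low."""
    intent, score = max(intent_scores.items(), key=lambda kv: kv[1])
    return intent if score >= CONFIDENCE_THRESHOLD else "fallback"

# A confident match goes through; an ambiguous one hits the fallback.
print(route({"order_pizza": 0.91, "make_tea": 0.05}))  # -> order_pizza
print(route({"order_pizza": 0.48, "make_tea": 0.41}))  # -> fallback
```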
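Finally, here is a rough illustration of why annotation (point 5) stretches your training-phrase budget: one annotated phrase plus an entity with its values stands in for many literal phrases. The `@name` placeholder syntax and the `expand()` helper are hypothetical and only for illustration; frameworks such as Dialogflow handle this expansion internally with their own annotation format.

```python
# A rough illustration of why annotation saves training phrases: one annotated
# phrase plus an entity with its values stands in for many literal phrases.
# The @name placeholder syntax and the expand() helper are hypothetical;
# frameworks like Dialogflow use their own annotation format internally.

from itertools import product

entities = {
    "quantity": ["a", "one", "two"],
    "fruit": ["banana", "mango", "apple", "orange"],
}

annotated_phrase = "I want @quantity @fruit"

def expand(phrase, entities):
    """Expand one annotated phrase into every literal variation it covers."""
    slots = [name for name in entities if f"@{name}" in phrase]
    variations = []
    for values in product(*(entities[name] for name in slots)):
        text = phrase
        for name, value in zip(slots, values):
            text = text.replace(f"@{name}", value)
        variations.append(text)
    return variations

phrases = expand(annotated_phrase, entities)
print(len(phrases))   # 12 literal variations from a single annotated phrase
print(phrases[0])     # "I want a banana"
```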

These are some commonly used practices for optimizing the NLP configuration of chatbots, but do note that this list is not exhaustive; there are plenty of other techniques in use today that haven’t been discussed here.
