Generate Chatbot training data with QBox — powered by Microsoft Turing NLG

Benoit Alvarez
Jun 9

Primary chatbot challenges

One of the primary challenges when building any kind of chatbot is producing or obtaining high-quality, diverse training data. The training data that you use across your model’s intents will determine how readily your model picks up on a real user’s true intent when exposed to queries it’s never seen before. So no matter what chatbot framework you’re using (e.g. Microsoft LUIS or IBM Watson), having high-quality training data is a must. And it’s not just at the start of chatbot building that this makes a difference; for fine-tuning your model or re-evaluating your intent composition in more mature or larger chatbots, new and varied training data is always useful.

But coming up with strong training data is a common bottleneck for chatbot builders: it requires time and an understanding of how chatbots work. And sometimes chatbot builders get too close to the subject matter and find it difficult to take a step back and think of more diverse training data.

This is where QBox’s Suggest feature comes in. All a user has to do is ask QBox to suggest new utterances for whichever intent(s) they want more data for. From these suggestions, the user can choose which, if any, to add to their model.

Turing and QBox secret sauce

Behind the scenes, QBox scans your intents’ training data and then calls on Microsoft’s Turing transformer model. This 17-billion-parameter model has powerful natural-language generation capabilities and can produce novel utterances to use as training data. Within QBox, any novel utterance suggested for an intent preserves the integrity of that intent by staying within its semantic theme. QBox doesn’t just harness the power of the Microsoft Turing model for these natural-language generations; it also runs additional NLP processes to parse Turing’s output and present the user with only the most stable and useful utterance suggestions.
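The post doesn’t say how QBox decides that a suggestion stays "within its semantic theme", so what follows is only a minimal sketch of one common way to filter generated candidates. It assumes a crude bag-of-words cosine similarity in place of a real sentence embedding, and the function `filter_suggestions` and its `min_sim`/`max_sim` thresholds are hypothetical, not part of QBox or Turing. A candidate is kept if it is similar enough to the intent’s existing utterances to be on-theme, but not so similar to anything already present that it adds nothing new.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts: a crude stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_suggestions(existing, candidates, min_sim=0.2, max_sim=0.9):
    """Hypothetical filter: keep on-theme candidates that are not near-duplicates.

    min_sim and max_sim are illustrative thresholds, not QBox settings.
    """
    existing_vecs = [bag_of_words(u) for u in existing]
    kept = []
    for cand in candidates:
        vec = bag_of_words(cand)
        # On-theme check: closest match among the intent's current utterances.
        best = max((cosine(vec, e) for e in existing_vecs), default=0.0)
        # Novelty check: compare with existing data and anything already accepted.
        pool = existing_vecs + [bag_of_words(k) for k in kept]
        dup = max((cosine(vec, p) for p in pool), default=0.0)
        if best >= min_sim and dup < max_sim:
            kept.append(cand)
    return kept


existing = ["I need a hotel near the conference venue", "where can I stay during the event"]
candidates = [
    "I need a hotel near the conference venue",  # exact duplicate: dropped
    "Is there a hotel close to the event",        # on-theme and new: kept
    "what is the weather tomorrow",               # off-theme: dropped
]
print(filter_suggestions(existing, candidates))  # ['Is there a hotel close to the event']
```

A production system would more likely compare dense embeddings than word counts, but the two-sided threshold (similar enough, yet not a copy) is the part that carries over.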

Real example

Let’s run through what this whole process looks like in QBox. Firstly, suppose you have an intent dealing with event accommodation and you’ve already come up with some training data/utterances for the intent. In the screenshot below, we can see on the left-hand side that QBox is not currently scoring this well: the primary QBox metrics of correctness, confidence, and clarity are all quite low.

The QBox metrics are simple but very useful for understanding your chatbot’s performance. Correctness is the rate at which utterances are classified into the correct intent. Confidence is how certain the model is that it has made the correct prediction. Clarity measures how distinct your intents are from one another; lower scores mean your intents are more likely to be confused by novel queries because they’re too close to one another.
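QBox’s exact formulas aren’t given here, so this is a hypothetical approximation of the three metrics, assuming each test utterance comes with its expected intent and a score per intent from the NLP provider. Correctness is taken as the fraction of top predictions that match, confidence as the mean top score, and clarity as the mean margin between the top score and the runner-up. The `Prediction` class, the `qbox_style_metrics` function and the `eventtickets` intent are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    expected: str              # intent the test utterance actually belongs to
    scores: dict[str, float]   # intent name -> score returned by the NLP provider


def qbox_style_metrics(preds):
    """Hypothetical approximations of correctness, confidence and clarity."""
    correct = confidence = clarity = 0.0
    for p in preds:
        ranked = sorted(p.scores.items(), key=lambda kv: kv[1], reverse=True)
        top_intent, top_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        correct += 1 if top_intent == p.expected else 0
        confidence += top_score
        # A small margin means two intents score alike, i.e. they are easily confused.
        clarity += top_score - runner_up
    n = len(preds)
    return {"correctness": correct / n, "confidence": confidence / n, "clarity": clarity / n}


preds = [
    Prediction("eventaccommodation", {"eventaccommodation": 0.8, "eventtickets": 0.1}),
    Prediction("eventaccommodation", {"eventaccommodation": 0.4, "eventtickets": 0.5}),
]
print(qbox_style_metrics(preds))
```

The second prediction shows all three metrics dropping at once: the wrong intent wins, with a low score and a narrow margin.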

From this general overview, we can use the Experiment feature for the eventaccommodation intent. This feature has two functions: Suggest and Quick Analysis. If we switch over to the Suggest tab at the top of the screen and request suggestions, here’s what the initial output might look like.

All the utterances you see below are suggestions generated by Microsoft Turing NLG:

But wait: what if we haphazardly add the Turing-generated, QBox-parsed suggestions to our model and confuse it even further? Don’t worry: QBox has built-in functionality that addresses this problem. Quick Analysis allows you to test changes within QBox before pushing them to your NLP provider. Instead of fully committing to an updated model from the get-go, we can select the suggestions we’re interested in and run them through QBox’s Quick Analysis. Having a selective human in the loop between when Turing generates suggestions and when said suggestions are incorporated into the model is also important for filtering out potential bias and maintaining responsible use of AI. Let’s select just two suggestions that we’re interested in, as shown below:

We can then perform Quick Analysis on the selected suggestions. This allows us to see what our scores (correctness, confidence, and clarity) would look like if we were to commit these suggestions to our model:
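Quick Analysis itself runs inside QBox against your real model; the sketch below only illustrates the underlying what-if idea, under loudly stated assumptions. A hypothetical toy classifier (pick the intent whose closest training utterance has the highest bag-of-words cosine similarity) stands in for the real NLP provider, and the hypothetical `quick_analysis` function scores a copy of the model with the selected suggestions added, leaving the original untouched. That way the before and after numbers can be compared before anything is committed.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(model, utterance):
    """Toy classifier: the intent whose most similar training utterance wins."""
    vec = bag_of_words(utterance)
    return max(model, key=lambda intent: max(cosine(vec, bag_of_words(u)) for u in model[intent]))


def correctness(model, test_set):
    """Fraction of (utterance, intent) test pairs classified correctly."""
    return sum(classify(model, u) == intent for u, intent in test_set) / len(test_set)


def quick_analysis(model, intent, selected, test_set):
    """Hypothetical what-if: score the model with and without the selected suggestions."""
    candidate = {k: list(v) for k, v in model.items()}  # copy, so the original is not modified
    candidate[intent].extend(selected)
    return correctness(model, test_set), correctness(candidate, test_set)


model = {
    "eventaccommodation": ["I need a hotel near the venue"],
    "eventtickets": ["how do I buy tickets for the event"],
}
test_set = [
    ("where can I stay during the event", "eventaccommodation"),
    ("book me a room for the conference", "eventaccommodation"),
]
selected = ["where can I stay overnight", "book a room near the conference"]
print(quick_analysis(model, "eventaccommodation", selected, test_set))  # (0.5, 1.0)
```

Keeping the evaluation on a copy mirrors the point of Quick Analysis: nothing reaches the live model until a human decides the new scores justify it. A real evaluation would of course use a much larger test set.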

You can repeat these steps and continue using the Suggest and Quick Analysis features in tandem as much as you like. If you feel that the suggestions are too homogeneous or you’d like more diversity in your training data, the "I’m feeling adventurous" checkbox in Suggest will prompt greater variance from Turing’s natural-language generations.
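The post doesn’t say how the "I’m feeling adventurous" checkbox is implemented. One standard way to get more varied output from a generative language model is to raise its sampling temperature, so this sketch assumes (without confirmation from the source) that the checkbox maps to a higher temperature, and shows how temperature flattens the next-token distribution. The token list, logits and temperature values are made-up illustrations, not Turing internals.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature > 1 flattens, < 1 sharpens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats: higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_next_token(tokens, logits, temperature=1.0, rng=random):
    """Draw one token according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-token candidates and logits, for illustration only.
tokens = ["hotel", "room", "apartment", "castle"]
logits = [3.0, 2.0, 1.0, -1.0]

for t in (0.7, 1.5):  # e.g. a "normal" vs an assumed "adventurous" setting
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))

print(sample_next_token(tokens, logits, temperature=1.5, rng=random.Random(42)))
```

At the higher temperature the most likely token loses probability mass to the rarer ones, which is exactly the trade-off an "adventurous" mode implies: more diverse suggestions, at the cost of more that will need filtering or rejecting.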

QBox and Microsoft Turing NLG in action

Check out the video of QBox and Microsoft Turing NLG in action


The Microsoft Turing NLG model allows for powerful, automated natural-language generation. Combined with our QBox technology, it works very well for chatbot building: the combination streamlines the automation process and provides meaningful metrics to measure its success.

This combination of automation and analysis provides great utility for users at all levels, from those at the early stages of chatbot building all the way to those working with the largest and most mature models, with ease of use remaining a fundamental priority.

QBox - Supercharge your chatbot’s intelligence

More intelligent chatbots need better NLP training data