A Guide to Continuously Improving Your Watson Assistant

As users start to interact with your virtual assistant, you will see the need to add more intents and to retrain the assistant with more examples and content to make it better and more efficient.

You can easily spot patterns in the Improve panel of the Watson Assistant tooling, and a premium instance will also surface recommendations and topics that you might’ve missed.

You wouldn’t want to test changes to your intents in your production environment, as they might hurt your existing training model or conflict with other intents. It is always advisable to have a development workspace where you can try those changes before releasing them to the public.

I have created this Jupyter Notebook to extract all the logs in a given period from your Watson Assistant workspace and export them into an Excel sheet.


In the first section of the notebook you will find the extraction cell, which you can use as is or modify to filter the logs by date or other criteria.

You can do that by setting the fltr variable to, for example, response_timestamp>2018-05-11 to get all the logs after that date. The filter must also be passed as a parameter to the conversation.list_logs call.

Check out the Watson Assistant Documentation for different filtering options.
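
As a rough sketch of the filter described above: the helper name build_log_filter is hypothetical, and joining two conditions with a comma to mean "and" is my understanding of the filter syntax, so check it against the documentation before relying on it. The commented call at the end assumes conversation and workspace_id are already defined in the notebook.

```python
from datetime import date


def build_log_filter(after=None, before=None):
    """Build a response_timestamp filter string (hypothetical helper)."""
    parts = []
    if after is not None:
        parts.append(f"response_timestamp>{after.isoformat()}")
    if before is not None:
        parts.append(f"response_timestamp<{before.isoformat()}")
    # Assumption: a comma between conditions means "and" in the filter syntax.
    return ",".join(parts)


fltr = build_log_filter(after=date(2018, 5, 11))
print(fltr)

# Assumes `conversation` and `workspace_id` already exist in the notebook:
# response = conversation.list_logs(workspace_id=workspace_id, filter=fltr)
```

Keeping the filter in a single variable means the same string can be reused for every page of results you request.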


The next cell is where you can format and clean the extracted logs to your preference.

I chose six fields to extract here. They help me analyze and understand the user conversation better and give a full picture of how the virtual assistant behaves with the user.
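
To give an idea of what such a formatting step could look like, here is a minimal sketch that flattens one raw log record into a flat row. The fields picked here are only an illustration and not necessarily the six used in the notebook. The nested key names are my assumption about the log record layout, and sample_log is made-up data.

```python
def flatten_log(log):
    """Turn one raw log record into a flat row (field choice is illustrative)."""
    request = log.get("request", {})
    response = log.get("response", {})
    intents = response.get("intents") or []
    top = intents[0] if intents else {}  # highest-confidence intent, if any
    return {
        "conversation_id": response.get("context", {}).get("conversation_id"),
        "timestamp": log.get("response_timestamp"),
        "user_input": request.get("input", {}).get("text"),
        "intent": top.get("intent"),
        "confidence": top.get("confidence"),
        "response_text": " ".join(response.get("output", {}).get("text") or []),
    }


# Made-up record, shaped the way this sketch assumes the logs look.
sample_log = {
    "response_timestamp": "2018-05-12T09:30:00.000Z",
    "request": {"input": {"text": "hello there"}},
    "response": {
        "intents": [{"intent": "greeting", "confidence": 0.92}],
        "output": {"text": ["Hi!", "How can I help?"]},
        "context": {"conversation_id": "c1"},
    },
}

row = flatten_log(sample_log)
print(row)
```

A list of such rows can then be written to a spreadsheet with whatever tool you prefer.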

After exporting the logs to an Excel sheet, you might want to look into the following:

  • Intents identified with low confidence (maybe <60%).
  • Questions that Watson didn’t identify at all (maybe they were bad questions, or maybe it’s an area that you want to add to your workspace and train on).
  • Users hitting the same intent repeatedly. This might indicate that the user is not getting the response they need while Watson thinks it gave the correct one.
  • Off-topic questions. Being able to respond to them is an essential part of enhancing your users’ experience.
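
The first three checks above can be sketched in a few lines, assuming each log has already been reduced to a dict. The keys conversation_id, intent and confidence are hypothetical names, the rows are made-up data, and the repeat limit of 3 is an arbitrary choice. Off-topic questions still need a manual read.

```python
from collections import Counter

LOW_CONFIDENCE = 0.6  # the ~60% threshold suggested above; tune for your data


def flag_rows(rows, repeat_limit=3):
    """Split rows into review buckets (illustrative sketch)."""
    unrecognized = [r for r in rows if r["intent"] is None]
    low_confidence = [
        r for r in rows
        if r["intent"] is not None and r["confidence"] < LOW_CONFIDENCE
    ]
    # Count how often each intent is hit inside a single conversation.
    hits = Counter(
        (r["conversation_id"], r["intent"])
        for r in rows
        if r["intent"] is not None
    )
    repeated = sorted(key for key, n in hits.items() if n >= repeat_limit)
    return {
        "unrecognized": unrecognized,
        "low_confidence": low_confidence,
        "repeated": repeated,
    }


# Made-up rows for illustration only.
rows = [
    {"conversation_id": "c1", "intent": "reset_password", "confidence": 0.91},
    {"conversation_id": "c1", "intent": "reset_password", "confidence": 0.88},
    {"conversation_id": "c1", "intent": "reset_password", "confidence": 0.93},
    {"conversation_id": "c2", "intent": "opening_hours", "confidence": 0.41},
    {"conversation_id": "c2", "intent": None, "confidence": None},
]

flags = flag_rows(rows)
print(flags["repeated"])
```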

These are the obvious things to look for, but you can also take a random sample from the logs and analyze it for training issues or bugs, either in the dialog implementation itself or in any integration you have with back-end systems.

It’s also worth pointing out that every turn of a user’s conversation with your assistant carries the same conversation_id, which makes it easy to review a full conversation and spot any odd behavior.
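
Here is a small sketch of reassembling conversations that way. The row keys are hypothetical, the rows are made-up data, and sorting by the timestamp string assumes all timestamps share the same ISO 8601 format.

```python
from collections import defaultdict


def group_by_conversation(rows):
    """Collect rows per conversation_id, ordered by timestamp."""
    conversations = defaultdict(list)
    for row in rows:
        conversations[row["conversation_id"]].append(row)
    for turns in conversations.values():
        # ISO 8601 strings in one format sort correctly as plain text.
        turns.sort(key=lambda r: r["timestamp"])
    return dict(conversations)


# Made-up rows for illustration only.
rows = [
    {"conversation_id": "c1", "timestamp": "2018-05-12T09:31:00Z", "user_input": "still locked out"},
    {"conversation_id": "c2", "timestamp": "2018-05-12T10:00:00Z", "user_input": "opening hours?"},
    {"conversation_id": "c1", "timestamp": "2018-05-12T09:30:00Z", "user_input": "reset my password"},
]

for conversation_id, turns in group_by_conversation(rows).items():
    print(conversation_id, [t["user_input"] for t in turns])
```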


After you are done with your analysis, run your blind/test set against your updated workspace to ensure that you have trained your system well and that you haven’t negatively affected your assistant.
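
A blind-set run boils down to comparing the intent your workspace returns with the intent you expected. The sketch below assumes a classify callable that returns the top intent for an utterance. Here it is a stub backed by a dictionary with made-up data, and in practice it would wrap a call to your development workspace.

```python
def blind_set_accuracy(blind_set, classify):
    """Share of (utterance, expected_intent) pairs that classify gets right."""
    if not blind_set:
        raise ValueError("blind set is empty")
    correct = sum(1 for text, expected in blind_set if classify(text) == expected)
    return correct / len(blind_set)


# Made-up blind set and a stub classifier, for illustration only.
blind_set = [
    ("I forgot my password", "reset_password"),
    ("when are you open", "opening_hours"),
    ("hi", "greeting"),
    ("what time do you close", "opening_hours"),
]
stub_answers = {
    "I forgot my password": "reset_password",
    "when are you open": "opening_hours",
    "hi": "greeting",
    "what time do you close": "greeting",  # deliberate miss
}

accuracy = blind_set_accuracy(blind_set, stub_answers.get)
print(f"accuracy: {accuracy:.2f}")
```

Tracking this single number across iterations shows whether a round of training changes helped or hurt.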

You might need to generate a new random blind/test set every 2–3 iterations of intent and training enhancements to keep the results relevant.
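
One way to draw a fresh blind set is to shuffle your labeled utterances and hold a slice back from training. This is only a sketch: the function name, the 20% fraction and the example data are my assumptions, not values from the notebook.

```python
import random


def split_blind_set(examples, blind_fraction=0.2, seed=None):
    """Shuffle labeled examples and hold back a fraction as the blind set."""
    rng = random.Random(seed)  # pass a seed only if you need a repeatable split
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = max(1, round(len(shuffled) * blind_fraction))
    return shuffled[cut:], shuffled[:cut]  # (training, blind)


# Made-up labeled utterances for illustration only.
examples = [(f"utterance {i}", "some_intent") for i in range(10)]

training, blind = split_blind_set(examples)
print(len(training), len(blind))
```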

Make sure you don’t inspect this set after extracting it or try to fix issues based on its results, as it should always remain your measure of how your system will perform with the general public. This is another Jupyter Notebook for blind testing that might help you.

Now you are ready to deploy the changes to your production environment with the updated content and training fixes.