COVID-19 — Are Your Virtual Assistant’s Answers Up-To-Date?

Easily make sure they are, with IBM Watson’s new COVID-19 Kit and FAQ extraction capabilities

Christophe Guittet

Published in IBM watsonx Assistant, Apr 18, 2020

Authors: Christophe Guittet, Anastas Stoyanovsky and J William Murdock

Let’s face it: keeping your chatbot’s answers accurate can take a lot of time and energy.

Take a COVID-19 chatbot for example. Events and circumstances are changing on a daily, and even hourly, basis. How do you maintain a chatbot that provides relevant and valuable information? Having information that updates at the same time as official sources would be a huge step forward.

You can help your customers face this crisis by letting your chatbot answer their questions on COVID-19 and its effects on education, businesses, the economy, and society in general. To keep these answers up-to-date though, you would have to pay close attention to any change in the numerous sources of information about COVID-19. Then you would still have to manually update the answers in your chatbot and make the new version available.

Thus, the question is: how can I extract answers from trusted sources and automatically update my Virtual Assistant when the content changes?

The answer is: you can use IBM Watson Assistant and Watson Discovery to do it.

In a couple of minutes, you can create a conversational AI assistant and connect it to a data source using the IBM Watson Assistant Search Skill. Have a look at this short video to see how.

Even better: you can now leverage the IBM Watson Discovery COVID-19 Kit with automatic FAQ extraction to start faster - in just a few clicks!

This kit contains curated information retrieved every day from trusted sources. To do so, Watson Discovery uses innovative FAQ extraction machine learning models developed by IBM Research labs. This kit is also designed to be augmented with your own data.

Choose the COVID-19 Kit with automatic FAQ extraction in Watson Discovery’s connectors list. The COVID-19 Kit is currently available for Watson Discovery instances deployed to the IBM Cloud US-South and US-East regions. For the moment, it supports English only.

If you don’t already have an Assistant, you can sign up here.

Our team has been working as part of a larger IBM effort to help face COVID-19 - and delivered this new product capability in an impressively short amount of time.

When crawling a web page, Watson Discovery will detect whether it contains FAQs. If it does, it will extract each Question-Answer pair as its own document.

Note for geeks: Watson Discovery indexes the question in the title field and the answer in the text field. This simplifies the Search Skill’s configuration and ensures that results display well within Watson Assistant’s webchat widget.
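To make that mapping concrete, here is a minimal sketch of what one extracted Question-Answer pair looks like once indexed. Only the title and text field names come from the description above; the FaqDocument class, the faq_pairs_to_documents helper and the sample strings are hypothetical and are not part of Watson Discovery.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class FaqDocument:
    """Hypothetical stand-in for one indexed FAQ document."""
    title: str  # the extracted question
    text: str   # the extracted answer


def faq_pairs_to_documents(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (question, answer) pairs into one document per pair."""
    return [
        asdict(FaqDocument(title=question.strip(), text=answer.strip()))
        for question, answer in pairs
    ]


# Placeholder content, for illustration only.
documents = faq_pairs_to_documents([(" Example question? ", "Example answer. ")])
print(documents)
```

Because each pair becomes its own document, the Search Skill can show the question as the result title and the answer as the result body without any extra mapping.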

How to enlarge your Assistant’s scope by adding hyper-local content

The Watson Discovery COVID-19 Kit is pre-configured with web crawl seeds from trusted sources such as the CDC, Harvard, and the United States Department of Labor. An expanded stopwords list and some query expansions are also included in this collection to improve search results.

FAQs extracted from these sources cover common questions related to COVID-19 itself, as well as its effects and some governmental responses. You can augment this collection with data relevant to your region.

To do so, when on your COVID-19 collection’s overview page, click on “Sync settings” and add URLs to the web crawler. These sources will be crawled using the FAQ extractor mentioned earlier.

Consider adding the COVID-19 homepages from:

your state or regional health department, for any local guidance and/or testing sites;
your state’s education department, for information on schools and online learning;
your state governor’s site, for press releases and executive orders;
your state government site, for any resources it makes available;
your state’s department of commerce or revenue, for information such as on business closures or financial support;
any major local news outlets’ websites;
and anything local to the audience you want to serve, whether your own documents or from other public web sources.

Review each data source with the following guidelines in mind:

Carefully configure any hops on the web crawls (click on the crawl settings icon after adding a seed URL). If the website has headers and footers that link to general pages such as “About Us”, configure 0 hops.
It is best to configure 0 hops for pages that are updated daily, such as news sites.
If it won’t pull in unrelated content, one hop can be useful for pages that link to resources on a similar topic but for different audiences (such as “for teachers” and “for students”).
Synonyms (query expansions) can significantly help and should be chosen based on the data sources you’ve curated. Start with acronyms, such as “CMU” and “Carnegie Mellon”. Choose synonyms judiciously, verifying that queries return the expected results.
Remember that the Search Skill returns only the top 3 results for a particular query. Consider whether adding multiple sources with overlapping content on the same topic will be useful.
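As an illustration of the synonyms guideline above, the sketch below builds a query expansion list for the “CMU” / “Carnegie Mellon” example. The JSON shape (an expansions array whose entries carry expanded_terms) and the lowercasing reflect our understanding of the Watson Discovery v1 expansions format; treat both as assumptions and check them against the current documentation before uploading anything. The build_expansions helper is hypothetical.

```python
import json
from typing import Dict, Iterable, List


def build_expansions(synonym_groups: Iterable[List[str]]) -> Dict[str, list]:
    """Build a bidirectional expansion entry for each group of equivalent terms.

    Groups with fewer than two terms are skipped because they expand nothing.
    Terms are lowercased on the assumption that expansions must be lowercase.
    """
    return {
        "expansions": [
            {"expanded_terms": [term.lower() for term in group]}
            for group in synonym_groups
            if len(group) >= 2
        ]
    }


payload = build_expansions([["CMU", "Carnegie Mellon"]])
print(json.dumps(payload, indent=2))
```

After adding expansions, run a few representative queries to verify that they still return the expected results.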

Some US states — such as Alabama — have created dedicated one-stop portals. In this case, you may consider only using that data source and configuring it with a larger number of hops.

To provide a world-class end-user experience, also note that:

You can change how often the content is updated by setting a different sync frequency. By default, the COVID-19 Kit sources are crawled daily.
If you delete some URLs (either the pre-populated ones or the ones you added), be aware that the content already indexed will still live in your collection. You will need to delete this content manually.

How to further improve search accuracy by customizing the Search Skill

By default, the Search Skill sends the user’s input as a natural language query to search within Watson Discovery’s whole collection of documents. The Search Skill will generally return 3 matching documents by decreasing order of confidence. This allows the end user to decide on the most relevant answer.
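To picture the default behavior, here is a rough sketch of the query parameters such a search amounts to. It is not the Search Skill’s actual implementation. The parameter names natural_language_query, count and filter are assumed to follow the Watson Discovery v1 query API; the build_search_params helper and the sample question are hypothetical, and the endpoint, credentials and version parameter are deliberately left out.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode


def build_search_params(
    user_input: str,
    count: int = 3,
    filter_expr: Optional[str] = None,
) -> Dict[str, Union[str, int]]:
    """Assemble query parameters for a natural language search.

    count defaults to 3 to mirror the three results the Search Skill shows.
    """
    params: Dict[str, Union[str, int]] = {
        "natural_language_query": user_input,
        "count": count,
    }
    if filter_expr:
        # Only add a filter when one is configured.
        params["filter"] = filter_expr
    return params


params = build_search_params("Is it safe to travel?")
print(urlencode(params))
```

The customizations described next come down to changing the query text and adding a filter.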

You can also configure your assistant to display a specific answer.

If, for a given question, you prefer to return one specific answer instead of displaying the top 3 documents, you can:

create a dialog node in your Watson Assistant’s Dialog Skill,
as a condition for your dialog node (“If assistant recognizes”), use the specific intent you want to catch the user’s question with. You can use Watson Assistant’s Content Catalogs to add COVID-19 pre-defined intents. You can also write your own, for topics of particular interest to your users.
in the drop-down below “Assistant responds” select the “Search Skill” response type
click on “Customize” and explicitly filter the search to point to a single document. See how to use the filter here (scroll down to the Search Skill paragraph). Fill the Query input box with the FAQ question you want to point to, and the Filter input box with “title:” + the FAQ question (see example below).
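The last step can be sketched as a small helper that produces the two values to type into the Query and Filter input boxes. Wrapping the question in double quotes (and escaping any quotes inside it) is our assumption about how the Discovery Query Language treats multi-word phrases; the single_answer_settings helper and the sample question are hypothetical.

```python
from typing import Dict


def single_answer_settings(faq_question: str) -> Dict[str, str]:
    """Return the Query and Filter values that pin a node to one FAQ document."""
    # Escape backslashes first, then double quotes, so the phrase stays intact.
    escaped = faq_question.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query": faq_question,
        "filter": f'title:"{escaped}"',
    }


settings = single_answer_settings("What is COVID-19?")
print(settings["query"])
print(settings["filter"])
```

If the source page rewords the question, this filter stops matching, which is the brittleness discussed next.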

This approach is precise and effective, but has a risk of returning no answer if the source document’s title (in our case, the question) changes. You can always consider these alternatives:

You can put a query in the Query field without adding a filter. You can write that query to match the document you know is the best fit for this intent. This is particularly useful if the document you want is in the collection but is not showing up in the result list. If you put enough distinctive terms from that document into the query for that intent, that document will generally be displayed as the top result. This approach is less brittle than the filtering approach above. Yet, even when you can get the document you want to the top of the list, you generally still get three search results, which is not as nice as getting just the one correct result.
You can put a query in the Query field (or leave it empty to use the user’s input) and add a filter that is less restrictive than matching the full title of the document. The Discovery Query Language has a powerful structure that you can use for filtering. For example, the filter title:risk,title:!incubation will accept any document whose title includes the word “risk” and does not include the word “incubation”. This approach keeps some of the precision of filtering on the exact title along with some of the robustness of the unfiltered approach. Note that you can use the filter with fields other than title. For example, you can filter on the source URL using metadata.source.url. You can even filter on a field enriched using Watson Discovery out-of-the-box or custom NLP enrichments.
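A looser filter like the one above can be assembled from lists of words to require and words to exclude. This sketch only uses the operators already shown in the example (“:” for includes, “:!” for does not include, “,” to combine clauses); the terms_filter helper is hypothetical and handles single words only, not phrases.

```python
from typing import Iterable


def terms_filter(
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
    field: str = "title",
) -> str:
    """Build a filter requiring some words in a field and rejecting others."""
    clauses = [f"{field}:{term}" for term in include]
    clauses += [f"{field}:!{term}" for term in exclude]
    # A comma between clauses means every clause must hold.
    return ",".join(clauses)


filter_expr = terms_filter(include=["risk"], exclude=["incubation"])
print(filter_expr)
```

Passing a different field name, such as metadata.source.url, applies the same pattern to another field.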

Additional quick tips to reach the optimal experience:

If you find the answer snippet displayed for each result too long, you can return only the titles (in our case, the FAQ questions) of the top 3 documents. Just leave the Body parameter blank when configuring the Search Skill in Watson Assistant.
Alternatively, if you prefer to let the user expand each answer snippet via a “Show more” toggle (instead of displaying a link icon that redirects them to the source document), keep the Body parameter but leave the URL parameter blank.

Display options (from left to right): with Title, Body and URL / with Title and URL / with Title and Body.

Conclusion

As you can see, it is quick and easy to set up an always-up-to-date virtual assistant using the Watson Assistant Search Skill and the Watson Discovery out-of-the-box COVID-19 Kit.

IBM Watson products’ full range of AI capabilities also gives you the possibility to explore more advanced implementations. Don’t hesitate to give it a try!

If you have any suggestions for improvement, please share them with us on our product ideas portal, mentioning “Search Skill” in the title.

You can also contact me directly at christophe.guittet[at]ibm[dot]com

Many thanks to Anastas Stoyanovsky and J William Murdock for their contribution to this blog post!

About me: I’m an AI Product Manager at IBM, passionate about language and technology, working towards making Conversational Search a reality. I moved to Pittsburgh, PA to join the IBM Watson teams after having delivered many virtual assistant projects in Europe. In my free time, I read science-fiction novels and play the drums.

Anastas is a senior software engineer at IBM and a software architect for IBM Watson Discovery, with a focus on information retrieval and artificial intelligence. He uses his background in mathematics, artificial intelligence, and software architecture to develop and bring technological advancements to market, often in collaboration with IBM Research.

Bill is an IBM Principal Research Staff Member. He has been working on IBM Watson since its inception in 2007 in the IBM Jeopardy! Challenge, and he was the guest editor of This is Watson, the 2012 special journal issue explaining IBM Watson for Jeopardy!. Dr. Murdock currently works on making information finding more effective in IBM Watson Discovery.