Building a Covid-19 Virtual Assistant with Watson Services on IBM Cloud
As life changed during the pandemic, many people grew worried about COVID-19 and the new normal. Call centers and hospitals received a flood of queries about the lockdown, virus symptoms, transportation, work, and more, which kept agents busy answering the same common questions. To address this, we built a COVID-19 assistant that handles users' daily queries using several technologies on IBM Cloud (Watson Assistant, Language Translator, Discovery, Voice Agent, Cloud Functions, and Node-RED). The assistant supports the three most common languages in the UAE: English, Arabic, and Urdu.
In this blog, masamh and I will walk through how we built the assistant and its features so you can start building your own.
You can check the demo here: https://watson-covid-assistant-uae.eu-gb.mybluemix.net/.
The text-based assistant uses Watson Discovery, Cloud Functions, and Language Translator (for Urdu, used in Node-RED; we cover this in the last section of the blog). We built a separate assistant for each of English and Arabic: Arabic has many dialects, so translation alone would not perform well enough, but both assistants follow the same dialog structure.
Text-Based ChatBot Flow
Users interact with the assistant through the web browser (1). The application calls Watson Assistant hosted on IBM Cloud (2). If the user asks about live statistics, Watson Assistant calls a Cloud Function to extract the latest information about COVID-19 cases (3), and the Cloud Function calls the COVID-19 cases API from Johns Hopkins CSSE (4). If a user's query is not covered by the Watson Assistant dialog, the assistant calls Watson Discovery, which is connected to the assistant through a search skill (5). In this case, Discovery scans the Ministry of Health FAQ documents and responds with the relevant information (6).
If the user communicates with the assistant in Urdu, the text is first translated to English with Language Translator so the assistant can understand the question (7). We send the translated input to Watson Assistant, and the response is sent back through Language Translator and translated to Urdu before it reaches the user (8). We will cover parts 7 and 8 in the Node-RED section at the end of the blog.
How the Assistant Works
The three main components of Watson Assistant are intents, entities, and dialog. Intents are sets of user examples used to train Watson Assistant; they represent the questions a user might ask. Entities are dictionaries of keywords and patterns, and the dialog is the flow of the conversation, where we build nodes and link them to our intents and entities to design and handle the conversation. Each assistant has its own intents and entities, and we trained the Arabic assistant on words common across different dialects so it understands questions from different regions.
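For illustration, an intent in a dialog skill export looks roughly like the sketch below; the intent name and examples here are ours, not taken from the actual skill:

```json
{
  "intent": "covid_symptoms",
  "description": "User asks about COVID-19 symptoms",
  "examples": [
    { "text": "What are the symptoms of COVID-19?" },
    { "text": "How do I know if I caught the virus?" }
  ]
}
```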
The images below show the dialog flow of each assistant. They share the same structure but differ in language, and every node represents an intent a user might ask about.
One of the key features of this assistant is that it saves the emirate (city) name in a context variable and keeps it throughout the conversation. Saving the emirate stops the assistant from repeatedly asking which emirate the user resides in. In the example below, the user chose Dubai. If they ask another question that requires an emirate before it can be answered, the assistant assumes they are still talking about Dubai based on the previous question, so it automatically gives the answer for Dubai.
If the user wants to ask about Abu Dhabi instead, they can simply specify it in their query, and the context variable is overwritten with the new value, as we can see in the above image. We store the value in $emirates and use this variable in the nodes to let the assistant know which emirate the user is asking about.
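In a dialog node's JSON editor, saving and reusing the emirate looks roughly like this (the response text is illustrative, not the assistant's actual wording):

```json
{
  "context": {
    "emirates": "@emirates"
  },
  "output": {
    "generic": [
      {
        "response_type": "text",
        "values": [
          { "text": "Here are the latest guidelines for $emirates." }
        ],
        "selection_policy": "sequential"
      }
    ]
  }
}
```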
Self-Screening with the Assistant
Users can run a self-screening if they are concerned that they may have caught the virus. We designed a conversation flow where the user answers the bot's questions about symptoms and recent activity to help them figure out whether they are at risk for COVID-19. The conversation lets the user exit the self-screening flow at any time to ask about something else; once the user asks about self-screening again, the assistant goes back and proceeds with the flow.
This is known as digression: you simply enable it in any node you want to let the user exit from at any time. Click Customize inside the node, and you will get this window to configure digression.
Disambiguation instructs your assistant to ask the user for help when more than one dialog node can respond to a user’s input. Instead of guessing which node to process, the assistant shares a list of the top node options with the user and asks them to pick the right one.
You can enable it from the Disambiguation tab under Options and configure it there. Here we chose five as the maximum number of suggestions to show when the assistant is not sure what the user is asking about. In this example, the input was just "grocery," so the assistant could not tell whether the user was asking about grocery guidelines or whether they can leave their house to go to the grocery shop, so it showed the most relevant choices and let the user decide.
Adding an Interactive Map to the Assistant
We can add HTML tags to customize the text or to add specific elements, and this is what we have done in several nodes in the assistant. We added an interactive Google map that shows the location of screening centers for a chosen emirate.
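For example, a node response can embed a map with plain HTML; the embed URL below is a placeholder, not the one we used:

```html
<p>Screening centers in Dubai:</p>
<iframe src="https://www.google.com/maps/embed?pb=YOUR_EMBED_CODE"
        width="300" height="200" style="border:0" loading="lazy">
</iframe>
```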
Making the Assistant More Human
We can add pauses between responses in a node to allow time for a request to complete, which helps to mimic the appearance of a human agent who might pause between responses. The pause duration can be up to 10 seconds.
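In the JSON editor, a pause is just another generic response. This sketch (response text is illustrative) pauses for two seconds with a typing indicator before answering:

```json
{
  "output": {
    "generic": [
      { "response_type": "pause", "time": 2000, "typing": true },
      {
        "response_type": "text",
        "values": [ { "text": "Let me check that for you..." } ],
        "selection_policy": "sequential"
      }
    ]
  }
}
```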
To get the latest numbers of COVID-19 cases by country, we use IBM Cloud Functions. We create a function that uses the Johns Hopkins CSSE API and connect it to Watson Assistant through a webhook. The Cloud Function provides a URL that you paste into the Webhooks tab under Options in Watson Assistant to connect the two.
The function calls the Johns Hopkins API. It takes a country as a parameter and returns a JSON response with information about that country; from this response we extract the figures we care about: total confirmed cases, deaths, and recoveries.
You can enable the webhook by clicking the Customize button on the node. We have the country as a parameter, so we add it as the key, and its value is detected using @sys-location, an entity that comes out of the box with the assistant and identifies any location name.
With Watson Discovery, we can feed the service documents that contain additional information and connect it to Watson Assistant. This helps answer queries that are not part of the dialog: Discovery searches the documents for the answer and returns the result through the search skill. As we can see in the example below, the assistant returned multiple responses (the number of results to return is configurable), and the answer to our question is in Q13.
You just create a Watson Discovery service and feed it an FAQ file formatted as question/answer pairs. Then click Configure data to start annotating the document with Smart Document Understanding (SDU). We annotate the questions with the question field and the answers with the answer or text field. Make sure to split the document on the question field.
Once the document is ready, you can go back to Watson Assistant and configure your search skill. Choose the question field for the title, the text or question field (depending on which one you chose when annotating) for the body, and the filename for the URL.
Web Chat Integration
To integrate the assistant into your website, you just add the following snippet to your page. You can get your integration ID and service instance ID from the assistant you created on IBM Cloud.
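The web chat embed snippet looks roughly like this (the IDs and region are placeholders, and `window.chatInstance` is a name we chose for illustration; we keep a reference to the instance in `onLoad` so it can be destroyed on a language switch):

```html
<script>
  window.watsonAssistantChatOptions = {
    integrationID: "YOUR_INTEGRATION_ID",          // from the web chat integration page
    region: "eu-gb",                               // region your service is hosted in
    serviceInstanceID: "YOUR_SERVICE_INSTANCE_ID",
    onLoad: function (instance) {
      window.chatInstance = instance;              // saved so we can destroy it later
      instance.render();
    }
  };
  setTimeout(function () {
    const t = document.createElement('script');
    t.src = "https://web-chat.global.assistant.watson.appdomain.cloud/versions/" +
      (window.watsonAssistantChatOptions.clientVersion || 'latest') +
      "/WatsonAssistantChatEntry.js";
    document.head.appendChild(t);
  });
</script>
```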
As you can see in this code, we save the instance because we want to use it later when switching the language between English and Arabic. Remember that we have two different Watson Assistant instances, so we want to load the right one when a user clicks the change-language button.
The code above is the function that changes the assistant. First, we close and destroy the current instance; otherwise we would end up with a stack of open assistant windows on top of each other whenever the language changes. On the language button click, we just change the ID, set it in the assistant chat options, and render the new assistant. Just like before, we store the instance so we can destroy it later when the language changes again and a new instance is created.
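The pattern can be sketched as follows; `makeLanguageSwitcher` and `loadWebChat` are hypothetical names we use for illustration, standing in for the actual web chat loader, not a documented API:

```javascript
// Minimal sketch of the language-switch logic. `loadWebChat` stands in for
// the loader the web chat embed script provides; all names are assumptions.
function makeLanguageSwitcher(loadWebChat) {
  let instance = null; // the currently rendered web chat instance, if any
  return function switchTo(integrationID) {
    if (instance) {
      instance.destroy(); // tear down the old window so chats don't stack
    }
    instance = loadWebChat({ integrationID }); // load the other assistant
    instance.render();
    return instance;
  };
}
```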
To provide a better experience for users, you can integrate Watson Assistant with Voice Agent. This service combines a set of Watson services with a public or private telephone network using the Session Initiation Protocol (SIP). IBM Voice Agent enables direct voice interactions over the telephone with an AI self-service agent, and it can transcribe a phone call between a caller and an agent so the conversation can be processed with analytics for real-time agent feedback.
Voice-Based Assistant Flow
The user places a call from their phone (1) to a call center number associated with a Twilio account (2). The Twilio number connects to a configured Voice Agent service through the SIP protocol (3). Voice Agent calls the Watson Speech to Text service to transcribe the user's input (4), and Watson Assistant responds based on that input (5). The response from Watson Assistant is converted with the Watson Text to Speech service and transmitted back through Voice Agent to the user, who hears the answer (6). The Cloud Functions and Discovery parts are the same as in the text-based architecture flow.
How Voice Agent Works
To create a Voice Agent, we need the following services:
- Watson Assistant
- Speech to Text
- Text to Speech
- Twilio (or other SIP Trunk providers supported with IBM voice agent)
We can’t reuse the dialog skills from the text-based version for the voice agent, since they contain images, maps, and a structure that won’t fit a voice channel. To solve this, we created two additional dialog skills customized for the voice agent, removing the Google Maps feature and all the images and adjusting the structure a bit to provide a smooth experience. All the dialog skills share the same content, though, and overall follow the same structure.
Twilio connects the voice agent with a telephone network using SIP, Watson Assistant handles the conversation, Watson STT (speech-to-text) transcribes what the caller is saying, and Watson TTS (text-to-speech) converts the assistant’s response to spoken voice output. Make sure to have all these services instantiated before creating a Voice Agent service.
Once you create your account on Twilio, go to the dashboard to get your trial number. You will use this number when you want to configure and test your voice agent.
Setting up Voice Agent on IBM Cloud is straightforward. Once you instantiate the service, go to the Manage tab and click Create Agent. Choose Voice as the agent type and give your agent a unique name. In the phone number field, add the (trial) phone number you have in Twilio.
Fill in the information in the other sections. Here we use IBM services, but Voice Agent is not limited to IBM; for example, you can configure STT and TTS using services provided by Google.
We still need to create a SIP Trunk in Twilio. Go to the Getting Started tab in the Voice Agent service in IBM Cloud, and copy the primary endpoint.
Go back to Twilio and open the All Products & Services menu from the dashboard. From there, choose Elastic SIP Trunking, select the Trunks tab, and go to Origination. In the Origination URI, add the primary endpoint you got from the Voice Agent service on IBM Cloud.
Go to the Numbers tab and add the trial number provided by Twilio (the one you have in the main dashboard). Everything is set and configured, and you can dial the number and test your voice agent.
Node-RED is a browser-based programming tool that lets you connect code blocks with specific tasks and functionality. It uses nodes and flows to quickly build applications that connect to APIs, hardware, IoT devices, or online services. A node is a predefined code block, and a flow is a connection of nodes, such as input, processing, and output nodes. Here, we run Node-RED on IBM Cloud, with the app deployed in Cloud Foundry. To learn how to get started with Node-RED, check this tutorial: https://developer.ibm.com/tutorials/how-to-create-a-node-red-starter-application/.
In our Node-RED app, we use Watson Language Translator together with the Watson Assistant discussed previously. It uses the English version of the assistant but translates the responses to Urdu. You can download this flow from the GitHub repo and import it into Node-RED.
In this flow we have:
- Light Blue Nodes: UI elements (send button, text input, and a clear chat button).
- Blue Nodes: Watson Assistant and Watson Translator language services. Click on them to customize your configurations and to set the API keys correspondingly.
- Yellow Nodes: used for switch conditions, or set and change some values for the flow.
- Green Nodes: Display output for debugging.
- Salmon Nodes: contain code that adds functionality (for example, modifying the JSON response we get from Watson Assistant for a query).
- Turquoise Nodes: HTML tags to add additional UI elements and to enhance existing UI elements.
The user first sends their query in Urdu. The assistant we use is in English, so the query first passes to the translator node, which translates it from Urdu to English so the assistant can understand it. The translated query goes to the assistant node, which returns the response in English. That response passes to the translator node again, which translates it back to Urdu and finally shows it to the user. This Node-RED app follows the architecture flow discussed in the first section, with the Language Translator service added.
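The round-trip can be sketched as a simple pipeline; `translate` and `askAssistant` below are stand-ins for the Watson Language Translator and Watson Assistant nodes (the real flow passes messages asynchronously between nodes, so this is a simplified synchronous sketch):

```javascript
// Simplified sketch of the Urdu round-trip in the Node-RED flow.
// All function names here are illustrative assumptions.
function handleUrduQuery(query, translate, askAssistant) {
  const english = translate(query, 'ur', 'en'); // Urdu -> English
  const answer = askAssistant(english);         // English assistant responds
  return translate(answer, 'en', 'ur');         // English -> Urdu for the user
}
```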
We integrated this Node-RED app with our main website. Since the app is deployed on IBM Cloud, we take its URL and add it to our code. Once the user clicks "Voice & Translator" on the main page of the website, they are directed to the Node-RED app to try the translator feature.
In this blog, we shared the journey of building a COVID-19 virtual assistant using multiple technologies to help people with their concerns about the current situation.
To summarize, we built the assistant in two languages, English and Arabic, and supported Urdu with translation. We used a webhook to connect the assistant to the Johns Hopkins API and a search skill to connect it to Discovery. To build the voice agent, we connected the assistant to Twilio and the STT and TTS services. Finally, we used Node-RED to build the Urdu chatbot with Watson Language Translator, connected to the English version of the assistant.
Sign up for your IBM Cloud Account and get started with building your assistant and exploring the cloud services.