Verba volant, scripta manent. Experimenting with Dialogflow Messenger and BigQuery

Published in Google Cloud - Community · 10 min read · Mar 30, 2022

I am under the impression that voice agents are having a much bigger moment than chatbots, much as bots cast a shadow over mobile apps a few years ago and, before then, mobile apps overshadowed web apps. Speaking of trends: if texts have eclipsed phone calls, voice messages have surpassed texts. In a terrific article published in the Guardian in December 2018, Chris Stokel-Walker identified voice messaging as the next big form of communication. During the COVID-19 pandemic, over 15 billion minutes of voice and video calls were made on WhatsApp every day.

Always consider the listener. At least in a real-time phone call, it’s possible for the other side to cut in on the chat about your latest tiff with Tiffany from HR. Chivvying along conversations isn’t possible if you’re giving a recorded monologue, so keep it brief and to the point. Or better yet, ask yourself whether it needs to be said at all. No one likes to sit through a five minute sub-Shakespearean soliloquy. (Chris Stokel-Walker)

In this post I would like to analyze the pros and cons of voice vs text in the context of Conversational Interfaces, and explain why text is often a better channel of communication with a virtual agent. We will then explore a simple chatbot that writes data into a BigQuery table, and look at how to embed it in a web page running on App Engine through a customizable chat dialog using Dialogflow Messenger.

Virtual or human, always consider the listener. They’re called instant messages for a reason :-)

Let’s start from the beginning: voice vs texts

Why did voice messages become so popular? I guess because they sit somewhere between phone calls and texts. They allow for quick communication like phone calls, but they’re less “intrusive”. Calling is seen by some as an intrusion: it forces people to talk to you when they might not be ready or prepared. Anecdotal evidence suggests that we have grown scared of picking up the phone. We’re in touch with each other constantly, but written communication allows us to participate in the conversation at the pace we choose. Voice messages allow us to communicate asynchronously (as opposed to phone calls), yet they are less impersonal and “aseptic” than text messages. Voice communication gives a really rich sense of emotion and a higher sense of connection: you can get the sense of things across better with the tone of your voice than by relying on emojis.

How many times did you get in trouble with text?

In a few words, voice messaging provides the best of both worlds. It is asynchronous like texting, yet more emotive and more spontaneous than text-based communication. If you send a voice message, you can still continue to use your phone or do other things (have you ever recorded a voice message while driving? ;-)). Having said that, receiving a voice message might be less convenient than sending one: very often (especially if we are in a room with other people) we cannot listen to the audio unless we use headphones. If a text pops up on your screen, you can quickly tell whether or not it’s urgent. If the only clue you have is a picture of an indistinct sound wave, you don’t know if that message needs your attention right away or can wait until you have a spare minute. Thinking about the disadvantages of voice vs text, we should not forget that people sitting, standing or walking near us are not interested in our chats. We should be mindful of the circumstances and use text-based communication when it’s more appropriate. Lastly, as mentioned in the caption of the image, always consider the listener and be respectful of their time. Voice messages tend to be more verbose than texts, simply because speaking takes less time than typing on a small keyboard.

Chatbots vs voicebots: the insoluble dilemma

Is it really an insoluble dilemma? I don’t think it is. Why? Because the answer often lies in the use case we need to implement. Don’t seek a use case for the technology you want to use; choose the right technology to address a real use case. I have seen overwhelming and chaotic mobile apps that should have been easy-to-navigate web apps, and robotic conversational interfaces in place of nice, visual mobile applications. I am afraid we are seeing a similar trend for voicebots and chatbots. The former are becoming more and more popular, and companies are investing a lot of money to migrate traditional Interactive Voice Response systems to Conversational Contact Centers.

Let’s dive into the strengths each of these conversational interfaces brings to the table.

When should you consider a voicebot?

With voicebots, you can communicate with the AI just by speaking to it. Voicebots suit people of all demographics and age groups: the ease of use means that even those who aren’t particularly tech-savvy can use the bot and still have a great experience. Unlike chatbots, voicebots don’t need messaging platforms, nor do they need to be integrated with the company website or mobile app. From a customer experience perspective, it is as easy as picking up the phone, dialing a number and speaking with a human agent.

Voice can be a strategy to delight customers. When people hear a voice, they instantly make assumptions about the speaker’s gender, age, social status, emotional state, and place of origin, as well as personality traits like warmth, confidence, intelligence, etc. People can’t help but do this with virtual assistants, too. When creating a “persona” for their virtual agents, companies can hire professional voice actors and choose the best voice by holding an audition. Why? Because a recorded voice is a lot more natural, expressive and human than a synthesized one. It can convey humor, sarcasm and trust, and ultimately turn a virtual assistant into a conversational partner.

Last but certainly not least, voice is an assistive technology for visually impaired or blind people. Broadly speaking, voice helps people in situations when their hands or eyes are occupied, or when they’re on the move. When we carry a shopping bag, drive or are busy cooking, we all experience a sort of impairment, and voice becomes the best communication channel.

When should you consider a chatbot?

Voice is a powerful approach, but it’s not the right choice for every use case. For example, it works well for the task of finding a restaurant’s business hours, but it feels clunky for browsing a dinner menu. Voice is the right approach when the interaction with the user is brief, with minimal back-and-forth dialog. Chatbots, on the other hand, are excellent for working through complex chat flows. They are ideal for non-linear user journeys where the chatbot displays many options that need not be remembered, and they simplify complicated use cases with user-friendly interfaces. For example, the results of a flight search could be displayed in a chat dialog as a list of items, and a retail chatbot could show the user the last black dresses remaining in stock through a carousel of images.

FAQ bots often provide users with phone numbers to call, email addresses and URLs, and the end user might have to navigate a web site to complete a task. In these circumstances they might need to look back at the messages sent by the chatbot, and a voicebot, due to its transient nature (as the title of this blog post suggests), simply cannot offer that. Avoid very long responses; if you can’t (for example, because FAQ bots provide a lot of information), then opt for a chatbot. Furthermore, chatbots support custom rich response types like buttons, accordion panels, suggestion chips, and audio-visual media like pictures, videos, GIFs and more. The combination of these features can make the experience richer and more engaging, whereas voicebots are confined to audio-based media.

To determine whether a voicebot is the right strategy for your use case, consider whether your users would feel comfortable talking or typing about certain topics. Spoken conversations are best in private spaces or familiar shared spaces. Written conversations are best for personal devices.

To summarize, there is no one-size-fits-all answer to this question. If you’re wondering which is better, chatbots or voicebots, it depends on what works best for your customers and on the kind of issues you are planning to solve. An excellent way to find the sweet spot for your business is to look for the overlap between your business’s goals and your users’ goals. Other considerations include the level of engagement, the volume of information to be transmitted, the types of media that can be used and whether the user journey is linear or nonlinear. My advice is to review the following statements to determine whether conversation design is the right strategy for your features in the first place. If you’re checking off most of them, it’s likely that dialog is a good fit. If conversation is the right fit, then think carefully about whether voice can add value to your use case.

Fantasia: an experiment with Dialogflow Messenger and BigQuery

Now open up this page, click the blue chat icon at the bottom right and give it a try!

Before diving into the technicalities, let me take a step back and give you some context. Some time ago Ivan and I needed to collect some data to train an ML model to predict helpdesk issue resolution times, instead of using an existing dataset available on GitHub. By the way, do yourself a favour and read this great article written by Ivan to learn about the whole idea.

I thought a good way to produce random data was through a chatbot that presented users with a set of questions and wrote the answers into a BigQuery table with the same schema as the Helpdesk Issues dataset. I also had to choose a channel to let people interact with it. The channel, meaning the platform that allows your users to interact with the bot, depends a lot on who they are and what they need. Try to answer questions like the following:

Who are my users?
What are their needs?
How are they completing these tasks today?
What words and phrases do they use to talk about these tasks?
What situations or circumstances trigger these tasks?

While it’s important to optimize for your most frequent users, don’t do so at the expense of other users’ experiences. A well-designed product is inclusive and universally accessible. Designing for different populations means leveraging inclusive design or universal design strategies. The channel you will end up choosing should be as inclusive as possible.

Back to my little chatbot: I decided to use Dialogflow Messenger and embed it in a simple page hosted on App Engine. Dialogflow CX supports a number of built-in text-based integrations, such as Facebook Messenger, which are configured from the Dialogflow Console. The Dialogflow Messenger integration provides a customizable chat dialog for your agent that can be embedded in your website. If you’ve played with “Fantagent”, you will have noticed that the chat dialog is implemented as a dialog window that appears in the lower right side of the screen.

You can customize various aspects of how the chat dialog appears and behaves. The df-messenger HTML element has a number of attributes, such as chat-title and the intent to trigger when the chat dialog is opened (like the greeting message displayed above).

The `<script>` and `<df-messenger>` HTML elements should be in the `<body>` element of your page
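To make the attributes mentioned above more concrete, here is a rough sketch of a helper that renders the embed markup as a string. It is not the code behind Fantagent: the function name renderMessengerEmbed and all the option values are hypothetical, and the bootstrap script URL and the attribute set (df-cx, location, agent-id, language-code, chat-title, intent) are written from memory of what the console generates for a CX agent, so copy the real snippet from your own console rather than trusting this one.

```javascript
// Escape a value so it is safe inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render the <script> and <df-messenger> markup for a Dialogflow CX agent.
// Assumption: attribute names and script URL match what the console generates.
function renderMessengerEmbed({ agentId, location, chatTitle, languageCode = "en", intent }) {
  const attrs = [
    ["df-cx", "true"],
    ["location", location],
    ["chat-title", chatTitle],
    ["agent-id", agentId],
    ["language-code", languageCode],
  ];
  // Optional: the event used to trigger the first intent when the dialog opens.
  if (intent) attrs.push(["intent", intent]);

  const attrText = attrs
    .map(([name, value]) => `  ${name}="${escapeAttr(value)}"`)
    .join("\n");

  return [
    '<script src="https://www.gstatic.com/dialogflow-console/fast/messenger-cx/bootstrap.js?v=1"></script>',
    `<df-messenger\n${attrText}\n></df-messenger>`,
  ].join("\n");
}
```

Calling renderMessengerEmbed({ agentId: "my-agent-id", location: "us-central1", chatTitle: "Fantagent", intent: "WELCOME" }) returns a string you could paste into the body of the page; every value in that call is a placeholder.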

If you’ve actually made it to the end of the dialog, you will have come across a widget that shows all the possible categories and lets the user select one.

The list response type is a card with multiple options users can select from.

This widget uses the List response type. When creating fulfillment, you can create Text Responses and Custom Payloads. Text responses are used for basic agent responses, and custom payloads are used for rich responses. Custom payloads can be returned from a webhook (as part of the webhook response) for dynamic responses, or they can be configured in the fulfillment section of a page when designing the agent through the console. See the code below for the custom payload. When the user clicks an option in the list, a custom event is triggered, which allows me to identify the user’s choice.

{
  "richContent": [
    [
      {
        "type": "list",
        "event": {
          "languageCode": "",
          "parameters": {},
          "name": "Authentication"
        },
        "subtitle": "If you can't login or you're having issues signing in",
        "title": "Authentication"
      },
      {
        "type": "divider"
      },
      {
        "type": "list",
        "title": "Billing",
        "event": {
          "name": "Billing",
          "parameters": {},
          "languageCode": ""
        },
        "subtitle": "If you have a billing enquiry"
      },
      {
        "type": "list",
        "title": "Performance",
        "event": {
          "parameters": {},
          "languageCode": "",
          "name": "Performance"
        },
        "subtitle": "If you've seen a degradation of performance"
      },
      {
        "type": "list",
        "subtitle": "For any technical issues",
        "title": "Technical",
        "event": {
          "languageCode": "",
          "name": "Technical",
          "parameters": {}
        }
      }
    ]
  ]
}
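If you would rather return this payload dynamically from a webhook than configure it in the console, a minimal sketch could look like the following. The helper names (buildCategoryPayload, buildWebhookResponse) and the CATEGORIES constant are hypothetical, and I am assuming the Dialogflow CX webhook response shape in which custom payloads travel inside fulfillment_response.messages. Unlike the hand-written payload above, this version puts a divider between every pair of entries.

```javascript
// Hypothetical category list mirroring the payload shown above.
const CATEGORIES = [
  { name: "Authentication", subtitle: "If you can't login or you're having issues signing in" },
  { name: "Billing", subtitle: "If you have a billing enquiry" },
  { name: "Performance", subtitle: "If you've seen a degradation of performance" },
  { name: "Technical", subtitle: "For any technical issues" },
];

// Turn the categories into a Dialogflow Messenger "list" custom payload.
function buildCategoryPayload(categories) {
  const items = [];
  categories.forEach((category, index) => {
    // Separate consecutive entries with a divider.
    if (index > 0) items.push({ type: "divider" });
    items.push({
      type: "list",
      title: category.name,
      subtitle: category.subtitle,
      // Event sent to the agent when the user clicks this entry.
      event: { name: category.name, languageCode: "", parameters: {} },
    });
  });
  return { richContent: [items] };
}

// Wrap the payload in a webhook response (assumed Dialogflow CX shape).
function buildWebhookResponse(categories) {
  return {
    fulfillment_response: {
      messages: [{ payload: buildCategoryPayload(categories) }],
    },
  };
}
```

Keeping the categories in one array means that adding a new one is a single-line change, and the event name always stays in sync with the title shown to the user.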

Once all the values have been collected, the agent makes a webhook call to a Cloud Function, which writes a record into a BigQuery table through the Google BigQuery Client Library for Node.js. I highly encourage you to check out this library: it ships with a number of samples, including the insertRowsAsStream() function that I used to write the data passed from the bot to the Cloud Function into a dedicated BigQuery table.
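The webhook step can be sketched roughly as follows. This is not the actual Cloud Function behind the bot: the parameter and column names (category, description, created_at), the dataset and table names, and the helper names are all hypothetical. I am assuming the Dialogflow CX webhook request carries the collected values in sessionInfo.parameters, and the insert function is injected so that the handler can be exercised without a BigQuery project.

```javascript
// Map the session parameters collected by the agent to a table row.
// Assumption: column and parameter names are placeholders for your schema.
function buildIssueRow(parameters, now = new Date()) {
  return {
    category: parameters.category || null,
    description: parameters.description || null,
    created_at: now.toISOString(),
  };
}

// Create an HTTP handler with Express-style req/res, as used by Cloud Functions.
// insertRows is injected: in production it would call the BigQuery client.
function makeWebhookHandler(insertRows) {
  return async function handleWebhook(req, res) {
    const sessionInfo = (req.body && req.body.sessionInfo) || {};
    const parameters = sessionInfo.parameters || {};
    try {
      await insertRows([buildIssueRow(parameters)]);
      res.status(200).json({
        fulfillment_response: {
          messages: [{ text: { text: ["Thanks, your answers were saved."] } }],
        },
      });
    } catch (err) {
      console.error("Insert failed:", err);
      res.status(500).json({ error: "Could not save the record" });
    }
  };
}

// Wiring with the real client (requires the @google-cloud/bigquery package;
// dataset and table names are placeholders):
// const { BigQuery } = require("@google-cloud/bigquery");
// const table = new BigQuery().dataset("my_dataset").table("my_table");
// exports.webhook = makeWebhookHandler((rows) => table.insert(rows));
```

Injecting insertRows keeps the row-building logic testable on its own, and the commented wiring at the bottom uses the same table.insert(rows) call that the insertRowsAsStream() sample relies on.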

Conclusions

Conversation design is a powerful approach, but it’s not right for every use case. After you have determined that a Conversational Interface is the right fit, think carefully about the channels of engagement. A good conversation design process starts by gathering the requirements. Gathering requirements for a conversational experience is not just about defining features and functionality, though that is the main outcome. Starting with clear, well-researched requirements is the best way to avoid the need for major changes after design and/or development is completed.

Thanks for reading to the end, and until my next post!