Things to consider when you’re building your first voice/chatbot

GU Human Language Technology
Nov 9, 2021

Written by Jungyoon Koh, Victoria Jin, Ulie Xu, & Wai Ching Leung

So, you’ve decided to build your first voice/chatbot.

First off, congratulations! And welcome to a wonderful journey that we here at HLT have recently embarked on. In all honesty, we’re probably only a few steps ahead of you.

Given that we only recently began our own journey, we thought we’d offer some tips for those who are also just starting out. The points we raise might be super basic to people with more background or experience, but as full-time students with little to no prior knowledge of user research and conversation design, we found these things to be really important in keeping our project manageable. We wanted to help ease others like us into the process (and show that it’s a feasible side project for students to take on)!

When we set out to build our first voice experience this past summer, we spent a solid chunk of time just brainstorming, planning, and thinking the process out carefully before anything else. However, we still learned a considerable amount “on the job”! As we faced various challenges in both designing and building the Google Action, we realized that our experience could be documented and shared with other wayfarers into the world of conversation design (CxD).

And now here we are. Below is a short list of important considerations for anyone looking to build their first CxD project!

1. What’s your medium of choice: voice or text?

Speak or tap? Ears or eyes? Right off the bat, you should decide: voicebot or chatbot?

It’s important to choose early on where your bot will come to life. This choice has a lot of implications for what you’ll be able to build and how useful it will be, as each medium offers its own unique affordances. For example, if your bot is text-based, you can design dialogue with a higher word count per turn. This is especially useful for information-providing bots that may produce large amounts of output, since the user can quickly skim the bot’s text and glean the information they need. Text also lets you design a more complicated conversational flow, since you don’t have to rely entirely on intents: you can use buttons to give the user options for what the bot can do for them. In voice, there are no “buttons.” Even if users are given options verbally, they might still use different phrasing in their responses. To minimize this unpredictability, the conversational flow in voice is usually more constrained.

Because it’s relatively easy to confuse both the bot and the user in voice, turns of talk in a voicebot should be shorter and require simpler types of user input. Since the user could respond to the bot in any number of ways (which is why, friends, we spend hours thinking about the different ways of saying “set home route” when creating utterances), it’s best to design the conversational flow so that the user can give easily decipherable answers.
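To make the “different ways of saying the same thing” problem concrete, here is a minimal sketch of mapping several phrasings onto one intent. Everything in it is an assumption for illustration: the intent name `set_home_route`, the example phrases, and the exact-match strategy are ours, not how any particular platform works. Real tools generalize from training phrases instead of matching them literally.

```python
import re
from typing import Optional

# Hypothetical training phrases for a single intent. A real platform would
# learn from examples like these rather than require an exact match.
TRAINING_PHRASES = {
    "set_home_route": [
        "set home route",
        "set my home route",
        "save my home route",
        "change my home station",
        "make this my home route",
    ],
}


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation, and collapse repeated whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", utterance.lower())
    return " ".join(cleaned.split())


def match_intent(utterance: str) -> Optional[str]:
    """Return the intent whose phrase list contains the utterance, else None."""
    text = normalize(utterance)
    for intent, phrases in TRAINING_PHRASES.items():
        if text in phrases:
            return intent
    return None  # No match: the bot should fall back to a reprompt.


if __name__ == "__main__":
    for said in ["Set my home route!", "take me home please"]:
        print(repr(said), "->", match_intent(said))
```

The second utterance falls through to `None`, which is exactly the kind of gap that constrained prompts (and a longer phrase list) are meant to close.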

At the same time, you can be creative in thinking about what kinds of voice applications don’t yet exist (but would be convenient to have). Since voice is still a relatively new field, there is so much room for innovation. Remember, voice technology that once could only understand single words can now tell you a joke, play your favorite song, or turn off your lights!

It’s important to note that whichever medium you end up using, you have to be mindful of conversational style, although in different ways. For example, pace can be expressed through pause lengths in voice or the duration of “…” indicators of typing-in-progress in text. Tone can be expressed vocally or through the use of emojis and ~punctuation~ in text. 😁🥰😃

2. Does a similar product already exist? If so, how would you differentiate your product from the other one? If not, does your product have a solid use case?

To answer these questions, you have to conduct basic market research. Even if you’re only looking to dip your toes into conversation design, market and user research are integral parts of any CxD project. Give the people what they want!

The research you do early on in your design process is going to help flesh out your minimum viable product and make sure that your design is useful and usable. If you don’t think much about this part of the process, you’re going to end up with a design that is unclear about what problem it sets out to solve and how it’s different from existing designs. This may sound obvious, but it’s the same reason a research project needs good research questions before you start.

If a similar product doesn’t exist, this could be either really great news or not great news. Ideally, you’ve stumbled upon a pressing, unaddressed need of society, so your product will be the greatest thing since sliced bread! On the other hand, the lack of similar designs could indicate that there is not enough need for your product, at least not at the moment. It’s important to decide if your product has a solid use case, a raison d’être. Conducting market research is vital to ensuring that your product is worth creating. Think Ford, not Frankenstein.

3. What kind of task will your bot perform, and what kind of information or context does that task require?

This is especially important if you’re trying to actually build the bot. Even if the task is a simple one, if the bot needs to retrieve, remember, or otherwise manipulate information to accomplish it, the bot can be significantly more difficult to build. This was actually a problem that we ran into as we were building our Action. Our bot had to remember information for the user across multiple conversation sessions (so that the user doesn’t have to remind the bot of their home station each time they use it), and we ended up changing our development platform multiple times in search of one that made it easier for us to link information to user accounts. Another important consideration in the same vein is API availability. If you’re building an information retrieval bot (like we are!), one of the first things you should make sure of is that there is a database or API that you can use for it.
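The “remember the home station across sessions” requirement above can be sketched in a few lines. This is only an illustration under our own assumptions: the `UserStore` class, the JSON file, the user ID, and the station name are all hypothetical, and a real Action would rely on the platform’s own user storage or account linking, not a local file.

```python
import json
from pathlib import Path
from typing import Optional


class UserStore:
    """Hypothetical per-user storage backed by a JSON file on disk."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def _load(self) -> dict:
        # Start from an empty store if nothing has been saved yet.
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def get_home_station(self, user_id: str) -> Optional[str]:
        return self._load().get(user_id, {}).get("home_station")

    def set_home_station(self, user_id: str, station: str) -> None:
        data = self._load()
        data.setdefault(user_id, {})["home_station"] = station
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")


def greet(store: UserStore, user_id: str) -> str:
    """Ask for the home station only if we do not already know it."""
    station = store.get_home_station(user_id)
    if station is None:
        return "Which station is your home station?"
    return f"Welcome back! Checking trains from {station}."


if __name__ == "__main__":
    store = UserStore("users.json")  # hypothetical file name
    print(greet(store, "user-123"))
    store.set_home_station("user-123", "Rosslyn")  # example value only
    print(greet(UserStore("users.json"), "user-123"))  # a later "session"
```

The point of the sketch is the dependency it exposes: the bot needs a stable user ID before it can remember anything, which is why account linking mattered so much when we were choosing a platform.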

If you’re only trying to build a prototype, it’s still important to identify the specific conversational context in which the bot would be summoned to complete a task. For example, a common fantasy that people seem to have is using Alexa or Google Home to chime into conversations by fact-checking statements made by a previous speaker. This would be difficult to design even as a prototype because it requires that you anticipate the various ways in which the need for such a task could arise in a conversation and then accommodate those different kinds of conversational patterns. (Unless, that is, you design it so that the user calls out the wakeword and asks the device to look up a specific question on the internet, in which case you’re not really making a new product…)

These are just some of the important considerations for any person starting off in CxD. And don’t worry, you definitely don’t have to make a full-blown product your goal. Just a prototype can be good for now! We wish you the best for the rest of your journey. Bon voyage :)

Coming soon… what to consider when you’re trying to choose between Google and Alexa as a platform for your voice experience!
