Best Practices in Conversational AI

Arte Merritt
Aug 3, 2021


Conversational AI is the natural evolution of Human Computer Interaction

I am a strong believer in Conversational AI. If you recall all the videos of two-year-olds swiping on the iPhone or iPad, the same thing is happening with devices like Alexa — kids already know how to interact.

Building conversational interfaces can be challenging given the free-form nature of communication, and the unstructured data. Users can say, or send, whatever they like. It is difficult to know all the things a user may say, or how they may say them. It is quite a bit different from web or mobile where there are only so many links or buttons a user can click.

This guide takes into account the insights from processing 90 billion messages across thousands of chatbots and voice assistants, as well as the feedback and advice from leading experts interviewed in the industry.


Why chatbots or voice assistants?

Conversational interfaces are used across a wide variety of industries and use cases. While customer service is a common theme, there are plenty of additional use cases across enterprises in financial services, healthcare, insurance, travel, retail, government, education, HR, and more.

Cost savings are often a driver for chatbots and voice assistants, as enterprises hope to reduce the load on live agents. A chatbot or Interactive Voice Response (IVR) system can reduce the number of calls to live agents through deflection, or speed up resolution by routing callers more efficiently to the proper agent or by supplying the agent with the information needed to complete the call faster.

However, enterprises are implementing chatbots not only to reduce operational costs, but also to improve the overall customer experience. Chatbots and voice assistants provide consistent, 24/7 availability on the channels customers prefer, whether that is chat, voice, SMS, or social messaging platforms. They enable enterprises to provide quality responses and maintain compliance. For example, I interviewed a leader from a major tech firm whose client had difficulty preventing their agents from saying the “wrong things.” The chatbot could only say what it was programmed to say, which helped maintain compliance.

Conversational AI is an iterative process

Developing conversational AI interfaces is an iterative process. One needs to build and train the Natural Language Processing (NLP) model to understand the user’s Intent and translate it to an appropriate response; create an intuitive, natural conversation flow to navigate a user through their journey; integrate with existing back-end services to process the user’s request for content, information, or transactions; and test, monitor, and measure the effectiveness of the solution. The learnings from each cycle feed back into the next, in a continuous loop that improves the solution.

The best practices outlined are based on insights from having seen nearly 90 billion messages across thousands of chatbots and voice assistants, as well as interviews with leading experts in the space.

Choosing the initial use cases

Chatbots can be used for a wide variety of use cases. The most common interactions are informational (e.g. checking delivery status), data capture (e.g. booking an appointment), transactional (e.g. transferring funds), or proactive help (e.g. monitoring usage behavior and proactively offering help). The last one is particularly interesting as AI continues to advance.

When selecting initial use cases, it is best to start simple and iterate. High-volume, low-complexity tasks, such as checking order status or looking up store hours or location, are often the best place to start. It is important to monitor actual usage to understand how users are interacting and what they are asking for, which can then be used to iterate and expand to additional use cases.

A great way to identify initial use cases is to analyze historical data for common requests. If you have an existing live-agent channel, its logs can be processed to identify what users ask for most often. One way to do this is via Semantic Similarity clustering of the user requests, using tools like TensorFlow’s Universal Sentence Encoder or BERT, or a web service like PiRobot. Semantic Similarity groups messages together based on their semantic meaning, which can help identify common user requests, and thus initial use cases.

When getting started, it is important to consider which use cases require more complex integrations or data compliance. If you are just getting started, it is better to start more simply, and iterate. More complex interactions requiring data access add complexity around authentication, data handling, and data compliance.

Lastly, it is important to remember that what works for web pages may not work well for conversational interfaces. For example, reading long FAQ answers aloud through a voice assistant will not be a good user experience.

Building and Training NLP Models

One of the most challenging aspects of conversational AI is building the Natural Language Processing (NLP) / Natural Language Understanding (NLU) model. Given the unstructured nature of conversation, it is difficult to know all the things a user may ask, or how they may ask them.

The NLU model translates the user’s utterance into an underlying Intent as well as extracts any associated entities. For example, if a user asks, “what is the weather in San Francisco,” the Intent would be to check weather, and the entity, “location,” would be San Francisco.
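As a rough illustration of the shape of an NLU result, the sketch below returns an Intent name plus a dictionary of entities for the weather example. The `NluResult` class, the `parse` function, and the regular expression are all hypothetical stand-ins; a real NLU engine is a trained model, not a hand-written pattern.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NluResult:
    """One utterance's interpretation: an Intent plus any extracted entities."""
    intent: str
    entities: dict = field(default_factory=dict)


# Hypothetical toy rule standing in for a trained NLU model.
WEATHER_PATTERN = re.compile(
    r"\bweather in (?P<location>[\w\s]+?)\s*\??$", re.IGNORECASE
)


def parse(utterance: str) -> NluResult:
    """Map an utterance to an Intent and entities, or to the fallback Intent."""
    match = WEATHER_PATTERN.search(utterance.strip())
    if match:
        return NluResult("check_weather", {"location": match.group("location")})
    return NluResult("fallback")


result = parse("what is the weather in San Francisco")
print(result.intent, result.entities)
```

Keeping the result to one Intent and a small entity dictionary mirrors the advice to start simple: each extra entity is one more thing the user has to phrase correctly.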

Similar to selecting initial use cases, when it comes to building an NLU model, it is better to start simple and limit the complexity of the Intents and entities, and iterate based on actual usage. For example, there was an agency building voice skills who found their Intents with multiple entities were hardly triggered. They reduced the complexity by reducing the number of entities, and saw the usage of those paths increase.

Analyzing historical data for Intents

One of the best places to start is examining historical live-agent log data. The inbound user messages can be clustered based on Semantic Similarity to identify both Intents and potential training phrases.

Semantic Similarity groups similar messages together based on the meaning and not just the words themselves. For example, in the graphic below, both requests for refunds and wanting money back are grouped together based on the meaning, even though the words are different.

This cluster of similar messages could be the basis of a refund Intent, and the individual messages could be used as training phrases for the Intent. Tools like TensorFlow’s Universal Sentence Encoder or BERT can be used to do this programmatically, or web services like PiRobot could be used as well.
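A minimal sketch of the clustering step follows, under loud assumptions. The `toy_embed` function is a bag-of-words stand-in so that the example runs with no dependencies; it measures word overlap, not meaning. In practice you would pass a real sentence encoder (for example, Universal Sentence Encoder or a BERT-based model) as the `embed` argument. The greedy grouping and the 0.5 threshold are also illustrative choices, not a prescribed algorithm.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: a bag-of-words count vector (NOT semantic)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(messages, embed=toy_embed, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, [messages])
    for message in messages:
        vector = embed(message)
        for seed, members in clusters:
            if cosine(seed, vector) >= threshold:
                members.append(message)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((vector, [message]))
    return [members for _, members in clusters]


logs = [
    "I want a refund",
    "where is my order",
    "I want a refund please",
    "where is my order now",
]
for group in cluster(logs):
    print(len(group), group)
```

With the toy embedding, "I want my money back" would not land in the refund cluster, because the words differ. Closing that gap is exactly what a semantic encoder is for.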

Leveraging crowdsourced or pre-made models

If you do not have live-agent log data to work from, crowdsourced or pre-made models are alternatives to kick-start the NLU model development. There are a variety of options for crowdsourcing data, including Amazon Mechanical Turk or web services specializing in data. Many of the underlying platforms include pre-made models or blueprints, like Alexa Blueprints or Dialogflow Prebuilt Agents.

When crowdsourcing data for NLU models, it is important to be careful of potential biases that may occur. International language biases can occur when the contributors use different terminology, slang, or spelling due to their locale, even when using the same language. For example, “potato chips” in America versus “crisps” in England. Internal biases can also occur when the enterprise has their own internal terminology or uses industry terms that end-users may not even know. For example, a retailer referred to their product catalog internally as an “order book,” so if they asked crowd-sourcers to list ways they would ask to look up something in an “order book,” they may get very different responses than using “product catalog.”

While pre-built models can help start the process, it is important to customize them for your specific use case and business.

Creating a great user experience

Conversational interfaces are quite a bit different from web and mobile. Given the free-form nature of communication, it can be hard to know everything a user may ask or be able to respond effectively. A user’s flow through conversation can take a wide variety of paths. It is important to develop an intuitive experience that can help keep users on the “happy path” to satisfaction, while providing guide rails and effective help when they veer off.

Educate during onboarding

Onboarding is a great opportunity to educate users on what the chatbot or voice assistant can do, and how they can interact. For example, greet and welcome the user, set expectations, and provide examples of what the user can do. This can be an opportunity to show a menu or even make use of “quick reply” buttons for common use cases.

In the example above, the chatbot welcomes the user, lets the user know what it can do, and even provides suggestions on what to ask.

Leverage context and personalization

Context is key. Where is the user? What are they doing? What modality are they using? These are all important aspects to keep in mind. Is the user on the go, driving the car, or standing in the kitchen with their hands busy? Do they have a screen to view or touch, or audio only?

The modality of the interface can have a significant impact on conversation design. With voice interfaces, it is generally better to get to the point quickly, with shorter prompts and answers. With text interfaces, there is an opportunity to enable menus and buttons to more easily guide users.

User interactions tend to differ across modalities as well. In text-based chatbots, users may write in shorthand or use emoticons, which you would not see in voice utterances.

Personalization is also important. The information known about the user, or their interactions, can help resolve an issue faster and provide a better user experience.

With both context and personalization, flows can be streamlined, especially for returning visitors. For example, if one calls an airline after booking a flight, the IVR knows this information and can prompt the user if they are calling about that flight — thus helping move the conversation along faster.
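The airline example could be sketched roughly as follows. The `profile` dictionary and its `upcoming_flight` field are hypothetical; in practice this information would come from a booking or CRM system.

```python
def opening_prompt(profile):
    """Choose the first IVR prompt from what is already known about the caller."""
    flight = profile.get("upcoming_flight")  # hypothetical field from a booking system
    if flight:
        return (
            f"Are you calling about your flight to {flight['destination']} "
            f"on {flight['date']}?"
        )
    # Nothing known about the caller: fall back to an open prompt.
    return "How can I help you today?"


print(opening_prompt({"upcoming_flight": {"destination": "Boston", "date": "June 3"}}))
print(opening_prompt({}))
```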

Fail gracefully

Failing gracefully is one of the most important takeaways in conversational AI development.

Building conversational interfaces can be challenging. It is hard to cover all the things a user may say. Errors will happen. The important thing is to help get users back on the “happy path” as smoothly as possible.

A “fallback” occurs when the chatbot does not understand the user’s Intent. Instead of replying with “I don’t know what you’re asking,” let the user know what the chatbot can do. Consider providing menus, sample questions, or quick replies — similar to when on-boarding.
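One way to picture this is a response function that falls back to a capabilities menu whenever the NLU result is unknown or low-confidence. The threshold value, the `CAPABILITIES` list, and the reply format are assumptions for illustration only; each platform has its own way of expressing quick replies.

```python
FALLBACK_THRESHOLD = 0.6  # hypothetical cut-off; tune against real traffic

# Hypothetical list of things this chatbot can do, reused from onboarding.
CAPABILITIES = ["Check order status", "Find store hours", "Talk to an agent"]


def respond(intent, confidence, answers):
    """Return a reply dict; offer a capabilities menu when the bot is unsure."""
    if intent in answers and confidence >= FALLBACK_THRESHOLD:
        return {"text": answers[intent], "quick_replies": []}
    # Fallback: tell the user what the bot can do instead of "I don't know".
    return {
        "text": "I can help with the following. What would you like to do?",
        "quick_replies": CAPABILITIES,
    }


answers = {"order_status": "Your order ships today."}
print(respond("order_status", 0.9, answers))
print(respond("order_status", 0.3, answers))
```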

Another option is to incorporate a knowledge base into the fallback handling. For example, instead of sending an “I don’t know” message, one client ran the user’s query through their knowledge management system and responded with relevant articles that could potentially help the user. They added link tracking to see if the articles were clicked, and asked the user whether the articles were helpful.
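A minimal sketch of the knowledge-base fallback idea is below. It ranks articles by word overlap with the query, which is only a stand-in for a real knowledge management system's search; the article titles and URLs are made up for the example.

```python
import re


def tokens(text):
    """Lowercased word set for a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search_articles(query, articles, limit=3):
    """Rank articles by how many query words appear in their titles."""
    query_tokens = tokens(query)
    scored = [(len(query_tokens & tokens(a["title"])), a) for a in articles]
    # Keep only articles with at least one matching word, best match first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [article for _, article in ranked[:limit]]


# Hypothetical knowledge base entries.
articles = [
    {"title": "How to reset your password", "url": "/kb/1"},
    {"title": "Return and refund policy", "url": "/kb/2"},
]
print(search_articles("refund policy for shoes", articles))
```

If the search returns nothing, the chatbot can still fall back to the capabilities menu, so the user is never left with a dead end.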

Provide a means to escalate

While it is important to fail gracefully, it is also important to provide a path to escalation. The ability to escalate to a live-agent is one of the most common requests of end-users.

Chatbots and IVRs can enable more efficient routing to the proper agents, by passing on the context and any information already collected — thus enabling agents to better serve the user and handle the issues more quickly.
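The hand-off could be modeled as a small payload that travels with the conversation, as in the sketch below. The `Handoff` fields, the queue names, and the routing rule are hypothetical; a real contact-center platform defines its own transfer format.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    """Context passed to the live agent so the user does not repeat themselves."""
    user_id: str
    last_intent: str
    collected: dict = field(default_factory=dict)   # e.g. order number, account type
    transcript: list = field(default_factory=list)  # recent user messages


def route(handoff, queues, default_queue="general"):
    """Pick an agent queue from the last recognized Intent."""
    return queues.get(handoff.last_intent, default_queue)


def to_payload(handoff):
    """Serialize the hand-off for whatever agent desk receives it."""
    return asdict(handoff)


handoff = Handoff(
    user_id="u1",
    last_intent="billing_dispute",
    collected={"order_id": "A100"},
    transcript=["I was charged twice"],
)
print(route(handoff, {"billing_dispute": "billing"}), to_payload(handoff))
```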

If you do not have live agents, consider escalating to email support or a trouble-ticketing system.

Consider personality and branding

Chatbots and voice assistants provide an opportunity to reinforce your brand. The ones that incorporate some form of personality tend to see an increase in engagement.

There is a fine line though, and it is important not to go overboard. This is why some enterprises employ former screenwriters or comedians as conversation designers, as they better understand flows and timing.

If you do incorporate a personality, it is best to create a guide to keep track and maintain consistency.

Handle common messages

There are common messages that chatbots and voice assistants should be able to handle.

“Hi” and “hello” are the most common messages sent to chatbots. It is not completely surprising, as it is how we often start conversations between humans as well. However, over 30% of chatbots do not respond with any kind of welcome message back. For example, when a developer of a Facebook Messenger chatbot asked for feedback, the first thing I typed was “hi,” and the chatbot responded “I don’t know what you’re asking me.”

“Help” is another common message that is often not handled. Nearly 50% of chatbots do not respond with anything helpful. Conversational interfaces are still relatively new, and a meaningful response to “help” can go a long way. Consider providing a menu, a path to escalation, or guidance similar to onboarding.

“Stop” is another message that is often not handled, with nearly 60% of chatbots not actually stopping sending messages, or providing instructions for stopping. This is important in asynchronous environments, like SMS or Facebook Messenger, where chatbots can send messages proactively. For example, there was a sports chatbot that sent users score updates and they found when a user’s team was losing, the user would get upset, reply “stop,” and eventually block the chatbot. The company then had to pay to reacquire the user through advertising. They saw this in the data, and implemented a “pause” functionality which meant the difference between a lost user and a retained one.
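A simple way to cover these three messages is to check for them before the text ever reaches the NLU model. The sketch below assumes a per-user `user_state` dictionary and made-up reply wording; the "pause" and "resume" keywords are illustrative, and SMS channels typically have their own opt-out rules that still apply.

```python
GREETINGS = {"hi", "hello", "hey"}


def handle_common(message, user_state):
    """Handle greetings, help, and stop before the NLU model sees the message.

    Returns a reply string, or None when the message should go on to the NLU model.
    """
    text = message.strip().lower().rstrip("!.?")
    if text in GREETINGS:
        return "Hi! I can check order status or find store hours. What do you need?"
    if text == "help":
        return "Try 'where is my order', or say 'agent' to reach a person."
    if text in {"stop", "pause"}:
        user_state["paused"] = True  # proactive senders must check this flag
        return "Okay, I will not send more updates. Say 'resume' to turn them back on."
    if text == "resume":
        user_state["paused"] = False
        return "Updates are back on."
    return None


state = {}
print(handle_common("Hi!", state))
print(handle_common("STOP", state), state)
```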

Collect feedback

Chatbots provide a great opportunity to gather feedback, given their conversational nature. With chatbots, users not only say what they want, but also what they think about the response. This is an opportunity to gather additional insights that can be used to improve the experience. Consider proactively asking the user if they are satisfied with the response or interaction, e.g. “was that helpful, yes or no?” If you do ask, it is important to check the results and incorporate the learnings back into the chatbot.

Generate awareness

There are a variety of ways to generate awareness for a chatbot or voice assistant.

For web and mobile chatbots, one of the most common methods is to link to the chatbot across the entire site or app, typically with a chat icon in the lower right-hand corner.

Another option is to list the chatbot on the support or contact page. When listing it here, it can help to highlight the benefits of using the chatbot over other options, for example, that the chatbot is available 24/7 or has no wait times compared with connecting to a live agent.

For telephony contact centers, while a user is waiting, the IVR can ask the user if they would prefer to interact with the chatbot via mobile or SMS instead. Similar to above, it is helpful to highlight the benefits, such as no wait time.

For voice skills and entertainment chatbots, leveraging existing marketing channels and social media can be quite helpful. In the early days of Facebook Messenger chatbots and Alexa Skills, producing video campaigns with influencers was quite effective. This was partly due to the reach of the influencer, but also because the videos showed new users how to interact with these new voice skill and chatbot interfaces.

Integrating with backends

Depending on the use case, the chatbot, voice assistant, or IVR may need to interact with one or more back-end systems for authentication, user profile information, content, transactions, and more.

Important aspects to keep in mind include authentication, data compliance, and the location of the back-end systems. There are a variety of ways to handle authentication, including two-factor authentication (2FA), cookies, or account linking. Depending on the type of data accessed, compliance rules like GDPR, CCPA, HIPAA, or SOC 2 may be relevant. There are additional tools that can be used for PII redaction as well, like redact-pii. Where the data or back-end systems are located is also important: are they in the cloud, on-premises, or hybrid? Integrating with these systems securely while maintaining industry compliance is essential.
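To give a feel for what PII redaction involves, here is a small sketch that masks email addresses and phone-like numbers before a message is logged. It is not the API of redact-pii or any other library; the two regular expressions are simplified assumptions and will miss many real-world formats (and may over-match long numeric IDs).

```python
import re

# Simplified, illustrative patterns; production redaction needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(redact("Reach me at jane.doe@example.com or 415-555-0100."))
```

Redacting before storage, rather than at read time, keeps raw PII out of logs and analytics in the first place.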

Testing, monitoring, and measurement

Conversational AI development is an iterative process that relies on testing, monitoring, and measuring in a continuous feedback loop.

Testing prior to launch

Visually mapping out conversational flows can help in testing the experience before the chatbot or voice assistant is even built. Tools like Voiceflow can help with the design and planning.

For voice interfaces, a method that can help is to have one person act like the device and the other person the user, and speak aloud the queries and responses. This can help give an idea how well the responses flow, as well as whether they are to the point.

There are automated and crowdsourced tools for testing chatbots and voice assistants as well, like Bespoken and PulseLabs.

It is also important to test the NLU models. Analyzing the Intents and training phrases using Semantic Similarity clustering can help identify potential collisions and overlaps. For example, if you have very similar training phrases in more than one Intent, that could cause a collision. Services like PiRobot can help identify these issues.
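A collision check along these lines might compare every training phrase in one Intent against the phrases in every other Intent. The sketch below uses Jaccard word overlap as a dependency-free stand-in for Semantic Similarity, and the 0.6 threshold is an arbitrary assumption; a sentence-embedding similarity would catch collisions that share meaning but not words.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased word set for a training phrase."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of words two phrases have in common (stand-in for semantic similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_collisions(intents, threshold=0.6):
    """Return (intent_a, phrase_a, intent_b, phrase_b) for near-duplicate phrases."""
    collisions = []
    for (name_a, phrases_a), (name_b, phrases_b) in combinations(intents.items(), 2):
        for phrase_a in phrases_a:
            for phrase_b in phrases_b:
                if jaccard(tokens(phrase_a), tokens(phrase_b)) >= threshold:
                    collisions.append((name_a, phrase_a, name_b, phrase_b))
    return collisions


# Hypothetical Intents and training phrases.
intents = {
    "order_status": ["where is my order", "track my package"],
    "search": ["where is my order history", "find red shoes"],
}
print(find_collisions(intents))
```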

Monitor mishandled and unhandled Intents

Once the chatbot or voice assistant is live, it is important to monitor the mishandled and unhandled Intents to improve the overall response effectiveness.

Semantic Similarity clustering can be applied across actual user utterances and messages to see where the NLP may be breaking down. For example, in the image above, all the messages are effectively the same based on the semantic meaning: users are checking their order status. However, in some cases the messages are going to the correct Intent, “order status,” in other cases the messages are hitting the fallback / “I don’t know” Intent, and in other cases, the messages are mapped to an unrelated Intent, “search.”

These are opportunities to improve the NLU model by adding or moving training phrases to optimize the response effectiveness. For example, in the image above, “what’s the ETA on my order,” can be added as a training phrase to the “order status” Intent.

If the Semantic Similarity cluster analysis shows a whole cluster of user messages hitting the fallback, that cluster may be a candidate for a new Intent to add.
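The review of a single cluster could be sketched as below. It assumes the messages have already been grouped by Semantic Similarity and that each carries the Intent it was matched to in production. Treating the most common non-fallback Intent as the correct one is a heuristic assumed here for illustration; a person should confirm before moving training phrases.

```python
from collections import Counter

FALLBACK = "fallback"  # hypothetical name of the "I don't know" Intent


def review_cluster(records):
    """records: (message, matched_intent) pairs from one semantic cluster."""
    counts = Counter(intent for _, intent in records)
    handled = Counter({i: n for i, n in counts.items() if i != FALLBACK})
    if not handled:
        # Every message hit the fallback: candidate for a brand-new Intent.
        return {"action": "consider_new_intent", "messages": [m for m, _ in records]}
    majority = handled.most_common(1)[0][0]
    # Messages that went to the fallback or to another Intent are mishandled.
    to_move = [m for m, intent in records if intent != majority]
    return {"action": "add_training_phrases", "intent": majority, "messages": to_move}


cluster_records = [
    ("where is my order", "order_status"),
    ("order status please", "order_status"),
    ("what's the ETA on my order", "fallback"),
    ("has my order shipped", "search"),
]
print(review_cluster(cluster_records))
```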

Identify common themes in Intents and messages

Examining the top Intents and messages can help identify common themes — why users are interacting and what they are asking about. Even if the chatbot does not use an NLP engine, the messages can be clustered for Semantic Similarity to identify common groupings or themes.

For example, I analyzed data across chatbots from late December 2019 through April 2021 and found users were chatting about Covid in late January 2020, before it became more well known. The interactions were occurring in a variety of use cases including chatbots for travel, insurance, HR, and more, as users were trying to learn more about the virus and the implications it would have on their lives and activities.

While examining the top messages and Intents can help identify themes and popular use cases, looking at the bottom ones can help identify issues as well. As mentioned earlier, one agency building voice skills dove deeper into the Intents that were hardly being triggered, and realized they were ones that were more complex with multiple entities. They reduced the complexity, which in turn increased the access to those areas of the voice skill.

Dive deeper through transcripts

Transcripts can provide deeper insights, and clarity, into the context of the user interactions.

For example, in the image below, we can see the path that led to escalation. The chatbot was not set up to understand or respond appropriately to the user’s request for help in this way.

Measure customer satisfaction

Enterprises are building chatbots not just to reduce costs and improve operational efficiencies, but to provide a better overall experience.

Customer satisfaction can be measured through sentiment analysis or direct questions, like a CSAT score.

Sentiment analysis, the measurement of the positiveness or negativeness of the user messages, can be an indicator of customer satisfaction. The more positive the overall sentiment, the more likely the users were satisfied and vice versa. There is one caveat however, in that for some customer service interactions, users may already have a negative sentiment to start, hence the outreach, and it may be more important to look at the change in sentiment over the interaction.
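The caveat about starting sentiment can be made concrete with a small calculation. The sketch assumes each user message already has a sentiment score between -1.0 and 1.0 from some sentiment model (not shown here), and it uses last-minus-first as one simple definition of change.

```python
from statistics import mean


def sentiment_summary(scores):
    """scores: per-message sentiment for one conversation, in order, each in [-1.0, 1.0]."""
    if not scores:
        return None
    return {
        "average": mean(scores),
        # Positive change means the user ended happier than they started.
        "change": scores[-1] - scores[0],
    }


# A user who arrives frustrated and leaves satisfied.
print(sentiment_summary([-0.5, 0.0, 0.5]))
```

In this example the average is neutral, which would hide the fact that the conversation moved the user from negative to positive; the change captures it.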

As mentioned previously, chatbots can be a great way to prompt the user for feedback during the conversation. For example, asking if the response was helpful or not. If the chatbot does this, examine the user responses to understand if the users are satisfied or not.
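Tallying the replies to a "was that helpful?" prompt can be as simple as the sketch below. The accepted answer strings are assumptions; free-text replies that are neither yes nor no are ignored rather than guessed at.

```python
def helpful_rate(responses):
    """Share of yes answers among clear yes/no replies, or None if there are none."""
    answers = [r.strip().lower() for r in responses]
    yes = sum(a in {"yes", "y"} for a in answers)
    no = sum(a in {"no", "n"} for a in answers)
    total = yes + no
    return yes / total if total else None


print(helpful_rate(["Yes", "no", "yes", "maybe", "YES "]))
```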

Monitor and measure conversions and escalations

It is important to track and measure the underlying KPIs and goals of the chatbot, as well as the paths that lead to success, drop-off, or escalation.

Monitoring the paths can help better understand user behavior, and improve the conversational flow to increase conversions and reduce drop-offs or escalations.

The goals depend on the chatbot and could be getting jobs done, getting people past where they are stuck, getting the user the information they need, completing a transaction, or anything else.

In the example above, we can see the successful path to checking order status, as well as common paths that lead to escalation. From here, one could dive deeper into related transcripts to get more insights into these escalations to see if there are areas for improvement.
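Path analysis of this kind can be approximated by counting session outcomes and the step sequences that end in escalation. In the sketch below, each session is a list of step or Intent names; the names `order_status_shown` and `escalate` are hypothetical markers for the goal and the hand-off.

```python
from collections import Counter


def path_outcomes(sessions, goal="order_status_shown", escalation="escalate"):
    """sessions: one list of step/Intent names per conversation."""
    outcomes = Counter()
    escalation_paths = Counter()
    for steps in sessions:
        if escalation in steps:
            outcomes["escalated"] += 1
            # Keep the path up to and including the first escalation.
            path = tuple(steps[: steps.index(escalation) + 1])
            escalation_paths[path] += 1
        elif goal in steps:
            outcomes["converted"] += 1
        else:
            outcomes["dropped"] += 1
    return outcomes, escalation_paths


sessions = [
    ["welcome", "order_status", "order_status_shown"],
    ["welcome", "fallback", "fallback", "escalate"],
    ["welcome", "fallback", "fallback", "escalate"],
    ["welcome"],
]
outcomes, escalation_paths = path_outcomes(sessions)
print(dict(outcomes))
print(escalation_paths.most_common(1))
```

The most frequent escalation paths are the natural starting point for pulling related transcripts.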


Developing conversational AI interfaces is an iterative process. It is important to take the data and insights gathered and feed those back into the use cases, the NLU model, the conversation flows, and the back-end integrations, and continuously iterate.

Conversational AI is the future. While it can be a challenge to develop given the free-form communication and unstructured data, you can follow an iterative process to build compelling, high quality, highly effective chatbots and voice assistants.

Arte Merritt is an executive and entrepreneur in the Conversational AI, chatbot, and voice assistant space. He is currently leading Conversational AI partnerships at AWS.



