Looking at how people text
How can we make the bot experience feel more natural?
In this nascent automated messaging space, one-sentence booking is treated as the holy grail of efficiency: teams spend hours and hours trying to understand sentence structure and identify the values that will make a query successful.
“I want to book a room in San Francisco on Monday for 3 nights.”
Leveraging natural language processing (NLP) to identify the city and the check-in and check-out dates is what everyone is working on.
But is it how people really text?
While I believe that understanding a full sentence lays the foundation, the way people actually interact with a bot varies: one full sentence, groups of snippets, corrected typos, or even a change of mind mid-conversation.
The way most bots are built is to process every input to trigger another question or action within the same experience.
If you are a conversational developer, you are probably familiar with how a message is received from the user, processed, and sent back. Without going too deep into technical terms, all messaging platforms allow the developer to receive the content of each message (via a webhook).
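As a rough sketch of that receive-process-respond cycle, here is a framework-agnostic version. The event shape (`sender` and `text` fields) and the `handle_message` helper are assumptions for illustration; every platform defines its own webhook payload.

```python
import json

def handle_message(sender: str, text: str) -> str:
    # Placeholder: a real bot would run NLP here and decide what to send back.
    return f"Received from {sender}: {text}"

def on_webhook(raw_body: str) -> str:
    """Simulates the platform POSTing a message event to our endpoint."""
    event = json.loads(raw_body)
    reply = handle_message(event["sender"], event["text"])
    return json.dumps({"reply": reply})

# Usage:
print(on_webhook('{"sender": "u1", "text": "book San Francisco"}'))
```

Note that each incoming event triggers one call to `handle_message`, which is exactly the one-input-one-command pattern discussed next.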
One input, one command? Not always true
The first mistake developers or product people make is to process each input as a command that triggers an action from the bot. After all, we are used to entering a command into a computer and expecting something to happen. More interestingly, this is not how a normal conversation unfolds between two people.
In any (civilized, not angry) conversation, people would wait until the other party is done talking/texting before responding. Empathic listening (also called active listening or reflective listening) is a way of listening and responding to another person that improves mutual understanding and trust.
So why not apply this to how a bot can understand?
As you can see below, people don't write long paragraphs; they write in groups of snippets. When a user texts in groups of snippets, the bot should interpret them as one query instead of three.
This example also shows that treating three inputs as separate queries does not help with context either. Additionally, if you build your bot to ask a series of questions in a fixed order, it will not work appropriately. This is why tree flows don't work.
But now what?
How do you code the act of listening?
To make sure the user is done "texting" or "talking," we have to introduce the concept of waiting, while acknowledging that we are still listening for more input.
One UI artifact most messaging platforms provide is the typing indicator. It sounds simple and unsurprising, but this little element tells the user that we are processing something, as opposed to leaving them wondering, "I don't know if the bot actually works or understands."
While the user is still typing a text, we show the typing indicator to acknowledge that something is on hold and processing. If the typing stops, and we receive a message, we append it to the previous text received.
current query: ['book', 'San Francisco'] + appended: ['3 nights', 'Monday']
On the other hand, if we don't receive anything, we wait for an idle interval (tuned by our machine learning engine) to make sure no further message is coming, and then process the query as a group.
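This buffer-and-wait behavior can be sketched as a debounce: each new message resets an idle timer, and only when the timer expires is the whole group processed as one query. The `SnippetBuffer` class and the fixed `IDLE_SECONDS` constant are illustrative assumptions; the article describes the idle time as chosen by a machine learning engine, and a real bot would also show the typing indicator while waiting.

```python
import threading

IDLE_SECONDS = 0.5  # assumption: in practice this window is tuned dynamically

class SnippetBuffer:
    def __init__(self, process):
        self.process = process      # callback that handles the combined query
        self.snippets = []
        self.timer = None
        self.lock = threading.Lock()

    def on_message(self, text):
        with self.lock:
            self.snippets.append(text)
            if self.timer:
                self.timer.cancel()  # more input arrived: keep waiting
            # restart the idle window; flush fires only if nothing else comes in
            self.timer = threading.Timer(IDLE_SECONDS, self.flush)
            self.timer.start()

    def flush(self):
        with self.lock:
            combined, self.snippets = self.snippets, []
        self.process(combined)

# Usage: three snippets sent in quick succession become one query.
queries = []
buf = SnippetBuffer(queries.append)
for snippet in ["book", "San Francisco", "3 nights Monday"]:
    buf.on_message(snippet)
```

After the idle window elapses with no new input, `queries` holds a single combined entry rather than three separate ones.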
Taking it a step further: Fixing typos
While thinking about this concept of listening, we looked at how often people send another text immediately after one containing a typo. Fixing a typo also implies that the user wants to replace an existing value with a new one.
Because we have already identified a date as the check-in date, we know that the user wants to replace the previously entered date.
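One way to sketch this replacement behavior: once the NLP layer classifies a new message into a slot that is already filled, the new value overwrites the old one instead of triggering a "that doesn't make sense" reply. The `classify_slot` function below is a toy stand-in for a real NLP engine, and the slot names are assumptions.

```python
def classify_slot(text: str) -> str:
    # Toy classifier, purely illustrative: weekdays are check-in dates,
    # anything else is treated as a city.
    weekdays = {"monday", "tuesday", "wednesday", "thursday", "friday"}
    return "check_in" if text.lower() in weekdays else "city"

def update_query(query: dict, text: str) -> dict:
    slot = classify_slot(text)
    query[slot] = text  # a repeated slot replaces the previous value
    return query

# Usage: the user typed "Monday", then sent "Tuesday" to correct it.
q = {}
update_query(q, "San Francisco")
update_query(q, "Monday")
update_query(q, "Tuesday")  # the correction replaces the earlier check-in
```

After the correction, the query holds a single check-in date rather than two conflicting ones.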
Keeping our mind open
Thinking about building better conversational experiences while focusing on user empathy opens the conversation to other small but effective improvements.
Looking at our current experiences, we are working on implementing these simple but impactful improvements. This gives us an opportunity to change the way we interact with machines.