On Designing Bots

Transactions and conversations

Why is it that when I need to call the bank, make an appointment, or use any sort of automated agent I end up wanting to throw my phone out the window? Yet when I ask Siri a question and she goofs it’s just, “Ohhh Siri, aren’t you a silly… Youuuu.” In one scenario I find myself enunciating better than I ever have in my life and speaking louder than I would to my partially deaf Grandmother. “C-H-E-CKING.” In the other I see Siri has again autocorrected “pizza!!” to “USA!!”. Delightful.

So what’s up with that? In one scenario I strengthen my relationship with a brand. In the other I swear the brand off like Champagne after a New Year’s hangover.

From an implementation standpoint I know logically that voice recognition is far from perfect. I know that most speech recognition services don’t include the “Texan” dialect yet, y’all. I also realize that background noise makes voice recognition difficult and that timing with speech services can be a little funny. Still, even with that level of understanding, the two experiences are strikingly polarized, and that can cause serious negative sentiment towards a brand. That’s kind of a big deal.

First let’s consider user intent. There are two primary scenarios: conversational and transactional.

Transactional is what Nils Nilsson might categorize as “weak AI” (Nils is sorta like an OG of AI, NBD) or AI-easy. Think of something that is really good in one single domain but totally breaks outside of it. Like a computer playing chess: it can’t tell you the weather, but it might capture your queen in three moves.

Weak AI could also be a simple mapped path of questions and responses called a decision tree. The system takes users down a little trail. It’s the “choose your own adventure” of AI.

If you don’t remember this, you’re too young.

So for example, the system asks a question or presents a “node.” Once you pick an option, or “input,” the system classifies your response and moves you to the next node. And so on and so forth until you complete your adventure, or what we might call a transaction.

The “Cave of Time” decision tree and all possible nodes/inputs. Party.
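For the code-minded, here’s a minimal sketch of that kind of decision tree in Python. The banking flow, node names, and options are made up purely for illustration; the point is just that every input maps to a fixed next node until the transaction is done.

```python
# A hypothetical decision-tree bot: each node asks a question and maps the
# user's answer to the next node until a leaf (the finished transaction)
# is reached. Anything outside the mapped options simply re-prompts.

TREE = {
    "start": {
        "prompt": "Which account? (checking/savings)",
        "options": {"checking": "menu", "savings": "menu"},
    },
    "menu": {
        "prompt": "Balance or transfer? (balance/transfer)",
        "options": {"balance": "done_balance", "transfer": "done_transfer"},
    },
    "done_balance": {"prompt": "Here's your balance. Transaction complete."},
    "done_transfer": {"prompt": "Transfer started. Transaction complete."},
}

def run(tree, start="start"):
    node = tree[start]
    while "options" in node:
        answer = input(node["prompt"] + " ").strip().lower()
        if answer in node["options"]:   # the "weak AI" part: no tacos here
            node = tree[node["options"][answer]]
    print(node["prompt"])

if __name__ == "__main__":
    run(TREE)
```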

The good news is that when engaging with transactional bots, users are pretty aware of what’s in the realm of possibility. They might not know it’s “transactional” per se, but users aren’t going to ask a banking app to bring them a taco. I mean. I suppose they could. But they might not get too far. Limited selections and options guide your responses. It’s a small task. You’re in. You’re out. Voila. Transaction done.

Then there’s conversational. Now that’s a bit trickier. We’re a long long loooong way off from Strong AI, but it’s possible we could be flirting with Turing-test territory here. Sure, you can make it very obvious to users that they’re engaging with a system or bot, but the context is more open-ended. You could ask anything. Understanding the limitations of natural language processing can help guide the design process of conversational bots. For instance:

Even asking “How’s the weather tomorrow?” assumes the system knows:

  • Where you are currently
  • Where you’ll be tomorrow
  • That you want the weather where you are, not in Tahiti.

In another scenario, asking “Who’s the President?” assumes the system knows:

  • You mean of the United States
  • You mean right now, as opposed to the President of IBM 80 years ago.

Just one of many famous conversational bots and an example of Strong AI. (Also a parody/example.)

There’s more context at play. Things get trickier. It’s more fun that way.
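To make those hidden assumptions concrete, here’s a tiny Python sketch of the context a bot has to fill in just to answer the weather question. The Context fields, the hard-coded city, and resolve_weather_query are all hypothetical, not any particular assistant’s API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# A hypothetical bag of context a conversational bot might keep around.
@dataclass
class Context:
    user_city: str = "Austin"   # assumed from the device or profile, not the utterance
    today: date = date.today()

def resolve_weather_query(utterance: str, ctx: Context) -> dict:
    """Fill in everything 'How's the weather tomorrow?' leaves unsaid."""
    when = ctx.today + timedelta(days=1) if "tomorrow" in utterance.lower() else ctx.today
    return {
        "intent": "get_weather",
        "location": ctx.user_city,   # assumption: here, not Tahiti
        "date": when.isoformat(),    # assumption: tomorrow relative to the user's today
    }

print(resolve_weather_query("How's the weather tomorrow?", Context()))
```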

So let’s take a look at an example that mixes both conversational and transactional:

So we’ve got a mix of transactional and conversational. Yet we’re guiding the user to success. I know exactly what’s expected of me. There are visual and text clues. We’re keeping Mr. Nielsen happy (sorta a big deal in UX; not to be confused with the Nilsson mentioned earlier). We’re minding those heuristics pretty well. Hooray, Internet.

The majority of bots I’ve seen are primarily transactional with conversational elements. Siri, for instance, uses a conversational format, but she’s trained me that I only have a limited number of options, or transactions, to choose from.

Every now and then I’ll go off script and ask her something inappropriate. That’s engaging with a purely conversational aspect. When she answers successfully it’s surprising and delightful. That’s because my expectation of Siri is that most of the time I have a purely transactional purpose and just want to set alarms and reminders. Simple. Two seconds.
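One common way to get that mix (a rough sketch here, and definitely not how Siri actually works) is to route each utterance to a small set of transactional intents and fall back to a conversational response when nothing matches:

```python
# A hypothetical intent router: mostly transactional, with a conversational fallback.
INTENT_KEYWORDS = {
    "set_alarm": {"alarm", "wake"},
    "set_reminder": {"remind", "reminder"},
    "get_weather": {"weather", "forecast", "rain"},
}

def route(utterance: str) -> str:
    words = set(utterance.lower().split())
    # Score each transactional intent by keyword overlap with the utterance.
    scores = {intent: len(words & keywords) for intent, keywords in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] > 0:
        return f"transactional -> {best}"
    # Nothing matched: hand off to the open-ended, conversational side.
    return "conversational -> small_talk"

print(route("Remind me to call the bank"))  # transactional -> set_reminder
print(route("Will you marry me?"))          # conversational -> small_talk
```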

Those utterly amaaaaazing automated customer service bots I so dearly love, on the other hand, are also transactional, but they leave me lost. I have a specific task I want to complete, but navigating the choose-your-own-adventure of the ISP help desk is mind-numbing. I have no idea what’s expected of me, but I know I have a task to complete.

So, as people who care about our users’ experiences and about creating products that matter, we should be aware of our users’ intent before they even engage with a bot. We should set them up for success. Guide their intentions, test, and iterate to ensure that what they set out to do is completed as expected, in the time expected.

What are some bots that you think achieve transactional or conversational success? Is there perhaps a third category? And finally, how else can we improve our users’ experiences with these categories in mind? Stay tuned for the next post on designing better bots.