Why your grandchildren will not dial customer service

Published in Voice Tech Podcast, Jun 26, 2019

At Solvemate we have built software that lets our customers create chatbots to automate customer service. Our bots deal exclusively with messaging interfaces such as chat widgets and Facebook Messenger. We do not support voice.

Quite often I get asked: “Why?” Calling has been around since 1876; a phone number is an extremely common gateway to get help with everything from restaurant reservations to flight bookings; and nowadays bots like Alexa and Google Home are giving us even more ways to use our voice to get things done. If we support messaging, shouldn’t we support voice as well? Is calling necessarily inferior to messaging for customer service?

I will answer these questions in this article. But first, we need to specify exactly what we are comparing.

The four interfaces

An interface (Wikipedia) is the point of exchange between the human and the computer. Basically, it is the medium you use to connect to a machine.

Let’s define the four interfaces (desktop, mobile, messaging, and voice) and what each is good for:

Obviously, the desktop interface is good for all our most complex and time-consuming tasks. It is very well-suited to office work as it has the most sophisticated UI and the biggest screen. At work I use two 27-inch screens to see even more things in parallel.

Coming to the mobile interface, we have significantly less screen “real estate” — but all UI elements are still possible. It’s not nearly as good for complex work as a desktop, but you can still manage the majority of your day-to-day tasks using a 4-inch screen and your two thumbs. Bookings, purchases, even basic research — we can easily do all of this from our mobile phones.

The messaging interface is mostly made up of speech bubbles as two or more people (or bots) exchange replies. It can also handle a few UI elements such as pictures, buttons, or links. But compared to a mobile app or website, it is much more limited in graphical design and the screen real estate is effectively limited to the size of a chat bubble.

Finally, we come to the voice interface. Here, there is no screen interface at all. There are no links to click; no pictures to look at; no videos to be played. The user cannot read what has been said and cannot easily navigate to earlier messages. This clearly limits the UI options of this interface.


The best interface

Certain interfaces enable certain use cases. For example, the mobile interface enables tracking sport activities or taking pictures on-the-go and uploading them to Instagram. The messaging interface enables you to efficiently communicate with a group of people or companies. Voice is faster than typing, so the voice interface lets you “just call someone” (as you may have done last time you caught yourself exchanging long e-mails with a colleague), or it could let you use a voice bot like Alexa or Google Home to add something to your shopping list while you are cooking.

The opposite also holds true; there are examples where a certain interface is obviously not compatible with certain use cases:

Can you imagine using Twitter via voice? Probably not.
Can you efficiently draft a 50-page document using mobile, messaging, or voice? No.
Can you use your laptop to track your hiking trip on Strava? No.
Can you most efficiently shop fashion using a messaging bot? Not really.

So:

Every interface has its best use cases.

Some applications work on multiple interfaces, but some clearly do not, or they work but are very inefficient. I am convinced that you can match any given task with one of the four interfaces above by thinking about the nature of what you want to do (complex vs. simple work, multi-tasking vs. single-thread, stationary vs. on-the-go). In this way we can understand the best and worst use cases for each interface.

The crux: Messaging vs. Voice

Now that we’ve covered the concepts of interfaces and matching them to their best use cases, let’s go back to our main question: why do our bots stick exclusively to the messaging interface? Why don’t we support voice?

Messaging has some pros and cons:

➕ It is faster to read than to hear. The average person in the US speaks 150 words per minute; the average reading rate is 200–300 words per minute. Sidenote: When doing this test I scored ~500 wpm — try it yourself to see the difference in speed between talking and reading.

➕ We have access to UI elements such as pictures, formatting, lists, and buttons to structure the conversation.

➕ I can scroll back up and reread longer texts in case I need to recap information.

➕ I can be linked to other resources such as articles, websites, videos, etc.

➕ It is silent, which means I can use it anywhere — for example, in the office, in public transport, or in the waiting area of the doctor’s office.

➖ I need to use my hands.

➖ Typing is slower than talking. The average person types ~40 words per minute. Sidenote: This typing speed test is very fun and only takes 60 seconds. I scored ~70 wpm, which is still much slower than I can talk.

➖ I need to take my phone out of my pocket every time I want to interact.

The pros of voice are the cons of messaging:

➕ You do not need to take your phone out of your pocket; you can just speak.

➕ Talking is faster than typing.

➕ Voice is hands-free. This means you can do many other things in parallel: cook, drive a car, walk around, cycle, etc.

➖ Voice is not silent; it is not appropriate (or at least not polite!) to talk / call in some situations.

➖ It is not possible to be given further information in the same way that you’d send someone a link or a document in a messaging app.

➖ You cannot “scroll back up” to recap information. It is possible to ask for something to be repeated, but you still need to remember what you hear.

➖ The UI elements of a screen do not exist as the entire interface is auditory.

➖ Hearing is slower than reading.

As you can see, a voice interface has opposite strengths compared to a messaging interface; voice is easy to use (hands-free) but sacrifices UI elements and functionality.
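To make the speed trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the rates quoted above (150 wpm speaking, 200–300 wpm reading, ~40 wpm typing). The function names, the choice of the 250 wpm midpoint for reading, and the assumption that listening happens at speaking speed are mine, purely for illustration; the sketch also ignores everything except raw speed.

```python
# Rates quoted in the article, in words per minute.
SPEAK_WPM = 150               # average US speaking rate
READ_WPM_RANGE = (200, 300)   # average reading rate
TYPE_WPM = 40                 # average typing rate


def seconds_for(words: int, wpm: float) -> float:
    """Seconds needed to get through `words` at `wpm` words per minute."""
    return words / wpm * 60


def exchange_seconds(question_words: int, answer_words: int) -> dict:
    """Rough time for one question/answer exchange on each interface.

    Assumptions (illustrative, not from the article):
    - reading happens at the midpoint of the quoted range (250 wpm);
    - listening happens at speaking speed (150 wpm).
    """
    read_wpm = sum(READ_WPM_RANGE) / 2
    return {
        # Messaging: the user types the question and reads the answer.
        "messaging": seconds_for(question_words, TYPE_WPM)
        + seconds_for(answer_words, read_wpm),
        # Voice: the user speaks the question and listens to the answer.
        "voice": seconds_for(question_words, SPEAK_WPM)
        + seconds_for(answer_words, SPEAK_WPM),
    }


if __name__ == "__main__":
    print(exchange_seconds(question_words=20, answer_words=20))
    print(exchange_seconds(question_words=20, answer_words=300))
```

Under these assumptions, a short 20-word exchange is faster by voice (about 16 seconds versus about 35), while a 20-word question with a 300-word answer is faster by messaging (about 102 seconds versus 128). The longer the reply the user has to absorb, the more the reading advantage outweighs the typing penalty.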

Here is my belief:

Users are fully rational when it comes to their time and effort. They use the interface that is most efficient to perform a certain task.

Voice is a new interface and it is great. For command-like instructions such as “turn off the light” or “play music” or “text ‘I love you’ to my girlfriend”, clearly voice will win — and I can think of hundreds more command-like tasks where voice will be the most convenient interface.

For other use cases, voice is not the optimal interface. And, because voice and messaging have opposite strengths, such use cases will always become dominated by messaging.

Customer service is one such example. Customer service is often done in silent environments (in an office, on-the-go) and is a multi-step conversation which sometimes relies on visual content such as pictures, GIFs, or explanatory videos. Plus, not to forget: we can read nearly twice as fast as we can speak or hear information.

Thus, my bet is that our grandchildren will think about voice for customer service like my kids nowadays think about the floppy disk: a floppy-what? 💾


Written by Erik Pfannmöller