Revisiting Our First Technology
Moonshot has been exploring what Chris Milk called the last medium. Now we are turning (or returning) our attention to what Marshall McLuhan called humankind’s first technology, the spoken word.
The spoken word was the first technology by which man was able to let go of his environment in order to grasp it in a new way. Words are a kind of information retrieval that can range over the total environment and experience at high speed. Words are complex systems of metaphors and symbols that translate experience into our uttered or outered senses. They are a technology of explicitness. By means of translation of immediate sense experience into vocal symbols the entire world can be evoked and retrieved at any instant. ― Marshall McLuhan
Conversational Interfaces, which encompass voice technologies and their closely related cousins, text messaging technologies, are ascendant. They range from simple chatbots and supposedly smart speakers to seemingly sci-fi products like Google’s instant-translation Pixel Buds and the amazing and chilling Lyrebird, an artificial voice technology that can make your voice say anything. Together, these technologies promise to make a world where we can talk to everything and everything can talk to and listen to us.
The Conversational Interface Spectrum
In our explorations thus far, we’ve seen that the contours of this space seem to be laid out on a spectrum of conversational fluidity that runs from rigid, rule-based chatbots on the shallow end to fully conversational companions, like the fictional Her, on the not-yet-realized, deep end. And we have seen at least one more dimension to this space: conversational richness, ranging from text-only chat, to voice interactions, emotionally expressive avatars, and even to multi-media mashups that may produce new languages and entirely original symbolic constructs or at least help to bridge the barriers between existing languages.
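To make the shallow end of that spectrum concrete, here is a minimal sketch of a rigid, rule-based chatbot. It is purely illustrative (the patterns and replies are invented, not taken from any product mentioned here): it matches keywords against canned responses and falls back when nothing matches, which is why bots built this way feel so brittle in open conversation.

```python
import re

# A rule-based chatbot: regex patterns mapped to canned replies.
# All patterns and replies are hypothetical examples.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help?"),
    (re.compile(r"\bhours?\b", re.I), "We're open 9am-5pm, Monday through Friday."),
    (re.compile(r"\b(bye|goodbye)\b", re.I), "Goodbye!"),
]

FALLBACK = "Sorry, I didn't understand that."

def reply(message: str) -> str:
    """Return the first canned reply whose pattern matches, or a fallback."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return FALLBACK
```

Anything outside the hand-written rules, no matter how reasonable, hits the fallback; moving toward the deep end of the spectrum means replacing this lookup with models that handle language the rules never anticipated.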
The Cambrian Explosion of Chatbots
One of the most obvious manifestations of the Conversational Interfaces trend is the proliferation of chatbots. Bots abound on Twitter. There are trollbots, of course. There was the short-lived Microsoft Tay. Or if you prefer your bots a little more righteous, there’s the AImen bot. And there’s my favorite, the Magic Realism Bot (not really a chatbot, but wonderful anyway).
Most of these chatbots are pretty utilitarian and fall low on a Turing scale of conversational fluidity. And while there are chatbots that can hold much more entertaining conversations, none are truly convincingly human yet. Still, some are human enough to break a heart, as Karen Faith will attest.
Golden Age of Communication?
As it turns out though, faking human conversation, aka the Turing Test, may not really be the ultimate measure of a chatbot’s fluency. Chat has become something that even Turing couldn’t have envisioned. It is a rich, multimedia conversation exchange platform. It isn’t limited to text, of course. It has grown to include emojis, GIFs, images, video, AR, voice and mashups of all of these, enriching our libraries of “metaphors and symbols” with which to translate experiences and evoke our world.
It may be that today’s messaging platforms and chatbots will usher in a golden age of entirely new mashup conversations, a new way for humans and computers to communicate. Or it may be that we will just recreate flashier, worse versions of those infamous Interactive Voice Response phone system hells, where each of us will be trapped, endlessly rolling emojis through virtual loops, like a cyber Sisyphus.
At the very least though, messaging platforms seem poised to become a major interface category in their own right. Just as browsers disrupted the desktop, chat platforms may disrupt browsers, becoming robust content delivery, customer service, e-commerce, and customer experience ecosystems with commensurate consumer mindshare and media clout.
Smart Speakers and the Surveillance Economy
Chatbots aren’t the only example of Conversational Interfaces though. Smart speakers have infiltrated our homes at a remarkable rate. A survey from Morning Consult estimates that as many as 18.8 million Amazon Echoes and 15.7 million Google Home devices were in homes as of June of this year.
But despite their proliferation, smart speakers face challenges. They aren’t yet as smart or conversational as we’d like them to be, and their range of uses is still limited. The vast majority of Alexa skills have one or zero customer reviews, and just 3% of a voice app’s users remain active in the second week after first using it, according to VoiceLabs.
Additionally, many people have misgivings about privacy, legitimate misgivings as it turns out. Although Amazon has made every attempt to assuage consumer privacy concerns, there may be real legal consequences when we knowingly surrender our privacy by placing listening devices inside our houses. And then there is the question of who besides Alexa may be listening to us.
Concerns aside, it seems that many, maybe most of us, are willing to trade privacy for convenience. That’s the fundamental transaction underlying this new surveillance economy we seem to be creating. We welcome listening (and maybe watching and sensing) devices into our homes in exchange for the promise of greater personalization, service and convenience.
Turing Test Prep
Underpinning Conversational Interface technologies are various forms of Artificial Intelligence, particularly Natural Language Processing, Speech Recognition and Speech Synthesis. Although the letters “AI” get sprinkled pretty liberally on just about everything these days, especially in this space, most chatbots don’t require anything that could legitimately be called AI. Voice applications, on the other hand, rely heavily on AI. If you spend any time swimming in the deep end of the Conversational Interface pool, trying out Lyrebird or watching the Google Pixel Buds demo, it’s easy to be amazed at how far AI has taken voice tech. But listen to two Cleverbots talking to each other, and you’ll know that while AI excels at identifying words, it still has work to do in understanding meaning and context.
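One way to see the gap between identifying words and understanding meaning is with a bag-of-words representation, one of the simplest tools in Natural Language Processing (a deliberately simplified illustration, not how any particular product works). It counts which words appear but throws away their order, so it cannot tell apart sentences whose meanings differ only in structure:

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Represent a sentence as word counts, discarding word order."""
    return Counter(sentence.lower().split())

# Both sentences contain exactly the same words, so their
# bag-of-words representations are identical even though
# the meanings are very different:
a = bag_of_words("man bites dog")
b = bag_of_words("dog bites man")
```

Recognizing every word in both sentences gets a system exactly nowhere in distinguishing who bit whom; that is the kind of meaning and context that remains hard.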
Intuitively it feels like learning to speak fluidly “human,” to pass the Turing test, would be a significant milestone for AI. But is it? What should we make of stories about AI inventing its own language (even if the stories were overhyped)? Or the fact that a Google AlphaGo AI that trained itself to play Go, without any human training, beat all of Google’s other AlphaGo AIs that had been human trained?
Maybe speaking human isn’t the end game for AI. But perhaps that’s best left for a future exploration.
Finding Our Voice
In the meantime, Moonshot’s exploration of Conversational Interfaces has already begun. Preston Richey built out our first proof-of-conversation, a chatbot named Carmen (after Carmen San Diego), who helps Barkley partners find their way around our building. We’ll share his report on creating her soon.
We’re also working on a new exhibit, under wraps for the time being, but it’ll be a space where people can try out a variety of conversational technologies and experiences. We’ve been using some of the tools and techniques from our VR exploration to plan out the experience, which has made the process a lot more engaging.
We’re working with Claudia Reeder of our strategy team to understand how brands are navigating this new space and where they are finding early successes. And very importantly, what happens to the idea of Brand Voice in an age when so many new things are literally being given a voice for the first time?
We’ve also been making friends with bots from all over the world at all hours of the day, and encouraging our friends to do the same. Karen has been doing research on the human side of human-bot relationships. Nobody’s fallen in love yet, but there have been some hurt feelings.
And we have several activities, experiences and surprises planned for the months ahead. Stay tuned here for details. We’ll keep talking to you.
People don’t talk like this, theytalklikethis. Syllables, words, sentences run together like a watercolor left in the rain. To understand what anyone is saying to us we must separate these noises into words and the words into sentences so that we might in our turn issue a stream of mixed sounds in response. If what we say is suitably apt and amusing, the listener will show his delight by emitting a series of uncontrolled high-pitched noises, accompanied by sharp intakes of breath of the sort normally associated with a seizure or heart failure. And by these means we converse. Talking, when you think about it, is a very strange business indeed. ― Bill Bryson