When it comes to voice assistant apps*, writing IS design

In most digital organisations I’ve encountered, writers are highly regarded. Yet in product design they often join the process next to last, to replace the ‘lorem ipsum’, on the assumption that what is said matters, but maybe not as much as where it is said and in what colour. Whether my experience applies everywhere, and whether this practice is effective, is up for a (presumably juicy) debate, but I can say with confidence that it’s not a practice that will perform well in voice design. That’s because when it comes to voice, creative writing largely IS the design.

When I first started working on voice applications, we were a team consisting of myself (a product/project manager/experience designer/writer) and 3 programmers. I had staffed the team naively, expecting the technical effort on such a project to be the greatest. Very soon I had to admit that I had completely underestimated the effort that design-writing takes, and what that means for the process of making a voice app.

Wordsmiths with superpowers wanted

After their first encounters with the magic of a talking cloud-computer (aka Alexa, Google Assistant, Siri, or others), many users have been somewhat discouraged by the quality of the (third-party) app experience. My two cents on the reasons: I wasn’t the only one who overstaffed the team with programmers. The general effort to build the ecosystem has been too busy on-boarding technical people and has overlooked the significant design challenge, resulting in solutions that may be technically impressive but are… difficult to love. I’m convinced that including more creatively gifted team members will increase the quality of voice applications, even with the limitations they have.

“So bring the creatives in!” you will shout enthusiastically, tired of Alexa’s indifferent “I don’t know that”. Well, here’s the second challenge, and in some way she is right: apart from a few rare specimens, this particular breed of writers doesn’t exist yet. The good news is that they can be on-boarded relatively easily.

Let me describe what I think this person ideally should be able to do (in order of importance):

1 First and foremost, you need to be able to write well and empathetically. It helps if you are specialised in dialogue-heavy writing, like movie scripts, comedy or plays. The writing superpower you need to develop is the ability to express an idea entertainingly and in character, in a one-breath sentence that gives the listener clear instructions on what to do next :)

2 Second, you need to understand the voice of the assistant you are writing for, and the ways to tweak it. Think of it as writing for a particular actress/actor: once you know them very well, you will hear in your own head how they will say what you wrote, and how you can instruct them to say it.

3 Finally, you should develop a basic understanding of software design. The one writing should be the one in charge of how the conversation flows. At the very least you might need to translate the dialogue logic into some form of technical diagram, and at best you will design it with the technical opportunities and limitations in mind.
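To make that third point a little more concrete, here is a minimal sketch in Python of dialogue logic expressed as data instead of a drawing. Everything in it is hypothetical and invented for illustration: the state names, the prompts, and the `next_state` and `run` helpers. A real voice app would sit on top of the platform's own intent model, which this sketch does not attempt to show.

```python
# Each state holds what the assistant says and where each user reply leads.
# All state names and prompts are made up for illustration.
DIALOGUE = {
    "welcome": {
        "prompt": "Hi there! Would you like a surprise?",
        "replies": {"yes": "surprise", "no": "goodbye"},
    },
    "surprise": {
        "prompt": "Ta-da! Would you like another surprise?",
        "replies": {"yes": "surprise", "no": "goodbye"},
    },
    "goodbye": {
        "prompt": "Okay then. See you soon!",
        "replies": {},  # no replies: the conversation ends here
    },
}


def next_state(state: str, reply: str) -> str:
    """Return the state a reply leads to; stay put if it is not understood."""
    return DIALOGUE[state]["replies"].get(reply.strip().lower(), state)


def run(replies: list[str], start: str = "welcome") -> list[str]:
    """Walk the flow with scripted replies and collect what gets spoken."""
    state = start
    spoken = [DIALOGUE[state]["prompt"]]
    for reply in replies:
        state = next_state(state, reply)
        spoken.append(DIALOGUE[state]["prompt"])
        if not DIALOGUE[state]["replies"]:
            break  # reached an end state
    return spoken


for line in run(["yes", "yes", "no"]):
    print(line)
```

Keeping the prompts in one table like this is only one possible design, but it lets whoever owns the words change them without touching the logic.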

Discouraged yet? Don’t be — if I can do it, you can do it better.

Dare to imagine the new formats waiting to be discovered

“Why bother?” — you might be wondering by now.

(Brace yourselves, here comes a dramatic argument.) Dirk Platzek already discussed Marshall McLuhan, who once said that we try to understand a new medium by “living in the rear-view mirror”. Intuitively, we “speechify” content we know from other media for voice assistants.

So here’s a dare you might enjoy: instead of walking into the voice-interaction future backwards (another McLuhan reference :) ), let’s embrace an unprecedented opportunity to bring an imagined character alive. Or create products that solve problems better than clicking virtual buttons does, and do so with attitude. Or, my personal favourite, write a piece of interactive machine poetry.

Revising the process of writing for voice apps

I have already used a form of the word ‘write’ 14 times in this article. The practice that we visualise with the word ‘write’ is where the misunderstanding starts (yet I fail to find a better word). When we say ‘write’ we picture someone moving a thought from their head onto a wall in a tomb, a scroll of parchment, a piece of paper or a Word document, so that a reader can later process the documented idea visually, i.e. read it. With voice technology, the audience will not process that thought visually but aurally. We will be writing words that someone else (a diva-esque robot-actress) will read out loud, and that a third person will hear and process. So can we write without hearing what we wrote?

We can’t. At least not efficiently. Let me elaborate:

On the very first voice project I worked on, I, the freshly baked voice designer, sketched a conversation for Alexa (Amazon’s voice assistant). I did it by drawing the conversation flow between Alexa and a user as a diagram, and filling it with what I considered to be entertaining text.

Next, I asked the 3 (!!!) extremely technically savvy people to implement it, which they — unsurprisingly in hindsight — did very fast.

Excited, I invoked the test application and, to my disappointment, heard the same robotic sound I had pledged to avoid: the speech was too fast to understand, some words were mispronounced, the conversation did not flow, and so on.

I changed things around and sent back something like this: “Well, [tiny pause] okay then [a tiny bit longer than tiny pause]. Would you like another surprise [it sounds like one word, can it sound like two?]?”
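Bracketed notes like these map quite naturally onto SSML (Speech Synthesis Markup Language), the markup that Alexa and other assistants accept for shaping speech, and in particular onto its `<break>` tag. Below is a minimal sketch in Python. The helper names (`pause`, `text`, `speak`) are hypothetical, and the durations are guesses: they are exactly the values you would end up tuning by ear. The short break inside "another surprise" is just one possible way to stop the two words from blurring together.

```python
from xml.sax.saxutils import escape


def pause(milliseconds: int) -> str:
    """Return an SSML break tag for a pause of the given length."""
    return f'<break time="{milliseconds}ms"/>'


def text(words: str) -> str:
    """Escape characters such as & and < that would break the markup."""
    return escape(words)


def speak(*parts: str) -> str:
    """Join text and tags into one SSML document wrapped in <speak>."""
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = speak(
    text("Well,"),
    pause(200),   # [tiny pause]; 200 ms is a guess
    text("okay then."),
    pause(400),   # [a tiny bit longer than tiny pause]
    text("Would you like another"),
    pause(100),   # nudges "another surprise" apart into two words
    text("surprise?"),
)
print(ssml)
```

With the pauses kept as plain numbers in one place, changing a length is a one-character edit that does not require anyone else's help.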

I remember going back and forth on the length of those pauses for quite a bit until the team proclaimed: this is not going to work. Not only was the teamwork less than ideal, I really needed to hear the words as I typed them. I also needed to be self-sufficient so that I would not interrupt my creative process.

Eventually, my team taught me how to set myself up so that I could hear what I wrote, as well as some basic ways to play with the voice, so that I could write, hear and tweak to perfection without them fantasising about my early death. It helped immensely, and that was when I started seeing the creative potential of this platform.

I would like to pass these solutions on to you: my next piece will be about setting yourself up — how to hear what you write. In exchange, I’d love to hear about the apps you’ve created or test them out. If you’ve been writing for Alexa/voice assistants and found better ways to do what I describe — please drop me a message: I will make sure to try it out and write about it too.

*By this I mean a skill for Amazon Alexa, an action for Google, or simply a voice app