How the BBC, Financial Times, and Bayerischer Rundfunk are experimenting with voice interfaces
We talked AI-powered voice interfaces with Mukul Devichand, voice editor at the BBC; Chris Gathercole, head of FTLabs at the Financial Times; and Christian Daubner, head of digital strategy at Bayerischer Rundfunk.
According to Peter Hodgson (also Google), when mobile phones first came out, people gathered in little groups outside train stations to discreetly make phone calls (much like post-indoor-smoking-ban-smokers who huddle outside pubs). Over time, chatting on mobiles in public became socially acceptable to the point that we now leave our phones next to our plates in restaurants.
The same thing will probably happen with AI-driven voice devices: a few years from now, we might be walking down the street making commands into thin air without anyone batting an eyelid.
There are already 400 million Google Assistant devices across the globe. Amazon is cagey about its numbers, but they claim to have sold ‘millions’ of Alexa devices over the Black Friday weekend alone. How many AI-driven voice assistants will there be in 10 years? And what does this mean for the news industry? Currently, news organisations are using voice assistants to deliver podcasts, flash briefings, and news quizzes. How else can they leverage this technology? What formats work best on these devices? And what are the ethical implications of their use?
We caught up with Mukul Devichand, voice editor at the BBC; Chris Gathercole, head of FTLabs at the Financial Times; and Christian Daubner, head of digital strategy at Bayerischer Rundfunk (Munich-based public service radio and TV broadcaster) to find out how far along they are with AI-powered voice. We also listened intently at the Smart Voice Summit in Paris to see what’s going on in the industry as a whole.
1. Baby steps: Sort out your audio before you can voice
The general feeling at the Smart Voice Summit was that newsrooms should get cracking on voice so they don’t lose out like they did in the mobile revolution. But what do newsrooms need to do in order to get started?
According to Gathercole, before news organisations can actually use voice, they have to have their text-to-speech (TTS) sorted out. And that’s not always easy. Gathercole explained that TTS works just about well enough for sentences, but is not quite there yet for entire articles that are full of nuance, humour, sarcasm, long lists, and poetry. Also, do people actually want to listen to entire news articles read aloud? Gathercole suggests that something shorter and snappier, such as a summary, might do the trick instead.
The FT themselves have been using Amazon Polly to turn text into audio, a process which takes between one and three seconds and whose results are read out by Artificial Amy. But does this mean that a human podcaster is going to have her job taken away by a robot? According to Gathercole, after Lucy Kellaway, FT opinion column writer, heard that one of her pieces had been narrated by a robot voice, ‘she kind of lost it a bit, demanded it to be taken down, listened to it, and then made it the subject of her next column’. In the subsequent column, Kellaway wrote, ‘Listening to [Artificial Amy] is not like listening to a non-English speaker read aloud, but to someone without a brain, or heart, or sense of humour. Indeed her delivery is so poor that I do not even understand the column when she reads it — which is saying something given that I wrote it’.
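Since Gathercole's point is that TTS copes better with sentences than with whole articles, one plausible preparation step is to hand the synthesiser a few whole sentences at a time. The sketch below is only an illustration of that idea and is not the FT's pipeline, which the article does not describe. The function name `split_into_chunks` and the `max_chars` default are assumptions; any real service will have its own input limits.

```python
import re


def split_into_chunks(text, max_chars=1500):
    """Group whole sentences into chunks of at most max_chars characters.

    max_chars is a hypothetical limit; check the TTS service you use.
    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits (or this is the first sentence of a new chunk).
            current = candidate
        else:
            # Adding the sentence would overflow: close the chunk, start anew.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent to a TTS service in turn and the resulting audio clips played back to back. The naive split will trip over abbreviations such as 'Dr.', so a production version would need a proper sentence tokeniser.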
2. Mix humans and machines
Seeing as Kellaway’s second column was published on the FT website, it was also directly read out by Amy. ‘It was a very pleasing surreal experience to listen to a robot voice describing that this human was not going to be replaced by a robot any time soon’, said Gathercole at the Smart Voice Summit.
While Amy has a nice voice, learns quickly, and doesn’t cost much, it’ll be a little while before she can read out anything in a human-like way. The best compromise for audio, according to Gathercole, would therefore be to go hybrid: where a human voice reads out parts of a text and an artificial voice chips in with additional snippets of information. Audio plus video would also work.
3. Create a persona
When it comes to your voice assistant, it is better to create an integrated persona rather than being all things to all men, advised Hodgson at the Smart Voice Summit in Paris. According to him, when Siri first came out, Apple tried to make her (him/it?) as ‘neutral’ as possible. This strategy backfired, as users interpreted her behaviour as being ‘snarky’.
Alexa’s personality has since become more defined and, according to this piece on Quartz, she is now a feminist.
‘If you ask Alexa whether she’s a feminist, she will say yes, adding “as is anyone who believes in bridging the inequality between men and women in society”. She’s also a supporter of diversity and social progressiveness within science and technology’.
A voice assistant should therefore have a backstory and a defined role that sits well within the brand.
The hosts of Quartz’s Alexa newscast, for example, are robots called Kendra and Brian. They read headlines in the same conversational and sometimes playful voice that Quartz fans are used to from the Quartz mobile app. Previously, Alexa read the Quartz Daily Brief, but the Quartz team found that mixing up voices and using more conversational writing worked better for the voice assistant.
‘Also, we really like the quality of the British man’s [Brian’s] voice. There’s been a lot of conversation here about whether it’s because he’s British. Do we, as speakers of US English, have more tolerance for robots speaking in an accent?’ wrote John Keefe from Quartz.
The FT Labs team is also actively thinking about their AI-voice brand persona and currently trying to build a chap called Alfred.
‘We’re not trying to build a friend, we’re trying to build a tool. You don’t talk to Alfred; he is quiet and hands you the information you want on a silver tray. He circles the important bits of the newspaper that you need to read’, said Gathercole at the Smart Voice Summit.
Alfred will not become anyone’s pal and no FT subscribers will ever be chatting to him like one. This will, in part, ensure that no user falls in love with Alfred and becomes condemned to a dystopian real-life ‘Her’-like situation. (You may laugh, but the anthropomorphisation of a device into a real entity is a real concern.) The FT Labs team is therefore looking at voice as a way of controlling a tool, rather than as a way of creating a chatty companion.
Alfred represents just one of the possible capabilities that the FT hope to achieve with voice. ‘It is a very distant hope. There are lots of complex challenges to identify and tackle before this becomes (maybe) a reality’, wrote Gathercole in our exchange of emails.
4. Write your dialogue
‘Dialogue and conversation is all’, pointed out Hodgson at the Smart Voice Summit. ‘Create sample dialogues and act them out with each other’.
Written dialogue and spoken dialogue are often very different, so reading the dialogue out loud is a good way of knowing if and where your voice assistant’s script is going wrong. According to Hodgson, script writers or people with script writing experience are therefore massively useful in a team working on AI-powered voice assistants.
The next step after finishing the sample dialogues is setting up an effective repair strategy. According to Hodgson, people don’t like voice interfaces all that much, because when voice interfaces go off track, people blame themselves and feel angry, awkward, and confused. 62 per cent of the angry, awkward, and confused will give their device another go, while the rest will give up and go back to reading newspapers. Voice interfaces therefore need good repair strategies to get people back on track.
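To make the idea of a repair strategy concrete, here is a minimal sketch in which the assistant offers a little more help each time it fails to understand, and finally falls back to a safe default instead of leaving the listener stuck. Everything in it is hypothetical: the prompts, the three intents, and the `Dialogue` class are invented for illustration and do not come from Hodgson or from any platform's SDK.

```python
from dataclasses import dataclass

# Hypothetical prompts: each one gives the listener a little more guidance.
REPAIR_PROMPTS = [
    "Sorry, I didn't catch that. What would you like to hear?",
    "You can say 'headlines', 'sport', or 'weather'.",
    "I'm still having trouble. Here are today's headlines instead.",
]

# Hypothetical set of requests this toy assistant understands.
KNOWN_INTENTS = {"headlines", "sport", "weather"}


@dataclass
class Dialogue:
    failures: int = 0  # consecutive utterances we could not understand

    def respond(self, utterance):
        # Very rough intent matching: look for a known keyword in the utterance.
        words = [w.strip(".,!?'\"") for w in utterance.lower().split()]
        intent = next((w for w in words if w in KNOWN_INTENTS), None)
        if intent is not None:
            self.failures = 0  # back on track, so reset the repair ladder
            return f"Playing {intent}."
        # Escalate through the prompts, staying on the last one once reached.
        prompt = REPAIR_PROMPTS[min(self.failures, len(REPAIR_PROMPTS) - 1)]
        self.failures += 1
        return prompt
```

Acting out sample dialogues against something this simple is a cheap way to hear where the script goes wrong before any real voice platform is involved.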
5. What formats are the Financial Times, BBC, and Bayerischer Rundfunk experimenting with?
News briefings and podcasts
‘Like everyone else, we are trying to find out and have begun a process of creative experimentation around voice interactions’, wrote Devichand in our email exchange. The BBC already have an audio news briefing that goes across all major voice platforms and in December 2017, they launched an Alexa skill (voice-driven Alexa capabilities) bringing live radio channels and podcasts to UK audiences. They are currently also looking at this functionality for Google and Voice devices.
‘We saw really strong audience growth very fast, with well over a million unique browsers by January. This validates the idea that our audiences want to use voice to access the wonderful radio, news, music and so on that we already create. The next step will be looking at what native content might look like across key genres, bringing the creativity of our audio community to the interaction models of voice’, wrote Devichand.
Talking to the news
At BR, the team have been mostly trying to convert already existing content from radio and TV to make it accessible for voice devices. The next step for BR is to create journalistic dialogue through artificial intelligence, which is where, according to Daubner, it gets exciting.
Gathercole said that being able to have a dialogue with news opens up a lot of possibilities. By being able to ‘interrupt’ a news broadcast, the listener is able to find out more about a particular news item. This means the listener will evolve from being passive (simply hearing a regular news update, which is probably sufficient most of the time) to being active (adjusting the news update). But is this really necessary? Gathercole argued that it may be more convenient and more immediate to use voice — rather than a finger and a screen — to nudge audio news updates into different modes.
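As a rough illustration of what 'nudging' an audio news update could look like, the sketch below models a briefing as a list of items that the listener can steer with a few spoken commands. The `NewsItem` and `Briefing` names and the 'more', 'next', and 'stop' commands are assumptions made for this example; the article does not say how FT Labs or BR would build such a thing, and real speech recognition is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    headline: str
    detail: str  # the longer version the listener can ask for


class Briefing:
    def __init__(self, items):
        self.items = list(items)
        if not self.items:
            raise ValueError("a briefing needs at least one item")
        self.position = 0

    def current(self):
        """Headline of the item currently being read out."""
        return self.items[self.position].headline

    def handle(self, command):
        """Return what to say next for an (already transcribed) spoken command."""
        command = command.strip().lower()
        if command == "more":
            # The listener interrupts to go deeper into the current item.
            return self.items[self.position].detail
        if command == "next":
            if self.position + 1 < len(self.items):
                self.position += 1
                return self.current()
            return "That was the last item."
        if command == "stop":
            return "Stopping the briefing."
        # Unknown command: remind the listener what is possible.
        return "You can say 'more', 'next', or 'stop'."
```

A passive listener never calls `handle` and simply hears each headline; an active one uses 'more' and 'next' to reshape the update as it plays.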
News quizzes and swear jars
So far, the team at FT Labs have built a news game for Google Home. The game focuses on people in the public eye that are frequently mentioned in FT articles, making use of the organisation’s rich name data sets.
On top of this, the team have experimented with a ‘swear jar’ feature for potential spoken comments on online articles. The aim of this feature is to improve the quality of comments by fining users who refuse to remove bad language. How does it work? Spoken comments are recorded through a voice user interface (VUI) with sentiment analysis and a built-in swear jar.
‘Obviously, since the swear jar project was done in two days for a hackathon, it was more about the concept than the full set of practicalities. That said, the demo has proved very useful in triggering internal interest in the whole idea of integrating with voice assistants. It also helped surface the challenges faced by the comments team in managing reader comments, which range from the inspirational to the hideous. Swearing is usually not a sign of good quality in a comment’, wrote Gathercole.
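The hackathon code itself is not public in this article, so the following is only a guess at the general shape of a swear jar: take the transcript of a spoken comment, flag words from a block list, and work out a fine if the commenter refuses to remove them. The block list, the fine amount, and both function names are hypothetical, and the sentiment analysis mentioned above is deliberately left out.

```python
import re

# Hypothetical, deliberately mild block list; a real one would be curated.
BLOCKED_WORDS = {"darn", "blast"}
FINE_PER_WORD_PENCE = 10  # hypothetical fine per flagged word


def flag_words(transcript):
    """Return every blocked word found in a transcribed spoken comment."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return [w for w in words if w in BLOCKED_WORDS]


def fine_due(transcript, keeps_wording):
    """Fine in pence: charged only if the commenter refuses to edit."""
    flagged = flag_words(transcript)
    if not keeps_wording:
        return 0  # the commenter agreed to remove the language
    return len(flagged) * FINE_PER_WORD_PENCE
```

In this sketch a clean comment costs nothing either way, and a commenter who agrees to rephrase also pays nothing, which matches the stated aim of fining only those who refuse to remove bad language.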
Entertainment and children
The BBC published ‘The Inspection Chamber’, an experimental drama from BBC R&D, on the Alexa platform last year. In 2018, they are looking to produce content for children, which is a core area for the BBC with its public service remit. But what effect will voice devices have on the development of children?
Stuart Heritage wrote about Alexa and his two-year-old son for the Guardian in 2017. ‘The concept of voice recognition is already hardwired into his being in a way that makes me slightly uncertain. We’ve always drilled politeness into him, making sure he remembers to say please and thank you, but I wonder what will happen when he realises he can get whatever he wants by barking demands at the cylinder in the corner?’
6. But with opportunities come problems
We’ve made a list of key things newsrooms need to consider when branching out into voice:
Customers don’t know what they want from voice assistants
Gathercole wrote that some customers clearly know they want podcasts. They understand the podcast model, it fits into their daily routine, and it gives them exactly what they’re looking for. But what do they want from a voice-controlled device?
He believes that there are a myriad of things the user might like and ask for using this nascent medium, but will the experience ever go further than a couple of exploratory verbal pokes? Nobody knows. It is therefore possible that voice is not really conducive to a good news experience. And it’s also possible that hugely complex, technically impressive capabilities will be crafted that nobody will ever be interested in using.
‘This will be an interesting, collaborative exploration of possibilities’, wrote Gathercole. ‘It will also, almost certainly, be necessary to build working prototypes of the systems in order for us to find out, and for the users to find out for themselves, if there is something worth pursuing’.
Brands risk being diluted
Gathercole says that no matter whether your voice assistant sounds like a human, robot, or a mixture of both, brands risk the loss of their identities.
‘By becoming a commodity and being aggregated on the voice assistants, news organisations lose the connection with their audience, and the value of their content plummets. This is a well-worn route’, said Gathercole.
Nobody in Australia can use voice devices
The many ‘uuuhmms’ and ‘aaahhhhhs’ break up the flow leading to user frustration.
Almost nobody in Germany can use voice devices
According to Daubner, language recognition technology in German is still in the developmental stages. For example, German people use many anglicisms, which language recognition technology has difficulty recognising within a German sentence.
You can’t interrupt them
According to Gathercole, interruptions are at the heart of how human conversation works. Interruptions allow you to find out more about a topic and provide a natural, rich, and loaded conversation. At the moment, voice assistants can’t do this.
Monetisation is ‘tricky’
While it is possible to add pre-, mid-, or post-roll ads to audio, it doesn’t work well. According to Gathercole, ads are much more intrusive in audio than on web pages and it may be that users won’t accept that intrusion.
Gathercole says that for a subscription-based service, such as the FT or Amazon Prime, it is likely that voice will simply be another channel that will be included as part of the subscription, probably for no extra cost.
‘[Voice] is almost certainly another opportunity for news organisations to lose control of their content’, wrote Gathercole. ‘Handing over the daily briefings to the main voice assistant systems will (or already has) turned news into a valueless commodity. There is nothing there to monetise’.
‘Our hope is the value to the user will come from voice/audio being an effective extension of their use of the FT subscription, accessing all the richness there is on offer. And, ideally, we will be able to identify and take advantage of aspects of voice and audio that are better than simply reading a web page’, wrote Gathercole.
Privacy is a big worry
Trushar Barot, in an article on the Nieman Lab, wrote about the challenging ethical issues brought about by voice devices in the home. Information about users is more easily gathered through voice interactions than through normal web interactions, meaning that voice interfaces have the capacity to become ‘intelligent’ to the point that they can proactively offer ideas and suggestions. One day, they might even be able to tell what mood you’re in depending on your tone of voice.
And this is precisely what makes them useful. According to Gathercole, if voice assistants didn’t learn from secondary information about their users (for example, at what time of day they consume news, what they prefer listening to, how they speak), the voice assistant would (probably) offer a very poor user experience.
But where is the line between usefulness and stalking people in their own homes?
Gathercole agrees that GDPR is a big deal and assures us that at the FT, data privacy is being considered preemptively ‘at fine-grained levels of detail’ when thinking about all new projects. But as for voice assistants eavesdropping on homes and offices, he admits that it is outside of the FT’s control.
And that it is a big worry.
‘With HTTPS, we can be pretty clear that we are sharing our web content exclusively with an authenticated user. With voice assistants, we are sharing it with the voice assistant provider (Amazon, Google, etc), and whoever else is in earshot. Is that our responsibility?’, wondered Gathercole.
‘We have not yet “dealt” with these issues, but we are worrying about them. At the very least we will want to authenticate the user of the voice assistant with their FT subscription.
There is quite a bit of tension between the providers of the voice assistants wanting to retain control over their “walled gardens” and our wishes to retain a tight connection with our subscribers’.
Mukul Devichand is voice editor at the BBC. Prior to this he was the editor at BBC News and worked as a journalist for BBC Radio Current Affairs. He studied Law at LSE and journalism at Columbia.
Chris Gathercole is the head of FT Labs at the Financial Times. He holds a PhD in artificial intelligence from the University of Edinburgh.
Christian Daubner is head of digital strategy at Bayerischer Rundfunk.