How the Google I/O hairdresser call demo broke the internet, and yet we are still underestimating voice
Smart speakers and voice assistants such as Alexa, Siri and Google Assistant are great. With tens of millions of devices in use, we are moving towards a voice-first world, but what is next?
Today, simple tasks such as adding an event to your calendar, setting a timer, calling a friend or ordering a pizza work over 95% of the time, giving us convenience and more time in our day. You can order replacement toilet rolls from Amazon today without ever picking up a phone or touching a laptop.
However, if you have ever tried to hold a conversation with your voice assistant, you will know that after two or three sentences things can start to go wrong and the assistant gets confused. We are still in the early days of these experiences and have to be patient. We all remember the Nokia 3210 and can see how far things have come since then.
Google recently showed a demo at Google I/O (skip to the 1 hour 55 minute mark) in which Google Assistant phoned a real hairdresser's on behalf of a user and held a conversation with the staff to book an appointment. The internet then started to freak out, as it does when these announcements are made. The demo was impressive, but firstly it was only a demo, and secondly, when you really think about it, booking an appointment is a fairly easy task for an assistant to perform.
The real question should be: what happens when these assistants start helping us before we even ask them to book the appointment, and how would that work?
For us to get a complete assisted experience, we are going to have to move away from the wake word system.
What is a wake word?
If you have a smart speaker, it is always listening and waiting to hear a wake word or phrase such as “Alexa” or “Hey Google”. While the device is waiting for this wake word, it doesn’t send any data to the cloud; all processing is done on the device itself. Once you “wake” the device, the audio that follows the wake word is sent to the cloud for processing and interpretation.
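The gating behaviour described above can be sketched in a few lines of Python. This is a toy model working on text rather than audio, and the class name `WakeWordGate`, its methods and the one-command-per-wake rule are assumptions made for illustration, not how any real smart speaker is implemented.

```python
from dataclasses import dataclass, field
from typing import List

WAKE_WORDS = ("alexa", "hey google")  # example wake phrases from the text


@dataclass
class WakeWordGate:
    """Toy model: utterances stay on the device until a wake word is heard."""
    awake: bool = False
    sent_to_cloud: List[str] = field(default_factory=list)

    def hear(self, utterance: str) -> None:
        stripped = utterance.strip()
        text = stripped.lower()
        if self.awake:
            # The device was woken by the previous utterance, so forward this one.
            self._send(stripped)
            return
        # Local-only check: nothing leaves the device in this branch.
        for wake in WAKE_WORDS:
            if text.startswith(wake):
                self.awake = True
                command = stripped[len(wake):].lstrip(" ,")
                if command:
                    self._send(command)
                return
        # No wake word: the utterance is simply dropped on the device.

    def _send(self, command: str) -> None:
        # Stand-in for the upload to a cloud service; here we only record it.
        self.sent_to_cloud.append(command)
        self.awake = False  # assumption: one command per wake


gate = WakeWordGate()
gate.hear("What should we have for dinner?")     # ignored, no wake word
gate.hear("Alexa, set a timer for ten minutes")  # forwarded without the wake word
gate.hear("I don't want pizza")                  # ignored again
gate.hear("Hey Google")                          # wakes the device
gate.hear("call a friend")                       # forwarded
```

After these calls, `gate.sent_to_cloud` holds only the two commands that followed a wake word; the rest of the conversation never left the device.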
DON’T freak out; moving away from the wake word does not mean that all your family conversations have to be recorded and stored in the cloud. There are several options for implementation, but three of the most likely are:
1) Adding more intelligence to the device itself, making your smart device even smarter and able to do more locally.
2) Adding a background listening mode where you must wake the device but can then place it into a listening mode for 10, 30 or 60 minutes before it returns to listening only for the wake word locally.
3) Devices are always listening, but data is evaluated in the cloud and deleted seconds later.
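As a rough illustration of option 2, the timed listening mode can be modelled as a small state holder. The names `ListeningMode`, `start` and `is_listening` are hypothetical, and time is passed in as plain seconds so the behaviour is easy to follow; a real device would read a clock instead.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_WINDOWS_MIN = (10, 30, 60)  # the durations mentioned in option 2


@dataclass
class ListeningMode:
    """Hypothetical timed listening window that expires back to wake-word-only."""
    listening_until: Optional[float] = None  # seconds; None means wake-word-only

    def start(self, now: float, minutes: int) -> None:
        if minutes not in ALLOWED_WINDOWS_MIN:
            raise ValueError(f"window must be one of {ALLOWED_WINDOWS_MIN}")
        self.listening_until = now + minutes * 60

    def is_listening(self, now: float) -> bool:
        # Once the window has passed, fall back to wake-word-only mode.
        if self.listening_until is not None and now >= self.listening_until:
            self.listening_until = None
        return self.listening_until is not None


mode = ListeningMode()
before = mode.is_listening(now=0)      # False: the device has not been woken
mode.start(now=0, minutes=10)
during = mode.is_listening(now=300)    # True: five minutes into the window
after = mode.is_listening(now=600)     # False: the ten minutes are up
```

The point of the design is that the default state is always the private one: the device only listens more broadly for a window the user has explicitly asked for.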
Now that our device is listening, processing and understanding our conversation, how does our voice assistant interject without it sounding strange? One route is that the assistant doesn’t interject at all but must be asked. Imagine this dinner planning conversation.
“What should we have for dinner?”
- What should — Question
- We — more than one person, try to identify different voices
- Dinner — meal/food request — review what is in the Fridge, Freezer, local takeaways and delivery services
“I don’t want pizza”
- I don’t want — negative from the person speaking
- Pizza — food, remove from food options, log that speaking person doesn’t like pizza for future
“I am hungry now”
- I am — the person speaking
- Hungry now — reduce options to those that can be ready within 20 minutes
“I don’t have any cash on me”
- I don’t have — negative from the person speaking
- Any cash on me — remove options that require cash payments or prompt to ask others in the group if anyone has cash to pay
“Something that is not spicy”
- Not spicy — remove any spicy dishes
“What about chicken?”
- What about — suggestion/idea
- Chicken — contact chicken places for delivery times based on recent orders, lookup chicken cooking time, it takes 15–20 minutes to cook, is there chicken in the fridge
“Voice assistant, what should we eat?”
You can get a chicken wrap from Chicken & Go charged to your account, and it can be here within 22 minutes, or there is a chicken in the fridge (which needs to be eaten by tomorrow) with peas from the freezer and rice in the cupboard.
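The dinner conversation boils down to a set of filters applied to a list of candidate meals, with the chicken suggestion used for ranking. Here is a minimal sketch of that idea. The `MealOption` fields, the `suggest` function and most of the candidates are invented for illustration; only the Chicken & Go wrap and the chicken in the fridge come from the conversation. The time limit is a parameter, set to 25 minutes here as an assumption, because the assistant's answer includes a 22-minute delivery that a strict 20-minute cutoff would exclude.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class MealOption:
    name: str
    dish: str
    minutes: int     # time until the food is ready or delivered
    cash_only: bool
    spicy: bool


# Hypothetical candidates gathered from the fridge, cupboard and local takeaways.
OPTIONS = [
    MealOption("Pizza delivery", "pizza", 25, False, False),
    MealOption("Curry house", "curry", 40, True, True),
    MealOption("Chicken & Go wrap", "chicken", 22, False, False),
    MealOption("Chicken from the fridge", "chicken", 20, False, False),
    MealOption("Spicy chicken wings", "chicken", 15, True, True),
    MealOption("Pasta from the cupboard", "pasta", 15, False, False),
]


def suggest(options: List[MealOption], disliked: Set[str], max_minutes: int,
            allow_cash: bool, allow_spicy: bool, preferred: str) -> List[MealOption]:
    """Apply the constraints heard in the conversation, then rank the rest."""
    remaining = [
        o for o in options
        if o.dish not in disliked                 # "I don't want pizza"
        and o.minutes <= max_minutes              # "I am hungry now"
        and (allow_cash or not o.cash_only)       # "I don't have any cash on me"
        and (allow_spicy or not o.spicy)          # "Something that is not spicy"
    ]
    # "What about chicken?" is a suggestion, so matching dishes are ranked first
    # rather than everything else being removed.
    return sorted(remaining, key=lambda o: (o.dish != preferred, o.minutes))


names = [o.name for o in suggest(OPTIONS, disliked={"pizza"}, max_minutes=25,
                                 allow_cash=False, allow_spicy=False,
                                 preferred="chicken")]
# names == ["Chicken from the fridge", "Chicken & Go wrap", "Pasta from the cupboard"]
```

Each sentence in the conversation only ever narrows or reorders the list, which is why the assistant can stay silent until it is asked and still have an answer ready.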
Alternatively, our voice assistant can participate in the conversation, offering information and ideas. Let’s explore the process of planning a holiday over several weeks of interactions before booking.
“We need to start planning the summer holiday”
- We — more than one person
- Holiday — break away
- Summer — the time period
- Planning — at the start of the task
“Where should we go?”
- Where — global options
VA — “Will it be for you both and the kids?” — “Yes”
- With kids — Less than 3 hours flying places
- Accommodation filter on kids club or family suitable
“Somewhere hot would be nice”
- Somewhere — filter time period to UK summer holiday of July — September
- Hot — identify UK average summer temperature + 20% warmer
“Half board or all-inclusive would be ideal, so the kids are sorted”
- Half board or all-inclusive — filter accommodation by board status
- kids sorted — prioritise locations with kids clubs.
VA — “What is your group budget” — “£2k all in”
- Filter half-board packages to those that leave a spending money buffer based on the country's average spend
- Allow all-inclusive packages to use the whole budget
“What about Spain or around there?”
- Filter locations by Spain and available locations in neighbouring countries
VA — “How does Spain or the south of Italy sound?” — “great”
- Monitor these packages for price changes
… A week later
“I checked with work, and we can go on the summer holiday either the first week of August or the last week”
- Update date range for the first week and last week of August
… A month later
“I think we should go for 2 weeks to Barcelona”
“Voice assistant, is Barcelona still possible, if so book it”
VA “holiday booked”
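The holiday conversation can be read as a brief that the assistant builds up over weeks and then applies to the available packages. The sketch below shows one way that might look. Everything in it is hypothetical: the `Package` and `HolidayBrief` types, the package data, and the placeholder UK summer average of 20°C used for the "20% warmer" rule. Travel dates are left out to keep it short, and the kids club is treated as a hard requirement for simplicity, although the conversation only asks to prioritise it.

```python
from dataclasses import dataclass
from typing import List, Tuple

UK_SUMMER_AVG_C = 20.0  # placeholder assumption, not a measured figure


@dataclass(frozen=True)
class Package:
    destination: str
    country: str
    flight_hours: float
    avg_temp_c: float
    board: str             # "half board", "all-inclusive" or "self-catering"
    kids_club: bool
    price_gbp: int
    spend_buffer_gbp: int  # estimated spending money for a half-board stay


@dataclass
class HolidayBrief:
    """Constraints collected from the conversation so far."""
    max_flight_hours: float = 3.0                     # "with kids"
    min_temp_c: float = UK_SUMMER_AVG_C * 1.2         # "somewhere hot"
    boards: Tuple[str, ...] = ("half board", "all-inclusive")
    countries: Tuple[str, ...] = ("Spain", "Italy")   # "Spain or around there"
    budget_gbp: int = 2000                            # "£2k all in"

    def matches(self, p: Package) -> bool:
        # Half board needs a spending buffer; all-inclusive can use the whole budget.
        total = p.price_gbp + (p.spend_buffer_gbp if p.board == "half board" else 0)
        return (
            p.flight_hours < self.max_flight_hours
            and p.avg_temp_c >= self.min_temp_c
            and p.board in self.boards
            and p.country in self.countries
            and p.kids_club
            and total <= self.budget_gbp
        )


PACKAGES: List[Package] = [
    Package("Barcelona", "Spain", 2.2, 28.0, "half board", True, 1600, 350),
    Package("Costa del Sol", "Spain", 2.9, 30.0, "all-inclusive", True, 2100, 0),
    Package("Puglia", "Italy", 2.8, 29.0, "all-inclusive", True, 1950, 0),
    Package("Cornwall", "UK", 1.0, 19.0, "self-catering", False, 900, 400),
]

brief = HolidayBrief()
shortlist = [p.destination for p in PACKAGES if brief.matches(p)]
barcelona_possible = "Barcelona" in shortlist  # checked before acting on "book it"
```

With this made-up data the shortlist is Barcelona and Puglia, so when the final "is Barcelona still possible, if so book it" arrives, the assistant only has to re-run the check and place the booking.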
Would people book a holiday with a simple voice command of “book it”? Good question, and the answer is yes. It is just a matter of when. Ten years ago, people would never have thought you would buy a car, book a hotel or move money, all from your mobile phone. Yet not only do we do all of these, it is becoming the preferred and more convenient way to do so.
We will get more and more comfortable with voice purchasing. It starts with toilet roll before quickly moving to an evening takeaway, then a taxi, a hairdresser’s appointment, maybe some concert tickets, a night away, the week’s food shopping, and before long the family summer holiday.
In the not too distant future, these types of conversations will become commonplace and will unlock many exciting possibilities. There are still many years of development ahead of us, though, including more human-sounding voices, voice identification, emotional understanding, better voice recognition accuracy and developer tools to build these complex conversational experiences.
What is voice like in your 2030 future?