What is the UI of AI? Is it voice?
Have you heard about this AI thing? Apparently AI is going to change all our lives, and in ways we don’t even know yet. With the rise of AI-based technology and the automation of mundane tasks it will bring, we need to ask ourselves how we will interface with our technology and how the AI will interface with us.
Will we need a visual UI?
In an AI future where we don’t need to order things, and items simply turn up from services like Amazon Prime and Deliveroo, what becomes the role of traditional e-commerce? This is a future where brands have learnt our buying patterns and our preferences for quality, speed and price. They have used our data to predict when we run out of products or, even better, when we are about to. From toilet rolls to underwear to the weekly food shop, with this predictive commerce approach, is there a need for a digital UI to do your shopping?
ASOS stocks 80,000 (Wikipedia) brand and own-brand products you can shop for, but when you have an event you want to buy an outfit for, that catalogue of 80,000 products is reduced to just 1,000 that are your style. Filter for the brands you like and it’s 300; filter again for availability and colour and you’re down to just eight products to choose from.
Instead of you sitting with eight tabs open or swiping left and right deciding which to buy, ASOS can send you all the products identified; you send back the ones you don’t want and are only charged for what you keep.
You can see this kind of prediction today if you use Gmail on your phone with smart replies. We have always had canned responses (your preformatted out-of-office emails are a great example), but now smart replies predict your response based on the email you have been sent, combined with the way you speak (How smart replies work).
I find myself using these more and more when on my phone, as they do tend to sum up my response almost perfectly. Will we get to a point where we don’t need a text area to write an email back, because the AI will offer us a set number of responses honed for the context and conversation we are having? Or will we move to a situation where there is no text area to reply in at all, because I can hear their message and speak to my AI with the single sentence “Let them know that is fine”? The AI will understand my language and tone of voice and construct a smart reply to send back.
Who speaks first?
A lot of AI magic will happen in the background with us mere humans having to interject for key decisions to help signpost the AI down the correct path.
“Yes, send that reply”, “Ok, order two this time”, “Book a taxi every Monday night”.
With today’s voice assistants, that means starting the conversation with “Hey Google” or “Hey Siri”. The alternative is that we let the AI initiate conversations with us; this may not be with the spoken words “Hey Lee” but could be as subtle as a chime. This will occur unprompted whenever the AI needs our input for a task to be completed or moved to the next step. Of course, we can set rules around how this happens to counter privacy concerns, but to get things done efficiently, it is an inevitable outcome that we allow the AI to talk and engage with us first.
When we start the conversation with the AI, we can ask it one of a million different things. That means millions of different possible outcomes, which makes confusion or misinterpretation by the AI more likely. If the AI is asking us, though, it already has the context for the question and can even prime us to help jog our memory.
“Lee, the flight you have booked next month to Dubai, do you want the chicken or fish on the flight out?”
This is much easier than me, on the Sunday night before the flight, having to know to say:
“Is my flight to Dubai ok?” or “Is there a meal option for my flight?”
With these person-led interactions, the AI then has to ask several clarifying questions, such as which flight it is and whether it is the outbound or return leg, all to get the correct context. If the AI initiates the conversation, it already has this context.
Short and sweet
Today there is frustration in the voice space, from both creators and consumers of voice assistants, about the assistants’ current inability to always understand what we are saying. With voice assistants where they are today, the idea of a continuous conversation, or anything more than a 3-prompt conversational flow, is difficult to believe. With the growth in machine learning and speech recognition, it is just a matter of time before we can converse with a digital assistant as we do with our friends, but will we even need this level of conversational depth from our AI?
As voice recognition nears 99% accuracy, short sentences combined with hundreds of other triggers will give the AI enough context to get things done. What is going to matter is the context of how we answer, as that will provide a much greater understanding of what we are trying to say. Much like it is said that 80% of what you say is body language, in a voice world 80% of what you say is in how you say it.
A simple reply of “yes” can mean a lot of different things depending on the tone, sentiment, language and inflexion, as well as the situation the conversation is happening in. Having all this context means the AI can perform our actions without us having to interact much at all.
Is this a good thing?
AI is coming and so is voice. Combined, these two technologies provide us with a much more natural way to communicate with our digital world, and maybe a voice-powered AI world will result in us looking at the physical world around us a bit more, rather than down at a screen.
Voice + AI, what do you think?