Why I changed my mind on Voice Interfaces — but won’t be building my own

James Whitman
9 min read · Dec 30, 2016

--

Being a total tech head, I usually find myself wanting to invest in new tech relatively early in the adoption process. Working in product, I find it interesting when a product invests heavily in delighter features (see the Kano Model), as I really want to try these features out. But I have just never been that fussed about voice interfaces.

I bought a 3D TV once they became affordable and I must have used the 3D capability a handful of times. I invested in a Smart TV when they first came along too, only it’s pretty dumb now as it cannot connect to any services, and even when it could, the apps regularly crashed. I could list any number of things I’ve spent my hard-earned money on, only to find that after a while the shine started to come off and I got stuck in the Trough of Disillusionment (my most recent one is smartwatches; maybe I will write about that separately).

The Hype Cycle from Gartner

The problem was, some of these delighter features were not properly thought through and ended up disappointing. Luckily, in the case of my 3D TV, a feature I didn’t know it had, and which wasn’t advertised, is that it hooks up nicely to everything else I own (Samsung had invested in performance needs) and the TV communicates both ways with pretty much anything over HDMI, so it’s not likely to be replaced anytime soon, as I use those features every day.

I love the opportunity to invest in new things, but as I get older I find that the sensible boring side of me tells my brain to spend money on more sensible things.

But…

As with any slightly impulse-driven geek, I ignored that sensible inner voice recently and bought myself an Amazon Echo Dot at the start of December. I thought it’d be handy to have another music speaker in the house, and I could finally retire the dated one I bought for the kitchen back in 2006.

I actually was not all that fussed about the voice control. I sometimes used voice control on my smartwatch whilst driving to call people, but it was not a feature I purchased it for. Nonetheless, it was one of the additional features that made me choose to spend £50 instead of £40 on a Bluetooth speaker of similar audio quality.

When it turned up I did the typically geeky thing and used it solidly for about 3 hours, asking it to play music, tell me jokes and tell me the weather. I hooked it up to my calendar, my Uber account, my Nest, my Amazon account and most of the other services it can connect to that I use. It then went into the kitchen, hidden away above the kitchen units, and became my music player for the kitchen, ready to be used for occasional music.

It was £50 I didn’t really need to spend, and I was starting to get that feeling I usually get after spending money on a new tech toy: “James, you could have spent that money elsewhere!” I was headed for disillusionment again.

I have been looking at conversational interfaces for a while now and thinking about when to invest in one. The product I manage is one that would work really nicely with one, but I honestly thought that, for the most part, it was just going to be another fad that would pass after a period of time, and that we were better off investing the product’s time elsewhere until the future of this type of interface became clearer.

I’ve always been on the fence about voice interfaces and had pretty much refused to invest in one. Personally, I’ve always thought that we’d need a lot more investment in conversational interfaces first, as I saw voice as just an evolution of the conversational interface.

But my experience with the Dot at Christmas changed my mind and showed me that voice interfaces are not just an evolution of conversational ones, and that they’re worth investing in sooner than I thought.

Why…

This Christmas I had the honour of hosting my family. I live alone, and my girlfriend was spending the day with her family, so whilst I was hosting I also had to cook and prepare a 3 course Christmas dinner on Christmas Day and generally look after people from the 24th until the 26th.

Typically on Christmas Day, I’d end up being quite a bad host, having to cook everything along with keeping people topped up with drinks and snacks. This year I disappeared into the kitchen at about 11 and hoped that people would generally be OK.

Whilst in the kitchen, I told the Dot to play me some Christmas music (I actually just said “Alexa, play me some Christmas music”) and it promptly went off and found some for me to listen to. My grandparents then wanted music in the lounge, so I turned on the Bluetooth speaker and told the Dot to connect to it.

Instantly, the music I was playing in the kitchen started playing in the lounge. My Nan, bemused, walked in and asked whose voice it was on the speaker. I told her it was called Alexa, and that she could ask it questions and tell it to perform tasks so long as she prefaced the request with the name Alexa.

What followed amazed me. My grandparents are hardly adopters of new technology; I’ve only just about got my Nan on a smartphone (even though the internet is turned off on it). Yet they spent the whole day choosing the music for themselves (although Cliff Richard was played a few too many times that day because they had control).

I also used it numerous times during the day to help me to cook dinner. I told Alexa to set timers, add things to a to-do list and read my to-do list back to me.

It actually made hosting a total breeze because it automated some of my hosting duties (the entertainment) and freed my hands up by allowing me to use my voice whilst holding trays full of various parts of dinner.

But I still wasn’t convinced that voice interfaces were all that great; that was only one day, I thought. It was quite useful and yes, I was impressed by the service’s ability to find the music being asked for, but it hadn’t really changed my mind.

The following day, I’d slept in, probably from consuming too much port cheese the night before. My family were already awake and in the lounge but had gotten cold. My Nest hadn’t turned my heating on, as I would usually be at work, but they had turned the heating on themselves via Alexa, again making my job as host much easier. They were also listening to music again.

The next day, my girlfriend started using it to set timers for the oven whilst we were cooking dinner together, and I started to use it day to day too.

The interesting thing is, it’s actually slightly changed the way I interact with my other devices too.

When I get in my car, my phone connects to the car’s Bluetooth system and, with the current version of Android, when the screen is on I can use the “OK Google” keyphrase to unlock and use the phone by voice. I have now started using voice in my car to set the navigation, find petrol stations or other places I’m looking for, and play music off Spotify, all whilst driving. Previously I would have stopped the car to set a destination on the navigation or choose some music on my phone, but voice just made it easier, meaning I could do it on the go.

So why has all this changed my mind about voice interfaces?

Look, voice is really great when it works, but it’s a total pain when it doesn’t. I’ve tried using voice previously on my iPad, smartwatch and phone for a while, but mostly as a party gimmick, and I often found it frustrating when it was not able to understand me. It’s certainly not been good enough in the past to be something I’d start to use day to day.

But, as much of a first world problem as this is, we are suffering from a serious service overload at the moment. I don’t think I could list the number of services that I am subscribed to. On a weekly basis I use hundreds of them to help simplify my life and make it a little nicer, from keeping track of where I need to be in my Google calendar to listening to music over Spotify at my desk, at home and on the go. I use Uber for all my taxi rides on nights out with friends and I use Evernote to keep track of shopping and to-do lists.

But that is just a handful of the services I interact with, and I often find I am dipping in and out of apps, websites, programs and devices to use them. Sometimes I even forget where I have to go to get to the service I want.

What Alexa showed me was how easy it was for her to interact with these services and do the tasks for me. She’d work out which service was the one that would answer the query or request I gave her. I didn’t need to get my phone out and put Spotify on the Bluetooth speaker, and I didn’t need to write down my oven times or set the timers on my oven or phone. Alexa did it all for me.

Since Christmas Day, I have used Alexa to call me a taxi on Uber. I have it read the weather to me as I’m putting my coat on before I head out, so I know whether to get a scarf or umbrella. I am also now using shopping lists, meaning I never actually run out of the stuff I’d normally run out of accidentally. It also means that when I have guests, they’re fully in control of what music they’d like to listen to, making hosting people so much easier too!

I was wrong about voice interfaces. Voice isn’t just an evolution of conversational interfaces; it’s entirely different. It’s an evolution in service interaction. It’s essentially doing for services what natural language search did for search at large. Voice will allow you to interact with all your services through one easy-to-use service. It’s essentially your services middleman.

But why won’t I be investing yet?

Fundamentally, the reason I won’t be investing yet is that, from where I see it, there are going to be a few big players and we’re going to plug into their services.

Sure, I could invest heavily into a conversational interface and allow you to interact with it by voice, but as far as I see it, a sounder investment would be holding out until a few major players become clear and then investing in integrating with them.

Maybe it’s just the products I manage and have managed in the past that make me feel this way, and if I worked at Google I’d feel differently (I’d definitely feel differently…), but to me it seems a bit foolish to try and do it myself when I can rely on people who build products dedicated to voice to do it for me. I’m also hoping they’ll allow for easy API integration, meaning I can piggyback onto a service their product offers.

Right now it’d be utterly pointless for me to invest in it from scratch, but if a really easy integration came along, and the products I manage could easily work alongside it, I’d be looking at what ROI I could possibly get from using it and putting it into my roadmap.

--

James Whitman

Product Manager & sometimes I write | I believe we can make great experiences and we'll get there with Tech | @whitmaan www.whitmaan.com