Speak and Type: Giving hints to get info to the system faster

My devices understand me better every day (and not just in a “Her” way!). I want them to listen to me while I communicate with them in other ways. I get really excited when input modalities become AND rather than OR.

Feb 10, 2014

What do I mean here? Right now, the voice options on my device are very single-tasked. When I am typing in my IM client, I can hit a button to say “let me talk to you now” and then talk to the system; when I hit “Done”, a spinner eventually (hopefully) gets replaced by text that makes some sense. I can type, or I can speak. I can jump between the two (e.g. to edit something that came back), but this isn’t a nice bidirectional communication between the input modes.

I want to be able to talk and type *at the same time*. Why would I want to do this? Here are some use cases:

Changing modes so I can tap less to get what I want in the text

E.g. if I don’t want auto-correct on, let me quickly disable it by saying “auto-correct off”. If I say this before typing “foo”, the system can leave it alone rather than correcting it to “for”.

I say “caps” mid-sentence and get “yeah I KNOW” without having to go into caps mode via the keyboard.

Another version of modes is making other selections such as changing the color of a “paintbrush” by saying “red” while your finger keeps tap-painting.
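To make the idea concrete, here is a minimal sketch of spoken mode switches sitting alongside typed words. Everything in it is an assumption for illustration: the `Composer` class, its `hear` and `type_word` methods, and the one-entry correction table are hypothetical, and a real keyboard would get its phrases from a speech recognizer rather than from plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Composer:
    """Hypothetical text composer fed by two channels: speech and typing."""

    autocorrect: bool = True
    caps: bool = False
    words: list = field(default_factory=list)
    # Toy correction table, assumed purely for this example.
    corrections: dict = field(default_factory=lambda: {"foo": "for"})

    def hear(self, phrase: str) -> None:
        """Treat a recognized spoken phrase as a mode switch, not as text."""
        phrase = phrase.strip().lower()
        if phrase == "auto-correct off":
            self.autocorrect = False
        elif phrase == "auto-correct on":
            self.autocorrect = True
        elif phrase == "caps":
            self.caps = not self.caps  # toggle caps for the words that follow

    def type_word(self, word: str) -> None:
        """Apply whatever modes are active to a word typed on the keyboard."""
        if self.autocorrect:
            word = self.corrections.get(word, word)
        if self.caps:
            word = word.upper()
        self.words.append(word)

    def text(self) -> str:
        return " ".join(self.words)


composer = Composer()
composer.type_word("yeah")
composer.hear("caps")  # spoken while the fingers keep typing
composer.type_word("i")
composer.type_word("know")
print(composer.text())
```

The point of the sketch is that the voice channel never produces text here; it only changes how the typed words are interpreted, so the fingers never have to leave the letters.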

Hinting to get better translation

E.g. I say some words while typing, especially ones with this annoying feature: two common words that differ by letters sitting next to each other on the keyboard. If I say “dad” but my finger lands closer to “sad”, the system can reason “I heard the ‘d’, so let’s go with dad!”
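One way to picture the “dad” vs. “sad” case is as two weak signals being combined. The sketch below is only an illustration under made-up assumptions: the key positions in `KEY_X`, the Gaussian-style `touch_score`, and the fixed `boost` for a heard letter are all hypothetical, not how any real keyboard works.

```python
import math

# Hypothetical horizontal key centers on the home row, in key-width units.
KEY_X = {"a": 0.0, "s": 1.0, "d": 2.0, "f": 3.0}


def touch_score(touch_x: float, letter: str, sigma: float = 0.6) -> float:
    """Score how plausible it is that a touch at touch_x was aimed at letter."""
    distance = touch_x - KEY_X[letter]
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def pick_word(touch_x, candidates, heard_letter=None, boost=3.0):
    """Pick the candidate whose first letter best fits the touch and the audio.

    heard_letter is an assumed output of a speech recognizer; when it matches
    a candidate's first letter, that candidate's score is multiplied by boost.
    """
    best_word, best_score = None, -1.0
    for word in candidates:
        score = touch_score(touch_x, word[0])
        if heard_letter is not None and word[0] == heard_letter:
            score *= boost
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# The finger lands nearer "s" than "d", so touch alone would pick "sad".
print(pick_word(1.3, ["sad", "dad"]))
# Hearing a "d" at the same moment tips the decision the other way.
print(pick_word(1.3, ["sad", "dad"], heard_letter="d"))
```

Neither signal has to be right on its own; the spoken hint only needs to be strong enough to break the tie between neighboring keys.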

Choosing options

With keyboards such as SwiftKey and Fleksy, I end up typing very differently. It becomes a lot more about making choices based on what they guess is coming next. Good stuff, but I find it can also be annoying to jump between the letters and the options.

Let’s take SwiftKey and its three options. What if, while typing, I could make a choice by saying either the word directly (“Hey”), its position (“Left”/“Middle”/“Right”), or its number (“1”, “2”, “3”)?
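Resolving a spoken choice against a row of suggestions could look something like the sketch below. It is not based on any real keyboard API: the `choose` function and the `POSITIONS` table are hypothetical, and the suggestions are just a plain list of strings.

```python
# Hypothetical mapping from spoken position or number words to slot indexes.
POSITIONS = {
    "left": 0, "middle": 1, "right": 2,
    "1": 0, "2": 1, "3": 2,
    "one": 0, "two": 1, "three": 2,
}


def choose(suggestions, spoken):
    """Return the suggestion picked by a spoken phrase, or None if unclear.

    The phrase is matched against the suggested words themselves first,
    and only then treated as a position ("left") or a number ("2").
    """
    token = spoken.strip().lower()
    for suggestion in suggestions:
        if suggestion.lower() == token:
            return suggestion
    index = POSITIONS.get(token)
    if index is not None and index < len(suggestions):
        return suggestions[index]
    return None  # not a choice; the caller can ignore the utterance


bar = ["Hey", "Hi", "Hello"]
print(choose(bar, "Hey"))
print(choose(bar, "middle"))
print(choose(bar, "3"))
```

Matching the word before the position is a design choice made for this sketch: if a suggestion happened to be “Left”, saying it would pick that word rather than the left slot.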

Gaming

Then you get into other niches such as gaming. While running around it would be nice to be able to say “change weapon axe”, “switch gear 2nd” and the like.

Will this work?

I am curious to see if this mixed mode would work well. We have seen it happen in other parts of our controls.

In computing we have mice and keyboards. Often we use them one at a time, but think about running around in a first-person shooter, where both are in play at once.

Outside of computing, when you are driving you are doing many things at once, including having your feet on pedals to control speed, while your hands steer.

Speaking of driving, I remember that “expert” driving classes involve talking through your actions, and that you can get clarity from talking, so I wonder what effect that will have here.

Won’t it be socially awkward?

In a vacuum this can work well, but what about when you are out in the real world? I use Siri a lot more when alone than when around other people. Not only is it still strange to talk into your phone, but it also happens to work a lot better when you are in an enclosed space without other noises around to distract the system.

I wonder if this will mean that, because you aren’t always talking, you won’t ingrain the habit, and thus will focus more on getting better at tapping than on mixed-mode hints.

Speed matters

I can’t wait to see more and more experiments with mixed mode here. I use a keyboard less and less and through better keyboards I can become pretty efficient, but now and then it is still frustrating to be in a situation where I want to get the info into the system faster than I can.

Maybe we can switch to a direct brain system where we just have to make sense of the craziness of the input?

What do you think?