CEO Diary #5: Future of Work — Dictation

I use my phone every day, and over the years it has become the primary device I do business on. Most of my work consists of responding to emails, reviewing documents, preparing new presentations, and sharing ideas with people through different apps. About 5% of my work is design and coding, which is something I can’t currently do from my phone.
I only ever go back to the computer for three reasons:
- To be able to use a keyboard to type something longer
- To have a larger screen to look at things like spreadsheets
- To do computer-specific things like designing and coding
The first can be solved simply with a Bluetooth keyboard, which I wrote about here, and which improved my productivity dramatically.
For the second, something like an iPad sounds like a good solution.
The third is harder to address given the current state of mobile operating systems.
I’ve found dictation to be a really fast input method. It’s faster than typing on a keyboard, and it gets more accurate with every iteration of a mobile operating system. That got me wondering: what would working on an iPad with a good dictation interface be like?
For the rest of this article, let’s assume that dictation is a superior mode of input.
Problem
Currently, to use dictation on iOS you either have to invoke Siri in a limited number of use cases, such as sending a message with an app that has integrated Siri support, or you have to open the keyboard, tap the dictation button, and speak.
There’s no simple systemwide solution that lets you dictate anything you want and place it into any input field you see on the screen.
So, what would an iPad interface centered around dictation look like?
Solution
Imagine an overlay that always sits on top of the system but becomes visible only when you perform a certain action. That action could be a voice command or a specific way of touching the screen.
The touch interaction would have to be something that doesn’t interfere with existing system gestures, such as holding three fingers on the screen. The voice command could be a phrase like “start dictation”.
When you perform this action, the screen would go dark and show the dictation results as you speak. When you’re done, you could drag and drop the dictated text into any input field you see onscreen.
That way, dictation would be independent of whichever keyboard or application is currently in use.
As you can see in the video below, touching the screen with three fingers activates dictation, and then you can speak and see your words appear onscreen.
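For a sense of how this could be prototyped inside a single app, here is a minimal Swift sketch, assuming iOS 11’s Speech framework: a three-finger long press dims the screen and streams live transcription from SFSpeechRecognizer. The class and method names are hypothetical, permission handling is reduced to a stub, and, since third-party apps can’t draw a true systemwide overlay, the dimmed layer only covers the app’s own window.

```swift
import UIKit
import AVFoundation
import Speech

// Hypothetical sketch, not a full implementation.
final class DictationOverlayViewController: UIViewController {

    private let transcriptLabel = UILabel()   // dimmed overlay showing live results
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    override func viewDidLoad() {
        super.viewDidLoad()

        // Requires NSMicrophoneUsageDescription and
        // NSSpeechRecognitionUsageDescription in Info.plist.
        SFSpeechRecognizer.requestAuthorization { _ in }

        // A three-finger long press stands in for the "start dictation" action.
        let trigger = UILongPressGestureRecognizer(target: self,
                                                   action: #selector(handleTrigger(_:)))
        trigger.numberOfTouchesRequired = 3
        view.addGestureRecognizer(trigger)

        transcriptLabel.frame = view.bounds
        transcriptLabel.numberOfLines = 0
        transcriptLabel.textColor = .white
        transcriptLabel.backgroundColor = UIColor(white: 0, alpha: 0.8) // darkened screen
        transcriptLabel.isHidden = true
        view.addSubview(transcriptLabel)
    }

    @objc private func handleTrigger(_ gesture: UILongPressGestureRecognizer) {
        switch gesture.state {
        case .began:
            transcriptLabel.isHidden = false
            try? startDictation()
        case .ended, .cancelled:
            stopDictation()
        default:
            break
        }
    }

    private func startDictation() throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        recognitionRequest = request

        // Stream microphone audio into the recognition request.
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        // Display partial results onscreen as the user speaks.
        recognitionTask = recognizer?.recognitionTask(with: request) { [weak self] result, _ in
            guard let text = result?.bestTranscription.formattedString else { return }
            DispatchQueue.main.async { self?.transcriptLabel.text = text }
        }
    }

    private func stopDictation() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionTask = nil
    }
}
```

A long press, rather than a plain tap, keeps the trigger from firing on accidental three-finger touches.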
Real-world demo
We put together a simple application that takes the dictated text and displays it in a list onscreen. You can then drag and drop a listed item into any other application running in multi-window mode on the iPad, as shown in the sketch below. This is enabled by iOS 11’s cross-application drag-and-drop feature. The application lives on the left or right side of the screen, or as an overlay.
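Here is a similarly hedged sketch of the drag-source side of such a list, not our exact implementation: iOS 11’s UITableViewDragDelegate wraps each dictated phrase in an NSItemProvider, which is what lets another app in Split View accept the drop as plain text. The class name and the phrases store are placeholders.

```swift
import UIKit

final class DictationListViewController: UITableViewController, UITableViewDragDelegate {

    var phrases: [String] = []   // dictated snippets collected so far

    override func viewDidLoad() {
        super.viewDidLoad()
        // Opt in as a drag source (iOS 11+; cross-app drags work on iPad).
        tableView.dragDelegate = self
        tableView.dragInteractionEnabled = true
    }

    // MARK: Data source

    override func tableView(_ tableView: UITableView,
                            numberOfRowsInSection section: Int) -> Int {
        return phrases.count
    }

    override func tableView(_ tableView: UITableView,
                            cellForRowAt indexPath: IndexPath) -> UITableViewCell {
        let cell = tableView.dequeueReusableCell(withIdentifier: "Cell")
            ?? UITableViewCell(style: .default, reuseIdentifier: "Cell")
        cell.textLabel?.text = phrases[indexPath.row]
        return cell
    }

    // MARK: UITableViewDragDelegate

    func tableView(_ tableView: UITableView,
                   itemsForBeginning session: UIDragSession,
                   at indexPath: IndexPath) -> [UIDragItem] {
        // Wrap the phrase as plain text so any app that accepts a text drop
        // (Mail, Notes, ...) can receive it.
        let provider = NSItemProvider(object: phrases[indexPath.row] as NSString)
        return [UIDragItem(itemProvider: provider)]
    }
}
```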
Privacy concerns
Dictation on the iPad comes with a caveat: recordings of everything you dictate are transmitted to Apple’s servers for processing. The data is anonymized and encrypted, but Apple still receives it.
Implications for office spaces
What I find intriguing is how office spaces will need to change to accommodate this kind of input. There are private messages you wouldn’t want to say out loud, and messages that contain privileged information. An enclosed environment for each person is one simple solution. That is, however, leaps and bounds away from today’s open-plan offices, and it does nothing to promote teamwork.
You could argue that environments such as call centers already have multiple people talking over each other. But those are highly structured conversations that follow a decision tree, and most of the private information is never said out loud; it is only verified with the person on the other end of the call.
We can see this sort of interface working in the movie Her, where people sit quite far from each other in the office.
One possible way around these issues is lip-reading technology that would let you dictate text without actually producing a sound.
A more down-to-earth alternative would be noise masking or active noise cancellation to create personal private spaces.
Limitations
The application we created is limited by what iOS 11 currently allows. Apple also restricts use of the dictation API per device and per app. It is therefore unlikely that such an app would be allowed in the App Store, because it mimics native functionality. The true solution would have to be a systemwide rethinking of the operating system’s inputs.
