Task-oriented and information-oriented behaviours are different
There is a specific, and in my view gigantic, gap between task-oriented behaviours and information-oriented behaviours on the Internet.
Granted, they frequently happen in the same interface – you gather information from a BuzzFeed article, you complete the task of responding to it by clicking the heart icon immediately underneath. The actions are interdependent but still discrete.
(I’m playing a little bit loose with the term ‘information’ here – I’m using it as a placeholder for the wash of images and text usually swept up under the infinitely more brutal ‘content’. ‘Content’, in my view, is a word that accurately describes what the stuff is with no regard for what it does or why people seek it out – it’s like describing food as ‘matter’, it’s technically true but doesn’t tell you anything useful.)
There are two kinds of task-oriented behaviour
- Tasks where we give instructions (eg setting a ten minute timer)
- Tasks where we take instructions (eg a satnav telling us to turn left, or a line in a recipe telling us to add eggs)
There are two kinds of information-oriented behaviour
- Information gathering (like you, reading this article now)
- Information creation (like me, writing it)
How you do both task behaviours and information behaviours with machines will eventually boil down to which is faster for you, not the machine
For generating words, speaking is just plain faster than typing. For giving task-oriented instructions, or creating text information like this (yes, I voice dictated most of this article, with laryngitis no less), voice is faster.
But the majority of people who are literate can read at far faster speeds than they can listen at. Engaging with text-based information visually also lets you skim across large quantities of information and dive in where you need to, something that the enforced linear structure of the spoken word does not allow you to do.
On a related but separate note, heard information is explicitly restricted to human languages and culturally determined sound effects (a timer sounds different to a phone ringing) – we cannot interpret a picture of an adorable hedgehog using sound.
Determining whether sound or vision will result in a faster interaction when human language is the primary means of communication is dependent on what kind of interaction it is:
- Task/giving instructions: probably speaking
- Task/taking instructions: probably reading
- Information/gathering information: probably reading
- Information/creating information: probably speaking
We have always used voice
There is absolutely no question in my mind that voice interfaces will be the next wave of gigantic change in how people interact with machines.
We will probably, in retrospect, look back at the era of keyboards, mice, and even early touchscreens as a kind of hiatus: a break in an otherwise continuous chain of human communication in which voice was always the foundation, and text and imagery (neither of which, it should be noted, is culturally universal) were supplementary.
My expectation is that we will look at the integration of voice into our interactions with machines as a reversion, back to the world as it was before the universalisation of computers, when most human communication was voice based, and text and images were only used for the tasks and information behaviours (especially information storage) that voice didn’t work for.
This article is in response to http://www.niemanlab.org/2017/09/the-future-of-news-is-humans-talking-to-machines/, which I recommend you read.