Alexa is not for your pet cyborg to reply “I’ll be back”
There is man, there is machine, and there are the combinations of interaction between them.
- Man to man: evolved over millennia, is learnt, and manifests as language (speech, expression and symbols). It is high bandwidth (quick exchange), of medium accuracy (participants can distort information), consumes little power (food) and delivers very versatile content (feelings, instructions, wisdom, subtleties, reason, manipulation, bias etc.).
- Man to machine: evolved over decades, is a discrete (binary) instruction set and manifests as clicks, buttons, keys, switches, taps and touches. It is low bandwidth, of high accuracy (hardcoded), consumes a lot of power (servers) and delivers versatile content (pictures, text, voice, signals/states/flags).
- Machine to machine: man-to-machine interaction (the second item above) made this possible almost instantly, figuratively speaking.
Why the rush to make a machine comprehend a human language, I ask. It needs a lot of power and an extra-large training set. What fun is it to talk to a machine in your own language, unless you like hearing it say “I’ll be back”?
Discovery is inherent in how we use the Internet, and that has made the search engine its front page. Internet sites (consumer, travel, financial, blogs etc.) spend on SEO/ASO (mind-numbingly stupid work, but it works) to please Google, so that it features them in its search results. Incumbents pay Google a ton of money to appear at the top.
The fight for voice is the fight for a new interface to a subset of actions on the Internet. A large part of shopping, for instance, is fully determined (very little incremental discovery needed): buying the cheapest ticket to Bangalore, or buying a carton of Coke. So is turning off the lights at home. Amazon potentially has the largest training set for voice instructions (customer calls).