Voice UX: Busting The Myth To Save The Soul
The unplanned and reportedly rapid growth of the Amazon Echo line of products, followed by a string of similar devices from Google, Samsung, and a number of other brands over the last couple of years, has generated a crescendo of apparent public enthusiasm for “Voice First” appliances.
The particular hype around the Amazon and Google voice gadgets has handed an unexpected spotlight to marketers, self-styled specialists, and voice-UX advocates trying to make sense of this emerging consumer market segment. As of now, only a few usage-related surveys have surfaced, and their methodology is unknown, so they have little or no value for any serious analysis. Even the figures for the “actual” number of units sold by each vendor cannot serve as an objective measure, given that their ad hoc leaks follow market-arousal tactics rather than aim at better public knowledge. Additionally, the substantial absence of a generally accepted conceptual framework for the field makes it difficult to read and interpret the ongoing trends correctly. Nor do the promotional technology articles published almost daily add real insight, given that they are often drafted by clueless staff writers. The end result is the lack of a comprehensive analytical perspective, a greatly needed big picture.
The emerging reality
While awaiting the start of more systematic field studies, we can try to organize the current fragments of information by elaborating a few temporary but useful concepts. We assume that the reader is sufficiently familiar with basic Amazon Alexa and Google Home terminology.
In our view, a good starting point is the “skills,” as they are called in Amazon’s platform jargon. According to the official sources, as of this writing, skills can be defined as the platform’s expandable, task-oriented capabilities that allow users to interact with Alexa-enabled devices more intuitively by voice. Currently, Alexa’s feedback is mainly audio, only partially supplemented by visual cards displayed inside its companion mobile app.
If we survey the skills extensively, we can observe essentially three high-level categories of voice-based user experience, which overlap in a number of cases:
- Conventional Voice Command (CVC) skills, enhanced by Natural Language Processing (NLP).
- Interactive Custom Radio (ICR) skills, which include audio pointcast of entertainment, news, economy, culture, education, sports, games, and other similar topics.
- Ambient Intelligence Gateway (AIG) skills, which cover all the capabilities that let users interact with the physical environment of their home.
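The overlap among the three categories can be made concrete with a small sketch. The Python below is purely illustrative: the `SkillCategory` flags mirror the three categories above, but the example skill names and the categories assigned to them are hypothetical, not drawn from the actual Alexa catalog.

```python
from enum import Flag, auto


class SkillCategory(Flag):
    """The three high-level categories; a Flag lets one skill carry several."""
    CVC = auto()  # Conventional Voice Command
    ICR = auto()  # Interactive Custom Radio
    AIG = auto()  # Ambient Intelligence Gateway


# Hypothetical skills, invented for illustration only.
SKILLS = {
    "daily_news_briefing": SkillCategory.ICR,
    "living_room_lights": SkillCategory.AIG | SkillCategory.CVC,
    "kitchen_timer": SkillCategory.CVC,
}


def skills_in(category: SkillCategory) -> list[str]:
    """Return the names of skills tagged with the given category, sorted."""
    return sorted(name for name, cats in SKILLS.items() if category in cats)


def overlapping() -> list[str]:
    """Return the names of skills that belong to more than one category."""
    return sorted(
        name
        for name, cats in SKILLS.items()
        if sum(1 for c in SkillCategory if c in cats) > 1
    )
```

Modeling the categories as combinable flags rather than mutually exclusive labels reflects the point that a single skill, such as a voice-controlled light, can sit in two categories at once.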
Conventional Voice Command
Even considered by itself, the Conventional Voice Command (CVC) category could retain some relevance, however debatable, despite its large overlap with the other two categories. In fact, it seems a good idea to keep it as a residual container for all the skills that do not fall into the other categories. Additionally, this class of capabilities, even though in direct competition with smartphones, could probably assume some importance over time (perhaps mainly for certain demographics such as aging adults) as an extra home-based access point for Digital Assistants and Digital Advisors in a number of fields that would eventually include conversational e-commerce, home-based healthcare, domestic legal advice, and personal and household calendar management.
Interactive Custom Radio
The Interactive Custom Radio (ICR) category is not only the quantitatively predominant one among the skills (it probably makes up more than 90% of the current skills corpus), it is also the only one that has displayed, and still displays, a very fast growth rate. Such performance seems consistent with Amazon’s marketing goals of flooding the market segment, feeding the media hype, and possibly wreaking havoc on any emerging competition. Moreover, we may also consider other driving factors on both the developers’ and the consumers’ end. Since this is an emerging platform, many developers understandably tend to focus on skills that are easier to design and implement, both technically and in terms of available input feeds. All of that also helps to significantly shorten time to market (TTM). On the users’ side, ICR skills often gain rapid acceptance because they reconnect with consumers’ decades-long habit of listening to traditional radio broadcasts.
Ambient Intelligence Gateway
The truly original and most interesting category, with tremendous future possibilities, is the Ambient Intelligence Gateway (AIG) class of skills. These skills are opening the way towards a Sentient Environment, where sensing technologies, data processing, and supporting middleware fuse to generate and maintain a representation of physical space in terms of a world model, allowing shared perception between computing devices and people. To better contextualize the AIG category of capabilities, let’s imagine the four abstract layers of any Sentient Environment, from the bottom up:
- Hardware (HW)
- Firmware (FM)
- Software (SW)
- Cogniware (CW)
The fourth conceptual layer, which we may call Cogniware, is where the overall ambient embedded intelligence resides. With the development of the so-called “connected home” (also called “smart home,” “home automation,” and “domotics”), we are actually taking the very first, basic steps towards the establishment of a domestic Sentient Environment as described above. The AIG category of skills would belong specifically to the Cogniware layer.
In their current release, the AIG capabilities are almost exclusively limited to voice commands that are either direct or conditionally triggered (through services such as IFTTT). Compared to the old-fashioned Interactive Voice Response (IVR) model, the present use of a Natural Language User Interface (NLUI) offers increasing linguistic flexibility. However, we are still substantially in a voice-based equivalent of the decades-old Command-Line Interface (CLI) phase of computing. Nevertheless, the most important intrinsic property of the voice user experience, namely invisibility, helps create a context somewhat similar to what the specialized literature calls a Natural User Interface (NUI). The latter induces the feeling of an acquired “Shamanic” empowerment that would allow any user to connect to, and act upon, the surroundings through spoken words.
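The CLI analogy, and the distinction between direct and conditionally triggered commands, can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the phrases, the `Home` state, and the rule format are all hypothetical, and nothing here reflects the real Alexa or IFTTT APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Home:
    """Toy model of the home state (hypothetical, for illustration)."""
    lights_on: bool = False
    thermostat_c: float = 20.0


# Direct commands: a fixed phrase maps straight to an action, much like
# a command name typed at a command-line prompt.
DIRECT: dict[str, Callable[[Home], None]] = {
    "turn on the lights": lambda h: setattr(h, "lights_on", True),
    "turn off the lights": lambda h: setattr(h, "lights_on", False),
}

# Conditionally triggered commands: "if this, then that" style rules that
# run after a spoken trigger only when their condition holds.
# Each rule is (trigger phrase, condition, action).
RULES: list[tuple[str, Callable[[Home], bool], Callable[[Home], None]]] = [
    (
        "good night",
        lambda h: h.lights_on,
        lambda h: setattr(h, "lights_on", False),
    ),
    (
        "good night",
        lambda h: h.thermostat_c > 18.0,
        lambda h: setattr(h, "thermostat_c", 18.0),
    ),
]


def handle(utterance: str, home: Home) -> bool:
    """Dispatch one utterance; return True if any action ran."""
    phrase = utterance.strip().lower()
    acted = False
    if phrase in DIRECT:
        DIRECT[phrase](home)
        acted = True
    for trigger, condition, action in RULES:
        if phrase == trigger and condition(home):
            action(home)
            acted = True
    return acted
```

In this toy version, any phrasing outside the fixed tables (for example, “could you dim things a bit”) simply falls through unhandled, which is the kind of command-line rigidity the comparison points at.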
The AIG skills still have a long way to go before they mature and transform enough to merge into a Ubiquitous Access Layer (UAL), that is, a maturing Mediated Reality (MR) ecosystem, and “dissolve” completely into the context of people’s daily lives. Additionally, we have yet to see how these skills will integrate with other developing interactivity models such as gesture control. The future of the unfolding Sentient Environment, particularly its Cogniware layer (Embedded Intelligence), appears definitely promising and wide open to exciting new developments. The current voice interactivity feature is only the first step in a long march on a rocky road full of hills and cliffs.