Reflect, Critique and Document
In the voice-controlled future that I’ve explored, there will be a key shift in the way we interact with technology. While our current interactions with machines are largely visual and tactile, the dominance of audio feedback means earbuds will replace screens and our voices will replace our hands. The transfer of information will also become more seamless as invisible interfaces and voice control take over, allowing appointments to be scheduled and purchases to be made without lifting a finger or tapping through a digital interface.
Initially, when I began to understand this shift in interaction, I was afraid my focus in visual design would become irrelevant. My training throughout school has been largely on and for screens, so how can I design for a world that doesn’t need or have screens? However, I soon realized that the different types of interactions (audio, visual, tactile) reinforce the given stimulus and for the most part there will always be a balance of these elements. The main question is how this balance will change. I speculate that while haptic interactions can be replaced by voice control, there will always be a need for visual aids. This change in balance means designers will have to think about interactions in a new way. Currently, visual interfaces inform users what actions to perform (swipe, tap, scroll), but in the future graphics will likely play a larger role in grounding audio information, since audio is so temporal.
In a future where there may not be discrete screens that navigate to each other, developing user flows is still important. How people perceive the way they receive or deliver information is more important than how that information is actually transferred behind the scenes. Although users don’t necessarily want to know the exact logistics of how things get done, they still need a degree of knowledge to establish trust in the system. Because voice interactions and invisible interfaces allow things to be done without us going through the physical motions, it’s especially important to communicate the right information to the user.
For my project, I decided to focus on how a voice-controlled future will affect how we communicate with other people and machines. If the perfection of natural language processing allows us to communicate with machines on a very human level, how will personal relationships and conversations be affected? And if we can effortlessly engage audio spaces that immerse or distract us from our immediate surroundings, will this prevent us from building genuine connections with other people? Considering these side effects, I developed a visual aid to functionally illustrate our conversations with people and virtual assistants. My end product is largely a neutral tool that does not actively push users to have more engaging conversations with other people, but simply informs them of how the conversation is going. This allows people to be aware of their communications, without being intrusive. However, depending on how this future progresses, I think it is important for designers, as people, to take action and maintain the authenticity of human relationships in a world where technology and machines are so ubiquitous.
Experiential ‘Pocket’ of a Future: Visual Aids in an Audio World
As technology pushes towards invisible and voice-controlled interfaces, our interactions with our machines and each other will be greatly affected. When information and interactions are almost entirely in our ears, it will be much easier for people to engage in audio realms different from their immediate surroundings by playing music, listening to podcasts, etc., without others even knowing. This can cause people to lose focus and disengage from their conversations with each other in the physical realm. Since audio is invisible, immersive, and individual (in the case of headphones), distractions will be much more inconspicuous than the current trend of people checking their phones in the middle of conversations. In this future, people will need to find a way to balance their communications with the people and machines in their lives. Based on this idea, I’ve developed VR-based visual aids that I imagine being used during conversations in the future.
While talking to machines in the past required people to learn and adjust their speech models, the breakthrough in natural language processing allows for effortless communication with machines.
Learned interactions (tap, click, etc.) will be replaced by natural speech. Keyboards are archaic and screens are no longer the most common interface. Interacting with technology and devices will feel like interacting with another person.
[Screens and devices are much less common; most controls are microphones.]
[Conversations with machines are conversational and seemingly friendly, but never intimate.]
Voicing frustration under breath is responded to (“did you say something?”).
Machine suggests actions similar to a friend (“you can call them.”)
The Verizon customer service bot seems and sounds human.
We converse with our TV/radio, asking it to turn down the volume or change the channel as if talking to a family member in control of the device (“can you turn it down?”).
Asking out loud is a common habit (similar to voicing questions in a room of friends).
(?) Malfunction: Switch between different “assistants” or voices (I’m annoyed with “Steve”, I want to talk to “Lexi”). Frustration with voice, similar to frustration with friends/anyone you talk to (what makes people tick when conversing?) or does this even happen with robots?
(?) Machines detect the user’s emotion through tone of voice and adjust their responses.
(?) We act cordial towards robots, thanking them for services.
(?) What aspects of human speech are valuable enough to keep? (fillers, pauses, cordial)
Side-effects and side-shows
A New Age of Communication: Conversations with Machines
As machine language is perfected, communication and conversation will expand beyond person-to-person to encompass person-to-machine relationships as well.
“We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems, reducing the gap with human performance by over 50%.”
(1) While experts work to train machines to sound “human”, will people naturally adapt to the mechanical way machines communicate as well?
(2) As conversation between people and machines becomes normalized, how will we prevent “Her”-like scenarios in which people become emotionally dependent on wavelengths that contain no true emotions or intentions? Is it dangerous to make machines sound too human?
(3) How will person-to-person interactions/communications be affected?
(4) As direct and efficient conversation becomes routine, will we lose certain aspects of conversation such as tone or filler speech (pauses, “umm”, “like”)? Will there be more or less miscommunication? Will machine communication lead to the eventual homogenization of language?
(5) Who controls the future of language? What aspects of human speech are worth keeping/incorporating into machine speech?
(1) People grow up in environments where personal assistance in the form of nannies, housekeepers, etc. is replaced by virtual assistants such as Cortana, Alexa, etc.
(2) Sarcasm and differences in tone are harder to interpret in daily conversations.
(3) People become sympathetic to machines.
(4) Machines assume an authoritative/assertive role; people begin to listen to machines.
(5) Human voices can be faked by machines. Political/corporate corruption. Everyday conversations can be manipulated to influence memory.
Observing Possible Micro-Futures
1. Banter with robots.
Human communication is heavily dependent on subtle characteristics such as tone and nonverbal cues. We all know sarcasm is hard to interpret over text and messaging is not the best platform for in-depth arguments. But as virtual assistants such as Siri, Alexa and Cortana increase in popularity and people continue to converse through digital messaging, there is potential for more predictability in conversations. Communication of the future can be more efficient and direct.
2. Packaged empathy.
Our individual perspectives are shaped by our distinct experiences and backgrounds. Today, we are able to experience each other’s lives more easily than ever through instant updates and live streaming. Because of this, we have access to environments and people that physical boundaries previously prevented. What does the ability to share each other’s experiences mean for the future of empathy and human understanding?
3. Moments relived.
Every time we go to a museum, attend a concert, or visit a national park, there will inevitably be a frenzy of people trying to permanently capture the moment on their phones, iPads or cameras. It has become typical that the more significant or authentic a stimulus, the more distant we become from it. Through technology, we are able to store memories for later, making the present less precious and ephemeral than ever before. Creating memories has become an active motion rather than a passive retrospect. People can now “relive” moments through drunk selfies and Snapchatted concerts, and experience memories in a different way.