AI, Voice, and Emerging Technologies: A powerful fusion that’s making UI/UX designers rethink interaction design
“Alexa, will it rain today?” asks a concerned seven-year-old Rita, staring at the small speaker in front of her. She’s worried that the gathering clouds outside may play spoilsport with her family’s pizza plans. She heaves a sigh of relief when a calm female voice from the speaker replies: “The possibility of rain today is low.”
Alexa, the Artificial Intelligence (AI)-powered voice assistant, handles millions of such queries from across the globe every day. It’s one of several groundbreaking assistants, alongside Siri (Apple), Cortana (Microsoft), and Bixby (Samsung), that are rapidly changing the way we interact with the world around us.
Devices and applications such as Alexa, powered by AI, are slowly but surely entrenching themselves in every aspect of our personal and digital lives. They have spun a web of unprecedented complexity since their humble beginnings a few years ago. Few domains remain untouched by AI today. Its contemporary applications are wide-ranging and cover everything from entertainment, healthcare, manufacturing, and software to space technology and military applications.
You can now have a deep-learning AI algorithm churn out novels and song lyrics based on source material you feed it. There is Sophia, a humanoid robot built by Hanson Robotics that communicates using natural language and human-like facial expressions. Quite recently, Massachusetts General Hospital teamed up with NVIDIA to build AI-powered systems that detect diseases and assist with diagnosis and treatment!
Innovative AI Examples in the Industry
• Healthcare: Pager is an app developed by an NYC-based company that connects patients seamlessly to healthcare providers via chat, voice, and video. The company uses AI to identify gaps in treatment by analyzing patients’ clinical and claims data. It offers preliminary diagnoses and recommendations to help patients suffering from minor aches and pains until they can connect with a care provider.
• Education: Content Technologies Inc., a leading AI development company, has a suite of smart apps for secondary education and beyond. One of its popular apps, Cram101, breaks down complex textbook content into digestible chunks that include summaries, quizzes, flashcards, and more. The AI behind the scenes intelligently identifies and organizes content into easily manageable portions, an approach that has proven both effective and practical for exam preparation.
• Entertainment: MIT’s latest AI outing can quickly identify and isolate different instruments in a recorded song. Trained on over 60 hours of video, the ‘PixelPlayer’ system can identify specific instruments in a tune and extract the sounds associated with them. It can also adjust the individual elements, remove them, or remix them in any way! Watch it in action here.
• Manufacturing: AI has ushered in an era of ‘smart’ manufacturing. Its integration into manufacturing processes and supply chains has considerably decreased manual intervention and the cost of operations while significantly improving efficiency and productivity.
A few more examples from the last two years
2018 and 2019 were exciting years in the field of AI. AI captured the public imagination like never before and showed us that some of the advancements we expected years down the line are actually closer to home than we imagined.
A few highlights included:
• Google Duplex: An AI system for accomplishing real-world tasks over the phone. You can read more here.
• Deepfakes: AI can convincingly generate faces that don’t exist or morph a face into a video of someone else. Terrifying, but also exciting.
• Google’s Heart Disease Algorithm: Google’s advanced algorithm can apparently predict heart disease simply by examining a person’s eyes. It’s incredible! Read more here.
• AI News Anchors: The world’s first news anchor powered purely by AI went live in China. The AI anchor learns from broadcast videos and can read text as naturally as a real person! Watch it in action here.
• MIT’s Ambitious Plan for AI: MIT, one of the world’s leading technological institutions, has announced a $1 billion plan to create a new college focused purely on AI and other emerging technologies.
Designing Interactions for AI
The rapid integration of AI into IoT and voice-powered devices has made user experience (UX) designers rethink what it means to engage a user to the full potential of the medium. On-screen elements and buttons tend toward obsolescence when designing for an interface that could, in the future, be engaged purely by voice, audio, or hand and facial gestures.
A recent development has been to design interactions that put ‘voice’ or ‘sound’ at the center of the design. An acronym quickly gaining attention is VUI.
What is the Voice User Interface (VUI)?
A VUI enables interaction between you and a computer or mobile device solely by voice, through spoken words or phrases. A VUI is fundamentally different because it may not require a visual interface at all, which is currently the norm. A voice command is used to trigger a response from the device; this response could be as simple as a light blinking, or it could set a series of tasks in motion.
Voice interaction could happen using a variety of connected devices such as:
• Wearable devices
• Stationary connected devices, such as desktops or smart appliances
• Non-stationary connected devices such as laptops and car audio systems
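At its simplest, the voice-command-to-response loop described above can be sketched in a few lines. This is a minimal illustration, not a real assistant API: the phrases, keyword matching, and responses below are hypothetical stand-ins for what would normally be a speech-to-text engine feeding an intent classifier.

```python
def handle_command(phrase: str) -> str:
    """Map a transcribed voice phrase to a device response."""
    phrase = phrase.lower().strip()
    # Naive keyword matching; production VUIs use NLP models for intent.
    if "rain" in phrase or "weather" in phrase:
        return "The possibility of rain today is low"
    if "lights on" in phrase:
        return "Turning the lights on"
    # Fallback when no intent is recognized.
    return "Sorry, I didn't catch that"

print(handle_command("Alexa, will it rain today?"))
# -> The possibility of rain today is low
```

Even this toy dispatcher shows why a fallback response matters: a VUI with no visual interface has no other way to signal that a command was not understood.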
Design considerations for an AI system powered by voice:
• Mode of interaction: Identify if this is going to be the primary mode of interaction with the device. Rank the primary and secondary modes of interaction clearly to identify the use cases that can be used as benchmarks for conceptualization.
• Identify Technological Limitations: Voice engines require complex processing techniques which inevitably require strong computational power and connectivity. Factor this in before designing your VUI.
• Input Triggers: How is your user initializing the VUI? Is it a voice trigger, like “Ok Google”? By pressing a button? Or a motion trigger, such as waving in front of a device sensor or camera? This helps in determining the best approach for designing your VUI.
• Leading Cues, Feedback, and Ending Cues: Your VUI should notify the user with a visual or auditory cue that it is listening for a voice command. Upon receiving a voice command, it should acknowledge it and repeat or paraphrase it to confirm it’s what the user intended, allowing corrective action in case of mistakes. Finally, as an ending cue, it must trigger an output action in line with the user’s end goal.
• Natural and Conversational: Your VUI won’t grab a user’s attention if it’s drab and monotonous. Make it as interactive, conversational, and intuitive as possible. It also helps to apply anthropomorphism principles that lend human-like traits to the AI, such as human-like voices and speech inflection.
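The cue-and-feedback lifecycle above can be modeled as a tiny state machine. This is a sketch under assumed names (`VuiSession`, the "ok device" wake word, the "chime" strings), not any real assistant framework; it only shows how the leading cue, confirmation feedback, and ending cue hang together.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    CONFIRMING = "confirming"
    DONE = "done"

class VuiSession:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.pending = None

    def trigger(self, wake_word: str) -> str:
        # Leading cue: acknowledge that the device is now listening.
        if wake_word.lower() == "ok device":
            self.state = State.LISTENING
            return "chime: listening"
        return "ignored"

    def receive(self, command: str) -> str:
        # Feedback: paraphrase the command back so the user can correct it.
        self.pending = command
        self.state = State.CONFIRMING
        return f"Did you mean: {command}?"

    def confirm(self, yes: bool) -> str:
        # Ending cue: carry out the action, or return to listening on a mistake.
        if yes:
            self.state = State.DONE
            return f"Done: {self.pending}"
        self.state = State.LISTENING
        return "chime: listening"
```

A session then flows trigger → receive → confirm, and the rejection branch loops back to listening rather than dead-ending, which is the "allow corrective action" point made above.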
AI, Augmented Reality, and Virtual Reality (AR/VR) make great bedfellows
AR/VR is gaining steam as a powerful medium for mobile app developers to deliver immersive, contextual 3D experiences. This exciting and emerging tech enables the juxtaposition of 3D models and environments alongside the real world. Visual data is then used along with gyroscopes, accelerometers, and other sensors to generate the AR/VR experience and track user movement within it. The principal technology employed here is Computer Vision (CV).
AI, on the other hand, is a distinct technology: machine learning and deep learning systems use algorithms and statistical models to perform tasks based on the training data fed to them.
Now, the lines between AR/VR and AI are blurring, and a significant overlap is emerging between the otherwise disparate fields. Of late, AI, and specifically deep learning models, has excelled at some of the key tasks necessary to deliver a fully immersive AR/VR experience.
Here are a few things AR/VR experience creators can leverage from AI:
• Object Detection: A frame from an AR scene is sent to an AI model, which estimates the position of an object within the scene. This data is used to enable ‘colliders’ that facilitate user interactions within the AR/VR scene.
• Positional Tracking: AI models are becoming increasingly accurate at detecting vertical and horizontal planes and estimating depth within a scene. This enables developers to create 3D environments whose scale matches the real world. It has been especially useful in overcoming the motion or ‘simulator’ sickness that results when scenes play at a frame rate slower or faster than what the human brain and eye perceive as normal.
• Text Recognition and Translation: AI models can accurately detect text on an image or within an experience and translate it using Natural Language Processing (NLP) techniques.
• Audio Recognition and Interpretation: AI models are trained to ‘recognize’ words and auditory cues that trigger effects and interactions within an AR/VR experience. For example, if a user says the word “sword”, a 3D model of a sword appears in the user’s AR/VR scene.
• Core ML and TensorFlow: These APIs provide low-level input and output controls for 3D models, enabling experience designers to trigger events within an AR/VR experience.
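The “sword” example above reduces to a mapping from recognized trigger words to scene events. The sketch below is purely illustrative: `SCENE_MODELS`, `spawn_model`, and the asset paths are hypothetical stand-ins for an AR engine’s asset registry and spawn call, sitting downstream of a real speech-recognition model.

```python
# Hypothetical registry of trigger words -> 3D assets.
SCENE_MODELS = {"sword": "models/sword.glb", "shield": "models/shield.glb"}

def spawn_model(asset_path: str) -> dict:
    # Stand-in for an engine call that places a 3D asset in the scene.
    return {"asset": asset_path, "placed": True}

def on_speech_recognized(transcript: str) -> list:
    """Spawn every scene model whose trigger word appears in the transcript."""
    words = transcript.lower().split()
    return [spawn_model(path) for word, path in SCENE_MODELS.items() if word in words]
```

The interesting design question isn’t the lookup itself but latency: the audio model’s output has to arrive within a frame or two for the spawned object to feel connected to the spoken word.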
What does the future look like?
We would like to think that the future is filled with exciting new technology and a shift in design thinking that lays emphasis on the human factor and actively engages with users to design tailor-made experiences.
AR/VR experiences will become commonplace with advancements in hardware and computer vision technologies aided by AI. Voice, gestures, and haptic devices will penetrate deeper into all domains causing traditional UI/UX methodologies to undergo a massive overhaul to meet user demands.
The User as the UI Craftsman
AI and voice-enabled assistants will eventually tap into the human experience far more deeply. This will usher in a whole new era of UI design: an era in which AI has been trained on thousands of human gestures, interactions, and patterns of natural language usage. We will begin to see AI build intelligent voice and non-voice UI systems on the fly and, most importantly, reinvent itself and aim for efficiency with every piece of user feedback.
AI developers and UI/UX designers will need to join forces to develop a common platform that embodies the synergy of both fields. This will lead to co-creation of intelligent designs based on data analysis and user behavior studies.
We will see user interfaces, both voice and non-voice, grow more fluid and dynamic. The medium of interaction will become less consequential, with increased emphasis on achieving a user goal in a linear or non-linear pattern.
As human reliance on technology grows every day, so does the time we spend with our devices. It’s highly probable that AI, and specifically voice-based AI systems, may soon become our primary means of interaction with all our connected devices.
We need to ponder how to strike the right balance between the human factor and technology, alleviating current technological constraints and integrating smart design along the way.