Multi-modal Interactions for Mixed Reality: An Overview

Jackson Barnes
Inborn Experience (UX in AR/VR)
9 min read · Feb 12, 2018

Hello! In this article I will cover the basics of the different interaction modalities for mixed reality interfaces (primarily optical see-through headsets like the HoloLens, Meta 2, Magic Leap One and ODG) and how applying certain user experience design principles can make for a more enjoyable mixed reality experience. The first part focuses on the four main types of interaction modality in wide use today. Last but not least, I will review the future of interaction design with brain-computer interface technology for augmented/mixed reality headsets!

These include:

  • Gaze — Referring to the act of looking upon specific points of interest in mixed reality environments.
  • Gesture — Users perform simple hand gestures to act upon interfaces in mixed reality.
  • Voice — Using short voice commands to control selected holographic UI elements.
  • Controller — Controllers are used as a tangible interface for manipulating holographic UI elements.

Other topics covered are Experimental Considerations for Interaction Design (IxD): new ideas I have, or technology that could advance the field of mixed reality interaction design.

Paul Milgram’s reality–virtuality continuum

Gaze in mixed reality lets the head act as a pivot for controlling a virtual cursor. Most head-mounted displays use this as a core interaction method for selecting either an object or part of an interface. We use gaze every day to scan our environments, focusing in on meaningful objects and pulling relationships from them, and the same behavior carries over into mixed reality, which is where the distinction between good and bad uses of gaze comes in. Designing for gaze comes in various forms, usually starting with identifying affordances for active and inactive UI elements. Making things jump, highlight, bounce or play any other animation that suits the current context is best practice. Just like adding micro-interactions to a 2D UI, a 3D UI adds another level of complexity because you are dealing with a new perspective (especially with any kind of animation software), so be careful not to overload the user. Text elements should be placed carefully within the user's FOV, at comfortable distances, to keep them legible and avoid eye fatigue (more information about this here from Todd Williams). Avoid making the user look around for UI elements for extended periods of time. Keeping important items and relevant information close together, near or inside the user's FOV, prevents neck strain and other complications around wayfinding and discovery in mixed reality. The more time the user spends looking for important information, the less efficient the experience becomes.
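To make the head-as-pivot idea concrete, here is a minimal sketch of a gaze cursor in a WebXR-style render loop using Three.js. The camera stands in for the user's head; `interactables` and `reticle` are assumed scene objects for illustration, not part of any headset SDK.

```typescript
import * as THREE from 'three';

// A minimal head-gaze cursor: cast a ray from the camera (the user's head)
// and report whatever interactive object it hits.
const raycaster = new THREE.Raycaster();
const headPosition = new THREE.Vector3();
const headDirection = new THREE.Vector3();

function updateGazeCursor(
  camera: THREE.Camera,
  interactables: THREE.Object3D[],
  reticle: THREE.Object3D
): THREE.Object3D | null {
  // The head acts as the pivot: ray origin and direction come from the camera pose.
  camera.getWorldPosition(headPosition);
  camera.getWorldDirection(headDirection);
  raycaster.set(headPosition, headDirection);

  const hits = raycaster.intersectObjects(interactables, true);
  if (hits.length > 0) {
    // Snap the reticle to the hit point so the user sees exactly what is targeted.
    reticle.position.copy(hits[0].point);
    return hits[0].object; // the gazed-at element, ready for a highlight or bounce affordance
  }

  // Nothing targeted: park the reticle at a comfortable default distance (~2 m).
  reticle.position.copy(headPosition).addScaledVector(headDirection, 2);
  return null;
}
```

Called once per frame, the returned object is what a highlight, voice command or gesture would then act upon.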

The Goldilocks Zone

Experimental Considerations for IxD

Eye tracking is the next level of gaze interaction technology, leading the way to more innovative interactions with user interfaces. We use our eyes to focus in on specific elements of interest on a 2D interface and then move the mouse cursor to the exact point we wish to act upon. A number of companies are focusing on this very tech for AR/VR, but most are trying to fix the problems that come with simulated stereoscopic 3D graphics rather than exploring interaction design possibilities.

Gesture control is another powerful interaction for mixed reality; most forms of it today are found in the HoloLens and the Meta 2 headset. Thankfully these two companies take different approaches to interacting with mixed reality interfaces, so a small comparison is in order. Having tried the Meta 2 and owning a HoloLens, I can safely say that the Meta 2 is a truly amazing start to something new in the IxD field but lacks the technology to make it work effectively (occlusion and tracking, primarily). The HoloLens, on the other hand, provides simple gestures like Tap and Bloom that work coherently with the camera sensors on the headset, making for a more usable experience. Right now, keeping gestures simple and fluid to accommodate new technology is best practice, but later down the line the floodgates will open, giving way to more dynamic gesture interactions as shown by Leap Motion, Meta 2 and Google's research project Soli. Below are some tips from Christophe Tauziet in his article Designing for Hands in VR.

  • “The higher the person has to raise their arm to perform an interaction, the faster that interaction should be in order to avoid fatigue.”
  • “A familiar object often communicates how it should be picked up, held and used. For example, people expect to use a gun-shaped prop to aim at things.”

Christophe Tauziet
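As a concrete illustration of how simple a gesture like the HoloLens Tap can be under the hood, here is a hedged sketch of a pinch detector built on the WebXR Hand Input API rather than any headset SDK. The joint names come from the WebXR spec; the distance thresholds are illustrative guesses, and the snippet assumes WebXR type definitions and a running session with hand tracking.

```typescript
// A minimal "air tap" / pinch detector: the gesture fires when the thumb tip
// and index fingertip come close together, with hysteresis to avoid flicker.
const PINCH_START = 0.02; // metres between fingertips to begin a pinch (assumed value)
const PINCH_END = 0.04;   // metres between fingertips to end it (assumed value)

let pinching = false;

function detectPinch(frame: XRFrame, refSpace: XRReferenceSpace): boolean {
  for (const source of frame.session.inputSources) {
    if (!source.hand) continue; // skip controllers and untracked hands

    const thumb = frame.getJointPose(source.hand.get('thumb-tip')!, refSpace);
    const index = frame.getJointPose(source.hand.get('index-finger-tip')!, refSpace);
    if (!thumb || !index) continue;

    const dx = thumb.transform.position.x - index.transform.position.x;
    const dy = thumb.transform.position.y - index.transform.position.y;
    const dz = thumb.transform.position.z - index.transform.position.z;
    const distance = Math.sqrt(dx * dx + dy * dy + dz * dz);

    if (!pinching && distance < PINCH_START) pinching = true;    // gesture begins
    else if (pinching && distance > PINCH_END) pinching = false; // gesture ends
  }
  return pinching;
}
```

Paired with the gaze cursor above, a pinch acts as the "click" on whatever the user is looking at.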

Experimental Considerations for IxD

New developments in computer-vision software will help make gesture control for mixed reality more dynamic. Real-time pose estimation from regular RGB cameras could be the next big enabling technology for skeletal hand tracking. Most headsets currently use infrared light to triangulate distance, but that approach takes up more physical space and processing power within the HMD.
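As a rough sketch of what RGB-only hand tracking looks like in practice, the snippet below assumes the @tensorflow-models/handpose package for TensorFlow.js, which estimates 21 hand landmarks from a plain webcam feed. The video element and frame loop are illustrative scaffolding, not part of any headset pipeline.

```typescript
// RGB-only skeletal hand tracking in the browser, assuming the handpose model
// (plus its @tensorflow/tfjs-core and @tensorflow/tfjs-converter peer dependencies).
import '@tensorflow/tfjs-backend-webgl';
import * as handpose from '@tensorflow-models/handpose';

async function trackHand(video: HTMLVideoElement): Promise<void> {
  const model = await handpose.load();

  async function onFrame(): Promise<void> {
    // estimateHands returns 21 3D landmarks per detected hand from a plain RGB frame,
    // with no infrared emitters or depth sensors required.
    const hands = await model.estimateHands(video);
    if (hands.length > 0) {
      const [x, y, z] = hands[0].landmarks[8]; // landmark 8 is the index fingertip
      console.log(`index fingertip at (${x.toFixed(0)}, ${y.toFixed(0)}, ${z.toFixed(1)})`);
    }
    requestAnimationFrame(onFrame);
  }
  requestAnimationFrame(onFrame);
}
```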

An idea I had for a new way of interacting with holograms, even though it goes against the UX design norm of staying consistent within an interaction modality, is using sign language to manipulate objects in 3D space. Just as we were taught how to type for text manipulation on a 2D screen, maybe one day we will use a new form of sign language to interact with holographic interfaces.

https://www.youtube.com/watch?v=Py0dqjShb10

Voice control for any kind of interface is gaining popularity thanks to advances in natural language processing and machine learning that can understand complex linguistic structures. Over the past couple of years we have seen technology like Google Home and Amazon’s Echo making their way into people’s everyday lives. Eventually, they will fully integrate with most head-mounted displays. The most notable headset with this feature is the HoloLens, which lets users take control and make selections with their voice. Designing voice user interfaces within mixed reality applications will need auditory and visual affordances working hand in hand for an ideal experience. Microsoft’s HoloLens guidelines call the combination of gaze and voice a “voice dwell”: the user is prompted with a visual cue to give a short command for a specific action. Keeping things short and sweet is ideal when implementing any type of voice control; no one will remember a long command for a short interaction. Microsoft’s research shows that concise commands produce better outcomes when they are limited to a couple of syllables per word, like “Play Video” vs. “Play the Video”.
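Here is a minimal sketch of the "short and sweet" command style using the browser's Web Speech API. This is not the HoloLens voice stack, and the command phrases and handlers are placeholders for illustration.

```typescript
// Map a handful of short, exact phrases to actions; ignore everything else.
type CommandHandler = () => void;

const commands: Record<string, CommandHandler> = {
  'play video': () => console.log('playing video'),
  'place here': () => console.log('placing hologram at gaze point'),
  'hide menu': () => console.log('hiding menu'),
};

// SpeechRecognition is still prefixed in some browsers, so fall back to webkit.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.continuous = true; // keep listening between commands
recognition.lang = 'en-US';

recognition.onresult = (event: any) => {
  const result = event.results[event.results.length - 1];
  const phrase = result[0].transcript.trim().toLowerCase();

  const handler = commands[phrase];
  if (handler) handler(); // only short, exact phrases trigger an action
};

recognition.start();
```

In a real voice-dwell flow, the visual cue shown at the gazed object would list exactly these short phrases, so the user never has to guess what to say.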

Controllers, oh controllers… at the forefront of the mixed reality headset race you can expect a controller from the majority of industry players, but what you don’t see are people actually using them. Controllers have been around for a long time, letting users engage with an analog interface for interacting with all types of displays (TV remotes being the most notable and cumbersome). What you do see are controllers being used for immersive virtual reality, not mixed reality; the reason is the lack of content in mixed reality environments and not enough immersion to afford a 6DOF controller. “I am not my user” (well, kind of), but I have never used the clicker for the HoloLens, nor have I seen anyone else use it. Designing a compelling experience using controllers for mixed reality is new territory, but for immersive VR experiences check out this recent article on Designing VR Interactions at a Distance by Barrett Fox & Martin Schubert.

Experimental Considerations for IxD

I am currently developing an experimental interaction application that could remove the need for the user to carry a third device and create a more ergonomic workplace using a head-mounted display (HMD). The biggest limitation of mixed reality interaction design is the lack of tangibility in holographic user interfaces. I hope the simplicity of this product will shed light on interacting with mixed reality artifacts in the future, and I hope to have a working prototype in the next couple of months! My inspiration comes from the godfather of interaction design, Bret Victor, and his pioneering work in making interactions more humane. His video The Humane Representation of Thought is one of the greatest videos ever recorded on interaction design. Check out his project Dynamicland.

https://dynamicland.org/

To the future… AR-BCI

Welcome to the very end of interaction design… well, sort of. At last, we have integrated electroencephalograms, or EEGs, into augmented/mixed reality headsets. Every way of interacting with something simulated in 3D or 2D is pre-configured and trained on the electrical signals transmitted from the user’s brain via the BCI. This is known as an Augmented Reality Brain-Computer Interface, or AR-BCI. BCI methods and research have been around since the 70s, but they have only recently picked up pace thanks to the rise of artificial intelligence capable of decoding brain signals into binary inputs for computers. The innovation surrounding BCI comes directly from the medical industry, which has been experimenting with invasive interfaces for mapping, augmenting and repairing human cognitive or sensory-motor functions (check out the work of Dr. Miguel Nicolelis and his experiments with invasive BCI on monkeys). Oh… and Neuralink.

In the context of mixed reality interaction design, there is one specific case, a patient controlling a cursor on a 2D text-based interface with their brain, that sheds light on the endless possibilities of interaction design for AR-BCI. Using Cyberkinetics’ BrainGate chip implant, researchers at Emory University in Atlanta were able to give Johnny Ray, a patient with ‘locked-in syndrome’, control of a cursor for communicating with the world. More recently this type of technology has made leaps and bounds, giving patients the ability to communicate at up to 30 words per minute through text-based UIs. Looking at where things are moving, it is not hard to see that as BCI chips get smaller and less invasive, technology of this kind will surely make it into head-mounted displays for controlling just about anything we think about. If an interface’s interactions no longer need to be designed, because of the cognitive control we have before bodily actions, will there be a need to innovate new ways of dealing with thought-enabled control versus other multi-modal ways of controlling UIs, especially for mixed reality? Maybe. We have a long way to go before this technology becomes integrated with any kind of HMD, but the future does look bright.

Experimental Considerations for IxD

In the not-so-distant future, I am super excited to start experimenting with AR-BCI using the Muse headset and/or Emotiv with the ODG/HoloLens. There are plenty of demos online that show the capabilities of using a dry, non-invasive EEG to control simple robotic commands. If I can just get a webpage to open, or combine gaze with BCI to select objects in a mixed reality environment, I will be satisfied! Wish me luck!
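For a sense of how minimal a first gaze-plus-BCI experiment can be, here is a deliberately naive sketch that turns a single EEG channel into a binary "select" trigger. The threshold, the sample source and the onSelect callback are all assumptions for illustration, not part of the Muse or Emotiv SDKs.

```typescript
// Watch one EEG channel for a large, brief amplitude spike (for example an
// eye-blink artifact, a common stand-in for real decoded intent in dry-EEG demos)
// and treat it as a binary "select" input.
const SPIKE_THRESHOLD_UV = 100; // microvolts; depends entirely on the headset (assumed value)
const REFRACTORY_MS = 500;      // ignore further samples briefly after a trigger

let lastTrigger = 0;

function onEegSample(amplitudeUv: number, timestampMs: number, onSelect: () => void): void {
  const spiked = Math.abs(amplitudeUv) > SPIKE_THRESHOLD_UV;
  const cooledDown = timestampMs - lastTrigger > REFRACTORY_MS;
  if (spiked && cooledDown) {
    lastTrigger = timestampMs;
    // Paired with a gaze cursor, this acts like a hands-free "click" on the gazed object.
    onSelect();
  }
}
```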

To wrap up, fusing multi-modal interactions in mixed reality applications can benefit the user by giving them more options for engaging with new experiences. Limiting access to a preferred interaction can be disadvantageous when the current situation affords a multi-modal approach. Identify what your user needs in the given context to help progress ideas and influence design decisions.

Thank you.
