Toward User Interface and Experience Guidelines for Virtual, Augmented and Mixed Realities

In the last few weeks I’ve been trying to find evidence of why we use certain Virtual and Mixed Reality gestures. The taxonomy we have for multi-touch gestures came out of R&D at Microsoft in the mid-2000s, and you can easily read about the studies conducted there. Through my research, I have yet to find empirical grounding for the in-air gestures currently seen in VR, AR and MR applications. Theoretically they make sense, but whenever I ask “Where did these gestures come from?” online, the only answer I’m given is “Did you see Minority Report?” Given that my own research involves the cognitive function of gestures and, by extension, visuospatial working memory and embodied cognition, I thought I’d explore what we know about spatial cognition, awareness, navigation and body perception, and how we may use this knowledge to design the best interactions for virtual and augmented spaces.

Note: Many of these concepts are inspired by the work of Jeff Johnson and his book Designing with the Mind in Mind.

Existing User Interface Guidelines

We have had user interface guidelines for as long as we’ve had the graphical user interface. Most of these ‘rules’ are common sense for designers, but even those who aren’t familiar with all of them need to be mindful of them as we move forward.

Most modern UI guidelines focus on these topics: consistency, providing informative feedback, universal recognition of affordances, assistance with errors, empowerment of the user, flexibility in task completion, minimalist design, and providing status updates (from Shneiderman, 1987; Shneiderman & Plaisant, 2009; Nielsen & Molich, 1990).

SIDE NOTE: For a good critique of Apple’s current design practices, see this article by Don Norman and Bruce Tognazzini, which discusses how minimalism and aesthetics have been prioritized over usability.

Most of these design guidelines work within the paradigm of presenting information on a screen, but with AR and VR, this focus on the presentation of information suddenly seems much smaller than what we will be designing for in the future. Instead of a small space within our field of vision, we now have a full 360-degree environment to design for, and instead of presenting a screen that users can interact with, we are now tasked with essentially embodying the user in 3D space. These responsibilities are User Interface and User Experience design on a whole new level, a level which presents many new and interesting challenges.

As far as the existing guidelines are concerned, anybody who works in any sort of design or development space knows that they are the Mount Olympus of standards. We can never reach them all simultaneously; we can only meet as many as possible while balancing the priority of each against the others. It’s a trade-off, and this must also be true for AR, VR and Mixed Reality.

The truth is that most of the above-mentioned concepts for UI and UX guidelines still apply; we’re now just dealing with an added layer of complexity in how our users will interact with the platforms we create.

A Usability Leap of Faith

The ways in which we interact with technology have largely to do with our previous experience in the world. With touch-based technology such as a tablet, phablet, phone or even desktop computer, the extension from our previous experience is not that far.

Your computer screen? The one you control with your mouse? Well forget the mouse, just touch the screen. Your finger is now the mouse. DONE!

As far as usability goes, this was how we adapted. Sure there were UI Design and convention changes. Buttons got bigger to accommodate our fingers, hover links no longer applied, swipes and pans were added, but largely touch-screen technology was an easy transition for us to get our ape-like brains around. Finally, the surface we wanted to touch started to respond to it.

The transition was made easier by continuing to use real world metaphors. Icons remained largely the same, and virtual keyboards acted as stand-ins for real keyboards. We still touched a key and the letter appeared on the screen, so nothing to write home about.

For VR, AR, and MR, though, this is a completely different beast, and I do mean BEAST. Where before we designed for screens that exist in the real world, now we are tasked with designing the world itself, which is no small undertaking. Along with the added dimension, there is a plethora of issues we must be mindful of when it comes to human perception, awareness of the body and the relationship between the two.

Grounding UX in Previous Experience

All User Experience Design must take into account the user’s previous experience, both in the technology world and in the real world. This is where our use of metaphor originated. Icons for physical phones, calculators and paper calendars all help the user to bridge the gap between the analog and digital worlds. This is where the concept of universal recognition of affordances comes from. If your users don’t immediately understand what an icon means, you should probably use another icon.

In the past I’ve written a little about the rules that govern the worlds we inhabit, and this is important when it comes to how we present reality, and how users interact with the realities we create.

It’s great to come up with rules for user interface design, usability or accessibility, but when working in the realm of VR, if we ignore the rules that govern the world our users live in, we’re setting them up to fail.

There’s a lot we already know about perception and action in the environment. In a nutshell, we have multiple systems working together to form a cognitive “picture” of reality in our mind. This allows us to understand our environment, and where we exist within it. This includes using visual, auditory and tactile information, along with proprioception (our sense of how our body and body parts are oriented), to situate ourselves in the world.

Unfortunately, unlike in The Matrix, the rules of how we know the world to work cannot be changed. We cannot jump from one building to the next. Or can we?

In the real world, we cannot manipulate objects that have no solidity, and we cannot touch something that is beyond our reach. This is where we must start: with the simple truth that even though the user might see and hear a virtual world, they still exist in the real one.

The obvious thing to also address before we dive in is this massive elephant in the room: For the first time, User Experience Designers have to start thinking about safety as part of their practice. As designers for AR, VR and MR, we must ensure that no physical or psychological harm comes to our users, no matter what their level of expertise in a virtual space may be.

Direct Object Manipulation and FAIL…

Hands vs. Controllers

The current landscape of interaction in VR exists on a spectrum between seamless real-world hand tracking and virtual controllers, with most of the latter able to track hand location by proxy. There are limitations to all these approaches. First, we currently have no technology that can reliably and consistently track hands without controllers. The Leap Motion controller does a good job, but it can’t detect when a finger or hand collides with an object in 3D space (e.g. for detecting taps), as it’s just a fancy camera + IR setup. Second, controllers such as the PlayStation Move or Oculus Touch are merely a proxy for the hands, so we’re in a strange intermediate stage.

Over the last 10 years we’ve gotten used to being able to use our hands to interact with our technology, but now we’re seeing the growth of a platform that doesn’t allow us to. As technology advances, I have no doubt that these issues will be resolved, but in the meantime, we as designers should be thinking about the principles that lead to good experiences, independent of the specific technologies available to us.

The Guidelines

As it stands now, Virtual, Augmented and Mixed Realities are still in their infancy, so we have much work to do in figuring out the best approach.

To start, let’s look at how our brains construct reality.

Ground the experience in a reality your users are already familiar with

Present a virtual world that is compatible with our innate understanding of the real world.

How humans perceive and move through their surroundings has evolved over millions of years and it’s not something that we can easily re-learn or unlearn. We know which way our bodies are oriented, even if we can’t see where we are. We know where our limbs and extremities are, even if we can’t see them, and we innately understand the basics of physics, including gravity, momentum, inertia, volume and weight. We perceive boundaries in our environment just by seeing them (thanks to our binocular vision), and navigate through it using mental maps (see the work of Neil Burgess for more). We are tied to our environment, because our survival has always depended on it, and our environment is constructed through the information our senses feed us, including visual, auditory, touch, temperature, pressure, and the orientation of our head, hands and feet.

Understanding how our brains situate themselves in a world through our innate knowledge of it may be leveraged in AR and VR design. Scale, blur and shadow can convey distance and location, while keystoning and distortion can convey orientation. These strategies have been used in First Person Shooters (FPS) and other 3D games for decades, and designers can leverage these patterns of recognition to make presentations of information in 3D space easily understandable because they align with reality.
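To make the scale cue concrete: the size an object appears to be is its visual angle, θ = 2·atan(s / 2d), for an object of size s at distance d. The short Python sketch below simply evaluates that standard formula; the 0.3 m box and the distances are illustrative values, not taken from any particular headset or engine.

```python
import math

def visual_angle_deg(object_size_m: float, distance_m: float) -> float:
    """Angle, in degrees, that an object of a given size subtends at the eye.

    Uses the standard visual-angle formula theta = 2 * atan(size / (2 * distance)).
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))

# Illustrative values: a hypothetical 0.3 m box placed at increasing distances.
for d in (1.0, 2.0, 4.0):
    print(f"{d:.0f} m -> {visual_angle_deg(0.3, d):.1f} degrees")
```

This prints roughly 17.1, 8.6 and 4.3 degrees: each time the box moves twice as far away, its apparent size about halves. If a virtual object does not shrink in this way as it recedes, it stops agreeing with the distance cues our visual system expects.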

The moment we present something that doesn’t align with reality, we take our users out of the world we’ve created for them. If they try to touch an object that appears within their reach and they can’t feel its volume, texture and weight, the world we’ve created will start to crumble.

*For more on visual perception in 3D space, check out The Ecological Approach to Visual Perception by James Gibson.

Avoid distorting the user’s perception of their own body

BBC: Is Seeing Believing?

Humans have an uncanny understanding of their own bodies, how they move and how they are oriented. Any virtual representation of the user’s body should also align with the real world, including its orientation and movement. The above video demonstrates a concept called ‘Brain Plasticity’, whereby a hand that a person perceives to be their own can be mentally substituted for their real hand. Essentially, by priming someone to believe that a hand they see is theirs (when it is not), their brain adjusts and assumes that it is, leading them to flinch when that hand is hit with a hammer. In virtual reality, designers need to be mindful of this effect because it could lead to a misunderstanding of the user’s hand location or orientation, leading to movements that are out of sync with their intentions.

Be Mindful of Movement

Over the last few years, there have been many cases where VR users have thrown up. This is partly due to the issues discussed above, but more so due to movement within the virtual world. Movement has physical effects that we come to expect, and when our visual input signals motion but we don’t feel those effects, the mismatch can be disorienting, leading to vertigo and motion sickness. Part of what anchors us to our experience of the real world, and of motion within it, is a reference to our own bodies, which is why some researchers have explored the use of virtual noses in VR to mitigate motion sickness.

As a result, being mindful of the movement you require of your users, and of the level of cognitive disconnect this presents in virtual reality, should be a priority. Simulating walking and sitting in a moving vehicle will also present different experiences, related to the user’s expectation of what each should feel like. Driving a virtual car and moving through a virtual city while seated, for example, would induce less of a cognitive disconnect, but showing the user quickly floating and spinning through asteroid tunnels, while allowing them to look down at human feet that are rotated backwards, might not lead to a positive experience.

Direct vs. Indirect Manipulation

When we think about interactions and manipulations in VR and AR, we see two separate types: indirect and direct. Current implementations can leave users unsure how they are meant to interact with virtual objects, so differentiating between these two types of interaction, and ensuring that users understand how to use each, should be a priority.

Direct Manipulation: Objects must be Real and Within reach

Direct manipulation refers to using our hands to tap on an object directly and manipulate it with different gestures, affordances and feedback. On a 2D screen, this takes the form of tap, swipe and scrolling gestures.

So what about VR, AR, and MR? The tendency to create 3D objects in a virtual space, and have users directly manipulate them presents a few challenges. First of all, the object isn’t there, so there is nothing for the user to actually pick up, even though their brains tell them there is: We see a box on a table, we move our hand to grab it and have our hand go through it, and proceed to fall through the table face first onto the floor. This is not an ideal user experience.

As a rule, any action should provide feedback that aligns with real-world experience, so if you want the user to pick up a box that is within their reach, that box should actually exist, through the use of AR object markers or something similar. Early Microsoft Surface tech demos featured something like this, and more recent research has termed it ‘Blended Interaction’, that is, the blending of real-world objects with virtual ones.

The easier option for presenting virtual objects for direct manipulation is simply to present 2D interactive surfaces in 3D space on existing physical surfaces. In other words, just map a 2D interface onto a real table or wall and allow your users to tap, swipe or perform other 2D gestures on it.

The only prerequisite for presenting this sort of information is that the objects be within reach. A user cannot touch a virtual screen or pick up a box that is 10 meters away in the real world, so they shouldn’t be able to in the virtual one.
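Taken together with the previous rule, this reach requirement can be expressed as a simple decision. The Python sketch below is a minimal illustration under stated assumptions, not an established method: positions are in metres, reach is measured from the shoulder, and ASSUMED_REACH_M is a hypothetical placeholder that a real application would replace with a per-user calibrated value.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres

# Hypothetical default: a rough comfortable reach radius measured from the
# shoulder. Real values differ per user and should come from calibration.
ASSUMED_REACH_M = 0.7

def interaction_mode(shoulder: Point, target: Point, physically_backed: bool,
                     reach_m: float = ASSUMED_REACH_M) -> str:
    """Choose 'direct' or 'indirect' manipulation for a virtual object.

    Direct manipulation is offered only when the object is within reach AND
    backed by a real surface or object the hand can actually touch.
    Everything else is handled indirectly (gestures, pointer or controller).
    """
    within_reach = math.dist(shoulder, target) <= reach_m
    if within_reach and physically_backed:
        return "direct"
    return "indirect"

shoulder = (0.0, 1.4, 0.0)
print(interaction_mode(shoulder, (0.0, 1.2, 0.5), physically_backed=True))   # direct: panel mapped onto a real desk
print(interaction_mode(shoulder, (0.0, 1.2, 0.5), physically_backed=False))  # indirect: floating box, nothing to hold
print(interaction_mode(shoulder, (0.0, 1.5, 10.0), physically_backed=True))  # indirect: wall screen 10 m away
```

Treating "within reach but not physically backed" as indirect is a deliberate choice in this sketch: it avoids inviting the user to grab something their hand will pass straight through.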

Now here comes the challenging part. Direct manipulation was an advancement made when we moved from regular screens to multitouch. All we needed to interact with our phones and tablets was our hands and fingers. Now, as technology advances to virtual realities, we have to take a step back, because there’s no such thing as a touch-sensitive surface in VR…yet.

Indirect Manipulation: Objects must be out of reach, or acted on with a Proxy Controller

Indirect Manipulation refers to the use of a tool or hands to act upon an object without actually touching the object. The perfect example of this is the use of a mouse to control a cursor on-screen. Because we are not actually touching and manipulating an object on the screen, we expect something different from that experience. Using a Leap Motion controller is also Indirect Manipulation, because our hands in space are moving objects on a 2D screen.

Of course, Minority Report

Everyone loves the UX design in Minority Report, but why is it so good? As you can see in the above GIF, Tom Cruise isn’t engaging in direct manipulation with those images. They are not physical photos, nor are they presented as such, so he is interacting with them as if they are floating in space, because they are.

This is where much of current VR and AR UX misses the mark: something is presented as floating in space and users are expected to directly manipulate it, either with their hands or with a proxy controller, only to have their hand pass through it, with a possible fall as a further reward for the attempt. In the case of Minority Report, the set of gestures being used is quite different because the user knows these are not objects for him to touch directly, so he uses gestures to select, expand and move items around the virtual space with this understanding in mind (Tony Stark also has an understanding of this interface in the Iron Man films). The key to this understanding? These objects are out of his reach, so he has to perform different gestures to manipulate them, and gloves aren’t strictly necessary. For the technology we currently have, gloves or controllers would work just fine.

While these gestures are starting to appear, as with the in-air taps shown in Microsoft HoloLens demonstrations, I would argue that more intuitive gestures should follow, ones that can delight and empower users, just like in Minority Report. Want to discard a file? Throw it away. Want to paint-fill a virtual motorcycle? Point to the section you want to paint and use your hand like a brush. Want to move a virtual screen further away? Just push it.
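As an illustration of how a "throw it away" gesture might be told apart from simply letting go of an item, the Python sketch below looks at how fast the hand was moving just before release. Everything here is hypothetical: the sample format, the helper names and the 1.5 m/s threshold are assumptions for illustration, and any real threshold would need to be tuned with users.

```python
import math
from typing import List, Tuple

# One tracked hand sample: (timestamp in seconds, (x, y, z) position in metres).
Sample = Tuple[float, Tuple[float, float, float]]

# Hypothetical threshold; a real value would need tuning with users.
THROW_SPEED_M_PER_S = 1.5

def release_velocity(samples: List[Sample]) -> float:
    """Average velocity magnitude over the samples just before release:
    straight-line displacement from first to last sample divided by elapsed time."""
    if len(samples) < 2:
        return 0.0
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return math.dist(p0, p1) / (t1 - t0)

def is_throw(samples: List[Sample], threshold: float = THROW_SPEED_M_PER_S) -> bool:
    """True if the item was released fast enough to count as being thrown away."""
    return release_velocity(samples) >= threshold

# A flick of 0.3 m in 0.1 s (about 3 m/s) versus a gentle 2 cm set-down.
flick = [(0.00, (0.0, 1.2, 0.4)), (0.05, (0.15, 1.2, 0.4)), (0.10, (0.3, 1.2, 0.4))]
place = [(0.00, (0.0, 1.2, 0.4)), (0.10, (0.0, 1.18, 0.4))]
print(is_throw(flick), is_throw(place))  # True False
```

Keying the discard action to release speed, rather than to a new symbolic gesture, is what lets the gesture borrow from real-world experience: a quick flick means "get rid of it", while a slow release means "put it down".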

As gesture recognition technologies develop, I have no doubt that gesture learning technologies will develop with them. As we interact with virtual worlds more and more, we’ll need gestures that go beyond tap, swipe and pinch to zoom, so developing systems that will not only learn to recognize our gestures, but to catalog and categorize them will be just as important.

When controllers are used, hand position should be presented as congruent with the real world, ideally with the same finger placement, so that the user equates their virtual hands with their own. If the user is gripping the controller like a sword, that’s how the hands should be presented to the user in VR, and they should rotate and move just like the real hands are moving. In no way should a hand be presented in VR that doesn’t match the real world (for example, showing a hand with outstretched fingers when the user’s fingers are clasped around a controller).

Semi-congruent Hand Tracking
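One way to honour the congruence rule above is to derive the rendered hand pose directly from what the controller reports about the real fingers. The Python sketch below assumes, hypothetically, that the hardware exposes two signals: whether the grip is held, and whether the index finger is resting on the trigger. The signal and pose names are illustrative and do not correspond to any specific SDK.

```python
from enum import Enum

class HandPose(Enum):
    OPEN = "open"    # fingers extended; nothing clasped
    GRIP = "grip"    # all fingers wrapped around the controller
    POINT = "point"  # index finger lifted off the trigger, others still gripping

def rendered_pose(grip_pressed: bool, index_on_trigger: bool) -> HandPose:
    """Pick the virtual hand pose that matches what the real hand is doing.

    Assumes (hypothetically) that the controller reports whether the grip is
    held and whether the index finger is resting on the trigger.
    """
    if not grip_pressed:
        return HandPose.OPEN
    if index_on_trigger:
        return HandPose.GRIP
    return HandPose.POINT

print(rendered_pose(grip_pressed=True, index_on_trigger=True))    # HandPose.GRIP
print(rendered_pose(grip_pressed=True, index_on_trigger=False))   # HandPose.POINT
print(rendered_pose(grip_pressed=False, index_on_trigger=False))  # HandPose.OPEN
```

The key property of this mapping is that an open hand is never drawn while the grip is held, which is exactly the mismatch described above.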

If you must bend the rules of reality, train your users and give them time to adjust.

Inevitably, as Virtual Reality, AR and MR become more ubiquitous, designers will want to bend or break the rules of reality, presenting entertainment experiences that bring thrills and chills to our users. This is fine, but a core principle of doing so should be training users for this essentially new reality.

If you design a game where the user is able to extend their arms just like Inspector Gadget, the experience should first be grounded in reality, with the user’s hands presented as normal. Previous research suggests that mimicry during learning is rooted in a brain system called the Mirror Neuron System (MNS) (Rizzolatti & Craighero, 2004; Buccino, Binkofski, & Riggio, 2004). This system provides an evolutionary shortcut to replicating gestures and actions by observing them in other people. To teach users what their “Go Go Gadget Arms” are capable of in a safe and accessible way, the best approach would be to have a non-playable character (NPC) in the virtual world demonstrate how to activate them, allowing users to explore their own virtual body’s abilities through mimicry.

Bending the laws of physics, time and space would also need a safe and appropriate demonstration and training phase before users are thrust into a reality that is unfamiliar to them.


As we move forward with designing great experiences for 3D Virtual, Augmented and Mixed Reality applications, it’s important that we ground the worlds we build in virtual space in the rules that govern the world we live in. When presenting affordances that are universally recognized, leveraging our innate knowledge of the world in which we live is the truest shortcut to accomplishing this.

When dealing with interactions and object manipulation, the same must also be true, and deciding what is within the user’s reach, and therefore what they can manipulate directly, is a decision that we as designers can make.

So here are the guidelines again. To be clear, I am not saying these are final, nor that they are law; they are presented merely as a path forward, and a path towards more discussion and innovation.

  1. Present a virtual world that is compatible with our innate understanding of the real world.
  2. Avoid distorting the user’s perception of their own body
  3. Be Mindful of Movement
  4. For Direct Manipulation, Objects must be Real and Within reach
  5. For Indirect Manipulation, Objects must be out of reach, or acted on with a Proxy Controller
  6. If you must bend the rules of reality, train your users and give them time to adjust.


Buccino, G., Binkofski, F., & Riggio, L. (2004). The mirror neuron system and action recognition. Brain and Language, 89(2), 370–376.

Burgess, N., Maguire, E. A., & O’Keefe, J. (2002). The human hippocampus and spatial and episodic memory. Neuron, 35(4), 625–641.

Jetter, H.-C., Reiterer, H., & Geyer, F. (2013). Blended Interaction: understanding natural human–computer interaction in post-WIMP interactive spaces. Personal and Ubiquitous Computing, 18(5), 1139–1158.

Morris, M. J., Wobbrock, J. O., & Wilson, A. D. (2008). User-defined gesture set for surface computing. US Patent Office.

Pouw, W. T. J. L., Paas, F., & Van Gog, T. (2014). An Embedded and Embodied Cognition Review of Instructional Manipulatives. Educational Psychology Review, 26(1), 51–72.

Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27(1), 169–192.