Rethinking interface assumptions in AR: selecting objects

Our unexpected results testing direct and indirect object selection in augmented reality

Published in Google Play Apps & Games

8 min read · Oct 24, 2017

It’s a great time to be a developer — when a completely new technology is just peeking over the horizon; when anything is possible! Right now, we’re living through the birth of Augmented Reality (AR), the display of computer-generated imagery over video of the real world. Technologies like ARCore are bringing the power and magic of AR experiences to mobile phones, in the hands of millions.

With each new computing paradigm shift, it can take years to figure out how to make the most useful and compelling products with it, leveraging its strengths while designing for its weaknesses. AR is still in its infancy, and every week there are breakthroughs in user interface, development techniques, and content presentation.

This article is for beginner developers playing with AR, especially those interested in basic user interface research. We’ll explore one of the most fundamental user actions: selection.

The fundamentals of selection

The root of any graphical user interface (GUI) is the ability to indicate and select an element, such as a menu item, a block of text, or a button. AR is simply another GUI, so any AR app that lets users interact with virtual elements needs to provide a smooth, intuitive selection mechanism.

Since it is such a fundamental task, selection has decades of research behind it. While exploring examples in other GUIs, we found that selection mechanisms come in two basic flavors:

Direct Selection — where there is no abstract cursor between the user and the item being selected. An example is using your finger to tap a button on a phone screen.
Indirect Selection — where an abstract pointer of some kind exists between the user and the object being selected. For example, using a mouse cursor on a computer screen.

Which is right for AR?

Well, as with all interfaces, there’s no objective definition of “right”. User experience (UX) and user interface (UI) design are both art and science, a mix of what ‘feels’ right and what works well. We built a prototype to explore the differences, both subjective (qualitative) and objective (quantitative), between direct and indirect selection.

Testing how users select

We built a prototype called Box Popper. The design was simple: display a grid of squares on the tabletop. Choose a random square from the grid and extrude it up into a cube. The user selects the box, using either direct or indirect mechanisms. Once the user successfully selects the box, scale it back down and scale up another random box.

In direct selection mode, the user simply taps the box to select it. In indirect selection mode, we introduced an aiming cursor in the middle of the screen. The user aims the cursor at the box, then taps anywhere on the screen to select it.

*Direct selection (left) and indirect selection (right)*
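To make the difference concrete, here is a minimal sketch of how the two modes could be hit-tested. It is not the Box Popper implementation: it assumes a simple pinhole camera at the origin looking down the negative Z axis, a 60 degree vertical field of view, and an axis-aligned box in camera space. All names (`Box`, `screen_point_to_ray`, `direct_select`, `indirect_select`) are hypothetical. The only difference between the modes is which screen point the selection ray passes through.

```python
from dataclasses import dataclass
import math


@dataclass
class Box:
    """Axis-aligned box in camera space (assumed: camera at origin, looking down -Z)."""
    min_corner: tuple
    max_corner: tuple


def screen_point_to_ray(x, y, width, height, vertical_fov_deg=60.0):
    """Direction of the ray through pixel (x, y); the field of view is an assumed value."""
    half_h = math.tan(math.radians(vertical_fov_deg) / 2.0)
    half_w = half_h * width / height
    # Map pixels to [-1, 1]; screen y grows downward, camera y grows upward.
    ndc_x = 2.0 * x / width - 1.0
    ndc_y = 1.0 - 2.0 * y / height
    return (ndc_x * half_w, ndc_y * half_h, -1.0)


def ray_hits_box(direction, box):
    """Slab test for a ray that starts at the camera origin."""
    t_near, t_far = 0.0, math.inf
    for d, lo, hi in zip(direction, box.min_corner, box.max_corner):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of faces: the origin must lie between them.
            if lo > 0.0 or hi < 0.0:
                return False
            continue
        t0, t1 = lo / d, hi / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def direct_select(tap_x, tap_y, width, height, box):
    """Direct mode: the ray goes through the pixel the user tapped."""
    return ray_hits_box(screen_point_to_ray(tap_x, tap_y, width, height), box)


def indirect_select(width, height, box):
    """Indirect mode: the tap position is ignored; the ray goes through the screen center."""
    center = screen_point_to_ray(width / 2, height / 2, width, height)
    return ray_hits_box(center, box)
```

In a real ARCore app the ray and the hit test would come from the framework rather than hand-written math, but the split is the same: direct mode feeds the tap coordinates into the hit test, while indirect mode always feeds it the cursor position.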

Even with this simple test, we had a lot of questions, such as:

Would people perform better (miss less) with direct or indirect?
Would accuracy differ (selecting close to the center of the box) between the two?
Would it matter how far apart two subsequent boxes are (Fitts’s Law in AR)?
…how big the boxes are?
…how far the boxes are from the phone?
How small could we make the box before people started missing it more than 10% of the time?
How quickly could people select one box after the other, using the two modes?
Which aspect (size of box, distance between subsequent boxes, etc.) impacted accuracy the most?
Would people figure out that they could physically move the phone closer to the box, in order to make the task easier?
Would they perform differently under pressure?

In the end, we had over 50 questions derived from one simple task!
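For readers who haven’t met Fitts’s Law, mentioned in the questions above: it predicts that the time to reach a target grows with the distance to the target and shrinks with the target’s width. A common form (the Shannon formulation) is MT = a + b * log2(D / W + 1), where the constants a and b are fitted from measured data. The sketch below only computes the formula. The function names are hypothetical, and no values of a or b from our tests are implied.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts's index of difficulty in bits (Shannon formulation)."""
    return math.log2(distance / width + 1.0)


def predicted_time(distance, width, a, b):
    """Predicted movement time; a and b must be fitted from your own measurements."""
    return a + b * index_of_difficulty(distance, width)
```

For example, a box three box-widths away has an index of difficulty of log2(3 + 1) = 2 bits, and halving the box width raises that to log2(6 + 1), or about 2.8 bits.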

Exploring our assumptions

From the start, we had ideas about how the tests would play out. Our initial instincts were:

Direct mode would feel more like a true “AR” experience, and it would be faster overall.
Indirect mode would be more precise, yielding fewer errors (missed attempts).
People would have an easier time figuring out and using direct mode, since it’s how they use their phone now: to use something, just tap on it.

We were also very curious to see whether novice AR users would figure out that they can physically move the phone around to make the task easier. ARCore offers a “6 degrees of freedom”, or 6DOF, control scheme. Not only can you rotate the phone in three dimensions (left/right, forward/backwards, and clockwise/counterclockwise), but you can move the phone in three additional directions as well (lean left/right, slide forward/backwards, and move up/down).

In most of the apps we’re used to, we have only 3 degrees of freedom: we can tilt and rotate the phone, but without AR, apps have no reliable way to track how the phone moves through space. It’s also worth noting that GPS and other sensors give you an approximate location within several meters, but AR requires much finer-grained tracking, on the order of millimeters. We have 10 years of experience telling us that phone rotation works (such as in a compass app), but phone movement does not.
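One way to picture the difference is as a data structure: a 3DOF pose carries only the three rotation values, while a 6DOF pose adds three position values. The sketch below is an illustration only. The `Pose` class and helper names are hypothetical and are not ARCore types. It also shows why the extra three values matter for selection: the visual angle a box covers depends on how far the phone is from it, which only position tracking can tell you.

```python
from dataclasses import dataclass
import math


@dataclass
class Pose:
    """Hypothetical 6DOF pose: three rotation values plus three position values."""
    yaw: float = 0.0    # degrees, turn left/right
    pitch: float = 0.0  # degrees, tilt forward/backwards
    roll: float = 0.0   # degrees, clockwise/counterclockwise
    x: float = 0.0      # meters, left/right
    y: float = 0.0      # meters, up/down
    z: float = 0.0      # meters, forward/backwards


def distance_to(pose, target):
    """Straight-line distance from the phone to a point (x, y, z) in meters."""
    return math.dist((pose.x, pose.y, pose.z), target)


def angular_size_deg(box_size, distance):
    """Visual angle covered by a box of the given size seen from the given distance."""
    return math.degrees(2.0 * math.atan(box_size / (2.0 * distance)))
```

Under these assumptions, a 2 cm box covers roughly 1.1 degrees from 1 m away and about 4.6 degrees from 25 cm, so moving the phone closer makes the same box a much bigger target.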

We wanted to see whether people would intuitively figure out this powerful new feature, and wanted to get metrics on their accuracy, so we made two more additions to the prototype:

1) Every fifth box you popped would cause the grid squares to shrink by 10%…and therefore cause the extruded boxes to get smaller. Our hope was that eventually, as the boxes got very small, users would figure out that they could ‘zoom in’ by physically moving forward, until the small box appeared large again on screen, and then select it. Here’s what it looked like after the 55th box:

2) We gave people a ‘score’ based on the number of boxes they popped. Since they could play the prototype until they missed 10 attempts, we were afraid people would just play very cautiously, and we wouldn’t get any meaningful accuracy measurements, so we added a timer. If you didn’t select a box within 5 seconds, it counted as a miss. Nothing like a little time pressure to force errors!
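The two rules above can be sketched in a few lines. This is a rough reconstruction from the description, not the prototype’s code. It assumes the 10% shrink compounds (each shrink multiplies the current size by 0.9), and the names `grid_scale` and `Session` are hypothetical.

```python
SHRINK_EVERY = 5      # boxes popped between shrinks
SHRINK_FACTOR = 0.9   # assumed: each shrink keeps 90% of the current size
TIME_LIMIT_S = 5.0    # seconds allowed per box
MAX_MISSES = 10       # the session ends after this many misses


def grid_scale(boxes_popped):
    """Grid square size relative to the starting size."""
    return SHRINK_FACTOR ** (boxes_popped // SHRINK_EVERY)


class Session:
    def __init__(self):
        self.score = 0
        self.misses = 0

    @property
    def finished(self):
        return self.misses >= MAX_MISSES

    def record_attempt(self, hit, elapsed_s):
        """Count a pop only if the box was hit within the time limit; otherwise a miss."""
        if hit and elapsed_s <= TIME_LIMIT_S:
            self.score += 1
        else:
            self.misses += 1
        return grid_scale(self.score)
```

Under the compounding assumption, the grid has shrunk 11 times by the 55th box, leaving each square at about 31% of its original size.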

Here is the prototype in action (note: both videos show indirect selection; the user must tap the screen to select the box, which is not shown):

Our surprising results

So what did we learn? People are having to relearn basic tasks and controls within an AR experience. Since the task was basically a race to pop the boxes, we saw many users switch into what we now call “Eagle Eye View” during direct selection. That’s where you stand up, look directly down on the table, and hold the phone high enough that the entire grid is onscreen. Then you wait for the box to appear and simply tap it with your finger.

Users reported that, despite our assumption to the contrary, direct selection mode felt less like an AR experience. Recall that our instinct was that ‘reaching through the screen’ would feel very much like an augmented experience. However, when faced with this particular task, users reduced the problem to 2D and removed AR from the experience. The key takeaway is that users will find the most efficient way to do the task you give them, even if it’s not how you expect them to do it.

Continuing on that theme: users will surprise and delight you, discovering completely new strategies on their own! Here, a user figured out a “double-thumb” technique for the direct selection task:

*Why use one thumb when two work twice as well?*

Most people did not figure out 6DOF controls on their own. We hoped they’d learn to lean and move around the table, but they didn’t. We saw this through our own observations as well as in our data.

Sometimes the box would appear (or move) offscreen. This is one of the major landmines for AR developers: remember, the player might not be looking where you want, when you want! Lots of great work has been done in the VR field to account for user gaze, but in our prototype, offscreen boxes led to frustration, so we added an arrow to point towards the box whenever it was offscreen. You can see it in the animation above, or in this screenshot:
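For developers who want to try a similar indicator, here is one way the arrow placement could be computed. It is a sketch, not our prototype’s code. It assumes you already have the box’s projected screen position (which may fall outside the screen bounds) and that the box is in front of the camera. The function name and the `margin` parameter are hypothetical.

```python
import math


def offscreen_arrow(target_x, target_y, width, height, margin=40.0):
    """Return None if the target is onscreen, else (x, y, angle_deg) for the arrow.

    Coordinates are in pixels with y growing downward, so an angle of -90
    degrees points towards the top of the screen.
    """
    if 0.0 <= target_x <= width and 0.0 <= target_y <= height:
        return None
    cx, cy = width / 2.0, height / 2.0
    dx, dy = target_x - cx, target_y - cy
    # Shrink the center-to-target vector so it stops at the inset screen edge.
    half_w, half_h = cx - margin, cy - margin
    scale = min(half_w / abs(dx) if dx else math.inf,
                half_h / abs(dy) if dy else math.inf)
    angle_deg = math.degrees(math.atan2(dy, dx))
    return (cx + dx * scale, cy + dy * scale, angle_deg)
```

The arrow sits on the line from the screen center to the box, pinned just inside the screen edge, and rotates to face the box. A target behind the camera would need extra handling that this sketch leaves out.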

Lastly, we found that in indirect mode, the design of the cursor had a huge impact on the user experience. Our cursor changed color (to green) when you were ‘on target’, and whenever you moved on or off the target box, the phone played a low ‘click’ sound and vibrated briefly. The combination of audio, visual, and even haptic feedback gave it a solid, comfortable feel.
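The feedback described above depends on detecting the moment the cursor moves on or off the target, rather than firing on every frame. A minimal sketch of that idea follows. The class name, the callbacks, and the off-target color (“white”) are assumptions for illustration. In a real app the callbacks would call the platform’s audio and vibration APIs.

```python
class AimingCursor:
    """Tracks whether the cursor is on target and fires feedback on changes."""

    def __init__(self, on_click, on_vibrate):
        self.on_target = False
        self.color = "white"  # assumed off-target color
        self._on_click = on_click
        self._on_vibrate = on_vibrate

    def update(self, is_over_target):
        """Call once per frame with the current hit-test result."""
        if is_over_target != self.on_target:
            # Sound and haptics fire only on transitions, in either direction.
            self._on_click()
            self._on_vibrate()
        self.on_target = is_over_target
        self.color = "green" if is_over_target else "white"
```

Keeping the sound and vibration tied to transitions means the user gets a single confirmation each time the cursor moves on or off the box, while the color shows the current state continuously.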

These results surprised us, even as we experienced them firsthand, which just goes to show that AR interface design is completely new, and user tests are a great way to find the best solution. We’re continuing to explore interface design problems, and hope to share more with you in the future! In the meantime, learn more about ARCore.

What do you think?

Do you have any comments on AR or findings and solutions from experiments you’ve run? Continue the discussion in the comments below or tweet using the hashtag #AskPlayDev and we’ll reply from @GooglePlayDev, where we regularly share news and tips on how to be successful on Google Play.