Head-Tracked Transformations

How do you look behind an object in VR if you can’t walk around it?

Head movement controls the transformation of the cubes. The cubes transform in the opposite direction to the head movement, in proportion to it.

Not all virtual reality headsets are made the same. Google Cardboard (the HMD I’m working with) uses your phone to track head rotations, but it can’t track changes in position. That’s pretty limiting for designers. In this prototype, I design and explore an interaction where head rotations control objects in a virtual scene.

In the next sections I’ll break down the problem, the problems I ran into and how I overcame them, the technical bits of code, a splash of user testing, and a working prototype.

Problem, Hunt Statement

It’s important to start my experiments with a goal; otherwise I’m likely to lose focus. For this sprint, the goal is to answer “How should we see around objects if we can’t move our bodies?” I turn this question into a hunt statement, or guiding statement:

I am going to research a VR interaction in order to see if it can overcome a lack of position controls on devices like Google Cardboard.
That’s a Google Cardboard on my face. Photo credit: Rachel Ng.

Solution, Prototype

Using a user’s head rotation data as a controller, my prototype allows for manipulation of a targeted object. User input is scaled to reduce neck fatigue, and the controlling function inverts user input to mimic a user physically looking around an object.

Experiment 6, Head-Tracked Transformations:

Source Code:
github.com/armthethinker/webVR-experiments/blob/master/6 — head-tracked-transformations.html

Problem Space, Existing Work

Virtual reality designers and developers have already come up with a few ways to move either yourself or objects in VR. I outline a few of those here, with their pros and cons.

Just Move Around

Photo source: Giphy
  • Most natural interaction
  • Not available on Cardboard
  • Requires room to move

Hand Held Controls

Photo source: Giphy
  • Great for targeting or selecting
  • A technologically simpler way than infrared tracking to get accurate hand tracking into VR
  • Not available for Cardboard (though Google is working on a project, Daydream, that will have a handheld controller)

Game Pad Controller

An Xbox controller ships with the Oculus Rift. Photo source: Oculus.com.
  • Easily accessible because of existing distribution networks
  • Because it was built for a different device, it isn’t tracked in VR and can be immersion-breaking
  • Reports of motion sickness when motion is tied to the analog sticks

Hand-Tracked Controls

Leap Motion demo. Photo source: Giphy
  • Great for bringing your hands into VR
  • Can be buggy on desktops, really buggy or non-existent on mobile

Gaze-Based Translation of Your Body

From an upcoming case study on Humane Virtuality.
  • Looking around can trigger movement
  • Useful for when you don’t have controllers or can’t walk around
  • Users experience various levels of nausea, from no nausea to extreme nausea, based on the transition type and the user

In summary, Cardboard gives a limited range of motion, preventing users from looking around objects by changing their position. Various forms of position control exist, but not all of them work with Cardboard.

Design Process

When I used an Oculus DK2 for the first time, the most powerful experience for me was moving my head forward in space to see into a small stack of cards on a desk. It sounds cheesy, but that first head movement you make in VR can bump immersion up to 11.

Oculus DK2 demo scene.

We can’t do that with Cardboard.

So what can we do to get position-like movement? User rotation data, taps, and user gaze are all available. I found the idea of head rotations interesting, so I wanted to tie head rotation to the transformation of an object in a scene.

But first, let’s define some terms:

  • Transformation: change in rotation, translation, and/or scale.
  • Rotation: each point in an object moves so that it stays the same distance from the center of rotation (colloquially, turn something).
  • Translation: each point in an object changes position by the same amount in the same direction (colloquially, move something without rotating it).
  • Scale: grow or shrink an object equally in all directions (colloquially, enlarge or shrink something).
Transformations may include rotation, translation, and/or scale.
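To make these definitions concrete, here is a small JavaScript sketch that applies each kind of transformation to a point {x, y, z}. It is not from the prototype, and the helper names are hypothetical. Rotation keeps every point at the same distance from the axis. Translation leaves the distances between points unchanged. A uniform scale multiplies all distances by the same factor.

```javascript
// Hypothetical helpers illustrating the three transformations on a 3D point.
const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Rotation about the vertical (y) axis through the origin: the point turns,
// but its distance from the axis (and from the origin) stays the same.
function rotateY(p, degrees) {
  const a = toRadians(degrees);
  return {
    x: p.x * Math.cos(a) + p.z * Math.sin(a),
    y: p.y,
    z: -p.x * Math.sin(a) + p.z * Math.cos(a),
  };
}

// Translation: every point shifts by the same offset, with no turning.
function translate(p, offset) {
  return { x: p.x + offset.x, y: p.y + offset.y, z: p.z + offset.z };
}

// Uniform scale about the origin: every coordinate gets the same factor.
function scale(p, factor) {
  return { x: p.x * factor, y: p.y * factor, z: p.z * factor };
}

// Straight-line distance between two points.
const distance = (a, b) => Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);

// A point one unit along x, turned 90 degrees, ends up at roughly {x: 0, y: 0, z: -1}.
const turnedCorner = rotateY({ x: 1, y: 0, z: 0 }, 90);
```

In an A-Frame scene you would normally set an entity's rotation, position, and scale components rather than move individual points, but the same three operations are what those components describe.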

We should also define the terms we’ll use for head rotations. I’m going to use the aircraft conventions in this essay.

  • Pitch: nodding up and down
  • Yaw: turning left and right
  • Roll: tilting your head toward your left or right shoulder
Photo source: Enhanced real-time head pose estimation system for mobile device

Which Way & How Much?

There are countless ways to tie head rotation to changes in the world. As good designers, we need to find the one that makes the most sense for what we’re doing.

Our goal here is to let the user see all sides of an object easily. It’s the simplest way to test our idea.

We could restrict the user to seeing most, but not all, of an object. This might be useful if the front were most interesting, the sides a little interesting, and the back not interesting at all. But we won’t do that here, because it is overly complex for the prototype interaction we’re going for.

“Easily” is ambiguous. To be specific, the user shouldn’t get fatigued by their movements while trying to see what they want. Conversely, the object shouldn’t transform so quickly that the user has a hard time controlling it. Think of it like cursor speed: if your cursor moves too quickly across the screen, you have a hard time clicking what you want; if it moves too slowly, you struggle to get it where you want.

If we’re tying head rotation to object rotation, we could connect them directly:

objectRotation(pitch, yaw, roll) = headRotation(pitch, yaw, roll)

Or inversely:

objectRotation(pitch, yaw, roll) = -headRotation(pitch, yaw, roll)
Let’s use this as a reference. The blue boxes, as a group, are our targeted object.

Direct means that when you look right, the right blue box comes towards you (use the above image as a reference). When you look down, you see the bottom of the blocks. It is as if your head is a direct controller of the boxes.

An inverse connection means that when you look right, the left box comes toward you. Looking down, you see the top of the blocks. It is as if you are walking around the object, looking at it as you move.
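The two mappings differ only by a sign. Here is a minimal JavaScript sketch. It assumes both rotations are {pitch, yaw, roll} objects in degrees, with negative pitch meaning "looking down". The names DIRECT, INVERSE, and mapHeadToObject are hypothetical.

```javascript
// Hypothetical sketch: DIRECT copies the head rotation onto the object;
// INVERSE mirrors it, as if the user were walking around the object.
const DIRECT = 1;
const INVERSE = -1;

function mapHeadToObject(head, direction = DIRECT) {
  return {
    pitch: direction * head.pitch,
    yaw: direction * head.yaw,
    roll: direction * head.roll,
  };
}

// The user turns 25 degrees to one side and looks 10 degrees down.
const head = { pitch: -10, yaw: 25, roll: 0 };
const direct = mapHeadToObject(head, DIRECT);   // pitch -10, yaw 25
const inverse = mapHeadToObject(head, INVERSE); // pitch 10, yaw -25
```

In A-Frame, pitch, yaw, and roll correspond to the x, y, and z values of the rotation component (in degrees). A result `r` could therefore be applied to a target entity with `target.setAttribute("rotation", { x: r.pitch, y: r.yaw, z: r.roll })`.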

With a simple connection (direct or inverse), the user has to turn all the way around to see the back of the object. We can help them out by creating a scaling function that multiplies the user input. For example:

// The same multiplier on each axis
objectRotation(pitch, yaw, roll) = 3 * headRotation(pitch, yaw, roll)
// A different multiplier on each axis
objectRotation = (3 * headPitch, 4 * headYaw, 1 * headRoll)
// You could also apply a function to each axis to change it non-linearly (e.g. the object barely rotates when you turn a little, but rotates a lot when you turn further, allowing both fine-grained and coarse-grained control)
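Both kinds of scaling can be sketched in a few lines of JavaScript. The per-axis gains mirror the multipliers in the pseudocode. The curve parameters (a 60 degree comfortable head range mapped onto 180 degrees of object rotation, with exponent 2) are illustrative assumptions, not values from the prototype.

A linear gain of 3 means a 60 degree head turn rotates the object 180 degrees, which brings its back into view. The power curve instead keeps small head turns gentle and ramps up near the edge of the range.

```javascript
// Linear scaling with a separate (hypothetical) gain per axis.
function scaleLinear(head, gains) {
  return {
    pitch: gains.pitch * head.pitch,
    yaw: gains.yaw * head.yaw,
    roll: gains.roll * head.roll,
  };
}

// Non-linear scaling for one axis using a power curve. Head angles up to
// maxHead map onto object angles up to maxObject; exponent > 1 gives fine
// control near the center and coarse control further out. Larger head
// angles are clamped to maxHead.
function scaleCurve(angle, maxHead = 60, maxObject = 180, exponent = 2) {
  const t = Math.min(Math.abs(angle) / maxHead, 1);
  return Math.sign(angle) * maxObject * t ** exponent;
}

const scaled = scaleLinear(
  { pitch: 10, yaw: 20, roll: 5 },
  { pitch: 3, yaw: 4, roll: 1 },
); // pitch 30, yaw 80, roll 5
```

Here `scaleCurve(30)` returns 45 and `scaleCurve(60)` returns 180. The first half of the head turn covers only a quarter of the object's rotation, which leaves room for precise adjustments.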

In testing, users didn’t notice the difference between direct and inverse mappings, but they did appreciate having a multiplier applied to their input, which eased their neck strain.

We must also choose whether the transforming object is anchored to the world or to the user’s perspective. The right choice depends on the specific experience, which is to say that in an experiment about an interaction out of context, we can’t recommend one anchoring over the other.

Example of the targeted objects being tied to the world, not to the user’s camera. The connecting function is also inverse and scaled.

Giving Context

It’s easy to perceive an object’s rotational changes in VR because its image on your retina changes in a distinctive way. But it’s harder to perceive changes in scale and distance, because A-Frame scenes lack many of the visual cues that the real world provides. An object that is shrinking and an object that is moving farther away form very similar images on the retina.
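A short worked example shows why size and distance are easy to confuse. The visual angle an object subtends is 2 * atan(size / (2 * distance)). Halving an object's size and doubling its distance therefore produce exactly the same angle. The helper name is hypothetical, and sizes and distances are in scene units (meters in A-Frame).

```javascript
// Visual angle, in degrees, subtended by an object of a given size at a given
// distance (both in the same units).
function visualAngle(size, distance) {
  return (2 * Math.atan(size / (2 * distance)) * 180) / Math.PI;
}

const original = visualAngle(1, 5);   // 1-unit box, 5 units away
const shrunk = visualAngle(0.5, 5);   // same distance, half the size
const movedAway = visualAngle(1, 10); // same size, twice as far away

// shrunk and movedAway are the same angle, so the retinal image alone can't
// distinguish "got smaller" from "moved away".
console.log(original.toFixed(2), shrunk.toFixed(2), movedAway.toFixed(2));
```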

In my last case study I talked about binocular disparity, the phenomenon that lets us perceive distance from the difference between the left and right eye images. While binocular disparity works well for objects near us, it doesn’t work well for objects farther away, where the left and right eyes see nearly the same image.

We rely on multiple visual cues to understand depth. I used a few of them to give the user a sense of depth: texture gradient, relative size, linear perspective, aerial perspective, and occlusion.

Left: the base environment with a patterned floor. Center: add fog. Right: add pillars.

Using a patterned floor helps the user perceive depth through the texture gradient (i.e. the texture looks different at different distances). Fog creates aerial perspective: the atmosphere changes how distant objects look. Mountains in the distance, for example, can look bluer and more faded because the air scatters light. The pillars are all the same size, which helps the user understand the depth of the environment through relative size and linear perspective. The pillars also show depth through aerial perspective when they are far enough away to fade into the fog.
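The fog's aerial perspective can be put into numbers. This sketch follows the exponential-squared formula that three.js (which A-Frame is built on) uses for its FogExp2 fog. The fraction of fog color mixed into a surface is 1 - exp(-(density * depth)^2). The density and pillar distances are illustrative assumptions, not values from the prototype.

```javascript
// Fraction (0 to 1) of fog color blended into a surface at the given depth,
// using the exponential-squared fog formula.
function fogAmount(depth, density) {
  return 1 - Math.exp(-((density * depth) ** 2));
}

// Hypothetical pillar distances (scene units) and fog density.
const fogDensity = 0.05;
for (const depth of [5, 10, 20, 40]) {
  console.log(depth, fogAmount(depth, fogDensity).toFixed(2));
}
```

With these values the pillars at 5, 10, 20, and 40 units receive roughly 6%, 22%, 63%, and 98% fog. Past a certain distance, identical pillars become nearly indistinguishable from the fog itself.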

In a setup where head tilt controls the scale of the boxes, the left image shows occlusion by a pillar and the right image shows increasing occlusion by the fog (the aerial perspective effect). On the left, you can tell that the boxes are small and at a medium distance. Occlusion gives you some of the distance information. The lack of aerial perspective tells you the boxes aren’t far away, and binocular disparity tells you they aren’t near either. Thus, if the image on your retina is that small at a medium distance, the boxes themselves must be small.
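A scale-by-head-tilt setup could be driven by a mapping like the one below. This is a hypothetical sketch, not the prototype's code. It assumes "tilt" means pitch in degrees, with positive values meaning "looking up". It also follows the behavior described for the user-testing setup: looking up scales the blocks up, their horizontal and depth positions stay locked, and their vertical position scales with them. An exponential mapping means equal head movements always produce equal size ratios and the scale never reaches zero.

```javascript
// Every degreesPerDoubling degrees of pitch doubles (or halves) the size.
function scaleFromPitch(pitch, degreesPerDoubling = 30) {
  return 2 ** (pitch / degreesPerDoubling);
}

// Locks x and z, and scales the vertical position along with the boxes.
function boxTransform(pitch, basePosition) {
  const s = scaleFromPitch(pitch);
  return {
    scale: { x: s, y: s, z: s },
    position: { x: basePosition.x, y: basePosition.y * s, z: basePosition.z },
  };
}

// Looking up 30 degrees doubles the size and the height.
const lookingUp = boxTransform(30, { x: 0, y: 1, z: -3 });
```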

Technical Bits

If you’re not interested in the code, go ahead and skip this part. Scroll down to User Testing.

Each of these code snippets showcases a problem I encountered and solved with code. All of it is the best I could do given my ability and the time available. As you follow me through my case studies, you’ll likely see some of these functions again. I’m building tools to help me and others prototype quickly and effectively in VR.

Observing The Camera’s Changes

This experiment was my first for the internship, and as such I had little understanding of what I was doing. Looking for ways to track all changes to a DOM element, I ran across MutationObservers on the Mozilla Developer Network (a great resource for learning JavaScript). A MutationObserver watches for changes in the DOM and reports them to a callback function, so you can programmatically update content whenever a DOM element changes. I use one to watch the <a-camera> element. When a user moves their head, the camera’s HTML attributes change. The MutationObserver picks this up and moves the boxes according to the active transformation.
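Here is a minimal sketch of the MutationObserver approach. It is not the prototype's code, and `watchCamera` and `readRotation` are hypothetical names. It assumes the camera's rotation shows up as a change to its `rotation` attribute, read back either as an "x y z" string or as A-Frame's parsed {x, y, z} object. Newer A-Frame versions may not write component changes back to DOM attributes, so check the version you use. The observer constructor is passed in as a parameter so the logic can also run outside a browser.

```javascript
// Accepts an "x y z" string or an {x, y, z} object (degrees) and returns
// {pitch, yaw, roll}, using x = pitch, y = yaw, z = roll.
function readRotation(value) {
  if (typeof value === "string") {
    const [x = 0, y = 0, z = 0] = value.trim().split(/\s+/).map(Number);
    return { pitch: x, yaw: y, roll: z };
  }
  return { pitch: value.x, yaw: value.y, roll: value.z };
}

// Calls onRotation(rotation) whenever the camera's rotation attribute changes.
// Returns the observer so the caller can disconnect() it later.
function watchCamera(cameraEl, onRotation, ObserverImpl = globalThis.MutationObserver) {
  const observer = new ObserverImpl((mutations) => {
    for (const mutation of mutations) {
      if (mutation.attributeName === "rotation") {
        onRotation(readRotation(mutation.target.getAttribute("rotation")));
      }
    }
  });
  // Only report changes to the rotation attribute.
  observer.observe(cameraEl, { attributes: true, attributeFilter: ["rotation"] });
  return observer;
}

// Browser-only wiring.
if (typeof document !== "undefined") {
  const camera = document.querySelector("a-camera");
  if (camera) {
    watchCamera(camera, (rotation) => {
      // Apply the active transformation to the targets here.
      console.log(rotation.pitch, rotation.yaw, rotation.roll);
    });
  }
}
```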

Later I found an event in the A-Frame documentation called componentchanged. Maybe it would have worked better, but I don’t have time to worry about that now that the prototype is functioning.
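The event-based alternative might look like the following untested sketch. It assumes the event's `detail` names the changed component in `name`. Reading the new values from `detail.newData` is a further assumption, because the payload has changed across A-Frame versions. Some versions also document this event as throttled, which matters for something that changes as quickly as head rotation. `onCameraRotation` is a hypothetical name.

```javascript
// Listens for componentchanged on the camera and forwards rotation changes as
// {pitch, yaw, roll}. Returns a function that removes the listener.
function onCameraRotation(cameraEl, onRotation) {
  const handler = (event) => {
    const { name, newData } = event.detail;
    if (name === "rotation" && newData) {
      onRotation({ pitch: newData.x, yaw: newData.y, roll: newData.z });
    }
  };
  cameraEl.addEventListener("componentchanged", handler);
  return () => cameraEl.removeEventListener("componentchanged", handler);
}
```

Compared with the MutationObserver, this listens to A-Frame's own component updates instead of the DOM, so it doesn't depend on attributes being written back to the page.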

Toggling the Target & Hiding Boxes

For this setup I use two sets of three boxes, which I call targets because the MutationObserver’s callback moves them. One set sits in the world (the second set below) and the other is tied to the camera. While developing and experimenting, I can press T to toggle between the world boxes and the camera boxes. Behind the scenes both sets exist, but only one is opaque; the other is fully transparent.

I use two functions to toggle between targets and change the opacity of the boxes.
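Here is a minimal sketch of that toggle. These are not the original functions. The sketch assumes the boxes are A-Frame entities selected by hypothetical `world-box` and `camera-box` classes, and it uses A-Frame's single-property form of `setAttribute(component, property, value)` to change the material's opacity. Hidden boxes stay in the scene; they are just fully transparent.

```javascript
// Sets the material opacity of every box in a set.
function setOpacity(boxes, opacity) {
  for (const box of boxes) {
    box.setAttribute("material", "opacity", opacity);
    box.setAttribute("material", "transparent", opacity < 1);
  }
}

// Keeps exactly one set ("world" or "camera") visible at a time.
function createTargetToggle(worldBoxes, cameraBoxes) {
  let active = "world";
  const apply = () => {
    setOpacity(worldBoxes, active === "world" ? 1 : 0);
    setOpacity(cameraBoxes, active === "camera" ? 1 : 0);
  };
  apply();
  return {
    toggle() {
      active = active === "world" ? "camera" : "world";
      apply();
      return active;
    },
    get active() {
      return active;
    },
  };
}

// Browser-only wiring (run after the scene has loaded): press T to switch.
if (typeof window !== "undefined") {
  const toggler = createTargetToggle(
    document.querySelectorAll(".world-box"),
    document.querySelectorAll(".camera-box"),
  );
  window.addEventListener("keydown", (e) => {
    if (e.key === "t" || e.key === "T") toggler.toggle();
  });
}
```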

Proxy Controls

See the “Proxy Controls” section in user testing.

User Testing

While testing these prototypes isn’t my primary focus, I think it would be a drastic oversight not to include a few user tests. This experiment had six users, all women, ranging in age from nine to forty-four.


In my first case study, I outlined my setup and workflow. Here is the diagram from that study.

My MacBook Pro holds the files and captures the session. My iPhone 5 displays the experience to my user and sends its screen back to my laptop to be recorded.

Findings & Methods

When working with users, I use think-alouds: the user describes, out loud, everything they are thinking, hearing, seeing, and experiencing. Think-alouds help me understand what the user is experiencing and where their expectations may not be met (e.g. “I thought that if I looked at this, a thing would happen, but I guess not.”). When a session is done, I may ask a couple of follow-up questions about their experience, expectations, and perceptions of virtual reality. I like to run these sessions with pairs of users who know each other, rather than one-on-one. Since these sessions are more explorations than usability tests, the ping-ponging conversation among the three of us is more useful than one person reporting to me, as I found when comparing my one-on-one and group testing sessions.

A selection from one of my user tests. In this setup, when she looks up, the blocks scale up. Their horizontal and depth positions are locked, while their vertical position scales with the blocks. You can also see how a UX researcher can remind users to keep thinking aloud.

For this experiment, I found the following:

Users understand what is happening and how the interaction may be useful in context. One user said about the boxes, “I can look them all over.”

The patterned floor was confusing. One user thought it was a bunch of Wi-Fi symbols, and others found it dizzying.

This could be used in the Product Presentation prototype. One of my other experiments looks at product presentation in VR. Two users stated that it would be useful to turn a product they are looking at by using their head.

The ring is a Cheerio. Behind the set of boxes I placed a small, yellow, doughnut-shaped ring to (1) help the users identify which side of the boxes they are looking at and (2) show off how this interaction could be used to find hidden information. After they found it, almost every user called it a Cheerio.

Questions about eye strain. Kevin Ngo asked me if users felt eye strain after extended use. I didn’t have an answer when he asked because I hadn’t run user tests yet. Now I can say that, at least in short-term use, users didn’t report eye strain; I don’t know about long-term use.

Users appreciate the scaling functions. Again, scaling user input allows users to move less, but still see all sides of the object. This creates less neck fatigue for the same outcome.

Users don’t always notice the difference in inverted setups. I originally expected users to find inverted setups more intuitive (i.e. the setup where the target moves inversely to head rotation in order to mimic physical movement around the object). But my limited user testing showed no difference in experience.

Proxy Controls

When I set out to finish this project, I worked with Don McCurdy’s aframe-proxy-controls for A-Frame. It lets you send keystrokes from a laptop to a phone in a Cardboard, which, if wired up properly, can control things in the environment. Since I usually make variants of my prototypes, it is useful to change settings while the headset is still on my test subject’s head (otherwise I have to make a change, reload the experience, and hand the headset back to my user).

This didn’t work out for me: I just couldn’t get it running with my level of programming expertise. It was disappointing. I had to (and still have to) remind myself that the programming I’m doing is there to show off my design ideas, not the other way around. As such, I can’t spend too much time figuring out how to program. Instead, I need to work on finding more effective ways to prototype my ideas.


I am going to research a VR interaction in order to see if it can overcome a lack of position controls on devices like Google Cardboard.

Revisiting our hunt statement allows us to judge our work. Are we on task and have we created an effective solution? After reviewing existing methods to look around objects in VR, I explored an interaction whereby user head rotations control an object’s transformations in the environment. Users can control these targets in direct, inverse, and scaled ways.

User testing suggests that users easily understand this interaction, (1) prefer having their input scaled, and (2) experience minimal eye strain during short periods of use. However, there doesn’t seem to be a preference between direct and inverted control.

This interaction was designed to cleverly circumvent a limitation of Cardboard, but I’d love to see it in an Oculus or Vive. At minimum, you could create interesting experiences, and maybe something useful.


For more on user experience design for virtual reality, follow me and the Humane Virtuality collection.