Enterprise VR: Ergonomics

Horst Werner
Apr 20, 2022


In the last post, we looked at the desirability and viability of enterprise VR. Much of the latter comes down to the ergonomics of this medium.

Working with a computer (or even with an abacus or a sketch pad) involves a cycle in which a mental model inside the human’s head and an external, embodied model in the medium are constantly synchronized with each other.

The operator manipulates the medium and then interprets the changed embodied model, translating it back into a mental model. In general terms, the efficiency of the manipulation is constrained by the physical ergonomics of the medium, whereas the efficiency of the interpretation is constrained by the cognitive ergonomics of the medium.

The efficiency of the whole interaction cycle thus depends on both physical and cognitive ergonomics and, sadly, all too often on the response delay of the medium, i.e. the time the user spends watching a wait spinner.

Compared to a conventional desktop setup, the physical ergonomics of VR are usually worse, whereas its cognitive ergonomics can be much better than in 2D.

Physical Ergonomics

Instead of a keyboard and mouse, a VR setup typically uses one controller per hand. The position and orientation of these controllers in space are tracked, and in addition they have one or more buttons and a thumb-operated joystick. Some controllers (e.g. the Valve Index) can track the position of individual fingers.

The usual way to emulate a mouse is to emit a virtual laser beam from one of the user’s hands. The intersection of this beam with any spatial object corresponds to a mouse pointer. This allows users to interact with objects at distances beyond the reach of their arm, using a controller button to “click” or drag. The primary thumb joystick is usually used for scrolling.
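To make this concrete, here is a minimal sketch of the idea in Python/NumPy (my own illustration, not code from any particular VR SDK): the controller pose gives us a ray origin and direction, and intersecting that ray with a flat virtual panel yields the pointer position.

    import numpy as np

    def laser_hit(origin, direction, panel_center, panel_normal):
        """Return the point where the controller ray hits the panel's plane,
        or None if the ray runs parallel to it or points away from it."""
        direction = direction / np.linalg.norm(direction)
        denom = np.dot(direction, panel_normal)
        if abs(denom) < 1e-6:          # ray parallel to the panel
            return None
        t = np.dot(panel_center - origin, panel_normal) / denom
        if t < 0:                      # panel is behind the controller
            return None
        return origin + t * direction  # this point acts like the mouse pointer

    # Hand at roughly shoulder height, panel two meters in front of the user.
    hit = laser_hit(np.array([0.0, 1.2, 0.0]),   # controller position (meters)
                    np.array([0.1, 0.0, -1.0]),  # slight wrist rotation
                    np.array([0.0, 1.5, -2.0]),  # panel center
                    np.array([0.0, 0.0, 1.0]))   # panel faces the user

A real implementation would intersect the ray with arbitrary scene geometry rather than a single plane, but the principle is the same: a small rotation of the wrist moves the intersection point, which is what makes the technique far less fatiguing than physically reaching out.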

Some user interfaces also let users directly touch virtual buttons or objects within arm’s reach, without using a controller button. This can be faster, but it also requires more physical effort than a slight change of wrist angle. In general, gesture-based interactions involving the whole arm(s) — as shown in the movie Minority Report — may look intuitive and efficient, but they put too much strain on the user for sustained work.

The video below shows what working with Splunk VR, a state-of-the-art but recently discontinued VR application, looks like.

Even though the UI uses the laser pointer method, most of the interaction requires extending the arm(s) away from the body. In terms of precision, the laser pointer works reasonably well, but manipulating smaller objects/buttons at a distance is tricky due to the need to hold the hand steady.

We also observe that, in this virtual space, individual dashboard panels are approximately the size that a whole physical display would be. Much of all that newfound screen real estate thus appears to be squandered on oversized content. There are two reasons for this over-sizing: first, larger objects are easier to target with the laser pointer (which is, after all, less precise than a mouse), and second, the available screen resolution for these virtual displays is low.

The Oculus Quest, for example, has an effective resolution of 1440x1600 pixels per eye, which is about 11% more than full HD. However, that resolution has to cover the full field of view. If we stand in front of a 25" monitor at an ergonomic distance of 2 feet, the screen covers roughly 1/7th of our field of view. If we represent that same display in our virtual environment, it will have an effective resolution of 765x430 pixels, which is just 16% of the full HD resolution. With a high-end headset, such as the HTC Vive Pro 2, this percentage improves to 43%.
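The arithmetic behind these figures can be reproduced with a quick back-of-the-envelope script (my own approximation: the per-eye panel is spread evenly over the field of view, and the virtual monitor covers roughly 1/7 of it, as stated above):

    FULL_HD = 1920 * 1080

    def virtual_monitor(per_eye_w, per_eye_h, fov_fraction=1/7):
        """Approximate resolution left for a 16:9 virtual monitor,
        plus its pixel count as a share of full HD."""
        budget = per_eye_w * per_eye_h * fov_fraction
        width = (budget * 16 / 9) ** 0.5
        return round(width), round(width * 9 / 16), round(budget / FULL_HD, 2)

    print(virtual_monitor(1440, 1600))   # Oculus Quest:   (765, 430, 0.16)
    print(virtual_monitor(2448, 2448))   # HTC Vive Pro 2: (1234, 694, 0.41)

The Vive Pro 2 figure comes out slightly below the 43% quoted above; the result is quite sensitive to the assumed field-of-view share.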

Another important aspect of physical ergonomics concerns text input. Virtual hovering keyboards operated with the laser pointer or joystick are a rather inefficient way of entering text, but may be sufficient if only needed occasionally. When hand tracking and direct touch are used, the lack of haptic feedback leads to unnecessarily large and un-ergonomic movements.

Luckily, newer VR headsets such as the Quest 2 and Vive Focus 3 allow users to see the keyboard and their hands through the headset’s built-in cameras — a capability called “passthrough”.

Another alternative for settings in which the user is not sitting at a desk (and not close to co-workers) may be voice input, especially for commands. We might also see virtual keyboards operated by eye-tracking soon.

Cognitive Ergonomics

Our brains have evolved over millions of years to make sense of spatial environments and tangible objects. They became very good at remembering where to find good food and at telling it apart from the things that make us sick. We can also, within a fraction of a second, interpret an odd movement of the tall grass a hundred feet away as a predator sneaking up.

Much more recently, we have mastered the art of reading — interpreting complex geometric patterns as a persistent representation of speech, which itself represents concrete or abstract mental concepts. While that is an absolutely amazing achievement, it remains much less efficient than the interpretation of the spatial environment that our subconscious mind performs continuously.

Reading a page of text, which contains about 2 kilobytes of information, usually takes us 1 to 2 minutes. However, when we drive a car or catch a ball, our brain processes megabytes of information every second without the slightest effort. A particular strength of our subconscious information processing is that it lets our conscious mind work independently until something requires its attention — such as an unexpected pattern or movement.

Using a spatial environment and 3D representations of (real or abstract) entities lets us exploit this subconscious information processing; in particular, we can use the following effects, which are unavailable to conventional UIs:

Location as Semantic Reference Frame

Our capability to associate information with locations is so strong that a powerful memorization technique, the “Memory Palace”, is based on it. In a spatial work environment, this association works in two directions:

  • We can find information much faster because we know where to look for it in a literal sense.
  • Any phenomena we observe in a particular area of that space (be it a crowd of avatars or a burst of red lights) can be immediately mapped to meaning. Spatial proximity reflects semantic belonging.

Shape Expresses Meaning

While conventional UIs strive to exploit our ability to make sense of shape through icons and charts, they fall short of conveying multidimensional, complex information. Just stand on a highway bridge and observe traffic for a minute. You’ll be able to distinguish not only different types of vehicles (cars, trucks, motorcycles) but also their size, make and model, approximate age, state of repair, logos, roof racks, etc.

If we represent entities by highly differentiated geometric shapes (and color schemes), we can make it just as easy to track complex system landscapes, business processes, etc. The system and message representation in the Splunk VR video above is a mere hint of what is possible.

Near vs Far

The unique advantage of a “first person” 3D representation is that things that are close to the user take up more screen space, while things that are farther away are still visible, but scaled down. That allows the user to focus on one thing while at the same time — subconsciously — watching many other things in their peripheral field of view.
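A rough way to quantify this (my own numbers, purely illustrative): the visual angle an object subtends shrinks roughly in proportion to its distance, so nearby content dominates the view while distant content remains a small but legible cue.

    import math

    def visual_angle_deg(size_m, distance_m):
        """Angle subtended by an object of the given size at the given distance."""
        return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

    print(visual_angle_deg(1.0, 2.0))    # ~28 degrees: fills much of the focus area
    print(visual_angle_deg(1.0, 20.0))   # ~2.9 degrees: shrunk to a peripheral cue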

Since, in a well-designed environment, semantically related things are located close to each other, the view from the location of a specific topic will show related content larger, and thus with more detail, than less related (and hence more distant) content.

The essentials of distant content can still be conveyed by means of tall structures, or beacons, forming a spatial information hierarchy (more on this in a future post).

Angles of View

A natural property of spatial objects is that they appear different when looked at from different angles of view. We can exploit that by visualizing different subsets of information for an entity on different sides.

Let’s assume we are representing a complex system of supply chains as a cityscape, where suppliers are always located west of their consumers. When we look at this cityscape from straight above, the top sides of the stations could give us information about the general flow. The eastward side of each station (which is visible to its consumers) would represent capacity, and the westward side (which is visible to its suppliers) would represent demand. Depending on the angle of view, the user always sees the contextually relevant information.
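A hypothetical sketch of how a renderer could pick the relevant face(s) from the current view direction (the station layout and the 0.3 threshold are my own illustration, not taken from any existing application):

    import numpy as np

    FACE_NORMALS = {                         # suppliers sit west of their consumers
        "top":  np.array([0.0, 1.0, 0.0]),   # general flow, read from above
        "east": np.array([1.0, 0.0, 0.0]),   # capacity, visible to consumers
        "west": np.array([-1.0, 0.0, 0.0]),  # demand, visible to suppliers
    }

    def visible_faces(view_direction):
        """Return the faces of a station whose data the user can currently read."""
        to_viewer = -np.asarray(view_direction, dtype=float)
        to_viewer /= np.linalg.norm(to_viewer)
        # keep faces whose outward normal points back toward the viewer
        return [name for name, normal in FACE_NORMALS.items()
                if np.dot(normal, to_viewer) > 0.3]

    print(visible_faces([0.0, -1.0, 0.0]))   # looking straight down    -> ['top']
    print(visible_faces([-1.0, -0.5, 0.0]))  # approaching from the east -> ['top', 'east']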

Although this post only scratched the surface of the two kinds of ergonomics, it illustrates both the current challenges and the unique strengths of a VR workplace. The next post will address the question: in which markets and use cases can these strengths make a difference?
