Protecting Users from Overload in Augmented Reality

Why and How to Spare Users’ Working Memory in AR UIs

Clay Killingsworth
Psychology in Action
8 min read · Mar 5, 2021


Co-Author: Cat Hodges

Most researchers in psychology and the social sciences accept that memory comes in two primary forms: long-term memory and working memory.

Long-term memory refers to information that has been stored and can be recalled at some point in the future. Your phone number, the address of your childhood home, and the procedure for saving a Word document as a PDF are all stored in long-term memory (LTM). LTM is often compared to a computer hard drive, and a big one at that — as far as we can tell, there isn’t a limit on the amount of information that can be stored there.

Working memory (WM), on the other hand, is painfully finite.

Previously referred to as short-term memory in a number of texts, WM is a system of several components that hold and manipulate information needed to complete tasks or actions that are in progress (Baddeley, 2000). Two components of WM worth focusing on are the phonological loop and the visuospatial sketchpad.

The phonological loop holds auditory information, while the visuospatial sketchpad retains visual and spatial information. When we hear verbal instructions, we leverage the former, the phonological loop. When we read a map, we’re relying on the visuospatial sketchpad.

Both components of WM have their limits. However, because the visuospatial sketchpad and phonological loop are mediated by (semi)distinct neural mechanisms, there may be opportunities to avoid overloading one by using the other instead (Wickens, 2008).

This is no free lunch, though. While information of one kind can be translated to the other — for example, you can convey navigation instructions verbally and someone else can hear those instructions and use them to arrive at their destination — every such translation requires mental work, and that effort adds up quickly.

Load and Overload

For a given task, the information needed may come from some combination of the environment (for example, your UI) and LTM. When your user interacts with your system’s UI, it places some degree of demand on their WM resources. This is to be expected — a conscious, alert user’s WM is always in use to varying degrees.

Where this becomes a problem is when the UI overloads a user’s WM capacity. So, how much is too much? It’s hard to say; research on the precise capacity of WM is ongoing. Further, trying to predict exactly how much an interface will load WM is, at best, labor intensive and of dubious value.

Less is almost always more, and following principles like the ones we discuss later provides the most bang for your design buck. What’s much easier than predicting the chances of overload is observing its consequences.

Outstripping a user’s WM capacity leads to lots of poor outcomes. First and foremost is forgetting — as the WM cup fills up, it eventually starts to overflow, and important information can be lost.

For example, an overloaded user may have to go back and find whatever information was lost (and searching is itself a lot of mental work). Even high WM demands that are still manageable have consequences. A subpar interface can be navigated at the expense of increased effort, but this increases subjective workload and can only be maintained for a short time before it becomes mentally and emotionally distressing (Hancock & Szalma, 2003; Matthews et al., 2002).

Users may subsequently disengage from the system when possible, or they may misuse or abuse elements of it in an attempt to make the task easier. In short, mismanagement of WM load can produce a system that is at best underutilized and at worst hated.

Design Recommendations

1. Manage Information Modalities

One approach is to look at balancing the types of information your system requires users to work with. When overload is a risk, be sure that no one sense is handling all of the load.

Visual information is often overused thanks to our history with stationary, 2D screens. Making use of other sensory modalities is more feasible than ever with AR, so it often makes sense to avoid text wherever possible.

Further gains can be made by considering the ultimate form in which a piece of information will be used. For example, if you’re providing turn-by-turn navigation assistance, you could present a text instruction to turn right in 300 feet, you could show a map indicating where to turn, or you could use a lighting change to highlight the upcoming intersection.

The text instruction requires at least two steps to become usable: translating the verbal information into the user’s mental representation of the world, and then matching that mental representation to a cue in the environment, like a stop sign.

While the map is an improvement, the lighting change requires the least translation of the three. Not only is it in the form to be used (visuospatial) but it is already in the environment, so the user isn’t required to match their mental representation to something external.

While navigation is probably not a daunting task in itself, if combined with other mental operations (e.g., communicating verbally with a teammate) it could play a part in creating an unnecessarily high mental workload.
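
To make this concrete, here’s a minimal TypeScript sketch of a navigation system picking the presentation that demands the least translation. The SceneCapabilities shape and the highlightIntersection, showMiniMap, and showTextInstruction helpers are hypothetical placeholders, not any real AR framework’s API; the point is the ordering, not the calls.

```typescript
// Hypothetical capabilities reported by the AR runtime for the current scene.
interface SceneCapabilities {
  canAnchorToWorldGeometry: boolean; // can we attach effects to real intersections?
  canRenderMiniMap: boolean;
}

interface TurnInstruction {
  direction: "left" | "right";
  distanceFeet: number;
  intersectionId: string; // id of the real-world intersection in our scene model
}

// Prefer the cue that is closest to the form in which it will be used:
// 1) highlight the real intersection (no translation needed),
// 2) fall back to a map (one translation: map -> world),
// 3) fall back to text (two translations: words -> mental model -> world).
function presentTurnCue(turn: TurnInstruction, scene: SceneCapabilities): void {
  if (scene.canAnchorToWorldGeometry) {
    highlightIntersection(turn.intersectionId); // hypothetical renderer call
  } else if (scene.canRenderMiniMap) {
    showMiniMap(turn); // hypothetical
  } else {
    showTextInstruction(`Turn ${turn.direction} in ${turn.distanceFeet} ft`); // hypothetical
  }
}

// Stub implementations so the sketch compiles; a real app would render these.
function highlightIntersection(id: string): void { console.log(`highlight ${id}`); }
function showMiniMap(turn: TurnInstruction): void { console.log("show map", turn); }
function showTextInstruction(text: string): void { console.log(text); }
```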

2. Mimic Existing Mental Models

A mental model is a user’s abstract, mental representation of an interaction, a process, a system, or an expectation around how a technology might work. It is formed through previous interactions with technology or similar processes.

It’s an important concept to understand in system design because, for any system to be easily adoptable, it must be understandable. A system’s conceptual model should match the user’s mental model as closely as possible: the closer the match between the system’s representation and the user’s, the less cognitive effort is required to perceive, understand, and react to it.

Though AR enables limitless possibilities, it also upends users’ expectations of how a system should work. This means that a lot of our users’ mental processing power will be used up just experiencing the AR world. We can combat this by closely aligning our AR designs with existing mental models. No need to re-invent the wheel! As much as it makes sense to, we should borrow icons, components, and conventions that people are used to using on normal 2D screens.

While spatial computing will eventually establish its own conventions, it will take time and more use by the wider population to be realized. There are some aspects of UX that will be harder for AR to mimic — particularly naturalistic gestures.

When considering cognitive load in AR, we should strive to mimic naturalistic interactions and gestures as much as possible. We have several tips for designing naturalistic interactions (full article coming soon!):

  • Design hand gestures to be simple and easy to learn while balancing how closely the gestures mimic their real-world counterparts. (Side note: ALWAYS consider the physical fatigue a gesture can cause. If it’s something users are going to have to do a lot, don’t make it exhausting.)
  • Use affordances or cues to clue the user in to how they can interact with virtual objects. If it can be picked up, give it a handle.
  • Make virtual objects closely imitate the physics of their real-world counterparts.
  • Think about what input or output modality makes the most sense for each interaction. For example, if something is time sensitive, play audio cues to communicate that to the user. Likewise, if their response is also time sensitive, allow them to use voice as an input where it makes sense. (A minimal sketch of this idea follows the list.)
  • Some behaviors, like walking, aren’t easily replicated in AR, but you can use perceptual illusions to give the user a sense of naturalism by substituting one cue for another. To simulate walking, you could use the sound of footsteps to create a sense of motion, and better mirror a user’s mental model.
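
As a rough illustration of the modality tip above, here’s a small TypeScript sketch that routes each interaction to an output and input modality based on time sensitivity and how loaded the visual channel already is. The InteractionSpec type and the rules themselves are assumptions made for this example, not a prescription.

```typescript
type OutputModality = "visual" | "audio";
type InputModality = "gesture" | "voice";

// Hypothetical description of an interaction the designer wants to support.
interface InteractionSpec {
  name: string;
  timeSensitive: boolean;     // does the user need to notice or respond quickly?
  visualChannelBusy: boolean; // is the user's visual field already loaded?
}

interface ModalityPlan {
  output: OutputModality;
  input: InputModality;
}

// Time-sensitive events get audio output so they can't be missed while the eyes
// are busy, and time-sensitive responses allow voice so the hands stay free.
function planModalities(spec: InteractionSpec): ModalityPlan {
  const output: OutputModality =
    spec.timeSensitive || spec.visualChannelBusy ? "audio" : "visual";
  const input: InputModality = spec.timeSensitive ? "voice" : "gesture";
  return { output, input };
}

// Example: an expiring prompt is announced aloud and accepts a voice reply.
console.log(planModalities({ name: "offer-expiring", timeSensitive: true, visualChannelBusy: true }));
// -> { output: "audio", input: "voice" }
```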

3. Take a Minimalist Design Approach

Many times, less is more, and designing in AR is no exception.

Keep things simple and compact. If you have menus popping up and flying everywhere, your users are going to reach cognitive overload fast and potentially disengage from your system. Try to get away with as few menus and windows as you can. It’s possible there is a creative, more natural solution that can communicate the same information just as effectively.

For example, maybe you want to direct your user to stand in a specific spot. You could have a menu pop up directly in their center of vision and make them read instructions, OR you could animate a pulsing pair of footprints on the spot you want them to stand in and couple that with a spatialized audio cue. While the latter option seems flashier, it actually demands fewer mental resources than making the user stop and read instructions.
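
Here’s a minimal sketch of that footprints-plus-audio idea, assuming a hypothetical scene.addFootprintIndicator call for the visual anchor and using the Web Audio API’s PannerNode for the spatialized cue. Only the Web Audio portion reflects a real browser API; the scene interface is a stand-in for whatever AR toolkit you’re using.

```typescript
// Placeholder for whatever AR scene API you are actually using.
interface ARScene {
  addFootprintIndicator(x: number, y: number, z: number): void; // hypothetical
}

// Show the footprints and play a short audio cue positioned at the target spot,
// so the sound appears to come from where the user should stand.
async function cueStandingSpot(
  scene: ARScene,
  audioCtx: AudioContext,
  cueUrl: string,
  x: number, y: number, z: number
): Promise<void> {
  // Visual anchor: a pulsing pair of footprints on the target spot.
  scene.addFootprintIndicator(x, y, z);

  // Spatialized audio cue at the same location (Web Audio API).
  const response = await fetch(cueUrl);
  const buffer = await audioCtx.decodeAudioData(await response.arrayBuffer());

  const panner = new PannerNode(audioCtx, {
    panningModel: "HRTF",
    positionX: x,
    positionY: y,
    positionZ: z,
  });

  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(panner).connect(audioCtx.destination);
  source.start();
}
```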

Sometimes UIs are unavoidable. When this is the case, make sure you are using modern, minimalistic design aesthetics. Use clean, easy-to-read typefaces, limit your color palette, and eliminate all non-functional design elements.

4. Progressively Disclose

Why would we want to limit the options of what people can see and do if we don’t have to? With AR, the world is now our canvas; there are no limits to what could be built, designed, or dreamed.

If you’ve read this far, then you understand that human beings have a limited amount of processing power. The concept of progressive disclosure has been around since the 1980s, when computers were huge boxes with tiny green screens.

Nonetheless, progressive disclosure principles still hold in this futuristic world of spatial computing because people evolve much slower than computers. Only a limited percentage of the population has interacted with AR in a head-worn display.

That means almost everyone is a novice user when it comes to interacting with AR. Start users off with only the most basic information and features and use those as a jumping off point for discovery of more advanced features. This also leads to “Oh wow, I didn’t know I could do that!” moments that can delight your users.
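
One way to express this in code is to gate advanced features behind a simple proficiency signal, such as how many times the user has completed the basic flow. The TypeScript sketch below uses made-up feature names and thresholds purely to show the gating pattern.

```typescript
interface Feature {
  id: string;
  // Minimum number of completed basic interactions before this feature appears.
  unlockAfterUses: number;
}

// Made-up feature set: the basics are visible immediately, advanced tools appear
// only once the user has demonstrated some familiarity with the system.
const FEATURES: Feature[] = [
  { id: "place-object", unlockAfterUses: 0 },
  { id: "resize-object", unlockAfterUses: 3 },
  { id: "multi-select", unlockAfterUses: 10 },
  { id: "custom-gesture-macros", unlockAfterUses: 25 },
];

// Return only the features the user is ready to see right now.
function visibleFeatures(completedBasicUses: number): Feature[] {
  return FEATURES.filter(f => f.unlockAfterUses <= completedBasicUses);
}

// A brand-new user sees just the basics; a seasoned one discovers the rest.
console.log(visibleFeatures(0).map(f => f.id));  // ["place-object"]
console.log(visibleFeatures(12).map(f => f.id)); // ["place-object", "resize-object", "multi-select"]
```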

Conclusion

Augmented reality promises to advance users’ abilities by leaps and bounds, and the technology itself is improving at an exponential rate.

It’s crucial for those who design in this space to remember that 2D screens necessitated some harsh compromises. With the broader technological horizon of today, there are some design restrictions we may not have to live with anymore.

As AR is more widely adopted, the opportunity to design innovative new interaction capabilities will only grow. It’s vital that we take the opportunity to revisit some of the fundamental principles that shaped how we have thought about user experiences in the past and tailor the experiences of the future to the people that will use them.

References

Baddeley, A. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423. https://doi.org/10.1016/S1364-6613(00)01538-2

Hancock, P. A., & Szalma, J. L. (2003). Operator Stress and Display Design. Ergonomics in Design, 11(2), 13–18. https://doi.org/10.1177/106480460301100205

Matthews, G., Campbell, S., Falconer, S., Joyner, L., Huggins, J., Gilliland, K., Grier, R., & Warm, J. (2002). Fundamental Dimensions of Subjective State in Performance Settings: Task Engagement, Distress, and Worry. Emotion, 2(4), 315–340. https://doi.org/10.1037/1528-3542.2.4.315

Wickens, C. D. (2008). Multiple Resources and Mental Workload. Human Factors, 50(3), 449–455. https://doi.org/10.1518/001872008X288394
