How to increase user immersion with (mostly) audio

Andy Elmsley
Published in The Sound of AI
8 min read, Apr 25, 2019

A truly authentic, engaging virtual reality (VR) experience hinges on ‘immersion’ and ‘immersion multipliers’. These are highly valued by players and developers alike, but what do they really mean? What exactly is immersion and why is it seen as the holy grail for VR? Let’s explore what makes someone feel immersed, and tap into some quick strategies you can use to radically increase the immersion of your VR experiences — by only thinking about audio.

Handling sound correctly in VR can massively improve immersion (image: Shutterstock)

What is immersion?

Immersion is a state of mind. It’s what happens when you lose the sense of your physical self in an artificial simulation, experience, or fiction, so the simulation appears real and you momentarily forget about the ‘real’ world. The focus of your consciousness shifts from the ‘real’ world to the artificial one presented to you. It doesn’t only happen in VR; it can also happen when reading a book or watching a film, or even (for me) when performing a difficult cognitive task, like writing some code. Put another way, in a state of immersion your senses of presence and attention shift into the virtual world, and the more you believe the world you’re in, the more your brain convinces you that what you’re seeing is real.

When it comes down to it, it’s hard to define exactly what causes immersion. If you’re scientifically inclined like me, you’ll look to the existing body of research in the area. You’ll then find that there’s a substantial amount of research out there, but no real consensus on the specific components that lead to an immersive experience. And when even the academics can’t agree, you know you’re on shaky ground.

Staffan Björk and Jussi Holopainen’s four categories of immersion, as defined in their excellent book Patterns In Game Design, seem like an appropriate point of departure. The list isn’t exhaustive, but it’s still a practical way to think about immersion specifically in games and VR.

Björk and Holopainen’s categories are:

Sensory-motoric immersion

This occurs when you perform actions with your hands or limbs and get feedback through all of your senses — sight, sound and touch — and even smell and taste!

Spatial immersion

When the simulated world is perceptually convincing, you are said to feel spatially immersed, which directly affects your sense of presence. There are subtle nuances to this, including the scale and environmental cues of the world (such as sound).

Cognitive immersion

This occurs when you’re focused on a specific task that requires mental exercise — it’s perhaps the most common form of immersion, as most people experience it when learning something new.

Emotional immersion

This is another common immersive state, often induced when you watch a film or read an engrossing book. It occurs when players become emotionally invested in the experience.

Why is immersion so important for VR?

Many people think that the act of simply having content in VR guarantees a magical boost in ‘immersion’, but this is only true to a certain extent. While it’s definitely easier to achieve a state of immersion in VR, immersion itself is actually a fickle beast that’s difficult to maintain for a longer period of time.

During a VR experience, your senses are bombarded with an overwhelming variety of information, some of it conflicting. Usually, players will have most of their senses covered: their eyes enclosed by the headset, their ears in some kind of headphones, and their hands holding one or more controllers. Some companies add more bells and whistles to this setup, but that’s beyond the typical VR rig.

Even with this default sensory-motoric immersion, if something happens in the ‘real’ world, or the simulated experience conflicts with your senses, then the level of immersion can come crashing down — often with nasty side-effects such as disorientation, confusion and motion sickness. Not the virtual roller-coaster ride you signed up for.

As VR developers we can’t fully control the external rigs and environment our players use, so instead we rely on in-experience tricks to playfully coax the user into a deeper sense of immersion. The more immersed a user, the more engaged they’ll feel, and the less likely they are to experience cognitive dissonance between the real and simulated worlds.

Ultimately, though, if the experience is deeply immersive, your players will feel happier and keep returning to satisfy that need for ‘just a bit more’. This continued demand is essential to the widespread adoption of VR.

How to increase user immersion with (mostly) audio

As an audiophile, it seems sensible for me to focus on ways to increase the feeling of immersion with only audio techniques. I often feel that audio generally doesn’t receive the attention it deserves, probably because, like the best film soundtracks, if the audio is working well it shouldn’t be noticeable.

Well-implemented audio seamlessly blends into the experience and feels like the real thing. But when it goes wrong, it’s horrifyingly bizarre: something happens in your brain that most people can’t put their finger on; they just know it feels ‘wrong’. To avoid this uncomfortable disconnect, you can implement a few simple things.

Use spatial audio for diegetic sounds…

Diegetic sound is a term stemming from film. It basically means the sounds made by objects, characters and environments that exist within the world of the story. When you hear sound in the real world, your brain is doing a lot of complicated stuff to figure out the direction and distance of the sound. When this ability is taken away, it creates an extremely disorienting effect. Most people won’t be able to pinpoint what’s wrong; the world will just feel odd and fake. This problem boils down to a lack of spatial immersion. However, by using spatial audio techniques you can eliminate this for your players.

Consider film once again. Lack of spatial immersion was also a problem for film-goers, which was the reason for the invention of surround sound. Even with a huge screen in a dark room, hearing audio out of two fixed speakers was a sure-fire way to remove yourself from the experience. The inventors of surround sound placed more speakers around the audience, so that sound actually plays behind you if a character is out of shot. This problem exists in VR, except we can’t just add more speakers to the headset… or can we?!

This is one way to do VR surround sound, but it might be a bit heavy to pull off. (image: Mike Kim)

You don’t need a sci-fi-looking piece of technology from an advanced civilisation to solve this problem. Spatial audio is a tech that’s been around for quite some time in games, and is really taking on a life of its own in VR titles. With spatial audio, sound sources can be placed into the 3D world, and the engine figures out how to manipulate these sounds so they sound like they’re coming from that spot, even through a stereo mix. If you have ambient sound effects in-game, you should definitely spatialise them for their full effect.
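To make that a little more concrete, here is a minimal sketch of the simplest version of the idea: turning a source position into left and right gains for a stereo mix. It is an illustration only, not how any particular engine does it. Real spatialisers typically use head-related transfer functions (HRTFs), which also let you tell front from back, something plain panning cannot do. The function name, the 2D coordinate convention and the `ref_distance` parameter are all assumptions made for this example.

```python
import math


def spatialise(listener_pos, listener_yaw, source_pos, ref_distance=1.0):
    """Return (left_gain, right_gain) for a mono source in a stereo mix.

    Assumed convention: positions are (x, z) on the horizontal plane,
    yaw is in radians, 0 means facing +z, and positive yaw turns
    towards +x (the listener's right).
    """
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dz)

    # Angle of the source relative to where the listener is facing.
    # Positive means the source is on the listener's right.
    azimuth = math.atan2(dx, dz) - listener_yaw

    # Inverse-distance attenuation, clamped so that sources closer than
    # ref_distance are not boosted above full volume.
    gain = ref_distance / max(distance, ref_distance)

    # Equal-power pan: map sin(azimuth) in [-1, 1] onto a pan angle
    # in [0, pi/2], where 0 is hard left and pi/2 is hard right.
    pan = (math.sin(azimuth) + 1.0) * math.pi / 4.0
    return gain * math.cos(pan), gain * math.sin(pan)
```

A source straight ahead comes out equally in both ears, a source two units to the right comes out only in the right ear at half volume, and turning your head to the right moves a source that was in front of you over to your left ear.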

…and ambisonic audio for non-diegetic sounds

Spatial audio is great for diegetic sounds whose sources are visible in the scene, such as a babbling brook or a gunshot, but when it comes to non-diegetic sounds, like the game’s soundtrack, we run into another issue. It’s often undesirable to have the music coming out of fixed points in the 3D world; instead, the player should feel enveloped by the music. However, if the music mix is fixed (e.g., the guitar is always in front of you, no matter where you turn), then this can cause feelings of discomfort that lead to a break in sensory-motoric immersion.

To tackle this, we can use ambisonic audio. Here the sound sources sit at a set of points in 3D space around the listener, defined relative to the listener rather than at absolute positions in the world. The points move with the listener, but the mix still changes when the player turns around or moves their head. This simulates the experience of being in a real-life environment, almost as if you were standing in the center of an orchestra at a concert. Bravo!
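For the curious, the head-turning part can be sketched with first-order ambisonics. The example below encodes a mono sample into the traditional B-format channels (W, X, Y, Z), counter-rotates the sound field by the listener's head yaw, and decodes to stereo with two virtual cardioid microphones pointing left and right. Treat it as a sketch under stated assumptions: it uses the older convention where W is scaled by 1/sqrt(2), azimuth is measured counter-clockwise (positive to the left), and the function names are made up for this example. Many current tools use the AmbiX convention instead, which orders and normalises the channels differently.

```python
import math


def encode(sample, azimuth, elevation=0.0):
    """Encode a mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: X points forward, Y points left, Z points up,
    angles are in radians, and W carries a 1/sqrt(2) scaling.
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return (w, x, y, z)


def rotate_for_head_yaw(bformat, head_yaw):
    """Counter-rotate the sound field when the head turns by head_yaw.

    Positive head_yaw means the listener turns to the left, so every
    source appears to move to the right by the same angle.
    """
    w, x, y, z = bformat
    c, s = math.cos(head_yaw), math.sin(head_yaw)
    return (w, x * c + y * s, y * c - x * s, z)


def decode_stereo(bformat):
    """Decode to (left, right) using virtual cardioids at +/-90 degrees."""
    w, _x, y, _z = bformat
    mid = math.sqrt(2.0) * w  # undo the W scaling
    return 0.5 * (mid + y), 0.5 * (mid - y)
```

With a guitar encoded straight ahead, both ears get the same level while you face forward. Turn your head 90 degrees to the left and the guitar ends up entirely in your right ear, which is what you would expect from a real guitarist standing in front of you.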

Placing sound sources in ambisonic space (image: Amp Audio)

Ease the learning curve with UI audio

One of the most painful, immersion-destroying experiences in VR for players is learning a complicated control scheme. Most players will struggle with pressing more than one button at once, especially if it’s an unnatural gesture. This is an area where VR actually shoots itself in the foot a little bit. Since players can’t see their hands, it’s even harder to learn the new controls.

The control scheme obviously affects sensory-motoric immersion, but also has a strong effect on cognitive immersion. Somewhat counter-intuitively, hard-to-learn controls don’t lead to cognitive immersion, but actually hinder it. This is because, in order to feel cognitively immersed, you also need to feel in control of your own actions and therefore get less frustrated!

But don’t fret, the answer once again lies in the audio. Having simple audio cues that respond to the user’s actions is a form of positive reinforcement. It aids cognitive immersion, reducing the frustrations of not knowing if you’re even doing the right thing or pressing the right buttons.
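In code, this can be as simple as a lookup from player actions to short sounds. The sketch below is one hypothetical way to wire it up: every class, action and cue name is invented for illustration, and `play_cue` stands in for whatever your audio engine uses to trigger a one-shot sound. The optional fallback cue is an assumption of mine, so that an unmapped button press still tells the player that the input registered.

```python
from typing import Callable, Dict, Optional


class UIAudioFeedback:
    """Maps player actions to short audio cues (all names hypothetical)."""

    def __init__(self, play_cue: Callable[[str], None],
                 fallback_cue: Optional[str] = None):
        self._play_cue = play_cue          # stand-in for the engine's one-shot call
        self._fallback_cue = fallback_cue  # played for actions with no cue of their own
        self._cues: Dict[str, str] = {}

    def register(self, action: str, cue: str) -> None:
        """Associate an action name with the cue that confirms it."""
        self._cues[action] = cue

    def on_action(self, action: str) -> Optional[str]:
        """Play and return the cue for an action, or None if there is none."""
        cue = self._cues.get(action, self._fallback_cue)
        if cue is not None:
            self._play_cue(cue)
        return cue


# Usage: collect the cue names in a list instead of playing real audio.
played = []
feedback = UIAudioFeedback(play_cue=played.append, fallback_cue="soft_tick")
feedback.register("grab", "confirm_chime")
feedback.on_action("grab")            # a mapped action plays its own cue
feedback.on_action("unknown_button")  # anything else plays the fallback cue
```

The point of keeping it this small is that the cue fires immediately on the action itself, so the player gets confirmation before they have time to wonder whether the press did anything.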

An excellent example of well-executed UI audio in VR is Funomena’s Luna. A very simple control scheme, coupled with a few effective gong sounds providing audio feedback, eases the player into the controls, so they feel more engaged from the outset.

Luna: a fantastic example of UI audio in VR (image: Wired)

Use (deep) adaptive music for your soundtrack

We all know that music is amazingly powerful at conveying emotions, which you’ve no doubt noticed in the real world everywhere you turn. When was the last time you watched an advert on TV with no music, for instance? In VR, though, players have their own agency and behaviours: they can choose where to go and how to act. This means that if we want to increase their emotional immersion, the music needs to respond dynamically to what they do. This is called adaptive music, and is fundamental to interactive media.
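One common, fairly basic way to make music adaptive is vertical layering: the soundtrack is split into stems, and the game fades stems in and out as the action changes. The sketch below shows that idea only, and is not a description of any AI-based approach. The `intensity` value (0 for calm, 1 for frantic) is a hypothetical number that your game would have to derive from player behaviour, and the function names are made up for this example.

```python
def layer_gains(intensity, num_layers):
    """Return one gain per music stem for a given intensity in [0, 1].

    The first stem always plays; the remaining stems fade in one after
    another as intensity rises, so the arrangement thickens gradually.
    """
    intensity = min(max(intensity, 0.0), 1.0)
    gains = []
    for i in range(num_layers):
        # Stem i (for i >= 1) fades in while the scaled intensity
        # moves from i - 1 to i; stem 0 clamps to 1.0 at all times.
        raw = intensity * (num_layers - 1) - (i - 1)
        gains.append(min(max(raw, 0.0), 1.0))
    return gains


def smooth(current, target, rate, dt):
    """Move a gain towards its target by at most rate * dt.

    Called once per frame, this stops stems from jumping in volume
    when the intensity changes suddenly.
    """
    step = rate * dt
    if abs(target - current) <= step:
        return target
    return current + step if target > current else current - step
```

With three stems, an intensity of 0 plays only the base stem, 0.75 plays the first two at full volume with the third at half, and 1 plays everything. Each frame you would feed the target gains through `smooth` before handing them to the mixer.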

The work I do has focused on developing a new way to score interactive media by deepening the adaptive music experience with the help of AI. In our research we’ve found that adaptive music can boost immersion by 30%, most likely due to the increased emotional immersion.

Sound off

So, there you have some textbook suggestions for how to improve immersion in VR. Next time you hear the phrase “immersion multiplier”, try to ask yourself which one of the four categories it addresses.

Andy Elmsley
The Sound of AI

Founder & CTO @melodrivemusic. AI video game music platform. Tech leader, programmer, musician, generative artist and speaker.