Audio Engineering for VR

Team @ Immersive Cinema Berkeley
Published in Immersive Cinema
Jun 18, 2018

Sound is a crucial component of an immersive environment, and human perception of sound is a rich, interesting domain in its own right. Your ears sit on opposite sides of your head and pick up inter-aural time differences; your skin resonates with the percussion of sound waves; your brain works tirelessly to combine all these signals, not only to localize sounds but also to create a dynamic experience for you. Sound complements the visuals to make a VR environment more realistic and immersive, and it can even be the primary storytelling medium by itself.

Spatial audio refers to placing sounds where they belong in a 3D space. Here’s an interesting demo of spatial audio from the Binci project (listen with headphones). There are several types of audio to know. Stereo refers to two-channel audio (left and right), while surround sound refers to multi-channel audio that surrounds the audience. Surround sound can be generated from mono or stereo sound files. Binaural audio is similar to surround sound in that it produces a 360-degree sound effect, but it’s meant specifically for headphones. It’s also a fixed recording: the sound field does not change when the listener turns their head. Ambisonic audio, in contrast, is 360-degree audio that’s interactive and can change according to the user’s head movement.
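To make the "changes with head movement" idea concrete, here is a minimal sketch of how a first-order ambisonic sample could be rotated when the listener turns their head. It is only an illustration: the function names are hypothetical, and it assumes a convention where X points forward, Y points left, Z points up, and a positive yaw means turning the head to the left.

```python
import math

def rotate_yaw(w, x, y, z, head_yaw_rad):
    """Rotate one first-order ambisonic sample to compensate for head yaw.

    Assumed convention: X = front, Y = left, Z = up; positive yaw = head
    turns left. W (omnidirectional) and Z (height) are unaffected by yaw.
    """
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    # The sound field moves opposite to the head, so rotate by -yaw.
    return w, x * c + y * s, y * c - x * s, z

def apparent_azimuth_deg(x, y):
    """Direction the sound seems to come from (0 = front, positive = left)."""
    return math.degrees(math.atan2(y, x))

# A source straight ahead, then the listener turns 90 degrees to the left:
w, x, y, z = rotate_yaw(1.0, 1.0, 0.0, 0.0, math.pi / 2)
print(round(apparent_azimuth_deg(x, y)))  # the source now sits to the right
```

A binaural recording has no equivalent step, because the two ear signals are already baked in; that is why it stays fixed when the head turns.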

There are a variety of ways to record or create 3D sounds. One method is binaural recording, in which two microphones are placed apart from each other to imitate human ears and record at the same time. An example of this category is dummy head recording, for which people build a physical model of a human head, put microphones in its ears, and add acoustic materials in the ears and the mouth. Another mechanism is the head-related transfer function (HRTF). Given an input signal, it uses digital signal processing to produce a response that simulates a sound arriving from a certain point in space. Some recording devices, such as ambisonic microphones, can also record ambisonic audio directly. You can import any of these recordings directly into your video production.
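As a rough illustration of the HRTF idea, the sketch below filters a mono signal with a pair of toy impulse responses, one per ear. Real HRTFs are measured on actual heads and are far more detailed; here each ear's response is just a delay and a gain. The delay comes from the Woodworth spherical-head estimate of the inter-aural time difference, and the head radius, far-ear gain, and all function names are assumptions chosen for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air
HEAD_RADIUS = 0.0875    # m, assumed average head radius

def itd_seconds(azimuth_rad):
    """Woodworth estimate of the inter-aural time difference.

    azimuth: 0 = front, positive = toward the right ear, |azimuth| <= pi/2.
    """
    a = abs(azimuth_rad)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (a + math.sin(a))

def toy_hrir_pair(azimuth_rad, sample_rate=48000, far_ear_gain=0.6):
    """Return (left_ir, right_ir): delay-and-attenuate stand-ins for HRIRs."""
    delay = round(itd_seconds(azimuth_rad) * sample_rate)
    near = [1.0] + [0.0] * delay          # near ear: immediate, full level
    far = [0.0] * delay + [far_ear_gain]  # far ear: later and quieter
    if azimuth_rad >= 0:                  # source on the right
        return far, near
    return near, far

def convolve(signal, ir):
    """Plain time-domain convolution (fine for short examples)."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def render_binaural(mono, azimuth_rad, sample_rate=48000):
    """Filter a mono signal into (left, right) ear signals."""
    left_ir, right_ir = toy_hrir_pair(azimuth_rad, sample_rate)
    return convolve(mono, left_ir), convolve(mono, right_ir)

# A click placed fully to the right reaches the left ear later and quieter.
left, right = render_binaural([1.0, 0.0], math.pi / 2)
```

With a measured HRTF data set, the two toy responses would be swapped for the recorded impulse responses at the desired direction, and the convolution step would stay the same.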

Other times you might need to convert, or encode, mono or stereo audio to ambisonic audio. A few software tools are handy for that. Adobe Premiere Pro is a common tool for editing VR videos, and you can add ambisonic audio with it as well. In short, you can import mono audio as adaptive track media, position the clips at different orientations, monitor the sound field until you are satisfied, and align the audio with the video. Adobe has a helpful tutorial with detailed instructions. Ambisonic Toolkit offers a range of plugins that are useful for encoding mono or stereo audio into ambisonic audio. It can be used with Reaper or SuperCollider, which are platforms for sound editing and sound synthesis. Similarly, Facebook 360 Spatial Workstation is another platform built just for spatial audio design and creation (and it’s free!).

One thing to note is that, after getting the encoded ambisonic audio from these sound editing platforms, you will need to combine it with your video into a single file (referred to as muxing). In addition, it’s important to think about the platform you’d like to publish your video on, and to make sure that the spatial audio format you generate is compatible with that platform. For example, YouTube supports first-order Ambisonics (no head-locked stereo).
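Under the hood, encoding a mono sound into first-order ambisonics amounts to multiplying it by four direction-dependent gains. The sketch below shows that math under stated assumptions: it uses the AmbiX convention (channel order W, Y, Z, X with SN3D normalization, azimuth measured counter-clockwise so positive means left), and the function names are hypothetical. It is not how any of the tools above are implemented, only the basic panning formula.

```python
import math

def encode_foa(mono, azimuth_deg, elevation_deg=0.0):
    """Encode mono samples into first-order ambisonics.

    Assumed convention: AmbiX (ACN order W, Y, Z, X; SN3D normalization).
    azimuth: 0 = front, positive = left. elevation: positive = up.
    Returns a list of four channels, each a list of samples.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = (
        1.0,                          # W: omnidirectional
        math.sin(az) * math.cos(el),  # Y: left-right
        math.sin(el),                 # Z: up-down
        math.cos(az) * math.cos(el),  # X: front-back
    )
    return [[g * s for s in mono] for g in gains]

def mix(*fields):
    """Sum several encoded sound fields channel by channel."""
    return [[sum(samples) for samples in zip(*channel)]
            for channel in zip(*fields)]

# Place one sound to the left and another to the right, then combine them.
voice = encode_foa([1.0, 0.5], azimuth_deg=90)
music = encode_foa([0.2, 0.2], azimuth_deg=-90)
scene = mix(voice, music)  # four channels: W, Y, Z, X
```

Because every source ends up in the same four channels, any number of positioned sounds can be summed into one sound field before it is combined with the video.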

Finally, we’d like to mention that spatial audio is relatively new and still evolving, and there are some interesting research efforts and projects in this domain. For example, Google Omnitone is a project that looks into rendering spatial audio on the web using the Web Audio API. Google also has a newer project called Resonance Audio, which helps create spatial audio for VR, AR, and video experiences.

By: Weiwei Zhang