Working with Spatial Audio for Distance Education

June Kuhn
7 min read · Apr 28, 2020


Large black microphone with a loudspeaker in the background

The future is bright for educational technology. I am filled with excitement to share with you the opportunities that virtual reality has to offer for in-person classrooms and distance-learning environments. Especially at a time when no one can physically attend meetings, conferences, and trainings, virtual reality with spatial audio offers virtual experiences that break the boredom of watching online lectures and PowerPoint slides. Using actors, cameras, and microphones, small groups or individuals can create virtual reality productions that can be distributed to thousands or millions of people at a time.

This specific project explored virtual training for educators. Educators need the social abilities and the emotional intelligence to navigate highly stressful situations. An immersive video lends simulated stress to the viewer, making the scenario they're given feel real. Spatial audio, in particular, was essential in creating the best auditory simulation of the scene. If you're sitting at a meeting, you'd expect to hear an actor's voice coming from the direction of their mouth, right?

For many in the VR industry, ambisonic audio may be a secondary consideration. And because of that, finding the right workflow can be challenging. For this article on spatial audio workflow, I will assume that you have an understanding of the difference between stereoscopic and monoscopic VR video, know what ambisonic audio is, and have beginner-level audio production knowledge. If you are unfamiliar with any of these topics, I suggest that you look into them before reading further.

Project Overview

This video is made for Individualized Education Program (IEP) training, for a course at the NC State College of Education that helps educators meet the needs of exceptional children. IEP meetings happen between parents, teachers, and school administrators to assess the progress of students who have learning disabilities. These meetings are stressful and politically fraught, requiring the educator to know what to say and when.

Dr. Jamie Pearson, the instructor, teaches the course Teaching Exceptional Students in the Mainstream Classroom, the target deployment for our finished video. The goal of our virtual reality production is to give students a) the ability to facilitate an IEP meeting, b) the situational awareness to respond appropriately to members of the meeting, and c) the ability to write measurable goals for a middle-school-aged child.

The current immersive training solution is costly, has considerable latency, and requires group scheduling. Our stereoscopic, spatial audio video affords the instructor (through our grant) a free media resource with no latency, delivered on Oculus Go headsets. It's an individual experience that students of the course will be able to rewatch at any time.

This exploratory project was also a pilot for future immersive videos because much of the technology that we exercised had only existed in silos; this is the first time we were combining stereoscopy, 3rd-order ambisonics, custom Unity rendering, and experimental video formats all in one project.

Tech Overview

For this video we used some fantastic hardware and software tools that turned what was once laborious and time-consuming into a fairly streamlined process. The overall workflow looked something like this:

Basic Overview of our workflow

A lot of us had to collaborate directly in order for files, formats, and other technical aspects to line up correctly. When you're working with stereoscopic video and 16-channel audio, you need to make sure everybody knows how to work with the media file you give them. The first pass was to run some unedited video files through our proposed workflow. My audio work came toward the end, because in my digital audio workstation of choice, Reaper, I needed corrected video to be able to edit the audio.

For audio, the hardware was less complicated than you'd imagine. We miked each actor with a lavalier and recorded each mono channel separately. There was a Zoom H3VR on-site, but the recording it produced had interference from another device we had. I would have loved to have used an ambisonic microphone like the Zoom H3VR for this project for various spatial calibrations, but the audio we got from it was very noisy. Every take had this weird level of static that we assumed to be related to the wireless devices in the room.

I used Reaper for the audio production platform. With its flexibility, ease-of-use, video viewer, and built-in plugins, I had a really easy time setting everything up. To prep for this project, I made an ambisonic video of some cats eating salami to practice automating parameters and setting appropriate levels.

Where ambisonic plugins mostly fall short, Audio Ease provided exactly what I needed: an easy way to align spatial sound with spatial video. Using a video preview, I was able to map the position of a sound to exactly where I wanted it to be in the video.

A digital audio workstation screenshot
The individual recorded tracks are mixed to an AmbiX-format master track
screenshot of a video preview of a virtual reality video
With Audio Ease you can click and drag the tags to spatialize audio
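Under the hood, spatializing a mono source like this comes down to encoding it into spherical-harmonic channels weighted by its direction. As a minimal sketch (not Audio Ease's actual implementation), here is first-order AmbiX encoding (ACN channel order, SN3D normalization) with numpy:

```python
import numpy as np

def encode_foa_ambix(mono, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order AmbiX (ACN/SN3D).

    Channel order is W, Y, Z, X. Azimuth is measured
    counterclockwise from the front, so +90 degrees is hard left.
    """
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    w = mono                               # omnidirectional component
    y = mono * np.sin(az) * np.cos(el)     # left-right component
    z = mono * np.sin(el)                  # up-down component
    x = mono * np.cos(az) * np.cos(el)     # front-back component
    return np.stack([w, y, z, x])

# A source straight ahead lands entirely in W and X:
front = encode_foa_ambix(np.ones(4), azimuth_deg=0, elevation_deg=0)
```

Higher orders (the 3rd-order, 16-channel material used here) follow the same idea with more spherical-harmonic terms; a panner like Audio Ease's just automates these gains as you drag the tag around the video.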

The last piece in the ambisonics puzzle is being able to preview the 16-channel soundfield on headphones, which only carry two channels. Using the Audio Ease decoding plugin, I could listen back to anything I'd created and hear the spatialized sound pretty well. It's not the best solution for hearing the entire soundfield at once, but it fit the purposes of this project.
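A proper binaural decode like Audio Ease's uses HRTFs, but the simplest illustration of folding a soundfield down to two channels is a pair of virtual first-order microphones. This sketch (my own simplification, not the plugin's method) points two virtual cardioids left and right at the first-order channels:

```python
import numpy as np

def foa_to_stereo(ambix):
    """Crude stereo fold-down of first-order AmbiX (W, Y, Z, X).

    Each output is a virtual cardioid: 0.5 * (omni + figure-eight),
    aimed at +90 degrees (left) and -90 degrees (right).
    This is NOT binaural decoding -- no HRTFs are involved.
    """
    w, y, z, x = ambix
    left = 0.5 * (w + y)
    right = 0.5 * (w - y)
    return np.stack([left, right])

# A source panned hard left (W = s, Y = s) comes out of
# the left channel only.
s = np.ones(8)
stereo = foa_to_stereo(np.stack([s, s, 0 * s, 0 * s]))
```

Note that this fold-down discards height (Z) entirely, which is part of why a dedicated decoder sounds so much better for monitoring.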

The final piece of software I used was FB360 Encoder, which converted my 3rd-order, 16-channel files to 2nd-order, 8-channel Facebook files and muxed them with the stereoscopic video all in one render. It's a powerful standalone, free tool that hasn't failed me yet. I personally don't like using the FB360 audio plugins for anything except the conversion feature. The end result? Stereoscopic video with 8-channel spatial audio in a .mkv container. Amazing.
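The channel counts here follow directly from the ambisonic order: an order-n soundfield has (n+1)² channels, and because ACN ordering packs lower orders first, plain order reduction is just keeping the leading channels. (Facebook's 8-channel format is its own hybrid layout, so FB360 Encoder does more than this; the sketch below only shows the standard truncation idea.)

```python
import numpy as np

def ambisonic_channels(order):
    """Number of channels in a full-sphere ambisonic stream
    of the given order: (order + 1) squared."""
    return (order + 1) ** 2

def truncate_order(ambix, target_order):
    """Reduce an ACN-ordered ambisonic stream to a lower order.

    ACN packs order 0 first, then order 1, and so on, so
    truncation is simply keeping the first (n+1)^2 channels.
    """
    return ambix[: ambisonic_channels(target_order)]

# 3rd order is 16 channels; dropping to 2nd order keeps 9 of them.
third_order = np.zeros((16, 48000))
second_order = truncate_order(third_order, 2)
```

Truncating this way throws away the highest-order detail (slightly blurring source directions) but keeps the soundfield otherwise intact.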

I used the FB360 Encoder to mux the final video, instead of Media Encoder.

Challenges and Considerations

The biggest challenge I had with this project was deciding whether traditional audio tools were up to the job of spatial work. It was a lot of guesswork, since I don't have a PhD-level understanding of this stuff. A lot of experts seemed to say that regular compression and reverb mess with the spatial resolution of any multichannel audio.

Since completing the project, however, I've installed the IEM Plugin Suite, and that just happens to solve all of that. Compression, limiting, denoising, and so many other variables are now, well, not variables anymore. I'm now more confident that I'm following the correct procedure. Below is a screen capture of one of their beautiful (and free) plugins.

Another thing I want to do differently in the future is to accurately recreate the reverb of a space using convolution. Convolution is a mathematical operation that "multiplies" one signal by another to combine their qualities. Next time, I will record an impulse response (captured, informally, by hitting a clapper) in the room that we use, then apply convolution to recreate that room's reverb. Using the Audio Ease plugins, I could only guess what the reverb was like, and I don't have the ears just yet to do this effectively. With ambisonic reverb, you get additional spatialization that would not be afforded by traditional reverb. If you're curious about this process, Audio Ease made a nice video on how they made their reverb impulses.
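The operation itself is simple: convolve the dry recording with the measured impulse response, then blend the result back in. A minimal sketch with numpy (the wet/dry mix parameter here is my own illustrative choice):

```python
import numpy as np

def apply_convolution_reverb(dry, impulse_response, wet=0.3):
    """Convolve a dry signal with a room impulse response
    and mix the reverberant result back with the dry signal.

    `wet` is the proportion of convolved signal in the output.
    """
    wet_sig = np.convolve(dry, impulse_response)
    out = np.zeros(len(wet_sig))
    out[: len(dry)] += (1.0 - wet) * dry   # dry path
    out += wet * wet_sig                   # reverberant path
    return out

# With a unit-impulse "room" the output equals the input,
# since convolving with an impulse changes nothing.
dry = np.array([1.0, 0.5, 0.25, 0.0])
out = apply_convolution_reverb(dry, np.array([1.0]))
```

For ambisonic reverb you'd run one convolution per soundfield channel (or use a multichannel impulse response captured with an ambisonic mic), which is where the extra spatialization comes from.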

Audio Ease makes great demos

Unfortunately, none of the takes from the Zoom H3VR turned out, so I’ll have to wait until next time to try out my own ambisonic convolution reverb in a professional production setting.

Moving Forward

The project was made possible through the DELTA grant program at NC State University. For more information, read here.

I hope this article helps you understand how the spatial audio workflow might go, and potentially motivates you to try it on your own! YouTube fully supports First-Order Ambisonics and Head-Locked Stereo, so you can share your creation very easily on the internet!

If you have any questions or inquiries, feel free to send me an email at jtkuhn3@ncsu.edu. Or if you're interested in seeing more examples of 360 video with spatial audio, check out my work or personal YouTube channels. Or follow me on Twitter @justinkuhnmedia


June Kuhn

Creative Developer based in London, England. Builder of digital instruments and audiovisual experiences.