The main lesson we learned about recording spatial audio is probably old — but still valuable
At TrackRecord, a new English-language music site from Univision, we recently released our first 360 video with spatial sound. “Princess Nokia: ABC & 123's” was recorded in July, and it took us months to figure out how to edit and share the video with spatial audio (we released a stereo version in September).
With spatial audio, users exploring a 360 video, either by swiping the screen or by moving their head in a VR headset, experience the sound based on where they are looking.
From a production standpoint, this means that we need to:
- Capture at least four channels of audio (front, back, left, right).
- Position these audio directions in relation to the video (using the Facebook 360 Spatial Workstation plugin for Pro Tools or Reaper).
- Create a file that includes both the 360 video and the spatial audio, so that the “first-order ambisonic” audio is “baked” into the video. How we “bake” a video depends on the platform.
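The last step above is essentially a muxing job that FFmpeg can handle. As a rough sketch (the filenames and codec choices here are assumptions for illustration, not our exact settings), the command pairs the stitched 360 video with the four-channel ambisonic mix; note that most platforms still require a separate spatial-metadata injection step afterward, for example with Google's Spatial Media Metadata Injector:

```python
def build_mux_command(video, ambisonic_wav, output):
    """Build an ffmpeg command that muxes a stitched 360 video with a
    4-channel first-order ambisonic WAV exported from the DAW.
    This only combines the streams; spatial-audio metadata still has
    to be injected separately before uploading to most platforms."""
    return [
        "ffmpeg",
        "-i", video,          # stitched equirectangular 360 video
        "-i", ambisonic_wav,  # 4-channel B-format mix
        "-c:v", "copy",       # keep the video stream untouched
        "-c:a", "aac",        # re-encode audio to AAC for delivery
        "-map", "0:v:0",      # take video from the first input
        "-map", "1:a:0",      # take audio from the second input
        output,
    ]

cmd = build_mux_command("stitched_360.mp4", "ambisonic_mix.wav", "baked.mp4")
```

Running this requires FFmpeg to be installed; building the command as a list like this also makes it easy to hand off to `subprocess.run`.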
This post is not about the technicalities of this process, because even in the last six months the process has become much easier. Also, you will find plenty of specific instructions and updated how-tos out there. The Facebook 360 Spatial Workstation community really helped me with some aspects of this new audio world we filmmakers are entering.
It was meant to be a “quick” test
Last summer, Princess Nokia (Destiny Frasqueri) was finishing her mixtape “1992.” In her new work, the emerging hip-hop artist looks back at her childhood, so she suggested that we film at her favorite New York City playground.
For this video, we used a six-camera GoPro rig, a 3Dio Omni mic, a lav mic and a Sony camera. The footage you see from the eateries was filmed with a Samsung Gear 360. We were a crew of three: Santiago Garcia was behind a “flat” camera, Brad Schirmer was in charge of the sound, and I was the producer and VR cameraperson.
As if filming with spatial audio wasn’t enough of a challenge, we decided to use our “tri-tripod,” a tripod we built so Santiago and I could put our cameras together and get simultaneous footage. Still, our idea was simple: Capture Princess Nokia with our cameras and a spatial mic all positioned in the same spot.
Our main lesson: Know where your “center” or “main field of view” is
When we arrived at Thomas Jefferson Park, the playground Princess Nokia had chosen, we placed the tri-tripod in the middle of the basketball court. Then we marked the ground to show the artist the area where she would be totally in front of the Sony camera and to ensure she kept a certain distance from the GoPro rig, so we could prevent parallax errors.
When Brad saw that the main action would happen near the mark on the ground, he positioned mic 01 (channel 01) to face that way. In his mind, the “center” of the action was where we had put the mark.
For me, the “center” of the action was something I determined in postproduction, using the Mettle SkyBox plugin to move and place the panorama.
Do you remember Nonny de la Peña’s advice for virtual reality: “Feel your body in the space”? Our lesson also sounds like it emerged from a yoga class: “Feel your body in the space AND know where your center is.” We learned this essential principle the hard way, in postproduction.
During the shoot, Princess Nokia freestyled and talked to the camera while we hid behind the trees. After I reviewed the footage, I decided the best opening would be a segment when she was dancing on one side of the playground. This decision about the video inadvertently shifted the center of the audio.
Our mistake seems really obvious now, but we didn’t realize it right away. That's not to say we weren’t facing other headaches. For many days I edited a multilayered video with 10 audio tracks: eight channels + lav + music. I studied the technical specifications for uploading spatial content so I knew exactly what I needed from Brad. Meanwhile, Brad was figuring out how to use the Facebook 360 Spatial Workstation audio plugin.
I baked the video and audio over and over again, using a tool called iFFmpeg. And then one day it worked. We had uploaded spatial audio for the first time!
But the audio sounded off.
At first we thought the right channel was playing on the left. We couldn’t find the glitch, so we reviewed the whole process. Was the problem in Premiere? In Pro Tools? In the baking? In the platform? Finally, we realized that the disturbance we were hearing was the 90-degree difference between Brad’s original center and my new center in the opening of the video.
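A mismatch like this can be corrected with a yaw rotation of the first-order ambisonic (B-format) signal, whether the tool exposes it as a "rotate" knob or you do it by hand. Here is a minimal sketch of that math for a single sample, assuming a B-format layout where W is the omni component, X points front/back, Y left/right, and Z up/down; the sign convention for positive yaw varies between tools, so treat it as an assumption to verify against your plugin:

```python
import math

def rotate_foa_yaw(w, x, y, z, degrees):
    """Rotate one first-order ambisonic (B-format) sample about the
    vertical axis. The omni (W) and height (Z) components are
    unaffected by yaw; the horizontal components (X: front/back,
    Y: left/right) mix according to the rotation angle."""
    theta = math.radians(degrees)
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return w, xr, yr, z

# A 90-degree yaw moves energy that was entirely front/back (X) into
# left/right (Y), which is exactly the kind of shift we were hearing.
```

In practice you would apply this per sample across the whole mix (or simply dial the offset into the spatializer plugin), but the takeaway is the same: the audio "center" is just an angle, and it has to match the video's.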
Why this lesson is valuable
I’m writing this post just after attending our first Immersive Journalism Meetup in New York City — an event Matt MacVey and I had long talked about organizing so New York journalists working in 360 video and VR could get together and share our experiences.
At this first meetup, we had two discussions: one about narratives and the other about spatial sound. When I was sharing my experience with spatial audio and my lesson about knowing where your center is, Matt surprised us all by commenting that there is a new plugin for spatial sound in Premiere! Thanks, Matt!
I haven’t yet had time to try this new plugin, but (please correct me if I’m wrong) it seems to solve the problem I’ve just described. That’s the beauty of VR in 2016!
In any case, since we know that the better we do in production, the less we need to correct in postproduction, “Know where your center is” could prove a durable lesson.
Thanks to Journalism 360 and the VR journalism community for pushing me to write this post!