How to Create a 3D Music Experience in VR
It’s early 2016, and clients new to the VR space are asking us “what is spatial audio?”, “how does it work?” and “why do we need to pay you to do this?”.
There was a lot of confusion in the industry, and so we set about creating a VR sound experience, where location recordings, sound design, score and ambisonic room modelling would all play a part.
The trick was to work out how we could make it with no budget, limited time and no experience in visual VR capture or post-production. To make the idea work, I needed a number of objects that could make sound and move all around the viewer, and ideally an environment that contributed to the soundscape in some way.
I figured the easiest (cheapest) way to do this would be to shoot an object in multiple locations and hard cut between them, to give a sense of contrast and movement. If the spaces were sonically different, we could make that part of the experience, too. In this way, we could explore how the sound of a place adds to our sense of presence and immersion.
Moving on to our source sound: it needed to be loud enough to trigger the reverberation of a physical space (a clap in a church, for example), but also flexible enough to produce long and short sounds at different frequencies (pitches), so we would have lots of options to play with. After working through several ideas, I landed on using a beatboxer, since a good beatboxer could produce all of these sounds, tie them together into a coherent composition and move freely around the 360 space if needed.
At this stage, I felt I had enough to form a basic concept: we could record a beatboxer in multiple locations, capturing the sonic characteristics of each space while having him or her appear in different areas of the 360 space.
I knew it could work visually, but how would it sound? I threw together a very quick test using 3D sound mixing software and here is the result (use headphones for best results):
Although the beatbox performance I used (ripped from YouTube) was fairly limited in terms of dynamics and contrast, I felt this was a good enough starting point.
The next question was ‘who can I find to make this happen?’ and luckily that turned out to be the easy part. On a VR job in Turkey, I had the pleasure of working with the SFX team at The Mill London and told them about the idea. I didn’t realise it at the time, but my business partner was also talking with The Mill in LA and they loved the idea. They came on board as collaborators with just one caveat: it had to be produced and shot within 3 weeks!
I immediately called up an old friend and incredible talent from London, two-time UK beatbox champion Reeps One, and was thrilled when he said he was up for it. Game on!
This was a fascinating opportunity to scout locations that sounded as good as they looked, so we drew up a shortlist of interior and exterior locations:
We quickly agreed that to get a good range of contrasting locations, the shoot should take place in downtown LA and in the desert, between Joshua Tree, the Salton Sea and Palm Springs. Here are some 360 images we took on our scout:
In addition to these locations, we had access to the YouTube studios in LA to film the intro and closet scenes. We were set.
One Crazy Week
I won’t go into all the details of the production, but suffice it to say we were up against the wire, trying to hit 11 setups in 3 days across 6 areas of Southern California. We also had a hard out on the final day, as Reeps needed to leave for his flight home 30 mins after the final shot. Here’s how the schedule looked:
The approach to composing this piece of music was unlike anything Reeps or I had tried before, since every sound would need to be seen and ‘performed’ in 360. Not only that, but we were limited to effects that could be reproduced, or indeed generated, by the physical environment. These included:
Reverb & Echo: Generated by the size of the physical environment
Delay & Harmony: Generated by adding more visible copies of Reeps
Panning: Generated by Reeps moving around the camera (or by stop-motion movement)
By using just these ‘effects’, we needed to think creatively about how to incorporate them. For example, a simple percussive sound made in a huge reverberant room (a warehouse) could create 2–5 seconds of atmospheric texture. You can hear this technique in the final piece at 00:45.
After hearing some early tests from Reeps, it was clear that the piece was going to be more structured than we’d previously thought, with melody, breakdowns and a build at the end. Because it was now becoming a ‘song’ (albeit an experimental one), we decided it would be best to record it in the studio to make sure it had the punch and presence needed.
We recorded all of the vocals on a Shure SM7B, a great sounding dynamic mic that captured all the detail and dynamics of the beatbox performance. We recorded through a Universal Audio Apollo into Logic Pro X.
After the main layers were recorded, we looked at the arrangement, working out how long each section should run while keeping in our mind’s eye which physical spaces we were moving between. By the end of a very long day, we had created a 2-minute piece that incorporated 11 different locations, multiple audio effects that would translate visually, a few breakdowns for dynamic effect and a good build at the end. The only section we didn’t record was the intro, for a very good reason:
Many of Reeps’ YouTube videos show him performing in a warehouse or studio environment and we thought it would be a nice idea to create a similar setting for the opening shot. This way, his fans might think it was a regular 2D performance video, until it suddenly cut to a new environment, where they would hopefully discover that it was a 360 video.
Blocking and Shooting Technique
The following day, co-director Gawain Liddiard from The Mill came to hear the track and work out the blocking. This was a sonic and visual exercise in how to direct the viewer through the experience, and it also had technical implications. As we had decided to shoot this 360 video nodally, rotating a single Sony A7Sii with a wide-angle lens around its nodal point, we had to make sure that wherever Reeps was performing, he wouldn’t cross a stitch line (the seam where neighbouring shots are joined).
Sound on Location
As this was such a technical shoot, we had to be very diligent that Reeps was performing the correct layer of the song in the correct location. This became more demanding as the piece progressed, as in some places we had up to 11 duplicates of Reeps in the same environment.
For every location (except the intro shot), we had playback of the track for Reeps to perform to. One thing in our favour was that because we were shooting on a single lens camera, we could all be behind the camera to provide playback and feedback on each take, making sure Reeps hit all the cues and that we were hitting our sync points.
After we were done with the performances, I recorded a few minutes of ambisonic ambience at each location to layer into the video. Here’s a recording of the palm plantation, captured with a Core Sound Tetramic:
Impulse Response Recording
To capture the exact reverberation characteristics of each location, I recorded a series of impulse responses.
This is a technique whereby you play a sweeping sine wave tone (a ‘sine sweep’) from a speaker and record how the acoustic space responds. You then process that recording by convolving it with an inverse filter derived from the sweep, which removes the sweep tone and leaves you with the room’s Impulse Response, or ‘IR’: the sound of the reverberation in the space.
By loading the IR into a convolution reverb plugin, you can then send the dry studio recording through it, and it will sound as if it were being performed in that space.
Here’s the sine sweep as it was captured in the warehouse:
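To make the sweep-and-deconvolve idea concrete, here is a minimal, dependency-free Python sketch. It assumes an exponential (logarithmic) sine sweep, the variant popularised by Angelo Farina, and simulates the ‘room’ as a direct sound plus a single echo. The sample rate, sweep range, echo and all function names are made-up values for illustration, not the settings used on this shoot.

```python
import math

def exp_sine_sweep(f1, f2, duration, fs):
    """Exponential sine sweep rising from f1 to f2 Hz over `duration` seconds."""
    n = int(round(duration * fs))
    rate = math.log(f2 / f1)
    k = 2.0 * math.pi * f1 * duration / rate
    return [math.sin(k * (math.exp(rate * i / (fs * duration)) - 1.0)) for i in range(n)]

def inverse_filter(sweep, f1, f2):
    """Time-reversed sweep with a decaying envelope.

    The sweep lingers longer at low frequencies, so the envelope turns those
    down; convolving the sweep with this filter then gives a near-flat,
    band-limited impulse."""
    n = len(sweep)
    rate = math.log(f2 / f1)
    return [sweep[n - 1 - i] * math.exp(-rate * i / n) for i in range(n)]

def convolve(a, b):
    """Direct-form linear convolution (slow, but needs no libraries)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        if x == 0.0:
            continue  # skip silent samples to save time
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

fs = 8000
f1, f2 = 100.0, 3500.0
sweep = exp_sine_sweep(f1, f2, 0.1, fs)   # the tone played through the speaker
inverse = inverse_filter(sweep, f1, f2)

# Hypothetical room: direct sound plus one echo at half amplitude, 200 samples (25 ms) later.
room = [0.0] * 201
room[0], room[200] = 1.0, 0.5

recording = convolve(room, sweep)         # what the microphone would capture
response = convolve(recording, inverse)   # sweep removed; direct sound peaks at len(sweep) - 1
ir = response[len(sweep) - 1:]            # drop the leading part to keep the IR itself
```

In practice you would use a full-band sweep several seconds long, a higher sample rate and FFT-based convolution; the slow direct-form loop here just keeps the example self-contained.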
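Conceptually, a convolution reverb convolves the dry signal with the IR and blends the result with the original. The sketch below shows that in plain Python. `toy_ir` is a hypothetical stand-in that generates exponentially decaying noise in place of a measured IR, and `wet` is an assumed simple dry/wet mix, not any particular plugin’s control.

```python
import math
import random

def convolve(dry, ir):
    """Direct-form linear convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(dry) + len(ir) - 1)
    for i, x in enumerate(dry):
        if x == 0.0:
            continue
        for j, h in enumerate(ir):
            out[i + j] += x * h
    return out

def toy_ir(seconds, fs, rt60, seed=0):
    """Hypothetical stand-in for a measured IR: white noise whose level
    falls by 60 dB (a factor of 1000 in amplitude) after rt60 seconds."""
    rng = random.Random(seed)
    decay = math.log(1000.0) / (rt60 * fs)
    return [rng.uniform(-1.0, 1.0) * math.exp(-decay * i)
            for i in range(int(round(seconds * fs)))]

def convolution_reverb(dry, ir, wet=0.3):
    """Blend the dry signal with its convolution by the IR (0 = dry only, 1 = wet only)."""
    wet_signal = convolve(dry, ir)
    dry_padded = dry + [0.0] * (len(wet_signal) - len(dry))
    return [(1.0 - wet) * d + wet * w for d, w in zip(dry_padded, wet_signal)]

fs = 8000
ir = toy_ir(0.5, fs, rt60=2.0)        # 0.5 s slice of a 2 s reverb tail
click = [1.0] + [0.0] * 99            # a single percussive hit
out = convolution_reverb(click, ir, wet=1.0)
```

Feeding in a single click returns the IR itself, which is a handy sanity check. Commercial plugins use FFT-based (often partitioned) convolution so that IRs several seconds long can run in real time; the result is the same, only much faster.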
And here’s the impulse response applied to the vocal (before/after):
As mentioned earlier, we wanted the intro to be shot in front of a white cyc in a studio environment. We decided to record this live, as it was essentially a freestyle and didn’t require any close-miking techniques. For this setup I used a single shotgun mic on a boom from above, which was spatialised in post. I decided against also using the ambisonic mic, as combining a shotgun and an ambisonic mic placed in different locations can cause phase issues.
The post process for this video was closely intertwined with the editorial team at The Mill. The timing of their initial edit was synced to my first mix; then, once VFX were being applied to the picture, I responded sonically to what I was seeing on screen. This process of refinement continued for 3 weeks until we had a final version ready to premiere at Cannes Lions.
After receiving the first edit and layering in the studio recording, I began work on the sound design. Some of these elements were spot effects, like the rattling cup and saucer in the shack at 00:36; others were more general, like the background siren noise in the same scene. It was important for these sound design elements to enhance and support the overall journey of the video without distracting from the main performance.
Although all tracking was done in Logic Pro, the final mixdown was in Reaper, using Blue Ripple’s 3rd order ambisonic plugins. These tools gave us all the control we needed and sounded very transparent, which is essential for this kind of work. Mixing in 3rd order also meant we could hear the mix in high resolution, even though it would ultimately be delivered to YouTube in 1st order. By retaining the 3rd order information, we now have a great-sounding mix that can be used when the playback technology catches up (more on that below).
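‘Spatialised in post’ here means encoding the mono shotgun signal into the ambisonic mix at a chosen direction. Below is a minimal sketch of first-order encoding, assuming the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalisation) that YouTube expects for 1st order spatial audio. The function name and angles are illustrative; ambisonic panner plugins do the same job, at higher orders, inside the mixing session.

```python
import math

def encode_foa(mono, azimuth_deg, elevation_deg=0.0):
    """Encode a mono signal as first-order ambisonics (AmbiX: ACN order, SN3D).

    Azimuth is measured anticlockwise from straight ahead (+90 = hard left);
    elevation is positive upwards. Returns [W, Y, Z, X] as lists of samples."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = (
        1.0,                          # W (ACN 0): omnidirectional component
        math.sin(az) * math.cos(el),  # Y (ACN 1): left/right
        math.sin(el),                 # Z (ACN 2): up/down
        math.cos(az) * math.cos(el),  # X (ACN 3): front/back
    )
    return [[g * s for s in mono] for g in gains]
```

For example, `encode_foa(shotgun_samples, -30.0)` (with a hypothetical `shotgun_samples` list) would place the performer 30° to the right of straight ahead.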
As you can see below, the session was divided up scene by scene, with reverb effects at the top. We ended up running 140 tracks with 16 channels on each track, plus additional effects and video. My Mac was continually on the verge of meltdown!
Nevertheless, having the scenes broken out in this way meant that I could control not only the spatialisation but also the reverb sends, so the reverb for each location could be cut in and out precisely to the edit.
The main balancing act during mixdown was between maintaining the power and energy of the song and making it sound like Reeps was really in the environment. For example, when he’s far away from the camera, he should get significantly quieter, with less high-frequency content and more reverb. The problem is that we inevitably lose some of the power of the music.
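The 16 channels per track follow from the ambisonic order: a full-sphere signal of order N carries (N + 1)² spherical-harmonic components. A quick sketch of the arithmetic, assuming every one of the 140 tracks carried a full 3rd order signal:

```python
def ambisonic_channels(order):
    """Spherical-harmonic channels in a full-sphere ambisonic signal of this order."""
    return (order + 1) ** 2

for order in (1, 2, 3):
    print(f"order {order}: {ambisonic_channels(order)} channels")
# order 1: 4 channels
# order 2: 9 channels
# order 3: 16 channels

# 140 tracks, assuming each carries a full 3rd order signal:
print(140 * ambisonic_channels(3))  # 2240
```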
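One way to picture those three distance cues is as a mapping from distance to mix parameters. The sketch below is an illustration built on assumptions, not the settings used in this mix: gain follows the inverse-distance law (about 6 dB quieter per doubling of distance), a one-pole low-pass cutoff falls along an arbitrary made-up curve, and the reverb send stays fixed, so the direct-to-reverberant ratio drops as the source moves away. The `min_gain` floor is a hypothetical knob reflecting the trade-off described next: keeping the performer from getting too quiet.

```python
import math

def distance_cues(distance_m, ref_m=1.0, reverb_send=0.25, min_gain=0.0):
    """Hypothetical mapping from source distance to (gain, low-pass cutoff, reverb send)."""
    d = max(distance_m, ref_m)
    gain = max(ref_m / d, min_gain)                  # inverse-distance law, floored at min_gain
    cutoff_hz = 18000.0 / (1.0 + 0.1 * (d - ref_m))  # made-up curve: darker with distance
    return gain, cutoff_hz, reverb_send              # fixed send: reverb dominates far away

def one_pole_lowpass(signal, cutoff_hz, fs):
    """One-pole low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for x in signal:
        y += a * (x - y)
        out.append(y)
    return out

for d in (1.0, 2.0, 4.0, 8.0):
    gain, cutoff, send = distance_cues(d)
    print(f"{d:>4} m  gain {20 * math.log10(gain):6.1f} dB  cutoff {cutoff:7.0f} Hz  "
          f"direct/reverb {20 * math.log10(gain / send):5.1f} dB")
```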
In the end I decided that most viewers would enjoy listening to this as a music experience and less of a technical exercise, so I maintained a strong level from Reeps throughout and let the sound design really do the work, making us feel immersed.
One psychoacoustic effect I used during mixdown was to create an artificial ‘pumping’ of the background sound in the warehouse scene. This means that in the spaces between Reeps’ beats, I raised the levels of the background sound, replicating the sound of a compressor being pushed hard. This is a technique used a lot in EDM and Pop music and gives the listener the impression of the sound being louder than it is.
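The effect resembles sidechain ‘pumping’: the background is pulled down while the vocal hits and swells back up in the gaps. Below is a minimal sketch of one way to automate it, assuming an envelope follower on the vocal drives the background gain. The attack, release and `depth` values are illustrative guesses, and this is one possible approach rather than a record of how this mix was built.

```python
import math

def envelope_follower(signal, fs, attack_ms=5.0, release_ms=150.0):
    """Track the level of a signal, rising quickly (attack) and falling slowly (release)."""
    att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env, out = 0.0, []
    for x in signal:
        level = abs(x)
        coeff = att if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

def pump_background(background, vocal, fs, depth=0.7):
    """Pull the background down while the vocal is loud so it swells back in the gaps.

    depth is the maximum gain reduction: 0.7 means the background drops to 30%
    of its level at the loudest point of the vocal."""
    env = envelope_follower(vocal, fs)
    peak = max(env) or 1.0  # avoid dividing by zero for a silent vocal
    return [b * (1.0 - depth * e / peak) for b, e in zip(background, env)]

fs = 1000
vocal = [1.0] * 50 + [0.0] * 450   # one 50 ms hit followed by silence
background = [1.0] * 500           # steady room tone
pumped = pump_background(background, vocal, fs)
```

The slow release is what makes the background audibly ‘breathe’ back up after each hit; a shorter release would make the effect tighter and less obvious.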
Mastering / Delivery
Mastering audio for VR is a tricky operation. On one hand, it’s important to keep the levels strong, as playback on mobile devices and sub-par headphones is often weak. On the other, if you add too much limiting to the signal, it will compromise the spatial information and throw the sound cues off balance. I decided to use a light 3rd order limiter and then converted that 3rd order master to 1st order for delivery to YouTube.
Facebook recently released an important upgrade to spatial audio playback: it has gone from 1st to 2nd order playback, which is a huge improvement when it comes to spatial reproduction. Plus, we now have the ability to add a fixed (head-locked) stereo source alongside the 3D spatial objects. In this case, all of the synths added towards the end of the video are in stereo, while everything else rotates in 3D. This is a big leap forward for music reproduction in 3D, and I’m keen to see it go further, into 3rd order and beyond. I’m sure we’ll be seeing more interactive audio integrated into mobile and desktop experiences soon, but that’s for another article.
I hope you enjoy the final video. Make sure you experience it in a headset (GearVR or similar) with good headphones to really get the most out of it.
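Two details of that mastering chain can be sketched. First, a limiter on an ambisonic master should apply the same gain to every channel (a ‘linked’ limiter); limiting channels independently would change their relative levels and therefore skew the apparent directions. Second, assuming the master is in AmbiX format (ACN order, SN3D normalisation), going from 3rd to 1st order can be as simple as keeping the first four of the sixteen channels. The code is a simplified illustration of both ideas with hypothetical function names, not the specific limiter or conversion used for this video.

```python
import math

def linked_limiter(channels, fs, ceiling=0.9, release_ms=50.0):
    """Very simple peak limiter whose gain is shared by every ambisonic channel.

    channels: list of equal-length sample lists, e.g. 16 for a 3rd order master.
    Because each sample frame is scaled by a single gain, the ratios between
    channels (and therefore the encoded directions) are left unchanged."""
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    gain = 1.0
    out = [[] for _ in channels]
    for frame in zip(*channels):
        peak = max(abs(s) for s in frame)
        target = min(1.0, ceiling / peak) if peak > 0.0 else 1.0
        if target < gain:
            gain = target                             # instant attack
        else:
            gain = rel * gain + (1.0 - rel) * target  # smooth release
        for ch, s in zip(out, frame):
            ch.append(s * gain)
    return out

def truncate_order(channels, order):
    """Keep the first (order + 1)**2 channels of an AmbiX (ACN/SN3D) signal.

    With ACN ordering and SN3D normalisation, the lower-order components are
    simply the first channels, so 3rd order (16 channels) to 1st order (4)
    is a matter of dropping the rest."""
    keep = (order + 1) ** 2
    if keep > len(channels):
        raise ValueError("not enough channels for the requested order")
    return channels[:keep]

# Usage, with a hypothetical 16-channel 3rd order master `master_16ch`:
# youtube_foa = truncate_order(linked_limiter(master_16ch, 48000), 1)
```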
Finally, a word from the man himself:
Many thanks to all of the amazing crew, our partners at The Mill, to Reeps One and to my colleagues at Aurelia Soundworks for their tireless work in helping realise this video.