Cinematic VR with spatial audio, a minimal workflow

Compatible with YouTube, Facebook and Vimeo


Minimal panoramic video

First of all, we’re going to need some form of panoramic video. Some say that it needs to be 3D in order to be considered actual VR, but we’re targeting the lowest common denominator. If you’re interested in a 3D workflow, follow me, a guide is going to be ready soon™.

Currently, the cheapest and easiest way of getting a good enough image quality is using a dedicated consumer camera, like the Samsung Gear 360, the LG 360 Camera or the Nikon KeyMission. They tend to hover around €300 including taxes, they’re widely available and you can use your existing tripod. They’re also very easy to control both with and without a phone and come with complimentary video processing software, for both Android and Windows.

In all cases, the video quality is perfectly acceptable in good lighting conditions, but it degrades quickly as soon as it gets dark. Angular resolution is also a limiting factor: it’s best to keep your subjects in the sweet spot, a bit over 2 meters away.

Another important consideration is the thermal envelope: all of these cameras are prone to overheating, to different extents. Avoid long takes, or disable wireless control if you’re shooting for a long time. They all have airplane modes, which don’t change battery life very much but dramatically improve thermal behavior.

Ambisonics the easy way

The LG 360 does on-board First-Order Ambisonics, so it’s definitely the easiest solution. Its output is already compatible with YouTube and carries all the relevant metadata. Of course, you’ll often want to edit the footage, and that must be done carefully so the four audio channels and their order survive intact.

The second easiest option is buying a Zoom H2n, updating the firmware, and enabling spatial audio; that’s it. It lacks vertical information, but for most videos that’s just fine, and the recording is already in today’s preferred format, so YouTube will accept it without question.

Just make sure you point the Zoom and the camera in the same direction. In case you get this wrong, you can rotate either the video, using Premiere Pro’s Offset effect, or the audio, using Kronlachner’s first-order Ambisonics rotator VST plugin (from his ambiX plugin suite) in your DAW of choice.
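If you’d rather fix a misaligned recording from the command line, a yaw rotation can also be sketched with ffmpeg’s pan filter. This is only a sketch under assumptions: the recording is four-channel ambiX (W, Y, Z, X channel order), the filenames are placeholders, and for a 90° rotation Y′ = X and X′ = −Y, while W and Z stay put. Check the pan filter syntax against your ffmpeg version.

```shell
# Make a 0.5 s ambiX stand-in for your recording: silence on W, Y, Z
# and a tone on X, i.e. a source straight ahead of the microphone.
ffmpeg -loglevel error -y -f lavfi \
  -i "aevalsrc=0|0|0|sin(2*PI*440*t):s=44100:d=0.5" ambix_in.wav

# Rotate the soundfield 90 degrees around the vertical axis:
# Y' = X, X' = -Y; W (c0) and Z (c2) are untouched.
ffmpeg -loglevel error -y -i ambix_in.wav \
  -af "pan=4c|c0=c0|c1=c3|c2=c2|c3=-1*c1" ambix_rotated.wav
```

After the rotation the tone sits on the Y channel, so the source appears off to the side instead of straight ahead.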


Both devices have standard photographic tripod threads. You can use two tripods very close together, or build something like this:

The one at the bottom is a cylindrical microphone array with 32 capsules


The tricky bit is reconstructing the temporal alignment between the audio and the video. As this is a minimal workflow, we’re going to stick to free software. That’s easy for the audio part, as Audacity is perfectly adequate, but a bit of a headache for the video part. If you have a higher-end smartphone, there’s a pretty good mobile editing app called Collect.

For most real-life scenarios, I’d advise getting your hands on a copy of Adobe Premiere.

Editing the video

The first step is trimming the clips to the desired portions. We’re going to cut the video to the preferred length using MPEG Streamclip. Just set the In and Out points and export the video using the Save As option, so that it won’t re-encode. Alternatively, you can use ffmpeg.

ffmpeg -ss 00:00:03 -i movie.mp4 -t 00:00:08 -c copy cut.mp4

If you shot with the LG, and already have an aligned audio track, your best bet is to use ffmpeg, making sure you appropriately specify the audio channels with the -channel_layout 4.0 option. Just export all your clips, concatenate them to obtain your edit, and skip to the metadata section.
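As a sketch of that export step: the exact placement of the -channel_layout flag varies between ffmpeg versions, so this example relies on stream copy to carry the four channels through unchanged. The movie here is a synthetic stand-in generated with ffmpeg’s lavfi sources, since the real filenames depend on your camera.

```shell
# Synthesize a 10 s stand-in for an LG clip: test-pattern video plus
# silent 4-channel AAC audio (a real clip would have ambisonic content).
ffmpeg -loglevel error -y \
  -f lavfi -i "testsrc2=duration=10:size=320x240:rate=15" \
  -f lavfi -i "aevalsrc=0|0|0|0:s=48000:d=10" \
  -c:v mpeg4 -g 15 -c:a aac -shortest movie.mp4

# Cut 4 seconds starting at second 2; -c copy avoids re-encoding,
# so all four audio channels pass through untouched.
ffmpeg -loglevel error -y -ss 2 -i movie.mp4 -t 4 -c copy clip1.mp4
```

Because nothing is re-encoded, the cut clip keeps the same four-channel audio stream as the source.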

Next, we’re going to obtain a synced-up version of the Ambisonics audio for each video clip.

Editing the audio

We’re going to use Audacity. Make sure you enable multichannel exporting in the preferences (“Use custom mix” under Import/Export), or it will only write stereo files.

Import the audio from one of the clips you obtained in the previous step, and open the relevant audio source from your Ambisonics recording. You will now have to sync them up, and export the relevant trim of the Ambisonics audio.

Once you’ve done this for all clips, you’ll have to mux each video clip to its audio, like this.

ffmpeg -i videoClip.mp4 -i audioClip.wav -map 0:v -map 1:a -c:v copy -c:a aac -b:a 512k -shortest muxedClip.mp4

Finally, concatenate the resulting files.
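The concatenation itself can be sketched with ffmpeg’s concat demuxer. The muxedClip names below are placeholders, so this example first fabricates two tiny clips to join; stream copy means the video and spatial audio streams of real clips would pass through untouched.

```shell
# Fabricate two 2 s stand-in clips (in a real edit these are your muxed clips).
for i in 1 2; do
  ffmpeg -loglevel error -y -f lavfi \
    -i "testsrc2=duration=2:size=320x240:rate=15" -c:v mpeg4 "muxedClip$i.mp4"
done

# The concat demuxer joins the files listed in a text file, without re-encoding.
printf "file '%s'\n" muxedClip1.mp4 muxedClip2.mp4 > list.txt
ffmpeg -loglevel error -y -f concat -safe 0 -i list.txt -c copy final.mp4
```

The resulting file is simply the clips played back to back, with no generation loss.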

Injecting the metadata

Download the 360 Video Metadata app for Mac or Windows.

Open it, select your video, tick the spatial audio checkbox, inject the metadata, save the resulting video, and you’re good to go.

YouTube and Facebook should accept it with no problem. Vimeo will accept it too, but it will play the audio back in plain stereo, with no spatial information, since Vimeo doesn’t currently support spatial audio.

Please let me know about any mistake or inaccuracy!
