How FOO Works

Will Smith
The FOO Blog
Jun 22, 2016

In the time since we launched the preview episode of The FOO Show, we've gotten lots of questions about how FOO actually works. We aren't trying to keep what we do a dark secret; we just weren't sure anyone would find it interesting. Clearly we were wrong, so here you go.

When we started working on FOO last fall, I only had a few specific goals. I wanted to build a VR creation tool that mimicked the live television studio — a place that lets you front-load your production work so that you can release content quickly and without the need for much post-production. I also knew that interactivity was going to be key for this kind of VR show — in 2016, no one has the attention span to sit passively with goggles on the same way they watch TV, so that meant our shows would be in 3D-rendered, game-like environments. And most importantly, I wanted to be able to do all of this using off-the-shelf hardware that everyone has access to. Not only does hardware like the Vive help keep costs down, recording shows in VR also offers interesting opportunities for performers.

After a few months of work, we’ve built the basic skeleton of the eventual FOO toolchain, the bare minimum we needed to produce The FOO Show. Here’s what we’ve built so far, what we have coming, and a bit about how it works together to let us produce 3D-rendered, interactive VR shows faster and cheaper than anyone else.

The requisite flowchart.

Character Creation

Making avatars is easy, right? Not really. Before I started working on FOO, I didn’t realize what a dark art rigging 3D models for skeletal animation is. It’s an incredibly time-consuming process that people seem to do mostly by feel and trial-and-error. So while it may only take a few days to build a representative avatar of a guest for The FOO Show, converting that 3D model into a fully-rigged avatar could take a couple of weeks, by the time you get it fully implemented and debugged in the engine.

To speed that process, we’ve developed a dynamic avatar generation process that lets our character artists build fully-rigged, implemented, and debugged avatars for our guests in just a few days. This was one of the big tasks we needed to complete before we could start running The FOO Show on a weekly schedule, and it’s close.

Animation

At FOO we capture performances using off-the-shelf hardware, like the HTC Vive or Oculus Touch. When you record using FOO, each performer is actually in the virtual environment, reacting to other performers’ avatars and interacting with objects in the environment.

Compared to traditional motion capture solutions — I’m talking about expensive, multi-camera rigs that involve Andy Serkis wearing a ping-pong ball suit — we can create believable human avatars from a tiny amount of data. How tiny? Using the Vive, we collect just three data points — the head and two hands. How do we turn that into a believably animated human avatar? Lots of complex math. And because we designed our system to run in real-time, it requires far less hand-tuning than conventional mocap.
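To make that concrete, here is roughly what a single captured frame looks like: three timestamped poses and nothing else. The names and structure in this sketch are ours for this post, not the actual file format.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    position: tuple[float, float, float]          # meters, in tracking space
    rotation: tuple[float, float, float, float]   # quaternion (x, y, z, w)

@dataclass
class TrackedFrame:
    time: float        # seconds since the take started
    head: Pose         # HMD pose
    left_hand: Pose    # left controller pose
    right_hand: Pose   # right controller pose

# Everything else about the avatar -- spine, shoulders, elbows, fingers --
# has to be inferred from these three poses on every frame.
```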

Updates to our animation tech improve future and past episodes.

Our animations are generated procedurally from code, which gives us much more expressive and human-feeling animations than traditional blended animation. We've used a combination of hand-written heuristics and inverse kinematics since last year, and we recently started applying machine learning and evolutionary algorithms to help us produce more realistic results in edge cases. What are the edge cases here? Our shoulder and elbow solvers handle situations that stall most IK algorithms, like when an actor crosses her arms or scratches the back of her head.
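For a sense of what that math looks like at its simplest, here is a bare-bones two-bone IK solve for an arm: given the shoulder position and the hand (controller) target, place the elbow using the law of cosines plus a hint direction for the bend. This is a textbook sketch, not our production solver, and it sidesteps exactly the edge cases described above; the function and parameter names are illustrative.

```python
import numpy as np

def solve_arm_ik(shoulder, hand_target, upper_len, forearm_len, elbow_hint):
    """Place the elbow for a two-bone arm using the law of cosines.

    shoulder, hand_target, elbow_hint: 3D points (np.array)
    upper_len, forearm_len: bone lengths in meters
    Returns the elbow position.
    """
    to_target = hand_target - shoulder
    dist = np.linalg.norm(to_target)
    # Clamp so the arm never tries to over-extend past its full reach.
    dist = np.clip(dist, 1e-4, upper_len + forearm_len - 1e-4)
    dir_to_target = to_target / dist

    # Law of cosines: how far along the shoulder->hand axis the elbow sits,
    # and how far it pops out perpendicular to that axis.
    along = (upper_len**2 - forearm_len**2 + dist**2) / (2 * dist)
    out = np.sqrt(max(upper_len**2 - along**2, 0.0))

    # A hint point (e.g. behind and below the shoulder) picks which way the
    # elbow bends -- this is where the hand-written heuristics live.
    bend_plane = np.cross(dir_to_target, elbow_hint - shoulder)
    bend_dir = np.cross(bend_plane, dir_to_target)
    norm = np.linalg.norm(bend_dir)
    if norm < 1e-6:
        bend_dir = np.array([0.0, -1.0, 0.0])   # degenerate: drop the elbow down
    else:
        bend_dir /= norm

    return shoulder + dir_to_target * along + bend_dir * out
```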

The big benefit of this approach is that as our animation technology improves, the performances in our back catalog will continue to improve. We showed this with a recent update to our first episode. The downside of our approach is that it's much more math-intensive than traditional animation, which is why The FOO Show is only on desktop VR platforms right now — in order to release on mobile VR platforms like GearVR and Google Cardboard, we need to bake the dynamic animations down into more traditional blended animations.
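Baking, in this context, means running the procedural solver offline at a fixed sample rate and storing the resulting skeleton poses as ordinary keyframes that a mobile runtime can play back cheaply. A rough sketch of the idea, building on the frame structure above (the solver here is a stand-in, not our pipeline):

```python
def bake_performance(solver, tracked_frames, sample_rate=30.0):
    """Run the procedural solver offline and emit plain keyframes.

    solver: maps a TrackedFrame to {joint_name: Pose} for the full skeleton
    tracked_frames: the raw head/hand recording, sorted by time
    Returns a list of (time, {joint_name: Pose}) keyframes.
    """
    keyframes = []
    if not tracked_frames:
        return keyframes
    t = tracked_frames[0].time
    end = tracked_frames[-1].time
    i = 0
    while t <= end:
        # Advance to the raw frame closest to (but not past) the sample time.
        while i + 1 < len(tracked_frames) and tracked_frames[i + 1].time <= t:
            i += 1
        keyframes.append((t, solver(tracked_frames[i])))
        t += 1.0 / sample_rate
    return keyframes
```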

Recording Studio

On the surface, recording the series of vectors that make up each avatar's movement seems like a very straightforward task. However, we also need to sync multiple avatars' recordings with each other and with their audio tracks, and to record the position, orientation, and state of the objects we manipulate in the virtual environment. For the Firewatch episode of the show, we built something very specific to the needs of that game's assets, but for future episodes, we're building a more abstract, extensible implementation of that code.
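Here is a simplified picture of what a single take has to hold: one pose stream per performer, an object stream, and per-performer audio offsets, all stamped against a shared clock so everything can be replayed in sync. It builds on the Pose/TrackedFrame sketch above, and the shape of the data is illustrative rather than our actual format.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectState:
    time: float
    object_id: str
    pose: Pose                                   # position + rotation of the prop
    state: dict = field(default_factory=dict)    # e.g. {"held_by": "guest_1"}

@dataclass
class Take:
    # One shared clock for everything in the take.
    start_timestamp: float
    # One stream of TrackedFrames per performer, keyed by performer id.
    performers: dict[str, list[TrackedFrame]] = field(default_factory=dict)
    # Everything the performers picked up, opened, threw, etc.
    objects: list[ObjectState] = field(default_factory=list)
    # Offset (seconds) of each performer's audio relative to the shared clock.
    audio_offsets: dict[str, float] = field(default_factory=dict)
```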

Audio is also hugely important to this type of show. For the Firewatch episode, we recorded audio in two places: outside of the game, using a standalone, multi-track audio recorder as well as in-engine using the onboard Vive microphone. While the onboard mic isn’t suitable for broadcast-quality recording, it let us sync up the recorded performance with the externally recorded audio. During post, it became clear that syncing up that audio is a hassle that slows down production. In order to meet our goal of one hour of post-work per avatar, we realized that we needed to upgrade our in-engine audio recording to support broadcast quality audio formats. This will have the added benefit of halving the number of cables snaking across our recording studio.
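Until the in-engine recorder is broadcast quality, lining up the external recorder against the in-engine scratch track is the kind of job a cross-correlation pass can automate: find the offset where the two waveforms match best, then shift the good audio by that amount. A minimal sketch, assuming both tracks are loaded as mono float arrays at the same sample rate:

```python
import numpy as np

def find_audio_offset(scratch, external, sample_rate):
    """Return the offset (in seconds) of `external` relative to `scratch`.

    scratch: in-engine mic recording -- low quality, but already aligned
             with the recorded performance
    external: standalone recorder track (same sample rate)
    """
    # Normalize so level differences between the two mics don't dominate.
    a = (scratch - scratch.mean()) / (scratch.std() + 1e-9)
    b = (external - external.mean()) / (external.std() + 1e-9)

    # Brute-force cross-correlation; for long takes you'd switch to an
    # FFT-based version (e.g. scipy.signal.fftconvolve).
    corr = np.correlate(a, b, mode="full")
    lag = (len(b) - 1) - np.argmax(corr)
    return lag / sample_rate

# Usage: offset = find_audio_offset(scratch, external, 48000)
# A positive offset means the same content shows up that many seconds later
# in the external recording, so trim that much off its head to line it up.
```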

Giant audio files with multiple nested sync points are an impediment to fast production.

Editing Suite

The good news is that a talk-show interview doesn't require much in the way of editing, so we don't need a multi-track, Premiere-style video editing interface to produce The FOO Show. We do, however, need the ability to cut the interviews for content or pacing. We hacked together a quick-and-dirty solution to cut the 12-minute studio interview we recorded for the Firewatch episode down to a 3-minute intro, but we need a more fully fleshed-out, in-engine tool that doesn't require an engineer to use.
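Under the hood, a cut is just dropping a span of the recorded timeline and re-stamping everything after it so the playback clock stays continuous, with the same span removed from the audio. A toy version of that operation, reusing the Take sketch from the recording section (names are illustrative):

```python
def cut_take(take, cut_start, cut_end):
    """Remove the span [cut_start, cut_end) seconds from every track in a take."""
    removed = cut_end - cut_start

    def keep_and_reclock(frames):
        out = []
        for f in frames:
            if f.time < cut_start:
                out.append(f)
            elif f.time >= cut_end:
                f.time -= removed        # close the gap left by the cut
                out.append(f)
        return out

    for performer, frames in take.performers.items():
        take.performers[performer] = keep_and_reclock(frames)
    take.objects = keep_and_reclock(take.objects)
    # The matching audio edit (removing the same span from each track)
    # happens in the audio tool; only the offsets live here.
    return take
```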

Once we get the basic functionality down, the crucial next step is adding tools that let us blend two different performances into one scene. Right now, for the talk show, the VR equivalent of a jump cut is workable, but for more serious or cinematic content, we're going to need to be more subtle.

This problem illustrates one of the big challenges of creating this type of VR content. Few of the cinematography techniques and best practices we've developed over the last 100 years of filmmaking work in VR. In traditional video, if a performer nails the first half of a scene in a take, then has a sneezing fit in the back half, you just record another take and drop a transition between them when you edit. It's easy to cut to another shot of the same performer, show another performer's reaction, or drop in a coverage shot to hide the flub.

In VR, cuts are significant events that signal scene transitions to the viewer. If we want to use the first part of one take and seamlessly connect it to the back half of a second take, we need to do it without the viewer noticing. That involves building animation tools that have never needed to exist before or coming up with a clever solution. We’re comfortable doing either.
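One plausible shape for that tooling is a timed crossfade between the two recorded pose streams: over a short overlap window, interpolate each tracked point from take A toward take B. The sketch below uses simple linear position blending and normalized quaternion lerp as stand-ins for whatever the real tool would do, and reuses the Pose/TrackedFrame types from earlier.

```python
import numpy as np

def blend_pose(pose_a, pose_b, t):
    """Blend two Pose values; t goes 0 -> 1 across the overlap window."""
    pos = (1 - t) * np.asarray(pose_a.position) + t * np.asarray(pose_b.position)
    # Normalized lerp of quaternions: crude but serviceable for small
    # differences; a real tool would use slerp plus joint-space constraints.
    qa, qb = np.asarray(pose_a.rotation), np.asarray(pose_b.rotation)
    if np.dot(qa, qb) < 0:          # take the short way around
        qb = -qb
    rot = (1 - t) * qa + t * qb
    rot /= np.linalg.norm(rot)
    return Pose(tuple(pos), tuple(rot))

def splice_takes(frames_a, frames_b, splice_time, overlap=0.5):
    """Play take A, then crossfade into take B over `overlap` seconds."""
    out = [f for f in frames_a if f.time < splice_time]
    for f in frames_b:
        if f.time < splice_time:
            continue
        t = min((f.time - splice_time) / overlap, 1.0)
        # Find take A's nearest frame at this time to blend away from.
        a = min(frames_a, key=lambda g: abs(g.time - f.time))
        out.append(TrackedFrame(
            time=f.time,
            head=blend_pose(a.head, f.head, t),
            left_hand=blend_pose(a.left_hand, f.left_hand, t),
            right_hand=blend_pose(a.right_hand, f.right_hand, t),
        ))
    return out
```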

Distribution and Client

With the Firewatch episode, we're distributing everything (the studio set, the watchtower set, the avatars, animations, and audio) through Steam and Oculus Home. That works reasonably well for a single show that's updated relatively infrequently. However, as we add support for more VR platforms, add more shows, and produce them on a regular schedule, having to run our app through each platform's certification every time we add a new show will quickly become a major limitation.

Ultimately, we want the FOO app to behave like Netflix or Hulu — as a front end viewer connected to a massive library of online content that’s available at one click. Unfortunately, game engines aren’t really designed to ingest assets after they’re already running, so we need to do a fair amount of work to make that possible. We’re making progress, but it’s a significant challenge.
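In practice, that means the shipped app becomes a thin player that asks a server which episodes exist, pulls down the ones you pick, and then hands the files to the engine's runtime loading path. The client half of that loop is the easy part; a sketch of its shape is below, with the URL, JSON fields, and cache layout all hypothetical.

```python
import json
import urllib.request
from pathlib import Path

CATALOG_URL = "https://example.com/foo/catalog.json"   # hypothetical endpoint
CACHE_DIR = Path.home() / ".foo_show_cache"

def fetch_catalog():
    """Download the list of available episodes (id, title, asset URLs, sizes)."""
    with urllib.request.urlopen(CATALOG_URL) as resp:
        return json.load(resp)

def ensure_episode(episode):
    """Download an episode's performance and audio files if not already cached."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    local_files = []
    for asset in episode["assets"]:               # e.g. avatars, animation, audio
        dest = CACHE_DIR / asset["filename"]
        if not dest.exists():
            urllib.request.urlretrieve(asset["url"], dest)
        local_files.append(dest)
    return local_files

# The viewer would then feed these files into the engine's runtime
# asset-loading path, which is the part game engines don't make easy
# and where most of our work is going.
```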

The good news is that once we get the infrastructure in place, the size of an actual performance file, including the avatar, is just a bit larger than the audio, and because the files are just sequences of numbers, they compress very well. This means that downloading new episodes will be quick, even if you’re on mobile or slower broadband connections.
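The compressibility point is easy to demonstrate: a pose stream is smooth, so quantizing it, delta-encoding successive frames, and running a general-purpose compressor over the result shrinks it dramatically. A toy illustration, not our actual encoding:

```python
import zlib
import numpy as np

def compress_pose_stream(samples):
    """samples: float32 array of shape (frames, channels), e.g. 90 Hz capture
    with 21 channels (3 tracked points x position + quaternion)."""
    # Quantize to millimeters / ~1e-3 quaternion steps, then delta-encode:
    # successive frames barely differ, so the deltas are tiny integers
    # (the first row keeps the absolute start, so cumsum recovers the stream).
    quantized = np.round(samples * 1000).astype(np.int16)
    deltas = np.diff(quantized, axis=0, prepend=np.zeros_like(quantized[:1]))
    return zlib.compress(deltas.tobytes(), level=9)

# A 10-minute, 90 Hz, 21-channel recording is about 4.5 MB as raw float32;
# the quantize + delta + deflate pass shrinks it by a large factor.
```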

So that’s a bit about how FOO works. If you have more questions, post them in the comments or hit us up on Twitter, and we’ll answer them as best we can.
