Eric Florenzano
4 min read · Jun 1, 2016

When the internet was new, it was mostly text. It wasn't just text, though; it was hypertext, and the ability to interlink and share those text documents has brought on the information age we live in today. As technology has evolved, the primary types of content we share have evolved too: from text, to images and sound, to video. Each content type brings something unique that no other can provide. For the past few years I've had a hunch about what VR's major contribution to this will be. Now I'm convinced, and this post will try to make the case: the next major shared content type will be the captured human performance. Sound crazy? Hear me out.

Will Smith on The Foo Show

Here’s my thesis for what it takes to be a major shared content type: it needs to be compelling in its own right, it needs to provide an experience substantially different from other content types, and it needs to be easy to create and consume. Let’s take a quick look at other major content types.

We know intuitively why a picture is worth a thousand words, and it's true that pictures are more information-dense than text. But pictures also take advantage of the power of our truly incredible human brain. If the pixels on a person's face change even slightly, our brain knows to interpret them as happy or sad. It can make inferences about their situation and history, or even predict what will happen to them. Pictures are compelling, and the experience they provide is unique. With the ubiquity of cameras on phones, they're now easy to create and consume too. So pictures have become a major content type.

The same is true for sound and video. Someone could describe a melody to you by analogy, or take a picture of someone playing the guitar, and you can almost imagine what the music would sound like. But unless you hear the music for yourself, or see a video of it with sound, you're missing out on the experience. And again, with computers and cell phones, cameras and microphones, we can now consume and share this content easily.

Cloudhead Games using motion capture for The Gallery

So, back to our next major content type: the captured human performance. Since that's a mouthful, let's just call it a mocap. Before I make the case that it's compelling, unique, and easy to create, let's make sure we know what it is that I'm calling a mocap. It's a combination of a voice recording and the tracked movements of points on your body. Right now that means a recording from the microphone of a VR headset, plus the tracked positions of the user's head and hands (advanced setups would track much more).
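To make that concrete, here's a minimal sketch of the data involved, written in TypeScript since the post has no code of its own. Every name here is illustrative, not from any particular engine or SDK:

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface Quat { x: number; y: number; z: number; w: number; }

// One tracked point: where it is and which way it faces.
interface Pose {
  position: Vec3;    // meters, in play-space coordinates
  orientation: Quat; // unit quaternion
}

// One sample of the performance, captured each rendered frame.
interface MocapFrame {
  time: number;      // seconds since the recording started
  head: Pose;
  leftHand?: Pose;   // optional: not every headset tracks hands
  rightHand?: Pose;
}

// The whole mocap: motion samples plus the voice that goes with them.
interface Mocap {
  frames: MocapFrame[];
  audio: ArrayBuffer; // compressed microphone clip
}
```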

If this doesn't sound compelling to you, you're not alone; it doesn't sound compelling! But try watching an avatar play one of these mocaps in VR and you'll agree: you really feel like there's someone in there. Like pictures, mocaps help unlock the power of our brain, but in VR our spatial reasoning can also show up to the party. And it turns out we're really, really good at spatial reasoning. Researchers have studied the lower bounds of avatar fidelity and found that even if an avatar looks horribly unrealistic, our brain wants so badly to anthropomorphize that, with just a few captured motion points, we simply accept the avatar as a person.

On one hand, it’s unfortunate that you have to try it in VR to understand how mocaps are compelling. On the other hand, that is proof of its uniqueness — it’s born out of a brand new medium, so it is unique by default.

So now we're left with ease of creation and consumption. Anyone with a VR headset can record their head orientation, and with advanced headsets you can also record head position and hand poses. The number of headsets out there is still small today, but every single one can record in some form, and every single one can play these recordings back. It's a capability inherent to the platform, and VR is a fast-growing platform. As a bonus, position-tracking data and short audio clips are tiny, immensely shareable files, and they don't take much code to produce. So we have a new content type that's very compelling, native to a unique and fast-growing platform, with very desirable sharing properties.
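Here's a sketch of what recording might look like, reusing the types from the earlier sketch. The capture() method, and the idea that the app hands in poses once per rendered frame, are my assumptions about app structure rather than any real SDK's API; the arithmetic in the comment is just to show why the files stay small:

```typescript
// At 90 Hz, three poses of 7 floats each (position + quaternion) is
// about 3 * 7 * 4 bytes * 90 ≈ 7.5 KB per second before compression,
// so even a minute-long reaction is a tiny file.
class MocapRecorder {
  private frames: MocapFrame[] = [];
  private startTime = -1; // -1 means "not started yet"

  // Call once per rendered frame while recording.
  capture(now: number, head: Pose, leftHand?: Pose, rightHand?: Pose): void {
    if (this.startTime < 0) this.startTime = now;
    this.frames.push({ time: now - this.startTime, head, leftHand, rightHand });
  }

  // Hand back the captured motion, ready to pair with the audio clip.
  finish(): MocapFrame[] {
    return this.frames;
  }
}
```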

The other interesting thing to note is that it's currently hard to use a keyboard in VR, so for the short-to-medium term people will avoid typing much. If you want people to be able to communicate asynchronously, you're going to have to do either audio or mocap.

Example of mocap recording and playback in UpvoteVR

Given all this, last week I thought it would be a nice feature to add mocap reactions to UpvoteVR, a VR Reddit explorer app I'm working on. That is, after you finish watching a video or viewing an image, you now have the ability to leave a motion-captured performance for others to see later. Having worked on mocap before (on The Foo Show) and seen how nicely that turned out, I expected it to work well. What I didn't expect was how much these reactions would improve the whole experience: this little feature has become the main point of the app, and if you try it in your own app, it may transform your VR experience too.
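Playback is the other half: blend between recorded samples to drive the avatar at whatever moment the viewer is watching. Here's a sketch, again reusing the earlier types; the nlerp (normalized lerp) below is a standard cheap stand-in for quaternion slerp that behaves well when samples are closely spaced. This is my assumed approach, not necessarily how UpvoteVR or The Foo Show implement it:

```typescript
function lerp(a: Vec3, b: Vec3, t: number): Vec3 {
  return {
    x: a.x + (b.x - a.x) * t,
    y: a.y + (b.y - a.y) * t,
    z: a.z + (b.z - a.z) * t,
  };
}

function nlerp(a: Quat, b: Quat, t: number): Quat {
  // Flip one endpoint if needed so we rotate the short way around.
  const s = (a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w) < 0 ? -1 : 1;
  const x = a.x + (s * b.x - a.x) * t;
  const y = a.y + (s * b.y - a.y) * t;
  const z = a.z + (s * b.z - a.z) * t;
  const w = a.w + (s * b.w - a.w) * t;
  const len = Math.hypot(x, y, z, w);
  return { x: x / len, y: y / len, z: z / len, w: w / len };
}

// Sample the head pose at `time`; frames must be non-empty and
// ordered by time. Call every rendered frame during playback.
function sampleHead(frames: MocapFrame[], time: number): Pose {
  if (time <= frames[0].time) return frames[0].head;
  const last = frames[frames.length - 1];
  if (time >= last.time) return last.head;
  // Find the pair of frames bracketing `time` and blend between them.
  let i = 1;
  while (frames[i].time < time) i++;
  const a = frames[i - 1], b = frames[i];
  const t = (time - a.time) / (b.time - a.time);
  return {
    position: lerp(a.head.position, b.head.position, t),
    orientation: nlerp(a.head.orientation, b.head.orientation, t),
  };
}
```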
