Representing Humans in Mixed Reality

The importance of volumetric video and authentic human capture

Published in

Microsoft Design

May 8, 2017

James Turrell designs with light. Stepping into his work blurs one’s sense of depth and focus. Walls seem both close and infinite, and brightness gives way to shadow. These unfamiliar perceptions are designed by carefully balancing the light’s color and diffusion. Turrell describes these sensations as ‘feeling with your eyes’, a way of extending one’s understanding of reality. Fantastic worlds like the ones Turrell imagines are powerful tools for exploiting our senses, not unlike the environments being explored in immersive (virtual reality) experiences today.

Representing Turrell’s work in an immersive experience makes for a compelling challenge. Lighting, scale, and spatial audio present opportunities to represent his work. While the exhibit’s geometric surroundings would require relatively simple 3D modeling, they are secondary to the artist’s focus: the light’s impact on the senses.

Turrell’s stark, surreal minimalism is the hallmark of his work, but what if we wanted to represent an exhibit with more complex materials?

In 2013, the artist Ai Weiwei unveiled a sprawling, tangled work of art featuring 886 antique stools at the Venice Biennale. Each wooden stool came from an era when Chinese craftsmanship was highly valued, and when these stools would have been passed down between generations. The stools themselves — the intricacies of the wood, the precision of the pieces, their careful placement — are critical to Ai’s commentary on modern culture.

How would you represent the complexity of this environment in an immersive or holographic experience?

The antique stools deliver the artist’s message through their authenticity. Their realistic representation is critical to the experience, which creates a technical challenge: sculpting each of the 886 stools in 3D would be enormously labor-intensive and expensive. How long would it take to model and position them? How would you maintain the authenticity of the material? Recreating these objects from scratch becomes, in many ways, an interpretation of the artwork itself. How can you preserve the artist’s intent?

Capturing real world assets for mixed reality

The alternative to creating from scratch is capturing the real thing. Through an ever-advancing set of capture methods, we can develop authentic representations of each of the core asset types found in mixed reality experiences (environments, objects, and people).

The broad categories range from well-established 2D video to the newest forms of volumetric video. In the case of Ai Weiwei’s exhibit, scanning (often referred to by its fundamental technique, photogrammetry) could be employed throughout the exhibit to capture each of the stools individually. 360° photo and video capture is another method for virtualizing the experience, using a high-quality omnidirectional camera positioned at points throughout the exhibit. An immersive experience could be created with these techniques, allowing users to understand the scale and craftsmanship of Ai’s work and materials. All of this would exist in digital form, enabling new perspectives that might be impractical or impossible to recreate in the real world.

What kinds of opportunities emerge when we can not only create fantastic elements but also include highly accurate, realistic ones in mixed reality? Exploring the overlap between methods of capturing environments, objects, and people can help illuminate where the medium is headed.

For environments and objects, 360° imaging software is evolving with help from techniques like photogrammetry. By extracting depth information from a scene, advanced 360° video helps alleviate the feeling of having your head stuck in a fishbowl when looking around a virtual scene (the aspiration being six degrees of freedom, where viewers can move through the scene as well as turn their heads), enabling a far greater sense of immersion.

For people, new methods are emerging that combine and extend motion capture and scanning. People are the most complex elements of mixed reality. Authentic human representation has long been a struggle for the visual effects industry, and just as with Ai Weiwei’s antique stools, capturing a sense of authenticity is key. Motion capture has been foundational to bringing detailed human movement to cinematic characters, while photographic scanning has advanced to capture detailed human visuals like faces and hands. With advancements in rendering technology, volumetric video builds on these techniques by combining visual and depth information, creating a powerful method for capturing humans in 3D.
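To make “combining visual and depth information” a little more concrete, here is a minimal sketch of the simplest version of that idea: turning a single color-plus-depth (RGB-D) frame into colored 3D points using an ideal pinhole camera model. This is an illustrative assumption, not the Microsoft capture pipeline, which records people from many viewpoints and reconstructs them into full 3D recordings. The function name `backproject`, the `ColoredPoint` type, and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class ColoredPoint:
    """A single 3D point carrying the color observed at that pixel."""
    x: float
    y: float
    z: float
    rgb: tuple


def backproject(depth, color, fx, fy, cx, cy):
    """Hypothetical helper: lift one RGB-D frame into colored 3D points.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); depth[v][u] is the distance along the
    camera's viewing axis, and color[v][u] is the matching RGB pixel.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # no valid depth reading at this pixel, so skip it
                continue
            # Invert the pinhole projection u = fx * x / z + cx (same for y).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(ColoredPoint(x, y, z, color[v][u]))
    return points


if __name__ == "__main__":
    depth = [[1.0, 2.0], [0.0, 4.0]]
    color = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
    for p in backproject(depth, color, fx=1.0, fy=1.0, cx=0.0, cy=0.0):
        print(p)
```

Running it on the tiny 2×2 frame yields three colored points, because the pixel with zero depth is dropped. A single frame like this only captures the surfaces one camera can see. That limitation is why human capture surrounds the subject with many viewpoints.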

Opportunities for volumetric video in mixed reality

Some of the most eye-opening moments of today’s immersive experiences are social, from sharing a holographic experience together in your living room to seeing your friends in unbelievable new environments. The human element makes even the most fantastic reality a reality.

Avatars in immersive experiences enable a new kind of embodiment in storytelling. The latest apps are rethinking the concept of virtual body ownership and setting up a generational leap in eliminating the distance between people. Companies like Mindshow are developing creative tools that leverage avatars, letting users take on entirely new personas and characters. Others are exploring methods of artistic expression, a potentially limitless creative opportunity to explore the nature (and necessity) of human-like attributes. Today, this absence of realism helps avoid the uncanny valley of human likeness, along with a host of technical issues for mixed reality developers. For these reasons (and more), it is very likely that non-realistic avatars will remain the default for the foreseeable future.

And yet, while realism poses an enormous challenge for mixed reality, there are key scenarios that require authentic representation of humans in 3D space.

At Microsoft, a small team born out of Microsoft Research has spent the past several years developing a method for capturing humans through a form of volumetric video. The process today is similar to video production: rather than applying movement to a sculpted asset, it produces a full 3D recording. The performance and the image are captured in real time — it’s not the work of an artist, it’s an authentic representation. And while the technology is just beginning to expand into commercial applications, the implications of volumetric video are critical to Microsoft’s vision of mixed reality.

Authentic human capture unlocks new and unique categories of experiences in mixed reality. Seeing someone you recognize, whether it’s a celebrity, a colleague, or a loved one, creates a depth of intimacy never before possible in a digital medium. Their face, their expressions, the nuance in their movements are all part of who they are. What opportunities emerge when we can accurately capture these human qualities in 3D space?

Today the team is pushing the bounds of volumetric video by focusing on sectors like entertainment and education. Actiongram lets people create stories with holograms of creative characters and celebrities. The Destination: Mars exhibit, now at NASA’s Kennedy Space Center, features a volumetric video of legendary astronaut Buzz Aldrin. The experience allows visitors to walk across the surface of Mars with Buzz as he introduces the pursuit of human colonization of Mars.

Humans are fundamental to the future of mixed reality

Designing ways to make these videos seem natural poses a challenge, but one in which the team sees enormous potential. And these opportunities will expand as the technology becomes more accessible and moves from recordings to real-time capture.

Holoportation is a research effort that builds upon the same fundamental technology, authentically capturing visual and depth information, and rendering the result in real-time. The team is exploring what the power of realistic human representation means for the future of conversations and shared experiences. What happens when a three-dimensional capture of someone, from anywhere in the world, can be added into your environment?

Volumetric video opens unique scenarios, from layering a new level of immersion onto everyday apps like Skype to a specialist virtually training doctors on a far-away continent, or digital friends sitting on the couches and chairs in your living room. Successfully bringing humans into mixed reality experiences will radically reshape the concept of digital meetings and business travel.

Just as the abstract art of James Turrell and the critical realism of Ai Weiwei offer their own unique technical challenges, so do the methods of representing humans as creative avatars and as realistic captures. Neither can be ignored in favor of the other, and exploring the potential of each will help us understand the ultimate potential of human interaction in this new medium.

Special thanks to Spencer Reynolds, Jason Waskey, and Steve Sullivan for contributions to this subject for a recent workshop at Microsoft’s Holographic Academy.

You can learn more about designing for Windows Mixed Reality on the Windows Dev Center.

To stay in-the-know with Microsoft Design, follow us on Dribbble, Twitter and Facebook, or join our Windows Insider program. And if you are interested in joining our team, head over to aka.ms/DesignCareers.


Written by Mark Vitazko