Stories by Or Fleisher on Medium

Almost Human: Goodbye Uncanny Valley

Or Fleisher — Mon, 06 May 2019 17:27:29 GMT

Holograms, realism, or simply goodbye uncanny valley

As early as the 1980s, popular cinema and television began depicting holograms of the future. These transparent blue figures have been a great influence on our perceptual model of what holograms should look like. Nowadays, innovations in machine learning, computer graphics, and hardware are paving the way for holographic content to become mainstream, and yet some of the questions I still ask myself are: Why are we so obsessed with realism? What is the archival benefit of documenting humans in 3D? What is the connection between holograms and personal assistants?

Image credit: AlteredQualia / Branislav Ulicny Uncanny Valley WebGL experience

Let's examine the idea of the uncanny valley. The uncanny valley term was coined by Masahiro Mori and presented in Jasia Reichardt’s book Robots: Fact, Fiction, and Prediction.

In aesthetics, the uncanny valley is a hypothesized relationship between the degree of an object’s resemblance to a human being and the emotional response to such an object [Karl F. MacDorman & Hiroshi Ishiguro].

This idea is perhaps best suited to describe not a phenomenon but an era, which we are arguably transitioning out of. With smarter software becoming ubiquitous in almost every aspect of our lives, we are (consciously or not) building things that behave and react more like us. When we emphasize function over imitative visuals, we tend to avoid that uncomfortable “uncanny valley” feeling. Is Amazon’s Alexa or Google Home uncanny? I would argue not, since speech generation has become convincing enough when paired with purely functional use:

Me: Alexa, put something on my shopping list
Alexa: Ok.

Perhaps the uncanny valley is a result of the transition from 2D to 3D imagery and will no longer be needed once holograms become convincing enough, alongside playing a functional role in our lives.

https://medium.com/media/a5b3166b1c03529b4831699a7b90d9b0/href

Our lives are surrounded by interfaces, apps, physical signage, and, more recently, voice interfaces such as Alexa and Google Home. These interfaces are meant to serve a specific function in our day-to-day lives but, more often than not, they look, feel, and sound nothing like us. With more and more demand for computer-generated imagery (CGI) in recent years, and the popularization of platforms such as augmented and virtual reality, computer games, and interactive filmmaking, it is clear there is a potential for new human-computer interfaces that also resemble us visually.


To create them, companies and artists are exploring various forms of 3D capture meant to replicate the human element in new and compelling ways beyond two-dimensional pixels. These tools form the basis of volumetric capturing, a collection of techniques for capturing three-dimensional humans.

These tools are not born in a vacuum. Some are born from a product vision, such as Intel’s Replay technology for 3D sports, which engages sports fans in a new way by allowing them to replay a move from different angles. Other tools are born from computational aesthetic explorations, such as Scatter’s Depthkit, which was initially developed in order to create CLOUDS, a volumetric documentary about creative uses of software. One thing all of the tools share is the visual, technical, and anthological exploration of how to represent and document real humans in 3D space.

Image credit: Intel TrueView

During my thesis research at ITP, I focused on the possibilities of using machine learning to reconstruct archival and historical footage in 3D. The idea of using machine learning was born out of a desire to look back into more than 200 years of visual culture (i.e., 2D photography) and speculate about how we can bridge the growing gap between 2D and 3D.

https://medium.com/media/a0a10ad4ffb26e5774b12e88f2983dca/href

Recently, Apple built a depth sensor into the new iPhones. Facebook now lets you post 3D photos to your wall. And Snapchat has augmented reality facial filters. These are only a few examples of recent spatial interfaces invading our daily lives. My research has led me to believe that we are in a moment of acute awareness of the transition from 2D to 3D. Even though some might argue that we have already entered the 3D era, it seems to me that this is only the tip of the iceberg. Look back at the transition from black and white to color, from analog film to digital, and, through that lens, think about the transition from 2D to 3D. This is nothing short of a revolutionary cultural moment.

For example, today, black and white imagery is used to symbolize authenticity and age, but before the widespread availability of color photography, it was considered a representation of current reality. So how will we look back at two-dimensional media a hundred years from now? Will it only be a testimony of our past, or can we bridge the gap by using new technologies?

The evolution of photography, Image credit: Volume

Machine learning advancements have made it possible to relive films in a way that generates compelling results. For example, as a part of my thesis research, together with Shirin Anlen, we reconstructed a scene from Pulp Fiction in augmented reality. This example used machine learning to separate the characters from the background and generate volumetric figures that are then used in the augmented reality scene.

https://medium.com/media/a440146e50365e12be0eaa27bef79b1f/href

Volumetric capturing?

Volumetric capture spans a wide variety of techniques, ranging from laser scanning (also referred to as LIDAR scanning) and infrared sensors (a notable example is Microsoft’s Kinect camera) to, most recently, the use of machine learning and convolutional neural networks to reconstruct a 3D object from 2D images (as shown in the above Pulp Fiction example). These methods all have roots in different fields such as defense, robotics, and topology but are now being used more and more for art, entertainment, and media.

Image credit: Marshmallow Laser Feast MEMEX | Duologue on Vimeo

Computational humans? Alexa gets a body

Innovation in machine learning doesn’t only affect the fidelity of the 3D acquisition and rendering process; it also provides ground for procedurally generated facial expressions and dialog that bear an amazing resemblance to us, their human counterparts. For example, Hao Li, Director of the Vision and Graphics Lab at USC and founder of Pinscreen, published research showing the ability of generative adversarial networks (GANs) to reconstruct a 3D facial model from a 2D image, which can then be puppeteered.

https://medium.com/media/4a8761afc272badecb8b018d821b5c11/href

Popular entertainment is also taking note. To create the facial expressions behind Thanos in Marvel’s Avengers films, VFX studio Digital Domain created machine-learning-driven software called Masquerade that aids artists in creating compelling facial expressions. Imagine Google’s Duplex demo combined with the facial expressions produced by the Masquerade software: personal assistants are given a big facelift, well, quite literally.

After watching some of these tech demos, I found myself engaged in a conversation about the nature of personal assistants with Dror Ayalon. An interesting point arose: we are experiencing a transition from personal assistants to personal companions. The idea of embodying that voice that keeps our Amazon shopping lists, turns the lights on, and sets a timer during our spontaneous decision to cook is yet another step towards Alexa getting a body and becoming almost human.


Films have already imagined this idea, and it seems there is still a way to go before we can get to the vision portrayed in Her where Alexa sounds like Scarlett Johansson and helps you win a holographic video game.

https://medium.com/media/c8dc7769f1b1ce2e846f2a8631b4c18b/href

There is an argument to be made that Alexa doesn’t necessarily have to look like us. Take, for example, Anki’s Vector robot, which provides a very compelling experience without some of the visual human features; it feels like a physical embodiment of Pixar’s communication of emotion through sounds and facial expressions.

https://medium.com/media/a55c3e79e713c6951c98f6167dc10df0/href

That said, a human representation could stretch beyond novelty and utility into something that resembles a relationship, not just “Order more toilet paper.”

Alexa, stop!

All this said and done, it seems spatial computing, volumetric capturing, and voice interfaces are becoming interconnected and are on a track to disrupt our perception of the uncanny valley. Personally, I am really excited about the possibilities in software interfaces becoming, well, a little more like us, and a little less like this.

Scrolling — From Giphy


Almost Human: Goodbye Uncanny Valley was originally published in Immerse on Medium, where people are continuing the conversation by highlighting and responding to this story.

How Vimeo can power live streaming holograms

Or Fleisher — Mon, 08 Oct 2018 20:51:06 GMT

At a recent meetup in Vimeo’s Brooklyn office, we live streamed a 3D hologram in real time using Vimeo Live, and it was awesome.

The popularity of holograms isn’t new, but interest in making this technology real has reached critical mass over the past year thanks to the rising availability of virtual reality and augmented reality headsets (VR and AR, respectively). Facebook’s 3D photos announcement and other innovative ideas such as the Looking Glass Factory holographic display are proof that holograms are becoming a real thing, even if they’re still in their infancy.

Magic Leap’s promotional video for their mixed reality headset

A little over a month ago, I joined Vimeo Creator Labs as a creative technologist, and we immediately started exploring how Vimeo could play a role in shaping the future of volumetric content for creators. (When you think volumetric, think 3D video.)

Last month we also co-hosted a Volumetric Filmmaking meetup in our Brooklyn office with Scatter (which you can read about on our Vimeo blog), and we wanted to demo something original. While I personally have been exploring volumetric video for some time now, we thought that leveraging Vimeo’s API and transcoding capabilities would be a good demonstration of how we can make volumetric video more accessible and easier to deliver. We wanted to take advantage of everything that Vimeo provides, which brings us to the question that we asked ourselves. Can we live stream a 3D hologram of the presenter in real time?

(The answer is yes.)

Luckily, last year we released our live streaming product and also acquired Livestream, which means we get to work with some of the brightest minds in the live streaming world. But before I dive into the actual tech, let’s explore some of the pop-culture ideas that informed and inspired our thinking along the way.

Beyond half-opaque (glitching) blue humans

Our conception of holograms and holographic imagery owes a huge debt to science fiction. Remember Princess Leia’s plea to Obi-Wan Kenobi back in the original Star Wars movie?

A GIF of Princess Leia’s hologram from Star Wars

Thematically, it seems the two dominant areas featuring holographic imagery in science fiction have been centered around:

Representation of humans, usually in order to “deliver a message.”
Representation of computer graphical user interfaces. And yeah, we are still waiting for that Minority Report operating system.

If you watched Black Panther, you must have wondered, “When will I get that Kimoyo Beads hologram thingy?” Well, the honest answer is very soon. Minus the beads and magical dust.

Black Panther VFX, courtesy of Perception

Outside virtual, augmented, and mixed reality headsets, another big milestone for the adoption of holograms was the rise in popularity of projection-based holograms. While the topic of projected holograms is juicy enough for a post of its own (comment if you’re interested in reading one), I’d like to briefly mention a few. Hatsune Miku, a virtual Japanese pop star, was one of the first anime-inspired characters to serve as the persona of a lead singer, with a hologram of “her” performing on stage.

A live-performance hologram of Miku

Following Miku’s on-stage holograms, which started more than eight years ago, we’ve seen a rise in popular acts incorporating similar techniques. From Tupac’s hologram performance at Coachella to the Gorillaz concerts entirely dominated by the animated-character holograms on stage, it’s becoming increasingly likely that you’ll go to a concert to see a performance, only to discover a hologram projected on stage.

From the content creation side, capturing human holograms is a task that usually involves one of two popular techniques:

Depth-based volumetric video. This technique is probably the most popular, thanks to its relative ease of use and computational simplicity: with a camera capable of capturing depth information, much of the heavy lifting is done by the camera itself. Such devices are also relatively affordable and easy to purchase, making them very (hacker) friendly. A notable tool for such work is Scatter DepthKit, which provides a simple-to-use interface for capturing and rendering volumetric video using depth cameras and a digital SLR camera.
Videogrammetry. This technique uses multiple regular cameras looking at the same object and runs the videos through a SIFT (scale-invariant feature transform) algorithm to reconstruct a 3D point cloud by matching corresponding features across multiple views. To achieve good results, you need to photograph the object from a high number of angles and process the data with quite a lot of computational power, but on a commercial scale this method can produce very impressive 3D captures. Notable tools include Agisoft Photoscan and RealityCapture.
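
To make the feature-matching step in videogrammetry a little more concrete, here is a minimal sketch in plain JavaScript. It is not SIFT itself, and it is not what tools like Agisoft Photoscan or RealityCapture actually run. It only illustrates the nearest-neighbor matching with a ratio test that is commonly applied to SIFT-style descriptors once they have been extracted. The function names, the 0.75 ratio, and the toy 3-dimensional descriptors are all assumptions made for illustration.

```javascript
// Squared Euclidean distance between two equal-length descriptor vectors.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// For each descriptor in viewA, find its nearest neighbor in viewB and keep
// the pair only if it is clearly better than the second-best candidate
// (a ratio test). Returns an array of [indexInA, indexInB] pairs.
function matchFeatures(viewA, viewB, ratio = 0.75) {
  const matches = [];
  viewA.forEach((descA, i) => {
    let best = Infinity;
    let second = Infinity;
    let bestIndex = -1;
    viewB.forEach((descB, j) => {
      const dist = squaredDistance(descA, descB);
      if (dist < best) {
        second = best;
        best = dist;
        bestIndex = j;
      } else if (dist < second) {
        second = dist;
      }
    });
    // Distances are squared, so the ratio threshold is squared as well.
    if (bestIndex !== -1 && best < ratio * ratio * second) {
      matches.push([i, bestIndex]);
    }
  });
  return matches;
}

// Toy descriptors for two hypothetical camera views
// (real SIFT descriptors have 128 dimensions).
const cameraOne = [[1, 0, 0], [0, 1, 0], [0.5, 0.5, 0]];
const cameraTwo = [[0, 0.9, 0.1], [0.9, 0.1, 0], [0, 0, 1]];
console.log(matchFeatures(cameraOne, cameraTwo));
```

In this toy data the first two features find clear partners in the second view, while the third sits almost equally close to two candidates and is dropped as ambiguous. A real pipeline would then triangulate the matched features into 3D points, which is where most of the computational cost comes from.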

A volumetric capture using videogrammetry from Pacific Northwest Ballet recorded at Microsoft’s volumetric studio

While these techniques continue to evolve, machine learning is also becoming an important player in the field of creating holograms. Research is continuously being published around the idea of inferring depth from color images and reconstructing 3D models, which could give us holograms from any 2D video. By using a type of machine learning architecture known as a convolutional neural network, an algorithm is able to learn the correlation between pairs of images (color and depth) and, upon successful training, predict the depth by itself from any 2D image or video. If you are interested in the math behind the idea, you can read more about it here.

For example, researchers at the University of Washington trained such a convolutional neural network to reconstruct 3D holograms of soccer players from any 2D video of a soccer match. Check out this talk by one of the researchers if you are interested in learning more.

Reconstructing a soccer match in 3D using machine learning

All these examples and techniques share an underlying idea of presence, and more specifically, of human presence. And while I’ve been exploring these themes for quite some time now, the idea of live presence, of knowing that something represented in virtual 3D space is actually happening right now somewhere else, always seemed to be where this is heading. Let’s call it telepresence.

Live streaming your digital self using Vimeo Live

To live stream 3D information, we must capture and store that 3D information. At first glance, it looks like a difficult problem, but the solution turns out to be quite simple. If you convert depth data into color, what you end up with is a representation that looks similar to thermal imaging. Since it’s just color, we can encode that into a video format. If you combine the color data with the original RGB data, you end up with a video that looks like the GIF below.

A GIF of Casey’s depth and color video being live streamed from the Volumetric Filmmaking Meetup
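
To illustrate the depth-to-color idea, here is a minimal JavaScript sketch that maps a depth value to a hue and back again. The exact color mapping used by the Vimeo tools isn't spelled out in this post, so the hue ramp from red (near) to blue (far), the function names, and the near and far distances are all assumptions.

```javascript
// Fraction of the hue wheel used for depth (red through blue). Stopping short
// of a full turn avoids near and far both mapping to red.
const HUE_RANGE = 2 / 3;

// Convert a hue in [0, 1) at full saturation and brightness to 8-bit RGB.
function hueToRgb(hue) {
  const h = hue * 6;
  const sector = Math.floor(h);
  const f = h - sector;
  const up = Math.round(f * 255);
  const down = Math.round((1 - f) * 255);
  switch (sector % 6) {
    case 0: return [255, up, 0];
    case 1: return [down, 255, 0];
    case 2: return [0, 255, up];
    case 3: return [0, down, 255];
    case 4: return [up, 0, 255];
    default: return [255, 0, down];
  }
}

// Recover the hue in [0, 1) from an RGB triplet.
function rgbToHue([r, g, b]) {
  const max = Math.max(r, g, b);
  const delta = max - Math.min(r, g, b);
  if (delta === 0) return 0;
  let h;
  if (max === r) h = ((g - b) / delta) % 6;
  else if (max === g) h = (b - r) / delta + 2;
  else h = (r - g) / delta + 4;
  if (h < 0) h += 6;
  return h / 6;
}

// Encode a depth (in meters) as a color; out-of-range depths become black.
function encodeDepth(depth, near, far) {
  if (!(depth >= near && depth <= far)) return [0, 0, 0];
  return hueToRgb(((depth - near) / (far - near)) * HUE_RANGE);
}

// Decode a color back to depth; dark pixels are treated as "no data".
// The brightness threshold leaves some slack for video compression artifacts.
function decodeDepth(rgb, near, far) {
  if (Math.max(...rgb) < 128) return null;
  const t = Math.min(1, rgbToHue(rgb) / HUE_RANGE);
  return near + t * (far - near);
}

const pixel = encodeDepth(1.5, 0.5, 2.5);
console.log(pixel, decodeDepth(pixel, 0.5, 2.5));
```

Because each channel only has 8 bits and video compression smears colors, the round trip is approximate rather than exact, which is one reason the decoded depth tends to need some filtering on the receiving end.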

So how do you actually do it? First, you need a Vimeo Live membership, which we’ll be happy to help you with if you don’t already have one.

You also need a device capable of capturing depth from the real world. While there are quite a few options (such as Kinect v2, ZED, Orbbec, Intel RealSense, Structure, and even the new iPhone X TrueDepth sensor), our goal was to choose one that is cheap, accessible, and reliable across platforms, so we picked Intel’s new RealSense D415. It costs about $100, it’s USB-powered, and it works on all platforms. Unfortunately, it isn’t perfect. The depth data that you get contains a lot of noise, but we were able to build an application to capture, filter, and prepare the volumetric video for streaming to Vimeo.

The result is the open-source Vimeo Depth Viewer, currently available as a prebuilt macOS application. (If you’re interested in diving into the source code yourself, you could build it for Windows and Linux as well.) The app features a full GUI for interacting easily with the RealSense depth camera.

Connect your camera, start the application, and click the play button. If everything is configured correctly, you’ll see the stream coming in from the camera. To isolate the person you’re live streaming from the background, adjust the distance at which the camera clips by changing the value in the Clipping panel.

Next, open the full resolution monitor. This is the window that you live stream from.

A demonstration of the Vimeo Depth Viewer
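
If you're curious what clipping and noise filtering might look like under the hood, here is a small JavaScript sketch. It is not the Vimeo Depth Viewer's actual code: the post doesn't say which filters the app applies, so the temporal smoothing below is just one common approach, and the function names, the blend factor, and the convention that a depth of 0 means "no reading" are assumptions.

```javascript
// Zero out every depth sample outside the [nearClip, farClip] range so that
// only the subject in front of the camera survives.
function clipDepthFrame(depthFrame, nearClip, farClip) {
  return depthFrame.map((d) => (d >= nearClip && d <= farClip ? d : 0));
}

// Blend each new depth frame into the previous smoothed frame, pixel by pixel.
// A depth of 0 is treated as "no reading": a missing new sample keeps the old
// value, and a missing old sample takes the new value as-is.
function smoothDepthFrame(previous, current, alpha = 0.4) {
  const result = new Float32Array(current.length);
  for (let i = 0; i < current.length; i++) {
    if (current[i] === 0) result[i] = previous[i];
    else if (previous[i] === 0) result[i] = current[i];
    else result[i] = alpha * current[i] + (1 - alpha) * previous[i];
  }
  return result;
}

// Two tiny hypothetical frames, with depths in meters.
const lastFrame = clipDepthFrame([0.3, 1.0, 1.4, 4.0], 0.5, 2.0);
const newFrame = clipDepthFrame([0.3, 1.2, 0, 4.1], 0.5, 2.0);
console.log(smoothDepthFrame(lastFrame, newFrame));
```

A lower blend factor gives a steadier image at the cost of more lag behind fast movement, so the right value depends on how much the presenter moves.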

We didn’t want to spend any time building our own live encoder into this app, so we decided to use OBS, or Open Broadcaster Software. OBS is a free and open-source application that enables you to broadcast a live stream from your computer.

Through OBS, you can capture the full resolution monitor from the Vimeo Depth Viewer and stream it to Vimeo Live.

Next up, create a live event in Vimeo Live and grab the stream key. Head over to Vimeo, create a live event, give it an event title, and click Next.

While logged-in, you’ll see the Create live event button on the left-hand side

A message appears, stating that your event hasn’t started yet. Click Next at the bottom, and then choose Show RTMP URL & Stream key to get a screen like this:

A preview of the RTMP URL & Stream key on Vimeo.com

Now, copy the stream key and paste it into the OBS encoder, like this: open OBS, go to the settings, and select Stream. Set the service to Vimeo and paste in the stream key you copied from us.

The OBS settings screen

Open the Vimeo Depth Viewer, open the monitor window, and set OBS to stream the monitor window. If you’re not sure how to target a specific window in OBS, refer to their handy documentation.

Assuming that everything went well, you’re now, at this moment, live streaming volumetric video to Vimeo Live. Look at you go! Congratulations!

A live stream of Casey Pugh using the Vimeo Depth Viewer and Vimeo Live

That’s exciting, but it’s still only half of the story. Now you need to build a web application that can ingest the live stream and render it into a 3D hologram. To make that easy, we built an open-source WebVR player called Vimeo Depth Player. You can find the code needed to run the 3D web renderer on GitHub, and, to ease the process, we created Glitch examples for you to remix and call your own. Go to our (now archived) live stream demo, open the options, and click Remix on Glitch to create your own copy of the example.

The first step is to generate your own Vimeo token to identify your account when you request a video stream from the Vimeo API. Generate the token, and then copy it and paste it into the .env file in the example that you remixed above.

Preview of the .env file in your Glitch project
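
For a sense of how a token like this gets used, here is a rough JavaScript sketch that assembles an authenticated request for a single video from the Vimeo API. It is not the code inside the Glitch example. The environment variable name VIMEO_TOKEN and the helper name are hypothetical, so use whatever key your remixed .env file actually defines.

```javascript
// Build the URL and headers for an authenticated Vimeo API request
// that looks up one video by its ID.
function buildVideoRequest(videoId, token) {
  if (!token) throw new Error('Missing Vimeo access token');
  return {
    url: `https://api.vimeo.com/videos/${encodeURIComponent(videoId)}`,
    headers: {
      Authorization: `bearer ${token}`,
      Accept: 'application/vnd.vimeo.*+json;version=3.4',
    },
  };
}

// VIMEO_TOKEN is a hypothetical variable name used only for this sketch.
const accessToken = process.env.VIMEO_TOKEN || 'token_goes_here';
const videoRequest = buildVideoRequest('id_goes_here', accessToken);
console.log(videoRequest.url);
```

Keeping the token in .env on the server side, rather than in client.js, means it never ships to the browser. The returned url and headers can be passed to whatever HTTP client you prefer.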

Next, open the public/client.js file from the Glitch file browser, and replace the placeholder on line 43 with the Vimeo video ID of your live feed. You can find this value in the URL for any Vimeo video.

Paste the ID into Glitch at line 43 in the part that says new Vimeo.DepthPlayer('id_goes_here').

Preview of the public/client.js file in the Glitch editor
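
If you'd rather not pick the ID out of the URL by hand, a small helper along these lines can do it. This is a sketch and not part of the Vimeo Depth Player: the function name is hypothetical, the ID 123456789 is a made-up placeholder, and the helper assumes the ID is the first all-digit segment of the URL path.

```javascript
// Pull the numeric ID out of a Vimeo video URL, or return null if the URL
// is not on vimeo.com or has no all-digit path segment.
// Note: new URL() throws if the input is not a valid absolute URL.
function extractVimeoId(videoUrl) {
  const { hostname, pathname } = new URL(videoUrl);
  if (hostname !== 'vimeo.com' && !hostname.endsWith('.vimeo.com')) return null;
  const match = pathname.match(/\/(\d+)(?:\/|$)/);
  return match ? match[1] : null;
}

// 123456789 is a placeholder ID, not a real video.
console.log(extractVimeoId('https://vimeo.com/123456789'));
```

The returned string is what goes in place of 'id_goes_here' in the new Vimeo.DepthPlayer(...) call.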

As soon as you paste your ID, Glitch will automatically update your website. You are now live streaming volumetric video over Vimeo Live into any browser or VR headset. And, you know, making history.

A volumetric live stream of Casey
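
On the playback side, turning a depth image back into a 3D shape comes down to pushing each pixel out into space by its depth. The JavaScript sketch below shows that step with a simple pinhole camera model. It is an illustration rather than the Vimeo Depth Player's actual rendering code, and the intrinsics (focal lengths and image center) are placeholder numbers, not calibration values for any particular camera.

```javascript
// Placeholder pinhole intrinsics for a 640x480 depth image. Real values come
// from the camera's calibration; these numbers are illustrative only.
const exampleIntrinsics = { fx: 600, fy: 600, cx: 320, cy: 240 };

// Back-project pixel (u, v) with depth z (in meters) to a 3D point
// in the camera's coordinate space.
function unproject(u, v, z, { fx, fy, cx, cy }) {
  return { x: ((u - cx) * z) / fx, y: ((v - cy) * z) / fy, z };
}

// Turn a row-major depth frame into a list of 3D points,
// skipping pixels with no depth (0).
function depthFrameToPoints(depthFrame, width, camera) {
  const points = [];
  depthFrame.forEach((z, i) => {
    if (z > 0) {
      points.push(unproject(i % width, Math.floor(i / width), z, camera));
    }
  });
  return points;
}

// A pixel at the image center, two meters away, lands on the camera's axis.
console.log(unproject(320, 240, 2, exampleIntrinsics));
```

In a browser, the resulting points would typically be handed to a WebGL renderer and colored with the matching RGB pixels from the other half of the video frame.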

So now that volumetric live streaming is possible, what can you do with it? A telepresence virtual reality experience? An augmented reality concert? A mixed reality news broadcast?

Who is this made for?

Well, everyone. We’re aware that costs are involved to acquire a depth camera and a Vimeo Live account, so the technology isn’t free. But volumetric video isn’t out of reach for many of those for whom it holds the most exciting possibilities: creators, artists, broadcasters, musicians, and many more who will invent the next-generation use cases to push video beyond its 2D status quo. Since this is a nascent field, by sharing our experiments, code, and the things that we learn, we can actually understand more about what people find interesting or appealing about using volumetric video in their creative projects.

I also learned that, while this field is maturing, it lacks creative, creator-facing tools that would make it easier for people to adopt these technologies for their own films and interactive experiences. In short, it lacks a sense of accessibility and inclusion. This experiment is a part of our ongoing work to make volumetric video more accessible, which you can learn more about in this talk from the Volumetric Filmmaking meetup by Casey Pugh, head of Creator Labs.

Wanna stay in the loop? Sign up for our mailing list to hear (sparingly) from three-dimensional humans about how Creator Labs is empowering the next generation of creators.

Originally published at https://vimeo.com/blog/post/how-vimeo-can-power-live-streaming-holograms.

How Vimeo can power live streaming holograms was originally published in Vimeo Engineering Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.