The Art of Using the Microsoft Kinect in Visual Effects Production

Here are my thoughts and experiences about capturing and rendering point cloud data from the Microsoft Kinect.

Bonfire Labs, the San Francisco studio I work for, was tasked by Lytro, a computational imaging company, to produce a launch video for their amazing new 360 VR camera: Lytro Immerge.

Full length vision video

The video was a 3-minute piece composed primarily of interviews with Lytro employees and VR industry leaders, including VRSE, Felix & Paul and WEVR.

To supplement the talking heads, we designed a concept to demonstrate the feature set of this revolutionary new camera. We shot this video insert with an Arri Alexa camera and overlaid infographics using Adobe After Effects.

One of the more complex graphics shots in the piece was to show how the Immerge used light field data to reconstruct the entire 360 scene. When I first heard the idea for this shot, I knew it was finally time to do a project with the Microsoft Kinect depth sensor. I had done a few experiments with the Kinect, but never had a chance to use it to convey a concept. There are many ways the Kinect can be used, and various apps, APIs and SDKs for working with the data. Processing was my tool of choice.

Allow me to stop here and mention that I do not consider myself a coder; I can write some code in various languages, I can understand a lot of code and rewrite code to make it do what I want… but I’m certainly not a coder.

Point Cloud Excerpt from the vision video

Processing is an open source language built on Java, with many resources on the web and in the open source community. One excellent resource for Processing information is Daniel Shiffman, a professor at ITP in New York. I have read several of his books and followed plenty of his tutorials. Shiffman has done a lot of work with the Kinect and Processing, including a library that lets the Kinect work nicely with Processing, and that's where I started. Through Processing I could access all the data I needed from the Kinect2, including the RGB data (for reference only) and the depth data. Once I had the code to access the depth image, I wrote a function that would render each of the 217,088 depth pixels as a unique point in 3D space. I could then navigate my virtual camera to make sure I had all the details I needed, and save the data as a vertex-only OBJ file. To ensure full coverage of the scene (given the 8-meter depth limit of the Kinect2), I recorded multiple point clouds from many angles on the set.
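
For anyone who wants to try something similar, here is a minimal sketch along the lines of what I describe above. It assumes Daniel Shiffman's Open Kinect for Processing library (the Kinect2 class, initDepth()/initDevice() and getRawDepth()); the intrinsics constants and the depthToPoint() helper are illustrative stand-ins, not our exact production code.

```processing
import org.openkinect.processing.*;

Kinect2 kinect2;

// Approximate Kinect v2 depth-camera intrinsics (illustrative values only)
float cx = 254.9, cy = 205.4;
float fx = 365.5, fy = 365.5;

void setup() {
  size(512, 424, P3D);
  kinect2 = new Kinect2(this);
  kinect2.initDepth();
  kinect2.initDevice();
}

void draw() {
  background(0);
  int[] depth = kinect2.getRawDepth();       // 512 x 424 = 217,088 values, in millimeters
  if (depth == null) return;
  stroke(255);
  translate(width / 2, height / 2, 50);
  // Preview every 4th pixel so the sketch stays responsive
  for (int y = 0; y < kinect2.depthHeight; y += 4) {
    for (int x = 0; x < kinect2.depthWidth; x += 4) {
      int d = depth[x + y * kinect2.depthWidth];
      if (d == 0) continue;                   // 0 means no depth reading for that pixel
      PVector p = depthToPoint(x, y, d);
      point(p.x * 100, p.y * 100, p.z * 100); // scale meters up for the on-screen preview
    }
  }
}

// Back-project one depth pixel to a 3D point (in meters) with a simple pinhole model
PVector depthToPoint(int x, int y, int rawDepth) {
  float z = rawDepth / 1000.0;
  return new PVector((x - cx) * z / fx, (y - cy) * z / fy, z);
}

// Press 's' to dump every valid pixel as a vertex-only OBJ
void keyPressed() {
  if (key != 's') return;
  int[] depth = kinect2.getRawDepth();
  PrintWriter obj = createWriter("pointcloud.obj");
  for (int y = 0; y < kinect2.depthHeight; y++) {
    for (int x = 0; x < kinect2.depthWidth; x++) {
      int d = depth[x + y * kinect2.depthWidth];
      if (d == 0) continue;
      PVector p = depthToPoint(x, y, d);
      obj.println("v " + p.x + " " + p.y + " " + p.z);
    }
  }
  obj.flush();
  obj.close();
  println("Saved pointcloud.obj");
}
```

Pressing 's' writes one "v x y z" line per valid pixel; a vertex-only OBJ like that is everything Cinema 4D needs to read the capture back in as a point cloud.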

The breakdown of our process and pipeline went like this:

  1. Shoot video footage of the scene using an Alexa camera.
  2. Use the Kinect with the Processing app I wrote to capture the depth information from multiple angles to cover the entire scene.
  3. Export all the depth data as vertex-only OBJ files to be read into Cinema 4D (see the format sample after this list).
  4. Align the OBJ files to make a point cloud of the entire environment.
  5. Matchmove the selected shot from the video using Boujou.
  6. Align the combined point cloud with the live action in Cinema 4D.
  7. Create dynamic transitions in Cinema 4D to build up the entire scene.
  8. Use Adobe After Effects to composite all the elements together and render the final shot.
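
For reference, the vertex-only OBJ files handed off between steps 3 and 4 are just plain-text lists of "v" lines, one per point; the coordinates below are made-up values for illustration:

```
# vertex-only OBJ: one "v x y z" line per valid depth pixel (meters)
v -0.412 0.158 1.973
v -0.409 0.158 1.971
v -0.406 0.157 1.968
```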

You can see from the final shot that we ended up with full depth data for the entire scene and were able to reconstruct it from any angle.

Devin Earthman, a freelance visual effects artist, completed steps 4–8 while I handled the first three. This was a pretty challenging shot for me even though it only lasted a few seconds. We only had one chance to capture that data, and while I had thoroughly tested each part of the process from 1–8, we had never combined them all in one shot. As I mentioned earlier, I am not a coder, but I must say that I was really excited to flex both my creative and my technical muscles on this project.

Below are some screenshots from our production pipeline.
Depth Images from the Kinect
Point cloud data in Cinema 4D