Experimenting with the Kinect for Mixed Reality

David Wallin
Published in archergroup
Jan 30, 2017

What is mixed reality?

Mixed reality arose as a way to better present VR games and applications. It’s not sufficient to simply record what the user sees through the headset, since that video is typically shaky, and you can’t see what the user is doing with their hands or their body. Instead, mixed reality superimposes video of the user from the real world over the virtual reality content, so that it looks like they are inside the virtual scene. For a great explanation and more information, see this blog post by Owlchemy Labs (makers of the wonderful Job Simulator).

Conan O’Brien also did a hilarious trip into VR on his show (using Mixed Reality).

What is the purpose of this experiment?

A typical mixed reality video involves a lot of setup. With a single fixed camera, care must be taken to find its exact position and orientation. The setup also requires a green screen to remove the background behind the user, and the camera must not be moved once it has been calibrated. Once recorded, three streams of video need to be edited together: the background, the green-screen-keyed user video, and the foreground. Needless to say, this means a reasonable investment in cameras and green screens, plus the extra time needed to edit the video. And there’s no way to see what the final output will look like until after you’ve mixed the various video streams together.

We decided to see what could be done with an off-the-shelf Kinect V2 camera (the kind that shipped with the Xbox One when it was first released). Support for the Kinect has waned in recent years and they can be picked up quite cheaply on eBay; we got ours for around $100. The only other piece of hardware we used in this project was a camera stabilizer that cost around $20.

The advantage of the Kinect is that it produces depth information, meaning we can calculate the 3D position of every pixel the camera records. This lets us render a 3D representation of the user in real time in a game engine. Having the 3D information allows us to remove the real-world background behind the user without a green screen, and to render the person at the correct depth, so that virtual objects appear in front of or behind them without faking it by compositing multiple layers at the end of the process. It also gives us a real-time preview of what the final scene looks like.
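As a rough illustration of how the depth data becomes geometry, here is a minimal Python sketch that back-projects a Kinect depth image into camera-space points using a simple pinhole model. The intrinsic values are approximate placeholders rather than the calibration of any particular unit (in practice the Kinect SDK’s coordinate mapper does this conversion for you):

```python
import numpy as np

# Approximate Kinect V2 depth-camera intrinsics (illustrative values,
# not the calibration of any particular unit).
FX, FY = 365.0, 365.0      # focal lengths in pixels
CX, CY = 256.0, 212.0      # principal point of the 512x424 depth image

def depth_to_points(depth_mm):
    """Back-project a 512x424 depth image (millimetres) into camera-space
    points in metres using a simple pinhole model."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))    # pixel coordinates
    z = depth_mm.astype(np.float32) / 1000.0          # mm -> metres
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.dstack((x, y, z))                       # shape (424, 512, 3)
```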

The results look something like this:

While not perfect, this is a pretty good result for the small amount of money we spent on hardware. Future improvements would be lowering the latency, lighting the user with the 3D lighting of the virtual world, and applying filters to remove artifacts. Owlchemy Labs has been pursuing similar techniques using more expensive cameras, with very impressive results.

The rig

Our camera rig looks like this:

Our mixed reality rig. Rubber bands not included.

This is a standard handheld camera mount. By mounting the Kinect together with a tracked controller, we can get the position of the camera and do moving shots rather than just stationary ones. Keeping the two in a fixed configuration also aids calibration.
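For illustration, here is a small sketch of how the camera pose could be derived each frame from the tracked controller, assuming a fixed mounting offset measured once during calibration. The offset numbers below are placeholders; the real values depend on the physical rig:

```python
import numpy as np

# Fixed transform from the tracked controller to the Kinect lens, found
# once by calibration. These numbers are made-up placeholders.
CONTROLLER_TO_KINECT = np.array([
    [1, 0, 0,  0.00],
    [0, 1, 0,  0.08],   # e.g. lens sits 8 cm above the controller
    [0, 0, 1, -0.05],   # and 5 cm behind it
    [0, 0, 0,  1.00],
], dtype=np.float32)

def kinect_camera_pose(controller_pose):
    """Given the controller's tracked pose as a 4x4 world-space matrix,
    return the world-space pose of the Kinect camera."""
    return controller_pose @ CONTROLLER_TO_KINECT
```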

Improvements: Latency

One big issue with the above footage is that the ‘reality’ portion of the video lags a bit behind the virtual scene. This is because the Kinect runs at a lower framerate and also adds a considerable amount of latency. We could fix this by manually editing the video afterwards to sync the streams, but that would require rendering two versions of the video and lining them up in another program. We could also buy a more expensive camera with a higher frame rate and less latency. In keeping with the original goal of a cheap, real-time solution, let’s explore another option: delaying the virtual content to match the latency introduced by the Kinect.

Delaying everything in the scene would be complex and probably out of scope for our little library, though it might be possible with some kind of instant-replay or demo-recording system. A simpler approach is to smooth the position of the ‘reality’ camera using linear interpolation. This corrects some of the apparent latency when moving the camera and also removes some of the camera shake. In addition, since the smoothing runs at 90 fps, it gives the impression that the Kinect video has a higher framerate than it actually does.
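A minimal sketch of that smoothing, assuming the camera position is updated once per 90 Hz frame (the smoothing constant is just a tuning value, and rotation would typically be smoothed the same way with a spherical interpolation):

```python
import numpy as np

SMOOTHING = 0.1   # fraction of the remaining distance covered each frame (tuning value)

class SmoothedCamera:
    """Moves the mixed-reality camera a fixed fraction of the way toward the
    latest tracked position every frame, hiding some Kinect latency and jitter."""

    def __init__(self, position):
        self.position = np.asarray(position, dtype=np.float32)

    def update(self, target_position):
        target = np.asarray(target_position, dtype=np.float32)
        # Linear interpolation toward the target; applied every frame this
        # behaves like an exponential smoothing filter.
        self.position += (target - self.position) * SMOOTHING
        return self.position
```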

Notice that the video on the right is much less jumpy and seems more anchored.

The same trick can also be applied to objects the user may be interacting with. This video shows the real position of the controller vs. a cube with smoothed tracking applied to it. Notice that the cube tracks much more closely to the hand.

Smoothing object positions so they match the Kinect video

This trick won’t be appropriate for every game or experience, especially ones with fast action, but it might be sufficient for some cases.

Improvements: Better background and floor removal

Currently we use the positions of the headset, the controllers and the floor to decide which parts of the Kinect video to draw and which belong to the background. We also do some crude depth checking to remove stray pixels around the edge of the person. The edge removal would benefit from standard 2D image-processing techniques, and we could investigate a better approximation of the person than a simple bounding box.
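As a sketch of the idea rather than our exact implementation, the mask could keep only Kinect points that sit above the floor and inside a padded box around the tracked devices. The margin and floor height here are placeholder values, and a y-up world is assumed:

```python
import numpy as np

FLOOR_Y = 0.0    # world-space height of the floor plane (from calibration)
MARGIN = 0.6     # metres of padding around the tracked devices (tuning value)

def player_mask(points_world, device_positions):
    """Return a boolean (H, W) mask of Kinect points likely to belong to the
    player: above the floor and inside an axis-aligned box padded around the
    headset/controller positions (y axis assumed up)."""
    devices = np.asarray(device_positions, dtype=np.float32)
    lo = devices.min(axis=0) - MARGIN
    hi = devices.max(axis=0) + MARGIN
    lo[1] = max(lo[1], FLOOR_Y + 0.02)   # clip just above the floor

    inside = np.all((points_world >= lo) & (points_world <= hi), axis=-1)
    return inside
```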

Improvements: Lighting

In theory we could light the mixed reality view with the lighting from the VR scene, to better make the person look as though they are in the virtual environment. This would require computing surface normals from the Kinect depth information and modifying the shader that renders the ‘reality’ video stream (which currently does no lighting). A normal is a vector that expresses the direction a 3D surface is facing, and it is a necessary ingredient in most lighting calculations.
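In practice this would live in the shader, but the math is easy to sketch on the CPU: cross the horizontal and vertical gradients of the reconstructed points to get a per-pixel normal. This is only an illustration of the technique, not our shader code:

```python
import numpy as np

def normals_from_points(points):
    """Estimate per-pixel surface normals from an (H, W, 3) array of
    camera-space points by crossing the horizontal and vertical gradients."""
    dx = np.gradient(points, axis=1)      # change between neighbouring columns
    dy = np.gradient(points, axis=0)      # change between neighbouring rows
    n = np.cross(dx, dy)
    length = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(length, 1e-6, None)
```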

Here’s some early work in that direction:

Conclusion

If you have a Kinect V2 lying around and want to try this out for yourself, you can check out our project. The code is available on GitHub for any aspiring VR developers who want to start building their own mixed reality projects.


I work as a researcher and developer at The Archer Group (http://www.archer-group.com) and build games and music apps in my spare time.