Can your activation ditch the green screen?
Test-driving real-time background removal without chroma key
TL;DR: While you still can’t beat green screen for image quality and fidelity, computer vision algorithms or a high-quality depth camera could give you creative alternatives for real-time background removal. (Scroll down for a short video showing tests of both approaches.)
ALSO: Consider instead bringing a themed background to life with a combination of AR and localized, tightly-focused background removal for a unique experience and takeaway. More thoughts on this near the end of the post.
We’re surrounded. Experiential technologists and marketers are relying on green screen more than ever, from capturing mixed reality VR to creating social marketing via takeaway photos, gifs, and videos.
It’s a double-edged ubiquity, for now. That bold green may be getting played out, but it’s still effective at announcing itself: “Hey, you there! You’ve heard about [mixed reality/VR/social video/animated gifs], right? Well, here I am!”
So, yes, green screen has a voice. But consider the platform you’re giving him. You’re handing out a sizable chunk of visual real estate to a “we’re VR/MR/social” message that says nothing specific about you or your client’s brand.
Another thing you can hand to green screen: he’s very democratic. Whether he’s appearing at the five-hundred-dollar photo booth pop-up at the mall or the six-figure film studio activation at Comic-Con, he shows up looking exactly the same. (I’ll leave it to you to ruminate on whether that’s a good thing.)
Regardless of what you think of green screen’s looks (or those of his sibling, blue screen), he’s not going anywhere. The reason he’s in both the studio sound stage and the video blogger’s bedroom is that he’s the best at what he does.
But he’s not the only game in town.
Background removal via depth sensor
On Oct. 25th (not quite two weeks ago as I write this), Fast Company reported the end-of-life for Microsoft’s once-groundbreaking depth camera product, the Kinect.
The depth camera that talked to your Xbox and later became popular among creative technologists had a big following. As the above article states, the Kinect has been used “for everything from experimental art, to creating next generation UI prototypes.”
The interesting thing is that the Kinect — or at least its depth-sensing technology — isn’t dead at all. Microsoft has revealed that their current HoloLens AR headset uses Kinect v4 (soon to be v5)… and just this week marked a big product launch for Apple: the iPhone X.
As has been reported, the famous “notch” in the iPhone X is a front-facing depth camera. In fact, the company that pioneered using the grid of little IR dots in the Kinect was bought by Apple in 2013, and that technology has been repurposed in the new phone. The iPhone X’s “notch” is basically a Kinect.
While Apple is largely touting the sensor’s ability to help it scan faces, the depth information gathered goes beyond that found in the simple two-camera trick used for the depth planes in Apple’s 2016 Portrait Mode, and I suspect we’ll see a number of apps that play with background separation on the iPhone going forward.
For our purposes, in a manner similar to the way that Portrait mode blurs a background while keeping the foreground subject in focus, we can use depth information to lift a subject completely out of their background.
Depth sensor test with a Kinect v2
The depth sensor and the color camera on the Kinect don’t provide information in a corresponding, one-to-one fashion. The depth feed is lower resolution than the color picture and has a different field of view, so you can’t simply lay the depth data over the color data as a mask.
Instead, each depth “pixel” is mapped to a corresponding location on the color feed, and you read the color from there. So even though the color feed has a higher resolution, you’re using the depth feed to “draw” the scene and the color feed only to color it in, which leaves you stuck with the lower resolution of the depth feed.
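To make the mapping idea concrete, here is a minimal sketch in Python with NumPy. It is not the code I ran for the demo. It assumes your SDK has already handed you a per-depth-pixel lookup into the color frame (the `depth_to_color` array below is a hypothetical stand-in for that), and the `near_mm` / `far_mm` distances are made-up values you would tune to your space.

```python
import numpy as np

def depth_cutout(depth_mm, color, depth_to_color, near_mm=500, far_mm=1500):
    """Build an RGBA cutout at depth-feed resolution.

    depth_mm:       (H, W) distance per depth pixel, in millimeters
    color:          (Hc, Wc, 3) color frame, usually larger than the depth frame
    depth_to_color: (H, W, 2) integer (row, col) into the color frame
                    for each depth pixel (hypothetical: supplied by your SDK)
    """
    h, w = depth_mm.shape
    rows = depth_to_color[..., 0]
    cols = depth_to_color[..., 1]

    # Some depth pixels map outside the color frame because the fields of view differ.
    in_frame = (rows >= 0) & (rows < color.shape[0]) & \
               (cols >= 0) & (cols < color.shape[1])
    # Keep only what sits between the near and far distances (the subject).
    in_range = (depth_mm >= near_mm) & (depth_mm <= far_mm)
    keep = in_frame & in_range

    # The output is drawn at depth resolution; color is only looked up per depth pixel.
    out = np.zeros((h, w, 4), dtype=np.uint8)
    out[keep, :3] = color[rows[keep], cols[keep]]
    out[keep, 3] = 255  # opaque where we kept the subject, transparent elsewhere
    return out
```

Note that the output array has the shape of the depth frame, not the color frame, which is exactly the resolution penalty described above.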
Additionally, the color camera and the depth sensor have fixed lenses. You can’t zoom in. For the demo video (see below), I chose to capture hand puppets, and to frame them acceptably I had to “digitally zoom” the scene, further lowering the effective resolution. (If I were capturing people, this would be less of an issue, but the inability to zoom in/out is definitely a flexibility hit, given the various footprints and floor plans encountered in real world projects.)
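A quick bit of arithmetic shows how fast a digital zoom eats into the picture. The Kinect v2 depth frame is 512 × 424 pixels. The 2× zoom below is only an illustrative assumption, not the factor I used for the puppets.

```python
DEPTH_W, DEPTH_H = 512, 424  # Kinect v2 depth frame size in pixels

def effective_resolution(width, height, zoom):
    """Pixels left after cropping to the center region for a digital zoom."""
    return int(width / zoom), int(height / zoom)

# Hypothetical 2x digital zoom: only a quarter of the depth pixels remain.
print(effective_resolution(DEPTH_W, DEPTH_H, 2.0))  # (256, 212)
```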
The Kinect v2, introduced in 2013, should be ancient by normal tech standards. That said, to my eye, the “modern-day” market alternatives don’t appear to be stupefyingly more advanced (although they are much smaller, which is nice). Check out Stimulant’s excellent roundup of depth sensors, which was updated just this summer.
While I confess I haven’t bought any to try out, the alternatives that caught my eye are Occipital’s Structure Sensor, Orbbec’s Astra 3D camera, and Stereolabs’ ZED Stereo Camera. If you have thoughts about the best alternative, I’d love to hear about it!
Background removal via OpenCV (Open Source Computer Vision)
Using code to analyze an image and identify background and foreground objects is potentially a very deep hole to get lost in. There are people training ML systems to do the work, scores of companies building custom software to help driverless cars “see” the road... You can see Nvidia’s recent work on artificially intelligent systems intended as green screen killers here.
I went the simple route and chose to use OpenCV, the popular, GPU-accelerated computer vision code library, and one of its two built-in background removal algorithms, MOG2. (Even after choosing OpenCV, I could conceivably have spent plenty more time noodling: the BGSLibrary on GitHub has 43(!) different OpenCV-compatible background subtraction algorithms I could have tried out.)
Although the algorithm is more sophisticated, it’s easier for me to think of its application as a sort of difference matte. First, I let the algorithm create a model of the empty backdrop. When I’m ready, I tell the algorithm to stop learning, and now I’ve effectively created a background. Going forward, anything that doesn’t “match” the background must be foreground.
Of course, this means that, as with green screen, you’re still subject to situations where a person who color-matches the backdrop could fail to be recognized (like if the weatherman wears a green shirt).
That means you do need to put some thought into the colors you use. I recommend going with either bold colors unlikely to match skin or clothing, or a color similar to the background you’ll be replacing it with. And, just as in green screen chroma keying, lighting is key, as shadows cast on the background can fool the algorithm.
The great thing here is that we can use any camera we want. I used Éric Renaud-Houde’s Blackmagic DeckLink SDI Cinder block to bring in a high quality HD feed from a cheap SDI camera that gave me control of zoom, exposure, and color.
For quick and dirty playing around, the results are promising. The hot pink background I used (a children’s blanket I picked up from a sale rack at Target) has high visibility so you can see exactly where our edges are hitting or missing the mark.
In addition to the low resolution, the Kinect depth sensor has a bit of lag between the color image and depth data, so in periods of fast movement the cutout runs ahead of the subject. I’m hoping this may not be an issue with other modern depth cameras on the market.
The OpenCV version has visible background around the puppet’s furry edges, because my implementation doesn’t do any feathering like advanced green screen chroma key algorithms do. Still, I could see this level of performance working for a number of situations (and again, if the hot pink blanket were going to be replaced by a hot pink digital background, that edge would become much less noticeable).
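For anyone who wants to try softening that edge, here is one simple way feathering could be added: blur the hard mask into a soft alpha, then blend. This is a sketch I have not run in the demo, with a plain box blur standing in for whatever a production keyer does, and `radius` is an arbitrary assumption.

```python
import numpy as np

def feather(mask, radius=2):
    """Soften a hard 0/255 mask into a 0..1 alpha using a simple box blur."""
    alpha = mask.astype(np.float32) / 255.0
    size = 2 * radius + 1
    padded = np.pad(alpha, radius, mode="edge")
    h, w = alpha.shape
    blurred = np.zeros_like(alpha)
    # Sum every shifted copy of the mask inside the (size x size) window.
    for dy in range(size):
        for dx in range(size):
            blurred += padded[dy:dy + h, dx:dx + w]
    return blurred / (size * size)

def composite(foreground, background, alpha):
    """Blend foreground over background using a per-pixel alpha in 0..1."""
    a = alpha[..., None]  # add a channel axis so it broadcasts over color
    out = foreground.astype(np.float32) * a + background.astype(np.float32) * (1.0 - a)
    return out.round().astype(np.uint8)
```

A softer alpha hides the stair-stepped edge, but it will not remove backdrop color that has already bled into the fur, so the choice of backdrop color still matters.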
If high fidelity is a sticking point for the project, I believe using OpenCV and chroma key together could give very high-quality results:
Finally, during the course of running these tests, I became enamored of the idea of using the OpenCV approach to bring dramatic backgrounds to life. For example, consider this image from San Diego Comic-Con 2016:
Imagine now that visitors who stand in front of the backdrop walk away with a social video that shows the giant eyes in the picture blinking and turning around on their stalks to look out at the camera or the guest!
Bringing a themed background to life with localized, tightly-focused background removal could play to the strengths of the algorithm for a high-quality end result as well as keeping the activation engaging both before and after the photo, gif, or video is taken.
Christopher Lepkowski is an interactive software consultant specializing in custom experiences for entertainment and advertising. He is the owner and director of Planimal Interactive in Los Angeles.