Image blending with Mask R-CNN and OpenCV

HiuKim Yuen
Jul 21, 2018 · 4 min read

We were recently consulted by a potential client about an interesting image blending project. In short, the idea is to let end users take photos of themselves and blend them into historical photos.


Normally, this would be a pretty straightforward task if users’ photos were taken in a controlled environment. For example, with a green screen background we can easily extract all the pixels belonging to a person, and then merge them into the target photo with some kind of blending algorithm. However, the challenge of this project is that the photos are taken by end users, possibly in any environment with any background. Immediately, this becomes a much more challenging problem.

Image Segmentation with Mask R-CNN

After some brainstorming, we ended up trying Mask R-CNN¹, a deep learning image segmentation technique, to extract persons from the images. There are many well-written libraries with pre-trained models that we can use directly; we are using this one: https://github.com/matterport/Mask_RCNN


With a couple of small modifications to the provided sample script, we can easily capture the person. The next step is to extract the relevant pixels and merge them into the target historical photos.
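The detection step follows the repo’s demo notebook fairly closely. Below is a minimal sketch (the weight file path, image path, and log directory are placeholders, not our actual setup) that loads the pre-trained COCO weights and collapses all detected “person” instances into a single boolean mask:

```python
import numpy as np
import skimage.io
import mrcnn.model as modellib
from mrcnn.config import Config

class InferenceConfig(Config):
    # Run inference on one image at a time with the 80 COCO classes + background
    NAME = "coco_person"
    NUM_CLASSES = 1 + 80
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

# The COCO weights (mask_rcnn_coco.h5) can be downloaded from the repo's releases page
model = modellib.MaskRCNN(mode="inference", model_dir="logs", config=InferenceConfig())
model.load_weights("mask_rcnn_coco.h5", by_name=True)

image = skimage.io.imread("user_photo.jpg")
r = model.detect([image], verbose=0)[0]

# r['masks'] is a boolean array of shape (H, W, N), one channel per detected instance.
# Class id 1 is "person" in the COCO label set, so keep only person instances
# and merge them into a single mask.
person_idx = np.where(r['class_ids'] == 1)[0]
person_mask = np.any(r['masks'][:, :, person_idx], axis=2)
```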

Image blending with OpenCV

OpenCV is a very mature library and contains many out-of-the-box image processing algorithms. For the purpose of this project, we use the SeamlessClone API². To use it, we first need to define a mask that covers the source image. In other words, we need to use the result of Mask R-CNN to create a mask. Technically, a mask is an image of black and white pixels, where white pixels indicate the regions to merge into the target and black pixels are ignored.
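Going from the boolean Mask R-CNN output to a blended result is only a few lines. In this sketch, `source` is the user photo, `target` is the historical photo, `person_mask` is the boolean mask from the previous step, and the blend centre is a placeholder to adjust per photo:

```python
import cv2
import numpy as np

# Convert the boolean mask into the 8-bit mask OpenCV expects:
# white (255) where the person should be blended in, black (0) elsewhere.
mask = person_mask.astype(np.uint8) * 255

# Where the centre of the source image should land in the target photo
# (placeholder: centre of the target).
center = (target.shape[1] // 2, target.shape[0] // 2)

# NORMAL_CLONE blends the masked region of `source` into `target`.
blended = cv2.seamlessClone(source, target, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("blended.jpg", blended)
```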


The image below shows our first attempt.


There are two major problems with the above result. First, the merged person looks too “ghosty”; the edges are too soft. Second, the color tone obviously doesn’t match.

Regarding the first problem, if we look at the image segmentation result above, we can see that the segmented region doesn’t really cover the whole person. One obvious problematic region is the hair, which gets partly cut off. Also, even though the body region is almost perfectly segmented, the blended result isn’t as good as expected because the algorithm blends around the mask periphery, so a tight mask causes the body edges to fade out too quickly, making it look “ghosty”. To fix this, we dilate the mask using OpenCV’s dilation API³.
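The dilation step itself is a one-liner with cv2.dilate; the kernel size and iteration count below are rough starting values to tune rather than the exact parameters we used:

```python
import cv2
import numpy as np

# Grow the mask outward so the blend boundary falls outside the person,
# which keeps the edges (hair in particular) from fading out too quickly.
kernel = np.ones((15, 15), np.uint8)
dilated_mask = cv2.dilate(mask, kernel, iterations=2)
```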


With mask dilation, the person looks much more solid. Next, we need to tune the color. Again, there are many color tuning algorithms out there that we won’t go into in this article. For this specific black-and-white historical image, it turns out that simply converting our source image to grayscale already does the trick. I’m sure there is a lot of room for improvement along the pipeline, but for now this result looks acceptable to us.
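As a sketch, the color step is just a grayscale round-trip on the source before blending; seamlessClone still expects a 3-channel image, so we convert back (reusing `target`, `dilated_mask`, and `center` from the sketches above):

```python
import cv2

# Match the black-and-white target by flattening the source's colors to grayscale,
# then converting back to 3 channels for seamlessClone.
gray = cv2.cvtColor(source, cv2.COLOR_BGR2GRAY)
source_gray = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)

blended = cv2.seamlessClone(source_gray, target, dilated_mask, center, cv2.NORMAL_CLONE)
```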


More Results

To test more images, we grabbed some random photos from the Internet (more specifically, we wanted to try images with groups of people). Below are two examples.


Conclusion

This was a very interesting project for us to try out Mask R-CNN, which is really impressive. The image segmentation result is unbelievably good. The same pipeline could potentially be used in other image processing projects, not only for extracting people.

One concern we have, though, is the processing time. The above demo was built and run on my MacBook, where the image segmentation part alone takes more than 10 seconds. (People say it could probably be done in around a second on a desktop with a single GPU, but we haven’t tried.) Our ultimate goal is to have this run on mobile devices, so that’s probably the next thing we will dive into!

[1] https://arxiv.org/abs/1703.06870

[2] https://docs.opencv.org/3.0-beta/modules/photo/doc/cloning.html

[3] https://docs.opencv.org/2.4/doc/tutorials/imgproc/erosion_dilatation/erosion_dilatation.html

