Background switching with Python

achang
3 min read · Jul 13, 2020


Video meeting like a boss

Alpha matting is the problem of extracting the foreground from an image. Since COVID, we have all been playing with different virtual backgrounds when having a video call or meeting on Zoom or Teams. In this tutorial, we will write a program to do this using deep learning. The task is to extract a person from a video and put them on a different background, and add some music to it because it is cool.

In movies, people film the scene against a green background so that they can easily distinguish what is the person and what is the background. Then they replace the green background with the special effects. But most people don’t have this setup, and extracting the person from an arbitrary background is a challenge.

The traditional way to add effects or a different background

Extracting a mask for a person in an image can be achieved with a segmentation network, which is a neural network trained to create masks of various objects in an image. Several datasets are used to train segmentation networks, such as COCO and OpenImages.

One recent work proposed using another neural network to improve the creation of masks (or alpha mattes). Background Matting uses a network that takes in: an image with the person, an image without the person (only the background), and the segmentation mask of the person. Check out their paper for more in-depth details.

Overview diagram of the Background Matting’s process

In our case, we don’t have the image of the background without the person. So we will just run the segmentation network and do some post-processing.

Tutorial: Alpha matting

The segmentation network used here is from DeepLab. We will use OpenCV to read an input video and a background video. OpenCV provides many tools for image processing, so you should definitely check out their tutorials.
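Here is a minimal sketch of the setup, assuming the pretrained DeepLabV3 model from torchvision and placeholder file names (person.mp4, background.mp4):

```python
import cv2
import torch
from torchvision.models.segmentation import deeplabv3_resnet101

# Pretrained DeepLabV3 (older torchvision API; newer versions use weights=...)
model = deeplabv3_resnet101(pretrained=True).eval()

cap_fg = cv2.VideoCapture("person.mp4")      # video with the person
cap_bg = cv2.VideoCapture("background.mp4")  # video for the new background

ok_fg, frame = cap_fg.read()      # one foreground frame (BGR)
ok_bg, bg_frame = cap_bg.read()   # one background frame (BGR)
```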

After reading a frame, we run inference and get the segmentation mask. We only care about the person category, so we ignore the rest. If you want, you can fine-tune the model on a dataset with only the person category. Fine-tuning means further training a model that is already trained, but on a more specific dataset. In general (but not always), this improves the accuracy of the model on that particular new dataset.
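A sketch of how one frame could be run through the model, assuming the torchvision DeepLabV3 model from above (its class list follows Pascal VOC, where "person" is index 15):

```python
import cv2
import numpy as np
import torch
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def person_mask(frame_bgr, model):
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    inp = preprocess(rgb).unsqueeze(0)             # 1 x 3 x H x W
    with torch.no_grad():
        out = model(inp)["out"][0]                 # C x H x W class scores
    labels = out.argmax(0).byte().cpu().numpy()    # per-pixel class ids
    return np.where(labels == 15, 255, 0).astype(np.uint8)  # keep only "person"
```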

Depending on the neural network, the mask won't always be perfect, so some post-processing is needed to polish it. You can use imwrite to write the alpha to an image file and take a look.

First, we use findContours. It returns the contour objects from the alpha. Given a contour object, we can easily find the contour area using contourArea and filter out small blobs in the alpha. We can also fill the contour using fillPoly to remove holes. Check out this for more details on contours. Then we use erode, dilate, and GaussianBlur, which is basically a filter that smooths the edges.
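Here is one possible clean-up pass, assuming OpenCV 4.x and the raw mask from the previous step; the area threshold and kernel sizes are illustrative values, not tuned numbers:

```python
import cv2
import numpy as np

def refine_mask(mask, min_area=500):
    # findContours returns (contours, hierarchy) in OpenCV 4.x
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    clean = np.zeros_like(mask)
    for c in contours:
        if cv2.contourArea(c) > min_area:     # drop small blobs
            cv2.fillPoly(clean, [c], 255)     # fill holes inside the contour
    kernel = np.ones((5, 5), np.uint8)
    clean = cv2.erode(clean, kernel, iterations=1)
    clean = cv2.dilate(clean, kernel, iterations=1)
    clean = cv2.GaussianBlur(clean, (7, 7), 0)    # soften the edges
    # cv2.imwrite("alpha.png", clean)             # inspect the alpha if needed
    return clean
```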

Now we just need to combine these three things (see the sketch after the list):

  • Foreground: original frame
  • Alpha: the mask for the person
  • Background: new background frame
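A simple way to blend them, assuming the background frame is resized to match the foreground:

```python
import cv2
import numpy as np

def composite(frame, bg_frame, mask):
    # Match the background to the foreground size
    bg_frame = cv2.resize(bg_frame, (frame.shape[1], frame.shape[0]))
    alpha = mask.astype(np.float32)[..., None] / 255.0   # H x W x 1 in [0, 1]
    out = (alpha * frame.astype(np.float32)
           + (1.0 - alpha) * bg_frame.astype(np.float32))
    return out.astype(np.uint8)
```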

Now that we have the frames, we just need to add some music to it. We write the frames to a video file and use moviepy to combine the video with some music.
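A sketch using the moviepy 1.x API, assuming the composited frames have already been written to out.mp4 (both file names here are placeholders):

```python
from moviepy.editor import VideoFileClip, AudioFileClip

video = VideoFileClip("out.mp4")
# Trim the music so it does not run past the end of the video
audio = AudioFileClip("music.mp3").subclip(0, video.duration)
video.set_audio(audio).write_videofile("final.mp4")
```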

The entire program may take some time, depending on how long the video is and what computer you are using.

The full code is here: https://github.com/Andrechang/DL_cookbook/tree/main/moviepy

Congrats

Well done! You have written a virtual background program and learned about OpenCV, segmentation neural networks, and background matting.

Stay tuned for more video, music, anime, and meme creation with Python code.

Check out the previous tutorials:
