Effortlessly Remove Unwanted Video Backgrounds with SVD

How can you use Python to effortlessly remove the background from a video?

ShengKai Chen
The Power of AI
5 min read · Jan 31, 2023


Interested in building this ML model in a well-prepared programming environment? Click here to build this model step-by-step with the CognitiveClass.ai Guided Project.

As the popularity of short-form videos booms, more and more people are using the technique of video inpainting to edit their videos. Singular Value Decomposition (SVD) is one of the most efficient ways to remove moving objects, such as pedestrians, from a frame and return a clean background frame as a result. Moreover, SVD can help data scientists reduce a dataset’s dimensionality before training, lessening its complexity so that complex data analysis can be handled efficiently.

After reading this guided project, you will understand the foundations of singular value decomposition and how to apply the technique to edit frames in a video.

Singular Value Decomposition

SVD decomposes a real or complex N × D matrix X of rank r as follows:

X = U S Vᵗ

In many applications N ≥ D, but SVD can be used for any matrix X. For example, in computer vision and image processing tasks we sometimes have D ≥ N.

The matrix U is N × D and its orthonormal columns are called the Left Singular Vectors, so Uᵗ U = I.

The matrix S contains the nonnegative Singular Values of X: its diagonal entries are σᵢ and all other entries are 0, with the values ordered by importance in descending order, i.e. σ₁ ≥ σ₂ ≥ … ≥ σᵣ ≥ 0.

The matrix V is D × D and its orthonormal columns are called the Right Singular Vectors, so Vᵗ V = V Vᵗ = I.

(Note that it is the transpose, Vᵗ, that numpy’s svd function returns as output.)

The full SVD of a non-square matrix contains redundant parts: when N > D, the extra N − D columns of U are multiplied only by rows of zeros in S, so they contribute nothing to the product.

Consider the matrix X:

import numpy as np
from sympy import Matrix  # Matrix is used only to display arrays nicely

X = np.array([[1.0, 2], [2, 1], [3, 3]])

Matrix(X)

We can perform SVD on any matrix in numpy by using the function svd from numpy.linalg:

from numpy.linalg import svd
U, s, VT = svd(X, full_matrices=False)

When X is a 2D array, it is factorized as U × np.diag(s) × Vᵗ, where the columns of U and the rows of Vᵗ are orthonormal, s is a 1D array of X’s singular values, and np.diag(s) converts s into the diagonal matrix S.
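
As a quick sanity check (reusing the U, s, and VT computed above), we can verify that the product of the three factors reproduces X:

print(np.allclose(U @ np.diag(s) @ VT, X))  # True, up to floating-point error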

When X is not square, passing full_matrices=False computes the skinny (reduced) SVD: the redundant columns of U (or rows of Vᵗ) that would only be multiplied by zeros are dropped.
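
A quick sketch, reusing the same X as above, shows the shapes returned in each case:

import numpy as np
from numpy.linalg import svd

X = np.array([[1.0, 2], [2, 1], [3, 3]])  # N = 3, D = 2

# Full SVD: U is N x N, but its extra column only multiplies zeros in S.
U_full, s_full, VT_full = svd(X, full_matrices=True)
print(U_full.shape, s_full.shape, VT_full.shape)  # (3, 3) (2,) (2, 2)

# Skinny (reduced) SVD: the redundant column of U is dropped.
U, s, VT = svd(X, full_matrices=False)
print(U.shape, s.shape, VT.shape)  # (3, 2) (2,) (2, 2)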

We have the Left Singular Vectors of X:

Matrix(U)

We have the Singular Values of X; since the output s is a 1-D array, we use np.diag to convert it into a diagonal matrix:

S = np.diag(s)

Matrix(S)

Finally, we have the Right Singular Vectors of X; since the output is transposed, they appear as the rows of the matrix VT:

Matrix(VT)

Now it’s time for the magic. Let us reconstruct the matrix X with the following code:

X_ = U@S@VT        # multiply the three factors back together
X_ = np.round(X_)  # round away small floating-point errors

Matrix(X_)

It may be more intuitive to think of the SVD as reconstructing the matrix from a linear combination of r rank-1 matrices weighted by their singular values (r is the rank of X):

X_2 = s[0] * U[:, 0:1]@VT[0:1, :] + s[1] * U[:, 1:2]@VT[1:2, :]

Matrix(X_2)

The same matrix is returned!
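
For reference, the same idea generalizes to keeping only the first L terms. Below is a minimal, self-contained sketch; the helper name truncated_reconstruction is hypothetical and not part of the guided project:

import numpy as np
from numpy.linalg import svd

def truncated_reconstruction(U, s, VT, L):
    # Sum the first L rank-1 terms s[i] * u_i v_iᵗ
    return sum(s[i] * U[:, i:i+1] @ VT[i:i+1, :] for i in range(L))

X = np.array([[1.0, 2], [2, 1], [3, 3]])
U, s, VT = svd(X, full_matrices=False)

# With L equal to the rank of X (here 2), the original matrix is recovered exactly.
print(np.allclose(truncated_reconstruction(U, s, VT, 2), X))  # True

Dropping the smaller singular values (L < r) gives the best rank-L approximation of X, which is exactly what the background model below exploits.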

That, in essence, is how the background is kept from being lost during the background subtraction process.

Background Model using SVD

Background subtraction is a widely used approach to detect moving objects in a sequence of frames from a static camera. The basis of this approach is detecting moving objects from the difference between the current frame and a reference frame, often called the “Background Image” or “Background Model”.
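
To make the idea concrete, here is a minimal sketch (not from the guided project; the function name and the threshold value are illustrative assumptions) of how a background model can be compared against a frame to flag moving pixels:

import numpy as np

def foreground_mask(frame, background, threshold=30):
    # Pixels that differ strongly from the background model are treated as foreground.
    diff = np.abs(frame.astype(float) - background.astype(float))
    return diff > threshold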

The function get_data_Matrix will create a Design matrix X where each row corresponds to a flattened image of a road with cars recorded by a camera.
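
The helper itself is not shown in this post; below is a minimal sketch of what it might do, assuming the traffic frames are grayscale image files stored in the mypath folder (the guided project’s actual implementation may differ):

import os
import numpy as np
from PIL import Image

def get_data_Matrix(mypath="traffic"):
    # Load every frame in `mypath` as grayscale and stack the flattened
    # images as the rows of the design matrix X.
    files = sorted(f for f in os.listdir(mypath)
                   if f.lower().endswith((".png", ".jpg", ".jpeg")))
    frames = [np.array(Image.open(os.path.join(mypath, f)).convert("L"))
              for f in files]
    length, width = frames[0].shape
    X = np.stack([frame.flatten() for frame in frames])
    return X, length, width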

X, Length, Width = get_data_Matrix(mypath="traffic")
print("There are %d frames in the dataset and each is %d X %d (%d pixels)"%(X.shape[0], Length, Width, X.shape[1]))

Let’s pick a random frame to test our Background Model.

from random import randint
import matplotlib.pyplot as plt

r = randint(0, X.shape[0]-1)
frame_X = X[r, :]
plt.imshow(frame_X.reshape(Length, Width), cmap="gray")
plt.title("frame: " + str(r))
plt.show()

With the frame dimensions saved in Length and Width, we can now perform SVD on the whole data matrix X and use a low-rank reconstruction to remove the moving cars:

U, s, VT = svd(X, full_matrices=False)  # SVD of the whole data matrix (frames x pixels)
S = np.diag(s)

We can reconstruct the frames using Truncated SVD with L=1 and assign the result to Xhat:

L = 1
Xhat = U[:, :L]@S[0:L, 0:L]@VT[:L, :]
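
A quick way to see why L=1 is enough here: because every frame shares the same static background, the first singular value typically dominates the spectrum. Using the singular values s computed above, we can check how much of the squared singular-value “energy” the first component captures (a sketch, not part of the guided project):

energy = s[0]**2 / np.sum(s**2)
print("Rank-1 approximation captures %.1f%% of the energy" % (100 * energy))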

If we plot the first image, we will see all the cars are gone:

plt.imshow(Xhat[0,:].reshape(Length,Width),cmap="gray")
plt.title('Truncated SVD L=1')
plt.show()

That’s it!

However, don’t stop reading now! This hands-on only introduces partial secrets of the SVD. Uncover more details in this in-depth and Free CognitiveClass.ai Guided Project. Let’s upgrade your knowledge today.

Feel free to connect with me on LinkedIn as well!


ShengKai Chen
The Power of AI

Shengkai is a data scientist at IBM with experience in analyzing data for retail stores. He is enrolled at the University of Toronto’s Faculty of Information.