Computational Linear Algebra: Background Removal with Principal Component Analysis

Robust PCA in action: a worked example of linear algebra techniques

Monit Sharma
9 min read · Jun 19, 2023

In the previous article we covered the basics of PCA; now we will apply it to a background-removal use case. We'll use a video from the BMC 2012 Background Models Challenge dataset.

There are other similar datasets available as well; pick whichever you like, the process will be the same.

First, do the basic imports:

import moviepy.editor as mpe
# from IPython.display import display
from glob import glob

import sys, os
import numpy as np
import scipy
import scipy.misc  # provides imresize on SciPy < 1.3 (it was removed in later versions)

%matplotlib inline
import matplotlib.pyplot as plt

Set a Tolerance value:

# MAX_ITERS = 10
TOL = 1.0e-8

With the help of the moviepy library, load the video and display the first 50 seconds as a clip.

video = mpe.VideoFileClip("data/Video_003.avi")

video.subclip(0,50).ipython_display(width=300)

Check the current duration of the video (in seconds):

video.duration

113.57

Now let's create some helper functions that will ease our work in the long run; we'll call them throughout for the simpler tasks. The functions are self-explanatory from their names.



def create_data_matrix_from_video(clip, k=5, scale=50):
    # Sample k frames per second, convert each frame to grayscale, downscale it,
    # and stack the flattened frames as the columns of a data matrix.
    # Note: scipy.misc.imresize was removed in SciPy 1.3; on newer SciPy,
    # skimage.transform.rescale is an alternative.
    return np.vstack([scipy.misc.imresize(rgb2gray(clip.get_frame(i/float(k))).astype(int),
                                          scale).flatten() for i in range(k * int(clip.duration))]).T

def rgb2gray(rgb):
    # Standard luminance weights for RGB-to-grayscale conversion
    return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])

def plt_images(M, A, E, index_array, dims, filename=None):
    # Plot the same column (frame) of each of the matrices M, A, E side by side
    f = plt.figure(figsize=(15, 10))
    r = len(index_array)
    pics = r * 3
    for k, i in enumerate(index_array):
        for j, mat in enumerate([M, A, E]):
            sp = f.add_subplot(r, 3, 3*k + j + 1)
            sp.axis('Off')
            pixels = mat[:,i]
            if isinstance(pixels, scipy.sparse.csr_matrix):
                pixels = pixels.todense()
            plt.imshow(np.reshape(pixels, dims), cmap='gray')
    return f


def plots(ims, dims, figsize=(15,20), rows=1, interp=False, titles=None):
    if type(ims[0]) is np.ndarray:
        ims = np.array(ims)
    f = plt.figure(figsize=figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        plt.imshow(np.reshape(ims[i], dims), cmap="gray")

Load and view the data

An image from one moment in time is 60 pixels by 80 pixels (when scaled). We can unroll that picture into a single tall column: instead of a 2D image that is 60 × 80, we have a single column of 4,800 values.

This isn't very human-readable, but it's handy because it lets us put the images from different times side by side, packing the whole video into one matrix. If we sampled the video 100 times per second for 113 seconds (so 11,300 images, each from a different point in time), we'd have a 4800 × 11,300 matrix, one column per frame, representing the video!
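As a quick sanity check of the unrolling idea, here is a sketch on a stand-in array (not the actual video frame):

import numpy as np

frame = np.arange(60 * 80).reshape(60, 80)          # stand-in for one 60x80 grayscale frame
col = frame.flatten()                               # unroll into a length-4800 vector
print(col.shape)                                    # (4800,)
print(np.array_equal(frame, col.reshape(60, 80)))   # True: nothing is lost

Now build the actual data matrix from the video: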

scale = 25   # Adjust scale to change resolution of image
dims = (int(240 * (scale/100)), int(320 * (scale/100)))
M = create_data_matrix_from_video(video, 100, scale)
# M = np.load("high_res_surveillance_matrix.npy")
print(dims, M.shape)

(60, 80) (4800, 11300)

plt.imshow(np.reshape(M[:,140], dims), cmap='gray');

Since create_data_matrix_from_video is somewhat slow, we will save our matrix. In general, whenever you have slow pre-processing steps, it's a good idea to save the results for future use; this really comes in handy when you are dealing with matrices this big.

np.save("low_res_surveillance_matrix.npy", M)

Note: High-res M is too big to plot, so only run the below with the low-res version

plt.figure(figsize=(12, 12))
plt.imshow(M, cmap='gray')

Notice the wavy black lines and the horizontal bands: each column of M is one moment in time, so a static background pixel traces a constant horizontal band, while the wavy lines are the people moving through the frame.

plt.imsave(fname="image1.jpg", arr=np.reshape(M[:,140], dims), cmap='gray')

A first attempt at SVD

from sklearn.utils.extmath import randomized_svd
u, s, v = randomized_svd(M, 2)
u.shape, s.shape, v.shape

((4800, 2), (2,), (2, 11300))


low_rank = u @ np.diag(s) @ v
low_rank.shape

(4800, 11300)

plt.figure(figsize=(12, 12))
plt.imshow(low_rank, cmap='gray')

No more wavy lines: the rank-2 approximation keeps only the static structure.

The below images were created with high-res data. Very slow to process:

plt.imshow(np.reshape(low_rank[:,140], dims), cmap='gray');
plt.imshow(np.reshape(M[:,550] - low_rank[:,550], dims), cmap='gray');

Rank 1 Approximation

u, s, v = decomposition.randomized_svd(M, 1)
u.shape, s.shape, v.shape

((4800, 1), (1,), (1, 11300))


low_rank = u @ np.diag(s) @ v
low_rank.shape

(4800, 11300)

plt.figure(figsize=(12, 12))
plt.imshow(low_rank, cmap='gray')
plt.imshow(np.reshape(M[:,550] - low_rank[:,550], dims), cmap='gray');
plt.imshow(np.reshape(M[:,140] - low_rank[:,140], dims), cmap='gray');
plt.imshow(np.reshape(M[:,140] - low_rank[:,140], dims)[50:150,100:270], cmap='gray');

This is amazing for such a simple approach! We get somewhat better results through a more complicated algorithm below.

(Here S is the sparse matrix that the robust PCA algorithm below will return.)

plt.imshow(np.reshape(S[:,140], dims)[50:150,100:270], cmap='gray')

Principal Component Analysis (PCA)

When dealing with high-dimensional data sets, we often leverage the fact that the data has low intrinsic dimensionality (perhaps it lies in a low-dimensional subspace or on a low-dimensional manifold) in order to alleviate the curse of dimensionality and scale. Principal component analysis is handy for eliminating dimensions.

Classical PCA seeks the best rank-k estimate L of M, minimizing ||M − L|| subject to rank(L) = k. Truncated SVD computes exactly this (the Eckart–Young theorem).
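As a quick numerical check of that claim, here is a sketch on a small random matrix (not part of the original notebook):

import numpy as np
from sklearn.utils.extmath import randomized_svd

A = np.random.rand(100, 80)
u, s, v = randomized_svd(A, n_components=5)
L5 = u @ np.diag(s) @ v               # rank-5 truncated SVD of A
print(np.linalg.matrix_rank(L5))      # 5
# By Eckart-Young, no other rank-5 matrix gets a smaller Frobenius error:
print(np.linalg.norm(A - L5, 'fro'))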

Traditional PCA can handle small noise, but is brittle with respect to grossly corrupted observations: even a single grossly corrupted observation can significantly throw off the answer.

Robust PCA factors a matrix into the sum of two matrices, M = L + S, where M is the original matrix, L is low-rank, and S is sparse. This is what we'll use for the background-removal problem! Low-rank means the matrix has a lot of redundant information; in this case, it's the background, which is the same in every frame (talk about redundant info!). Sparse means the matrix has mostly zero entries; here, the picture of the foreground (the people) is mostly empty. (In the case of corrupted data, S captures the corrupted entries.)
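To make "low-rank plus sparse" concrete, here is a toy construction (the numbers are made up purely for illustration):

import numpy as np

background = np.outer(np.arange(1.0, 5.0), np.ones(6))  # rank-1: every column identical
foreground = np.zeros((4, 6))
foreground[2, 3] = 9.0                                   # a single nonzero "person"
M_toy = background + foreground                          # M = L + S
print(np.linalg.matrix_rank(background))                 # 1 (low-rank)
print(int((foreground != 0).sum()), "nonzero entry")     # 1 (sparse)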

Applications of Robust PCA

  • Video Surveillance
  • Face Reconstruction (example images at http://jeankossaifi.com/)
  • Latent Semantic Indexing: L captures the common words used in all documents, while S captures the few key words that best distinguish each document from the others
  • Ranking and Collaborative Filtering: a small portion of the available rankings could be noisy or even tampered with

One way to see why minimizing the L1 norm encourages sparsity is to compare the contours of the L1 and L2 norms: the L1 ball has corners on the coordinate axes, so minimizers tend to land exactly on sparse points.

Optimization Problem

Robust PCA can be written as the convex program

    minimize ||L||_* + λ||S||_1   subject to   L + S = M,

where ||L||_* is the nuclear norm of L (the sum of its singular values) and ||S||_1 is the entrywise L1 norm (the sum of the absolute values of its entries). The standard weight is λ = 1/√max(m, n) for an m × n matrix, which is exactly what the code below uses.
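Both norms are one-liners in numpy. A small sketch (X here is just a random stand-in matrix):

import numpy as np

X = np.random.rand(5, 4)
nuclear = np.linalg.norm(X, 'nuc')    # ||X||_*, the sum of singular values
l1 = np.abs(X).sum()                  # ||X||_1, the sum of absolute entries
print(np.isclose(nuclear, np.linalg.svd(X, compute_uv=False).sum()))  # True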

Implementing an algorithm from a paper


We will use the general principal component pursuit algorithm from the Robust PCA paper (Candès, Li, Ma, Wright), in the specific form of Matrix Completion via the Inexact ALM Method, found in section 3.1 of the paper by Lin, Chen, and Ma.

The Good Parts

Section 1 of Candès, Li, Ma, Wright is nicely written, and section 5, "Algorithms", is our key interest. You don't need to know the math or understand the proofs to implement an algorithm from a paper, but you will need to try different things and comb through resources for useful tidbits. This field has many theoretical researchers and comparatively little pragmatic advice.

The algorithm shows up on page 29 of the paper. It needs two definitions: the shrinkage operator

    S_τ[x] = sgn(x) · max(|x| − τ, 0),

applied elementwise, and the singular value thresholding operator

    D_τ(X) = U S_τ(Σ) Vᵀ,

where X = U Σ Vᵀ is the SVD of X.
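To see what these operators do on a small example, here is a sketch (using the same shrink helper that appears in the code below; the matrix A is a made-up stand-in):

import numpy as np

def shrink(M, tau):
    S = np.abs(M) - tau
    return np.sign(M) * np.where(S > 0, S, 0)

x = np.array([-3.0, -0.5, 0.2, 2.0])
print(shrink(x, 1.0))    # [-2. -0.  0.  1.] : entries within tau of zero vanish

# Singular value thresholding: shrink the singular values, then reconstruct
A = np.random.rand(5, 4)
u, s, vt = np.linalg.svd(A, full_matrices=False)
D_tau = u @ np.diag(shrink(s, 0.5)) @ vt   # D_0.5(A)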

Now the code

Robust PCA (via Principal Component Pursuit)

We will use Facebook’s Fast Randomized PCA library.

from scipy import sparse
from sklearn.utils.extmath import randomized_svd
import fbpca

TOL=1e-9
MAX_ITERS=3


def converged(Z, d_norm):
    # Relative Frobenius norm of the residual
    err = np.linalg.norm(Z, 'fro') / d_norm
    print('error: ', err)
    return err < TOL

def shrink(M, tau):
    # Soft-thresholding (shrinkage) operator, applied elementwise
    S = np.abs(M) - tau
    return np.sign(M) * np.where(S > 0, S, 0)

def _svd(M, rank): return fbpca.pca(M, k=min(rank, np.min(M.shape)), raw=True)

def norm_op(M): return _svd(M, 1)[1][0]   # largest singular value (operator norm)

def svd_reconstruct(M, rank, min_sv):
    # Singular value thresholding: truncated SVD, shrink the singular values,
    # reconstruct from the ones that stay positive
    u, s, v = _svd(M, rank)
    s -= min_sv
    nnz = (s > 0).sum()
    return u[:,:nnz] @ np.diag(s[:nnz]) @ v[:nnz], nnz
def pcp(X, maxiter=10, k=10): # refactored
    m, n = X.shape
    trans = m < n
    if trans: X = X.T; m, n = X.shape

    lamda = 1/np.sqrt(m)
    op_norm = norm_op(X)
    Y = np.copy(X) / max(op_norm, np.linalg.norm(X, np.inf) / lamda)
    mu = k*1.25/op_norm; mu_bar = mu * 1e7; rho = k * 1.5

    d_norm = np.linalg.norm(X, 'fro')
    L = np.zeros_like(X); sv = 1

    examples = []

    for i in range(maxiter):
        print("rank sv:", sv)
        X2 = X + Y/mu

        # update estimate of Sparse Matrix by "shrinking/truncating": original - low-rank
        S = shrink(X2 - L, lamda/mu)

        # update estimate of Low-rank Matrix by doing truncated SVD of rank sv & reconstructing.
        # count of singular values > 1/mu is returned as svp
        L, svp = svd_reconstruct(X2 - S, sv, 1/mu)

        # If svp < sv, we are already calculating enough singular values.
        # If not, add 5% of n (in this case 240) to sv
        sv = svp + (1 if svp < sv else round(0.05*n))

        # residual
        Z = X - L - S
        Y += mu*Z; mu *= rho

        examples.extend([S[140,:], L[140,:]])

        if mu > mu_bar: mu = mu_bar   # cap the penalty parameter
        if converged(Z, d_norm): break

    if trans: L = L.T; S = S.T
    return L, S, examples

This implements the inexact ALM algorithm from the Lin, Chen, Ma paper, built from the shrinkage and singular value thresholding operators defined above.

Results

m, n = M.shape
round(m * .05)

240

L, S, examples =  pcp(M, maxiter=5, k=10)

rank sv: 1
error: 0.131637937114
rank sv: 241
error: 0.0458515689278
rank sv: 49
error: 0.00591314217762
rank sv: 289
error: 0.000567221885441
rank sv: 529
error: 2.4633786172e-05

plots(examples, dims, rows=5)
f = plt_images(M, S, L, [140], dims)
np.save("high_res_L.npy", L)
np.save("high_res_S.npy", S)


f = plt_images(M, S, L, [0, 100, 1000], dims)

Extracting a bit of the foreground is easier than identifying the background. To accurately get the background, you need to remove all the foreground, not just parts of it.

LU Factorization

Both fbpca and our own randomized_range_finder method (from earlier in this series) use LU factorization, which factors a matrix into the product of a lower triangular matrix and an upper triangular matrix.

Gaussian Elimination

If you are unfamiliar with Gaussian elimination or need a refresher, watch this Khan Academy video.

Let's work through Gaussian elimination by hand to review.

Gaussian elimination transforms a linear system into an upper triangular one by applying linear transformations on the left. It is triangular triangularization.

L is unit lower-triangular: all diagonal entries are 1.

def LU(A):
    U = np.copy(A)
    m, n = A.shape
    L = np.eye(n)
    for k in range(n-1):
        for j in range(k+1, n):
            L[j,k] = U[j,k]/U[k,k]
            U[j,k:n] -= L[j,k] * U[k,k:n]
    return L, U

A = np.array([[2,1,1,0],[4,3,3,1],[8,7,9,5],[6,7,9,8]]).astype(float)  # np.float is removed in NumPy 1.24+
L, U = LU(A)
np.allclose(A, L @ U)

True

The LU factorization is useful!

Solving Ax = b becomes LUx = b (sketched in code after the steps):

  1. find A = LU
  2. solve Ly = b
  3. solve Ux = y
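Here is a sketch of those three steps using scipy's triangular solver, reusing the LU function and the matrix A from above (the right-hand side b is made up):

import numpy as np
from scipy.linalg import solve_triangular

b = np.array([1., 2., 3., 4.])
L, U = LU(A)                              # step 1: factor A = LU
y = solve_triangular(L, b, lower=True)    # step 2: forward-substitute Ly = b
x = solve_triangular(U, y)                # step 3: back-substitute Ux = y
print(np.allclose(A @ x, b))              # True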

Work

Work for Gaussian elimination: about 2/3 · n³ flops. For example, n = 1,000 already costs roughly 7 × 10⁸ floating-point operations.

Conclusion

In this tutorial you learned how SVD and robust PCA can be used to remove the background from a video, and where these techniques show up in day-to-day life. In the next article you'll learn about compressed sensing of CT scans and how it is done via robust regression. Till then, you can visit the complete Linear Algebra series or have a look at the Computational Physics series.
