Understanding of Matrix Factorization (MF) and Singular Value Decomposition (SVD)

Sai Karthik Chedella
Published in Analytics Vidhya · 6 min read · Jan 24, 2020

In this blog, I’m going to give a brief overview of the theoretical aspects of Matrix Factorization: the purpose and benefits of decomposition, and the applications of MF, explained in simple terms.

Besides this, it also covers a step-by-step derivation of Singular Value Decomposition (SVD), explaining every step plainly.

Matrix Factorization

Matrix decomposition (or matrix factorization) expresses a matrix as a product of matrices, exactly or approximately. Such factorizations are used to implement efficient matrix algorithms.

“The general idea is that you are able to multiply all of the resulting ‘factor’ matrices together to obtain the original matrix.”

Ex: 6 = 2 * 3, where 2 and 3 are treated as factors of 6, and we can generate 6 again by taking the product of those factors (i.e. 2 and 3).

In a similar fashion, A = B·C: a matrix A can be expressed as the product of two lower-dimensional matrices B and C. If A is m x n, then B is m x k and C is k x n, where k is the number of latent dimensions (a hyperparameter).
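As a quick sketch of this idea (the shapes and the use of NumPy here are illustrative, not from the original post), a rank-k factorization can be built and checked like this:

```python
import numpy as np

# Illustrative shapes: A is m x n, factored into B (m x k) and C (k x n).
m, n, k = 6, 5, 2

# Build a matrix that is exactly rank k, so the factorization is exact.
rng = np.random.default_rng(0)
B_true = rng.random((m, k))
C_true = rng.random((k, n))
A = B_true @ C_true              # the "original" m x n matrix

# One way to recover rank-k factors: a truncated SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :k] * s[:k]             # m x k factor
C = Vt[:k, :]                    # k x n factor

# Multiplying the factors back reproduces A (up to floating-point error).
print(np.allclose(B @ C, A))     # True
```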

a) Purpose and Ways to decompose a matrix

There are many different matrix decomposition techniques, and each finds use in a particular class of problems.

To be brief, there are two broad classes of decomposition:
a) some apply only to ‘n x n’ (square) matrices
b) some apply to more general ‘m x n’ matrices

1) Decompositions used to solve linear equations, which mostly split the matrix into 2 parts.

  • LU decomposition factorizes a matrix into the product of a lower triangular and an upper triangular matrix.
  • QR decomposition factorizes a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R.
  • Cholesky decomposition, etc.
  • Non-negative matrix factorization (NMF) decomposes a matrix A into two matrices that have non-negative elements. (A must have non-negative elements too.)

2) Eigenvalue-based decompositions, mostly applicable to square matrices, where the matrix decomposes into 3 parts (final rotation, scaling, initial rotation).

  • PCA is a transform that uses eigendecomposition to obtain the transform matrix.
  • Singular Value Decomposition (SVD) factorizes any matrix, of any dimensions, into the 3 parts USV’.

Many other possible methods are available; please refer here.
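To make the first class concrete, here is a minimal sketch (the matrix and right-hand side are made up purely for illustration) of solving a linear system A·x = b with both LU and QR, using NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, solve_triangular

# A toy 3x3 system A x = b (values chosen only for illustration).
A = np.array([[4.0, 3.0, 0.0],
              [3.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([24.0, 30.0, -24.0])

# --- LU: factor once, then solve cheaply (even for many right-hand sides) ---
lu, piv = lu_factor(A)
x_lu = lu_solve((lu, piv), b)

# --- QR: A = Q R with Q orthogonal and R upper triangular ---
Q, R = np.linalg.qr(A)
x_qr = solve_triangular(R, Q.T @ b)   # solve R x = Qᵀ b by back-substitution

print(np.allclose(x_lu, x_qr))        # True: both recover the same solution
print(np.allclose(A @ x_lu, b))       # True: it actually solves the system
```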

b) What are the benefits of decomposing a matrix?

  • Matrix factorization separates a matrix into other matrices that are typically much easier to work with than the original. This not only makes the problem easier to solve, but also reduces the amount of time a computer needs to calculate the answer.
  • Matrix decompositions are mostly used to solve linear systems easily and quickly.
  • Matrix factorization reduces a computer’s storage space for matrices: instead of storing the large unfactorized matrix A, we can store its factors B and C, which sometimes take far less space when the rank of the matrix is small (a quick arithmetic sketch follows below).
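A rough back-of-the-envelope sketch of that storage point (the sizes m, n, k below are arbitrary and chosen only for illustration):

```python
import numpy as np

# Storage comparison for a low-rank setting (illustrative numbers).
m, n, k = 10_000, 8_000, 20

full_entries   = m * n            # store A directly
factor_entries = m * k + k * n    # store B (m x k) and C (k x n) instead

print(full_entries)                    # 80,000,000 entries
print(factor_entries)                  # 360,000 entries
print(full_entries / factor_entries)   # ~222x less storage for the factors
```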

c) Applications

  1. Imputation of missing/incomplete data.
  2. Imaging: Segmentation and Noise Removal.
  3. Text Mining/Topic Modeling.
  4. Recommendations using Collaborative filtering.
  5. Eigenfaces.

One of the most popular MF techniques is SVD, which is covered at full length below.
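As a taste of how SVD drives applications such as noise removal and compression, here is a hedged sketch of a truncated (rank-k) SVD approximation; the random matrix below simply stands in for an image or a ratings/document-term matrix:

```python
import numpy as np

# A stand-in "image" (or ratings / document-term) matrix; values are random
# here purely for illustration.
rng = np.random.default_rng(0)
A = rng.random((100, 80))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular values/vectors: the best rank-k approximation
# of A in the least-squares sense (Eckart–Young theorem).
k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(A.shape, A_k.shape)                           # same shape, but A_k has rank k
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))  # relative approximation error
```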

Derivation of Singular Value Decomposition (SVD)

SVD is a factorization of a real or complex matrix that generalizes the eigendecomposition of a square normal matrix to any m x n matrix, via an extension of the polar decomposition.

In other words, SVD approximates a matrix of any dimensions by 3 lower-dimensional matrices, preserving the maximum variance through rotation and scaling, in the form of the matrices USV’ holding the top-k eigenvectors and eigenvalues.

A = U S V’ (Final rotation || Scaling || Initial rotation)

where U = left singular matrix, S = matrix of singular values, and V = right singular matrix.

  • The rotation matrices consist of the transformed data points; their eigenvectors make up the singular matrices U and V.
  • The scaling matrix has values scaled by a factor σ.

U and V are orthogonal matrices; they represent rotations (or reflections) of the space, and their columns are eigenvectors that are orthonormal to each other.

Eigenvectors preserve the direction of vectors under the transformation.

S is a diagonal matrix; it represents the scaling of each component by a value ‘σ’.
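A minimal sketch of these properties with NumPy (the 4 x 3 matrix is arbitrary): U and V come out with orthonormal columns, S holds the singular values in descending order, and multiplying the three factors back reproduces A.

```python
import numpy as np

# A small 4x3 example matrix (values are arbitrary).
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0],
              [1.0, 1.0, 3.0],
              [1.0, 1.0, -1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U and V have orthonormal columns: Uᵀ·U = I and Vᵀ·V = I.
print(np.allclose(U.T @ U, np.eye(3)))   # True
print(np.allclose(Vt @ Vt.T, np.eye(3))) # True

# S is diagonal with the singular values σ in descending order.
print(s)

# Multiplying the three factors back gives A again: A = U S Vᵀ.
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True
```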

a) How to calculate U, S, V?

Since U and V are orthogonal matrices, we have the property U·Uᵀ = I and V·Vᵀ = I, and those U, V matrices contain the eigenvectors of A·Aᵀ and Aᵀ·A.

The eigenvectors of (Aᵀ·A) make up the columns of V, and the eigenvectors of (A·Aᵀ) make up the columns of U.

Also, the singular values in S are the square roots of the eigenvalues of A·Aᵀ (or Aᵀ·A). These ‘σ’ values in S are arranged in descending order.
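This relationship is easy to verify numerically; a small sketch with a random matrix (not from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((5, 3))

# Singular values of A ...
s = np.linalg.svd(A, compute_uv=False)

# ... versus eigenvalues of AᵀA (symmetric, so eigvalsh is appropriate).
lam = np.linalg.eigvalsh(A.T @ A)   # returned in ascending order
lam = lam[::-1]                     # reorder to descending, like s

print(np.allclose(s, np.sqrt(lam))) # True: σ = sqrt(λ)
```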

Projecting all data points of the matrix into a lower-dimensional space, these eigenvectors preserve the maximum variance in each dimension, and the eigenvalue associated with each eigenvector describes how much of the variance is explained in that dimension.

Note:

  1. λ ~ σ² (i.e. the eigenvalues of Aᵀ·A are the squares of the singular values of A).
  2. If W is a matrix, then its eigenvectors can be calculated from W·x = λ·x (or W·x = σ²·x),

where x = eigenvector and λ (or σ²) = eigenvalue / squared singular value.
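A quick numerical check of this note (W is built here as A·Aᵀ from a random A, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((4, 3))
W = A @ A.T                    # the symmetric matrix whose eigenvectors give U

lam, X = np.linalg.eigh(W)     # eigenvalues λ and eigenvectors x (columns of X)

# Each eigenpair satisfies W·x = λ·x.
for i in range(len(lam)):
    print(np.allclose(W @ X[:, i], lam[i] * X[:, i]))   # True for every pair
```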

Let’s start the derivation of U and S.

Statement: (A·Aᵀ)·x = λ·x
To find U, start with the eigenvectors x of A·Aᵀ.

Proof :

b) Finding the eigenvalues (λ):

If x is non-zero, then the characteristic equation (W − λ·I)·x = 0 has a non-trivial solution.

W·x − λ·I·x = 0 ……. (‘W’ is nothing but the A·Aᵀ matrix.)

The determinant of the matrix (W − λ·I) must be equal to zero. Thus, from the solution of the characteristic equation, det |W − λ·I| = 0.

Example (the matrix W − λ·I is shown in a figure in the original post):

det |W − λ·I| = λ³ − 12λ − 16 = 0

∴ Finding the roots of the above equation, we get the eigenvalues of W.

λ = 4, −2.

Solving this determinant equation, we get the eigenvalues ‘λ’, from which we can derive the singular values ‘σ’.
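A small sketch to sanity-check those roots numerically, assuming the characteristic polynomial λ³ − 12λ − 16 = 0 written out above:

```python
import numpy as np

# Characteristic polynomial λ³ − 12λ − 16 = 0, written by its coefficients.
coeffs = [1, 0, -12, -16]

# Its roots are the eigenvalues; −2 turns out to be a repeated root, so a
# tiny numerical error from np.roots is expected.
print(np.roots(coeffs))              # ≈ [ 4., -2., -2.]

# Verify by substituting the claimed eigenvalues back into the polynomial.
print(np.polyval(coeffs, [4, -2]))   # [0, 0]
```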


c) Finding the eigenvectors (x):

  • Substituting these λ’s into the equation (W − λ·I)·x = 0, we get the matrices (W − 4·I)·x and (W + 2·I)·x.
  • Construct the augmented matrix for (W + 2·I) and convert it into row-echelon form by performing row operations [Gauss–Jordan elimination].

(Figure in the original post: example calculation of the eigenvectors.)

For every ‘λ’ value, we get an eigenvector.

So, finally, the obtained vectors are the eigenvectors of W (i.e. A·Aᵀ).

All these vectors are grouped into a matrix of eigenvectors, which is our left/right singular matrix U or V.
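Since the matrix W of the worked example lives in a figure that isn’t reproduced here, the sketch below uses a small hypothetical symmetric W just to show the technique: the eigenvectors for a given λ span the null space of (W − λ·I).

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical symmetric matrix (NOT the one from the post's figure).
W = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # eigenvalues: 3 and 1

for lam in [3.0, 1.0]:
    # The eigenvectors for λ span the null space of (W − λI).
    x = null_space(W - lam * np.eye(2))  # one column per eigenvector
    print(lam, x.ravel())
    print(np.allclose(W @ x, lam * x))   # True: W·x = λ·x
```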

For the detailed calculation, refer to this.

Repeat the same process to calculate the eigenvalues and eigenvectors of Aᵀ·A, which make up ‘V’.
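Putting the whole recipe together, here is a sketch with a random matrix A. One practical detail (an assumption of this sketch, not spelled out in the post): to keep the signs of U and V consistent with each other, U is obtained as uᵢ = A·vᵢ / σᵢ rather than from a separate eigendecomposition of A·Aᵀ.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((5, 3))            # any m x n matrix (here 5 x 3, full column rank)

# 1) Eigendecomposition of AᵀA gives V and the squared singular values.
lam, V = np.linalg.eigh(A.T @ A)  # ascending order
idx = np.argsort(lam)[::-1]       # sort descending, as in S
lam, V = lam[idx], V[:, idx]
s = np.sqrt(lam)                  # σ = sqrt(λ)

# 2) The corresponding columns of U follow from u_i = A·v_i / σ_i.
U = A @ V / s

# 3) Check the factorization against NumPy's own SVD.
print(np.allclose(U @ np.diag(s) @ V.T, A))                  # True
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)))    # True
```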
