Matrix Factorization algorithms explained with an example

LU Decomposition, Eigenvalue Decomposition, SVD, Cholesky Decomposition, and QR Decomposition

Mehul Gupta
Data Science in your pocket


Matrix Decomposition is an essential concept in mathematics used for numerous purposes. This blog is a walkthrough of the major Matrix Factorization algorithms and the maths behind them.


Before jumping onto the algorithms, let’s first understand

What is Matrix Factorization?

Mathematically speaking, given an N x M matrix, matrix factorization decomposes it into multiple matrices such that, on matrix multiplication, we get back the same N x M matrix. It finds applications in numerous fields:

  • Collaborative Filtering for Recommendations
  • Dimensionality Reduction
  • Image Compression and Denoising
  • Text Mining and Topic Modeling, etc

We will be talking about the below algorithms in this post:

  • LU Decomposition
  • Singular Value Decomposition
  • QR Decomposition
  • Cholesky Decomposition
  • Eigenvalue Decomposition

LU Decomposition

LU stands for Lower-Upper triangular Decomposition. Hence, the 2 matrices into which the matrix is decomposed are a Lower triangular and an Upper triangular matrix.

A = L x U

(Figure: example of a Lower and an Upper triangular matrix)

LU Decomposition uses Gaussian Elimination to get to these matrices.

What is Gaussian Elimination/Row Reduction?

It aims at converting a matrix into an Upper Triangular matrix by applying linear row operations like

Row2 = Row2 - 3 x Row3

Row1 = Row1 + Row2/5

Do remember these operations shouldn't include non-linear transformations like square root or power()

The process is simple

Let A be the matrix to be factorized

A = | 3  1  2 |
    | 6  3  4 |
    | 3  1  5 |

Initialize two matrices: L = Identity matrix and U = a copy of the actual matrix to be decomposed

L = | 1  0  0 |
    | 0  1  0 |
    | 0  0  1 |

U = | 3  1  2 |
    | 6  3  4 |
    | 3  1  5 |

Convert U into an Upper triangular matrix using Gaussian Elimination. Save the multipliers used in the transformation in L. Assume we will be applying the below transformations to U

  • Row2 = Row2 - 2 x Row1

Here, multiplier = 2 is used to convert U₂₁ to 0, hence L₂₁ = 2

  • Row3 = Row3 - 1 x Row1

As multiplier = 1 is used to convert U₃₁ to 0, L₃₁ = 1

Note: As we need to convert U into an Upper Triangular matrix, we need to convert just 3 values to 0 in a 3x3 matrix: U₂₁, U₃₁, U₃₂. Hence we will be getting 3 multipliers which will be used to fill L.

L = | 1  0  0 |
    | 2  1  0 |
    | 1  0  1 |

U = | 3  1  2 |
    | 0  1  0 |
    | 0  0  3 |

As U₃₂ is already 0, L₃₂ = 0. Otherwise we would have performed a transformation on Row3 using Row2. Why Row2? Because the 1st element of Row2 is 0 by now, so any transformation from Row2 onto Row3 won't disturb Row3's already zeroed 1st element, whereas a transformation from Row1 could affect the already zeroed elements.

Therefore, the final matrices L and U are

L = | 1  0  0 |
    | 2  1  0 |
    | 1  0  1 |

U = | 3  1  2 |
    | 0  1  0 |
    | 0  0  3 |

You can cross-check the result by multiplying L x U; you will get back A
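To make the procedure concrete, here is a minimal NumPy sketch of the same multiplier-tracking idea (no pivoting, so it assumes the pivots never become zero; the function name lu_decompose is mine, not a library routine):

import numpy as np

def lu_decompose(A):
    # Doolittle-style LU without pivoting: A = L x U
    n = A.shape[0]
    L = np.eye(n)                  # L starts as the identity matrix
    U = A.astype(float)            # U starts as a copy of A
    for j in range(n - 1):
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]  # multiplier that zeroes U[i][j]
            L[i, j] = m            # save the multiplier in L
            U[i, :] -= m * U[j, :] # Row_i = Row_i - m x Row_j
    return L, U

A = np.array([[3., 1., 2.],
              [6., 3., 4.],
              [3., 1., 5.]])
L, U = lu_decompose(A)
print(np.allclose(L @ U, A))  # True: multiplying L x U recovers A

Running this on the example above reproduces exactly the L and U we derived by hand.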

Eigenvalue/Spectral Decomposition

Before moving ahead, we need to know a few baseline concepts

Eigenvectors: For a square matrix A, an eigenvector is a non-zero vector that, when multiplied by the matrix A, only changes in scale (magnitude) but retains its direction.

Eigenvalues: The corresponding scale factor associated with an eigenvector is called the eigenvalue.

You must have read about Eigenvalues and Eigenvectors while reading about PCA

The relation between eigenvectors, eigenvalues and the matrix A is represented by the below mathematical equations

A * v = λ * v, or

A * v - λ * v = 0, or

(A - λI) * v = 0

Where:
A: Square matrix
v: Eigenvector
λ (lambda): Eigenvalue

I: identity matrix

To perform eigenvalue decomposition, we first need to calculate the eigenvalues with the below steps, given a square matrix A of size n x n.

A = | 4  1  2 |
    | 1  5  6 |
    | 2  6  7 |

Form the matrix (A - λI), where λ is the unknown eigenvalue and I is the n x n identity matrix.

A - λI = | 4-λ   1    2  |
         |  1   5-λ   6  |
         |  2    6   7-λ |

Compute the determinant of the matrix (A - λI). This will be a polynomial in λ.

Below is the determinant formula for a 3 x 3 matrix X. The formula changes with the dimension of the matrix.

X = | a  b  c |
    | d  e  f |
    | g  h  i |

det(X) = a(ei - fh) - b(di - fg) + c(dh - eg)

Setting this determinant to 0 gives us an equation to solve for λ.

(4-λ)((5-λ)(7-λ) - 6*6) - 1*(1*(7-λ) - 6*2) + 2*(1*6 - (5-λ)*2) = 0

Once you solve for λ, substitute each λ value into (A - λI) * v = 0 to calculate v, i.e. the eigenvectors.
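If you'd rather not solve the characteristic polynomial by hand, here is a minimal NumPy sketch that checks the same relations numerically on the example matrix (np.linalg.eig is a standard routine):

import numpy as np

A = np.array([[4., 1., 2.],
              [1., 5., 6.],
              [2., 6., 7.]])

w, V = np.linalg.eig(A)  # eigenvalues in w, eigenvectors as columns of V

# Every pair satisfies A * v = λ * v
for lam, v in zip(w, V.T):
    print(np.allclose(A @ v, lam * v))  # True for each pair

# Eigenvalue (spectral) decomposition: A = V x diag(λ) x V⁻¹
print(np.allclose(V @ np.diag(w) @ np.linalg.inv(V), A))  # True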

Singular Value Decomposition

As LU Decomposition splits the actual matrix into 2 matrices, SVD decomposes it into 3 matrices, extending the idea of Eigenvalue decomposition. A couple of concepts we must know to understand SVD:

Singular values = square roots of the eigenvalues of AᵀA

Singular vectors = normalized eigenvectors of AᵀA

SVD decomposition is represented as:

A = U x Σ x Vᵀ

Where:
A: The original matrix
U: The left singular vectors matrix

Σ: The singular values matrix (diagonal matrix)
V: Right singular vectors matrix

The steps are simple

Calculate Aᵀ x A. Let's call this A⁻

Calculate eigen values and eigenvectors for A⁻ (as shown above in eigen value decomposition)

Calculate singular values for A⁻ using the square roots of its eigenvalues. These values will be used to form Σ (a diagonal matrix)

Calculate singular vectors for A⁻ by normalizing the calculated eigenvectors.

Now,

V = matrix of singular vectors of A⁻

Σ = diagonal matrix of the singular values (conventionally sorted in descending order)

To calculate U, we will use the relation A = U x Σ x Vᵀ

As A and V are known and Σ is a diagonal matrix, U can be calculated by U = A x V x Σ⁻¹ (this follows from A x V = U x Σ).

This is a really nice example for referencing how SVD is calculated step by step.
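As a numeric sanity check, here is a minimal NumPy sketch tying the pieces together (np.linalg.svd is a standard routine; the matrix is the one from the LU example, which happens to be invertible):

import numpy as np

A = np.array([[3., 1., 2.],
              [6., 3., 4.],
              [3., 1., 5.]])

# SVD: A = U x Σ x Vᵀ
U, s, Vt = np.linalg.svd(A)
Sigma = np.diag(s)  # s is a 1-D array of singular values, largest first

print(np.allclose(U @ Sigma @ Vt, A))  # True

# Singular values are the square roots of the eigenvalues of AᵀA
eig = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s, np.sqrt(eig)))    # True

# U recovered from the relation U = A x V x Σ⁻¹ (valid here since A is invertible)
V = Vt.T
print(np.allclose(A @ V @ np.diag(1 / s), U))  # True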

QR Decomposition

Similar to LU Decomposition, QR decomposition decomposes a matrix into 2 matrices, Q & R where

Q = Orthonormal matrix

R = Upper Triangular matrix

What is an Orthonormal matrix?

An orthonormal (orthogonal) matrix is a matrix in which the column vectors are mutually perpendicular (orthogonal) to each other, and each column vector has a length of 1 (unit vector). Consequently, the rows of such a matrix are also mutually perpendicular and have a length of 1.

Time for QR Decomposition mathematics

for a matrix A of size m x n,

A = | 1  4  7  |
    | 2  5  8  |
    | 3  6  10 |

Note: the columns of A must be linearly independent for the method below to work.

1. Initialize matrices Q and R.

Initialize an m x n matrix Q with zeros.

Initialize an n x n matrix R with zeros.

2. Perform Gram-Schmidt orthogonalization.

Jargon alert !!

Gram-Schmidt orthogonalization is a mathematical process used to transform a set of linearly independent vectors (v₁, v₂, v₃, …, vₙ) into a new set of orthogonal (perpendicular) vectors, which means each vector is at a right angle to the others.

How does it work?

The first vector, v₁, remains unchanged since there is no other vector to make it orthogonal to.

For the second vector, v₂, subtract the projection of v₂ onto v₁ from v₂ to make it orthogonal to v₁.

For the third vector, v₃, subtract the projections of v₃ onto v₁ and v₂ from v₃ to make it orthogonal to both v₁ and v₂.

For the fourth vector, v₄, subtract the projections of v₄ onto v₁,v₂, and v₃ from v₄ to make it orthogonal to all three previously transformed vectors.

And likewise for the remaining vectors, as shown in the sketch below.
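Here is a minimal NumPy sketch of that process (the function name gram_schmidt is mine; it implements the classical variant and assumes the input columns are linearly independent):

import numpy as np

def gram_schmidt(V):
    # Orthonormalize the columns of V, left to right (classical Gram-Schmidt)
    n = V.shape[1]
    Q = np.zeros_like(V, dtype=float)
    for k in range(n):
        q = V[:, k].astype(float)
        for j in range(k):
            q -= (Q[:, j] @ V[:, k]) * Q[:, j]  # subtract projection onto q_j
        Q[:, k] = q / np.linalg.norm(q)         # normalize to unit length
    return Q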

Back to QR Decomposition,

1st column of Q (q1) = normalized 1st column of A (actual matrix)

= v1/||v1||

v1 = [1, 2, 3] (the first column of A)

Magnitude of v1 = √(1^2 + 2^2 + 3^2) = √(14) ≈ 3.74

First column of Q = v1 / √(14) ≈ [0.27, 0.53, 0.80]

temp1 = q1 · v2 (dot product), where v2 = 2nd column of A

temp2 = v2 - temp1 x q1

q2 = temp2/||temp2||

v2 = [4, 5, 6] (the second column of A)

Component of v2 orthogonal to q1:
temp2 = v2 - (v2 · q1) * q1
= [4, 5, 6] - ((4*0.27) + (5*0.53) + (6*0.80)) * [0.27, 0.53, 0.80]
≈ [4, 5, 6] - (8.55) * [0.27, 0.53, 0.80]
≈ [4, 5, 6] - [2.29, 4.57, 6.86]
≈ [1.71, 0.43, -0.86]

Magnitude of temp2 = √(1.71^2 + 0.43^2 + 0.86^2) ≈ √(3.86) ≈ 1.96

Second column of Q (q2) = temp2 / 1.96 ≈ [0.87, 0.22, -0.44]

temp1 = q1 · v3

temp2 = v3 - temp1 x q1

temp3 = q2 · v3

temp4 = temp2 - temp3 x q2

q3 = temp4/||temp4||

v3 = [7, 8, 10] (the third column of A)

Component of v3 orthogonal to q1 and q2:
temp4 = v3 - (v3 · q1) * q1 - (v3 · q2) * q2
≈ [7, 8, 10] - (14.17) * [0.27, 0.53, 0.80] - (3.49) * [0.87, 0.22, -0.44]
≈ [7, 8, 10] - [3.79, 7.57, 11.36] - [3.05, 0.76, -1.52]
≈ [0.17, -0.33, 0.17]

Magnitude of temp4 = √(0.17^2 + 0.33^2 + 0.17^2) ≈ √(0.17) ≈ 0.41

Third column of Q (q3) = temp4 / 0.41 ≈ [0.41, -0.82, 0.41]

Now, R can be calculated using the relation A = QR, i.e. R = QᵀA (since Q is orthogonal, Qᵀ = Q⁻¹). As both A and Q are known, this is just matrix multiplication.
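In practice you would call a library routine. Here is a minimal NumPy check on the example matrix (note that np.linalg.qr may return columns with flipped signs relative to a hand-worked Gram-Schmidt, which is still a valid QR):

import numpy as np

A = np.array([[1., 4., 7.],
              [2., 5., 8.],
              [3., 6., 10.]])

Q, R = np.linalg.qr(A)

print(np.allclose(Q @ R, A))            # True: A = Q x R
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: columns of Q are orthonormal
print(np.allclose(np.tril(R, -1), 0))   # True: R is upper triangular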

Cholesky Decomposition

It decomposes the actual matrix such that

A = L x Lᵀ

where L is a Lower triangular matrix. To find such a lower triangular matrix, the below formulas are used

L₁₁ = √A₁₁

Lᵢ₁ = Aᵢ₁/L₁₁ (1st column of L)

Lᵢᵢ = √(Aᵢᵢ - Σₚ Lᵢₚ²), p = 1 → i-1

so, for example, L₃₃ = √(A₃₃ - (L₃₁² + L₃₂²))

Lᵢⱼ = (1/Lⱼⱼ)(Aᵢⱼ - Σₚ Lᵢₚ * Lⱼₚ), i > j, p = 1 → j-1

for example, L₄₃ = (1/L₃₃)(A₄₃ - (L₄₁*L₃₁ + L₄₂*L₃₂))

Where

L = Lower triangular matrix

A = Actual matrix

Note: Don’t fill the upper triangle of L. Use these formulae for the lower triangle only

As it is just a formula-driven method, I am skipping the worked example for now
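Still, the formulas translate almost line by line into code. Here is a minimal NumPy sketch (the function name cholesky and the example matrix are mine; np.linalg.cholesky is the library equivalent):

import numpy as np

def cholesky(A):
    # Cholesky decomposition of a symmetric positive definite matrix: A = L x Lᵀ
    n = A.shape[0]
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i, p] * L[j, p] for p in range(j))
            if i == j:
                L[i, i] = np.sqrt(A[i, i] - s)     # diagonal formula
            else:
                L[i, j] = (A[i, j] - s) / L[j, j]  # off-diagonal formula, i > j
    return L

A = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])
L = cholesky(A)
print(np.allclose(L @ L.T, A))  # True: L x Lᵀ recovers A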

We have discussed a lot about algorithms, but when to use which algorithm is still a mystery. Let’s decode that quickly

  • LU Decomposition: Square matrices.
  • Eigenvalue Decomposition: Square matrices that are diagonalizable.
  • Cholesky Decomposition: Symmetric and positive definite matrices.
  • QR Decomposition: Any type of matrix.
  • Singular Value Decomposition (SVD): Any type of matrix.

With this, it's a wrap!!
