Matrix Factorization algorithms explained with an example
LU Decomposition, Eigenvalue Decomposition, SVD, Cholesky Decomposition, and QR Decomposition
Matrix Decomposition is an essential concept in mathematics used for numerous purposes. This blog is a walk through major Matrix Factorization algorithms and the maths behind them.
Before jumping onto the algorithms, let’s first understand
What is Matrix Factorization?
Mathematically speaking, given an N x M matrix, matrix factorization decomposes it into multiple matrices such that their matrix multiplication gives back the original N x M matrix. It finds applications in numerous fields:
- Collaborative Filtering for Recommendations
- Dimensionality Reduction
- Image Compression and Denoising
- Text Mining and Topic Modeling, etc
We will be talking about the below algorithms in this post
LU Decomposition
Singular Value Decomposition
QR Decomposition
Cholesky Decomposition
Eigenvalue Decomposition
LU Decomposition
LU stands for Lower-Upper triangular Decomposition. Hence, the 2 matrices into which the (square) matrix is decomposed are a Lower triangular and an Upper triangular matrix.
A = L x U
LU Decomposition uses Gaussian Elimination to get to these matrices.
What is Gaussian Elimination/Row Reduction?
It aims at converting a matrix into an Upper Triangular matrix by applying linear row operations like
Row2 = Row2 - 3 * Row3
Row1 = Row1 + Row2/5
Do remember these operations shouldn't include non-linear transformations like square-root or power()
The process is simple
Let A be the matrix to be factorized
A = | 3 1 2 |
| 6 3 4 |
| 3 1 5 |
Initialize two matrix L= Identity matrix and U=copy of Actual matrix to be decomposed
L = | 1 0 0 |
| 0 1 0 |
| 0 0 1 |
U = | 3 1 2 |
| 6 3 4 |
| 3 1 5 |
Convert U into an Upper triangular matrix using Gaussian Elimination, saving the multipliers used in the transformations in L. Assume we will be applying the below transformations to U
- Row2 = Row2 - 2 * Row1
Here, multiplier = 2 is used to convert U₂₁ to 0, hence L₂₁ = 2
- Row3 = Row3 - 1 * Row1
As multiplier = 1 is used to convert U₃₁ to 0, L₃₁ = 1
Note: As we need to convert U into an Upper Triangular matrix, we need to convert just 3 values to 0 in a 3x3 matrix: U₂₁, U₃₁, U₃₂. Hence we will get 3 multipliers, which will be used to fill L.
L = | 1 0 0 |
| 2 1 0 |
| 1 0 1 |
U = | 3 1 2 |
| 0 1 0 |
| 0 0 3 |
As U₃₂ is already 0, L₃₂ = 0. Otherwise, we would have performed a transformation on Row3 using Row2. Why Row2? Because the 1st element of Row2 is already 0 by now, so any transformation from Row2 to Row3 won't affect Row3's already-zeroed 1st element, whereas a transformation from Row1 could disturb already-zeroed elements.
Therefore the final matrices LU are
L = | 1 0 0 |
| 2 1 0 |
| 1 0 1 |
U = | 3 1 2 |
| 0 1 0 |
| 0 0 3 |
You can cross-check the results by multiplying L x U; you will get back A.
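The walkthrough above can be sketched in code. Below is a minimal Doolittle-style LU routine (an assumption on my part: no pivoting, so it requires non-zero pivots), applied to the same example matrix:

```python
import numpy as np

def lu_decompose(A):
    """Doolittle LU decomposition without pivoting.
    Assumes every pivot encountered is non-zero."""
    n = A.shape[0]
    L = np.eye(n)                # start with the identity
    U = A.astype(float).copy()   # start with a copy of A
    for j in range(n):               # column being zeroed
        for i in range(j + 1, n):    # rows below the pivot
            multiplier = U[i, j] / U[j, j]
            U[i, :] -= multiplier * U[j, :]  # Gaussian elimination step
            L[i, j] = multiplier             # save the multiplier in L
    return L, U

A = np.array([[3, 1, 2],
              [6, 3, 4],
              [3, 1, 5]])
L, U = lu_decompose(A)
print(np.allclose(L, [[1, 0, 0],
                      [2, 1, 0],
                      [1, 0, 1]]))  # True
print(np.allclose(U, [[3, 1, 2],
                      [0, 1, 0],
                      [0, 0, 3]]))  # True
print(np.allclose(L @ U, A))        # True
```

Note that library routines such as scipy.linalg.lu additionally use row pivoting for numerical stability, so their L and U can differ from this hand-computed pair.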
Eigenvalue/Spectral Decomposition
Before moving ahead, we need to know a few baseline concepts
Eigenvector = For a square matrix A, an eigenvector is a non-zero vector that, when multiplied by A, only changes in scale (magnitude) but retains its direction.
Eigenvalue = The corresponding scale factor associated with an eigenvector is called the eigenvalue.
You must have read about Eigenvalues and Eigenvectors while reading about PCA
The relation between eigenvectors, eigenvalues, and the matrix A is represented by the below mathematical equations
A * v = λ * v, or
A*v - λ*v = 0, or
(A - λI) * v = 0
Where:
A: Square matrix
v: Eigenvector
λ (lambda): Eigenvalue
I: Identity matrix
To perform eigenvalue decomposition, we first need to calculate the eigenvalues with the below steps, given a square matrix A of size n × n.
A = | 4 1 2 |
| 1 5 6 |
| 2 6 7 |
Form the matrix (A - λI), where λ is the unknown eigenvalue and I is the n × n identity matrix.
A - λI = | 4-λ 1 2 |
| 1 5-λ 6 |
| 2 6 7-λ |
Compute the determinant of the matrix (A - λI). This will contain λ in the equation.
Below is the determinant formula for a 3 x 3 matrix X; the formula changes with the dimension of the matrix.
X = | a b c |
| d e f |
| g h i |
det(X) = a(ei - fh) - b(di - fg) + c(dh - eg)
If we set the determinant to 0, we get an equation to solve for λ.
(4-λ)[(5-λ)(7-λ) - 6*6] - 1[1*(7-λ) - 6*2] + 2[1*6 - 2*(5-λ)] = 0
Once you solve for λ, substitute the λ values into (A - λI) * v = 0 to calculate v, i.e., the eigenvectors.
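In practice, a linear algebra library solves the characteristic polynomial for you. A quick numpy check on the example matrix (using np.linalg.eigh, which is intended for symmetric matrices like this one):

```python
import numpy as np

A = np.array([[4, 1, 2],
              [1, 5, 6],
              [2, 6, 7]])

# Eigenvalues/eigenvectors of the symmetric example matrix
eigenvalues, eigenvectors = np.linalg.eigh(A)  # columns are eigenvectors

# Each eigenpair satisfies A v = lambda v
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))  # True for every pair

# Spectral decomposition: A = V diag(lambda) V^T
V, Lam = eigenvectors, np.diag(eigenvalues)
print(np.allclose(V @ Lam @ V.T, A))  # True
```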
Singular Value Decomposition
Where LU Decomposition decomposes the actual matrix into 2 matrices, SVD decomposes it into 3 matrices, extending the idea of Eigenvalue decomposition. A couple of concepts we must know to understand SVD:
Singular values = square roots of the eigenvalues (of AᵀA)
Singular vectors = normalized eigenvectors (of AᵀA)
SVD decomposition is represented as:
A = U x Σ x Vᵀ
Where:
A: The original matrix
U: Left singular vectors matrix
Σ: The singular values matrix (diagonal matrix)
V: Right singular vectors matrix
The steps are simple
Calculate Aᵀ x A. Let's call this A⁻
Calculate eigenvalues and eigenvectors for A⁻ (as shown above in Eigenvalue decomposition)
Calculate singular values for A⁻ as the square roots of the eigenvalues. These values will be used to form Σ (diagonal matrix)
Calculate singular vectors for A⁻ by normalizing the calculated eigenvectors.
Now,
V = matrix of singular vectors of A⁻
Σ = diagonal matrix with distinct singular values
To calculate U, we will use the relation A = U x Σ x Vᵀ
As A, Σ, and Vᵀ are known, U can be calculated by U = A x V x Σ⁻¹ (inverting Σ is easy since it is diagonal).
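The steps above can be sketched with numpy on a small 2 x 2 example (a sketch under the assumption that A is square and invertible so that Σ⁻¹ exists; the example matrix here is my own, not from the text):

```python
import numpy as np

A = np.array([[2., 0.],
              [1., 2.]])

# Step 1: eigen-decompose A^T A (symmetric, so eigh applies)
eigvals, V = np.linalg.eigh(A.T @ A)

# Sort eigenpairs in descending order (the usual SVD convention)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Step 2: singular values = square roots of eigenvalues of A^T A
singular_values = np.sqrt(eigvals)
Sigma = np.diag(singular_values)

# Step 3: U = A V Sigma^{-1} (valid when all singular values are non-zero)
U = A @ V @ np.linalg.inv(Sigma)

print(np.allclose(U @ Sigma @ V.T, A))  # reconstruction holds: True
print(np.allclose(U.T @ U, np.eye(2)))  # U is orthonormal: True

# Cross-check against numpy's built-in SVD singular values
print(np.allclose(singular_values, np.linalg.svd(A, compute_uv=False)))  # True
```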
QR Decomposition
Similar to LU Decomposition, QR decomposition decomposes a matrix into 2 matrices, Q & R where
Q= Orthonormal matrix
R= Upper Triangular matrix
What is an Orthonormal matrix?
An orthonormal (also called orthogonal) matrix is a matrix in which the column vectors are mutually perpendicular (orthogonal) to each other, and each column vector has a length of 1 (unit vector). Consequently, the rows of such a matrix are also mutually perpendicular and have a length of 1.
Time for QR Decomposition mathematics
For a matrix A of size m x n,
A = | 1 4 7  |
    | 2 5 8  |
    | 3 6 10 |
(the columns of A must be linearly independent for the method below to work)
1. Initialize matrices Q and R.
Initialize an m x n matrix Q with zeros.
Initialize an n x n matrix R with zeros.
2. Perform Gram-Schmidt orthogonalization.
Jargon alert !!
Gram-Schmidt orthogonalization is a mathematical process used to transform a set of linearly independent vectors (v₁, v₂, …, vₙ) into a new set of orthogonal (perpendicular) vectors, which means each vector is at a right angle to the others.
How does it work?
The first vector, v₁, remains unchanged since there is no other vector to make it orthogonal to.
For the second vector, v₂, subtract the projection of v₂ onto v₁ from v₂ to make it orthogonal to v₁.
For the third vector, v₃, subtract the projections of v₃ onto v₁ and v₂ from v₃ to make it orthogonal to both v₁ and v₂.
For the fourth vector, v₄, subtract the projections of v₄ onto v₁, v₂, and v₃ from v₄ to make it orthogonal to all three previously transformed vectors.
And likewise for the remaining vectors.
Back to QR Decomposition,
1st column of Q (q1) = Normalized 1st column of A (actual matrix)
= v1/||v1||
v1 = [1, 2, 3] (the first column of A)
Magnitude of v1 = √(1² + 2² + 3²) = √14 ≈ 3.74
First column of Q: q1 = v1 / √14 ≈ [0.27, 0.53, 0.80]
For the 2nd column:
temp1 = q1 · v2 (dot product), where v2 = 2nd column of A
temp2 = v2 - temp1 x q1
q2 = temp2/||temp2||
v2 = [4, 5, 6] (the second column of A)
temp1 = q1 · v2 = (1*4 + 2*5 + 3*6)/√14 = 32/√14 ≈ 8.55
temp2 ≈ [4, 5, 6] - 8.55 * [0.27, 0.53, 0.80]
      ≈ [4, 5, 6] - [2.29, 4.57, 6.86]
      ≈ [1.71, 0.43, -0.86]
Magnitude of temp2 ≈ √(1.71² + 0.43² + 0.86²) ≈ 1.96
Second column of Q: q2 = temp2 / 1.96 ≈ [0.87, 0.22, -0.44]
For the 3rd column:
temp1 = q1 · v3
temp2 = v3 - temp1 x q1
temp3 = q2 · v3
temp4 = temp2 - temp3 x q2
q3 = temp4/||temp4||
v3 = [7, 8, 10] (the third column of A)
temp1 = q1 · v3 = (1*7 + 2*8 + 3*10)/√14 = 53/√14 ≈ 14.17
temp3 = q2 · v3 ≈ 7*0.87 + 8*0.22 + 10*(-0.44) ≈ 3.49
temp4 ≈ [7, 8, 10] - 14.17 * [0.27, 0.53, 0.80] - 3.49 * [0.87, 0.22, -0.44]
      ≈ [7, 8, 10] - [3.79, 7.57, 11.36] - [3.05, 0.76, -1.52]
      ≈ [0.17, -0.33, 0.17]
Magnitude of temp4 ≈ 0.41
Third column of Q: q3 = temp4 / 0.41 ≈ [0.41, -0.82, 0.41]
Now, R can be calculated using the relation A = QR, i.e., R = QᵀA (since Q is orthonormal, Q⁻¹ = Qᵀ). As both A and Q are known, this is just a matrix multiplication.
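The Gram-Schmidt procedure above can be sketched as a small function (a sketch assuming the columns of A are linearly independent; the full-rank 3 x 3 matrix here is my own example):

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR decomposition via classical Gram-Schmidt.
    Assumes the columns of A are linearly independent."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)         # current column of A
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # projection coefficient onto q_i
            v -= R[i, j] * Q[:, i]        # subtract the projection
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]             # normalize to get q_j
    return Q, R

A = np.array([[1., 4., 7.],
              [2., 5., 8.],
              [3., 6., 10.]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))             # True
print(np.allclose(Q.T @ Q, np.eye(3)))   # Q is orthonormal: True
print(np.allclose(np.triu(R), R))        # R is upper triangular: True
```

Library routines (np.linalg.qr) use Householder reflections instead of Gram-Schmidt for numerical stability, but the resulting factorization serves the same purpose.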
Cholesky Decomposition
It decomposes the actual matrix such that
A = L x Lᵀ
where L is a Lower triangular matrix. To find such a lower triangular matrix, the below formulas are used
L₁₁ = √A₁₁
Lᵢ₁ = Aᵢ₁ / L₁₁ (1st column of L)
Lᵢᵢ = √(Aᵢᵢ - Σₚ Lᵢₚ²), p = 1 → i-1
so, for example, L₃₃ = √(A₃₃ - (L₃₁² + L₃₂²))
Lᵢⱼ = (1/Lⱼⱼ)(Aᵢⱼ - Σₚ Lᵢₚ * Lⱼₚ), i > j, p = 1 → j-1
for example, L₄₃ = (1/L₃₃)(A₄₃ - (L₄₁*L₃₁ + L₄₂*L₃₂))
Where
L = Lower triangular matrix
A = Actual matrix
Note: Don’t fill the upper triangle of L. Use these formulae for the lower triangle only
As it is a purely formula-driven method, I am skipping the worked example for now.
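The formulas do translate directly into code, though. A minimal sketch (the symmetric positive definite example matrix is my own choice, not from the text):

```python
import numpy as np

def cholesky(A):
    """Cholesky decomposition A = L L^T.
    Assumes A is symmetric positive definite."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(i + 1):  # fill the lower triangle only
            s = sum(L[i, p] * L[j, p] for p in range(j))
            if i == j:
                L[i, i] = np.sqrt(A[i, i] - s)      # diagonal formula
            else:
                L[i, j] = (A[i, j] - s) / L[j, j]   # off-diagonal formula
    return L

A = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])
L = cholesky(A)
print(np.allclose(L @ L.T, A))                # True
print(np.allclose(L, np.linalg.cholesky(A)))  # matches numpy: True
```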
We have discussed a lot of algorithms, but when to use which one is still a mystery. Let's decode that quickly
LU Decomposition: Square matrices.
Eigenvalue Decomposition: Square matrices that are diagonalizable.
Cholesky Decomposition: Symmetric and positive definite matrices.
QR Decomposition: Any type of matrix
Singular Value Decomposition (SVD): Any type of matrix
With this, it's a wrap !!