Singular Value Decomposition: Calculation Using Eigenvalues and Eigenvectors in Python
I have gone through lots of articles explaining what PCA is and how to find the principal components of a feature matrix. Nearly all of them use eigenvectors and eigenvalues to arrive at the principal components. Then I read the documentation of the Python scikit-learn PCA class, which talks about Singular Value Decomposition (SVD) to find the principal components, and I was really confused about how SVD, eigenvalues, and eigenvectors relate to each other.
This article covers the definition of SVD, how the SVD problem is transformed into an eigenvalue problem, and how it is then solved using eigenvalues and eigenvectors. The article leans toward linear algebra, but we will also see the Pythonic way to find the eigenvalues, the eigenvectors, and the SVD. In the next article, we will focus on PCA and how the covariance matrix is used to get the decomposed matrices of SVD.
Singular Value Decomposition(SVD)
Theorem: The SVD theorem states that every matrix A can be decomposed as A = UΣV^T, a sequence of three elementary transformations: a rotation in the input space (V^T), a scaling (Σ), and a rotation in the output space (U),
where A is of size m×n, U is of size m×m, Σ is of size m×n, and V is of size n×n.
U and V are known as unitary matrices in linear-algebraic terms (for real matrices, unitary means orthogonal).
Geometric Interpretation of SVD:
U and V_transpose are unitary, or rotation, matrices, so they preserve the angles between vectors. Take the unit circle spanned by the vectors v1 and v2, the columns of V (they are orthogonal). First, V^T maps v1 and v2 onto the standard basis vectors e1 and e2; then Σ scales them to the orthogonal vectors σ1e1 and σ2e2; finally, U rotates them to σ1u1 and σ2u2. In other words, A·v_i = σ_i·u_i. The linear map A is fully defined by the transformation of these vectors, since every vector can be decomposed in the basis formed by the columns of V, and we know how a linear map acts on basis vectors.
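As a quick numerical illustration of this picture (a minimal sketch using an arbitrary 2×2 matrix B of my choosing, not the matrix A used later in the article), we can check that the matrix maps each right singular vector v_i to σ_i·u_i:
import numpy as np
# Arbitrary 2x2 matrix, chosen only to illustrate the geometric picture
B = np.array([[3.0, 1.0], [1.0, 2.0]])
U_b, s_b, Vt_b = np.linalg.svd(B)
for i in range(len(s_b)):
    v_i = Vt_b[i, :]   # i-th row of V^T is the i-th column of V
    u_i = U_b[:, i]
    print(np.allclose(B @ v_i, s_b[i] * u_i))   # expected: True for each i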
Orthogonal Properties of U and V:
The U and V matrices obtained from SVD are always orthogonal: they preserve lengths, preserve angles, and do not magnify errors. A matrix is said to be orthogonal when its transpose is equal to its inverse.
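A minimal sketch of these properties, using a 2D rotation matrix as the orthogonal matrix (the matrix and angle are arbitrary choices made just for this illustration):
import numpy as np
theta = np.pi / 6   # an arbitrary 30-degree rotation
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T @ Q, np.eye(2)))      # Q^T Q is the identity: True
print(np.allclose(Q.T, np.linalg.inv(Q)))   # transpose equals inverse: True
x = np.array([3.0, 4.0])
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # length preserved: True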
The numpy.linalg module has functions to calculate the SVD and the singular values directly for a given matrix, but with slightly different notation, which may sound confusing at first. When you call the svd() function, it returns U, Σ_array, and V_transpose, where U and V_transpose are the matrices defined in the SVD theorem, but Σ_array is a one-dimensional array containing the singular values (the square roots of the eigenvalues of A^TA), not the m×n matrix Σ. Therefore, when you multiply these returned objects directly, you don't get back the original matrix; most of the time you get an error, because their shapes are incompatible for matrix multiplication.
Consider the below example of SVD, where we take a matrix A of size 2×3, perform SVD using the NumPy library, and check the shapes of the decomposed matrices obtained:
import numpy as np
A = np.array([[1,4,1],[-4,-7,1]], dtype = float)
U, Σ_array, V_transpose = np.linalg.svd(A)
print("U shape:{}, Σ_array shape:{}, and V_transpose shape:{}".format(U.shape, Σ_array.shape, V_transpose.shape))
print("U is:\n{},\n Σ_array is:\n {},\n and V_transpose is:\n {}".format(U, Σ_array, V_transpose))
As stated, U and V are square and orthogonal matrices.
But when we multiply these returned objects to try to get back matrix A, we get an error, because the one-dimensional Σ_array cannot be matrix-multiplied between the 2×2 matrix U and the 3×3 matrix V_transpose:
U@Σ_array@V_transpose # @ performs matrix multiplication; this raises a ValueError due to incompatible shapes
The svd() function of the NumPy library does not directly return the Σ stated in the theorem, but we can build Σ from the returned Σ_array. We know Σ is an m×n matrix whose diagonal elements are the values of Σ_array. Therefore we first initialize Σ of size m×n with all zeros, then form the diagonal matrix from Σ_array and place it in the top-left min(m, n)×min(m, n) block of Σ.
m = A.shape[0]
n = A.shape[1]
Σ = np.zeros((m, n))  # m x n matrix of zeros
Σ[0: len(Σ_array), 0: len(Σ_array)] = np.diag(Σ_array)  # place the singular values on the diagonal of the top-left block
print(Σ)
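As a side note, if SciPy is installed, its scipy.linalg.diagsvd helper builds the same m×n matrix in one call (this is an optional alternative and an extra dependency, not part of the original workflow):
from scipy.linalg import diagsvd
Σ_scipy = diagsvd(Σ_array, m, n)   # same m x n Σ as constructed above
print(np.allclose(Σ, Σ_scipy))     # expected: True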
Now we can multiply U, Σ, and V_transpose and we will get back the original matrix A:
print("Matrix constructed from original matrix:\n {}".format(U@Σ@V_transpose))
print("Original Matrix:\n{}".format(A))
We have obtained these decomposed matrices but how do we calculate them?
We know that U and V are orthogonal matrices. Using this property, the SVD decomposition problem is transformed into an eigenvalue and eigenvector problem: since A = UΣV^T, we have AA^T = UΣV^T·VΣ^TU^T = UΣΣ^TU^T, so the columns of U are the eigenvectors of AA^T and the eigenvalues of AA^T are the squared singular values. In a similar way, A^TA = VΣ^TΣV^T, so the columns of V are the eigenvectors of A^TA. Therefore we can find the U and V matrices using eigenvalues and eigenvectors.
Calculating U:
# Note: the eigenvectors of AA^T (np.dot(A, A.T)) are rearranged in
# decreasing order of their eigenvalues to get the U of the SVD equation.
eigen_values_U, eigen_vectors_U = np.linalg.eig(np.dot(A, A.T)) # gives U
# Sorting eigen_values_U and eigen_vectors_U in descending order of eigen_values_U
idx = eigen_values_U.argsort()[::-1]
eigen_values_U = eigen_values_U[idx]
eigen_vectors_U = eigen_vectors_U[:,idx]
print("eigen_values_U received from AA^T:\n{}".format(eigen_values_U))
print("eigen_vectors_U received from AA^T:\n{} which is same as U obtained in SVD except in sign:\n{}".format(eigen_vectors_U, U))
Calculating V: the eigenvector matrix received from A^TA (eigen_vectors_V) is transposed to get the V_transpose stated in the SVD equation.
eigen_values_V, eigen_vectors_V = np.linalg.eig(np.dot(A.T, A)) # gives V
# Sorting eigen_values_V and eigen_vectors_V in descending order of eigen_values_V
idx = eigen_values_V.argsort()[::-1]
eigen_values_V = eigen_values_V[idx]
eigen_vectors_V = eigen_vectors_V[:,idx]
print("eigen_values_V received from A^TA:\n{}".format(eigen_values_V))
print("eigen_vectors_V received from A^TA:\n{}".format(eigen_vectors_V))
print("SVD return V_transpose, therefore taking transpose of eigen_vectors_V:\n{} which is same as V_transpose obtained in SVD:\n{}".format(eigen_vectors_V.T, V_transpose))
Calculating Σ:
Σ is a diagonal matrix containing the singular values on its diagonal. The nonzero eigenvalues of AA^T and A^TA are the same, and they are the squares of the singular values. Therefore, we take the square roots of these common eigenvalues, arranged in decreasing order:
# eigen_values_U and eigen_values_V are the squares of the singular values (Σ^2)
# take only the first min(m, n) of them; eigen_values_V and eigen_values_U give the same result
Σ_array_cal = np.sqrt(eigen_values_V[:min(m,n)])
print("Σ_array_cal:{}\nis as same as Σ_array received from SVD() function of numpy:{}".format(Σ_array_cal, Σ_array))
You can find the GitHub link here for the code used in the article.
In the next article, we will see how SVD is used in PCA and how we calculate the decomposed matrices of SVD using the covariance matrix. If you liked the article, don't forget to clap, comment, and follow me on Medium to stay tuned for future posts.