Singular Value Decomposition — Explained with code
What it is intuitively, with numbers to illustrate
How is SVD relevant to AI and Data Science? Well, in many ways. Its scope ranges from dimensionality reduction (and, by extension, denoising and data compression), to collaborative filtering for recommender systems, to image processing.
Let’s focus on the fundamentals here, and consider this math problem to begin with.
James and Damien are divers. At the start of the dive, the air in Damien’s tank is 95% that which James has. At the end of the dive, the amount of air (as measured by pressure) in James’ tank is three quarters that which Damien has. Damien used up 110 bar worth of air, while James used seven-tenths of the original amount of air. What is the remaining amount of air (in bar) within Damien’s tank?
All you need to do is form a pair of equations and solve them simultaneously. Here, we can let the two variables represent the air originally in James’ tank and Damien’s tank, respectively. The solution, by code, is
import numpy as np

# x: Air pressure originally in James' tank
# y: Air pressure originally in Damien's tank
# (Eqn 1) y = 0.95x                   -> 19x - 20y = 0
# (Eqn 2) (y - 110)*3/4 = (1 - 0.7)x  -> 6x - 15y = -1650
coefficients = np.array([
    [19, -20],
    [6, -15]
])
constants = np.array([
    [0],
    [-1650]
])
np.linalg.solve(coefficients, constants)
We get 200 and 190: James started with 200 bar of air and Damien with 190 bar, so the answer to the question is 190 − 110 = 80 bar remaining in Damien’s tank. Straightforward. Now, what if we want to apply SVD here?
A = np.array([
[19,-20,0],
[6,-15,1650]
])
U, S, VT = np.linalg.svd(A)
print(VT)
Actually, by selecting the last vector in V (equivalently, the last row of VT) and normalizing it such that its smallest-magnitude entry is 1, we also get the answer!
Here comes the important part. Why is this so?
Back To Basics
To set expectations, the basics I refer to here are undergraduate math basics. I will assume that you remember your high school material, such as what a diagonal matrix or identity matrix is, what the transpose or the inverse of a given matrix is, and things like that.
Eigendecomposition
Apart from being viewed as a system of linear equations, matrices can also be viewed as performing transformations. Take, for example, the matrix A = [[2, 1], [1, 2]].
What happens if we apply A onto a point [-3, 4]?
We get [-2, 5] as a result. Therefore, the blue point can be viewed as being transformed by A to the red point (below).
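A minimal check of this transformation, using the matrix A stated above:

```python
import numpy as np

# A maps the blue point [-3, 4] to the red point [-2, 5]
A = np.array([
    [2, 1],
    [1, 2]
])
point = np.array([-3, 4])
print(A @ point)  # [-2  5]
```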
Through eigendecomposition, we can express A as a product of three matrices, A = QΛQ⁻¹ (which, since our A is symmetric, equals QΛQᵀ). This shouldn’t come as a surprise, given that one single transformation can be expressed as a series of multiple transformations applied sequentially.
The above components can be obtained using
A = np.array([
[2, 1],
[1, 2]
])
eigenvalues, eigenvectors = np.linalg.eig(A)
# To check
# eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
Q is actually composed of the eigenvectors of A, arranged as its columns, while Λ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues of A. What’s the mathematical and physical relevance of these? Read on to find out.
Eigenvalues & Eigenvectors
Eigenvectors are vectors that, when transformed by A itself, change in magnitude but not direction. These eigenvectors can be thought of as the “directions” along which the action of A is purely a scaling (by the corresponding eigenvalue). For example, when the transformation A is applied to the first eigenvector e₁, we get λ₁e₁ where λ₁=3 is the corresponding eigenvalue.
Visually, we have:
The same is true for the other eigenvector e₂, although its corresponding eigenvalue λ₂ is 1, hence after transformation we get e₂ itself.
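This scaling behavior can be checked directly. A minimal sketch, using the same A = [[2, 1], [1, 2]] as above (note that NumPy does not guarantee the order in which the eigenpairs are returned):

```python
import numpy as np

# Verify A e = λ e for each eigenpair of the symmetric matrix above
A = np.array([
    [2, 1],
    [1, 2]
])
eigenvalues, eigenvectors = np.linalg.eig(A)
for lam, e in zip(eigenvalues, eigenvectors.T):
    # Applying A only scales the eigenvector by its eigenvalue
    print(lam, np.allclose(A @ e, lam * e))  # each line ends with True
```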
Each eigenvector has a norm of 1, and is orthogonal to the other eigenvectors (i.e. the dot product between them is 0); this is guaranteed here because A is symmetric. Each eigenvector can therefore be treated as a ‘principal component’ of the space, as opposed to using the standard basis vectors i, j, etc. Furthermore, Q is an orthogonal matrix, and Q multiplied with its transpose gives the identity matrix.
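These properties can be verified in a few lines (a quick check, assuming the symmetric A = [[2, 1], [1, 2]] used in this section):

```python
import numpy as np

# Check the orthogonality claims for Q, the eigenvector matrix of A
A = np.array([
    [2, 1],
    [1, 2]
])
_, Q = np.linalg.eig(A)
print(np.allclose(np.linalg.norm(Q, axis=0), 1.0))  # True: unit-norm columns
print(np.isclose(Q[:, 0] @ Q[:, 1], 0.0))           # True: orthogonal pair
print(np.allclose(Q @ Q.T, np.eye(2)))              # True: Q is orthogonal
```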
Let’s go back to our original matrix A, which transformed the point [-3, 4] into [-2, 5].
Using the eigendecomposition performed earlier, we have
What happens here is that the point [-3,4] is being rotated (by Qᵀ), then scaled (by Λ), then rotated again (by Q), to become [-2,5]. Essentially, that is the transformation caused by A.
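The rotate–scale–rotate sequence can be traced step by step. A sketch, again assuming the A = [[2, 1], [1, 2]] from the eigendecomposition above:

```python
import numpy as np

# Applying Q.T, then the diagonal scaling, then Q reproduces A @ [-3, 4]
A = np.array([
    [2, 1],
    [1, 2]
])
eigenvalues, Q = np.linalg.eig(A)
p = np.array([-3, 4])

step1 = Q.T @ p                       # rotate into the eigenbasis
step2 = np.diag(eigenvalues) @ step1  # scale along each eigenvector
step3 = Q @ step2                     # rotate back
print(step3)  # ≈ [-2, 5]
```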
Moving on to SVD
Earlier, we worked with A, which is both square and symmetric. Obviously, we do not always get to work with such matrices. In fact, the simple example at the beginning of this article is one such case, where we have a 2-by-3 matrix.
What can be done is to extract the left singular vectors and right singular vectors of that matrix. First, we have to obtain a symmetric square matrix, which can be done simply by multiplying the matrix with its transpose (giving A Aᵀ, or Aᵀ A).
(Note: A is now used to represent a different matrix, namely the 2-by-3 matrix from the diving problem at the start. It is neater that way, as compared to using a different letter each time a new example is introduced. This is not a Python script.)
Using Python (no data scientist would compute it by hand; what matters is understanding how to get it, and the associated significance), we can get the eigenvectors and eigenvalues of the above symmetric square matrices. Be careful with the terminology here: the outputs below are not the eigenvectors of A, but those of np.matmul(A, A.T) and np.matmul(A.T, A) respectively.
eigenvalues_L, eigenvectors_L = np.linalg.eig(np.matmul(A, A.T))
eigenvalues_R, eigenvectors_R = np.linalg.eig(np.matmul(A.T, A))
Now, compare these against the ‘all-in-one’ step using np.linalg.svd below, and you will notice that we end up with the same results, with the caveat that some columns may differ by a factor of -1, and/or the order may be switched. This is normal, because flipping the direction of an eigenvector changes neither its physical significance nor anything about what it means to be an eigenvector. Also, the eigenvectors obtained from np.linalg.eig may not be sorted by their eigenvalues, unlike those from np.linalg.svd.
U, S, VT = np.linalg.svd(A)
print("U: ", U)
print("V: ", VT.T)
vector = VT[-1]
# Scale so that the smallest-magnitude entry becomes 1. Dividing by the last
# entry (rather than np.min) keeps this robust to a sign flip of VT[-1].
vector / vector[-1]
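The link between the two routes can also be stated numerically: the singular values returned by np.linalg.svd are the square roots of the eigenvalues of A Aᵀ (equivalently, of the nonzero eigenvalues of Aᵀ A). A minimal sketch, using the 2-by-3 matrix from the diving problem:

```python
import numpy as np

# Singular values of A vs. eigenvalues of A A^T
A = np.array([
    [19, -20, 0],
    [6, -15, 1650]
])
U, S, VT = np.linalg.svd(A)
eigenvalues_L, _ = np.linalg.eig(A @ A.T)

# Sort both descending before comparing, since np.linalg.eig
# does not guarantee any particular order
print(np.allclose(np.sort(S**2)[::-1], np.sort(eigenvalues_L)[::-1]))  # True
```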
More information can be found in the documentation, of course: https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html. The documentation of popular libraries such as NumPy and PyTorch comes with good examples, and there really is no reason for anyone who wants to start learning not to read it. There is no shortcut or magic pill.
And there you have it. The components obtained from SVD are actually singular vectors, which are really just the eigenvectors of the symmetric square matrices obtained from multiplying A by its own transpose: A Aᵀ gives the left singular vectors, and Aᵀ A gives the right singular vectors.
From the last output above, we can see that taking the last row of VT gives us one of the right singular vectors, namely v₃. If we scale this vector by its smallest-magnitude entry, we get [200, 190, 1]. Basically, it leads to the same answer given by np.linalg.solve(coefficients, constants) (refer to the very first code block in this article). This happens because v₃ is associated with the null space of A, in which lies the solution to the system of linear equations.
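This null-space claim is easy to check: since the 2-by-3 matrix A has rank 2, its third singular value is 0, and the corresponding right singular vector v₃ is sent to the zero vector by A. A minimal sketch:

```python
import numpy as np

# v3 lies in the null space of A, and rescaling it recovers the solution
A = np.array([
    [19, -20, 0],
    [6, -15, 1650]
])
U, S, VT = np.linalg.svd(A)
v3 = VT[-1]
print(np.allclose(A @ v3, 0))  # True: A v3 is (numerically) zero
print(v3 / v3[-1])             # ≈ [200, 190, 1]
```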
Conclusion
The key takeaway here is that you now know how to use np.linalg to solve a system of linear equations, as well as appreciate that we can decompose an arbitrary matrix using SVD, and that the resulting components possess interesting mathematical properties.
Disclaimer: All opinions and interpretations are that of the writer, and not of MITB. I declare that I have full rights to use the contents published here, and nothing is plagiarized. I declare that this article is written by me and not with any generative AI tool such as ChatGPT. I declare that no data privacy policy is breached, and that any data associated with the contents here are obtained legitimately to the best of my knowledge. I agree not to make any changes without first seeking the editors’ approval. Any violations may lead to this article being retracted from the publication.