Singular Value Decomposition — Explained with code

What it is intuitively, with numbers to illustrate

James Koh, PhD
MITB For All
7 min read · Feb 6, 2024


How’s SVD relevant to AI and Data Science? Well, in many ways. Its scope ranges from dimensionality reduction (and by extension, denoising and data compression), to collaborative filtering for recommender systems, to image processing.

Let’s focus on the fundamentals here, and begin with the following math problem.

James and Damien are divers. At the start of the dive, the air in Damien’s tank is 95% that which James has. At the end of the dive, the amount of air (as measured by pressure) in James’ tank is three quarters that which Damien has. Damien used up 110 bar worth of air, while James used seven-tenths of the original amount of air. What is the remaining amount of air (in bar) within Damien’s tank?

All you need is to form a pair of equations and solve them simultaneously. Here, we let the two variables x and y represent the air originally in James’ tank and Damien’s tank, respectively. The solution, by code, is

import numpy as np

# x: Air pressure originally in James' tank
# y: Air pressure originally in Damien's tank
# (Eqn 1) y = 0.95x
# (Eqn 2) (y - 110) * 3/4 = (1 - 0.7)x
# Multiplying both equations by 20 to clear the decimals gives:
#   19x - 20y = 0
#    6x - 15y = -1650

coefficients = np.array([
    [19, -20],
    [6, -15]
])
constants = np.array([
    [0],
    [-1650]
])
np.linalg.solve(coefficients, constants)

We get x = 200 and y = 190, so Damien ends the dive with 190 - 110 = 80 bar remaining. Straightforward. Now, what if we want to apply SVD here?

# Homogeneous form: each equation becomes [a, b, c] @ [x, y, 1] = 0
#   19x - 20y +    0 = 0
#    6x - 15y + 1650 = 0
A = np.array([
    [19, -20, 0],
    [6, -15, 1650]
])
U, S, VT = np.linalg.svd(A)
print(VT)

Actually, by selecting the last right singular vector (the last column of V, i.e. the last row of VT) and scaling it such that its smallest-magnitude entry becomes 1, we also get the answer!

Here comes the important part. Why is this so?

Back To Basics

To set expectations, the basics I refer to here are undergraduate math basics. I will assume that you remember your high school material, such as what a diagonal matrix or identity matrix is, what the transpose or inverse of a given matrix is, and things like that.

Eigendecomposition

Apart from being viewed as systems of linear equations, matrices can also be viewed as performing transformations. Take, for example, the following 2-by-2 matrix A:

A = [[2, 1],
     [1, 2]]

(Unless otherwise specified, all equations and images here are by the author.)

What happens if we apply A to the point [-3, 4]?

[-3, 4] becomes transformed to [-2, 5].

We get [-2, 5] as a result. Therefore, the blue point can be viewed as being transformed by A to the red point (below).

Plotted using Matplotlib.
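As a quick check in code (a small sketch reusing the same A and point as above):

A = np.array([
    [2, 1],
    [1, 2]
])
point = np.array([-3, 4])
print(A @ point)   # [-2  5]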

Through eigendecomposition, we can express A as a product of three matrices. This shouldn’t come as a surprise, given that a single transformation can be expressed as a series of multiple transformations applied sequentially.

Decomposition of A: A = QΛQ⁻¹. Because A is symmetric, Q is orthogonal and Q⁻¹ = Qᵀ, so A = QΛQᵀ.

The above components can be obtained using

A = np.array([
    [2, 1],
    [1, 2]
])
eigenvalues, eigenvectors = np.linalg.eig(A)

# To check that the decomposition reconstructs A
# (eigenvectors.T serves as the inverse of Q here because A is symmetric):
# eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T

Q is actually composed of the eigenvectors of A, arranged as its columns, while Λ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues of A. What’s the mathematical and physical relevance of these? Read on to find out.

Eigenvalues & Eigenvectors

Eigenvectors are vectors that, when transformed by A, change in magnitude but not in direction. These eigenvectors can be thought of as the “directions” along which the action of A is purely a scaling (by the corresponding eigenvalue). For example, when the transformation A is applied to the first eigenvector e₁, we get λ₁e₁ where λ₁ = 3 is the corresponding eigenvalue.

Ae₁ = λ₁e₁ = 3e₁ (transformation of e₁ leading to λ₁e₁).

Visually, we have:

Transformation of e₁ leading to λ₁e₁.

The same is true for the other eigenvector e₂, although its corresponding eigenvalue λ₂ is 1, hence after transformation we get e₂ itself.
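We can confirm this numerically with the eigenpairs computed earlier. np.linalg.eig pairs eigenvalues[i] with the i-th column of eigenvectors, although the order in which the pairs appear is not guaranteed; the loop below pairs them correctly either way.

for i in range(2):
    e = eigenvectors[:, i]
    print(A @ e, eigenvalues[i] * e)   # the two arrays printed on each line match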

Each eigenvector returned here has a norm of 1 and, because A is symmetric, is orthogonal to the other eigenvector (i.e. their dot product is 0). Each eigenvector can therefore be treated as a ‘principal component’ direction of the space, as opposed to using the standard basis vectors i, j, etc. Furthermore, Q is an orthogonal matrix, and Q multiplied with its transpose gives the identity matrix.
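A quick check of these properties:

Q = eigenvectors
print(np.linalg.norm(Q, axis=0))   # each column has norm 1
print(np.round(Q @ Q.T, 10))       # identity matrix, since Q is orthogonal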

Let’s go back to our original matrix A, which transformed the point [-3, 4] into [-2, 5].

Equation from above replicated here for your convenience.

Using the eigendecomposition performed earlier, we have

Replacing A with its components from eigendecomposition.

What happens here is that the point [-3, 4] is first rotated (by Qᵀ), then scaled (by Λ), then rotated again (by Q), to become [-2, 5]. Essentially, that is the transformation caused by A.
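For illustration, here is that rotate-scale-rotate sequence carried out explicitly with the eigendecomposition computed earlier. This is a sketch; the intermediate values depend on how np.linalg.eig happens to order and sign the eigenvectors, but the final result does not.

Q = eigenvectors
Lam = np.diag(eigenvalues)

x = np.array([-3, 4])
step1 = Q.T @ x       # rotate into the eigenvector basis
step2 = Lam @ step1   # scale along each eigenvector direction
step3 = Q @ step2     # rotate back to the original basis
print(step3)          # [-2.  5.]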

Moving on to SVD

Earlier, we worked with A, which is both square and symmetric. Obviously, we do not always get to work with such matrices. In fact, the problem at the beginning of this article involved a 2-by-3 matrix.

What can be done is to extract the left singular vectors and right singular vectors of that matrix. First, we need a symmetric square matrix, which can be obtained simply by multiplying the matrix with its transpose. There are two such products: AAᵀ, which is 2-by-2, and AᵀA, which is 3-by-3.

Symmetric square matrices obtained by multiplying A with its transpose.

(Note: A is now used to represent a different matrix, namely the 2-by-3 matrix from the opening problem. It is neater that way, as compared to using a different letter each time a new example is introduced. This is not a Python script.)
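For concreteness, here is that 2-by-3 matrix in code, together with its two symmetric products:

A = np.array([
    [19, -20, 0],
    [6, -15, 1650]
])
print(np.matmul(A, A.T))   # 2-by-2 symmetric matrix
print(np.matmul(A.T, A))   # 3-by-3 symmetric matrix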

Using Python (no data scientist would compute it by hand; what matters is understanding how to get it, and its significance), we can get the eigenvectors and eigenvalues of the above symmetric square matrices. Be careful with terminology here: the outputs below are not the eigenvectors of A, but those of np.matmul(A, A.T) and np.matmul(A.T, A).

# Left singular vectors of A = eigenvectors of A @ A.T
eigenvalues_L, eigenvectors_L = np.linalg.eig(np.matmul(A, A.T))
# Right singular vectors of A = eigenvectors of A.T @ A
eigenvalues_R, eigenvectors_R = np.linalg.eig(np.matmul(A.T, A))

Now, compare these against the ‘all-in-one’ step using np.linalg.svd below and you will notice that we end up with the same results, with the caveat that some columns may differ by a factor of -1, and/or their order may be switched. This is normal: flipping the direction of an eigenvector changes neither its physical significance nor its status as an eigenvector. Also, the eigenvectors obtained from np.linalg.eig may not be sorted by their eigenvalues, unlike the output of np.linalg.svd, which sorts the singular values in descending order.

Components obtained from Singular Value Decomposition of a rectangular matrix.

U, S, VT = np.linalg.svd(A)
print("U: ", U)
print("V: ", VT.T)

# Last right singular vector (last row of VT) spans the null space of A
vector = VT[-1]
vector / vector[np.argmin(np.abs(vector))]   # scale so the smallest-magnitude entry becomes 1
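As an aside (not shown in the outputs above, but a standard property of SVD that is easy to verify): the singular values in S are the square roots of the eigenvalues of A @ A.T, and the three components reconstruct A exactly.

print(np.allclose(np.sort(S**2), np.sort(eigenvalues_L)))   # True
print(np.allclose(A, U @ np.diag(S) @ VT[:2]))              # True: A = U Σ Vᵀ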

More information can be found in the documentation, https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html, of course. The documentation of popular libraries such as NumPy and PyTorch comes with good examples, and there really is no reason for anyone who wants to start learning not to read it. There is no shortcut or magic pill.

And there you have it. The components obtained from SVD are actually singular vectors, which are really just the eigenvectors of the symmetric square matrices obtained by multiplying A with its own transpose: AAᵀ for the left singular vectors, and AᵀA for the right singular vectors.

From the last image above, we can see that taking the last row of VT gives us one of the right singular vectors, namely v₃. If we scale this vector so that its smallest-magnitude entry becomes 1, we get [200, 190, 1]. Basically, it leads to the same answer given by np.linalg.solve(coefficients, constants) (refer to the very first code block in this article). This happens because v₃ spans the null space of A: it is orthogonal to v₁ and v₂, which span the row space of A, so Av₃ = 0. Since we wrote the two equations in homogeneous form, any vector [x, y, 1] satisfying both equations lies in exactly that null space, and rescaling v₃ so that its last entry equals 1 recovers the solution.
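A quick verification that v₃ really lies in the null space:

v3 = VT[-1]
print(np.round(A @ v3, 10))   # effectively [0, 0], i.e. A @ v3 = 0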

Conclusion

The key takeaway here is that you now know how to use np.linalg to solve a system of linear equations, and that you can appreciate how an arbitrary matrix can be decomposed using SVD into components that possess interesting mathematical properties.

Disclaimer: All opinions and interpretations are that of the writer, and not of MITB. I declare that I have full rights to use the contents published here, and nothing is plagiarized. I declare that this article is written by me and not with any generative AI tool such as ChatGPT. I declare that no data privacy policy is breached, and that any data associated with the contents here are obtained legitimately to the best of my knowledge. I agree not to make any changes without first seeking the editors’ approval. Any violations may lead to this article being retracted from the publication.
