# Foundations of Machine Learning: Singular Value Decomposition (SVD)

Linear algebra is fundamental to many areas of machine learning, and one of its most important concepts is the Singular Value Decomposition (SVD). The motivation behind this article is to help software engineers improve their basic understanding of SVD and its real-world applications.

Singular Value Decomposition (SVD) is one of the most widely used unsupervised learning algorithms. It sits at the center of many recommendation and dimensionality reduction systems that are the core of global companies such as Google, Netflix, Facebook, YouTube, and others.

Specifically, for this article, we shall be looking at a movie recommendation system. But before that, let’s see how SVD works.

In simple terms, SVD is the factorization of a matrix into 3 matrices. So if we have a matrix **A**, then its SVD is represented by:

A = U𝚺Vᵀ

where **A** is an (m x n) matrix, **U** is an (m x m) orthogonal matrix, **𝚺** is an (m x n) nonnegative rectangular diagonal matrix, and **V** is an (n x n) orthogonal matrix.

The columns of **U** are also referred to as the left singular vectors, the diagonal entries of **𝚺** as the singular values, and the columns of **V** as the right singular vectors.
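As a quick sketch of this factorization, NumPy's `np.linalg.svd` returns the three factors directly (the matrix entries below are arbitrary, chosen only for illustration):

```python
import numpy as np

# An arbitrary example matrix with m = 3 rows and n = 2 columns
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# full_matrices=True returns U (m x m), the singular values, and V^T (n x n)
U, S, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the (m x n) rectangular diagonal matrix Sigma from the singular values
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, S)

# The three factors multiply back to the original matrix
print(np.allclose(A, U @ Sigma @ Vt))  # True
```

Note that `np.linalg.svd` returns **V**ᵀ (already transposed), not **V**, and the singular values as a 1-D array rather than as the rectangular matrix **𝚺**.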

So first let’s see how this comes about and then we’ll look at an example:

Imagine a circle in two dimensions represented by vectors `V1` and `V2` undergoing a matrix transformation, as illustrated on the Cartesian coordinates below:

**After Matrix Multiplication**

From the images above, you can tell that when a matrix multiplies a vector, it simply stretches it and then rotates it.

So if we generalize this from just two dimensions to n dimensions, the unit vectors v₁, v₂, …, vₙ span the vector space before the multiplication, and after the multiplication each one is stretched and rotated:

Avᵢ = σᵢuᵢ

where each σᵢ is an individual stretching factor and each uᵢ is the resulting unit direction.

Collecting these equations for all i, we can write them more generally as:

AV = U𝚺

where **𝚺** represents the space of all stretching factors (σ's).
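This stretching relation can be checked numerically. Here is a minimal sketch with NumPy, using an arbitrary 2 x 2 example matrix:

```python
import numpy as np

# An arbitrary 2x2 matrix, chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

U, S, Vt = np.linalg.svd(A)
V = Vt.T

# Each right singular vector v_i is stretched by sigma_i and rotated onto u_i
for i in range(len(S)):
    assert np.allclose(A @ V[:, i], S[i] * U[:, i])

# Equivalently, in matrix form: A V = U Sigma
print(np.allclose(A @ V, U @ np.diag(S)))  # True
```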

But for an orthogonal matrix, the inverse is simply the transpose:

V⁻¹ = Vᵀ

Also, note that the product of a matrix and its inverse is the **identity matrix** (a diagonal matrix with 1's on the diagonal). This concept can be represented by the equation below:

VV⁻¹ = VVᵀ = I
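Both properties are easy to verify on the **V** returned by an SVD; a small sketch with an arbitrary example matrix:

```python
import numpy as np

# An arbitrary invertible matrix, for illustration only
A = np.array([[4.0, 0.0],
              [3.0, -5.0]])

U, S, Vt = np.linalg.svd(A)
V = Vt.T

# V is orthogonal: V V^T equals the identity matrix
print(np.allclose(V @ V.T, np.eye(2)))             # True

# Equivalently, the transpose of V is its inverse
print(np.allclose(V.T, np.linalg.inv(V)))          # True
```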

Combining the above three equations leads us to the **Reduced Singular Value Decomposition**:

A = U𝚺Vᵀ

where **V**ᵀ is a rotation, 𝚺 a stretching, and **U** another rotation. Also, the columns of **U** are the principal axes while the diagonal entries of 𝚺 are the singular values.

So this is how you can decompose a matrix into three simpler matrices.

Let’s look at a classical application of this. Imagine that we have a matrix A whose columns represent movies and whose rows represent different users. The entries of the matrix are numbers from 0 to 5, where 0 means a user does not like a certain movie and 5 means they really like it, as illustrated below:

Now imagine that the first 3 columns are the movies Avengers, StarWars, and IronMan respectively (sci-fi movies), while the last 2 columns are the movies Titanic and Notebook (romance movies).

After performing SVD on matrix A, we get the matrices U, 𝚺, and V as illustrated below (using a tool such as sklearn):
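Since the article's exact ratings come from an image, here is a hedged sketch using a hypothetical ratings matrix with the same structure (rows are users; the first three columns are sci-fi movies, the last two romance):

```python
import numpy as np

# Hypothetical ratings standing in for the matrix pictured in the article:
# 5 users x 5 movies, entries from 0 to 5
A = np.array([[5, 5, 4, 0, 1],
              [4, 5, 5, 0, 0],
              [5, 4, 5, 1, 0],
              [0, 1, 0, 5, 4],
              [1, 0, 0, 4, 5]], dtype=float)

# full_matrices=False gives the reduced (economy-size) factorization
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values come back sorted from largest to smallest
print(np.round(S, 1))
```

With data like this, the first two singular values dominate, matching the two underlying movie categories.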

**Let’s take a closer look at these three matrices starting with U:**

So the first column of U represents weights that would match each user’s preference for movies categorized under sci-fi, while the second column of U represents weights that would match each user’s preference for movies under the romance category. For example, the first user greatly prefers sci-fi movies (0.13 score) compared to romance (0.02 score). As for the third column, we won’t consider it for now.

**And for 𝚺,**

The first diagonal entry represents the weight of the Sci-Fi category and the second diagonal entry represents the weight of the romance category.

**And for V,**

The columns depict the degree to which a movie belongs to a category. So, for example, we can see from the first column of **V** that the first movie (this would be Avengers) belongs heavily to the sci-fi category (0.56 score) and very little to the romance category (0.12 score).

Note: we have not considered the third dimension of each matrix at all. This is because when you look at matrix 𝚺, the third diagonal entry, which represents the weight of a third movie category, has a small value (1.3 score). This is understandable because we only have two categories of movies, so most of the third dimension can be considered noise.

So it’s this observation that we use to perform dimensionality reduction on the matrix **A**. We do this by eliminating the third dimension of 𝚺, which also means eliminating the third column of **U** and the third row of **V**, to produce the following new **U**, **𝚺**, and **V**:

So as you can see, the final matrices U, 𝚺, and V are smaller than the initial ones, since we have eliminated the third dimension.
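The truncation step above can be sketched as follows, again with the hypothetical ratings matrix (the article's real numbers come from an image):

```python
import numpy as np

# Hypothetical ratings matrix: users x movies (3 sci-fi, 2 romance columns)
A = np.array([[5, 5, 4, 0, 1],
              [4, 5, 5, 0, 0],
              [5, 4, 5, 1, 0],
              [0, 1, 0, 5, 4],
              [1, 0, 0, 4, 5]], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # keep only the two dominant "categories"
B = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# B has the same shape as A but only rank 2
print(B.shape)  # (5, 5)
```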

To confirm that eliminating the given rows and columns as we have done only affects the initial matrix A to a small extent, let’s multiply the above three matrices to get matrix B below:

So let’s compare this matrix B with the original matrix **A** below:

Just by looking at the above two matrices, you can tell that the difference between their elements is very small; in other words, the product of our final three matrices, **B** (after SVD), ≈ **A** (before SVD).

Mathematically, this closeness can be measured with the Frobenius norm, which is the square root of the sum of the squares of the differences between the individual matrix entries. It can be represented by the equation below:

‖A − B‖F = √( Σᵢⱼ (Aᵢⱼ − Bᵢⱼ)² )
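The Frobenius norm formula can be computed directly from the entry-wise differences, or with NumPy's built-in norm; a small sketch, reusing the hypothetical ratings matrix from before:

```python
import numpy as np

# Hypothetical ratings matrix and its rank-2 approximation
A = np.array([[5, 5, 4, 0, 1],
              [4, 5, 5, 0, 0],
              [5, 4, 5, 1, 0],
              [0, 1, 0, 5, 4],
              [1, 0, 0, 4, 5]], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :2] @ np.diag(S[:2]) @ Vt[:2, :]

# The formula from the text: sqrt of the sum of squared entry differences
manual = np.sqrt(np.sum((A - B) ** 2))

# NumPy's built-in Frobenius norm gives the same value
builtin = np.linalg.norm(A - B, 'fro')

print(np.isclose(manual, builtin))  # True
```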

So this is how we are able to decompose a matrix into lower rank matrices without losing much of the important data.

It also helps us analyse the matrix and extract important information from its data.

There are many other applications of SVD beyond those covered in this article, including data compression and computing the pseudo-inverse. Search engines like Google use SVD to compute approximations of enormous matrices that provide compression ratios of millions to one, making searching for a term much quicker.

So hopefully this article has given you a clear picture of this fundamental linear algebra concept and its applications in machine learning.