Dimensionality Reduction: Singular Value Decomposition

Antriksh Singh
7 min read · Mar 27, 2019


What is dimensionality reduction about?

Before moving to the concepts of singular value decomposition, let us understand the idea of dimensionality reduction. We often have high-dimensional data, such as images or text, that is described using a very large number of values, and we would like to describe it in a simpler way. A simpler description allows us to compare, process, or even plot the data much more easily.

Let us assume we have a set of data points, and think of them as embedded in either a plane or a three-dimensional space. The points are not randomly scattered throughout the space; rather, they lie on, or are embedded in, a lower-dimensional subset of that plane or 3-dimensional space.

To understand this further, let us consider two scenarios.

In the first case, assume we have a set of data points in a 2-dimensional plane. The data is not randomly scattered through this plane but lies in a subspace of it. In the figure below, the data points are embedded along a line, so representing the data in this 2-dimensional plane simply amounts to saying where along the length of the line a given data point is located.

Similarly, in the second case, the data points are embedded in a 3-dimensional space and are not randomly scattered: all of them lie on a single plane that is embedded in this space.

The problem is that describing points which lie on such sub-planes using the full ambient coordinates does not give a true picture of how the data is actually spread. The goal, in some sense, is to find the axes of the subspace that effectively represent all the data we have been given. This lets us compress the dimensionality and reduce the size of the data representation.

Let us try to understand this with another example. Suppose we are given a table with millions of rows and thousands of columns. In such a table, every row represents a different data point and every column represents a different coordinate, or dimension.

Our goal here is to take this dataset and build a more compact, lower-dimensional representation. Remember, the idea is to keep all the rows (data points) and shrink the number of columns while preserving the richness of the dataset.

With dimensionality reduction we do incur a bit of error, since the data points do not lie exactly on the line but are scattered slightly around it. The game we are playing here is to get a smaller data representation while incurring only a little error.
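As a minimal sketch of this trade-off, assuming some synthetic 2-D points scattered around a line, we can use SVD (introduced below) to find the best-fit axis, keep one coordinate per point instead of two, and measure the error we incur:

```python
import numpy as np

# Hypothetical data: 100 two-dimensional points scattered around the line y = 2x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
points = np.column_stack([x, 2 * x + rng.normal(0, 0.3, 100)])

# Center the data and find the best-fit axis via SVD.
centered = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
axis = vt[0]  # direction of largest variance

# The reduced representation: one coordinate per point instead of two.
coords_1d = centered @ axis

# Reconstruct the 2-D points and measure the error we incurred.
reconstructed = np.outer(coords_1d, axis) + points.mean(axis=0)
error = np.linalg.norm(points - reconstructed)
print(f"reconstruction error for 100 points: {error:.3f}")
```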

Why Reduce Dimensions?

  • Discover hidden correlations: we would like to discover the latent dimensions along which the data varies.
  • Remove redundant and noisy features: we want to get rid of noisy or redundant columns in a large dataset. By removing them, we reduce the dimension of the overall matrix while still preserving the features that are most influential to the overall model.
  • Interpretation and visualization: it becomes easier to plot and understand the data once high-dimensional features are reduced to two or three axes.
  • Easier storage and processing of data: by reducing dimensionality we shrink the data size, making it easier to store, analyze, and process.

Data Dimensionality Reduction Technique: Singular Value Decomposition (SVD)

Let us look at how we can reduce the dimensionality of a matrix (dataset) using Singular Value Decomposition (SVD), starting with the mathematics behind it.

Mathematics Behind SVD

Consider an input data matrix of order m x n, i.e. a matrix with m rows and n columns. For instance, if we think of it as a document matrix, then every row represents a document and every column represents a word.

The idea behind SVD is to take this matrix of order m x n and represent it as a product of three matrices, A = UΣVᵀ, each of which has certain constraints associated with it:

  • A: Input data matrix — an m x n matrix (e.g. m documents, n terms)
  • U: Left singular vectors — an m x r matrix (m documents, r concepts)
  • Σ: Singular values — an r x r diagonal matrix (the strength of each ‘concept’), where r is the rank of matrix A
  • V: Right singular vectors — an n x r matrix (n terms, r concepts)

Note: Σ is a diagonal matrix whose non-zero entries are sorted in descending order.
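To make the shapes concrete, here is a minimal sketch using numpy’s np.linalg.svd on a small, made-up documents-to-terms matrix. One caveat: numpy returns V already transposed (as Vt), and it keeps min(m, n) singular values rather than truncating at the rank r.

```python
import numpy as np

# A hypothetical 6x4 documents-to-terms matrix (m = 6 documents, n = 4 terms).
A = np.array([[1, 1, 0, 0], [3, 3, 0, 0], [4, 4, 0, 0],
              [0, 0, 4, 4], [0, 0, 5, 5], [0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (6, 4) (4,) (4, 4)
print(np.round(s, 3))              # sorted in descending order; the trailing
                                   # values are ~0 because A has rank 2

# A is recovered (up to floating-point error) as U @ diag(s) @ V^T.
assert np.allclose(A, U @ np.diag(s) @ Vt)
```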

So far we have seen how matrix A can be represented mathematically as the product of three matrices; let us now look at it graphically.

So basically what we do in SVD is take a big matrix A and represent it as the product of a tall, thin matrix U, a small diagonal matrix Σ, and the transpose of a tall, thin matrix V.

Here is another graphical way of looking at it: we take each left singular vector, multiply it by its singular value and by the corresponding right singular vector, and add up the resulting rank-1 matrices, i.e. A = σ₁u₁v₁ᵀ + σ₂u₂v₂ᵀ + …
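The same identity can be checked numerically. A minimal sketch, reusing the hypothetical matrix from the previous snippet:

```python
import numpy as np

# Same hypothetical 6x4 matrix as in the previous snippet.
A = np.array([[1, 1, 0, 0], [3, 3, 0, 0], [4, 4, 0, 0],
              [0, 0, 4, 4], [0, 0, 5, 5], [0, 0, 2, 2]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A equals the sum of rank-1 "concept" layers sigma_i * u_i * v_i^T.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A, A_sum)
```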

SVD Theorem: Properties

The SVD theorem says that it is always possible to decompose a real matrix A into a product of three matrices U, Σ, and V. By a real matrix we mean that the values in A are real numbers, not complex numbers.

The theorem states the following:

  1. U, Σ, V: unique
  2. U, V: both matrices are column-orthonormal. Orthonormal means each column has Euclidean length 1; in other words, the sum of squared values in each column of U and of V equals 1. The columns are also orthogonal: the dot product of any two distinct columns of U (or of V) is zero
  3. Σ: diagonal — all the entries (singular values) are positive and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0); properties 2 and 3 are verified in the sketch after this list
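These properties are easy to check numerically. A minimal sketch, again using numpy’s SVD on the hypothetical matrix from above:

```python
import numpy as np

A = np.array([[1, 1, 0, 0], [3, 3, 0, 0], [4, 4, 0, 0],
              [0, 0, 4, 4], [0, 0, 5, 5], [0, 0, 2, 2]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# 2. U and V are column-orthonormal: U^T U = I and V^T V = I.
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))

# 3. Singular values are non-negative and sorted in decreasing order.
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)
```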

An Example

Let us keep the above theorem and the concepts of SVD in the back of our minds and see how they can help solve a real-life problem.

Let us think about a users-to-movies matrix. Say we are Netflix or a movie-review website and we have a matrix of users and movies. Users, based on their interests, have watched and rated a set of movies: a value of 1 means they did not like the movie, and 5 means they really liked it. Every column represents a different movie and every row corresponds to a different user.

We would like to decompose matrix A into the thinner, smaller matrices U, Σ, and V. Notice that we have a set of users who prefer watching Sci-fi movies and another set of users (the bottom three) who prefer watching non-Sci-fi movies.

So the movies can be broken down into two groups, Sci-fi and non-Sci-fi, and likewise the users can be grouped into Sci-fi lovers and non-Sci-fi lovers. After applying singular value decomposition, the resulting matrices are shown below.

Left Singular Matrix (U): the columns of U can be thought of as concepts; the first column corresponds to the SciFi concept and the second to the romance concept. It shows that the first 4 users correspond to the SciFi concept while the last 3 correspond to the romance concept. U is therefore a “users-to-concepts” similarity matrix: each value in U measures how strongly a given user corresponds to a given concept (in our case, SciFi and romance). For example, the first user corresponds to the SciFi concept, whereas the fifth user corresponds to the romance concept.

Singular Values (Σ): each diagonal value of this matrix is a positive number depicting the strength of a concept. For instance, we can see that the “strength” of the SciFi concept is higher than that of the romance concept.

Right Singular Matrix (V): V is a “movies-to-concepts” matrix. It shows, for instance, that the first three movies belong heavily to the first concept (SciFi) while the last two belong to the second concept (romance).

We also get a third concept, but its strength is so small that we can safely ignore it; it is nothing but noise in our data.
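Since the original rating matrix appears only in a figure, here is a minimal sketch with a made-up users-to-movies matrix matching the description (four SciFi fans, three romance fans, three SciFi movies, two romance movies). Keeping only the two strongest concepts reproduces the ratings almost exactly:

```python
import numpy as np

# Hypothetical ratings: rows = 7 users, columns = 5 movies
# (first 3 movies SciFi, last 2 romance; values 0 = unrated).
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 2))  # the third singular value is much smaller than
                       # the first two: that concept is mostly noise

# Keep only the two strongest concepts (SciFi and romance).
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(A_k, 1))  # close to A, stored with far fewer numbers
```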

References

Footnotes

This blog covered the concepts and advantages of dimensionality reduction, along with the mathematical background, theorem, and applications of Singular Value Decomposition. I would highly recommend going through the references to understand these topics thoroughly.

P.S. — Suggestions are highly solicited

About Me

I am a Data Scientist at CloudCover, focusing on the state of the art in Data Science, especially Natural Language Processing and Image Processing. You can reach me on Medium or LinkedIn.
