Understanding SVD with Topic Modelling

Aatish Kayyath
6 min read · Feb 14, 2024


What is SVD?
SVD (Singular Value Decomposition) is a linear algebra technique used to decompose, or break down, a large and complicated matrix into three simpler matrices.

In essence, it is a data reduction or dimensionality reduction tool. For example, if you have very high-dimensional data, such as an image with many pixels, SVD helps reduce that data to the key features necessary for analysing or understanding it.

Imagine you have a group of data points in 2D or 3D space, and you want to understand their main directions and strengths. SVD helps in doing this.

Applications

1. Dimensionality Reduction

  • Use Case: Reducing the number of features in a dataset while retaining most of the information.
  • Example: In image processing, SVD can be used to compress images by retaining only the most significant singular values and corresponding singular vectors. This approach, known as low-rank approximation, significantly reduces the size of the image data without substantial loss of quality (see the sketch after this list).

2. Recommender Systems

  • Use Case: Building systems that recommend items (like movies, books, products) to users.
  • Example: Netflix or Amazon’s recommendation engines. SVD can be used to decompose a user-item rating matrix, uncovering latent features that represent underlying user preferences and item characteristics. These features can then be used to predict missing ratings and recommend items to users.

3. Natural Language Processing (NLP)

  • Use Case: Identifying patterns in text data, such as topic modeling.
  • Example: In topic modeling for a set of documents, SVD can decompose the term-document matrix to identify latent topics. Each topic is a combination of terms, and each document is a mixture of these topics, helping in categorizing and summarizing large collections of text data.

4. Signal Processing

  • Use Case: Noise reduction and signal filtering.
  • Example: In audio signal processing, SVD can be used to separate noise from the actual signal. By decomposing the signal matrix and discarding the components with low singular values (which often represent noise), the quality of the audio signal can be improved.

5. Image Processing

  • Use Case: Feature extraction and image recognition.
  • Example: In facial recognition systems, SVD can help in extracting essential features from facial images. By breaking down image data into singular values and vectors, the system can focus on the most informative features for facial recognition.
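To make the low-rank approximation idea from the image-compression example concrete, here is a minimal NumPy sketch. The random array standing in for a grayscale image is an illustrative assumption; any 2-D array behaves the same way.

```python
import numpy as np

# Stand-in for a grayscale image (assumption for illustration only).
rng = np.random.default_rng(0)
img = rng.random((256, 256))

# Compact SVD of the image matrix.
U, S, Vt = np.linalg.svd(img, full_matrices=False)

# Keep only the k largest singular values/vectors: a rank-k approximation.
k = 20
img_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Storage drops from 256*256 values to k*(256 + 256 + 1) values.
print(img.size, k * (img.shape[0] + img.shape[1] + 1))
```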

For any m x n matrix A, SVD finds three matrices U, Σ, and V such that A = UΣV^T, where V^T denotes the transpose of V.

U: m x m orthogonal matrix. You can think of its columns as a set of directions/features in your data. If your data is spread out more in certain directions than others, or has key features, U captures this.

Σ: m x n diagonal matrix. This diagonal matrix tells you the strength or importance of each of these directions/features. Bigger numbers mean more important directions.

V^T: n x n orthogonal matrix. This is another set of directions/features that, in combination with U, helps us understand how our original data is spread out.
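To make these shapes concrete, here is a quick NumPy sketch; the 5 x 3 random matrix is an arbitrary choice for illustration, not part of the example that follows.

```python
import numpy as np

# Any m x n matrix will do; 5 x 3 is an arbitrary illustration.
A = np.random.default_rng(1).random((5, 3))

# Full SVD: U is m x m, s holds the min(m, n) singular values, Vt is n x n.
U, s, Vt = np.linalg.svd(A)
print(U.shape, s.shape, Vt.shape)  # (5, 5), (3,), (3, 3)

# Embed the singular values into an m x n diagonal matrix Sigma,
# then verify that A = U @ Sigma @ Vt.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```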

Topic Modelling

Let’s understand it using the topic modeling example.

We’ll use a small, fictional set of sentences to demonstrate this.

Example Corpus

Let’s consider a small corpus of the following four sentences:

  1. “Apple and banana are fruits.”
  2. “Fruits like apple and banana are healthy.”
  3. “Car and bike are vehicles.”
  4. “Vehicles like car and bike require fuel.”

From these sentences, we will construct a term-document matrix. For simplicity, we’ll use binary frequency to represent the presence (1) or absence (0) of a word in a document (sentence).

STEP 1: Construct Term-Document Matrix

First, we identify the unique words in our corpus: [“apple”, “banana”, “fruits”, “healthy”, “car”, “bike”, “vehicles”, “fuel”]. We remove stopwords like [“and”, “are”, “like”]. The term-document matrix based on these words and our corpus sentences is:

Term-Document Matrix:

| Term | D1 | D2 | D3 | D4 |
| --- | --- | --- | --- | --- |
| apple | 1 | 1 | 0 | 0 |
| banana | 1 | 1 | 0 | 0 |
| fruits | 1 | 1 | 0 | 0 |
| healthy | 0 | 1 | 0 | 0 |
| car | 0 | 0 | 1 | 1 |
| bike | 0 | 0 | 1 | 1 |
| vehicles | 0 | 0 | 1 | 1 |
| fuel | 0 | 0 | 0 | 1 |
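A minimal sketch of this step in Python, building the binary matrix by hand (the simple tokenizer below is an illustrative assumption, not a production choice):

```python
import numpy as np

corpus = [
    "Apple and banana are fruits.",
    "Fruits like apple and banana are healthy.",
    "Car and bike are vehicles.",
    "Vehicles like car and bike require fuel.",
]
vocab = ["apple", "banana", "fruits", "healthy", "car", "bike", "vehicles", "fuel"]

def tokenize(sentence):
    # Lowercase each word and strip trailing punctuation.
    return [w.strip(".,").lower() for w in sentence.split()]

# A[i, j] = 1 if vocab[i] appears in document j, else 0.
A = np.array([[1 if term in tokenize(doc) else 0 for doc in corpus]
              for term in vocab])
print(A)
```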

STEP 2: Let’s apply SVD and decompose this matrix.
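Here is a minimal NumPy sketch of this step, using the term-document matrix from Step 1. The rounded outputs follow from that matrix, though signs and the ordering of equally strong topics can vary by implementation.

```python
import numpy as np

# Term-document matrix from Step 1.
# Rows: apple, banana, fruits, healthy, car, bike, vehicles, fuel.
# Columns: documents 1-4.
A = np.array([
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1],
])

# Compact SVD: U is 8 x 4 (terms x topics), S holds the 4 singular
# values, Vt is 4 x 4 (topics x documents).
U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(U, 2))
print(np.round(S, 2))   # ~[2.56, 2.56, 0.68, 0.68]
print(np.round(Vt, 2))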

Matrix U (Left Singular Vectors):

Matrix U (values rounded to two decimals; signs and the ordering of equally strong topics can vary by implementation):

| Term | Topic 1 | Topic 2 | Topic 3 | Topic 4 |
| --- | --- | --- | --- | --- |
| apple | 0 | 0.55 | 0 | 0.17 |
| banana | 0 | 0.55 | 0 | 0.17 |
| fruits | 0 | 0.55 | 0 | 0.17 |
| healthy | 0 | 0.30 | 0 | -0.95 |
| car | 0.55 | 0 | 0.17 | 0 |
| bike | 0.55 | 0 | 0.17 | 0 |
| vehicles | 0.55 | 0 | 0.17 | 0 |
| fuel | 0.30 | 0 | -0.95 | 0 |

Row: Each row corresponds to a word in our vocabulary.

Column: The columns represent latent topics or concepts.

In summary, matrix U reveals how each word in the corpus is related to each identified latent topic.

Interpreting Specific Values:

  • A value of 0 indicates no association or relevance of that word to the corresponding topic. For instance, the first value of 0 in the first row means that the word “apple” has no association with the first latent topic.
  • A value with a large magnitude, such as 0.55, indicates that the word corresponding to that row (e.g., “banana” or “fruits”) is strongly associated with the second latent topic. The signs of the singular vectors are mathematically arbitrary, so for topic modeling we typically look at the absolute value.

How are the Latent Topics determined?

In Matrix U, each column represents a latent topic. These topics are abstract and are derived from patterns and relationships in the data. The SVD algorithm identifies dimensions (topics) that capture the most variance (information) in the data. It’s important to note that the algorithm doesn’t “decide” topics in a human sense: it doesn’t label them as “fruits” or “vehicles.” Instead, it mathematically constructs dimensions that capture the most significant patterns in the term-document matrix. For example, if words like “apple” and “banana” have high values in the same column of U, and these words are known to be fruits, you might interpret that latent topic as related to “fruits.”
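One common way to attach human-readable labels is to list, for each column of U, the terms with the largest absolute loadings. A minimal sketch, reusing the matrix from Step 1:

```python
import numpy as np

terms = ["apple", "banana", "fruits", "healthy", "car", "bike", "vehicles", "fuel"]
A = np.array([
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1],
])
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# For each latent topic (column of U), show the top terms by |loading|;
# a human can then label the topic ("fruits", "vehicles", ...).
for k in range(U.shape[1]):
    top = np.argsort(-np.abs(U[:, k]))[:3]
    print(f"Topic {k + 1}:", [terms[i] for i in top])
```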

Matrix Σ (Singular Values):

Matrix Σ in SVD is a diagonal matrix containing the singular values of the decomposed matrix (in our case, the term-document matrix). Here’s what these singular values represent:

  1. Diagonal Values: The values on the diagonal of Σ are the singular values. They are always non-negative and are typically arranged in descending order. In our example:
Σ ≈ diag(2.56, 2.56, 0.68, 0.68): two strong topics (the fruit and vehicle themes) and two much weaker ones.

  2. Importance of Topics: Each singular value corresponds to the ‘strength’ or ‘importance’ of a latent topic in the corpus. A higher singular value indicates a topic that captures more of the variance (or information) in the corpus.

  3. Dimensionality Reduction: By selecting the top k singular values (and the corresponding vectors in U and V^T), we can approximate the original matrix. This is useful in reducing the complexity of data while retaining the most significant aspects, as the sketch below shows.
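A minimal sketch of this truncation on our example, keeping the top k = 2 topics; the ~0.93 figure follows from the singular values above.

```python
import numpy as np

A = np.array([
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1],
])
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-2 approximation: keep only the two strongest topics.
k = 2
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
print(np.round(A_k, 2))

# Fraction of the matrix's total variance captured by the top 2 topics.
print((S[:k] ** 2).sum() / (S ** 2).sum())  # ~0.93
```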

Matrix V^T (Right Singular Vectors):

Matrix V^T is the transpose of the matrix V in the SVD, consisting of right singular vectors. This matrix relates to the documents in our corpus:

  1. Rows in V^T: Each row in V^T corresponds to a latent topic identified in the SVD process.
  2. Columns in V^T: Each column represents a document from the original corpus.
  3. Values in V^T: The values in the matrix indicate the significance of each latent topic in each document.

In our example:

Matrix V^T (values rounded to two decimals; signs and the ordering of equally strong topics can vary by implementation):

| | D1 | D2 | D3 | D4 |
| --- | --- | --- | --- | --- |
| Topic 1 | 0 | 0 | 0.65 | 0.76 |
| Topic 2 | 0.65 | 0.76 | 0 | 0 |
| Topic 3 | 0 | 0 | 0.76 | -0.65 |
| Topic 4 | 0.76 | -0.65 | 0 | 0 |

Interpreting V^T: For instance, the first row of V^T shows the contribution of the first latent topic across all documents. A higher absolute value in a particular column indicates that the corresponding topic is more prevalent in that document.
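For instance, each document's dominant topic can be read off with an argmax, sketched below on the same matrix as before. Note that in latent semantic analysis the rows of V^T are usually weighted by their singular values (i.e., the document-topic representation is ΣV^T), so that stronger topics count for more.

```python
import numpy as np

A = np.array([
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1],
])
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Document-topic weights: scale each row of Vt by its singular value.
doc_topics = np.diag(S) @ Vt

# For each document (column), pick the topic with the largest |weight|.
for j in range(doc_topics.shape[1]):
    dominant = np.argmax(np.abs(doc_topics[:, j]))
    print(f"Document {j + 1}: dominant topic {dominant + 1}")
```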

When to Use SVD or PCA:

Use SVD when:

  • You are working with sparse data. SVD handles sparse data efficiently.
  • Your data is not centered, and you prefer not to center it due to the nature of the data or computational constraints.
  • You are dealing with applications that require singular values and vectors, such as latent semantic analysis or complex systems of equations.

Use PCA when:

  • Your primary goal is dimensionality reduction to capture the most variance in the data.
  • Interpretability of components in terms of the original features is crucial.
  • You are working with dense data, and computational efficiency is a priority.
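The two are closely related: PCA is, in effect, SVD applied to mean-centered data. A minimal sketch of that relationship; the random 10 x 4 matrix is an arbitrary stand-in for a dataset.

```python
import numpy as np

# Arbitrary stand-in dataset: 10 samples, 4 features.
X = np.random.default_rng(2).random((10, 4))

# PCA is essentially SVD of the mean-centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal components (directions in feature space);
# S**2 / (n - 1) are the variances explained by each component.
print(np.round(Vt, 2))
print(np.round(S ** 2 / (len(X) - 1), 4))
```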
