Demystifying Neural Networks: Low-Rank Approximation

The Art of Matrix Factorization

Dagang Wei
5 min read · Jan 27, 2024
Source: https://dustinstansbury.github.io/theclevermachine/svd-data-compression

This article is part of the series Demystifying Neural Networks.

Introduction

Data is everywhere and comes in massive quantities. Storing and processing it efficiently can create significant overhead, especially on computationally constrained devices. This is where data compression comes in: finding ways to represent large datasets in more compact forms. One intriguing technique that is gaining popularity is low-rank approximation, and neural networks are proving to be valuable tools for achieving it.

What Is Low-Rank Approximation?

At its heart, a matrix represents a collection of numbers arranged in rows and columns. Many real-world datasets can be expressed as matrices. Low-rank approximation seeks to replace a given matrix with a simpler, lower-rank matrix that still captures the essence of the original data.

Imagine a matrix containing ratings for different movies by various users. Some movies might be inherently similar, and some users may have similar tastes. Low-rank approximation aims to uncover these latent patterns, resulting in a compressed representation of the matrix.
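
To make this concrete before bringing in neural networks, here is a minimal NumPy sketch (not part of the article's notebook) that builds a rank-2 approximation of a small toy ratings matrix with a truncated singular value decomposition, the classical way to obtain the best rank-k approximation in the least-squares sense:

import numpy as np

# A toy ratings matrix: 6 users x 4 movies.
A = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [5, 5, 2, 1],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
    [1, 2, 5, 5],
], dtype=float)

# Truncated SVD: keep only the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(A_k, 1))         # close to A, but only rank 2
print(np.linalg.norm(A - A_k))  # small reconstruction error (Frobenius norm)

Even though A has 24 entries, the two rank-2 factors need only 6*2 + 2*4 = 20 numbers, and the gap widens quickly as the matrix grows.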

Neural Networks for Low-Rank Approximation

Neural networks are highly adaptable systems that excel at learning patterns from data. In the realm of low-rank approximation, here’s how they can help:

  1. Learning the Low-Rank Structure: Neural networks can be trained to identify the underlying patterns and relationships within data represented as matrices, effectively learning to decompose a matrix into low-rank factors.
  2. Parameter Reduction: Approximating a matrix by the product of two low-rank factors greatly reduces the number of parameters needed to represent the data, leading to substantial compression (a quick calculation follows this list).
  3. Adaptive Compression: Neural networks can learn a compression scheme tailored to the specific dataset rather than relying on a fixed method, which can make the compression more efficient and accurate than one-size-fits-all approaches.
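
As a quick illustration of the parameter reduction in point 2, here is the arithmetic using the same sizes as the experiment below (n = 100, m = 50, rank k = 10):

# Parameter count of a full n x m matrix versus two rank-k factors.
n, m, k = 100, 50, 10

full_params = n * m              # 100 * 50 = 5,000 numbers
factored_params = n * k + k * m  # 100 * 10 + 10 * 50 = 1,500 numbers

print(full_params / factored_params)  # ~3.33, the compression ratio reported below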

The Neural Network Architecture

Let’s consider a neural network that approximates an n x m matrix using two low-rank matrices.

  1. Input Layer: An n x n identity matrix is fed to the network as its input.
  2. Hidden Layers: Matrix U (dimensions n x k) and matrix V (dimensions k x m). Think of the two matrices as learning the hidden patterns of the original matrix. Here, k is the desired rank of the approximation, usually much smaller than both n and m.
  3. Output Layer: The output is an n x m matrix, the product of U (n x k) and V (k x m). U * V is the low-rank approximation of the original matrix.

The magic happens during training, where the network learns the values of U and V that minimize the difference (the reconstruction error, measured here as the mean squared error) between the original matrix and its approximation U * V.

Code

Let’s get experimental! You can try this concept out using popular libraries like TensorFlow or PyTorch. The code is available in this Colab notebook.

PyTorch:

import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# --- Data Preparation ---
n = 100
m = 50
hidden_rank = 20 # hidden rank of the data
rank = 10 # Rank of approximation

# True low-rank structure (for demonstration)
U_true = np.random.rand(n, hidden_rank)
V_true = np.random.rand(hidden_rank, m)
original_matrix = U_true @ V_true + 0.1 * np.random.randn(n, m)
print(f'Original matrix shape: {original_matrix.shape}')

# The model input is an identity matrix, instead of the original matrix.
# So all the information about the expected output data can be compressed
# into the 2 low-rank matrices.
input = torch.from_numpy(np.eye(n)).float()

# Use the original matrix as the label.
label = torch.from_numpy(original_matrix).float()

# --- Defining the Model ---
class LowRankApprox(nn.Module):
    def __init__(self, n, m, rank):
        super().__init__()
        self.U = nn.Linear(n, rank, bias=False)
        self.V = nn.Linear(rank, m, bias=False)

    def forward(self, x):
        return self.V(self.U(x))

model = LowRankApprox(n, m, rank)
lr = 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.MSELoss()

# --- Training ---
epochs = 500
for epoch in range(epochs):
    output = model(input)
    loss = loss_fn(output, label)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 100 == 0:
        print(f'Epoch {epoch}: Loss = {loss.item():.4f}')

# --- Saving Matrices ---
U_matrix = model.U.weight.detach().numpy().transpose()
V_matrix = model.V.weight.detach().numpy().transpose()
print(f'U_matrix shape: {U_matrix.shape}')
print(f'V_matrix shape: {V_matrix.shape}')
np.save('U_matrix.npy', U_matrix)
np.save('V_matrix.npy', V_matrix)


# --- Reloading Matrices ---
U_matrix = np.load('U_matrix.npy')
V_matrix = np.load('V_matrix.npy')
print(f'Loaded U_matrix shape: {U_matrix.shape}')
print(f'Loaded V_matrix shape: {V_matrix.shape}')
approximation = U_matrix @ V_matrix

# --- Evaluation ---
compression_ratio = (n * m) / (n * rank + rank * m )
print(f'Compression Ratio: {compression_ratio:.2f}')

error = np.linalg.norm(original_matrix - approximation)
print(f'Approximation Error (Frobenius Norm): {error:.4f}')

abs_error = np.abs(original_matrix - approximation)
relative_error = abs_error / (np.abs(original_matrix) + 1e-8)
percent_error = 100 * np.mean(relative_error)
print(f'Approximation Error (Avg Percent): {percent_error:.2f}%')

# --- Visualization ---
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(original_matrix)
plt.title(f'Original: {n}*{m}')

plt.subplot(1, 2, 2)
plt.imshow(approximation)
plt.title(f'Compressed: {n}*{rank}+{rank}*{m}')
plt.show()

TensorFlow:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# --- Data Preparation ---
n = 100
m = 50
hidden_rank = 20
rank = 10

# True low-rank structure
U_true = np.random.rand(n, hidden_rank)
V_true = np.random.rand(hidden_rank, m)
original_matrix = U_true @ V_true + 0.1 * np.random.randn(n, m)
print(f'Original matrix shape: {original_matrix.shape}')

# Input and Label
input = tf.eye(n, dtype=tf.float32)
label = tf.constant(original_matrix, dtype=tf.float32)

# --- Model ---

# Model, Optimizer, Loss (same as before)
layer1 = tf.keras.layers.Dense(rank, use_bias=False)
layer2 = tf.keras.layers.Dense(m, use_bias=False)
model = tf.keras.Sequential([layer1, layer2])

lr = 0.01
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
loss_fn = tf.keras.losses.MeanSquaredError()

# Compile the model (specify optimizer, loss function)
model.compile(optimizer=optimizer, loss=loss_fn)

# --- Training ---
epochs = 500
history = model.fit(input, label, epochs=epochs, verbose=0)

# Print representative losses
for i in range(0, epochs, 100):
    print(f'Epoch {i}: Loss = {history.history["loss"][i]:.4f}')

# --- Saving Matrices ---
U_matrix = layer1.get_weights()[0]
V_matrix = layer2.get_weights()[0]
print(f'U_matrix shape: {U_matrix.shape}')
print(f'V_matrix shape: {V_matrix.shape}')
np.save('U_matrix.npy', U_matrix)
np.save('V_matrix.npy', V_matrix)

# --- Reloading Matrices ---
U_matrix = np.load('U_matrix.npy')
V_matrix = np.load('V_matrix.npy')
print(f'Loaded U_matrix shape: {U_matrix.shape}')
print(f'Loaded V_matrix shape: {V_matrix.shape}')
approximation = U_matrix @ V_matrix

# --- Evaluation ---
compression_ratio = (n * m) / (n * rank + rank * m )
print(f'Compression Ratio: {compression_ratio:.2f}')

error = np.linalg.norm(original_matrix - approximation)
print(f'Approximation Error (Frobenius Norm): {error:.4f}')

abs_error = np.abs(original_matrix - approximation)
relative_error = abs_error / (np.abs(original_matrix) + 1e-8)
percent_error = 100 * np.mean(relative_error)
print(f'Approximation Error (Avg Percent): {percent_error:.2f}%')

# --- Visualization ---
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(original_matrix)
plt.title(f'Original: {n}*{m}')

plt.subplot(1, 2, 2)
plt.imshow(approximation)
plt.title(f'Compressed: {n}*{rank}+{rank}*{m}')
plt.show()

Output:

Original matrix shape: (100, 50)
Epoch 0: Loss = 26.3561
Epoch 100: Loss = 0.1604
Epoch 200: Loss = 0.1459
Epoch 300: Loss = 0.1421
Epoch 400: Loss = 0.1366
U_matrix shape: (100, 10)
V_matrix shape: (10, 50)
Loaded U_matrix shape: (100, 10)
Loaded V_matrix shape: (10, 50)
Compression Ratio: 3.33
Approximation Error (Frobenius Norm): 25.3481
Approximation Error (Avg Percent): 5.91%

Conclusion

Data compression with low-rank approximations and neural networks is still an area of active research, holding exciting promise for managing the ever-growing data deluge more efficiently. Let me know if you would like me to elaborate on any of these points!

References

https://blog.tensorflow.org/2020/02/matrix-compression-operator-tensorflow.html
