Demystifying Neural Networks: Recommendation as Matrix Factorization

Revealing the hidden factors

Dagang Wei
5 min read · Feb 6, 2024
Source: https://developers.google.com/machine-learning/recommendation/collaborative/matrix

This article is part of the series Demystifying Neural Networks.

Introduction

In today’s data-driven world, recommendation systems have become ubiquitous, powering user experiences across digital platforms, from e-commerce sites to streaming services. Among the various techniques for building recommendation systems, collaborative filtering based on matrix factorization has emerged as a powerful and widely used approach. This blog post aims to demystify how neural networks are applied to collaborative filtering, enhancing the ability to provide personalized recommendations by modeling complex user-item interactions.

What is Collaborative Filtering?

At its core, collaborative filtering is a method for predicting the preferences of a user by collecting preferences from many users. The underlying assumption is that users who agreed in the past will tend to agree in the future about other items: if two users rated the same movies similarly, one user's high rating of a new movie is evidence that the other will like it too. Traditional collaborative filtering techniques are broadly categorized into two types, user-based and item-based; the matrix factorization techniques we focus on here offer a more sophisticated approach by identifying latent factors associated with users and items.

Matrix Factorization

Matrix factorization, specifically when applied to collaborative filtering, works by decomposing the user-item interaction matrix into lower-dimensional matrices, capturing latent factors associated with users and items. Imagine we have a matrix where rows represent users, columns represent items, and each cell contains the rating that a user has given to an item. The goal of matrix factorization is to approximate this matrix by finding two lower-dimensional matrices whose product is close to the original matrix.

The beauty of this approach lies in its ability to model users and items as embeddings (vectors in a lower-dimensional space). Each entry of the rating matrix is then approximated by the dot product of the corresponding user and item embeddings. This representation allows the system to capture the underlying patterns in the data, enabling it to predict how a user would rate an item they haven’t interacted with yet.
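In symbols (a standard formulation of the idea above, not anything specific to this post’s example): if R is the m × n rating matrix, we look for a user matrix U with m rows and an item matrix V with n rows, each with a small number of columns k, such that

R \approx U V^\top, \qquad \hat{r}_{ui} = \mathbf{u}_u^\top \mathbf{v}_i

where u_u is the u-th row of U (the user embedding) and v_i is the i-th row of V (the item embedding).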

Learning Embeddings with Neural Networks

Enter neural networks, which have revolutionized the way embeddings are learned in collaborative filtering systems. Neural networks can learn complex non-linear relationships between users and items, allowing for more accurate and nuanced recommendations. The structure of these neural networks is designed to learn the embeddings by optimizing a loss function, which measures the difference between the predicted and actual ratings.

The Structure

The neural network architecture for collaborative filtering typically consists of an input layer, embedding layers, and an output layer with an activation function. The input layer takes the user and item IDs and looks each up in its embedding layer; the output layer then computes the dot product of the user and item embeddings and passes the result through the activation function to produce the predicted rating.
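In the example later in this post, the activation is a sigmoid that squashes the dot product into the normalized rating range, so the prediction takes the form

\hat{r}_{ui} = \sigma\left(\mathbf{u}_u^\top \mathbf{v}_i\right)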

The training process involves adjusting the weights of the network to minimize the error between the predicted ratings and the actual ratings. This process is typically done using backpropagation and an optimization algorithm like stochastic gradient descent.
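Written out for the mean squared error loss that the example below uses (via nn.MSELoss), with K denoting the set of observed (user, item) pairs:

\mathcal{L} = \frac{1}{|\mathcal{K}|} \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - \hat{r}_{ui} \right)^2

Backpropagation computes the gradient of this loss with respect to the embedding weights, and the optimizer adjusts them in the direction that reduces the error.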

The Advantages

Neural networks bring several advantages to collaborative filtering:

  • Flexibility: They can easily incorporate additional types of data, such as user demographics or item descriptions, into the recommendation process (see the sketch after this list).
  • Non-linearity: They can capture complex and non-linear relationships between users and items.
  • Scalability: Modern neural networks are highly scalable, capable of handling large datasets common in today’s recommendation systems.
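To make the flexibility point concrete, here is a minimal sketch of folding side information into the model. The class name, MLP dimensions, and side features are illustrative assumptions, not part of this post’s example: instead of a plain dot product, the user embedding, item embedding, and a side-feature vector are concatenated and passed through a small MLP.

import torch
import torch.nn as nn

# Hypothetical hybrid model, for illustration only.
class HybridRecommender(nn.Module):
    def __init__(self, n_users, n_items, n_factors, n_side_features):
        super().__init__()
        self.user_embedding = nn.Embedding(n_users, n_factors)
        self.item_embedding = nn.Embedding(n_items, n_factors)
        # A small MLP over [user embedding | item embedding | side features]
        # replaces the plain dot product and adds non-linearity.
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_factors + n_side_features, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, user_ids, item_ids, side_features):
        u = self.user_embedding(user_ids)
        v = self.item_embedding(item_ids)
        # side_features has shape (batch, n_side_features), e.g. encoded demographics
        x = torch.cat([u, v, side_features], dim=1)
        return torch.sigmoid(self.mlp(x)).squeeze(1)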

Example

The following is an example in PyTorch. The code is available in this Colab notebook.

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Set the random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Parameters
n_users, n_items, n_factors = 100, 50, 10
dominant_factors = 3 # Number of dominant factors per user/item
noise_factor = 0.1 # Noise level
missing_data_percentage = 20 # Percentage of ratings to be randomly removed as missing data

# Helper Functions
def generate_features(n_entities, n_factors, dominant_factors):
    features = np.zeros((n_entities, n_factors))
    for i in range(n_entities):
        dominant_indices = np.random.choice(n_factors, dominant_factors, replace=False)
        features[i, dominant_indices] = 1
    return features

def scale_ratings(ratings_matrix):
    return np.clip(np.rint(1 + 4 * (ratings_matrix - ratings_matrix.min()) / (ratings_matrix.max() - ratings_matrix.min())), 1, 5).astype(int)

def normalize_ratings(ratings_scaled):
    return (ratings_scaled - 1) / 4.0

def introduce_missing_data(ratings_matrix, missing_percentage):
    np.random.seed(42)  # Ensure reproducibility
    mask = np.random.rand(*ratings_matrix.shape) < (missing_percentage / 100.0)
    ratings_matrix[mask] = 0  # Set selected ratings to zero
    return ratings_matrix

def prepare_data_with_missing_data(ratings_matrix):
    ratings_matrix = introduce_missing_data(ratings_matrix, missing_data_percentage)
    users, items = np.where(ratings_matrix > 0)  # Only consider non-missing ratings
    ratings = ratings_matrix[users, items]
    ratings_scaled = scale_ratings(ratings)
    normalized_ratings = normalize_ratings(ratings_scaled)
    return train_test_split(users, items, normalized_ratings, test_size=0.2, random_state=42)

class RatingsDataset(Dataset):
    def __init__(self, users, items, ratings):
        self.users = users
        self.items = items
        self.ratings = ratings

    def __len__(self):
        return len(self.ratings)

    def __getitem__(self, idx):
        return torch.tensor(self.users[idx], dtype=torch.long), torch.tensor(self.items[idx], dtype=torch.long), torch.tensor(self.ratings[idx], dtype=torch.float)

# Data Preparation
user_features = generate_features(n_users, n_factors, dominant_factors)
item_features = generate_features(n_items, n_factors, dominant_factors)
ratings_matrix = np.dot(user_features, item_features.T)
ratings_matrix += np.random.normal(0, noise_factor, ratings_matrix.shape)
train_users, test_users, train_items, test_items, train_ratings, test_ratings = prepare_data_with_missing_data(ratings_matrix)
train_dataset = RatingsDataset(train_users, train_items, train_ratings)
test_dataset = RatingsDataset(test_users, test_items, test_ratings)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64)

# Model Definition
class RecommenderNet(nn.Module):
    def __init__(self, n_users, n_items, n_factors):
        super(RecommenderNet, self).__init__()
        self.user_embedding = nn.Embedding(n_users, n_factors)
        self.item_embedding = nn.Embedding(n_items, n_factors)

    def forward(self, user_ids, item_ids):
        user_embedded = self.user_embedding(user_ids)
        item_embedded = self.item_embedding(item_ids)
        return torch.sigmoid((user_embedded * item_embedded).sum(dim=1))

def mean_absolute_percentage_error(y_true, y_pred):
    """
    Calculate MAPE given y_true and y_pred
    """
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def evaluate(model, test_loader):
    model.eval()
    y_pred, y_true = [], []
    with torch.no_grad():
        for user_ids, item_ids, ratings in test_loader:
            predictions = model(user_ids, item_ids)
            y_pred.extend((predictions * 4 + 1).numpy())
            y_true.extend((ratings * 4 + 1).numpy())
    mse_rounded = mean_squared_error(y_true, np.rint(y_pred))
    print(f"Test MSE (Rounded Predictions): {mse_rounded}")
    mape_rounded = mean_absolute_percentage_error(y_true, np.rint(y_pred))
    print(f"Test MAPE (Rounded Predictions): {mape_rounded:.2f}%")

# Training and Evaluation
def train_and_evaluate(model, train_loader, test_loader, epochs=500):
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    criterion = nn.MSELoss()
    for epoch in range(epochs):
        model.train()
        total_loss = 0
        for user_ids, item_ids, ratings in train_loader:
            optimizer.zero_grad()
            predictions = model(user_ids, item_ids)
            loss = criterion(predictions, ratings)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        if epoch % 10 == 0:
            print(f"Epoch {epoch+1}, Loss: {total_loss / len(train_loader)}")
    evaluate(model, test_loader)

def identify_missing_data_indices(ratings_matrix, n_missing_samples=100):
    """
    Identify indices of missing data in the ratings matrix.

    Parameters:
    ratings_matrix (np.array): The original ratings matrix with missing data simulated.
    n_missing_samples (int): Number of missing samples to predict.

    Returns:
    np.array: Indices of the missing samples.
    """
    missing_indices = np.argwhere(ratings_matrix == 0)
    np.random.seed(42)  # For reproducibility
    selected_indices = np.random.choice(len(missing_indices), size=n_missing_samples, replace=False)
    return missing_indices[selected_indices]

def predict_missing_ratings(model, missing_indices):
    """
    Predict ratings for the given missing data indices using the trained model.

    Parameters:
    model (torch.nn.Module): The trained PyTorch model.
    missing_indices (np.array): Indices of the missing samples to predict.

    Returns:
    list: Predicted ratings for the missing samples.
    """
    model.eval()
    predicted_ratings = []
    with torch.no_grad():
        for idx in missing_indices:
            user_id, item_id = torch.tensor([idx[0]], dtype=torch.long), torch.tensor([idx[1]], dtype=torch.long)
            prediction = model(user_id, item_id)
            predicted_rating = prediction.item() * 4 + 1  # Scale back to original rating scale
            predicted_ratings.append(predicted_rating)
    return predicted_ratings

# Define the model
model = RecommenderNet(n_users, n_items, n_factors)

# Train the model
train_and_evaluate(model, train_loader, test_loader)

# After training the model
missing_indices = identify_missing_data_indices(ratings_matrix)
predicted_missing_ratings = predict_missing_ratings(model, missing_indices)

# Optionally, print a few predictions
print(f'Predicted ratings: {predicted_missing_ratings[:10]}')

Output:

Test MSE (Rounded Predictions): 0.1776798814535141
Test MAPE (Rounded Predictions): 7.21%
[1.949854850769043, 1.2215512245893478, 1.9180905222892761, 1.5602117776870728, 3.209111452102661, 3.4369332790374756, 1.2163558155298233, 4.109908580780029, 1.4270673394203186, 3.498795986175537]
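On this synthetic dataset, the rounded predictions land within roughly 7% of the true ratings on average, i.e. typically within a fraction of a star on the 1–5 scale. Real-world interaction data is far noisier, so treat these numbers as a sanity check of the pipeline rather than a benchmark.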

Conclusion

The integration of neural networks into collaborative filtering recommendation systems has significantly advanced the field, offering more personalized and accurate recommendations. By modeling users and items as embeddings and leveraging the power of neural networks to learn these embeddings, we can better understand and predict user preferences. As technology continues to evolve, we can expect even more sophisticated approaches to emerge, further enhancing our ability to deliver tailored content to users across various platforms.
