Implementing Matrix Factorization Technique for Recommender Systems from scratch

6 min readMar 26, 2023

Introduction

Here we are implementing the approach discussed in “Matrix Factorization Techniques for Recommender Systems” by Yehuda Koren, Robert Bell, and Chris Volinsky. It is a seminal paper in the field of recommender systems and proposes a matrix factorization approach for collaborative filtering.

Recommender systems have become ubiquitous in our daily lives. They are used by companies like Netflix, Amazon, and Spotify to recommend movies, products, and music to their users. Recommender systems use data to predict users’ preferences and make recommendations based on those predictions.

There are two main types of recommender systems: content-based and collaborative filtering. Content-based recommender systems recommend items similar to those that a user has liked in the past. Collaborative filtering recommender systems, on the other hand, recommend items based on the preferences of similar users.

Matrix factorization is a popular method for collaborative filtering. The idea behind matrix factorization is to decompose the user-item matrix into two lower-rank matrices: one that represents the users’ preferences and one that represents the items’ characteristics. The user-item matrix is then reconstructed by taking the dot product of these two matrices.

In this blog post, we will explain matrix factorization for recommender systems and provide a Python implementation from scratch.

The Data

You can download the dataset used in this blog from here. After downloading and extracting the dataset, we can load it into a Pandas data frame as follows:

import pandas as pd
from scipy.sparse import csr_matrix

data = pd.read_csv('ratings_train.csv')
ratings = csr_matrix(
    (data.rating.values, (data.user_id.values, data.item_id.values)), 
    shape=(len(data.user_id.unique()), len(data.item_id.unique()))
).toarray()

Matrix Factorization

The user-item matrix can be represented as a matrix R, where each row represents a user and each column represents an item. The entries of the matrix are the ratings that the users gave to the items. However, the matrix is usually very sparse because each user has only rated a small fraction of the items.

Matrix factorization aims to factorize the user-item matrix R into two lower-rank matrices P and Q, such that their product approximates R. The matrix P represents the users' preferences and the matrix Q represents the items' characteristics. Each row of the matrix P represents a user's preferences as a vector, and each row of the matrix Q represents an item's characteristics as a vector.

The user-item matrix R can be approximated by taking the dot product of the corresponding rows of P and Q. That is, the (i,j) entry of R can be approximated as P[i,:] dot Q[:,j].

The goal of matrix factorization is to find the matrices P and Q that minimize the error between the approximated matrix and the actual matrix. This error is usually measured using the root mean squared error (RMSE) between the predicted ratings and the actual ratings but here in this experiment we are just using the simple error:

def predict(self, u, i):
    return self.global_bias+self.user_biases[u]+self.item_biases[i]+self.user_vecs[u]@self.item_vecs[i]

def error(self, u, i):
    pred_rat = self.predict(u, i)
    return self.ratings[u, i] - pred_rat

The optimization problem can be solved using gradient descent. The idea behind gradient descent is to iteratively update the matrices P and Q in the direction of the negative gradient of the RMSE with respect to the matrices. The updated rules for the matrices are as follows:

def update_biases_and_vectors(self, error, u, i):
    # Update biases
    self.user_biases[u] += self.l_rate*(error - self.alpha*self.user_biases[u])
    self.item_biases[i] += self.l_rate*(error - self.alpha*self.item_biases[i])
    # Update User and item Vectors
    self.user_vecs[u, :] += self.l_rate*(error*self.item_vecs[i, :] - self.alpha*self.user_vecs[u, :])
    self.item_vecs[i, :] += self.l_rate*(error*self.user_vecs[u, :] - self.alpha*self.item_vecs[i, :])

Here in this setting, you can use L2 Regularization but we are not using this, since the input dataset is very simple.

The optimization problem can be solved by iteratively updating the matrices P and Q until the error converges.

Python Implementation

Now that we have explained matrix factorization for recommender systems, we can implement it in Python from scratch. Here is the code:

import time, numpy as np
import matplotlib.pyplot as plt

class MatrixFactorization():
    
    def __init__(self, ratings, n_factors=100, l_rate=0.01, alpha=0.01, n_iter=100):
        self.ratings = ratings
        self.n_users, self.n_items = ratings.shape
        self.non_zero_row_ind, self.non_zero_col_ind = ratings.nonzero()
        self.n_interac = len(ratings[np.where(ratings != 0)])
        self.ind_lst = list(range(self.n_interac))
        self.n_factors = n_factors
        self.l_rate = l_rate  # eta0 Constant that multiplies the update term
        self.alpha = alpha  # lambda Constant that multiplies the regularization term
        self.n_iter = n_iter
        self.mse_lst = []
        self.wait = 10
        self.tol = 1e-3
        self.n_iter_no_change = 10
        self.verbose = True
        self.stop = False
        
    def initialize(self, ):
        self.now = time.time()
        # Initialize Bias Values
        self.user_biases = np.zeros(self.n_users)
        self.item_biases = np.zeros(self.n_items)
        # initialize user & item vectors        
        self.user_vecs = np.random.normal(scale=1/self.n_factors, size=(self.n_users, self.n_factors))
        self.item_vecs = np.random.normal(scale=1/self.n_factors, size=(self.n_items, self.n_factors))
        # compute global bias
        self.global_bias = np.mean(self.ratings[np.where(self.ratings != 0)])
        self.evaluate_the_model(0)
        
        
    def predict(self, u, i):
        return self.global_bias+self.user_biases[u]+self.item_biases[i]+self.user_vecs[u]@self.item_vecs[i]
        
    def update_biases_and_vectors(self, error, u, i):
        # Update biases
        self.user_biases[u] += self.l_rate*(error - self.alpha*self.user_biases[u])
        self.item_biases[i] += self.l_rate*(error - self.alpha*self.item_biases[i])
        # Update User and item Vectors
        self.user_vecs[u, :] += self.l_rate*(error*self.item_vecs[i, :] - self.alpha*self.user_vecs[u, :])
        self.item_vecs[i, :] += self.l_rate*(error*self.user_vecs[u, :] - self.alpha*self.item_vecs[i, :])
        
    def evaluate_the_model(self, epoch):
        tot_square_error = 0
        for index in self.ind_lst:
            # Extracting user item information indices in which we have a rating
            u, i = self.non_zero_row_ind[index], self.non_zero_col_ind[index]
            pred_rat = self.predict(u, i)
            tot_square_error += (self.ratings[u, i]-pred_rat)**2
        mse = tot_square_error/self.n_interac
        self.mse_lst.append(mse)
        if self.verbose: 
            print(f"---> Epoch {epoch}")
            temp = np.round(time.time()-self.now, 3)
            print(f"ave mse {np.round(self.mse_lst[-1], 3)} ===> Total training time: {temp} seconds.")
        
    def early_stopping(self, epoch):
        if (self.mse_lst[-2] - self.mse_lst[-1]) <= self.tol:
            if self.wait == self.n_iter_no_change:
                temp = np.round(time.time()-self.now, 3)
                if self.verbose: print(f"Convergence after {epoch} epochs time took: {temp} seconds.")
                self.stop = True
                self.conv_epoch_num = epoch
            self.wait += 1
        else:
            self.wait = 0
        
    def fit(self, ):
        self.initialize()
        for epoch in range(1, self.n_iter):
            np.random.shuffle(self.ind_lst)
            if self.stop == False:
                for index in self.ind_lst:
                    # Extracting user item information indices in which we have a rating
                    u, i = self.non_zero_row_ind[index], self.non_zero_col_ind[index]
                    pred_rat = self.predict(u, i)
                    error = self.ratings[u, i]-pred_rat
                    self.update_biases_and_vectors(error, u, i)
                self.evaluate_the_model(epoch)
                self.early_stopping(epoch)
        self.plot_the_score()
                
    def plot_the_score(self, ):
        plt.figure(figsize=(18,6))
        plt.plot(range(1, 1+len(self.mse_lst)), self.mse_lst, marker='o')
        plt.title("SGD Custom Prepared USER & ITEM vector's Tr MSE loss vs epochs", fontsize=20)
        plt.xlabel('Number of epochs', fontsize=18)
        plt.ylabel('mean square error', fontsize=18)
        plt.xticks(range(1, self.conv_epoch_num+5), fontsize=15, rotation=90)
        plt.yticks(np.linspace(min(self.mse_lst), max(self.mse_lst),15), fontsize=15)
        plt.grid()
        plt.show()

The code above defines a class MatrixFactorization that takes in four hyperparameters: n_factors which is the number of factors to use in the matrix factorization, l_rate which controls the step size of the update, alpha which is the regularization parameter, and n_iter which is the number of iterations to run the optimization algorithm.

Testing the Model

Next, the code fits the matrix factorization model to the training data using the fit method of the MatrixFactorization class. We use n_factors=100, l_rate=0.01, alpha=0.01, and n_iter=100 as hyperparameters for the model.

obj = MatrixFactorization(ratings)
obj.fit()

---> Epoch 0
ave mse 1.267 ===> Total training time: 0.67 seconds.
---> Epoch 1
ave mse 0.926 ===> Total training time: 3.7 seconds.
---> Epoch 2
ave mse 0.884 ===> Total training time: 6.746 seconds.
---> Epoch 3
ave mse 0.867 ===> Total training time: 9.976 seconds.
---> Epoch 4
ave mse 0.856 ===> Total training time: 13.864 seconds.


---> Epoch 70
ave mse 0.025 ===> Total training time: 253.703 seconds.
---> Epoch 71
ave mse 0.025 ===> Total training time: 257.261 seconds.
---> Epoch 72
ave mse 0.024 ===> Total training time: 260.886 seconds.
---> Epoch 73
ave mse 0.023 ===> Total training time: 264.487 seconds.
---> Epoch 74
ave mse 0.023 ===> Total training time: 268.097 seconds.
Convergence after 74 epochs time took: 268.097 seconds.

Conclusion

In this blog post, we have explained matrix factorization for recommender systems and implemented it in Python from scratch. We have also tested the model and evaluated its performance.

Please let me know your Feed Back. Thanks for reading.