Matrix Factorization made easy (Recommender Systems)

Rohan Naidu
Published in Analytics Vidhya · 7 min read · Jan 10, 2020

Recommender systems are utilized in a variety of areas and are most commonly recognized as playlist generators for video and music services like Netflix, YouTube and Spotify, product recommenders for services such as Amazon, or content recommenders for social media platforms such as Facebook and Twitter.

Types of Recommender Systems

Collaborative Filtering → Collaborative filtering approaches build a model from a user’s past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may be interested in.

Content-Based Filtering → Content-based filtering approaches use a set of discrete, pre-tagged characteristics of an item in order to recommend additional items with similar properties.

Now, let’s begin with Matrix Factorization

A simple intuition for matrix factorization is the decomposition of a matrix into a product of two or more matrices.

This is also called multiplicative decomposition, a.k.a. matrix factorization.

Matrix Decomposition/Factorization into three matrices (SVD)(figure — 1)

The figure above shows the simplest and most widely used type of matrix factorization, known as SVD → Singular Value Decomposition.

Intuition of SVD → Let there be a matrix X with dimensions (m, n). This matrix can be expressed as a product of three matrices, with dimensions (m, r), (r, r) and (r, n) respectively.

SVD is widely used in recommender systems, and I’ll explain why and how in a while.

First, let’s understand the three matrices we get from our rectangular matrix X.

Singular Value Decomposition(figure — 2)

U → Left singular matrix, V_Transpose → Right singular matrix, S/D → Singular value matrix. Notice the shapes of each of the matrices: S is a diagonal matrix, while U and V_Transpose are rectangular matrices with their respective shapes.

The columns of U are called the left singular vectors and the columns of V are called the right singular vectors of X.

You can think of these three matrices as factors of the matrix X. So, if you multiply the three matrices together, you get X back, as expected.
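As a quick sanity check, here is a minimal sketch (using NumPy on a small made-up matrix) that decomposes X with SVD and multiplies the factors back together to recover it:

import numpy as np

# a small made-up matrix X of shape (m, n) = (4, 3)
X = np.array([[5., 3., 0.],
              [4., 0., 1.],
              [1., 1., 5.],
              [0., 2., 4.]])

# compact SVD: U is (m, r), s holds the singular values, VT is (r, n)
U, s, VT = np.linalg.svd(X, full_matrices=False)

# rebuild X from its factors: U · diag(s) · VT
X_reconstructed = U @ np.diag(s) @ VT
print(np.allclose(X, X_reconstructed))  # True, up to floating-point error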

One downside of SVD is that when the original matrix is sparse (incomplete), the left and right singular vectors are undefined.

Matrix Factorization as Feature Engineering in Recommender Systems

User Item data set decomposed into User and Item Matrices(figure — 3)

Suppose we have a data set which contains the ratings given to items by various users. This is a typical recommender systems problem, where your job is to recommend new items to the i^th user based on the items that the i^th user has previously rated.

Let’s take n → number of users and m → number of items; then our rating matrix will be of order (n x m).

After applying matrix factorization we get two matrices: a user matrix of shape (n x d) and an item matrix of shape (d x m). You can compare these with the shapes of the left and right singular matrices in figure 1.

The i^th row of the user matrix is the i^th user vector, and the i^th column of the item matrix is the i^th item vector. All of these vectors are real-valued and d-dimensional.

This d-dimensional representation of your users and items is what matrix factorization gives you.

user-user and item-item similarity(figure — 4)

So, let’s not forget that you can now compute user-user and item-item similarity using these vectors for collaborative filtering.
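For instance, here is a minimal sketch (assuming user_matrix of shape (n, d) and item_matrix of shape (d, m) are NumPy arrays obtained from such a factorization; the matrices below are random placeholders for illustration) that computes cosine similarity between two users and between two items:

import numpy as np

def cosine_similarity(a, b):
    # cosine similarity between two d-dimensional vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

user_matrix = np.random.rand(5, 3)   # (n, d): each row is a user vector
item_matrix = np.random.rand(3, 7)   # (d, m): each column is an item vector

print(cosine_similarity(user_matrix[0], user_matrix[1]))        # user-user similarity
print(cosine_similarity(item_matrix[:, 0], item_matrix[:, 1]))  # item-item similarity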

Matrix Factorization and the Objective Function, Mathematically

(figure — 5)

Here, p is the user matrix and q is the item matrix.

Now, let’s look at our objective function, which we want to minimize with respect to q (d x m) and p (n x d), the same shapes explained above for the item and user matrices.

objective function(figure — 6)

Here, r_ui → rating of user u on item i, q_i → item vector, p_u → user vector.

If we look closely, the first half of the equation is nothing but the squared loss and the second half is L2 regularization.
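Written out in plain text (this is the standard formulation from the matrix factorization literature, which figure 6 depicts; K is the set of (user, item) pairs with a known rating and lambda controls the regularization strength):

minimize over q, p:  sum over (u, i) in K of ( r_ui − q_i · p_u )^2  +  lambda * ( ||q_i||^2 + ||p_u||^2 )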

Code Explanation of Matrix Factorization

First let’s look at our data

user — item data(figure-7)

We are going to predict the rating for each (user_id, movie_id) pair.

Here, the columns in our dataset are user_id, item_id and rating, as expected. Let’s take movies as our items.

We’ll take this dataset and convert it into our X/Rating/Adjacency matrix.

X[i][j] = r_ij, where i is the user_id, j is the movie_id and r_ij is the rating given by that user to that movie. We will apply SVD decomposition on this matrix.

We need the user, item and rating data to create a sparse matrix X.

# getting user id and appending to list
# getting item value and appending to list
# getting rating value and appending to list
user_data = []
item_data = []
ratings = []
for i in range(data.shape[0]):
    user = data['user_id'].iloc[i]
    item = data['item_id'].iloc[i]
    rat = data['rating'].iloc[i]
    user_data.append(user)
    item_data.append(item)
    ratings.append(rat)
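A shorter, equivalent way to build the same three lists (assuming data is a pandas DataFrame with these columns) is to extract them directly:

user_data = data['user_id'].tolist()
item_data = data['item_id'].tolist()
ratings = data['rating'].tolist()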

The matrix is sparse because there are many movies in the dataset but any given user rates only a few of them.

Now, we’ll create the sparse matrix X (adjacency matrix):

from scipy.sparse import csr_matrix

adj_matrix = csr_matrix((ratings, (user_data, item_data)))
adj_matrix.shape
Matrix X/adj_matrix(figure — 8)

This is what adj_matrix looks like.

Now that we have our matrix X, all we need to do is perform SVD matrix factorization, as done below:

from sklearn.utils.extmath import randomized_svd
import numpy as np

U, S, VT = randomized_svd(adj_matrix, n_components=5, n_iter=5, random_state=None)
print(U.shape)
print(S.shape)
print(VT.T.shape)

U → Left singular matrix(user), VT → Right singular matrix(item), S → Singular Value Matrix

Output of above code(figure — 9)

This is what the shapes look like.

Now we have our user and item matrices, which will look like figure 3.

So the matrix U can be viewed as the matrix representation of users, where each row u_i is a d-dimensional vector for a user, and VT.T (i.e. V) can be viewed as the matrix representation of movies, where each row v_j is a d-dimensional vector for a movie.
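With these factors in hand, predicting the rating a user would give a movie is just a scaled dot product of the corresponding vectors. A minimal sketch, assuming the U, S, VT returned by randomized_svd above and two hypothetical ids:

import numpy as np

user_id, movie_id = 10, 42        # hypothetical ids for illustration

u_i = U[user_id]                  # d-dimensional user vector (row of U)
v_j = VT.T[movie_id]              # d-dimensional movie vector (row of VT.T)

# X ≈ U · diag(S) · VT, so a single entry is u_i · diag(S) · v_j
predicted_rating = u_i @ np.diag(S) @ v_j
print(predicted_rating)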

Learning Methods

One obvious method to find the matrices q and p is gradient descent. Since we have the loss function defined, we take the partial derivatives with respect to q and p to optimize those values, as given below.

update formula(figure — 10)

Pseudo code:

for each epoch:
    for each pair of (user, movie):
        b_i = b_i - learning_rate * dL/db_i
        m_j = m_j - learning_rate * dL/dm_j

predict the ratings with the formula: y_pred_ij = mu + b_i + m_j + dot_product(u_i, v_j)

mu → the mean of all the ratings, u_i → user vector, v_j → item vector; the bias terms b_i and m_j are obtained through the update formula and its partial derivatives.

After applying the partial derivatives, the update rule looks like the one below, which is what the pseudo code uses.

partial derivative of user and item(figure — 11)

gamma → Learning rate for your gradient descent

After adding bias

update formula with bias(figure — 12)
objective function with bias and regularization(figure-13)
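To tie the pieces together, here is a minimal SGD sketch of the biased matrix factorization update described above. It is an illustration under assumptions, not the exact implementation from the figures: user_data, item_data and ratings are the lists built earlier (with 0-based ids smaller than n_users and n_items), and n_users, n_items, d, gamma, lam and epochs are hypothetical hyperparameters.

import numpy as np

n_users, n_items, d = 1000, 1700, 5    # hypothetical sizes
gamma, lam, epochs = 0.01, 0.1, 10     # learning rate, L2 strength, epochs

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, d))   # user vectors p_u
Q = rng.normal(scale=0.1, size=(n_items, d))   # item vectors q_i
b_u = np.zeros(n_users)                        # user biases
b_m = np.zeros(n_items)                        # movie biases
mu = np.mean(ratings)                          # global mean rating

for epoch in range(epochs):
    for u, i, r in zip(user_data, item_data, ratings):
        pred = mu + b_u[u] + b_m[i] + P[u] @ Q[i]
        e = r - pred                           # prediction error
        # gradient steps with L2 regularization (learning rate gamma)
        b_u[u] += gamma * (e - lam * b_u[u])
        b_m[i] += gamma * (e - lam * b_m[i])
        p_old = P[u].copy()                    # keep old p_u for the q_i update
        P[u] += gamma * (e * Q[i] - lam * P[u])
        Q[i] += gamma * (e * p_old - lam * Q[i])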

Matrix factorization is a cutting-edge technique that is also hidden inside other methods, such as PCA (dimensionality reduction) and clustering.

The mere fact that it can be used in so many techniques to solve important problems is astonishing.

Finally, the use of matrix factorization in recommender systems would not be possible without the brilliant minds who wrote the original research paper (see reference 1 below).

Reference

  1. https://towardsdatascience.com/paper-summary-matrix-factorization-techniques-for-recommender-systems-82d1a7ace74
  2. https://www.google.com/search?q=collaborative+filtering+example&rlz=1C1CHBF_enIN850IN850&oq=collab&aqs=chrome.0.69i59j69i57j69i59j69i60l3.9854j0j7&sourceid=chrome&ie=UTF-8
  3. Applied AI Course

For any queries, you are welcome to connect with me on Twitter.

