Predicting Ratings with Matrix Factorization Methods

Héctor Lira · Beek Tech
Feb 19, 2019 · 7 min read

TL;DR

  • Matrix Factorization methods approximate a matrix of ratings, R, by the product of two matrices, P and Q.
  • For users who were not used to estimate P and Q, predictions must be computed separately, using Ordinary Least Squares estimators.

Matrix Factorization Methods

The idea of Matrix Factorization methods is to ‘decompose’ the ratings matrix, R, into the product of two lower-dimensional matrices, P and Q: P captures each user’s affinity to a number of dimensions, k, and Q captures how strongly each item relates to those same dimensions.

Formally, if we have data for n users over m items, the matrix R has dimensions n×m, P has dimensions k×n, and Q has dimensions k×m. The estimation is stated as:

R ≈ P′Q

where P′ denotes the transpose of P, so that P′Q has the same n×m dimensions as R.

The k dimensions are called latent factors and represent intrinsic interactions between users and items. The algorithm tries to describe these latent features by creating item and user profiles. With these profiles, we can predict the rating a user would give an item and recommend items that are predicted to receive high ratings by the user.
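To make this concrete, here is a toy sketch (synthetic numbers, not taken from any real dataset) of how a user profile and an item profile with k = 2 latent factors combine into a predicted rating:

```r
# Toy illustration with k = 2 latent factors (synthetic numbers)
p_u <- c(1.2, 0.4)      # user profile: affinity to the two latent dimensions
q_i <- c(0.9, 2.0)      # item profile: loading on the same two dimensions
r_hat <- sum(p_u * q_i) # predicted rating = inner product of the two profiles
r_hat                   # 1.2 * 0.9 + 0.4 * 2.0 = 1.88
```

A user who cares strongly about a dimension (large entry in p_u) gets a high predicted rating for items that load heavily on that same dimension.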

In this blog post, we will use the recosystem library in R to make recommendations to users. The recosystem library provides functions to easily train models using Matrix Factorization methods. If you’d like to learn more about how to use this library and how the algorithm finds the estimated matrices I suggest you read the documentation:

recosystem: Recommender System Using Parallel Matrix Factorization

Predictions for Out-of-Sample Users

Taking the example from the recosystem documentation and the training and test data provided, we will focus on making predictions for users in the test set that were not used to train the model.

The data used here is in a (user_index, item_index, rating) format. Ratings take values in the {1, 2, 3, 4, 5} set.

First, train the model:

# Import the necessary libraries
library(recosystem)
library(dplyr)

# Locate the training and test data shipped with the package
train_file <- data_file(system.file("dat", "smalltrain.txt", package = "recosystem"))
test_file <- data_file(system.file("dat", "smalltest.txt", package = "recosystem"))

# Create the model object
r <- Reco()

# Grid search to find good tuning parameters
set.seed(123) # this is a randomized algorithm
opts_tune <- r$tune(train_file)$min

# Train the model and store it locally
r$train(train_file, opts = opts_tune, out_model = file.path("/your/file/path", "model.txt"))

Next, make predictions for all users and store them in a vector:

pred <- r$predict(test_file, out_memory())

Find users that are in the test set that were not used to train the model:

# First, read the training and test set as tables
train_file <- read.table(system.file("dat", "smalltrain.txt", package = "recosystem"))
test_file <- read.table(system.file("dat", "smalltest.txt", package = "recosystem"))
setdiff(unique(test_file$V1), unique(train_file$V1))

Out:
[1] 219

The user with index 219 belongs to the test set and was not used to train the model. How do the predictions for this user look?

Append the predictions to the test_file data frame and find the predictions for this user:

test_file_pred <- data.frame(test_file, pred = pred)
test_file_pred %>% filter(V1 == 219)

Out:
V1 V2 pred
1 219 316 3.007
2 219 640 3.007
3 219 340 3.007
4 219 198 3.007
5 219 543 3.007
6 219 572 3.007
7 219 461 3.007
8 219 932 3.007
9 219 900 3.007
10 219 289 3.007
11 219 226 3.007

All of the predicted ratings for this user coincide with the average rating across all users in the training set:

mean(train_file$V3)

Out:
[1] 3.007

Using the training-set average as the predicted rating for every user-item pair is equivalent to making no recommendation at all: every item receives the same score, so there is no way to rank items for the user.

This means we could not make personalized recommendations for users who were not used to train the model, even when such a user has already provided some ratings or other evidence of liking certain items.

OLS Estimators Derivation

The correct way to make predictions on ratings for a specific user is to consider two cases:

  1. when the user was used to train the model, and
  2. when the user was not used to train the model.

When the user u was used to train the model, predict the rating of item i using the inner product

r̂ᵤᵢ = pᵤ′qᵢ

where pᵤ is the u-th column of P and qᵢ is the i-th column of Q.

When the user u was not used to train the model, we need to estimate the vector pᵤ. To do this, pose the following problem:

We want to approximate a ratings matrix Rᵤ of dimension 1×m by the product of two lower-dimensional matrices: a matrix Pᵤ of dimension 1×k and the matrix Q of dimension k×m.

Formally,

Rᵤ ≈ PᵤQ

If a user not used to train the model has rated a certain number of the m items that were used to train the model, we face a problem very similar to estimating a linear regression:

Estimate β in

y = Xβ + ε

Then, the way to make predictions is to estimate the matrix Pᵤ the same way we estimate β in linear regression: using Ordinary Least Squares.

In the linear regression problem, the vector to estimate, β, appears on the right-hand side of the product Xβ, while in our problem Pᵤ appears on the left-hand side of PᵤQ. (Indeed, transposing Rᵤ ≈ PᵤQ gives Rᵤ′ ≈ Q′Pᵤ′, which has exactly the y = Xβ form with y = Rᵤ′, X = Q′, and β = Pᵤ′.) The derivation is therefore just as simple as in linear regression.

Define the residual sum of squares as

RSS = (Rᵤ − PᵤQ)(Rᵤ − PᵤQ)′ = RᵤRᵤ′ − RᵤQ′Pᵤ′ − PᵤQRᵤ′ + PᵤQQ′Pᵤ′

Note that RᵤQ′Pᵤ′ and PᵤQRᵤ′ are transposes of each other and both are scalars, so they are equal; RᵤRᵤ′ does not depend on Pᵤ.

Take the derivative of RSS with respect to Pᵤ and set it equal to zero:

∂RSS/∂Pᵤ = −2RᵤQ′ + 2PᵤQQ′ = 0

Then,

PᵤQQ′ = RᵤQ′

Finally,

P̂ᵤ = RᵤQ′(QQ′)⁻¹

where P̂ᵤ is our estimate of Pᵤ. As in linear regression, we need at least as many observations as variables to estimate the model: the user must have rated at least k of the m items used to train the model.
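As a sanity check on the closed-form estimator, here is a minimal sketch on synthetic data (Q and the “true” user vector are generated at random; none of these values come from the recosystem data). With noise-free ratings and nᵤ ≥ k, the formula recovers the user vector exactly:

```r
set.seed(42)
k <- 3; n_u <- 8                     # latent factors; items rated by the new user
Q <- matrix(rnorm(k * n_u), k, n_u)  # item-factor matrix for the rated items (k x n_u)
p_true <- matrix(rnorm(k), nrow = 1) # the "unknown" user vector (1 x k)
R_u <- p_true %*% Q                  # noise-free ratings implied by the model (1 x n_u)

# Closed-form OLS estimate: P_hat = R_u Q' (Q Q')^-1
p_hat <- R_u %*% t(Q) %*% solve(tcrossprod(Q))

max(abs(p_hat - p_true))             # effectively zero (numerical precision)
```

With real, noisy ratings the recovery is not exact, but P̂ᵤ remains the least-squares best fit.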

Having this estimate, we predict the ratings of a user who was not included in training exactly as if she had been: take the inner product

r̂ᵤᵢ = P̂ᵤqᵢ

Example

Let’s create some example data for the user 219 we found earlier in the data provided by the recosystem library.

Define nᵤ as the number of items the user 219 rated from the m items used to train the model.

set.seed(123)
n_u <- 12 # number of items user 219 has rated; must be at least k
items_u <- sample(train_file$V2, n_u)
ratings_u <- sample(c(1, 2, 3, 4, 5), n_u, replace = TRUE)
p_u <- data.frame(V1 = 219, V2 = items_u, V3 = ratings_u)

Read the model into your memory:

model <- read.table("/your/file/path/model.txt", skip = 5)

Extract the item vectors for the items user 219 rated. This should be a matrix of size k×nᵤ:

q_u <- model %>%
  # Note: filter() returns rows in the model file's order, not in items_u order;
  # with real ratings, reorder the result to match the order of ratings in p_u
  filter(V1 %in% paste0("q", items_u)) %>%
  select(-c(V1, V2)) %>%
  as.matrix %>%
  t

Calculate (QQ’)⁻¹:

first_prod <- tcrossprod(q_u, q_u) %>%
  solve

Calculate Q’(QQ’)⁻¹:

second_prod <- crossprod(q_u, first_prod)

Finally, calculate RQ’(QQ’)⁻¹:

p_u_est <- crossprod(as.matrix(p_u$V3), second_prod)
p_u_est

Out:
V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
[1,] 7.470921 -1.211137 2.3026 0.4800567 2.62823 -4.101964 -0.6423028 -7.016185 0.05477537 4.571988

The estimators in p_u_est should match the coefficients of a linear regression model fitted without an intercept:

data_for_lm <- data.frame(y = p_u$V3, t(q_u))
lm_model <- lm(y ~ . - 1, data = data_for_lm)
lm_model$coefficientsOut:
V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
7.47092111 -1.21113661 2.30260034 0.48005674 2.62822953 -4.10196436 -0.64230284 -7.01618547 0.05477537 4.57198777

To make predictions for this user, take the product of P̂ᵤ and Q:

# Create a matrix Q_u by taking the columns of Q from the items you want predictions for
items_to_predict <- (test_file_pred %>% filter(V1 == 219))$V2
Q_u <- model %>%
  filter(V1 %in% paste0("q", items_to_predict)) %>%
  select(-c(V1, V2)) %>%
  as.matrix %>%
  t
# Make predictions
predicted_ratings <- crossprod(t(p_u_est), Q_u)

We obtain very different predicted ratings than the ones calculated before:

test_file_pred %>%
  filter(V1 == 219) %>%
  mutate(new_pred = as.numeric(predicted_ratings))
Out:
V1 V2 pred new_pred
1 219 316 3.007 5.893066
2 219 640 3.007 2.481716
3 219 340 3.007 1.810961
4 219 198 3.007 6.997799
5 219 543 3.007 2.793121
6 219 572 3.007 6.603511
7 219 461 3.007 4.314775
8 219 932 3.007 4.390416
9 219 900 3.007 3.756519
10 219 289 3.007 3.782306
11 219 226 3.007 4.033049
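Note that some of the new predictions fall outside the 1–5 rating scale (e.g. 6.997799). A common post-processing step, not part of the pipeline above but a simple suggestion, is to clip predictions to the valid range before reporting them:

```r
# Clip predicted ratings to the valid 1-5 scale
clip_ratings <- function(x, lo = 1, hi = 5) pmin(pmax(x, lo), hi)

clip_ratings(c(6.997799, 1.810961, -0.3)) # -> 5, 1.810961, 1
```

Clipping keeps the scores interpretable on the original scale; for pure ranking purposes the raw predictions can be used as-is.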

Conclusion

Libraries such as recosystem make it easy to train recommender system models. When taking one of these algorithms into production, make sure you are using the model correctly. We showed how to make recommendations for users who were not used to train the model; in these cases, an Ordinary Least Squares estimate of the user vector is needed.

If you want to learn more about making recommendations for users in Matrix Factorization algorithms I suggest you read this Quora answer:

https://www.quora.com/How-do-I-predict-values-with-Matrix-Factorization-method-in-a-recommender-system
