Matrix Factorization Advanced: Pictures + Code (PyTorch) — (Part 2)

Daniel Lam
5 min read · Mar 25, 2023


TLDR:
Problem:
Can we improve matrix factorization by adding 1) user & item biases, 2) a global offset, 3) weight initialization, and 4) sigmoid_range? For basic info on matrix factorization and the MovieLens dataset, please refer to Part 1, linked below.
Dataset: ml-latest-small.zip from https://grouplens.org/datasets/movielens/
Data consists of users, movies, ratings, timestamps, titles, and genres.
Solution (2 parts):
1) Basic matrix factorization https://medium.com/@datadote/pytorch-matrix-factorization-pictures-code-part-1-abe331317ffb
2) Advanced matrix factorization (bias terms, offset, weight initialization, sigmoid_range)
Code: “02_matrix_fact_advanced.ipynb” https://github.com/Datadote/matrix-factorization-pytorch

Steps:
1) Problem + Dataset Recap
2) Matrix Factorization improvements
3) Add F1, precision, recall metrics
4) Train model
5) Check results

1) Problem + Dataset Recap (Preprocessed in part 1)

Problem: Given a dataset of users, movies, and ratings, can we create a model that predicts movie ratings for users?
Dataset: ml-latest-small.zip from https://grouplens.org/datasets/movielens/
Data consists of users, movies, ratings, timestamps, titles, and genres.

After label encoding, userId and movieId have different values than the raw IDs.
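As a rough reminder of what that preprocessing looks like, here is a minimal sketch, assuming a pandas DataFrame df with userId and movieId columns (the exact code is in the Part 1 notebook):

import pandas as pd

# Hypothetical mini-example of label encoding raw IDs into contiguous indices,
# so they can be used directly as nn.Embedding row indices
df = pd.DataFrame({'userId': [3, 7, 7, 15], 'movieId': [50, 50, 318, 296]})
df['userId'] = df['userId'].astype('category').cat.codes    # 3, 7, 15 -> 0, 1, 2
df['movieId'] = df['movieId'].astype('category').cat.codes  # 50, 296, 318 -> 0, 1, 2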

2) Matrix Factorization improvements

From Part 1, the basic matrix factorization is a dot product of a user embedding and an item embedding. To improve performance, we add 1) user & item biases, 2) a global offset, 3) weight initialization, and 4) sigmoid_range.

Basic Matrix Factorization
  1. User & item biases: some users rate movies, on average, higher than other users. Item biases have a similar idea. For example, higher quality items might be rated higher, on average, than low quality items.
  2. Offset: common technique to add a global offset
  3. Weight initialization: nn.Embedding, by default, is initialized with a normal distribution(0, 1). Change the weight initialization to uniform between [0, 0.05]. This empirically performs better than the defaults.
  4. Sigmoid_range: Idea taken from Fastai collab notebook. Clamp outputs between [0, 5.5] Empirically, clamping at 5.5 performs better than at 5.
Sigmoid_range
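The model code below calls sigmoid_range. A minimal equivalent definition looks like this (a sketch following the fastai book's version; the notebook may import or define it slightly differently):

import torch

def sigmoid_range(x, low, high):
    """ Squash x into (low, high) using a scaled sigmoid """
    return torch.sigmoid(x) * (high - low) + low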
Matrix Factorization + user/item biases, offset, sigmoid_range
import torch
import torch.nn as nn

class MFAdvanced(nn.Module):
    """ Matrix factorization + user & item bias, weight init., sigmoid_range """
    def __init__(self, num_users, num_items, emb_dim, init, bias, sigmoid):
        super().__init__()
        self.bias = bias
        self.sigmoid = sigmoid
        self.user_emb = nn.Embedding(num_users, emb_dim)
        self.item_emb = nn.Embedding(num_items, emb_dim)
        if bias:
            self.user_bias = nn.Parameter(torch.zeros(num_users))
            self.item_bias = nn.Parameter(torch.zeros(num_items))
            self.offset = nn.Parameter(torch.zeros(1))
        if init:
            self.user_emb.weight.data.uniform_(0., 0.05)
            self.item_emb.weight.data.uniform_(0., 0.05)

    def forward(self, user, item):
        user_emb = self.user_emb(user)
        item_emb = self.item_emb(item)
        element_product = (user_emb*item_emb).sum(1)
        if self.bias:
            user_b = self.user_bias[user]
            item_b = self.item_bias[item]
            element_product += user_b + item_b + self.offset
        if self.sigmoid:
            return sigmoid_range(element_product, 0, 5.5)
        return element_product

n_users = len(df.userId.unique())
n_items = len(df.movieId.unique())
mdl = MFAdvanced(n_users, n_items, emb_dim=32,
                 init=CFG['init'],       # CFG['init']=True
                 bias=CFG['bias'],       # CFG['bias']=True
                 sigmoid=CFG['sigmoid'], # CFG['sigmoid']=True
                 )
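As a quick smoke test (hypothetical indices, not part of the original notebook), the model maps a batch of (user, item) index pairs to one predicted rating each:

# Indices must be < n_users and < n_items respectively
users = torch.tensor([0, 1, 2, 3])
items = torch.tensor([10, 20, 30, 40])
with torch.no_grad():
    preds = mdl(users, items)
print(preds.shape)  # torch.Size([4]); values fall in [0, 5.5] when sigmoid=True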

By default (via CFG), MFAdvanced has initialization (init), bias, and sigmoid_range (sigmoid) all enabled (True).

3) Add F1, precision, recall metrics

For better evaluation, we add F1, precision, and recall metrics. These metrics need integer class labels, so the ratings and predictions are converted to integers. Ratings are multiplied by 2, which maps them to the classes [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Predictions are rounded to the nearest 0.5 and then multiplied by 2.

import numpy as np

def round_to_0p5(list_nums):
    """ Helper func to round nums to nearest 0.5, eg 1.45 -> 1.5 """
    return np.round(np.array(list_nums)*2)/2

# The code below runs inside the train loop
# For f1, precision, recall -> round preds to nearest 0.5 and multiply by 2.
# This turns fractional values into integers. Eg 1.34 -> 1.5 -> 3
y_true = np.array(lratings)*2      # actual ratings as integer labels 1..10
y_hat = round_to_0p5(lpreds)*2     # rounded predictions as integer labels
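For example (hypothetical values):

print(round_to_0p5([1.34, 4.76]))      # -> [1.5, 5.0]   nearest 0.5
print(round_to_0p5([1.34, 4.76]) * 2)  # -> [3.0, 10.0]  integer class labels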

Traditionally, F1, precision, and recall are used with binary (0/1) inputs. Here, we use extended versions that work with multi-class inputs, enabled by the average='weighted' argument. The 'weighted' option calculates the metric for each label [1–10] and then weights each label by its frequency in the true labels; in essence, 'weighted' helps mitigate class imbalance. zero_division=0 sets the return value to 0 whenever a division by zero occurs.

from sklearn.metrics import f1_score, precision_score, recall_score

f1 = f1_score(y_true, y_hat, average='weighted', zero_division=0)
precision = precision_score(y_true, y_hat, average='weighted', zero_division=0)
recall = recall_score(y_true, y_hat, average='weighted', zero_division=0)
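As a toy illustration of the 'weighted' averaging (hypothetical labels, not from the dataset):

from sklearn.metrics import f1_score

# Imbalanced toy labels: class 10 dominates, class 2 is rare
y_true_toy = [10, 10, 10, 10, 2]
y_hat_toy = [10, 10, 10, 2, 2]
# Per-class F1 is computed first, then averaged with weights equal to each
# class's frequency in y_true_toy, so the dominant class counts for more
print(f1_score(y_true_toy, y_hat_toy, average='weighted', zero_division=0))  # ~0.82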

4) Train model

The Adam optimizer and mean-squared error (MSE) loss are used to train the model. We have moved some hyperparameters into a config dictionary (CFG) for ease of use.
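The loop below iterates over dl_train and dl_val, which come from Part 1. For context, here is a minimal sketch of that kind of setup (the dataset class, split DataFrames, and batch size here are assumptions; the exact code is in the Part 1 notebook):

from torch.utils.data import Dataset, DataLoader

class RatingsDataset(Dataset):
    """ Hypothetical dataset: yields ((user_idx, item_idx), rating) per row """
    def __init__(self, df):
        self.users = df['userId'].values
        self.items = df['movieId'].values
        self.ratings = df['rating'].values
    def __len__(self):
        return len(self.ratings)
    def __getitem__(self, idx):
        return (self.users[idx], self.items[idx]), self.ratings[idx]

# df_train / df_val are assumed train/validation splits of df
dl_train = DataLoader(RatingsDataset(df_train), batch_size=32, shuffle=True)
dl_val = DataLoader(RatingsDataset(df_val), batch_size=32, shuffle=False)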

import torch.optim as optim

CFG = {
    'sigmoid': True,
    'bias': True,
    'init': True,
    'lr': 0.005,
    'num_epochs': 10,
}
opt = optim.Adam(mdl.parameters(), lr=CFG['lr'])
loss_fn = nn.MSELoss()
epoch_train_losses, epoch_val_losses = [], []

# Note: device and mdl.to(device) are set up earlier in the notebook
for i in range(CFG['num_epochs']):
    train_losses, val_losses = [], []
    mdl.train()
    for xb, yb in dl_train:
        xUser = xb[0].to(device, dtype=torch.long)
        xItem = xb[1].to(device, dtype=torch.long)
        yRatings = yb.to(device, dtype=torch.float)
        preds = mdl(xUser, xItem)
        loss = loss_fn(preds, yRatings)
        train_losses.append(loss.item())
        opt.zero_grad()
        loss.backward()
        opt.step()
    lpreds, lratings = [], []
    mdl.eval()
    with torch.no_grad():  # no gradients needed for validation
        for xb, yb in dl_val:
            xUser = xb[0].to(device, dtype=torch.long)
            xItem = xb[1].to(device, dtype=torch.long)
            yRatings = yb.to(device, dtype=torch.float)
            preds = mdl(xUser, xItem)
            loss = loss_fn(preds, yRatings)
            val_losses.append(loss.item())
            # Collect preds/ratings for F1, precision, recall
            lpreds.extend(preds.detach().cpu().numpy().tolist())
            lratings.extend(yRatings.detach().cpu().numpy().tolist())
    # Logging
    epoch_train_loss = np.mean(train_losses)
    epoch_val_loss = np.mean(val_losses)
    epoch_train_losses.append(epoch_train_loss)
    epoch_val_losses.append(epoch_val_loss)
    # For f1, precision, recall -> round preds to nearest 0.5 and multiply by 2.
    # This turns fractional values into integers. Eg 1.34 -> 1.5 -> 3
    y_true = np.array(lratings)*2
    y_hat = round_to_0p5(lpreds)*2
    f1 = f1_score(y_true, y_hat, average='weighted', zero_division=0)
    precision = precision_score(y_true, y_hat, average='weighted', zero_division=0)
    recall = recall_score(y_true, y_hat, average='weighted', zero_division=0)
    s = (f'Epoch: {i}, Train Loss: {epoch_train_loss:0.1f}, '
         f'Val Loss: {epoch_val_loss:0.1f}, F1: {f1:0.2f}, '
         f'prec.: {precision:0.2f}, rec: {recall:0.2f}')
    print(s)

5) Check results

With these changes, the model reaches a lower validation loss than in Part 1:
Part1 val_loss: 3.3
Part2 val_loss: 0.9

Also, comparing Part 2 with Part 1:
- the user and item embedding min/max weights are closer to 0
- Part 2 predictions are closer to the actual ratings (see the spot-check sketch below)
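One minimal way to spot-check this (a sketch reusing lpreds and lratings from the last validation epoch above):

# Print 5 sample predictions next to their actual ratings
for pred, rating in list(zip(lpreds, lratings))[:5]:
    print(f'pred: {pred:0.2f}, actual: {rating:0.1f}')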

Part2 Weights/Predictions/Sample 5 Predictions
Part1 Weights/Predictions/Sample 5 Predictions

References

  1. Fastai https://github.com/fastai/fastbook/blob/master/08_collab.ipynb
