Matrix Factorization Advanced: Pictures + Code (PyTorch) — (Part 2)

Daniel Lam
5 min read · Mar 25, 2023


TLDR:
Problem:
Can we improve matrix factorization by adding 1) user & item biases, 2) a global offset, 3) weight initialization, and 4) sigmoid_range? For basic info on matrix factorization and the MovieLens dataset, please refer to Part 1, linked below.
Dataset: ml-latest-small.zip from https://grouplens.org/datasets/movielens/
Data consists of users, movies, ratings, timestamps, titles, and genres.
Solution (2 parts):
1) Basic matrix factorization https://medium.com/@datadote/pytorch-matrix-factorization-pictures-code-part-1-abe331317ffb
2) Advanced matrix factorization (bias terms, offset, weight initialization, sigmoid_range)
Code: “02_matrix_fact_advanced.ipynb” https://github.com/Datadote/matrix-factorization-pytorch

Steps:
1) Problem + Dataset Recap
2) Matrix Factorization improvements
3) Add F1, precision, recall metrics
4) Train model
5) Check results

1) Problem + Dataset Recap (Preprocessed in part 1)

Problem: Given a dataset of users, movies, and ratings, can we create a model that predicts movie ratings for users?
Dataset: ml-latest-small.zip from https://grouplens.org/datasets/movielens/
Data consists of users, movies, ratings, timestamps, titles, and genres.

After label encoding, userId and movieId have different values than the raw IDs.
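As a rough reminder of what that preprocessing looks like, here is a minimal sketch, assuming a pandas DataFrame df with userId and movieId columns (the exact code is in the Part 1 notebook):

import pandas as pd

# Hypothetical mini-example of label encoding raw IDs into contiguous indices,
# so they can be used directly as nn.Embedding row indices
df = pd.DataFrame({'userId': [3, 7, 7, 15], 'movieId': [50, 50, 318, 296]})
df['userId'] = df['userId'].astype('category').cat.codes    # 3, 7, 15 -> 0, 1, 2
df['movieId'] = df['movieId'].astype('category').cat.codes  # 50, 296, 318 -> 0, 1, 2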

2) Matrix Factorization improvements

From Part 1, the basic matrix factorization is a dot product of a user embedding and an item embedding. To improve performance, we add 1) user & item biases, 2) a global offset, 3) weight initialization, and 4) sigmoid_range.

Basic Matrix Factorization
  1. User & item biases: some users rate movies, on average, higher than other users. Item biases have a similar idea. For example, higher quality items might be rated higher, on average, than low quality items.
  2. Offset: common technique to add a global offset
  3. Weight initialization: nn.Embedding, by default, is initialized with a normal distribution(0, 1). Change the weight initialization to uniform between [0, 0.05]. This empirically performs better than the defaults.
  4. Sigmoid_range: Idea taken from Fastai collab notebook. Clamp outputs between [0, 5.5] Empirically, clamping at 5.5 performs better than at 5.
Sigmoid_range
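The model code below calls sigmoid_range. A minimal equivalent definition looks like this (a sketch following the fastai book's version; the notebook may import or define it slightly differently):

import torch

def sigmoid_range(x, low, high):
    """ Squash x into (low, high) using a scaled sigmoid """
    return torch.sigmoid(x) * (high - low) + low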
Matrix Factorization + user/item biases, offset, sigmoid_range
import torch
import torch.nn as nn

class MFAdvanced(nn.Module):
    """ Matrix factorization + user & item bias, weight init., sigmoid_range """
    def __init__(self, num_users, num_items, emb_dim, init, bias, sigmoid):
        super().__init__()
        self.bias = bias
        self.sigmoid = sigmoid
        self.user_emb = nn.Embedding(num_users, emb_dim)
        self.item_emb = nn.Embedding(num_items, emb_dim)
        if bias:
            self.user_bias = nn.Parameter(torch.zeros(num_users))
            self.item_bias = nn.Parameter(torch.zeros(num_items))
            self.offset = nn.Parameter(torch.zeros(1))
        if init:
            self.user_emb.weight.data.uniform_(0., 0.05)
            self.item_emb.weight.data.uniform_(0., 0.05)

    def forward(self, user, item):
        user_emb = self.user_emb(user)
        item_emb = self.item_emb(item)
        element_product = (user_emb*item_emb).sum(1)
        if self.bias:
            user_b = self.user_bias[user]
            item_b = self.item_bias[item]
            element_product += user_b + item_b + self.offset
        if self.sigmoid:
            return sigmoid_range(element_product, 0, 5.5)
        return element_product

n_users = len(df.userId.unique())
n_items = len(df.movieId.unique())
mdl = MFAdvanced(n_users, n_items, emb_dim=32,
                 init=CFG['init'],       # CFG['init']=True
                 bias=CFG['bias'],       # CFG['bias']=True
                 sigmoid=CFG['sigmoid'], # CFG['sigmoid']=True
                 )
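As a quick smoke test (hypothetical indices, not part of the original notebook), the model maps a batch of (user, item) index pairs to one predicted rating each:

# Indices must be < n_users and < n_items respectively
users = torch.tensor([0, 1, 2, 3])
items = torch.tensor([10, 20, 30, 40])
with torch.no_grad():
    preds = mdl(users, items)
print(preds.shape)  # torch.Size([4]); values fall in [0, 5.5] when sigmoid=True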

By default (via CFG), MFAdvanced has initialization (init), bias, and sigmoid_range (sigmoid) all enabled (True).

3) Add F1, precision, recall metrics

For better evaluation, we add F1, precision, and recall metrics. These metrics need integer class labels, so the ratings and predictions are converted to integers. Ratings are multiplied by 2, which maps them to the classes [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Predictions are rounded to the nearest 0.5 and then multiplied by 2.

import numpy as np

def round_to_0p5(list_nums):
    """ Helper func to round nums to nearest 0.5, eg 1.45 -> 1.5 """
    return np.round(np.array(list_nums)*2)/2

# The code below runs inside the train loop
# For f1, precision, recall -> round preds to nearest 0.5 and multiply by 2.
# This turns fractional values into integers. Eg 1.34 -> 1.5 -> 3
y_true = np.array(lratings)*2      # actual ratings as integer labels 1..10
y_hat = round_to_0p5(lpreds)*2     # rounded predictions as integer labels
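For example (hypothetical values):

print(round_to_0p5([1.34, 4.76]))      # -> [1.5, 5.0]   nearest 0.5
print(round_to_0p5([1.34, 4.76]) * 2)  # -> [3.0, 10.0]  integer class labels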

Traditionally, F1, precision, and recall are used with binary (0/1) inputs. Here, we use extended versions that work with multi-class inputs, enabled by the average='weighted' argument. The 'weighted' option calculates the metric for each label [1–10] and then weights each label by its frequency in the true labels; in essence, 'weighted' helps mitigate class imbalance. zero_division=0 sets the return value to 0 whenever a division by zero occurs.

from sklearn.metrics import f1_score, precision_score, recall_score

f1 = f1_score(y_true, y_hat, average='weighted', zero_division=0)
precision = precision_score(y_true, y_hat, average='weighted', zero_division=0)
recall = recall_score(y_true, y_hat, average='weighted', zero_division=0)
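As a toy illustration of the 'weighted' averaging (hypothetical labels, not from the dataset):

from sklearn.metrics import f1_score

# Imbalanced toy labels: class 10 dominates, class 2 is rare
y_true_toy = [10, 10, 10, 10, 2]
y_hat_toy = [10, 10, 10, 2, 2]
# Per-class F1 is computed first, then averaged with weights equal to each
# class's frequency in y_true_toy, so the dominant class counts for more
print(f1_score(y_true_toy, y_hat_toy, average='weighted', zero_division=0))  # ~0.82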

4) Train model

The Adam optimizer and mean-squared error (MSE) loss are used to train the model. We have moved some hyperparameters into a config dictionary (CFG) for ease of use.
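The loop below iterates over dl_train and dl_val, which come from Part 1. For context, here is a minimal sketch of that kind of setup (the dataset class, split DataFrames, and batch size here are assumptions; the exact code is in the Part 1 notebook):

from torch.utils.data import Dataset, DataLoader

class RatingsDataset(Dataset):
    """ Hypothetical dataset: yields ((user_idx, item_idx), rating) per row """
    def __init__(self, df):
        self.users = df['userId'].values
        self.items = df['movieId'].values
        self.ratings = df['rating'].values
    def __len__(self):
        return len(self.ratings)
    def __getitem__(self, idx):
        return (self.users[idx], self.items[idx]), self.ratings[idx]

# df_train / df_val are assumed train/validation splits of df
dl_train = DataLoader(RatingsDataset(df_train), batch_size=32, shuffle=True)
dl_val = DataLoader(RatingsDataset(df_val), batch_size=32, shuffle=False)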

import torch.optim as optim

CFG = {
    'sigmoid': True,
    'bias': True,
    'init': True,
    'lr': 0.005,
    'num_epochs': 10,
}
opt = optim.Adam(mdl.parameters(), lr=CFG['lr'])
loss_fn = nn.MSELoss()
epoch_train_losses, epoch_val_losses = [], []

# Note: device and mdl.to(device) are set up earlier in the notebook
for i in range(CFG['num_epochs']):
    train_losses, val_losses = [], []
    mdl.train()
    for xb, yb in dl_train:
        xUser = xb[0].to(device, dtype=torch.long)
        xItem = xb[1].to(device, dtype=torch.long)
        yRatings = yb.to(device, dtype=torch.float)
        preds = mdl(xUser, xItem)
        loss = loss_fn(preds, yRatings)
        train_losses.append(loss.item())
        opt.zero_grad()
        loss.backward()
        opt.step()
    lpreds, lratings = [], []
    mdl.eval()
    with torch.no_grad():  # no gradients needed for validation
        for xb, yb in dl_val:
            xUser = xb[0].to(device, dtype=torch.long)
            xItem = xb[1].to(device, dtype=torch.long)
            yRatings = yb.to(device, dtype=torch.float)
            preds = mdl(xUser, xItem)
            loss = loss_fn(preds, yRatings)
            val_losses.append(loss.item())
            # Collect preds/ratings for F1, precision, recall
            lpreds.extend(preds.detach().cpu().numpy().tolist())
            lratings.extend(yRatings.detach().cpu().numpy().tolist())
    # Logging
    epoch_train_loss = np.mean(train_losses)
    epoch_val_loss = np.mean(val_losses)
    epoch_train_losses.append(epoch_train_loss)
    epoch_val_losses.append(epoch_val_loss)
    # For f1, precision, recall -> round preds to nearest 0.5 and multiply by 2.
    # This turns fractional values into integers. Eg 1.34 -> 1.5 -> 3
    y_true = np.array(lratings)*2
    y_hat = round_to_0p5(lpreds)*2
    f1 = f1_score(y_true, y_hat, average='weighted', zero_division=0)
    precision = precision_score(y_true, y_hat, average='weighted', zero_division=0)
    recall = recall_score(y_true, y_hat, average='weighted', zero_division=0)
    s = (f'Epoch: {i}, Train Loss: {epoch_train_loss:0.1f}, '
         f'Val Loss: {epoch_val_loss:0.1f}, F1: {f1:0.2f}, '
         f'prec.: {precision:0.2f}, rec: {recall:0.2f}')
    print(s)

5) Check results

With these changes, the model reaches a lower validation loss than in Part 1:
Part1 val_loss: 3.3
Part2 val_loss: 0.9

Also, comparing Part 2 with Part 1:
- the user and item embedding min/max weights are closer to 0
- Part 2 predictions are closer to the actual ratings (see the spot-check sketch below)
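One minimal way to spot-check this (a sketch reusing lpreds and lratings from the last validation epoch above):

# Print 5 sample predictions next to their actual ratings
for pred, rating in list(zip(lpreds, lratings))[:5]:
    print(f'pred: {pred:0.2f}, actual: {rating:0.1f}')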

Part2 Weights/Predictions/Sample 5 Predictions
Part1 Weights/Predictions/Sample 5 Predictions

References

  1. Fastai https://github.com/fastai/fastbook/blob/master/08_collab.ipynb
