Matrix Factorisation — Part 7

Rakesh4real · Published in Fnplus Club · Jul 29, 2019

Collaborative filtering is a good method. But if it is that good, why are we seeking an alternative approach?

  • It is sensitive to NOISY DATA and SPARSE DATA
  • Good results are obtained only if both of the conditions below are satisfied:
    — Large data set is available
    — Data is nice and clean

MODEL BASED METHODS

Instead of finding items or users similar to each other, model-based methods apply Machine Learning and Data Science. The basic steps involved are:

  • Training our model with data
  • Predicting using the model
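
As a minimal sketch of these two steps with surpriselib (the SVD model and the built-in MovieLens dataset are just example choices here):

```python
from surprise import SVD, Dataset
from surprise.model_selection import train_test_split

# Load a small example dataset (MovieLens 100k, downloaded on first use)
data = Dataset.load_builtin('ml-100k')
trainset, testset = train_test_split(data, test_size=0.25)

model = SVD()
model.fit(trainset)                # step 1: train our model with data
predictions = model.test(testset)  # step 2: predict using the model
```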

MATRIX FACTORISATION

  • It has many sub-categories of techniques.
  • CREEPY! It manages to find broader features of users/items on its own (e.g. Action or Romantic). The math doesn’t know what to call the newly found features, which are simply described by matrices.

MAIN IDEA:

Describe users and items as combinations of different amounts of the same underlying attributes.

For example, we may say Bob likes action movies as well as comedy movies. The same can be expressed as

Bob = 50% Action + 20% Comedy

Matching movies to users through these shared attributes is a good basis for predictions.
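
To make the idea concrete, here is a toy sketch (the numbers and the two-feature space are invented purely for illustration): if Bob and each movie live in the same latent-feature space, a predicted affinity is just a dot product.

```python
import numpy as np

# Hypothetical latent-feature space: (Action, Comedy)
bob          = np.array([0.5, 0.2])  # Bob = 50% Action + 20% Comedy
action_movie = np.array([0.9, 0.1])  # mostly action
comedy_movie = np.array([0.1, 0.8])  # mostly comedy

print(bob @ action_movie)  # 0.47 -> stronger match, recommend this first
print(bob @ comedy_movie)  # 0.21 -> weaker match
```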

PRINCIPAL COMPONENT ANALYSIS (PCA) ON MOVIE RATINGS

PCA is a statistical procedure which reduces the dimensions of our user-item matrix while losing as little important information as possible!

It is used as:

  • A dimensionality reduction tool (we can’t say what the newly reduced dimensions mean)
  • A feature extraction tool

Note: Often, the dimensions it finds correspond to features humans have learnt to associate with items (e.g. how ‘action-ey’ a movie is)

Whatever it is about movies that causes individuals to rate them differently, PCA extracts those ‘latent features’.
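
As a rough sketch of how this looks in code (the random matrix below merely stands in for a dense user-item rating matrix; real ratings would first need their missing values handled, as discussed later):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
R = rng.random((100, 50))             # stand-in for 100 users x 50 movies

pca = PCA(n_components=5)             # keep 5 latent features
user_factors = pca.fit_transform(R)   # each user is now described by 5 numbers
print(pca.explained_variance_ratio_)  # how much variance each feature retains
```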

If R (the original matrix) has some missing values, we can reconstruct R by filling in the blank values!

SIGMA MATRIX

It is a simple diagonal matrix, used only to scale the values we get to the proper magnitude:

  • We can multiply this Sigma matrix into M or U-Transpose, and R will still be just the product of two matrices
  • Reconstruct R to get the ratings of all users for all items!
  • We can even predict ratings using dot products, via the reconstruction R = M · Sigma · U-Transpose described above
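
Here is a minimal numpy sketch of this factor-and-reconstruct idea, on a tiny complete matrix (plain SVD needs every value present):

```python
import numpy as np

R = np.array([[5., 4., 1.],
              [4., 5., 1.],
              [1., 1., 5.]])        # tiny, fully known user-item matrix

M, sigma, Ut = np.linalg.svd(R)     # factors: M, Sigma (as a vector), U-Transpose
Sigma = np.diag(sigma)              # promote Sigma to a diagonal matrix

R_reconstructed = M @ Sigma @ Ut    # recovers R
R_two_factors = (M @ Sigma) @ Ut    # fold Sigma into M: still a product of two matrices
```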

SINGULAR VALUE DECOMPOSITION (SVD)

A way of computing M, Sigma and U-Transpose all at once, efficiently, giving all three factors in one shot!

  • Very accurate results
  • Used widely during Netflix prize competition

But wait! Before applying PCA to the original matrix R, how do we deal with missing values?

  • You may fill missing values with averages of some sort… but there is a better way!

Assume we have some values for any given row/column in M and U-Transpose

  • We can treat this as an ‘error minimisation’ problem
  • Find the values of those rows and columns that best minimise the error on the known ratings in R
  • Use Stochastic Gradient Descent (Just one way to do it)
  • Apache Spark uses a different technique called ALS (Alternating Least Squares)
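
A bare-bones sketch of that SGD idea (the learning rate, factor count and toy ratings are placeholder choices, not tuned values):

```python
import numpy as np

# Known ratings as (user, item, rating) triples; everything else is missing
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
n_users, n_items, k = 3, 2, 2                  # k = number of latent factors

rng = np.random.default_rng(42)
M = rng.normal(scale=0.1, size=(n_users, k))   # user factors
U = rng.normal(scale=0.1, size=(n_items, k))   # item factors

lr = 0.01
for epoch in range(200):
    for u, i, r in ratings:
        err = r - M[u] @ U[i]      # error on a known rating only
        M[u] += lr * err * U[i]    # nudge both factor vectors
        U[i] += lr * err * M[u]    # to shrink that error

print(M @ U.T)  # the full product now also predicts the missing cells
```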

Note: You might be confused here, because we are talking about learning the values of the matrices M and U.

Actually, we are predicting ratings rather than computing them directly, which is what SVD does. We are not doing real SVD recommendations, because SVD can’t run on missing data. Hence, it is an ‘SVD-inspired algorithm’, not SVD itself. The winner of the Netflix prize was a variation of SVD called SVD++.

TIPS:

  • You can see the source code for the SVD algorithm used in surpriselib on GitHub, but it is quite complex to understand.
  • Never write your own algorithm. The odds are too high that you will mess up somewhere. Use a third-party library that has been used and validated by others.

SVD v/s SVD++

  • The difference lies in the loss function used during SGD
  • In SVD++, merely rating an item at all is an indication of some sort of interest in the item, no matter what the rating was!
  • SVD++ outperformed SVD
  • Worked really well!!
  • It is spooky, because we don’t fully understand how the latent features work
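
For reference, the SVD++ prediction rule from Koren’s 2008 paper folds that implicit signal in as an extra term, where N(u) is the set of items user u has rated at all:

\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \Big( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big)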

Given the usage of SVD in the Netflix competition and its success, a lot of research focused on SVD. Listed below are just some of the algorithms that were developed.

I. Factorisation Machines

  • Well suited for predicting ratings/clicks in a Recommendation System
  • More ‘general purpose’
  • Handles sparse data with ease, unlike SVD, which has to be shoehorned into the problem
  • Available in Amazon SageMaker
  • Only downside: WORKS ONLY WITH CATEGORICAL DATA!
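
For reference, Rendle’s second-order factorisation machine scores a feature vector x (one-hot encoded user, item, and any other categorical fields) as:

\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j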

II. TimeSVD++

  • Used for predicting the next item in a series of events.

III. Factorized Personalized Markov Chains

  • Same as above: used for predicting the next item in a series of events.

IV. PLSA, Probabilistic Latent Semantic Analysis (Promising!)

  • Extracts latent features from content itself
  • Use PLSA on ‘movie titles’ and ‘descriptions’ and match them up with users
  • As it is content-based, it doesn’t do well on its own. Combine it with user behaviour data.

HYPERPARAMETER TUNING(SVD)

Hyperparameters are the parameters chosen by the ML engineer in order to produce effective predictions. Finding the optimal hyperparameters is called hyperparameter tuning.

  • Big problem in ML in general
  • A lot of algorithms are very sensitive to parameters such as Learning rates
  • Hyperparameters vary with datasets

In the case of SVD, the hyperparameters are:

  • How many latent factors to extract?
  • How many dimensions to reduce to?

Note: Both of them depend on the nature of the data we have.

Example Code

The code below helps us understand how we can tune hyperparameters in surpriselib using the GridSearchCV package.

Hyperparameter Tuning
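
Along those lines, here is a sketch of tuning SVD with surpriselib’s GridSearchCV (the parameter values below are arbitrary examples, not recommendations):

```python
from surprise import SVD, Dataset
from surprise.model_selection import GridSearchCV

data = Dataset.load_builtin('ml-100k')

# 'param_grid' defines what to test; every combination gets tried
param_grid = {'n_factors': [50, 100],
              'n_epochs': [20, 30],
              'lr_all': [0.005, 0.010]}

# 'measures' defines how to test; 'cv' sets the number of folds
gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
gs.fit(data)

print(gs.best_score['rmse'])   # best RMSE found
print(gs.best_params['rmse'])  # the combination that achieved it
```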

The GridSearchCV package:

  • Helps in hyperparameter tuning
  • Defines a grid of the different parameter values you want to try (every possible combination is tested)
  • ‘param_grid’ defines what to test
  • ‘measures’ defines how to test
  • ‘cv’ sets the number of cross-validation folds

SLIM — Sparse Linear Methods (2011)

MAIN IDEA: a ‘sparse aggregation’ of the other things the user has rated, multiplied by a ‘sparse aggregation’ of weights (magic!)

  • Exciting results!
  • Outperformed all other recommenders (Hit Rate in Top-N recommendations)
  • Applicable not only to movies but to books, music, credit cards and much more!
  • Because the user has rated only some items, weights exist (and can be computed) only for those items; hence ‘sparse’ aggregation
  • Extend to the entire user-item rating matrix with the formula below
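
In the notation of the original SLIM paper (Ning & Karypis, 2011), with A as the user-item matrix and W as a sparse item-item weight matrix (diag(W) = 0, so an item cannot recommend itself):

\tilde{a}_{ij} = a_i^T w_j \qquad \Longrightarrow \qquad \tilde{A} = A W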

HOW WEIGHTS ARE COMPUTED?

  • Complicated
  • So complicated that many can’t implement it in code
  • Similar to SVD, but requires more advanced mathematical concepts like
    — Matrix Frobenius Norm
    — L1 Norm Regularisation
  • If SLIM comes your way, give it a second look — Promising Stuff!
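
For the curious, the optimisation those concepts come from (again following the 2011 paper) learns the weights by minimising a squared reconstruction error plus both regularisers:

\min_{W} \; \tfrac{1}{2}\|A - AW\|_F^2 + \tfrac{\beta}{2}\|W\|_F^2 + \lambda\|W\|_1 \quad \text{subject to } W \ge 0,\ \mathrm{diag}(W) = 0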
