[ Paper Summary ] Matrix Factorization Techniques for Recommender Systems
Please note that this post is for my future self to look back and review the materials presented in this paper.
Abstract
These days we are constantly receiving recommendations from a variety of sources: what blog to read, what music to listen to, and so on. These recommendation systems are becoming more personalized than ever. In this paper the authors use matrix factorization techniques to build a sophisticated recommender system that outperforms nearest-neighbor techniques (in the setting of a movie recommendation system).
Recommender system strategies
Content Filtering → creates a profile for each user or product to characterize its nature (Success Case: Music Genome Project)
Collaborative Filtering → analyzes relationships between users and inter-dependencies among products to identify new user-item associations (Success Case: Tapestry)
Collaborative filtering is generally more accurate than content filtering; however, it suffers from the cold start problem (if a new user has no interactions with any items yet, we cannot recommend anything). In general there are two ways to realize collaborative filtering.
Neighborhood Methods → compute the relationships between items or, alternatively, between users (item-based or user-based)
Latent Factor Models → characterize both users and items by latent factors inferred from the rating patterns, and compare them in that shared feature space (feature-based)
Matrix factorization methods
When a user gives feedback on a movie they saw (say, a rating from one to five), the collection of feedback can be represented as a matrix, where each row represents a user and each column represents a movie. Obviously the matrix will be sparse, since not everyone is going to watch every movie (we all have different tastes when it comes to movies).
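As a toy illustration (the numbers below are made up), such a rating matrix might look like this, with NaN marking movies a user has not rated:

```python
import numpy as np

# Toy rating matrix: rows are users, columns are movies,
# NaN marks a movie the user has not rated (values are made up).
R = np.array([
    [5.0,    3.0,    np.nan, 1.0],
    [4.0,    np.nan, np.nan, 1.0],
    [1.0,    1.0,    np.nan, 5.0],
    [np.nan, 1.0,    5.0,    4.0],
])

print(R.shape)                 # (4, 4): 4 users, 4 movies
print(int(np.isnan(R).sum()))  # 5 missing ratings -> the matrix is sparse
```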
One strength of matrix factorization is that it can incorporate implicit feedback: information that is not given directly but can be derived by analyzing user behavior. Using this strength we can estimate whether a user is going to like a movie they have never seen, and if that estimated rating is high, we can recommend the movie to the user.
The above image does an excellent job of summarizing the core idea behind matrix factorization. Let there be a matrix A with dimensions (m, n); this matrix can be approximated as the product of two matrices with dimensions (m, k) and (k, n), where k is the number of latent factors.
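A minimal NumPy sketch of this decomposition (the factors here are random and the variable names are my own, just to show the shapes involved):

```python
import numpy as np

m, n, k = 4, 5, 2                 # m users, n movies, k latent factors
rng = np.random.default_rng(0)

P = rng.normal(size=(m, k))       # user-factor matrix, shape (m, k)
Q = rng.normal(size=(k, n))       # factor-movie matrix, shape (k, n)

A_hat = P @ Q                     # reconstructed rating matrix, shape (m, n)
print(A_hat.shape)                # (4, 5)

# The entry for user u and movie i is the inner product of the
# u-th row of P and the i-th column of Q.
u, i = 1, 3
print(np.isclose(P[u] @ Q[:, i], A_hat[u, i]))   # True
```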
Just as a side note, the above concept is heavily related to Singular Value Decomposition (SVD). One downside of SVD is that when the original matrix is sparse (incomplete), the factorization is undefined.
The concept of matrix factorization can be written mathematically to look something like below.
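In the paper's notation, each item $i$ is associated with a factor vector $q_i \in \mathbb{R}^k$ and each user $u$ with a factor vector $p_u \in \mathbb{R}^k$; the estimated rating is their inner product:

$$ \hat{r}_{ui} = q_i^{T} p_u $$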
Then we can create an objective function (that we want to minimize) with respect to the factor vectors q and p, which, stacked over all users and items, form the (m, k) and (k, n) matrices from above.
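Written out, the system minimizes the regularized squared error over the set $\kappa$ of $(u, i)$ pairs for which the rating is known:

$$ \min_{q^*,\, p^*} \sum_{(u,i) \in \kappa} \left( r_{ui} - q_i^{T} p_u \right)^2 + \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right) $$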
The term on the right is the regularization term; it is added because we do not want the decomposed matrices q and p to over-fit the observed entries of the original matrix. Since our goal is to generalize the previous ratings in a way that predicts future, unknown ratings, we should not over-fit our model.
Learning Methods
One obvious method for finding the matrices q and p is (stochastic) gradient descent: since we have the loss function defined, we take the partial derivatives with respect to q and p and update them in the direction that reduces the error.
By taking partial derivatives, the update rule would look something like the above. But because both q and p are unknown, the error surface is not convex; an alternative approach is alternating least squares, in which we fix q and p alternately, optimizing one while the other is held constant (with one of them fixed, the problem becomes an ordinary convex least-squares problem).
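For each observed rating, the paper's stochastic gradient descent updates are $q_i \leftarrow q_i + \gamma \,(e_{ui}\, p_u - \lambda\, q_i)$ and $p_u \leftarrow p_u + \gamma \,(e_{ui}\, q_i - \lambda\, p_u)$, with error $e_{ui} = r_{ui} - q_i^{T} p_u$. Below is a minimal sketch of these updates in plain NumPy; the hyperparameter values (gamma, lam, epochs) are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def sgd_mf(ratings, m, n, k=10, gamma=0.005, lam=0.02, epochs=20, seed=0):
    """Factorize (user, item, rating) triples with stochastic gradient descent.

    ratings: iterable of (u, i, r) with 0 <= u < m and 0 <= i < n.
    Returns user factors P of shape (m, k) and item factors Q of shape (n, k).
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.normal(size=(m, k))   # user factor vectors p_u
    Q = 0.1 * rng.normal(size=(n, k))   # item factor vectors q_i

    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - Q[i] @ P[u]                        # e_ui = r_ui - q_i^T p_u
            q_old = Q[i].copy()
            Q[i] += gamma * (e * P[u] - lam * Q[i])    # q_i <- q_i + gamma (e p_u - lambda q_i)
            P[u] += gamma * (e * q_old - lam * P[u])   # p_u <- p_u + gamma (e q_i - lambda p_u)
    return P, Q

# Toy usage on a few made-up ratings (4 users, 4 movies)
triples = [(0, 0, 5.0), (0, 3, 1.0), (1, 0, 4.0), (2, 3, 5.0), (3, 2, 5.0)]
P, Q = sgd_mf(triples, m=4, n=4, k=2)
print(P[0] @ Q[0])   # predicted rating of user 0 for movie 0
```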
Adding Bias
Some movies are biased in that they are widely perceived to be better (or worse) than other movies, and some users are biased in that they are super salty and never rate a movie higher than 2. These effects are known as biases or intercepts; they are independent of any user-item interaction, so it would not be wise to explain them with the two decomposed matrices q and p. Instead, we include explicit bias terms in the original equation.
And the new objective function would look something like below.
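With $\mu$ the overall average rating and $b_u$, $b_i$ the observed deviations of user $u$ and item $i$ from that average, the prediction and objective become:

$$ \hat{r}_{ui} = \mu + b_i + b_u + q_i^{T} p_u $$

$$ \min_{q^*,\, p^*,\, b^*} \sum_{(u,i) \in \kappa} \left( r_{ui} - \mu - b_u - b_i - q_i^{T} p_u \right)^2 + \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 + b_u^2 + b_i^2 \right) $$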
Just like we added bias terms to the original function, we can add further terms to counter the cold start problem, to incorporate temporal dynamics of users and items, or to weight observations by confidence scores.
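For example, the paper lets the biases and the user factors vary with time, giving a time-aware prediction of the form:

$$ \hat{r}_{ui}(t) = \mu + b_i(t) + b_u(t) + q_i^{T} p_u(t) $$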
Final Words
I think it is super cool that we can use matrix factorization to build recommendation systems.
Reference
- Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Retrieved 22 November 2018, from https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf
- Music Genome Project. Wikipedia. Retrieved 22 November 2018, from https://en.wikipedia.org/wiki/Music_Genome_Project
- Simple Matrix Factorization example on the Movielens dataset using Pyspark. Medium. Retrieved 22 November 2018, from https://medium.com/@connectwithghosh/simple-matrix-factorization-example-on-the-movielens-dataset-using-pyspark-9b7e3f567536