[ Paper Summary ] Matrix Factorization Techniques for Recommender Systems

Jae Duk Seo
Towards Data Science
5 min read · Nov 22, 2018

Please note that this post is for my future self to look back and review the materials presented in this paper.

Paper from this website

Abstract

These days we are constantly receiving recommendations from a variety of sources: what blog to read, what music to listen to, and so on. These recommendation systems are becoming more personalized than ever. In this paper the authors use matrix factorization techniques to build a recommender system that outperforms nearest-neighbor techniques, in the setting of movie recommendation.

Recommender system strategies

Content Filtering → creates a profile for each user or product to characterize its nature (Success Case: Music Genome Project)

Collaborative Filtering → analyzes relationships between users and inter-dependencies among products to identify new user-item associations (Success Case: Tapestry)

Collaborative filtering is generally more accurate than content filtering; however, it suffers from the cold start problem (if a new user has no interactions with any items yet, we can't recommend anything). In general there are two methods to achieve collaborative filtering.

Neighborhood Methods → compute the relationships between items or, alternatively, between users (user-based)

Latent Factor Models → characterize both users and items by latent features inferred from the rating patterns and compare them in that shared space (feature-based)

Matrix factorization methods

When users give feedback on movies they have seen (say, a rating from one to five), the collection of feedback can be represented in the form of a matrix, where each row represents a user and each column represents a movie. Obviously the matrix will be sparse, since not everyone is going to watch every movie (we all have different tastes when it comes to movies).
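As a tiny made-up illustration (the numbers below are my own, not from the paper), such a ratings matrix could be stored like this, with NaN marking movies a user has not rated:

```python
import numpy as np

# Rows = users, columns = movies; np.nan = that user has not rated that movie.
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])

print(R.shape)             # (4, 4): 4 users by 4 movies
print(np.isnan(R).mean())  # fraction of missing ratings (sparsity)
```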

One strength of matrix factorization is the fact that it can incorporate implicit feedback: information that is not given directly but can be derived by analyzing user behavior. Using this strength we can estimate whether a user is going to like a movie that they have never seen, and if that estimated rating is high, we can recommend the movie to the user.

[Figure: a ratings matrix decomposed into the product of a user-factor matrix and an item-factor matrix.]
Image from this website

The above image does an excellent job of summarizing the core idea behind matrix factorization. Let there be a matrix A with dimensions (m, n); this matrix can be viewed as the product of two matrices with dimensions (m, k) and (k, n), where k is the number of latent factors.
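A quick sketch of just the shapes involved (random factor matrices; the value of k here is an arbitrary choice for illustration):

```python
import numpy as np

m, n, k = 4, 6, 2          # m users, n movies, k latent factors (toy sizes)
P = np.random.rand(m, k)   # user-factor matrix, shape (m, k)
Q = np.random.rand(k, n)   # item-factor matrix, shape (k, n)

A_hat = P @ Q              # approximation of the (m, n) ratings matrix
print(A_hat.shape)         # (4, 6)
```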

Just as a side note, the above concept is heavily related to Singular Value Decomposition (SVD). One downside of SVD is the fact that when the original matrix is sparse (incomplete), the left and right singular vectors are undefined.

The concept of matrix factorization can be written mathematically to look something like below.
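In the paper's notation, where q_i is the factor vector of item i and p_u is the factor vector of user u, the predicted rating is their dot product:

\hat{r}_{ui} = q_i^T p_u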

Then we can create an objective function (that we want to minimize) with respect to q and p, which are (m,k) and (k,n) matrices.
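Following the paper, with \kappa the set of (u, i) pairs for which the rating r_{ui} is known, the regularized objective is:

\min_{q, p} \sum_{(u,i) \in \kappa} \left( r_{ui} - q_i^T p_u \right)^2 + \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)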

The term on the right is the regularization term; it is added because we do not want our decomposed matrices q and p to over-fit the original matrix. Since our goal is to generalize the previous ratings in a way that predicts future, unknown ratings, we should not over-fit our model.

Learning Methods

One obvious method to find the matrices q and p is gradient descent. Since we have the loss function defined, we take the partial derivatives with respect to q and p and update those values accordingly.
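In the paper's notation, with prediction error e_{ui} = r_{ui} - q_i^T p_u, learning rate \gamma, and regularization weight \lambda, the stochastic gradient descent updates are:

q_i \leftarrow q_i + \gamma \left( e_{ui} \, p_u - \lambda \, q_i \right)
p_u \leftarrow p_u + \gamma \left( e_{ui} \, q_i - \lambda \, p_u \right)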

By taking partial derivatives, the update rules would look something like the above. But since the error surface is not convex in q and p jointly, we can also take the alternative approach of fixing one of q and p while optimizing the other, alternating between the two (alternating least squares).
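A minimal SGD sketch of this idea in plain NumPy (hyper-parameters and names such as n_factors, lr, and reg are my own choices, not values from the paper):

```python
import numpy as np

def factorize(R, n_factors=2, lr=0.01, reg=0.1, n_epochs=200, seed=0):
    """Fit user and item factor matrices to the observed entries of R via SGD."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = 0.1 * rng.standard_normal((m, n_factors))   # user factors, (m, k)
    Q = 0.1 * rng.standard_normal((n, n_factors))   # item factors, (n, k)
    users, items = np.where(~np.isnan(R))            # indices of observed ratings

    for _ in range(n_epochs):
        for u, i in zip(users, items):
            err = R[u, i] - P[u] @ Q[i]              # prediction error e_ui
            p_u = P[u].copy()                        # keep old value for Q's update
            P[u] += lr * (err * Q[i] - reg * P[u])   # gradient step on p_u
            Q[i] += lr * (err * p_u - reg * Q[i])    # gradient step on q_i
    return P, Q

P, Q = factorize(np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
]))
print(np.round(P @ Q.T, 2))  # predicted ratings, including the missing cells
```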

Adding Bias

Some movies are biased in that they are widely perceived as better (or worse) than other movies, and some users are biased in that they are super salty and never rate a movie higher than 2. These effects are known as biases or intercepts; they are independent of any user-item interaction, so it would not be wise to explain them using our two factor matrices q and p. Instead, we include the bias terms in our original equation.
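With \mu the overall average rating, b_i the observed deviation of item i, and b_u the observed deviation of user u (as in the paper), the prediction becomes:

\hat{r}_{ui} = \mu + b_i + b_u + q_i^T p_u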

And the new objective function would look something like below.
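Written out in the same form as before, the objective with biases is:

\min_{p, q, b} \sum_{(u,i) \in \kappa} \left( r_{ui} - \mu - b_u - b_i - q_i^T p_u \right)^2 + \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 + b_u^2 + b_i^2 \right)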

Just as we added bias terms to the original function, we can add further terms to counter the cold start problem, to incorporate the temporal dynamics of users and items, or even to weight observations by confidence scores.

Final Words

I think it is super cool that we can use matrix factorization to build recommendation systems.

Reference

  1. Matrix Factorization Techniques for Recommender Systems. (2018). Datajobs.com. Retrieved 22 November 2018, from https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf
  2. Music Genome Project. (2018). En.wikipedia.org. Retrieved 22 November 2018, from https://en.wikipedia.org/wiki/Music_Genome_Project
  3. Simple Matrix Factorization example on the Movielens dataset using Pyspark. (2018). Medium. Retrieved 22 November 2018, from https://medium.com/@connectwithghosh/simple-matrix-factorization-example-on-the-movielens-dataset-using-pyspark-9b7e3f567536
