An Introduction to Collaborative Filtering Recommendation Systems

Felipe Passarella · Semantix · Dec 28, 2021

Recommendation systems have become part of our daily online lives. What started around 20 years ago as a means to sell more books is now the daily driver of online shopping and social media. These systems can be highly customized, so each user gets a unique experience based on their past behavior.


In this post we’ll focus on collaborative filtering. We’ll use two models: an item-item model based on cosine similarity and a user-item model based on matrix decomposition. The first will use only binary data, i.e., instead of ratings from 1 to 5 we’ll use only 1s and 0s. Ratings can be biased, and plenty of data is naturally binary: clicks on a webpage or product purchases, for example.

Our data

We will use the MovieLens dataset. It contains information about movies, such as their genre and release date, as well as demographic information about users and the ratings (from 1 to 5) they’ve given to these movies.
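Before anything else, here is a minimal sketch of how the data could be loaded and binarized, assuming the MovieLens 100K release (its tab-separated u.data file); the file path is an assumption, so adjust it to wherever you unpacked the dataset.

```python
import pandas as pd

# MovieLens 100K ratings file: tab-separated columns
# user id | item id | rating | timestamp
# (the path "ml-100k/u.data" is an assumption -- adjust as needed)
ratings = pd.read_csv(
    "ml-100k/u.data",
    sep="\t",
    names=["user_id", "item_id", "rating", "timestamp"],
)

# Binarize: every (user, item) pair with a rating becomes a 1,
# everything else stays 0 -- we keep only the interaction signal.
A = pd.crosstab(ratings["user_id"], ratings["item_id"]).clip(upper=1)
print(A.shape)  # (number of users, number of items)
```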

Item-item approach

Some math first. We want to score how likely a user is to interact with an item. Let s(u, i) be our scoring function; then:

s(u, i) = Σ_{j ∈ N(i)} ω_ij · r_uj

where ω_ij is a weight between items i and j, r_uj is the rating the user has given to item j, and N(i) denotes the neighborhood of item i, basically the items the user has interacted with. Since we’ll be dealing with 1s and 0s, we’ll assume the user interacted with all the items in our dataset, so the sum runs over every item.

This expression generalizes to any type of rating, but we can simplify it here: since every r_uj is either 1 or 0, each term either keeps or drops its weight, so we can remove the variable r and focus on the weights:

s(u, i) = Σ_{j : r_uj = 1} ω_ij

These weights can be any similarity metric (Manhattan, Jaccard, etc.); in our case we’ll use the cosine.

The cosine between two vectors x and y is calculated as:

cos(x, y) = (x · y) / (‖x‖ ‖y‖) = Σ_k x_k y_k / (√(Σ_k x_k²) √(Σ_k y_k²))

With a little more algebra we can derive the Pearson correlation from this expression: Pearson is simply the cosine of the mean-centered vectors, which gives the metric a sound statistical interpretation. We’ll leave this derivation, and the combination of the two expressions presented here, as an exercise.

Putting everything together, we can normalize the expression, and the final result will be a square, symmetric matrix with 1s on its diagonal. That is, the a_ij entry will be the cosine between the interaction vectors of items i and j.

Let’s take a look at some code.
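The original code embed isn’t reproduced here, so below is a minimal sketch of the item-item model, assuming the binary matrix A from the loading step above and scikit-learn’s cosine_similarity; the top-10 cutoff is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

A_np = A.to_numpy()  # binary user-item matrix, shape (m, n)

# Square, symmetric item-item matrix with 1s on the diagonal:
# entry (i, j) is the cosine between the interaction vectors
# of items i and j.
item_sim = cosine_similarity(A_np.T)  # shape (n, n)

# s(u, i) = sum_j w_ij * r_uj: since r_uj is 0 or 1, this just sums
# the similarities between item i and the items user u interacted with.
scores = A_np @ item_sim  # shape (m, n)

# Hide items the user already interacted with, then take the top 10.
scores = np.where(A_np == 1, -np.inf, scores)
top_10 = np.argsort(-scores, axis=1)[:, :10]
```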

User-item approach

Now that we know how to vectorize data and have a fair grasp of similarity metrics such as the cosine, let’s go deeper into a more sophisticated algorithm, one that actually involves machine learning.

Suppose we have our one-hot encoded data embedded in a matrix A ∈ ℝ^(m × n), where m is the number of users and n the number of items. Now we want a way to relate users to items.

Let us decompose the matrix A into

A ≈ U Vᵀ

where U ∈ ℝ^(m × d) and V ∈ ℝ^(n × d), and d is the size of the desired feature vectors (the embedding dimension).

Note that the matrix A is very sparse, so we can write the decomposition as a weighted objective:

min_{U,V} Σ_{(i,j) ∈ Ω} (a_ij − u_i · v_j)² + ω₀ Σ_{(i,j) ∉ Ω} (u_i · v_j)²

We are minimizing the squared distance between the matrices, where Ω is our observation set, i.e., (i, j) ∈ Ω if a_ij = 1. By introducing the weight ω₀ we are assuming that the information from the ones may be more relevant to us than the information from the zeroes.
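As a sanity check on this objective, here is a small sketch of the loss as code; the helper name weighted_loss and the default ω₀ = 0.1 are illustrative assumptions, not values from the original post.

```python
import numpy as np

def weighted_loss(A, U, V, w0=0.1):
    # Predicted interaction matrix from the factorization A ~ U V^T.
    pred = U @ V.T              # shape (m, n)
    sq_err = (A - pred) ** 2
    observed = A == 1           # the set Omega
    # Observed entries get full weight; unobserved ones are
    # down-weighted by w0 < 1.
    return sq_err[observed].sum() + w0 * sq_err[~observed].sum()
```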

This optimization problem can be solved like any other with Stochastic Gradient Descent (SGD), but this particular quadratic problem can also be solved by a method called Alternating Least Squares (ALS), which fixes one matrix, solves exactly for the other, and alternates. We won’t focus on these details.

There are lots of metrics we can explore for this problem; we’ll focus on MSE here, but serendipity, coverage, and personalization are extensively used as well.

Once we find adequate matrices U and V, we can use cosine similarity to score a user against an item by computing the cosine between the corresponding rows: the rows of U represent users, while the rows of V represent items.

Let’s take a look at some code.
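Again, the original embed isn’t reproduced, so here is a minimal ALS sketch for the weighted objective above; the embedding size d, the ridge term lam (added for numerical stability), and the other hyperparameters are illustrative assumptions.

```python
import numpy as np

def als(A, d=32, w0=0.1, lam=0.05, n_iters=10, seed=0):
    """Alternating Least Squares for the weighted factorization A ~ U V^T."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.normal(scale=0.1, size=(m, d))
    V = rng.normal(scale=0.1, size=(n, d))
    W = np.where(A == 1, 1.0, w0)  # per-entry weights from the objective

    for _ in range(n_iters):
        # Fix V and solve a weighted ridge regression for each user row...
        for u in range(m):
            Wu = W[u][:, None]                    # (n, 1)
            G = (V * Wu).T @ V + lam * np.eye(d)
            U[u] = np.linalg.solve(G, (V * Wu).T @ A[u])
        # ...then fix U and do the same for each item row.
        for i in range(n):
            Wi = W[:, i][:, None]                 # (m, 1)
            G = (U * Wi).T @ U + lam * np.eye(d)
            V[i] = np.linalg.solve(G, (U * Wi).T @ A[:, i])
    return U, V

U, V = als(A_np)

# Rows of U are user embeddings, rows of V are item embeddings;
# normalize them so the dot product below is exactly the cosine.
U_norm = U / np.linalg.norm(U, axis=1, keepdims=True)
V_norm = V / np.linalg.norm(V, axis=1, keepdims=True)
user_item_scores = U_norm @ V_norm.T  # shape (m, n)
```

On the observed entries you can then compute the MSE mentioned above to track how the factorization improves across iterations.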

Conclusion

Recommendation systems are a vast area. We’ve seen two simplified models that deal with two different scenarios; they generalize well, but there is more work to be done.

Recently added data, like a new user or a fresh item without any consumer feedback, can cause some headaches. This is known as the cold-start problem, and there are very clever workarounds for it.

There are many scenarios to explore and improve, and lots of features can be engineered by hand. The personalization options are endless, and that’s the tricky part of recommendation systems: you aim to build a special showcase for each one of your customers.

