How Does the Funk Singular Value Decomposition Algorithm Work in Recommendation Engines?

I will begin this blog post by asking a simple question:

What if you would like to get a sense of how well your recommendation engine works before deploying it and using it in a real situation?

To answer this question, we need a machine learning technique that has become extremely popular: Singular Value Decomposition, or SVD.

SVD allows us to predict a rating for every user-item pair. You might ask yourself: why do we need to predict ratings? We just want to make recommendations. In fact, if we can predict ratings with low error, we can use those predictions to find the items with the highest predicted ratings. This is where SVD is useful, because it lets us use regression metrics like MSE or MAE to assess performance, and thus understand how good our recommendations are before deploying them to customers.
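The two metrics mentioned above are straightforward to compute. Here is a minimal sketch, with illustrative held-out ratings and predictions (the numbers are made up for the example):

```python
import numpy as np

# Hypothetical held-out ratings vs. the engine's predictions for the same user-item pairs.
actual    = np.array([9.0, 2.0, 7.0, 5.0])
predicted = np.array([8.5, 3.0, 6.0, 5.5])

mse = np.mean((actual - predicted) ** 2)   # mean squared error: penalizes large misses more
mae = np.mean(np.abs(actual - predicted))  # mean absolute error: average miss in rating units

print(mse, mae)  # 0.625 0.75
```

MAE stays in the original rating units, which makes it easier to explain to stakeholders; MSE punishes large individual errors more heavily.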

When performing SVD, we create a matrix of users by items, filled with the users' ratings (or any other evaluation of the items). This matrix has no specific information about the users or the items, just their respective IDs and the ratings associated with them. Applying SVD to this matrix allows us to find latent features relating the items and the users.
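Building such a user-item matrix from a ratings log is a one-liner with pandas. A small sketch, with hypothetical IDs and ratings:

```python
import pandas as pd

# Hypothetical ratings log: one row per (user, item, rating) event.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "item_id": [10, 20, 10, 20, 30],
    "rating":  [9, 2, 7, 8, 5],
})

# Pivot into a users-by-items matrix; pairs that were never rated become NaN.
user_item = ratings.pivot_table(index="user_id", columns="item_id", values="rating")
print(user_item)
```

Note how the NaN entries appear immediately, even in this tiny example; they are the sparsity problem discussed below.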

What is a latent factor?

A latent factor is not observed in our data; rather, we infer it from the values (ratings) the users give to the items. Let's say a user gives a score of 9/10 to two movies about AI and a score of 2/10 to two movies about animals. There is a relation between the scores and the content of the movies (AI and animals). But "AI" and "animals" are not present anywhere in our data; such features are called latent features.

In real life, SVD doesn’t work so well on data. Why?

Because SVD achieves great results only on non-sparse data, that is, data with no missing values. But how many of us give a score at the end of each movie or purchase? Hmm, not a lot.

In fact, a real-life dataset for a recommendation engine can be very sparse: you can find user-item matrices with more than 95% NaN values. That's where Funk SVD comes in.
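Measuring that sparsity is a quick sanity check before choosing an algorithm. A sketch on a small hypothetical matrix:

```python
import numpy as np

# Hypothetical user-item matrix with NaN for missing ratings.
user_item = np.array([
    [9.0, np.nan, np.nan, 2.0],
    [np.nan, np.nan, 7.0, np.nan],
    [np.nan, 5.0, np.nan, np.nan],
])

# Fraction of entries that are missing: 8 NaN out of 12 -> ~67%.
sparsity = np.isnan(user_item).mean()
print(f"{sparsity:.0%} of the matrix is missing")
```

On real datasets this number is routinely above 95%, which is exactly why classical SVD breaks down.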

How does the Funk SVD algorithm work?

Funk SVD ignores these missing values and computes the latent factors using only the values we know. To achieve this matrix factorization with Funk SVD, these are the steps to follow:

1. Build two matrices, U and V^T: respectively, a matrix of users by the number of latent factors chosen, and a matrix of these same latent factors by items. Fill both matrices with random numbers.

At this step, we have 3 matrices:

The User-Item matrix

The U matrix (User by latent factors filled with random values)

The V^T matrix (Latent factors by Items filled with random values)

2. Search the User-Item matrix for an existing rating for one user-item pair. The first rating found is 9, given by user 1 to item 3. So 9 is our actual value, the true value.

3. In the U matrix, we take all the random values associated with user 1 (its row). For this user we have [0.8, 1.2, -0.2].

Remember, the 9 was given by user 1 to item 3. In the V^T matrix, we also take all the random values associated with item 3 (its column). For this item we have [-0.2, 0.1, 0.14]^T.

4. We compute the dot product between the row and the column found in order to make our prediction: (0.8 × -0.2) + (1.2 × 0.1) + (-0.2 × 0.14) = -0.068 ≈ -0.07.

At this point, we have: Actual = 9; Predicted = -0.07.

So the squared error is (9 - (-0.07))² = (9.07)² ≈ 82.26.
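Steps 2-4 can be reproduced in a few lines, using the example's initial random values. (The article rounds the prediction to -0.07 before squaring, so its 82.26 differs slightly from the unrounded 82.23 below.)

```python
import numpy as np

actual = 9.0                        # known rating: user 1 -> item 3
u = np.array([0.8, 1.2, -0.2])      # row of U for user 1 (random initialization)
v = np.array([-0.2, 0.1, 0.14])     # column of V^T for item 3 (random initialization)

predicted = u @ v                   # dot product of the two factor vectors
error = (actual - predicted) ** 2   # squared error for this one known rating

print(round(predicted, 3), round(error, 2))  # -0.068 82.23
```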

5. Minimize the error using gradient descent.

Formula: U(i) ← U(i) + ⍺ · 2(actual − predicted) · V(i), and symmetrically V(i) ← V(i) + ⍺ · 2(actual − predicted) · U(i), where U(i) is a random value from the U matrix row, V(i) the corresponding random value from the V^T matrix column, and ⍺ the learning rate.

To recap, we now update the values taken from U and V^T that produced this prediction.

Update 0.8 using Gradient descent and ⍺=0.1:

New value = 0.8 + 0.1 × 2 × (9 − (−0.07)) × (−0.2) ≈ 0.44

By updating all the values from U the same way, we obtain 0.44, 1.38, and 0.053.

We can then update the values of V^T with the same formula. Notice that the newly updated values of U already impact the updates we compute for V^T.
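The step-5 updates can be sketched numerically as follows. This version uses the exact, unrounded error of 9.068, so the results differ very slightly from the rounded hand calculations above:

```python
import numpy as np

alpha = 0.1                        # learning rate from the example
actual = 9.0
u = np.array([0.8, 1.2, -0.2])     # user 1's factors before the update
v = np.array([-0.2, 0.1, 0.14])    # item 3's factors before the update

err = actual - u @ v               # 9 - (-0.068) = 9.068

u_new = u + alpha * 2 * err * v    # update U's row using the OLD V^T column
v_new = v + alpha * 2 * err * u_new  # then update V^T using the already-updated U row

# u_new ≈ [0.437, 1.381, 0.054], matching the rounded 0.44, 1.38, 0.053 above.
print(np.round(u_new, 3))
```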

6. Replace the updated values in the U and V^T matrices.

And that closes one iteration.
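Putting all six steps together, a minimal Funk SVD training loop might look like the sketch below. The function name and hyperparameters are illustrative, not a reference implementation; real libraries add regularization, bias terms, and early stopping.

```python
import numpy as np

def funk_svd(R, k=2, alpha=0.01, epochs=500, seed=0):
    """Factorize R (NaN marks missing ratings) into U @ Vt via SGD on known entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))   # step 1: random init
    Vt = rng.normal(scale=0.1, size=(k, n_items))
    known = np.argwhere(~np.isnan(R))              # step 2: only observed ratings
    for _ in range(epochs):
        for i, j in known:
            err = R[i, j] - U[i] @ Vt[:, j]        # steps 3-4: actual - predicted
            U[i] += alpha * 2 * err * Vt[:, j]     # step 5: gradient updates...
            Vt[:, j] += alpha * 2 * err * U[i]     # ...V^T sees the updated U row
    return U, Vt                                   # step 6: updates were in place

# Tiny hypothetical example: 4 users x 4 items, half the ratings missing.
R = np.array([
    [9.0, np.nan, 8.0, np.nan],
    [np.nan, 2.0, np.nan, 3.0],
    [8.0, 1.0, np.nan, np.nan],
    [np.nan, np.nan, 7.0, 2.0],
])
U, Vt = funk_svd(R)
pred = U @ Vt                                  # now every user-item pair has a prediction
mse = np.nanmean((R - pred) ** 2)              # error measured on known entries only
print(round(mse, 4))
```

Note that `pred` fills in the NaN cells too; those filled-in values are exactly the predicted ratings we would rank to make recommendations.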

Conclusion

Funk SVD gives us a way to evaluate our recommendation engine and to build good U and V^T matrices for making recommendations, even from a very sparse matrix. But Funk SVD shouldn't be used alone, because we can face a common problem in recommendation engines called the "Cold Start Problem": we can't make recommendations for new users or new movies, since they have no ratings yet. A good approach is to combine Funk SVD with a simpler method like a rank-based or content-based algorithm. You can get an idea of how to implement all of these techniques in Python by downloading my recommendation engine module on GitHub.