More than a bunch of stars

RecSys Week 3: A recommender system that builds a characterization of the users and the items.

Sebastian Amenabar
Aug 28, 2017 · 4 min read

Things usually aren’t just black or white, good or bad; they have shades and characteristics that become advantages or disadvantages depending on the context. For example, a movie featuring Brad Pitt shooting zombies certainly won’t look good in the eyes of a romance lover, but it may to an action fan.

Taking this into account when generating recommendations for a person can make them far more accurate. But when looking at the data, how can we easily work with characteristics such as comedy or horror, and turn them into mathematical calculations that automate the process? Maybe we could assign a score representing how much sci-fi a movie contains, like a fraction between 0 and 1. But then another problem arises: how do we estimate these values?

The authors describe a method that does this more or less automatically. Imagine that each movie is represented by a vector, and each component represents a genre. For Interstellar, for example, the vector could be [0.8, 0.7], representing sci-fi and drama, in that order. Now, to decide how much I’ll like the movie, I have a vector of my own, say [0.9, 0.6], which means I really like sci-fi and enjoy dramas a bit less. Finally, to get a score, we take the dot product of the two vectors: 0.9*0.8 + 0.6*0.7 = 1.14. The magnitude of the result is not important for this example; it would have to be rescaled to match a rating scale.
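To make the arithmetic concrete, here is a tiny Python sketch of that dot-product scoring. The two-factor vectors and the rescaling onto a 1–5 star range are made up for illustration, not taken from the paper:

```python
import numpy as np

# Made-up two-factor vectors (sci-fi, drama) from the example above.
interstellar = np.array([0.8, 0.7])   # how much of each genre the movie contains
me           = np.array([0.9, 0.6])   # how much I like each genre

raw_score = me @ interstellar          # 0.9*0.8 + 0.6*0.7 = 1.14

# The raw dot product isn't on a rating scale; one simple (made-up) rescaling
# maps [0, 2] (the maximum possible here) onto 1-5 stars.
stars = 1 + 4 * raw_score / 2.0
print(round(raw_score, 2), round(stars, 2))   # 1.14  3.28
```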

With the example above, the problem has just doubled: now we also need a descriptor for the user. But the method handles both at once. We iteratively adjust the values in both sets of vectors, for the users and the movies (items in general), trying to minimize the difference between the real rating the user gave to the movie and the one predicted with the learned vectors.
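A minimal sketch of that idea, learning both sets of vectors by stochastic gradient descent on the observed ratings, could look like the toy code below. The data, sizes and hyperparameters are invented, and this is only one of the fitting strategies the paper discusses, not the authors’ full model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2            # toy sizes; k latent factors per vector
ratings = [(0, 1, 4.0), (0, 3, 2.0),      # (user, item, observed rating) triples
           (1, 0, 5.0), (2, 2, 3.0),
           (3, 4, 1.0), (3, 1, 4.5)]

P = 0.1 * rng.standard_normal((n_users, k))   # user descriptors
Q = 0.1 * rng.standard_normal((n_items, k))   # item descriptors
lr, reg = 0.05, 0.02                          # learning rate and regularization

for epoch in range(500):
    for u, i, r in ratings:
        pu, qi = P[u].copy(), Q[i].copy()
        err = r - pu @ qi                      # real rating minus predicted rating
        P[u] += lr * (err * qi - reg * pu)     # nudge both vectors to shrink the error
        Q[i] += lr * (err * pu - reg * qi)

print(round(P[0] @ Q[1], 2))   # should now be close to the observed 4.0
```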

It is worth noting that the authors include other parameters that can produce a significant improvement in the accuracy of the predictions, but, to keep the explanation simple, I’ll just describe what they represent rather than how they are implemented in the method.

First, there is a regularization parameter to fight overfitting: it penalizes the model when it becomes overly specific to the training ratings. There are also item (movies in this case) and user biases, capturing the fact that some items usually receive good ratings, or that some users tend to give bad ones. And since all of the above can change over time, making these terms time-varying can also improve performance.
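Written out, my reading of the paper’s prediction with biases and its regularized objective is roughly the following, where μ is the overall average rating, b_u and b_i the user and item biases, p_u and q_i the user and item vectors, and λ the regularization weight:

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u
\qquad
\min_{p, q, b} \sum_{(u,i) \in \mathcal{K}}
  \bigl(r_{ui} - \hat{r}_{ui}\bigr)^2
  + \lambda \bigl(\lVert p_u \rVert^2 + \lVert q_i \rVert^2 + b_u^2 + b_i^2\bigr)
```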

Finally, and what I consider the most important part of the paper, is the possibility of including implicit feedback in the calculations. These are signals like the users’ searches or clicks, which may reflect their interests and usually aren’t as sparse as the rating matrix, since they don’t require any active rating from the users. These methods of recommendation are becoming mainstream: with Amazon.com recommending based on just the items you looked at, or Facebook considering your searches for its ads, a far more personalized experience is being built.
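For reference, my understanding of the SVD++ variant mentioned in the paper is that the user vector is augmented with extra factors y_j for the items in N(u), the set of items the user has shown implicit interest in, roughly:

```latex
\hat{r}_{ui} = \mu + b_u + b_i +
  q_i^{\top}\Bigl(p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j\Bigr)
```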

This also helps with the cold-start problem: even if it isn’t as accurate as real ratings, it makes it possible to generate recommendations for new users who may take a while to give their first ratings, or for users reluctant to rate at all. And I think another use, not stated explicitly but related to the temporal dynamics, is the possibility of manually modifying the descriptor of each movie. This could help recommend novel items: even though the predicted rating may have a large error, the item can still appear in the recommendation ranking for users.

Lastly, I miss a specification of the memory and computational cost of this method. Using it at serving time must be extremely fast, since it’s only the dot product of two vectors, but how expensive is it to build those vectors? There must be one per user and per item; some clustering techniques might help, but it certainly sounds like a heavy computation, with a large number of iterations over every vector. And since all the descriptors are learned jointly, how can we quickly estimate the vector for a new product or user without having to compute everything again?

Probably, when the authors developed the method, they did not anticipate the huge amounts of data current recommender systems would need to handle, so this problem was not that significant then, but it is something that must be dealt with now.


Thank you for reading; I’ll gladly take any correction, comment or related reading. This post is a commentary on the paper Matrix Factorization Techniques for Recommender Systems (2009), written by Yehuda Koren, Robert Bell and Chris Volinsky. It is a weekly post for the Recommender Systems course (IIC3633) at the Pontifical Catholic University of Chile.
