Recommendation systems in about 100 lines of code

Stefan Mićić
Jan 25, 2022


In this article, I am going to present four simple movie recommendation system implementations. One uses Singular Value Decomposition (SVD) and the other three use TensorFlow. I will show how easy it is to implement a retrieval system and a ranking system, and how to include feature preprocessing.

Part 1: SVD

If all you have is the user ID, the movie ID, and the rating, collaborative filtering is essentially the only option. Collaborative filtering is a technique whose goal is to find the users or movies most similar to a selected one. To understand this, let's go through an example.

The table below shows the ratings that users gave to movies. The symbol x means that a certain user hasn't seen a certain movie yet.

Movie ratings

In the table above you can see that user_1 and user_2 rate movies in a very similar way, so we can suppose that user_1 will probably rate movie_2 the same as user_2 did. By looking at the rows, we can conclude that movie_2 and movie_3 are usually liked by the same users. So, when we have to recommend movies to a user, we can find the movies most similar to the ones that user liked and recommend them. This is a simple example of collaborative filtering.
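The reasoning above can be sketched in a few lines of plain Python. The ratings here are made up for illustration (they are not the values from the table), `None` plays the role of x, and the helper names are hypothetical. The sketch finds the user most similar to user_1 by cosine similarity over the movies both users have rated, then borrows that neighbour's rating for movie_2.

```python
import math

# Hypothetical ratings (made up for illustration); None stands for "x".
ratings = {
    "user_1": {"movie_1": 5, "movie_2": None, "movie_3": 4, "movie_4": 1},
    "user_2": {"movie_1": 5, "movie_2": 4, "movie_3": 4, "movie_4": 1},
    "user_3": {"movie_1": 1, "movie_2": 2, "movie_3": 1, "movie_4": 5},
}


def cosine_on_common(a, b):
    """Cosine similarity using only the movies both users have rated."""
    common = [m for m in a if a[m] is not None and b.get(m) is not None]
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def most_similar_user(target):
    """Return the other user whose ratings are closest to the target's."""
    others = [u for u in ratings if u != target]
    return max(others, key=lambda u: cosine_on_common(ratings[target], ratings[u]))


neighbour = most_similar_user("user_1")
predicted = ratings[neighbour]["movie_2"]  # borrow the neighbour's rating
print(neighbour, predicted)
```

With real data you would usually average over several neighbours rather than copy a single rating; one neighbour is used here only to keep the idea visible.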

To implement this you need just one line of code: call numpy.linalg.svd with the ratings matrix, and you get three matrices back. For us, the first and the last are the most meaningful. The first represents user embeddings and the last represents movie embeddings. With these, it is easy to find similar movies using cosine similarity or any other metric you want. The code is shown below.
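As a minimal sketch of this step, assuming a small made-up ratings matrix in which missing ratings have already been filled in (how to fill them is a separate choice that the text does not specify), the SVD call and the similarity computation could look like this. The variable names and the choice of `k = 2` latent dimensions are illustrative assumptions.

```python
import numpy as np

# Hypothetical dense ratings matrix (rows: users, columns: movies).
# Assumption: missing ratings were filled in beforehand (e.g. with 0 or a mean).
R = np.array([
    [5.0, 4.0, 4.0, 1.0],
    [5.0, 4.0, 4.0, 1.0],
    [1.0, 2.0, 2.0, 5.0],
    [2.0, 5.0, 5.0, 4.0],
])

# The "one line": U holds user factors, Vt holds movie factors (one column per movie).
U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                      # number of latent dimensions to keep (illustrative)
user_emb = U[:, :k]        # shape (n_users, k)
movie_emb = Vt[:k, :].T    # shape (n_movies, k)


def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between the rows of emb."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


movie_sim = cosine_similarity_matrix(movie_emb)
print(np.round(movie_sim, 3))
```

Note that the movie embeddings come from the columns of the third matrix, which is why it is transposed. In this toy matrix the second and third movies have identical rating columns, so their similarity comes out as 1.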

Part 2: Retrieval

The goal of this part is to create a model that can find a set of movies for a given user. First, we have to prepare the data.

Now that we have the data, let's create a model. The model requires two sub-models, one for user embeddings and one for movie embeddings. When that is done, the only thing left is to define the task we would like to solve. Fortunately, TensorFlow already has a retrieval task implemented (in the TensorFlow Recommenders library), and we can use it as is.

When you have your model, you can run inference using the code below.
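To make the two-sub-model idea concrete, here is a framework-free NumPy sketch rather than the TensorFlow model itself. It assumes that each sub-model is just an embedding table and that the task is trained with an in-batch softmax loss, where every other movie in the batch acts as a negative; that is a common formulation for retrieval, but it is an assumption here, not a statement about the library's internals. The vocabularies, table sizes, and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabularies; in practice they come from the prepared data.
user_ids = ["u1", "u2", "u3"]
movie_titles = ["movie_1", "movie_2", "movie_3", "movie_4"]
user_index = {u: i for i, u in enumerate(user_ids)}
movie_index = {m: i for i, m in enumerate(movie_titles)}

dim = 8  # both sub-models must output vectors of the same size
user_table = rng.normal(scale=0.1, size=(len(user_ids), dim))       # user sub-model
movie_table = rng.normal(scale=0.1, size=(len(movie_titles), dim))  # movie sub-model


def embed(table, index, keys):
    """Look up one embedding row per key."""
    return table[[index[k] for k in keys]]


def in_batch_softmax_loss(user_vecs, movie_vecs):
    """Row i's positive is movie i; the other movies in the batch are negatives."""
    scores = user_vecs @ movie_vecs.T                     # (batch, batch) affinities
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


batch = [("u1", "movie_1"), ("u2", "movie_2"), ("u3", "movie_3")]
users = embed(user_table, user_index, [u for u, _ in batch])
movies = embed(movie_table, movie_index, [m for _, m in batch])
loss = in_batch_softmax_loss(users, movies)
print(loss)
```

Training would adjust both tables to push this loss down, which pulls each user's vector towards the movies that user watched and away from the rest of the batch.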
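Conceptually, retrieval inference scores every candidate movie against the user's embedding and keeps the best few. The sketch below shows that brute-force idea in plain Python; the embeddings are made-up numbers standing in for what a trained model would produce, and `retrieve` is a hypothetical helper, not a library function.

```python
import heapq

# Hypothetical 2-D embeddings; a trained model would supply these.
movie_embeddings = {
    "movie_1": [0.9, 0.1],
    "movie_2": [0.8, 0.3],
    "movie_3": [0.1, 0.9],
    "movie_4": [-0.5, 0.2],
}
user_embedding = [1.0, 0.2]


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec, candidates, k=2):
    """Return the k candidate titles with the highest dot-product score."""
    return heapq.nlargest(k, candidates, key=lambda title: dot(user_vec, candidates[title]))


top_movies = retrieve(user_embedding, movie_embeddings, k=2)
print(top_movies)
```

For large catalogues an approximate nearest-neighbour index usually replaces this exhaustive scan, but the scoring rule stays the same.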

Part 3: Ranking

When you have a batch of movies that would be suitable for a user, it is useful to sort them according to the user's preferences. To do this, you have to change the previous model so that it solves a ranking task instead. In addition, a couple of dense layers should be added, since we expect a single number as the output.
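The shape of such a ranking model can be sketched without any framework: concatenate the user and movie embeddings, pass them through dense layers, and read off one number per pair. The NumPy sketch below uses randomly initialised weights as stand-ins for trained ones, so the resulting order is meaningless; the layer sizes, names, and the use of mean squared error as the loss are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 4

# Hypothetical, randomly initialised parameters standing in for a trained model.
user_emb = {"user_1": rng.normal(size=dim)}
movie_emb = {m: rng.normal(size=dim) for m in ["movie_1", "movie_2", "movie_3"]}
W1, b1 = rng.normal(size=(2 * dim, 16)), np.zeros(16)  # hidden dense layer
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)         # output layer: one number


def predict_rating(user, movie):
    """Concatenate both embeddings and run them through the dense layers."""
    x = np.concatenate([user_emb[user], movie_emb[movie]])
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
    return float((hidden @ W2 + b2)[0])


def mse(predicted, actual):
    """Mean squared error, a typical loss when the target is a rating."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.mean((predicted - actual) ** 2))


def rank(user, candidates):
    """Sort candidate movies from highest to lowest predicted score."""
    return sorted(candidates, key=lambda m: predict_rating(user, m), reverse=True)


ranked = rank("user_1", list(movie_emb))
print(ranked)
```

In a full pipeline the candidates passed to `rank` would be the output of the retrieval step, so the more expensive model only has to score a handful of movies.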

Again, inference is quite easy.

Part 4: Retrieval using feature preprocessing

When rich data about movies or users is available, it is beneficial to include it in the predictions. In this example I am going to show how you can make use of text data, numbers, and dates. The emphasis here will be on the overview, budget, vote_average, and release_date fields. Below you can see the data preparation with this additional detail. The maximum and minimum votes are found so that they can be used for discretization later in the model.
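To illustrate why the minimum and maximum are needed, here is a plain-Python sketch that turns them into evenly spaced bucket boundaries and assigns each vote to a bucket. The vote values, the number of buckets, and the helper names are made up for the example; the actual model may choose its boundaries differently.

```python
import bisect

# Hypothetical vote_average values; the real ones come from the movie metadata.
vote_average = [5.0, 6.2, 7.8, 8.5, 9.0, 6.9]

min_vote, max_vote = min(vote_average), max(vote_average)


def make_boundaries(lo, hi, num_buckets):
    """Evenly spaced inner boundaries that split [lo, hi] into num_buckets bins."""
    step = (hi - lo) / num_buckets
    return [lo + step * i for i in range(1, num_buckets)]


boundaries = make_boundaries(min_vote, max_vote, num_buckets=4)


def bucket_of(value):
    """Index of the bucket a value falls into (0 .. number of boundaries)."""
    return bisect.bisect_right(boundaries, value)


buckets = [bucket_of(v) for v in vote_average]
print(boundaries, buckets)
```

Each bucket index can then be fed to an embedding layer in the same way as a user or movie ID.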

Now it is time for the most interesting part. The different preprocessing pipelines can be seen in the MovieModel. Apart from the standard embedding, which we already saw in the previous parts, there are a couple of new ones. The first is text vectorization followed by embedding and global average pooling. Its purpose is to tokenize the movie overview and use an embedding layer to extract useful information about a movie from its description. Discretization and normalization are also good techniques when dealing with numbers. The last new thing is hashing, which I used in order to make use of dates. An even better solution would be to write our own hashing function that maps dates to a year/month bucket, but this is good enough for the purpose of this article. Once we have all of these embeddings, we concatenate them and pass them through a dense layer to align the output dimension with the output of the UserModel. The code of the models, together with training, is shown below.
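The individual preprocessing ideas can be shown in isolation with the standard library alone. The sketch below is not the MovieModel; it assumes whitespace tokenization, a tiny made-up embedding table, CRC32 as the hash, and a hypothetical `year_month_bucket` helper for the year/month mapping suggested above.

```python
import zlib
from datetime import date
from statistics import mean, pstdev


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def average_pool(tokens, embedding_table, dim):
    """Average the embeddings of known tokens (global average pooling)."""
    vectors = [embedding_table[t] for t in tokens if t in embedding_table]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def normalize(values):
    """Shift to zero mean and scale to unit variance (assumes values differ)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def hash_bucket(text, num_bins):
    """Deterministically map a string, such as a date, to one of num_bins bins."""
    return zlib.crc32(text.encode("utf-8")) % num_bins


def year_month_bucket(d, first_year):
    """Alternative to hashing: one bucket per calendar month since first_year."""
    return (d.year - first_year) * 12 + (d.month - 1)


# Hypothetical inputs for illustration.
table = {"space": [1.0, 0.0], "war": [0.0, 1.0]}
overview_vec = average_pool(tokenize("Space war"), table, dim=2)
budget_norm = normalize([1.0, 2.0, 3.0])
date_bin = hash_bucket("2000-03-15", num_bins=1000)
month_bin = year_month_bucket(date(2000, 3, 15), first_year=1990)
print(overview_vec, budget_norm, date_bin, month_bin)
```

Unlike a generic hash, the year/month mapping keeps neighbouring dates in neighbouring buckets and never collides, at the cost of having to pick a starting year.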

Conclusion

In this article, I demonstrated how easily you can train a model for recommendation. I hope you will find it useful!

Kindly like, comment and share if you liked this article. Your feedback is welcome!
