Movie Recommendation System with Neural Networks and Collaborative Filtering (Explicit Feedback)

Bhawna Paliwal
7 min read · Sep 25, 2020


This article is based on a research project for the Applied AI course. The code and a detailed report are available here.

1. What are Recommendation Systems?

Automated recommendation systems are now used in a variety of fields, from movies and songs to online shopping, where items are recommended to you based on your past activity (ratings, purchased items, etc.) and on other factors such as users whose activity is similar to yours. Let’s look at how the problem of recommending items to someone can be described mathematically so that we can develop an algorithm/model for it.

Fig 1: A Diagrammatic Representation of Movie Rating Matrix

Fig 1 shows a rating matrix in which entry [i][j] is the rating that user ‘i’ (i-th row) has given to movie/item ‘j’ (j-th column) on a scale of 5. Feedback in the form of such explicit ratings is referred to as Explicit Feedback, whereas Implicit Feedback is generally binary and only records whether the user has interacted with the content.

The task is to estimate the ‘?’ marked values in the matrix, i.e. the values that are unknown to us (unobserved). Once estimated, these values can be used to recommend a movie to a user whenever the estimate comes out high, such as a rating of 4 or 5. So the problem is to ‘predict’ the unobserved values in the partially filled rating matrix ‘R’.
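Concretely, the rating matrix with its unobserved entries can be represented like the toy example below (a NumPy sketch, with np.nan standing in for the ‘?’ entries; the values are made up):

    import numpy as np

    # Rows = users, columns = movies; np.nan marks the '?' (unobserved) ratings we want to predict
    R = np.array([
        [5.0, np.nan, 3.0, 1.0],
        [4.0, np.nan, np.nan, 1.0],
        [1.0, 1.0, np.nan, 5.0],
        [np.nan, 1.0, 4.0, 4.0],
    ])
    unobserved = np.argwhere(np.isnan(R))  # (user, movie) pairs whose rating has to be estimated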

2. Traditionally Used Techniques for Recommendation Systems

2.1. Collaborative Filtering

Collaborative filtering is a memory-based technique that predicts a user-item rating on the basis of a neighborhood. Here, the neighborhood of a user refers to users who have rated items similarly and can therefore be considered a similar group of users. The neighborhood can be defined over users as well as over items; below, I describe collaborative filtering over the user neighborhood.

Let ‘Iu’ be the set of items that user ‘u’ has rated. The similarity between two users ‘u’ and ‘v’, Sim(u, v), is computed over the set of items Iu ∩ Iv (the intersection of Iu and Iv). The most commonly used measure of this similarity is the Pearson coefficient, defined below:

Similarity between user u and v using Pearson Coefficient

Here, r_ij is the rating given by user i to item j, and μ_u is the mean rating of user u, used to normalize the ratings. Normalization is done to avoid bias in users’ ratings, since some users rate items very liberally (say, always close to 5) while others rate strictly (say, always close to 3).
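Written out over the co-rated items Iu ∩ Iv, the standard form of this coefficient, consistent with the definitions above, is:

    \mathrm{Sim}(u,v) = \frac{\sum_{j \in I_u \cap I_v} (r_{uj} - \mu_u)(r_{vj} - \mu_v)}{\sqrt{\sum_{j \in I_u \cap I_v} (r_{uj} - \mu_u)^2}\,\sqrt{\sum_{j \in I_u \cap I_v} (r_{vj} - \mu_v)^2}}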

Now, to predict the rating of user ‘i’ for item ‘j’, we take the K users most similar to user ‘i’ according to Pearson similarity. The weighted average of these K users’ ratings for item ‘j’ gives the predicted rating of user ‘i’ for item ‘j’, where the weights are the similarity values: a user who is less similar to user ‘i’ gets less weight in the prediction. This is because the rating prediction for a user who is interested in action movies should not be strongly influenced by a user who prefers comedies; giving such users equal weight might result in wrong predictions.
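As a rough illustration (not the project’s code), such a mean-centered, top-K user-based prediction can be sketched in NumPy; the function below operates on a users × items matrix with np.nan for unobserved entries, like the toy R above:

    import numpy as np

    def predict_rating(R, i, j, K=10):
        """Predict R[i, j] from the K users most similar to user i (Pearson-style sketch)."""
        mu = np.nanmean(R, axis=1)                       # per-user mean rating
        centered = R - mu[:, None]                       # mean-centered ratings (NaNs stay NaN)
        sims = np.zeros(R.shape[0])
        for v in range(R.shape[0]):
            common = ~np.isnan(R[i]) & ~np.isnan(R[v])   # co-rated items, I_i ∩ I_v
            if v == i or not common.any():
                continue
            a, b = centered[i, common], centered[v, common]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            sims[v] = (a @ b) / denom if denom > 0 else 0.0
        rated_j = ~np.isnan(R[:, j])                     # only neighbours who rated item j count
        top = [v for v in np.argsort(-sims) if rated_j[v] and sims[v] > 0][:K]
        if not top:
            return mu[i]                                 # fall back to the user's mean rating
        num = sum(sims[v] * centered[v, j] for v in top)
        den = sum(abs(sims[v]) for v in top)
        return mu[i] + num / den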

2.2. Matrix Factorization

The idea is to determine latent representations of both the row space and the column space simultaneously: transform the m × n rating matrix R into a lower-dimensional space, i.e. find matrices U and P such that R ≈ U·Pᵀ (P transposed). The resulting matrix U is of size m × d with d ≪ n, and the product U·Pᵀ can then be used to predict the ratings of the target user.

Fig 2: Factorization of matrix R, Source: here
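A minimal sketch of how such a factorization could be fitted by stochastic gradient descent on the observed entries only is given below; the latent dimension d, learning rate, regularization, and epoch count are arbitrary choices here, and ALS as in [3] is an equally common alternative:

    import numpy as np

    def factorize(R, d=10, lr=0.01, reg=0.05, epochs=100, seed=0):
        """Fit R ≈ U @ P.T by SGD over the observed (non-NaN) entries only."""
        rng = np.random.default_rng(seed)
        m, n = R.shape
        U = 0.1 * rng.standard_normal((m, d))
        P = 0.1 * rng.standard_normal((n, d))
        observed = list(zip(*np.where(~np.isnan(R))))    # indices of known ratings
        for _ in range(epochs):
            for i, j in observed:
                err = R[i, j] - U[i] @ P[j]              # prediction error on one known rating
                U[i] += lr * (err * P[j] - reg * U[i])   # gradient steps with L2 regularization
                P[j] += lr * (err * U[i] - reg * P[j])
        return U, P                                      # full prediction matrix: U @ P.T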

3. Neural Networks for Recommender Systems

Deep neural networks have achieved great success in a variety of prediction and classification tasks. Our work aimed to explore the use of neural networks for recommendation tasks. We first built a basic neural network for the task of predicting the unobserved movie ratings on our dataset, described in section 4. The model is shown in Fig 3 below and is referred to as Approach 1 in this article and in our report.

Fig 3: Architecture of Model for Predicting Ratings from Keras User and Movie Embeddings

The model comprises an embedding layer of dimension 50 each for users and for movies. The embedding layers are then flattened, and a dropout layer with rate 0.2 is applied to both the user and the movie embeddings. We then take a matrix (dot) product, flatten it, and follow it with a fully connected layer, training with a pointwise RMSE loss and the Adam optimizer. This approach gave us a mean absolute error of 0.94, far better than what we achieved through traditional matrix factorization and collaborative filtering, and it motivated us to use neural networks to model the interaction between user and item (movie) latent factors.
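A minimal Keras sketch of this kind of architecture is given below; the embedding size and dropout rate follow the description above, while the width of the fully connected layer and the single linear output are assumptions:

    from tensorflow.keras import layers, Model, optimizers

    def build_approach1(n_users, n_movies, emb_dim=50):
        """Dot product of user/movie embeddings followed by a dense layer (sketch of Approach 1)."""
        user_in = layers.Input(shape=(1,), name="user_id")
        movie_in = layers.Input(shape=(1,), name="movie_id")
        u = layers.Dropout(0.2)(layers.Flatten()(layers.Embedding(n_users, emb_dim)(user_in)))
        m = layers.Dropout(0.2)(layers.Flatten()(layers.Embedding(n_movies, emb_dim)(movie_in)))
        x = layers.Dot(axes=1)([u, m])                  # interaction of user and movie factors
        x = layers.Dense(32, activation="relu")(x)      # fully connected layer (width assumed)
        out = layers.Dense(1)(x)                        # predicted rating
        model = Model([user_in, movie_in], out)
        # Minimizing MSE is equivalent to minimizing RMSE; MAE is tracked as the reported metric
        model.compile(loss="mse", optimizer=optimizers.Adam(), metrics=["mae"])
        return model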

Deriving the idea from the approach suggested in Neural Collaborative Filtering [2], we developed an architecture that combines matrix factorization with a multi-layer perceptron. The layers obtained from both parts of the model (the matrix factorization over the latent factors and the neural network over the user and item embeddings) are concatenated, and this concatenated layer is followed by an output layer with a ReLU activation function. This model is referred to as Approach 2 in this article and in our report.

Fig 4: Approach 2 Model with Neural Network built over Latent Matrix Factors and User and Item Embeddings

With a learning rate of 1e-4 and parameter K = 10, the model gave an MAE of 0.818. The multi-layer perceptron part comprises layers of sizes 64, 32, 16, and 8. The mean absolute error achieved through this approach (0.818) is a further improvement over the earlier approach, which did not incorporate the latent matrix factors.
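An illustrative Keras sketch of such a two-branch model, in the spirit of [2], is shown below; the element-wise product in the matrix-factorization branch and the exact way the branches are fused are assumptions based on that paper rather than the precise wiring in our report:

    from tensorflow.keras import layers, Model, optimizers

    def build_approach2(n_users, n_movies, k=10, mlp_layers=(64, 32, 16, 8)):
        """Matrix-factorization branch + MLP branch, concatenated (sketch of Approach 2)."""
        user_in = layers.Input(shape=(1,), name="user_id")
        movie_in = layers.Input(shape=(1,), name="movie_id")

        # Matrix-factorization branch: element-wise product of k-dimensional latent factors
        mf_u = layers.Flatten()(layers.Embedding(n_users, k)(user_in))
        mf_m = layers.Flatten()(layers.Embedding(n_movies, k)(movie_in))
        mf_vec = layers.Multiply()([mf_u, mf_m])

        # MLP branch over separate user and movie embeddings
        mlp_u = layers.Flatten()(layers.Embedding(n_users, k)(user_in))
        mlp_m = layers.Flatten()(layers.Embedding(n_movies, k)(movie_in))
        x = layers.Concatenate()([mlp_u, mlp_m])
        for width in mlp_layers:                        # 64 -> 32 -> 16 -> 8
            x = layers.Dense(width, activation="relu")(x)

        merged = layers.Concatenate()([mf_vec, x])      # fuse the two branches
        out = layers.Dense(1, activation="relu")(merged)  # output layer with ReLU, as described
        model = Model([user_in, movie_in], out)
        model.compile(loss="mse", optimizer=optimizers.Adam(learning_rate=1e-4), metrics=["mae"])
        return model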

4. Dataset Used

The classic MovieLens dataset has been used for predicting ratings. The numbers of movies and users in the dataset are 9745 and 600 respectively. The numbers of ratings with values 5, 4, 3, 2, and 1 are quite unevenly distributed, as shown in Fig 5 below. The test set comprises 20,168 ratings to be predicted for the 600 users over a given set of movies. As in any real-world recommendation dataset, a key difficulty is that ratings must also be predicted for users who have rated very sparsely, since the test set requires predictions for all users. The dataset that we have used is available here.

Fig 5: Distribution of Movie Ratings in our MovieLens Dataset.
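For readers who want to reproduce the setup, a typical way to load a MovieLens ratings file and build contiguous index columns for the embedding layers looks like this; the file name, split fraction, and column handling are assumptions rather than the exact preprocessing from our report:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # The standard MovieLens layout is assumed: ratings.csv with userId, movieId, rating, timestamp
    ratings = pd.read_csv("ratings.csv")
    ratings["user"] = ratings["userId"].astype("category").cat.codes    # contiguous user indices
    ratings["movie"] = ratings["movieId"].astype("category").cat.codes  # contiguous movie indices
    train, test = train_test_split(ratings, test_size=0.2, random_state=42)
    print(ratings["rating"].value_counts().sort_index())                # rating distribution (cf. Fig 5)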

5. Results and Comparisons

We compare our results on the MovieLens dataset with those obtained using the FastAI library, both for plain collaborative filtering and for collaborative filtering with neural networks.

CollabDataBunch was used to prepare the input rating data, and then collab_learner was used with a learning rate of 2e-3. The library function gave a mean absolute error of 0.8430.
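A typical fastai (v1) invocation for this baseline looks roughly like the sketch below, reusing the ratings frame from the loading snippet above; n_factors, y_range, and the number of epochs are assumptions:

    from fastai.collab import CollabDataBunch, collab_learner

    data = CollabDataBunch.from_df(ratings, seed=42, valid_pct=0.2,
                                   user_name="userId", item_name="movieId", rating_name="rating")
    learn = collab_learner(data, n_factors=50, y_range=(0.5, 5.5))  # embedding dot-bias model
    learn.fit_one_cycle(5, 2e-3)                                    # learning rate 2e-3, as in the text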

We also report the mean absolute error obtained from the FastAI library function that employs neural networks for collaborative filtering.

With a factor dimension and embedding size of 10, further layers of sizes 256, 128, and 64 were used, which gave a mean absolute error of 0.8241.
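The neural-network variant can be obtained from the same data object by switching collab_learner into its MLP mode; the embedding-size dictionary keys follow the column names used above and, like the epoch count, are assumptions:

    learn_nn = collab_learner(data, use_nn=True,
                              emb_szs={"userId": 10, "movieId": 10},   # factor/embedding size 10
                              layers=[256, 128, 64], y_range=(0.5, 5.5))
    learn_nn.fit_one_cycle(5, 2e-3)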

Fig 6: Results and Comparisons of Mean Absolute Error (lower is better)

6. Possible Future Extensions

Approach 2 has some issues that arise with any multi-class classification. Given that we used ReLU in the final output layer, the output of the model is continuous. However, multi-class classification has been shown to work better in many cases with one-vs-all techniques, i.e. using one model with a sigmoid output for each possible rating and then choosing the rating with the maximum confidence. Another problem arises from class imbalance, which may be dealt with by using a weighted loss function so that the model does not ignore the rarer classes while minimizing the objective function.
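As one way to address the imbalance, Keras accepts per-class weights during training when the ratings are treated as five classes; the sketch below derives inverse-frequency weights from the train frame defined earlier, and the commented-out model head is only indicative:

    # Treat ratings 1..5 as classes 0..4 and weight them inversely to their frequency (illustrative)
    counts = train["rating"].round().clip(1, 5).astype(int).value_counts().sort_index()
    class_weight = {c - 1: len(train) / (5 * n) for c, n in counts.items()}
    # A classification head would end in Dense(5, activation="softmax") instead of a single ReLU unit:
    # model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    # model.fit([train["user"], train["movie"]], train["rating"].round().clip(1, 5).astype(int) - 1,
    #           class_weight=class_weight, epochs=5)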

7. References

[1] Bhawna Paliwal, Abhineet Pandey, Shashi Shekhar Jha: Collaborative Filtering with Neural Networks (project code and report). https://github.com/bhawnapaliwal/Collaborative-Filtering-with-Neural-Network

[2] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua: Neural Collaborative Filtering. In: Proc. of WWW 2017.

[3] Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, Rong Pan: Large-Scale Parallel Collaborative Filtering for the Netflix Prize.

[4] Linden, G., Smith, B., York, J.: Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing 7, 76–80 (2003)

[5] Das, A., Datar, M., Garg, A., Rajaram, S.: Google news personalization: Scalable online collaborative filtering. In: Proc. of WWW 2007, pp. 271–280 (2007)


Bhawna Paliwal

Research Engineer at Microsoft Research, CSE graduate from IIT Ropar. 👩‍💻