Recommend Amazon Movie — A Collaborative Approach

Meimi Li
Analytics Vidhya
Published in
4 min readAug 23, 2020

Item-based collaborative and User-based collaborative approach for recommendation system with simple coding.

http://wildelori.blogspot.com/2011/07/seen-any-good-movies-lately.html

Overview of Recommendation System

There are many methods of recommendation system where each of them serve for different purposes. My previous article is talking about the simple and content-based recommendation. These recommendations are non-personalised recommenders, but that doesn’t mean they are less useful when compare to the other. These method are very popular for recommending top music of the week and recommending music of similar genre.

In this article, it will focus on collaborative filtering method. This method considers your taste in comparison to people/items that are in similar. Then, it recommends a list of items based on consumption similarity and suggest what you probably interested. These method only focus on calculating the rating.

There are two main filtering for this method: item-based filtering and user-based filtering. Item-based filtering will suggest items that are similar to what you have already liked. User-based filtering will suggest items that people similar to you have liked but you have not yet consumed.

With the Amazon movie data, we will apply item-based filtering and user-based filtering recommendation methods to analyze similar items to be recommend and identify users that have similar taste.

Analysis Overview

For both item-based filtering and user-based filtering recommendation, we need to clean data and prepare them into matrix so that it can be used for analysis. All ratings need to be in numbers and normalized and cosine similarity will be used to calculate items/users similarity.

Data Overview

There are 4,848 users with a total of 206 movies in the dataset.

Implementation

Now, lets import all tools that we are going to use for the analysis, put data into DataFrame, and clean them.

import pandas as pd
import numpy as np
import scipy as sp
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity
amazon = pd.read_csv('.../Amazon.csv')
amazon.info()
amazon.head()

Then, we need to rearrange data into matrix format where we will set index for the rows as user_id and index for the column as name.

amazon = amazon.melt(id_vars=['user_id'], var_name='name', value_name='rating')
amazon_pivot = amazon.pivot_table(index=['user_id'], columns=['name'], values='rating')
amazon_pivot.head()

From here, we need to normalized the rating values so that value range are closer to one and another. Then, turn the NaN values into 0 and select only those users who at least rate one movie.

amazon_normalized = amazon_pivot.apply(lambda x: (x-np.min(x))/(np.max(x)-np.min(x)), axis=1)amazon_normalized.fillna(0, inplace=True)
amazon_normalized = amazon_normalized.T
amazon_normalized = amazon_normalized.loc[:, (amazon_normalized !=0).any(axis=0)]

We nearly there. Now we need to put them into sparse matrix.

amazon_sparse = sp.sparse.csr_matrix(amazon_normalized.values)

Lets look at item-based filtering recommendation.

item_similarity = cosine_similarity(amazon_sparse)
item_sim_df = pd.DataFrame(item_similarity, index=amazon_normalized.index, columns=amazon_normalized.index)
item_sim_df.head()

All the columns and rows are now become each of the movie and it is ready for the recommendation calculation.

def top_movie(movie_name):
for item in item_sim_df.sort_values(by = movie_name, ascending = False).index[1:11]:
print('Similar movie:{}'.format(item))
top_movie("Movie102")

These are the movies that are similar to Movie102.

Lets look at user-based filtering recommendation. Who has similar taste to me?

user_similarity = cosine_similarity(amazon_sparse.T)
user_sim_df = pd.DataFrame(user_similarity, index = amazon_normalized.columns, columns = amazon_normalized.columns)
user_sim_df.head()
def top_users(user):  
sim_values = user_sim_df.sort_values(by=user, ascending=False).loc[:,user].tolist()[1:11]
sim_users = user_sim_df.sort_values(by=user, ascending=False).index[1:11]
zipped = zip(sim_users, sim_values)
for user, sim in zipped:
print('User #{0}, Similarity value: {1:.2f}'.format(user, sim))
top_users('A140XH16IKR4B0')

These are the examples on how to implement the item-based and user-based filtering recommendation system. Some of the code are from https://www.kaggle.com/ajmichelutti/collaborative-filtering-on-anime-data

Hope that you enjoy!

--

--

Meimi Li
Analytics Vidhya

A graduate from business intelligence major who fascinate about Machines learning and Artificial Intelligence.