Day 48 of 100DaysofML

Charan Soneji
Published in 100DaysofMLcode
5 min read · Aug 3, 2020

Collaborative filtering. I mentioned recommendation engines in my previous blogs, so I thought I'd cover one of their main variants, collaborative filtering. For those reading my blog for the first time, I'll try to cover the essentials.

Recommender systems aim to predict users' interests and recommend items that are likely to interest them. They are among the most powerful machine learning systems that online retailers implement in order to drive sales. The data required for recommender systems stems from explicit user ratings after watching a movie or listening to a song, from implicit signals such as search engine queries and purchase histories, or from other knowledge about the users or items themselves. Sites like Spotify, YouTube or Netflix use that data to suggest playlists, so-called Daily Mixes, or to make video recommendations, respectively.

How the collaborative filtering recommender system works (algorithm)

Collaborative methods for recommender systems are methods that are based solely on the past interactions recorded between users and items in order to produce new recommendations. These interactions are stored in the so-called “user-item interactions matrix”. Then, the main idea that rules collaborative methods is that these past user-item interactions are sufficient to detect similar users and/or similar items and make predictions based on these estimated proximities.
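For instance, a tiny user-item interaction matrix (all names and ratings below are made up for illustration) could look like this:

```python
# A toy user-item interaction matrix (made-up ratings; 0.0 means "not rated").
# Rows are users, columns are items.
ratings_matrix = {
    "u1": {"Toy Story": 5.0, "Jumanji": 3.0, "Heat": 0.0},
    "u2": {"Toy Story": 4.0, "Jumanji": 0.0, "Heat": 2.0},
    "u3": {"Toy Story": 0.0, "Jumanji": 1.0, "Heat": 5.0},
}

# Collaborative methods only look at these numbers; no item metadata is needed.
print(ratings_matrix["u1"]["Toy Story"])  # 5.0
```

Everything a collaborative method knows about users and items is contained in this matrix, which is why no genre, cast or other content information is required.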

The class of collaborative filtering algorithms is divided into two sub-categories, generally called memory-based and model-based approaches. Memory-based approaches work directly with the values of the recorded interactions, assuming no model, and are essentially based on nearest-neighbour search (for example, find the users closest to a user of interest and suggest the most popular items among these neighbours). Model-based approaches assume an underlying "generative" model that explains the user-item interactions and try to discover it in order to make new predictions.

The more users interact with items, the more accurate new recommendations become: for a fixed set of users and items, new interactions recorded over time bring new information and make the system more and more effective. However, since it only considers past interactions, collaborative filtering suffers from the "cold start problem": it is impossible to recommend anything to new users or to recommend a new item to any user, and many users or items have too few interactions to be handled effectively.
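A minimal memory-based sketch of this idea, assuming cosine similarity between users and a similarity-weighted average as the prediction (toy data, plain Python, not a production implementation):

```python
from math import sqrt

# Toy data: {user: {item: rating}}; all names and values are made up.
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "bob":   {"m1": 4.0, "m2": 3.0, "m4": 5.0},
    "carol": {"m1": 1.0, "m3": 2.0, "m4": 4.0},
}

def cosine_sim(a, b):
    """Cosine similarity over the items two users both rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[i] * b[i] for i in common)
    den = sqrt(sum(a[i] ** 2 for i in common)) * sqrt(sum(b[i] ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item):
    """Similarity-weighted average of the neighbours' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        s = cosine_sim(ratings[user], their)
        num += s * their[item]
        den += abs(s)
    return num / den if den else None

print(round(predict("alice", "m4"), 2))  # 4.52
```

Alice has not rated m4, so the prediction is built purely from bob's and carol's ratings of m4, weighted by how similar their rating histories are to alice's. This is exactly the nearest-neighbour flavour of memory-based collaborative filtering described above.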

Let's dive right into the implementation.

Let's start by importing the required libraries.

#Importing the libraries we need
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate
import pandas as pd

#Reader is used to parse a file or dataframe containing ratings
reader = Reader()

The next step is to import the dataset.

The link to the dataset is given below.

I'd suggest downloading the smaller ratings file if you have time or space constraints.

#Importing dataset
ratings = pd.read_csv('ratings.csv')
ratings.head()
Output of ratings.head()

The next step is to load the data from the pandas dataframe into surprise.

#Loading the data from the dataframe created in pandas into a surprise Dataset
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

We make use of the SVD algorithm from the surprise library.

#We build the model by directly using SVD from surprise
algorithm = SVD()
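Under the hood, SVD here refers to the matrix factorization approach popularized during the Netflix Prize ("Funk SVD"): latent factor vectors are learned for every user and item so that their dot product approximates the observed ratings. A rough, pure-Python sketch of that idea on made-up data (this is only the concept, not surprise's actual implementation):

```python
import random

random.seed(0)

# Toy (user, item, rating) triples; all ids and ratings are made up.
interactions = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
                (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2          # k = number of latent factors

# Small random latent factor vectors for users (P) and items (Q)
P = [[random.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
Q = [[random.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]

def pred(u, i):
    # Predicted rating = dot product of the user and item factor vectors
    return sum(P[u][f] * Q[i][f] for f in range(k))

def sse():
    # Sum of squared errors over the observed interactions
    return sum((r - pred(u, i)) ** 2 for u, i, r in interactions)

err_before = sse()
lr, reg = 0.01, 0.02                   # learning rate and regularization
for _ in range(2000):                  # SGD epochs
    for u, i, r in interactions:
        e = r - pred(u, i)             # prediction error
        for f in range(k):             # gradient step on both factor vectors
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (e * qi - reg * pu)
            Q[i][f] += lr * (e * pu - reg * qi)
err_after = sse()

print(err_after < err_before)  # True
```

After training, the learned factors explain the observed ratings far better than the random initialization, and predictions for unobserved user-item pairs fall out of the same dot product.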

For the given dataset, we evaluate metrics such as the root mean square error (RMSE) and the mean absolute error (MAE), along with fit time and test time.

cross_validate(algorithm, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
Cross-validation output (RMSE and MAE per fold)

This table mainly reports the root mean square error (and MAE) across the five folds, which tells us how our model is doing.
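As a reminder of what these metrics mean, RMSE and MAE can easily be computed by hand; the true and predicted ratings below are made up:

```python
from math import sqrt

# Made-up true vs. predicted ratings for five user-item pairs.
truth = [4.0, 3.5, 5.0, 2.0, 4.5]
preds = [3.8, 3.0, 4.6, 2.5, 4.4]

errors = [t - p for t, p in zip(truth, preds)]
mae = sum(abs(e) for e in errors) / len(errors)          # mean absolute error
rmse = sqrt(sum(e * e for e in errors) / len(errors))    # root mean square error

print(round(mae, 2), round(rmse, 3))  # 0.34 0.377
```

RMSE penalizes large individual errors more heavily than MAE, which is why it is the standard headline metric for rating prediction.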

We now build the full training set from our data. Note that build_full_trainset() returns a training set, not a trained model.

#The RMSE values obtained are good
#Since the values are good, we move on to training the model on the entire dataset
trainset = data.build_full_trainset()

The next step is to fit the model to this training set.

#Fitting the model to the full training set
algorithm.fit(trainset)

Based on the user we select, we can identify movies to recommend to that specific user id, which is the entire motive of our recommendation engine in the first place. First, let us look at the movies that user has already rated.

I have chosen a random userId = 2.

#Let us choose userId=2 to start with
#listing the movies this user has already rated
ratings[ratings['userId'] == 2]
Movies already rated by user 2.

Another important application is to predict the rating that a given user would assign to a given movie from our dataset. Keep in mind, I have taken random values below.

algorithm.predict(1, 123, 2)

For user 1 and the movie with ID=123, the estimated rating (the est field of the returned prediction) comes out to 4.56 on the dataset's 0.5 to 5 rating scale. Note that this is a predicted rating, not a probability of the movie being watched: it simply means the model expects user 1 to rate this movie very highly. In the table above, we saw the movies that a selected user has already rated. In the prediction output we also notice that was_impossible is False, because the user and the movie are both known to the trained model, so the estimate could be computed.
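To turn such per-movie estimates into actual recommendations, one would typically score every movie the user has not rated yet and keep the top N. A sketch with made-up movie ids and hypothetical predicted ratings:

```python
# Hypothetical predicted ratings for movies user 2 has NOT rated yet
# (made-up ids and scores, standing in for algorithm.predict output).
predicted = {31: 3.1, 1029: 4.4, 1061: 2.8, 1129: 4.9, 1172: 3.7}

def top_n(scores, n=3):
    """Return the n item ids with the highest predicted rating."""
    return [item for item, _ in sorted(scores.items(), key=lambda kv: -kv[1])][:n]

print(top_n(predicted))  # [1129, 1029, 1172]
```

With surprise, the scores would come from calling algorithm.predict(uid, movie_id).est for every movieId absent from that user's rows of the ratings dataframe.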

I have attached a video below which covers the absolute basics of the concept, in case you want to get a rough idea of it.

That's an overview of creating a movie recommendation engine from scratch.

That’s it for today. Thanks for reading. Keep Learning.

Cheers.
