How to Run Recommender Systems in Python

A practical example of movie recommendation with recommender systems

George Pipis
Sep 12 · 6 min read
Photo by Pankaj Patel on Unsplash

A Brief Introduction to Recommender Systems

Surprise for Recommender Systems

Screenshot from Surprise Documentation

Build your own Recommender System

import pandas as pd
import numpy as np

columns = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('ml-100k/u.data', sep='\t', names=columns)

columns = ['item_id', 'movie title', 'release date', 'video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
'Animation', 'Childrens', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

movies = pd.read_csv('ml-100k/u.item', sep='|', names=columns, encoding='latin-1')
movie_names = movies[['item_id', 'movie title']]

combined_movies_data = pd.merge(df, movie_names, on='item_id')
combined_movies_data = combined_movies_data[['user_id','movie title', 'rating']]
combined_movies_data.head()
# my user_id is 1001
my_ratings = pd.read_csv('my_movies_rating.csv')
my_ratings
combined_movies_data = pd.concat([combined_movies_data, my_ratings], axis=0)

# rename the columns to userID, itemID and rating
combined_movies_data.columns = ['userID', 'itemID', 'rating']

# use the transform method to group by itemID and count the ratings,
# so that we keep only the movies with more than 25 reviews

combined_movies_data['reviews'] = combined_movies_data.groupby(['itemID'])['rating'].transform('count')

combined_movies_data = combined_movies_data[combined_movies_data.reviews > 25][['userID', 'itemID', 'rating']]
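To see what the `transform('count')` filter above does, here is a minimal sketch on a toy table. The item labels, ratings, and the `min_reviews` threshold of 2 are made up for illustration; the article itself uses a threshold of 25 on the MovieLens data.

```python
import pandas as pd

# Toy ratings table; the values are invented for illustration only.
toy = pd.DataFrame({
    "userID": [1, 2, 3, 1, 2, 3],
    "itemID": ["A", "A", "A", "B", "B", "C"],
    "rating": [5, 4, 3, 2, 4, 1],
})

# transform('count') returns one value per row: the number of ratings
# received by that row's item, aligned with the original index.
toy["reviews"] = toy.groupby("itemID")["rating"].transform("count")

# Keep only items with more than `min_reviews` ratings (hypothetical threshold).
min_reviews = 2
filtered = toy[toy["reviews"] > min_reviews][["userID", "itemID", "rating"]]
print(filtered)
```

Unlike `agg`, `transform` keeps the original row count, which is why the result can be assigned straight back as a new column and used as a row filter.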
from surprise import NMF, SVD, SVDpp, KNNBasic, KNNWithMeans, KNNWithZScore, CoClustering
from surprise.model_selection import cross_validate
from surprise import Reader, Dataset
# A reader is still needed, but only the rating_scale param is required.
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(combined_movies_data, reader)
# get the list of the movie ids
unique_ids = combined_movies_data['itemID'].unique()

# get the list of the ids that the userid 1001 has rated
iids1001 = combined_movies_data.loc[combined_movies_data['userID']==1001, 'itemID']

# remove the rated movies for the recommendations
movies_to_predict = np.setdiff1d(unique_ids,iids1001)
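As a quick illustration of the last step, the sketch below shows how `np.setdiff1d` produces the candidate list. The titles are placeholder values chosen for the example, not rows from the dataset.

```python
import numpy as np

# Placeholder titles; "Alien" appears twice on purpose.
all_items = np.array(["Alien", "Fargo", "Heat", "Alien", "Casino"])
rated = np.array(["Heat", "Alien"])

# setdiff1d returns the sorted, unique values of the first array
# that do not appear in the second one.
to_predict = np.setdiff1d(all_items, rated)
print(to_predict)
```

Because the result is already de-duplicated, calling `.unique()` beforehand is not strictly necessary, although it does no harm.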

Recommender Systems using NMF

algo = NMF()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)
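The predict-and-sort loop above can be wrapped in a small helper so that it is written only once. This is a sketch, not part of the original post: `top_n` and `FakeAlgo` are hypothetical names, and `FakeAlgo` is a stand-in that only mimics the `predict(uid=..., iid=...).est` call used above so the example runs without a fitted model.

```python
from collections import namedtuple

# Minimal stand-in for the prediction object; only `.est` is used here.
Prediction = namedtuple("Prediction", ["uid", "iid", "est"])


class FakeAlgo:
    """Hypothetical stand-in for a fitted model, for demonstration only."""

    def __init__(self, scores, default=3.0):
        self.scores = scores
        self.default = default

    def predict(self, uid, iid):
        # Return a fixed score per item, or a default for unknown items.
        return Prediction(uid, iid, self.scores.get(iid, self.default))


def top_n(algo, uid, candidates, n=10):
    """Score every candidate item for one user and return the n best."""
    scored = [(iid, algo.predict(uid=uid, iid=iid).est) for iid in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


algo_demo = FakeAlgo({"Fargo": 4.6, "Casino": 4.1, "Heat": 3.2})
best = top_n(algo_demo, uid=1001, candidates=["Heat", "Fargo", "Casino", "Alien"], n=2)
print(best)
```

With a real fitted model, the same helper could be called as `top_n(algo, 1001, movies_to_predict)`, assuming the model exposes the `predict` call shown in the loop.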

Recommender Systems using SVD

algo = SVD()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)

Recommender Systems using SVD++

algo = SVDpp()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)

Recommender Systems using KNN with Z-Score

algo = KNNWithZScore()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)

Recommender Systems using Co-Clustering

algo = CoClustering()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)

How to Evaluate the Recommender Systems

cv = []
# Iterate over all recommender system algorithms
for recsys in [NMF(), SVD(), SVDpp(), KNNWithZScore(), CoClustering()]:
    # Perform cross validation
    tmp = cross_validate(recsys, data, measures=['RMSE'], cv=3, verbose=False)
    cv.append((type(recsys).__name__, tmp['test_rmse'].mean()))

pd.DataFrame(cv, columns=['RecSys', 'RMSE'])
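The comparison above ranks the algorithms by RMSE (root mean squared error), averaged over the three folds. As a reminder of what that number measures, here is a minimal pure-Python sketch; the `rmse` function and the sample ratings are illustrative and are not how the library computes it internally.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented ratings on the 1-5 scale, for illustration only.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
score = rmse(actual, predicted)
print(score)
```

A lower RMSE means the predicted ratings sit closer to the held-out ratings on average, with large misses penalised more heavily than small ones because the errors are squared.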

Discussion

The Startup
