Published in INSAID

Enter the world of personalized experiences with Recommender Systems

In this article, we are going to learn how to build a hybrid recommendation system.

INTRODUCTION

Source: analyticsvidhya

Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer.

TYPES OF RECOMMENDER SYSTEMS

  1. Collaborative filtering Recommender Systems

NEED FOR A HYBRID RECOMMENDER SYSTEM

Cons of Content-Based Recommender Systems

  1. Difficulty in understanding user preferences: Content-based recommendation systems require a lot of input from the user in order to make accurate recommendations. This can be difficult for users who are unfamiliar with the product or service, as they may not know how to properly express their preferences.
  2. Difficulty in capturing user context: Content-based systems can be limited in their ability to capture user context. They are unable to take into account user behavior or mood, which can be important factors in predicting user preferences.
  3. Over-personalization: Content-based recommendation systems can result in a “filter bubble” effect, where users are repeatedly served items that closely resemble what they have already consumed. This can lead to a lack of variety and limit users’ exposure to new content.

Cons of Collaborative Filtering Recommender Systems

  1. Dependency on User Input: Collaborative recommendation systems rely heavily on user input, which can be unreliable and inaccurate. If a user provides incorrect information, it can lead to inaccurate recommendations.
  2. Privacy Concerns: Collaborative recommendation systems collect data about user behavior. Depending on what type of information is being collected, users may be concerned about how their data is being used.
  3. Difficulty of Implementation: Implementing a collaborative recommender system can be complex and time-consuming. It requires a lot of data processing and analysis to create accurate recommendations.

FILTERING TO BUILD A ROBUST RECOMMENDER SYSTEM

Filtering by AvgRating

  • Filtering data based on average ratings helps to identify which items are most highly rated. This allows a recommender system to improve its recommendations by suggesting items that are likely to be of higher quality or more likely to be enjoyed by the user.
  • By filtering data based on the average rating, a recommender system can ensure that it is only suggesting items to a user that have some level of popularity or quality.
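
As a minimal sketch of this filter, assuming a ratings table with `movie_id` and `rating` columns like the one loaded later in this article, the average rating per item can be computed with a pandas `groupby` and compared against a threshold. The toy data and the `min_avg_rating` value of 3.5 are illustrative assumptions, not values from the original walkthrough.

```python
import pandas as pd

# Toy ratings table (illustrative data, not the MovieLens set).
ratings = pd.DataFrame({
    "movie_id": ["0", "0", "1", "1", "2"],
    "rating":   [5.0, 4.0, 2.0, 3.0, 4.5],
})

# Hypothetical threshold; tune it for your own data.
min_avg_rating = 3.5

# Average rating per movie.
avg_rating = ratings.groupby("movie_id")["rating"].mean()

# Keep only the movies whose average rating meets the threshold.
well_rated_ids = avg_rating[avg_rating >= min_avg_rating].index.tolist()
print(well_rated_ids)
```

Here movie "1" averages 2.5 and is dropped, while movies "0" and "2" both average 4.5 and are kept.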

Filtering by Number of Ratings

  • Filtering data based on the number of ratings is important in recommender systems because it helps to ensure that the recommendations given are of high quality.
  • Having a minimum number of ratings for an item helps to ensure that the item is being recommended based on real user feedback, and not just on the basis of some random factors.
  • This also helps to reduce the potential for bias in the system, as items with fewer ratings may be more likely to be recommended on the basis of their own attributes, rather than on the basis of user feedback.
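
A possible way to apply this filter, again assuming `movie_id` and `rating` columns, is to count the ratings per item and keep only the rows of items that reach a minimum count. The toy data and the `min_num_ratings` value of 2 are assumptions chosen for illustration.

```python
import pandas as pd

# Toy ratings table (illustrative data, not the MovieLens set).
ratings = pd.DataFrame({
    "movie_id": ["0", "0", "0", "1", "2", "2"],
    "rating":   [5.0, 4.0, 3.0, 5.0, 2.0, 4.0],
})

# Hypothetical minimum number of ratings an item needs to be considered.
min_num_ratings = 2

# Number of ratings each movie received, aligned to every row of the table.
num_ratings = ratings.groupby("movie_id")["rating"].transform("count")

# Keep only the rows that belong to movies with enough ratings.
filtered = ratings[num_ratings >= min_num_ratings]
print(sorted(filtered["movie_id"].unique()))
```

Movie "1" has a perfect 5.0 average, but it comes from a single rating, so the count filter removes it. This is the case the average-rating filter alone would miss.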

BUILDING A HYBRID RECOMMENDER SYSTEM

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from urllib.request import urlretrieve
import zipfile

urlretrieve("http://files.grouplens.org/datasets/movielens/ml-100k.zip", "movielens.zip")
zip_ref = zipfile.ZipFile('movielens.zip', "r")
zip_ref.extractall()

# Load each data set (users, movies, and ratings).
users_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=users_cols, encoding='latin-1')

ratings_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep='\t',
                      names=ratings_cols, encoding='latin-1')

# The movies file contains a binary feature for each genre.
genre_cols = ["genre_unknown", "Action", "Adventure", "Animation", "Children", "Comedy",
              "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
              "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]

movies_cols = ['movie_id', 'title', 'release_date', "video_release_date", "imdb_url"] + genre_cols
movies = pd.read_csv( 'ml-100k/u.item', sep='|', names=movies_cols, encoding='latin-1')

# Since the ids start at 1, we shift them to start at 0.
users["user_id"] = users["user_id"].apply(lambda x: str(x-1))
movies["movie_id"] = movies["movie_id"].apply(lambda x: str(x-1))
movies["year"] = movies['release_date'].apply(lambda x: str(x).split('-')[-1])
ratings["movie_id"] = ratings["movie_id"].apply(lambda x: str(x-1))
ratings["user_id"] = ratings["user_id"].apply(lambda x: str(x-1))
ratings["rating"] = ratings["rating"].apply(lambda x: float(x))

# Compute the number of movies to which a genre is assigned.
genre_occurences = movies[genre_cols].sum().to_dict()

# Since some movies can belong to more than one genre, we create different
# 'genre' columns as follows:
# - all_genres: all the active genres of the movie.
# - genre: randomly sampled from the active genres.

def mark_genres(movies, genres):
    def get_random_genre(gs):
        active = [genre for genre, g in zip(genres, gs) if g == 1]
        if len(active) == 0:
            return 'Other'
        return np.random.choice(active)

    def get_all_genres(gs):
        active = [genre for genre, g in zip(genres, gs) if g == 1]
        if len(active) == 0:
            return 'Other'
        return '-'.join(active)

    movies['genre'] = [
        get_random_genre(gs) for gs in zip(*[movies[genre] for genre in genres])]
    movies['all_genres'] = [
        get_all_genres(gs) for gs in zip(*[movies[genre] for genre in genres])]

mark_genres(movies, genre_cols)

# Create one merged DataFrame containing all the movielens data.
movielens = ratings.merge(movies, on='movie_id').merge(users, on='user_id')

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# NOTE: the functions below read from a DataFrame named `df` that is not defined in
# this article. It is assumed here to hold at least 'title' and 'genre' columns,
# for example the `movies` frame built above.

def content_recommendation(title):
    """
    Returns a list of content recommendations based on the provided title.
    The recommendations are determined by calculating the cosine similarity between the genre of the provided title and
    the genres of other content in the dataframe, df. The top 100 most similar content are selected, and duplicates are
    removed to return a list of at most 10 content recommendations.
    """
    # Look up the genre of the provided title; an unknown title has no recommendations
    matches = df.loc[df["title"] == title, "genre"]
    if matches.empty:
        return []
    # Initialize TfidfVectorizer and fit it to the genres in the dataframe
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    tfidf = vectorizer.fit_transform(df["genre"])
    # Transform the genre of the provided title into a vector using the vectorizer
    query_vec = vectorizer.transform([matches.iloc[0]])
    # Calculate the cosine similarity between the query vector and the genre vectors
    similarity = cosine_similarity(query_vec, tfidf).flatten()
    # Select the indices of the (up to) 100 most similar rows, most similar first
    indices = np.argsort(similarity)[::-1][:100]
    # Select the rows of the dataframe corresponding to the selected indices
    results = df.iloc[indices]
    # Remove duplicates based on the 'title' column
    results = results.drop_duplicates(subset=['title'])
    # Fetch only the title values, convert them to a list, and keep at most 10
    content_reco = results.title.values.tolist()[:10]
    # Return the list of titles recommended by the content-based approach
    return content_reco
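
The collaborative function that follows relies on two objects, `pivot` and `similarity_scores`, that this article does not define. One plausible way to build them, sketched here under the assumption that `df` holds `user_id`, `title`, and `rating` columns, is a user-by-title pivot table followed by an item-item cosine similarity. The toy data is illustrative, and the original author may have constructed these objects differently.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy user/title ratings (illustrative data only).
df = pd.DataFrame({
    "user_id": ["0", "0", "1", "1", "2", "2"],
    "title":   ["A", "B", "A", "B", "A", "C"],
    "rating":  [5.0, 4.0, 4.0, 5.0, 1.0, 5.0],
})

# Rows are users, columns are titles; unrated cells are filled with 0.
pivot = df.pivot_table(index="user_id", columns="title", values="rating").fillna(0)

# Item-item similarity: transpose so that each row is one title's rating vector.
similarity_scores = cosine_similarity(pivot.T)

print(pivot.columns.tolist())
print(similarity_scores.shape)
```

With this layout, row `i` of `similarity_scores` lines up with `pivot.columns[i]`, which is the indexing the collaborative function expects. In the toy data, titles "A" and "B" are rated alike by the same users, so they score as more similar to each other than "A" and "C".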

def collaborative_recommendation(title):
    """
    This function leverages the collaborative filtering approach to recommend movies based on the ratings given to them
    by the users. The index of the title in the pivot table is looked up first, that index is used to read the
    similarity scores, and the top 5 results are returned.
    """
    # NOTE: `pivot` (a ratings matrix whose columns are titles) and `similarity_scores` (the matching
    # title-by-title similarity matrix) are not defined in this article and are assumed to exist.

    # Fetching the index of the movie; an unknown title has no recommendations
    matches = np.where(pivot.columns == title)[0]
    if len(matches) == 0:
        return []
    index = matches[0]

    # Finding the similar movies using the similarity score and fetching the top 5 results,
    # skipping position 0, which is normally the movie itself
    similar_items = sorted(list(enumerate(similarity_scores[index])), key=lambda x: x[1], reverse=True)[1:6]

    # Initiating a list
    data = []
    # Looping through the similar items
    for i in similar_items:
        # Creating a temporary dataframe with the rows whose title matches the similar item
        temp_df = df[df['title'] == pivot.columns[i[0]]]
        # Adding the title to the list after dropping duplicates based on the title column
        data.extend(temp_df.drop_duplicates('title')['title'].values.tolist())
    # Return the list of collaborative filtering recommended movies
    return data

def hybrid_recommendation(title):
    """
    This function utilizes the capabilities of the earlier functions to provide a single set of unique recommendations
    utilizing content and collaborative filtering (hybrid recommendations). The output is a list of those recommendations.
    """
    # Get the list of recommended movies from the content-based system (flattened to a plain list)
    content_recommended_movies = np.ravel(content_recommendation(title)).tolist()

    # Get the list of recommended movies from the collaborative filtering system (flattened to a plain list)
    collaborative_recommended_movies = np.ravel(collaborative_recommendation(title)).tolist()

    # Combine the two lists, dropping duplicates while keeping the original ranking order
    recommended_movies = list(dict.fromkeys(content_recommended_movies + collaborative_recommended_movies))

    # Return the combined list of recommended movies
    return recommended_movies

CONCLUSION

  1. A recommender system, or a recommendation system (sometimes replacing ‘system’ with a synonym such as a platform or an engine), is a subclass of information filtering system that provides suggestions for the items most relevant to a particular user.
  2. There are various types of recommender systems, such as Popularity-Based Recommender Systems, Content-Based Recommender Systems, Collaborative Filtering Recommender Systems, and Hybrid Recommender Systems.
  3. There are several challenges with Content-Based and Collaborative Filtering Recommender Systems, which give rise to the need for Hybrid Recommender Systems.
  4. Filtering by the average rating given to a product and by the number of ratings it has received is a good way to ensure that only quality recommendations are made to users.

Final Thoughts and Closing Comments


INSAID is India’s leading powerhouse in Data Science & AI research and education. INSAID provides world-class programs and certifications to working professionals across 300+ companies https://www.insaid.co/.
