Recommendation System: 1 Project That Every Data Science Enthusiast Should Know and Must Try To Implement

SALONI
Published in Analytics Vidhya
7 min read · Jul 1, 2021

By: Saloni and Ritesh

Salonitiwary1519@gmail.com

Ritesh.ms.tiwari@gmail.com

Data science is a versatile domain that uses a range of techniques to extract knowledge and insights from data. Its emergence has opened up applications of computer science and engineering across many fields, creating a large number of job opportunities, and many people now want to build a professional career in it. In this article, we discuss a Recommendation System project, one with considerable influence in the data science domain, in a simplified manner so that every enthusiast can follow it. We cover the following topics:

  • Introduction
  • Need
  • Application
  • Challenges
  • Implementation
  • Conclusion

Keywords: Recommendation System, Collaborative Filtering, Content-Based Filtering

Introduction

In general, a recommendation is a suggestion to another person that something is good for a particular purpose. A Recommendation System, then, is a system that suggests products and services to users based on an analysis of historical data. In more technical terms, a Recommendation System is a subclass of Information Filtering systems that recommends an item or product based on the ‘ratings’ or ‘preferences’ that users have given to items in the past. Well-known services that use recommendation systems include Netflix, Amazon, and YouTube.

The most widely used types of recommendation systems are:

  • Content-Based Filtering
  • Collaborative Filtering
  • Hybrid Recommendation Systems

Figure: Types of Recommendation System

  1. Content-Based Filtering: A product is recommended to a user based on how similar its properties are to those of products the user has chosen or purchased in the past.
  2. Collaborative Filtering: A product is recommended to a user based on the similarity between like-minded users or between items. It is subdivided into neighborhood-based approaches, model-based approaches, and hybrid models.
  3. Hybrid Recommendation Systems: User preferences are dynamic in nature, and content-based or collaborative filtering alone often cannot recommend products with high accuracy. A Hybrid Recommendation System therefore combines Content-Based Filtering and Collaborative Filtering.
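
To make the idea of content-based filtering concrete, here is a minimal sketch. It is not part of the implementation later in this article. It assumes that each movie is described by a hand-made genre vector and that cosine similarity is the similarity measure; the titles, the feature values, and the function names `cosine_similarity` and `content_based_recommend` are all hypothetical.

```python
import numpy as np

# Hypothetical item features: one row per movie, one column per genre
# (action, comedy, sci-fi). The values are made up for illustration.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
features = np.array([
    [1.0, 0.0, 1.0],  # Movie A: action + sci-fi
    [1.0, 0.0, 0.0],  # Movie B: action
    [0.0, 1.0, 0.0],  # Movie C: comedy
    [0.0, 0.0, 1.0],  # Movie D: sci-fi
])


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def content_based_recommend(liked_index, top_n=2):
    """Rank the other items by feature similarity to the liked item."""
    liked = features[liked_index]
    scores = [
        (titles[i], cosine_similarity(liked, features[i]))
        for i in range(len(titles))
        if i != liked_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# A user who liked "Movie A" gets the movies that share its genres.
recommendations = content_based_recommend(liked_index=0)
print(recommendations)
```

A user who liked the action and sci-fi movie is pointed to the other action and sci-fi titles, while the comedy, which shares no genre with it, is left out. A real system would derive the feature vectors from item metadata rather than writing them by hand.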

Need Of Recommendation System :

The Internet is one of the greatest inventions of the 20th century. It is an ocean of resources from which knowledge about almost anything is just a click away. However, this ocean holds an overwhelming amount of information for both users and service providers. When users search for something specific, they may get confused or pick the wrong option, which wastes their time and energy and leaves them annoyed. As a result, a user may abandon or postpone the task altogether. In such situations, there is strong demand for a system that helps users make decisions by giving them good advice, so that they can sort through their choices quickly and easily. Such a system also benefits service providers, since it helps them decide what kind of offering to make in order to draw users’ attention to their products.

Application Of Recommendation System :

Recommender systems are applicable in many fields, some of which are listed below:

  • E-Commerce: Used on e-commerce sites to recommend products to users.
  • Media: Used in electronic media to recommend the latest news and updates.
  • Banking: Used in the banking sector to suggest the latest offers and benefits to customers.
  • Telecom: Used in the telecommunication sector to offer the most suitable services to users.
  • Movies: Used to recommend movies according to users’ choices.
  • Music: Used to recommend songs to users based on their previous choices.
  • Books: Used to recommend books to readers based on the genres they love to read.
  • Tourism scenic spots: Used on tourism sites to offer the most relevant and suitable travel services to users.

Challenges of Recommendation System :

Like every other project, a recommender system also faces several challenges, which include:

  1. Lack of data: One of the main challenges in recommendation system projects is the need for a sufficient amount of data. As in every data science project, data plays a vital role in a recommender system, and making accurate recommendations to users requires a large amount of it.
  2. Changing data: In many areas, such as fashion, trends change rapidly. In such areas, relying on users' past behavior to recommend new products can be a bad idea. A recommender system driven by past behavior tends to become biased toward old products and finds it difficult to suggest new ones.
  3. Changing user preferences: Just as online platforms are dynamic in nature, users’ wishes are dynamic too. Today they want one product, and another day they want a different one. One day they want to buy a book by one author, and another day a book by a different author in a different genre. Because preferences change this quickly, it can be challenging for a recommender system to keep up with users’ choices.
  4. Privacy of user data: As mentioned earlier, a good recommendation system requires a lot of data to perform well. However, an attacker or a third party may misuse the information that users share about their choices. Even a recommender system with the best accuracy does not guarantee the security of users’ data.
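
One rough way to see the lack-of-data problem in practice is to measure how sparse the user-by-item rating matrix is, that is, what fraction of all possible user and movie pairs have no rating at all. The sketch below does this with pandas on a small made-up table; the `toy` data and the variable names are hypothetical and are not taken from the data set used later.

```python
import pandas as pd

# Toy ratings in long format (user_id, title, rating); values are made up.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "title": ["A", "B", "A", "B", "C", "C"],
    "rating": [5, 3, 4, 2, 5, 4],
})

# Users as rows, movies as columns; unrated cells become NaN.
matrix = toy.pivot_table(index="user_id", columns="title", values="rating")

observed = int(matrix.notna().sum().sum())    # cells that hold a rating
possible = matrix.shape[0] * matrix.shape[1]  # users x movies
sparsity = 1 - observed / possible

print(f"{observed} of {possible} cells observed, sparsity = {sparsity:.2f}")
```

The higher the sparsity, the fewer users any two movies have in common, and the less reliable any similarity computed between them becomes.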

Implementation:

Let us take the discussion further with an implementation: a model that suggests movies to end-users based on the similarity of user ratings.

The data set used in the implementation can be downloaded from: https://media.geeksforgeeks.org/wp-content/uploads/Movie_Id_Titles.csv

# Import the pandas library
import pandas as pd

# Load the ratings data (tab-separated)
column_names = ['user_id', 'item_id', 'rating', 'timestamp']

path = 'https://media.geeksforgeeks.org/wp-content/uploads/file.tsv'

df = pd.read_csv(path, sep='\t', names=column_names)

# Check the head of the data
df.head()

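# Check out all the movies and their respective IDs
movie_titles = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/Movie_Id_Titles.csv')
movie_titles.head()

# Merge ratings with titles (assumes the titles file has an 'item_id' column)
data = pd.merge(df, movie_titles, on='item_id')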

# Calculate the mean rating of each movie
data.groupby('title')['rating'].mean().sort_values(ascending=False).head()

# Count the number of ratings for each movie
data.groupby('title')['rating'].count().sort_values(ascending=False).head()

# Create a dataframe with the mean rating and the number of ratings per movie
ratings = pd.DataFrame(data.groupby('title')['rating'].mean())

ratings['num of ratings'] = data.groupby('title')['rating'].count()

ratings.head()

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('white')
# Jupyter magic: render plots inline (notebook only)
%matplotlib inline

# Plot a histogram of the 'num of ratings' column
plt.figure(figsize=(10, 4))

ratings['num of ratings'].hist(bins=70)

# Plot a histogram of the 'rating' column
plt.figure(figsize=(10, 4))

ratings['rating'].hist(bins=70)

# Build a user-by-movie matrix of ratings
# (rows: users, columns: titles, NaN where a user has not rated a movie)
moviemat = data.pivot_table(index='user_id',
                            columns='title', values='rating')

moviemat.head()

# Sort movies by the 'num of ratings' column
ratings.sort_values('num of ratings', ascending=False).head(10)

# Select the user ratings for two example movies
starwars_user_ratings = moviemat['Star Wars (1977)']
liarliar_user_ratings = moviemat['Liar Liar (1997)']

starwars_user_ratings.head()

# Correlate every movie's ratings with those of the two example movies
similar_to_starwars = moviemat.corrwith(starwars_user_ratings)
similar_to_liarliar = moviemat.corrwith(liarliar_user_ratings)

corr_starwars = pd.DataFrame(similar_to_starwars, columns=['Correlation'])
corr_starwars.dropna(inplace=True)

corr_starwars.head()

# Movies most similar to Star Wars
corr_starwars.sort_values('Correlation', ascending=False).head(10)
corr_starwars = corr_starwars.join(ratings['num of ratings'])

corr_starwars.head()

# Keep only movies with more than 100 ratings
corr_starwars[corr_starwars['num of ratings'] > 100].sort_values('Correlation', ascending=False).head()

# Movies most similar to Liar Liar
corr_liarliar = pd.DataFrame(similar_to_liarliar, columns=['Correlation'])
corr_liarliar.dropna(inplace=True)

corr_liarliar = corr_liarliar.join(ratings['num of ratings'])
corr_liarliar[corr_liarliar['num of ratings'] > 100].sort_values('Correlation', ascending=False).head()
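
The steps above repeat the same pattern for each movie: pivot the ratings, correlate every column with the chosen movie, attach the rating counts, and filter out rarely rated titles. As a sketch, that pattern could be wrapped in a single function. The function name `similar_movies`, its parameters `min_ratings` and `top_n`, and the small `toy` table are hypothetical and only there so the example runs on its own; with the real data you would pass the merged ratings and titles frame instead. By default, `corrwith` computes the Pearson correlation and uses only the users who rated both movies.

```python
import pandas as pd


def similar_movies(data, title, min_ratings=100, top_n=5):
    """Rank movies by the correlation of their user ratings with `title`.

    `data` is a long-format frame with user_id, title and rating columns.
    """
    # Users as rows, movies as columns; unrated cells are NaN.
    moviemat = data.pivot_table(index="user_id", columns="title", values="rating")
    # Correlate every movie column with the target movie's column.
    corr = moviemat.corrwith(moviemat[title]).rename("Correlation").to_frame()
    corr["num of ratings"] = data.groupby("title")["rating"].count()
    # Remove undefined correlations and the target movie itself.
    corr = corr.dropna(subset=["Correlation"]).drop(index=title)
    # Keep only movies with more than `min_ratings` ratings.
    corr = corr[corr["num of ratings"] > min_ratings]
    return corr.sort_values("Correlation", ascending=False).head(top_n)


# Toy data, made up for illustration: four users rate movies X, Y and Z.
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "title": ["X", "Y", "Z"] * 4,
    "rating": [5, 5, 1, 4, 4, 2, 2, 1, 5, 1, 2, 4],
})

result = similar_movies(toy, "X", min_ratings=3)
print(result)
```

In the toy data, users who like X also like Y and dislike Z, so Y ranks first with a positive correlation and Z last with a negative one. The `min_ratings` threshold plays the same role as the `> 100` filter above: correlations computed from only a handful of shared ratings are too noisy to trust.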

Conclusion:

In this article, we discussed the recommendation system project and briefly explained its need, applications, and challenges. We also implemented a simple recommendation model in Python on a Jupyter notebook. For a given movie, the model suggests the movies most similar to it by computing the correlation between the ratings that users gave to the two movies.

For any further queries, you can contact us:

Saloni and Ritesh

Salonitiwary1519@gmail.com

ritesh.ms.tiwari@gmail.com
