Recommendation Systems for Novices, with a Sample Dataset

Published in

Analytics Vidhya

6 min read · May 31, 2020

Hi! This blog explains the basic statistical concepts behind the implementation of a recommendation system, using a simple use case.

Before we dive deep into the topic, let us understand what a recommendation system is and what impact it has in the entertainment and e-commerce domains.

What is a Recommendation System?

A recommendation system is basically an information filtering system that predicts the rating or feedback a user might give to a product. It uses statistical and ML algorithms, such as clustering and ensemble methods, to predict a specific user’s rating of a product based on that user’s past feedback on other products.

**Netflix Recommendation System |** Google Images

Significance of Recommendation System in E-Commerce

According to a paper written by Netflix executives Carlos A. Gomez-Uribe and Neil Hunt, the video streaming service’s AI recommendation system saves the company around $1 billion each year.
Netflix even ran a competition, the “Netflix Prize”, launched in 2006 with a grand prize of $1,000,000 for the best collaborative filtering algorithm.
The task was to improve on its existing algorithm, “Cinematch”. For further details, refer to this link (CineMatch).
The winning model, from the BellKor’s Pragmatic Chaos team, improved on Cinematch’s accuracy by just over 10%.
Several other e-commerce sites, such as Amazon, possess their own customized recommendation systems.

Types of Recommendation System

There are three types of recommendation system.

Content-Based Recommendation System
Collaborative Filtering/Recommendation System
Hybrid Recommendation System

**Recommendation System Types - Flowchart**

Content-Based Recommendation System

In this type of recommendation system, the user’s profile and the item’s profile are taken into account when predicting the user’s ratings. In the case of literature (books), examples of such attributes are genre, language, author’s profile, keywords, and the user’s past choices.
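As a rough illustration of the idea, the sketch below scores an unrated book by comparing its profile with a profile built from the user's past ratings. The book names, genre flags, ratings, and the helper `build_user_profile` are all made up for this example; a real system would use richer attributes (author, keywords, language).

```python
import numpy as np

# Hypothetical item profiles: one flag per genre, in the order
# (fantasy, romance, thriller).
item_profiles = {
    "Book A": np.array([1.0, 0.0, 0.0]),
    "Book B": np.array([1.0, 1.0, 0.0]),
    "Book C": np.array([0.0, 0.0, 1.0]),
}

# Hypothetical past ratings of one user (1-5 scale).
user_ratings = {"Book A": 5.0, "Book C": 1.0}


def build_user_profile(ratings, profiles):
    """Rating-weighted average of the profiles of the books the user rated."""
    total = sum(ratings.values())
    return sum(r * profiles[b] for b, r in ratings.items()) / total


def cosine(u, v):
    """Cosine of the angle between two profile vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


user_profile = build_user_profile(user_ratings, item_profiles)

# Score every book the user has not rated yet against the user's profile.
scores = {
    book: cosine(user_profile, profile)
    for book, profile in item_profiles.items()
    if book not in user_ratings
}
print(scores)
```

Because the user rated the fantasy book highly, the unrated fantasy/romance book receives a fairly high score; no other user's ratings are needed.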

**Content based movie recommendation system.**

Collaborative Filtering Recommendation System

Unlike a content-based system, a collaborative filtering system doesn’t consider user attributes (age category, gender, profile, etc.) or product attributes (genre, language, keywords, etc.). It identifies the relationship between users and products from the user-item matrix, in which each row is a user, each column is a product, and each cell holds the rating that user gave to that product.

There are two types of Collaborative filtering systems.

Model Based Collaborative System
Memory Based Collaborative System

In a model-based system, ML techniques such as clustering and neural nets are employed to predict the ratings. Memory-based systems are further classified as

User-Based Recommendation System
Item-Based Recommendation System

User Based Filtering

In user-based filtering, the similarity between users is identified from their profile history (past ratings and subscriptions) using centered cosine similarity or Pearson correlation. If users A and B are similar, then the products B subscribed to are recommended to A, and vice versa.

**User-Based Collaborative Filtering System**

Pearson Correlation

Pearson correlation is used to find the linear relationship between two entities. Its value ranges from -1 to +1. Positive values imply a positive correlation between the entities, and negative values imply a negative correlation. A value close to 0 implies that there is no correlation between them. The similarity between items ‘X’ and ‘Y’ is given by the correlation coefficient r:

r = Σ (xᵢ − x̄)(yᵢ − ȳ) / ( √Σ (xᵢ − x̄)² · √Σ (yᵢ − ȳ)² )

where xᵢ and yᵢ are the ratings given by user i to X and Y, and x̄ and ȳ are the mean ratings of X and Y.
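The following is a minimal sketch of the Pearson coefficient computed by hand with NumPy. The rating vectors `item_x`, `item_y`, and `item_z` are invented for illustration and assume that the same five users rated all three items.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length rating vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean rating of X
    dy = y - y.mean()  # deviations from the mean rating of Y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Hypothetical ratings given by the same five users to items X, Y and Z.
item_x = [5, 4, 3, 2, 1]
item_y = [4, 5, 3, 1, 2]
item_z = [1, 2, 3, 4, 5]

r_xy = pearson(item_x, item_y)  # users who like X mostly like Y
r_xz = pearson(item_x, item_z)  # users who like X dislike Z
print(r_xy, r_xz)
```

Here X and Y come out strongly positively correlated, while X and Z are perfectly negatively correlated. NumPy's built-in `np.corrcoef` returns the same values and is the more practical choice outside of a teaching example.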

The relationship table for the different ranges of correlation values is given by

Cosine Similarity

The similarity between users can also be calculated with the cosine similarity function. The similarity is measured by the angle between the users’ rating vectors, and users are assumed to be similar when the angle between them is small. For two rating vectors A and B, the formula for cosine similarity is

cos(θ) = (A · B) / (‖A‖ ‖B‖)
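A short sketch of both plain and centered cosine similarity is given here, assuming two users who rated the same four products. The vectors `user_a` and `user_b` are hypothetical. Centering subtracts each user's own mean rating first, which makes the result equal to the Pearson correlation when both users rated every product.

```python
import numpy as np


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (||u|| * ||v||)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def centered_cosine(u, v):
    """Cosine similarity after subtracting each user's own mean rating."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return cosine_similarity(u - u.mean(), v - v.mean())


# Hypothetical ratings of two users on the same four products.
user_a = [5, 4, 1, 2]
user_b = [4, 5, 2, 1]

sim = cosine_similarity(user_a, user_b)
sim_centered = centered_cosine(user_a, user_b)
print(sim, sim_centered)
```

The plain cosine is close to 1 because all ratings are positive numbers, so the vectors point in broadly the same direction. The centered version is lower and is usually the more informative of the two for rating data.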

The Graphical vector form is shown below.

Prediction Of User’s Ratings

The rating a user would give is predicted with the Bayesian Weighted Average. The prediction for user “i” on a product “P” is calculated as

where S(i, j) is the similarity between users “i” and “j”, and R(j) is the rating given by user “j” to the product.
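One common way to combine S(i, j) and R(j) is a similarity-weighted average, P(i) = Σ S(i, j) · R(j) / Σ |S(i, j)|, taken over the users j who rated the product. The sketch below assumes this form, which may differ in detail from the formula the post originally showed. The function name `predict_rating` and the sample values are hypothetical.

```python
def predict_rating(similarities, ratings):
    """Similarity-weighted average of neighbours' ratings for one product.

    similarities: {user_j: S(i, j)} for the target user i
    ratings:      {user_j: R(j)}, ratings of the product by other users
    """
    # Only neighbours who actually rated the product can contribute.
    common = [j for j in ratings if j in similarities]
    weight_sum = sum(abs(similarities[j]) for j in common)
    if weight_sum == 0:
        return None  # no usable neighbours, so no prediction
    return sum(similarities[j] * ratings[j] for j in common) / weight_sum


# Hypothetical similarities of user "i" to three other users,
# and those users' ratings of product P.
s_i = {"u1": 0.9, "u2": 0.5, "u3": 0.1}
r_p = {"u1": 5, "u2": 3, "u3": 1}

predicted = predict_rating(s_i, r_p)
print(round(predicted, 2))
```

The prediction lands close to the rating of `u1`, the most similar user, which is the intended effect of weighting by similarity.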

Item Based Filtering

Item-based collaborative filtering was introduced by Amazon in 1998. In this type of filtering, the similarity between products is calculated instead of the similarity between users. This is done by counting how many of the users who bought product “p” also bought product “q”.

Similar items are then recommended based on the user’s past feedback or ratings.

**Products A and C are similar, hence C is recommended to User ‘A’**

Here, products “A” and “C” are similar. Since user ‘A’ liked product “A”, product “C” is recommended to user ‘A’.
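The co-purchase counting described above can be sketched in a few lines. The purchase histories and the helper `most_similar` are invented for illustration, and raw co-occurrence counts are only the simplest possible similarity measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: user -> set of products bought.
purchases = {
    "user_1": {"A", "C"},
    "user_2": {"A", "B", "C"},
    "user_3": {"B"},
    "user_4": {"A", "C"},
}

# Count, for every unordered product pair, how many users bought both.
co_counts = Counter()
for basket in purchases.values():
    for p, q in combinations(sorted(basket), 2):
        co_counts[(p, q)] += 1


def most_similar(product):
    """Return the product most often bought together with `product`."""
    candidates = {
        (q if p == product else p): n
        for (p, q), n in co_counts.items()
        if product in (p, q)
    }
    return max(candidates, key=candidates.get) if candidates else None


print(co_counts)
print(most_similar("A"))
```

With this toy data, three users bought both A and C, so C is the item recommended to someone who liked A.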

Collaborative vs Content Based Recommendation System

Collaborative recommendation systems require only the user-product ratings, whereas content-based systems require metadata about the product and information about the user. Hence collaborative filtering is widely used on most e-commerce sites.

Pros of Collaborative Recommendation

Domain knowledge is not required. Since it deals only with users’ ratings of products, it can be employed in any domain.
Serendipity: The model can help users discover new interests. In isolation, the ML system may not know the user is interested in a given item, but the model might still recommend it because similar users are interested in that item.

Cons of Collaborative Recommendation

Cold Start Problem: Since collaborative filtering works on the past activity of users and products, it cannot be used for a new user or a new product.
Data Sparsity Problem: This is the condition where NULL values occupy more than 90% of the user-product matrix. It leads to positive errors in Pearson correlation and negative errors in cosine similarity.
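Both problems are easy to spot in a user-item matrix. The sketch below uses a small hypothetical matrix to measure sparsity as the share of empty cells and to flag a new user as an all-empty row; the names `ratings`, `new_user`, and `cold_start_users` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical user-item matrix; NaN marks a product the user has not rated.
ratings = pd.DataFrame(
    {
        "Book A": [5, np.nan, np.nan, 4],
        "Book B": [np.nan, np.nan, 3, np.nan],
        "Book C": [np.nan, 2, np.nan, np.nan],
    },
    index=["u1", "u2", "u3", "u4"],
)

# Sparsity = share of empty cells in the matrix.
sparsity = ratings.isna().to_numpy().mean()

# A brand-new user shows up as a row with no ratings at all.
new_user = pd.Series(np.nan, index=ratings.columns, name="u5")
with_new_user = pd.concat([ratings, new_user.to_frame().T])
cold_start_users = with_new_user.index[with_new_user.isna().all(axis=1)].tolist()

print(f"sparsity = {sparsity:.2f}")
print(cold_start_users)
```

For the all-empty row there is nothing to correlate against, so no similarity to any other user can be computed; that is the cold start problem in miniature.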

A Simple Item-Based Collaborative Recommendation System for Books

Let’s consider a subset of the Book-Crossing dataset. It consists of 15 books of different genres that were rated by several users. The subset is selected so that it is data-dense and all the books are evenly read and rated. The dataset is shown below.

**Rating dataset which is of shape (101,3)**

The user-item matrix is formed by merging and pivoting the ratings and books datasets (pandas.merge and pandas.pivot_table).
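A minimal sketch of that merge-and-pivot step is shown here. The two small tables and their column names (`user_id`, `isbn`, `rating`, `title`) are assumptions standing in for the real ratings and books data, which are not reproduced in this post.

```python
import pandas as pd

# Hypothetical stand-ins for the ratings and books tables (column names assumed).
ratings = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3],
        "isbn": ["111", "222", "111", "333", "222"],
        "rating": [8, 6, 9, 5, 7],
    }
)
books = pd.DataFrame(
    {
        "isbn": ["111", "222", "333"],
        "title": ["Book A", "Book B", "Book C"],
    }
)

# Attach the book title to every rating row.
merged = pd.merge(ratings, books, on="isbn")

# Rows are users, columns are books, cells are ratings (NaN where unrated).
user_item = merged.pivot_table(index="user_id", columns="title", values="rating")
print(user_item)
```

Note that `pivot_table` averages duplicate (user, book) entries by default, which is harmless when each user rates a book at most once.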

The item-based similarity is calculated using Pearson correlation. The correlation matrix among the books is shown below.

**Correlation values among the different books**
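With pandas, a book-to-book correlation matrix of this kind can be obtained with `DataFrame.corr`, which computes pairwise correlations between columns. The user-item matrix below is a small hypothetical one, not the 15-book dataset used in this post.

```python
import pandas as pd

# Hypothetical dense user-item matrix (rows: users, columns: books).
user_item = pd.DataFrame(
    {
        "Book A": [5, 4, 3, 2, 1],
        "Book B": [4, 5, 3, 1, 2],
        "Book C": [1, 2, 3, 4, 5],
    },
    index=["u1", "u2", "u3", "u4", "u5"],
)

# Each cell of the result is the Pearson correlation between two books,
# computed over the users who rated both.
book_corr = user_item.corr(method="pearson")
print(book_corr.round(2))
```

The diagonal is always 1 (a book compared with itself), and the matrix is symmetric, so only the cells above or below the diagonal carry information.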

From the correlation values (similarity values), similar items are identified and recommended to the users. The identified item pairs are
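As a rough sketch of how such pairs could be pulled out of a correlation matrix, the example below keeps every pair of books whose correlation exceeds a cut-off. The matrix values, the helper `similar_pairs`, and the threshold of 0.5 are all assumptions for illustration; they are not the values from the dataset in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical book-to-book correlation matrix (values are illustrative only).
titles = ["Book A", "Book B", "Book C", "Book D"]
book_corr = pd.DataFrame(
    [
        [1.0, 0.8, -0.2, 0.1],
        [0.8, 1.0, 0.0, 0.3],
        [-0.2, 0.0, 1.0, 0.7],
        [0.1, 0.3, 0.7, 1.0],
    ],
    index=titles,
    columns=titles,
)


def similar_pairs(corr, threshold=0.5):
    """List each unordered pair of items whose correlation exceeds `threshold`."""
    # Keep only the cells above the diagonal so that (A, B) is not repeated
    # as (B, A) and an item is never paired with itself.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack()
    # Masked cells are NaN and never pass the threshold comparison.
    pairs = pairs[pairs > threshold].sort_values(ascending=False)
    return [(a, b, float(r)) for (a, b), r in pairs.items()]


pairs = similar_pairs(book_corr)
print(pairs)
```

A user who rated one book of a pair highly can then be recommended the other book of that pair.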

Note

The above dataset is selected so that it has low data sparsity and equally rated books (free from cold start problems) for basic understanding. However, in the real world, things are more complicated, both in identifying similarity and in the accuracy of prediction.

Hopefully, the statistical concepts behind the implementation of a recommendation system are now clear.

CHEERS……!