Recommendation Systems for Novices, with a Sample Dataset
Hi! This blog post explains the basic statistical concepts behind implementing a recommendation system, using a simple use case.
Before we dive into the topic, let us understand what a recommendation system is and its impact in the entertainment and e-commerce domains.
What is a Recommendation System?
A recommendation system is an information filtering system that predicts the rating or feedback a user might give to a product. It uses statistical and ML algorithms, such as clustering and ensemble methods, to predict a specific user’s rating of a product based on that user’s past feedback on other products.
Significance of Recommendation System in E-Commerce
- According to a paper written by Netflix executives Carlos A. Gomez-Uribe and Neil Hunt, the video streaming service’s AI recommendation system saves the company around $1 billion each year.
- Netflix even ran a competition called the “Netflix Prize”, launched in 2006, to find the best collaborative filtering algorithm, with a grand prize of $1,000,000.
- The task was to improve on its existing algorithm, “Cinematch”. For further details, refer to this link (CineMatch).
- The winning model (from the BellKor’s Pragmatic Chaos team) improved on Cinematch’s prediction accuracy by just over 10%.
- Several other e-commerce sites, such as Amazon, have their own customized recommendation systems.
Types of Recommendation System
There are three main types of recommendation systems.
- Content-Based Recommendation System
- Collaborative Filtering/Recommendation System
- Hybrid Recommendation System
Content-Based Recommendation System
In this type of recommendation system, the user’s profile and the item’s profile are taken into account to predict the user’s ratings. For books, for example, the relevant attributes include genre, language, the author’s profile, keywords, and the user’s past choices.
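To make the idea concrete, here is a minimal sketch of content-based matching. It assumes each book is described by a small set of tags and compares books with Jaccard similarity, which is just one simple choice among many. The book names, tags, and the helper names `jaccard` and `recommend` are all made up for illustration.

```python
# Hypothetical item profiles: each book is described by a set of attribute tags
# (genre, language, keywords). These values are invented for the example.
item_profiles = {
    "Book A": {"fantasy", "english", "magic"},
    "Book B": {"fantasy", "english", "dragons"},
    "Book C": {"romance", "french", "drama"},
}


def jaccard(a, b):
    """Share of tags two profiles have in common (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked, profiles):
    """Rank unseen items by their best similarity to any item the user liked."""
    scores = {
        item: max(jaccard(tags, profiles[seen]) for seen in liked)
        for item, tags in profiles.items()
        if item not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)


# A user who liked "Book A" gets the closest remaining books first.
ranking = recommend(["Book A"], item_profiles)
print(ranking)
```

A real system would typically use richer profiles (for example, TF-IDF vectors over descriptions), but the principle of comparing item attributes with the user's past choices stays the same.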
Collaborative Filtering Recommendation System
Unlike a content-based system, a collaborative filtering system doesn’t consider user attributes (age category, gender, profile, etc.) or product attributes (genre, language, keywords, etc.). It identifies the relationship between users and products from the user-item matrix.
There are two types of Collaborative filtering systems.
- Model Based Collaborative System
- Memory Based Collaborative System
In a model-based system, ML techniques such as clustering and neural networks are used to predict the ratings. Memory-based systems are further classified as
- User-Based Recommendation System
- Item-Based Recommendation System
User Based Filtering
In user-based filtering, the similarity between users is computed from their profile history (past ratings and subscriptions) using centered cosine similarity or Pearson correlation. If users A and B are similar, then the products B subscribed to are recommended to A, and vice versa.
Pearson Correlation
Pearson correlation measures the strength of the linear relationship between two entities. Its value ranges from -1 to +1. Positive values imply a positive correlation between the entities, and negative values imply a negative correlation. Values close to 0 imply little or no linear correlation. The similarity between items ‘X’ and ‘Y’ is given by the Pearson coefficient r.
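For reference, the standard textbook definition of r can be written as below. The notation is introduced here for illustration: x_u and y_u are the ratings user u gave to items X and Y, U is the set of users who rated both items, and the bars denote mean ratings over U.

```latex
r_{XY} = \frac{\sum_{u \in U} (x_u - \bar{x})(y_u - \bar{y})}
              {\sqrt{\sum_{u \in U} (x_u - \bar{x})^2}\;
               \sqrt{\sum_{u \in U} (y_u - \bar{y})^2}}
```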
The relationship table for the different ranges of correlation values is given by
Cosine Similarity
The similarity between users can also be calculated with the cosine similarity function, which is based on the angle between the users’ rating vectors. Users are assumed to be similar when the angle between their vectors is small. Cosine similarity is the dot product of the two vectors divided by the product of their lengths.
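In symbols, using notation introduced here for illustration (A and B are the rating vectors of two users, and a_i, b_i are their ratings of item i), the standard definition is:

```latex
\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
             = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^2}\;\sqrt{\sum_{i} b_i^2}}
```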
The Graphical vector form is shown below.
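Numerically, the two similarity measures can be compared with a small sketch. It assumes three made-up users who rated the same four items, and it uses the fact that centered cosine similarity (cosine on mean-subtracted ratings) is the same quantity as the Pearson correlation. The function names and ratings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two rating vectors (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centered(v):
    """Subtract the user's mean rating from each of their ratings."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def pearson(a, b):
    """Centered cosine similarity, which equals the Pearson correlation.
    Assumes neither user gave the same rating to every item."""
    return cosine(centered(a), centered(b))


# Made-up ratings of the same four items by three users.
user_a = [5, 4, 1, 2]
user_b = [4, 5, 2, 1]  # tastes close to user A
user_c = [1, 2, 5, 4]  # tastes opposite to user A

cos_ab, cos_ac = cosine(user_a, user_b), cosine(user_a, user_c)
r_ab, r_ac = pearson(user_a, user_b), pearson(user_a, user_c)
print(cos_ab, cos_ac, r_ab, r_ac)
```

Plain cosine similarity stays positive even for users A and C, because all ratings are positive numbers. Centering the ratings first exposes that their tastes are actually opposite, which is one reason the centered variant is often preferred for ratings data.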
Prediction Of User’s Ratings
The rating that user ‘i’ would give to a product “P” is predicted as a similarity-weighted average of the ratings other users gave to that product: the sum of S(i,j) × R(j) over the other users “j”, divided by the sum of the similarities S(i,j).
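Written as a formula, this weighted average looks as follows. The hat notation and the neighbour set N(i, P), meaning the users similar to “i” who have rated “P”, are introduced here for illustration. When similarities can be negative, a common variant uses absolute values in the denominator.

```latex
\hat{R}(i, P) = \frac{\sum_{j \in N(i, P)} S(i, j)\, R(j)}{\sum_{j \in N(i, P)} S(i, j)}
```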
Here, S(i,j) is the similarity between users “i” and “j”, and R(j) is the rating given by user “j” to the product.
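The weighted average can be sketched in a few lines. This is a minimal illustration, not a production method: the similarities and ratings are made up, the function name `predict_rating` is hypothetical, and dropping neighbours with non-positive similarity is an assumption made here to keep the weights well behaved.

```python
def predict_rating(similarities, ratings):
    """Similarity-weighted average of the neighbours' ratings for one product.

    similarities: {user j: S(i, j)} for the target user i
    ratings:      {user j: R(j)}, each neighbour's rating of the product
    """
    # Assumption: only neighbours with positive similarity contribute.
    neighbours = [j for j in ratings if similarities.get(j, 0) > 0]
    weight_sum = sum(similarities[j] for j in neighbours)
    if weight_sum == 0:
        return None  # no usable neighbours, so no prediction can be made
    return sum(similarities[j] * ratings[j] for j in neighbours) / weight_sum


# Made-up similarities of users B, C, D to the target user, and their ratings.
similarities = {"B": 0.8, "C": 0.2, "D": -0.5}
ratings = {"B": 5, "C": 3, "D": 1}

predicted = predict_rating(similarities, ratings)
print(predicted)
```

Because user B is far more similar to the target user than user C, the prediction lands much closer to B's rating of 5 than to C's rating of 3.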
Item Based Filtering
Item-based collaborative filtering was introduced by Amazon in 1998. In this type of filtering, the similarity between products is calculated instead of the similarity between users. This is done by counting how many of the users who bought product “p” also bought product “q”.
Similar items are then recommended based on the user’s past feedback or ratings.
Here, the products “A” and “C” are similar. Since user ‘A’ liked product “A”, product “C” is recommended to user ‘A’.
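The "users who bought p also bought q" count can be sketched as follows. The purchase histories are made up, and `also_bought` is a hypothetical helper name. Real systems usually normalise these raw counts (for example, by item popularity), which this sketch leaves out.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: user -> set of products bought.
purchases = {
    "user_1": {"p", "q", "r"},
    "user_2": {"p", "q"},
    "user_3": {"q", "r"},
    "user_4": {"p", "q"},
}

# Count, for every pair of products, how many users bought both.
co_counts = Counter()
for basket in purchases.values():
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1


def also_bought(product, counts):
    """Products bought together with `product`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == product:
            related[b] += n
        elif b == product:
            related[a] += n
    return related.most_common()


print(also_bought("p", co_counts))
```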
Collaborative vs Content Based Recommendation System
Collaborative recommendation systems require only the user-product ratings, whereas content-based systems require product metadata and user information. Hence collaborative filtering is widely used on most e-commerce sites.
Pros of Collaborative Recommendation
- Domain knowledge is not required. Since it deals only with users’ ratings of products, it can be employed in any domain.
- Serendipity: the model can help users discover new interests. In isolation, the ML system may not know the user is interested in a given item, but the model might still recommend it because similar users are interested in that item.
Cons of Collaborative Recommendation
- Cold-start problem: since collaborative filtering works on the past activity of users and products, it cannot be used for a new user or a new product.
- Data sparsity problem: this is the condition where NULL values occupy more than 90% of the user-product matrix. It distorts similarity estimates: Pearson correlations computed from only a few shared ratings tend to be inflated, and cosine similarities are pulled down when missing ratings are treated as zeros.
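As a quick way to check for the sparsity problem, the sketch below measures the share of empty cells in a user-item matrix and counts how many users rated each pair of books. The matrix is a tiny made-up example, and the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Tiny made-up user-item matrix; NaN marks a missing (NULL) rating.
ratings = pd.DataFrame(
    {
        "Book 1": [5, np.nan, np.nan, 4],
        "Book 2": [np.nan, np.nan, 3, np.nan],
        "Book 3": [np.nan, 2, np.nan, np.nan],
    },
    index=["user_1", "user_2", "user_3", "user_4"],
)

# Fraction of cells with no rating.
sparsity = ratings.isna().to_numpy().mean()

# Number of users who rated both books of each pair (diagonal = ratings per book).
rated = ratings.notna().astype(int)
co_rated = rated.T @ rated

print(f"sparsity: {sparsity:.0%}")
print(co_rated)
```

When the off-diagonal counts are zero or very small, any similarity computed for that pair of books rests on almost no evidence.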
Simple Collaborative Recommendation System for Books (Item-Based)
Let’s consider a subset of the Book-Crossing dataset. This subset consists of 15 books of different genres, rated by several users. It was selected so that the data is dense and all the books were evenly read and rated. The dataset is shown below.
The user-item matrix is formed by merging and pivoting the ratings and books datasets (pandas.merge and pandas.pivot_table).
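The merge-and-pivot step might look like the sketch below. The two small tables are hypothetical stand-ins for the ratings and books data, and the column names `user_id`, `isbn`, `rating`, and `title` are assumptions, not the actual column names of the Book-Crossing files.

```python
import pandas as pd

# Hypothetical stand-ins for the ratings and books tables.
ratings = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3, 3],
        "isbn": ["111", "222", "111", "333", "222", "333"],
        "rating": [8, 6, 9, 5, 7, 4],
    }
)
books = pd.DataFrame(
    {
        "isbn": ["111", "222", "333"],
        "title": ["Book One", "Book Two", "Book Three"],
    }
)

# Attach book titles to each rating, then pivot to one row per user
# and one column per book; unrated books show up as NaN.
merged = pd.merge(ratings, books, on="isbn")
user_item = merged.pivot_table(index="user_id", columns="title", values="rating")

print(user_item)
```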
The item-based similarity is calculated using Pearson correlation. The correlation matrix among the books is shown below.
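With pandas, a book-to-book Pearson correlation matrix can be computed directly from a user-item matrix using `DataFrame.corr`. The sketch below uses a small made-up matrix rather than the Book-Crossing subset, and the `min_periods=3` setting (require at least three users in common) is an illustrative choice, not a value taken from the original analysis.

```python
import pandas as pd

# Made-up user-item matrix: rows are users, columns are books.
user_item = pd.DataFrame(
    {
        "Book One": [9, 8, 3, 2, 7],
        "Book Two": [8, 9, 2, 3, 6],
        "Book Three": [2, 3, 9, 8, 4],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="user_id"),
)

# Pairwise Pearson correlation between columns (books). Pairs with fewer
# than `min_periods` users in common get NaN instead of an unreliable value.
item_corr = user_item.corr(method="pearson", min_periods=3)

print(item_corr.round(2))
```

In this toy data, "Book One" and "Book Two" are rated alike by the same users and come out strongly positively correlated, while "Book Three" appeals to the opposite group and correlates negatively with both.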
From the correlation values (similarity values), similar items are identified and recommended to the users. The identified item pairs are
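Picking similar items out of a correlation matrix can be sketched as follows. The correlation values are invented for the example, `most_similar` is a hypothetical helper, and the 0.5 threshold is an arbitrary assumption that would need tuning on real data.

```python
import pandas as pd

# Invented correlation matrix for three books (not results from the dataset).
titles = ["Book One", "Book Two", "Book Three"]
item_corr = pd.DataFrame(
    [
        [1.00, 0.94, -0.97],
        [0.94, 1.00, -0.90],
        [-0.97, -0.90, 1.00],
    ],
    index=titles,
    columns=titles,
)


def most_similar(title, corr, threshold=0.5):
    """Books whose correlation with `title` is at least `threshold`, best first."""
    scores = corr[title].drop(labels=title)  # ignore the book's self-correlation
    return scores[scores >= threshold].sort_values(ascending=False)


# A user who liked "Book One" would be recommended the books listed here.
similar_to_one = most_similar("Book One", item_corr)
print(similar_to_one)
```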
Note
The above dataset was selected so that it has low data sparsity and evenly rated books (free from cold-start problems), to keep the example easy to follow. In the real world, however, things are more complicated, both in identifying similarity and in the accuracy of the predictions.
I hope the statistical concepts behind implementing a recommendation system are now clear.
Cheers!