Building a recommendation system is not a trivial task, and it comes with its own set of problems and challenges. This article is an effort to give readers deeper insight into building recommendation systems.
Before starting, let me thank Frank Kane, one of Amazon’s pioneers in the field, for his book Building Recommendation Systems using Machine Learning and AI and his amazing video lectures.
Table of Contents :
- Introduction and Recommendation Framework
- Evaluating Recommendation Systems
- Content Based Recommendations
- Neighborhood Based Collaborative Filtering
- User and Item Based Collaborative Filtering
- KNN Recommendations
- Matrix Factorisation
- Deep Learning - Introduction
- Restricted Boltzmann Machines
- Amazon DSSTNE and SageMaker
- Real-World Challenges and Solutions
Let’s start with a simple, informal picture of what Recommendation Systems are: they are used to recommend articles, music, people or restaurants to their users.
High-performing Recommendation Systems benefit both the users and the business organisation implementing them.
As stated by Wikipedia, a recommendation system is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item. Typical examples include recommending articles, music, people and search results.
How do Recommendation Systems Work?
It all starts with you! ( Your Data )
Data about you can be collected in two different ways:
1. EXPLICIT DATA
- It is collected with the help of feedback or a survey or ratings.
- Not everyone bothers to fill in forms or take surveys. Moreover, one person’s point of view may differ from another’s depending on social, economic, political and geographical differences.
2. IMPLICIT DATA
- Understanding users through their ‘click-data’ or ‘stream-data’
- Prone to fraud
- Amazon has so much implicit data that it doesn’t need better algorithms! Even simple algorithms work like a charm when we have a huge amount of data.
TAKE AWAY :
Recommendation Systems cannot produce good results until they have good data to work with. LOTS OF IT!! (As in the Amazon example above)
TOP-N RECOMMENDATION SYSTEM
Whenever we talk about a recommendation system, we are in fact talking about a “Top-N Recommendation System”: a system that uses its algorithms to predict, and then present to the user, the ‘N’ recommendations it considers best.
Note : ‘Individual Interests’ are generally normalized with z-scores, mean centering or other techniques to make them comparable. But real-world data is often too sparse to be normalized effectively.
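To make the normalization mentioned in the note concrete, here is a minimal sketch of mean centering and z-scoring one user's ratings. The function names and the two sample users are made up for illustration; they are not part of the framework discussed in this article.

```python
from statistics import mean, pstdev


def mean_center(ratings):
    """Subtract the user's average rating from each of their ratings."""
    values = list(ratings.values())
    avg = mean(values)
    return {item: r - avg for item, r in ratings.items()}


def z_score(ratings):
    """Mean-center, then divide by the user's (population) standard deviation."""
    values = list(ratings.values())
    avg = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # The user gave every item the same rating, so there is no spread to scale by.
        return {item: 0.0 for item in ratings}
    return {item: (r - avg) / sd for item, r in ratings.items()}


# Hypothetical users: one rates generously, the other harshly.
generous = {"A": 5, "B": 4, "C": 3}
harsh = {"A": 3, "B": 2, "C": 1}

print(mean_center(generous))
print(mean_center(harsh))
```

After mean centering, both users end up with the same scores, which is exactly what makes their tastes comparable. With only one or two ratings per user, the average and the spread are unreliable, which is the sparsity problem the note warns about.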
STEP 1: When our current user buys an ‘item’, candidate recommendations are generated from the actual historical data of other people who bought the same ‘item’: the ‘items’ those people went on to buy are the ones predicted for the current user.
STEP 2: Candidate items are scored (an item that turns up more than once among the candidates ranks higher) and sorted.
STEP 3: Items the user has already seen and offensive ones are removed, and the list is cut down to ‘N’ items.
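The three steps can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the purchase history is a small in-memory dictionary, ranking is by how many shoppers bought an item after the current one, and all names (top_n_recommendations, purchase_history, blocked) are hypothetical.

```python
from collections import Counter


def top_n_recommendations(current_item, purchase_history, already_seen, blocked, n=3):
    """Return up to n item ids for a user who just bought current_item.

    purchase_history maps each user to the ordered list of items they bought.
    """
    # STEP 1: candidate generation. Collect what other shoppers bought
    # after they bought the same item, counting each item once per shopper.
    candidates = Counter()
    for items in purchase_history.values():
        if current_item in items:
            position = items.index(current_item)
            candidates.update(set(items[position + 1:]))

    # STEP 2: ranking. Items that turn up for more shoppers come first;
    # ties are broken alphabetically so the output is deterministic.
    ranked = sorted(candidates.items(), key=lambda pair: (-pair[1], pair[0]))

    # STEP 3: filtering. Drop items the user has already seen and blocked items,
    # then keep only the first n.
    filtered = [
        item for item, _count in ranked
        if item not in already_seen and item not in blocked and item != current_item
    ]
    return filtered[:n]


# Hypothetical purchase histories.
history = {
    "u1": ["phone", "case", "charger"],
    "u2": ["phone", "case"],
    "u3": ["laptop", "phone", "charger", "spam"],
    "u4": ["case", "phone", "earbuds"],
}

print(top_n_recommendations("phone", history, already_seen={"charger"}, blocked={"spam"}, n=2))
```

A real system would replace the counting in STEP 1 and STEP 2 with a learned similarity or rating prediction, but the generate, rank, filter shape stays the same.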
The above architecture is just one of the many that can be built for our recommendation system. For example, another architecture can be as shown below.
EVALUATION OF RECOMMENDATION SYSTEM
Just defining what makes a good recommendation system is a HUGE PROBLEM that’s really central to the field.
Building Recommendation Systems is ‘as much art as science’, because it is difficult to measure how good they actually are (especially when developing algorithms offline).
BUILDING A RECOMMENDATION SYSTEM FRAMEWORK
We will not go into much detail on how to build a recommendation system framework, but will give you a simple idea of how to go about it. Python code for the recommendation system is also available. (Click Here)
- surpriselib is a good library to work on Recommendation systems
- Even though surpriselib is a good package, we will build on top of it so that our algorithms can attain more flexibility.
Note: All the above algorithms inherit from the base class AlgoBase, which has basic methods like fit and test. Only a few algorithms are listed here but many are available.
CREATING CUSTOM ALGORITHM (Compatible with AlgoBase)
It is ‘surprise-ingly’ easy! Simply create a new class that inherits from AlgoBase. Remember that it is ideal but not necessary to evaluate accuracy offline.
The code below shows an example of how you can write a simple estimate function:
It is ideal to use a single class for all operations we need to do for evaluation as shown below:
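Here is one hedged way such a class could look. The class name RecommenderMetrics and the choice of metrics (RMSE, MAE and hit rate) are assumptions made for this sketch; the class in the repository may be organised differently.

```python
from math import sqrt


class RecommenderMetrics:
    """Hypothetical helper that keeps offline evaluation metrics in one place."""

    @staticmethod
    def rmse(predictions):
        """Root mean squared error over (true_rating, estimated_rating) pairs."""
        squared = [(true - est) ** 2 for true, est in predictions]
        return sqrt(sum(squared) / len(squared))

    @staticmethod
    def mae(predictions):
        """Mean absolute error over (true_rating, estimated_rating) pairs."""
        absolute = [abs(true - est) for true, est in predictions]
        return sum(absolute) / len(absolute)

    @staticmethod
    def hit_rate(top_n, left_out):
        """Fraction of users whose held-out item shows up in their top-N list.

        top_n maps user -> list of recommended items,
        left_out maps user -> the single item held out for that user.
        """
        hits = sum(1 for user, item in left_out.items() if item in top_n.get(user, []))
        return hits / len(left_out)


# Hypothetical predictions and recommendations.
pairs = [(4, 3), (2, 4)]
print(RecommenderMetrics.rmse(pairs))
print(RecommenderMetrics.mae(pairs))
print(RecommenderMetrics.hit_rate({"u1": ["a", "b"], "u2": ["c"]}, {"u1": "b", "u2": "z"}))
```

Keeping the metrics together like this means every algorithm is scored by the same code, which makes the comparison between algorithms fair.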
The implementation of the actual framework can be viewed in the GitHub repository.