Table of Contents:
- Introduction and Recommendation Framework
- Evaluating Recommendation Systems
- Content Based Recommendations
- Neighborhood Based Collaborative Filtering
- User and Item Based Collaborative Filtering
- KNN Recommendations
- Matrix Factorisation
- Deep Learning — Introduction
- Restricted Boltzmann Machines
- Amazon DSSTNE and SageMaker
- Real-World Challenges and Solutions
Content-based recommendation is the simplest of all approaches. The main idea is to recommend items based on their properties instead of using aggregate user behavior.
For example, if Bob likes action movies, recommending other movies in the action genre to Bob is content-based recommendation.
Cosine Similarity Metric
How will we know if two movies have similar content?
We can measure how similar two items are with metrics such as cosine similarity.
- Cosine similarity is well suited to content-based recommendations
- It is used to find the similarity of any given pair of movies
Let us understand how cosine similarity works. We can represent a movie-genre matrix in two-dimensional space as shown below. The x-axis represents the discrete value for the comedy attribute and the y-axis the adventure attribute. We put '1' for the comedy attribute if the movie is full of comedy and '0' if it isn't that funny; the same is done for the adventure attribute. Each movie, along with its genres, can then be plotted in this two-dimensional space, and the resulting matrix can be used to find how similar the movies are.
Here we notice that the geometric definition of cosine similarity, the cosine of the angle between two vectors, is a good start, but measuring angles directly from the kind of data we have is difficult.
We can solve this problem with the same approach but a completely different formula: the component-wise form below can be used instead.
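This is the standard component-wise definition: cosine similarity reduces to sums over the attribute values themselves, so no angle ever has to be measured. For two attribute vectors $x$ and $y$:

```latex
\cos(\theta) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
             = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}
```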
One thing to note is that as the number of attributes we use to compute similarity grows or shrinks, the dimensions of our matrix grow or shrink with it.
Cosine similarity can be applied to genres very easily; the code snippet below does exactly that.
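As a minimal sketch (not the course's original snippet), assuming each movie's genres are one-hot encoded as a plain Python list:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two genre vectors over the product of their norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical one-hot vectors: [comedy, adventure]
funny_adventure = [1, 1]
pure_adventure = [0, 1]
print(cosine_similarity(funny_adventure, pure_adventure))  # ~0.707
```

Identical genre vectors score 1.0, and movies sharing no genres score 0.0.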
Other methods (or metrics) are also available, which we will discuss in detail in the coming sections. Some of them are:
- Euclidean distance: measures the actual straight-line distance between two vectors.
- Pearson correlation: similar to the cosine metric, but the vectors are mean-centred first.
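For reference, minimal sketches of those two metrics, under the same assumption of plain Python attribute vectors:

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two attribute vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_correlation(a, b):
    # Like cosine similarity, but each vector is mean-centred first
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    dot = sum(x * y for x, y in zip(ca, cb))
    norms = math.sqrt(sum(x * x for x in ca)) * math.sqrt(sum(y * y for y in cb))
    return dot / norms if norms else 0.0
```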
Similarities Based on Release Years Alone?
Note: You may do some string wrangling to extract the release years from the data set and recommend with their help.
Tricks like this are part of the art of designing a recommendation system: everything depends on the nature of the data you have. Again, we have to decide how far apart two movies' release years must be before we consider them substantially different.
Choose a mathematical function that smoothly scales the year gap into the range zero to one and implement it in code. Here, the exponential function does the trick.
Python code for recommending based on release years is shown below.
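A minimal sketch of that idea (the function name and decay constant are illustrative, not from the course code): exponential decay maps the gap between release years into the (0, 1] range.

```python
import math

def year_similarity(year_a, year_b, decay=10.0):
    # Exponential decay: same year -> 1.0, and similarity falls
    # smoothly toward 0 as the gap grows. 'decay' controls how fast
    # it falls off and should be tuned to your data set.
    return math.exp(-abs(year_a - year_b) / decay)

print(year_similarity(1995, 1995))  # 1.0
print(year_similarity(1995, 2015))  # ~0.135
```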
Note: Test as many functions as possible to get the best recommendations.
HOW DO WE TURN THESE SIMILARITIES (GENRE, RELEASE YEAR) BETWEEN MOVIES BASED ON THEIR ATTRIBUTES INTO ACTUAL RATING PREDICTIONS?
We have to remember that our recommendation algorithms in surpriselib have just ONE job: predict the rating a given user would give a given movie.
K-Nearest Neighbours (KNN) algorithm: a fancy name for a simple idea. It selects the 'N' things closest to the thing you are interested in.
- Step 1: Compute similarity scores between the movie and all the movies the user has rated
- Step 2: Sort, and choose the top 40 nearest neighbours of the movie
- Step 3: Take the similarity-weighted average of their ratings as the predicted rating
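The three steps above can be sketched as follows; this is a hypothetical helper, not the surpriselib KNN implementation, and `similarity` is whichever metric you choose (genre, release year, or a blend):

```python
import heapq

def predict_rating(target_movie, user_ratings, similarity, k=40):
    # user_ratings: dict mapping movie id -> the user's rating
    # Step 1: similarity score between the target and each rated movie
    scored = [(similarity(target_movie, movie), rating)
              for movie, rating in user_ratings.items()]
    # Step 2: keep only the top-k nearest neighbours
    neighbours = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    # Step 3: similarity-weighted average of the neighbours' ratings
    total_sim = sum(sim for sim, _ in neighbours)
    if total_sim == 0:
        return 0.0  # no usable neighbours to predict from
    return sum(sim * rating for sim, rating in neighbours) / total_sim
```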
Note: For top-N recommendations, only the relative order of the predicted ratings matters, not the predicted ratings themselves. If you really are striving to fine-tune prediction accuracy, there are ways to normalise predicted ratings into the range we want; log-quantile normalisation, for example, does the trick. But in the real world, hardly anyone cares about the raw values: only the top-N results matter!
Mise en Scène (Under Research)
The main idea of this approach is to extract properties from the film footage itself, which are then quantified and analyzed for recommendations.
- Does not favor accuracy, but increases diversity
- Increased diversity may also lead to seemingly random recommendations
TIPS TO MAKE BETTER MODELS
- Use popularity rankings as a tie-breaker
- Use “Release Year”
- Try new ideas!
- Always test with online A/B tests!