Using Machine Learning to build recommendation systems.
Recommendations are everywhere on the Internet, from social media to shopping sites that tailor the products, posts, songs, and movies they show to each user's tastes. Your news feed isn't a newspaper; it is curated to show only news you may like. About 80% of the time, a Netflix user chooses to watch something recommended by its algorithms.
This post will describe how a recommendation system works.
Let's try to understand how the k-nearest neighbors (KNN) recommendation algorithm works.
1. Similarity is the key.
Most recommendation systems start by finding similarity: user-to-user or user-to-product. This mirrors real life. When you want movie suggestions, you ask a friend who shares your taste in movies; you would not ask a friend who mostly watches rom-coms to recommend a drama. In the world of algorithms, we likewise look for users with similar taste within a large pool of users. Given a movie-ratings dataset, similar users are those who tend to give similar ratings to the same kinds of movies. There are many mathematical measures of similarity: Euclidean distance, the Pearson correlation coefficient, Hamming distance, and so on. Which one to pick depends on the problem you are solving and the type of data you have; for binary data, for example, Hamming distance is a natural choice.
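To make these measures concrete, here is a minimal sketch of all three, assuming two hypothetical users (the names and rating vectors are made up for illustration) who rated the same five movies on a 1-5 scale:

```python
import math

# Hypothetical rating vectors for two users over the same five movies.
alice = [5, 3, 4, 4, 2]
bob = [4, 3, 5, 3, 1]

def euclidean(a, b):
    # Smaller distance means more similar users.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    # +1 means perfectly correlated tastes, -1 means opposite tastes.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (std_a * std_b)

def hamming(a, b):
    # For binary data: the number of positions where two users disagree.
    return sum(x != y for x, y in zip(a, b))

print(euclidean(alice, bob))            # distance on the rating vectors
print(round(pearson(alice, bob), 3))    # correlation of the two users
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))  # binary example
```

Note that Euclidean and Hamming are distances (lower is more similar), while Pearson is a correlation (higher is more similar), so they sort candidate neighbors in opposite directions.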
2. Finding nearest neighbors.
Once we know how similar every other user is to the active user (the user we need recommendations for), we sort those users by similarity and keep the k nearest neighbors. K is just an integer, and its choice is critical: a smaller value of k lets noise have a higher influence, while a higher value makes the computation more expensive. Sometimes it is best to run through each plausible value of k and decide for yourself.
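The sorting step can be sketched as follows, assuming similarity scores have already been computed for each user (the names and scores below are made up for illustration):

```python
# Hypothetical precomputed similarity of each user to the active user.
similarity_to_active = {
    "bob": 0.80,
    "carol": 0.95,
    "dave": 0.10,
    "erin": 0.60,
}

def k_nearest(similarities, k):
    # Sort users by similarity, highest first, and keep the first k.
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:k]

print(k_nearest(similarity_to_active, 2))
```

With k = 2 this returns carol and bob, the two users most similar to the active user.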
3. Calculate average rating of nearest neighbors for unrated items.
One could simply find similar users, look up the articles they rated that the active user hasn't, and recommend those. But even though we found similar users, this brings randomness into the recommendations, and our goal is to minimize that randomness for better results. Since this approach imposes no order on the recommended articles, an article rated by most of the nearest neighbors might still fall to the bottom of the list. Hence a more quantitative approach can be adopted. To keep it simple, we take the arithmetic mean (a plain average) of the neighbors' ratings for each item the active user hasn't rated; one could also take a weighted average. This gives a mean rating for every such article, and sorting those means in descending order yields the top-N recommendations.
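The averaging and sorting steps above can be sketched like this, assuming a small hypothetical table where each nearest neighbor maps items to ratings and the active user has rated none of these items:

```python
# Hypothetical ratings given by the k nearest neighbors to items
# the active user has not rated yet.
neighbor_ratings = {
    "bob": {"item_a": 4, "item_b": 2},
    "carol": {"item_a": 5, "item_c": 3},
    "erin": {"item_b": 4, "item_c": 5},
}

def recommend(neighbors, n):
    # Average each item's rating over the neighbors who rated it,
    # then sort descending to get the top-N recommendations.
    totals, counts = {}, {}
    for ratings in neighbors.values():
        for item, rating in ratings.items():
            totals[item] = totals.get(item, 0) + rating
            counts[item] = counts.get(item, 0) + 1
    means = {item: totals[item] / counts[item] for item in totals}
    return sorted(means, key=means.get, reverse=True)[:n]

print(recommend(neighbor_ratings, 2))
```

Here item_a averages 4.5, item_c averages 4.0, and item_b averages 3.0, so the top-2 list is item_a followed by item_c. A weighted average would multiply each rating by that neighbor's similarity score before summing, so closer neighbors count for more.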