Recommendation Systems in Machine Learning
How does YouTube know which videos you’ll want to watch? How does Netflix always seem to suggest something that matches your interests? Let’s find out!
What Is a Recommendation System?
According to Wikipedia:
Recommender systems or recommendation systems (sometimes replacing “system” with a synonym such as a platform or an engine) are a subclass of information filtering systems that seeks to predict the ‘rating’ or ‘preference’ that the user would give to an item.
Recommender systems are among the most widely used algorithms in data science today. They are applied across sectors ranging from entertainment to e-commerce: e-commerce companies and content providers use them to suggest related products or content to users. They have proven instrumental in raising company revenue and customer satisfaction. Machine learning enthusiasts should therefore get a solid grasp of how they work and of the related concepts.
A recommender system deals with a large volume of information by filtering out the pieces most relevant to a user, based on the data that user provides and other signals about their preferences and interests. It matches users to items by estimating similarities between users and between items. Items are ranked by relevance, and the most relevant ones are shown to the user. Relevance is something the recommender system must determine itself, mainly from historical data. (If you’ve recently watched YouTube videos about KPop 🕺, YouTube is going to start showing you a lot of KPop videos with similar titles and themes!)
Both users and service providers benefit from these systems, which also improve the quality and speed of decision-making.
Why Use a Recommendation System?
1. Helps users find items of interest.
2. Helps item providers deliver their items to the right users.
3. Identifies the products that are most relevant to each user.
4. Personalizes content.
5. Helps websites improve user engagement.
Recommender systems are generally divided into two main categories: collaborative filtering and content-based systems.
Collaborative Filtering Systems
Collaborative filtering is the most sought-after, most widely implemented, and most mature of the recommendation technologies available today. The key idea is that these methods are collaborative, i.e. they leverage other users’ ratings. If you are trying to guess whether you will like a certain movie, you might ask people with similar taste what they thought of it. You might also ask those people what other movies they liked and gather a list of recommendations.
It aggregates ratings or recommendations of objects, recognizes commonalities between the users based on their ratings, and generates new recommendations based on inter-user comparisons.
There are two ways to approach collaborative filtering:
- Memory-based or neighborhood-based methods are a simple way of harnessing the ratings of other users to predict ratings. They come in two varieties: user-based and item-based (see the sketch after this list).
- Model-based methods take collaborative filtering a step further and use machine learning and probabilistic models such as decision trees, latent-factor models, and neural networks as classification black boxes.
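To make the neighborhood idea concrete, here is a minimal user-based sketch. The tiny rating matrix, the helper names, and the convention that 0 means “not rated” are all illustrative assumptions, not a production recipe.

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items); 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def predict_user_based(ratings, user, item):
    """Predict a rating as a similarity-weighted average of other users' ratings."""
    sims, vals = [], []
    for other in range(ratings.shape[0]):
        if other != user and ratings[other, item] > 0:
            sims.append(cosine_sim(ratings[user], ratings[other]))
            vals.append(ratings[other, item])
    if not sims:
        return 0.0
    return np.dot(sims, vals) / (np.sum(sims) + 1e-9)

# User 0's predicted rating for item 2, driven mostly by the most similar user.
print(round(predict_user_based(ratings, user=0, item=2), 2))
```

An item-based variant does the same thing with the matrix transposed: similarities are computed between item columns instead of user rows.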
User-to-User or Item-to-Item
The system collects behavioral information about how you interact with items: what you rate, how you rate it, what you view, and what you purchase. One typical approach is to build a user rating matrix containing each user’s ratings of, say, movies; we then find similarities in that matrix and make recommendations from them. The similarity need not be between users’ tastes only; it can also be computed between different items. The more information we have about users and items, the better the recommendations the system can give.
Suppose there are three users, Tim, Amy, and John, who buy desserts. In user-based collaborative filtering, the system finds users with a similar taste in the products they purchase; similarity between users is computed from their purchase behavior. Tim and John are similar because they have purchased similar products.
In item-based collaborative filtering, the system looks at items that are similar to the items the user has already bought: for the prediction, the similarity is computed between items rather than between users. Because Tim and Amy both purchased a sundae 🍧 and an ice cream 🍦, those two items are treated as similar, so a user who buys one of them can be recommended the other.
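Here is a small sketch of that item-item comparison for the dessert example. The 0/1 purchase matrix and the product names are hypothetical stand-ins for real purchase logs.

```python
import numpy as np

# Hypothetical 0/1 purchase matrix:
# rows = Tim, Amy, John; columns = sundae, ice cream, cake.
purchases = np.array([
    [1, 1, 1],   # Tim
    [1, 1, 0],   # Amy
    [0, 1, 1],   # John
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Item-item similarity: compare columns (items) instead of rows (users).
n_items = purchases.shape[1]
item_sim = np.array([[cosine(purchases[:, i], purchases[:, j])
                      for j in range(n_items)] for i in range(n_items)])
print(np.round(item_sim, 2))  # sundae and ice cream come out as the most similar pair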
Matrix factorization
Singular value decomposition, also known as the SVD algorithm, is used as a collaborative filtering method in recommendation systems. SVD is a matrix factorization technique that reduces the number of features in the data by lowering the dimensionality from N to K, where K < N. It decomposes the data into a small set of independent components.
In a nutshell, from a rating matrix we learn the latent factors that users implicitly use when rating movies. These latent factors might correspond to the genre of the movie, the year of release, or the actors in it, but we never define them explicitly. We do not know what the learned latent factors represent unless we examine the results manually; we let the computer learn them from the data, just like any other ML method.
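As a quick illustration, here is a low-rank SVD approximation of a toy, fully observed rating matrix. The matrix values and the choice of K = 2 are made up purely for demonstration.

```python
import numpy as np

# Toy user-movie rating matrix (rows = users, columns = movies).
R = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
], dtype=float)

# Full SVD: R = U * diag(s) * Vt
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the top-K singular values to get a low-rank approximation.
K = 2
R_approx = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]
print(np.round(R_approx, 2))  # close to R, but described by only K latent factors
```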
Matrix factorization is all about finding two matrices whose product approximates the original matrix. Each item is represented by a vector qi and each user by a vector pu, such that their dot product qi · pu is the predicted rating.
qi and pu are learned so that the squared difference between this dot product and the original ratings in the user-item matrix is as small as possible.
With SVD in mind, any user rating matrix can be decomposed into a matrix P containing the user latent factors and a matrix Q containing the item latent factors. A user’s rating for a movie can then be reconstructed by multiplying the corresponding latent factors of that user and that movie. But how can we find these matrices when the user rating matrix has missing entries? Users rate or interact with only a handful of movies. To solve that, we apply SGD to learn the factor matrices P and Q.
Minimizing with Stochastic Gradient Descent (SGD): SGD starts from initial values of the parameters we are trying to learn and then iteratively updates them to shrink the error between the actual and predicted ratings, nudging each parameter by a small correction at every step. The size of that correction is controlled by the learning rate.
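Below is a minimal sketch of that loop: learn P and Q from the observed ratings only, so the missing cells can be filled in afterwards. The toy matrix, the number of factors K, the learning rate, and the regularization strength are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix; 0 marks a missing rating that we do not train on.
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 1, 1],
    [1, 1, 0, 5],
    [1, 0, 5, 4],
], dtype=float)

n_users, n_items = R.shape
K = 2                                          # number of latent factors
P = 0.1 * rng.standard_normal((n_users, K))    # user latent factors
Q = 0.1 * rng.standard_normal((n_items, K))    # item latent factors
lr, reg = 0.01, 0.02                           # learning rate and regularization (illustrative)

observed = [(u, i) for u in range(n_users) for i in range(n_items) if R[u, i] > 0]

for epoch in range(2000):
    for u, i in observed:
        err = R[u, i] - P[u] @ Q[i]            # prediction error for this known rating
        pu = P[u].copy()                       # keep the old user factors for the item update
        P[u] += lr * (err * Q[i] - reg * P[u]) # gradient step on the user factors
        Q[i] += lr * (err * pu - reg * Q[i])   # gradient step on the item factors

print(np.round(P @ Q.T, 2))  # reconstructed matrix; the missing cells are now predictions
```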
Content-based filtering
Content-based filtering is another type of recommendation system, and it works on the principle of similar content. If a user is watching a movie, the system looks for other movies with similar content or of the same genre. In contrast to collaborative filtering, content-based approaches use additional information about the user and/or the items to make predictions: they rely on descriptive keywords associated with each item. This is quite useful because the only rating history we need to make predictions is that of the target user. We can extract features from a movie description and match them against what the user prefers.
For example, Mike has given a good rating to a movie like Fast and Furious, which is tagged with the action and crime genres, and a bad rating to Avatar, which is tagged with the adventure and fantasy genres. Information about a movie can be extracted from its title, its description, and material provided by the studio. Hence movies like James Bond and Mission Impossible will be recommended to Mike, while movies like Jumanji and Dolittle will be recommended to Kate, who prefers the fantasy genre.
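A minimal sketch of that idea follows. The one-hot genre tags, the rating values, and the way the profile is built (a rating-weighted average) are all simplifying assumptions made for the example.

```python
import numpy as np

# Hypothetical one-hot genre features: [action, crime, adventure, fantasy]
movies = {
    "Fast and Furious":   np.array([1, 1, 0, 0], dtype=float),
    "Avatar":             np.array([0, 0, 1, 1], dtype=float),
    "Mission Impossible": np.array([1, 1, 0, 0], dtype=float),
    "Jumanji":            np.array([0, 0, 1, 1], dtype=float),
}

# Build Mike's profile as a rating-weighted average of the movies he has rated
# (5 = liked Fast and Furious, 1 = disliked Avatar).
ratings = {"Fast and Furious": 5, "Avatar": 1}
profile = sum(r * movies[m] for m, r in ratings.items()) / sum(ratings.values())

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Score unseen movies against the profile; a higher score means a better match.
for title in ("Mission Impossible", "Jumanji"):
    print(title, round(cosine(profile, movies[title]), 2))
```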
What concepts are used in content-based filtering?
The concepts of Term Frequency (TF) and Inverse Document Frequency (IDF) are used in information retrieval systems and also content-based filtering mechanisms (such as a content-based recommender). They are used to determine the relative importance of a document/article/news item/movie etc.
TF is simply the frequency of a word in a document. IDF is the inverse of that word’s document frequency across the whole corpus. Why use TF-IDF? Suppose we search for “the rise of analytics” on Google. The word “the” will certainly occur more frequently than “analytics”, but from the point of view of the search query, “analytics” is the more important term. In such cases, TF-IDF weighting dampens the effect of high-frequency words when determining the importance of an item (document).
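For instance, here is a small sketch using scikit-learn’s TfidfVectorizer on three made-up item descriptions; the texts are invented purely to show how common words get down-weighted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up item descriptions.
docs = [
    "the rise of analytics",
    "the fall of the empire",
    "the art of the deal",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs).toarray()

# Weights for the first document: "the" and "of" appear in every document,
# so their IDF (and final weight) is lower than that of "rise" and "analytics".
for word, weight in zip(vectorizer.get_feature_names_out(), tfidf[0].round(2)):
    if weight > 0:
        print(word, weight)
```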
There are two main sources of information for content-based recommender systems. The first is textual item descriptions, usually posted by the manufacturer. They describe attributes relating to the content of the item. The second is the user profile. It contains ratings from the user, either explicit or implicit.
Different scenarios call for different similarity metrics. For numeric data, Euclidean distance is commonly used; for textual data, cosine similarity; and for categorical or set-valued data, Jaccard similarity.
Cosine Similarity: the cosine of the angle between the two item vectors A and B is computed as the similarity. The closer the vectors, the smaller the angle and the larger the cosine.
Jaccard Similarity: the number of users who have rated both items A and B, divided by the number of users who have rated either A or B, gives the similarity between the two items.
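The three metrics fit in a few lines; the sample vectors and user IDs below are just placeholders to show the kinds of inputs each one expects.

```python
import numpy as np

def euclidean(a, b):
    """Distance between numeric feature vectors (smaller = more similar)."""
    return np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))

def cosine(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def jaccard(a, b):
    """Size of the intersection over the size of the union of two sets,
    e.g. the sets of users who rated each item."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

print(euclidean([1, 2, 3], [2, 2, 3]))                   # 1.0
print(round(cosine([1, 1, 0], [1, 1, 1]), 2))            # 0.82
print(jaccard({"u1", "u2", "u3"}, {"u2", "u3", "u4"}))   # 0.5
```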
Merits
- It does not require much data about other users.
- Item data alone is enough to start giving recommendations.
- A content-based recommender engine does not depend on other users’ data, so even when a brand-new user arrives we can make recommendations, as long as we have enough information about that user to build a profile.
- Unlike collaborative filtering, it can suggest new items before they have been rated by a substantial number of users.
Demerits
- Item data must be available in good volume.
- Features must be available to compute the similarity.
- Recommendations lack diversity.
- It cannot solve the cold-start problem for brand-new users who have no profile yet.
Conclusion
On the Internet, where the number of choices is overwhelming, there is a need to filter, prioritize, and efficiently deliver relevant information in order to alleviate information overload, a real problem for many Internet users. Recommender systems address this by searching through a large volume of dynamically generated information and providing users with personalized content and services.
Recommender systems already drive many aspects of our daily lives. They open new opportunities for retrieving personalized information on the Internet, help alleviate the very common problem of information overload, and give users access to products and services they would not otherwise easily discover.
Thanks for reading! If you liked it, do leave a clap 👏
Also, If you are on LinkedIn, just click my profile page below and shoot me a request!!
Looking forward to connecting!!