Recommendation Systems in Machine Learning
Originally published on ZeoLearn Blog.
What is that?
Today, the Internet is growing very rapidly in both volume and structure. Because websites have complex structures and hold large amounts of information, users often get lost in this complex and messy environment. So personalizing and simplifying the web is more important than ever, both for users and for the owners of e-commerce websites.
One of the most important aspects of web personalization is the recommendation system. Amid a huge amount of information or products, the system suggests items that the user likes or needs. In general, recommendation systems are systems and tools that suggest items to users. These suggestions can be products, pages, news, other users or even advertisements. The more robust and comprehensive the algorithms and models a recommendation system uses, the more satisfied users are with the recommended items and with the site in general, and in the end e-commerce sites sell more products.
Who is using them?
Today, recommendation systems play a very important role on sites with a lot of traffic, users or products in fields such as entertainment, content, e-commerce, advertising and social networks: Netflix, YouTube, Amazon, Last.fm, IMDb, Yahoo, Spotify and so on.
These sites also place recommendations in specific, prominent sections of the page to draw the user's attention, which shows how important these systems are.
These systems are based on Machine Learning and Data Mining, and all the approaches have their roots in information retrieval and information filtering research. Most methods use techniques from these two fields, such as Bayesian classifiers, cluster analysis, decision trees and artificial neural networks, to estimate the probability that a user will like an item.
The inputs to these systems are item data, user profiles and, most importantly, each user's history of interactions with items, which is referred to as usage. The system's output is a collection of products and items that the user is most likely to like or buy.
Generally, recommendation systems work in two basic ways: Content-based and Collaborative Filtering.
In Content-based methods, the system compares the content and characteristics of each item with the user's characteristics and information. For example, the system first examines the features of the items. Next, it infers the user's interests and needs from information the user has provided directly or indirectly, such as purchasing, visiting or rating certain products. Finally, based on a similarity or demand-side model, it offers items to the user.
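A minimal sketch of these content-based steps, under stated assumptions: items are described by hypothetical binary genre vectors, the user profile is the rating-weighted average of the features of the items the user rated, and cosine similarity stands in for the "similarity model". The item names and function names are illustrative only.

```python
import math

# Hypothetical catalogue; assumed feature order: [action, comedy, drama]
ITEM_FEATURES = {
    "movie_a": [1, 0, 0],
    "movie_b": [1, 0, 1],
    "movie_c": [0, 1, 0],
    "movie_d": [0, 1, 1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_profile(user_ratings, item_features):
    """User profile: rating-weighted average of the features of the rated items."""
    n_features = len(next(iter(item_features.values())))
    profile = [0.0] * n_features
    total = sum(user_ratings.values())
    for item, rating in user_ratings.items():
        for f, value in enumerate(item_features[item]):
            profile[f] += rating * value
    return [p / total for p in profile] if total else profile


def recommend_content_based(user_ratings, item_features, top_n=2):
    """Rank items the user has not rated by similarity to the user's profile."""
    profile = build_profile(user_ratings, item_features)
    scores = {
        item: cosine(profile, features)
        for item, features in item_features.items()
        if item not in user_ratings
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend_content_based({"movie_a": 5}, ITEM_FEATURES, top_n=1))
```

With a single rating of 5 for movie_a (pure action), movie_b ranks first because it is the only unseen item that shares a feature with the profile.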
In the Collaborative Filtering method, the system first finds users who are similar to the current user, and then offers the current user the items those similar users have liked. In fact, it predicts what users will like based on their similarity to other users. How similarity is defined and how similar users are found varies with the algorithm and method used.
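The idea can be sketched in a few lines of Python. This is a simplified illustration rather than a specific published algorithm: the ratings are hypothetical, cosine similarity over co-rated items is our choice of similarity measure, and `like_threshold` (a rating of 4 or more counts as "liked") is an assumed parameter.

```python
import math

# Hypothetical user -> {item: rating} data.
RATINGS = {
    "alice": {"i1": 5, "i2": 4, "i3": 1},
    "bob": {"i1": 5, "i2": 5, "i4": 4},
    "carol": {"i1": 1, "i3": 5, "i5": 4},
}


def user_similarity(a, b):
    """Cosine similarity computed only over the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend_user_based(target, ratings, k=1, like_threshold=4):
    """Suggest items the k most similar users liked and the target has not rated."""
    others = [u for u in ratings if u != target]
    neighbours = sorted(
        others,
        key=lambda u: user_similarity(ratings[target], ratings[u]),
        reverse=True,
    )[:k]
    suggestions = set()
    for u in neighbours:
        for item, r in ratings[u].items():
            if r >= like_threshold and item not in ratings[target]:
                suggestions.add(item)
    return sorted(suggestions)


print(recommend_user_based("alice", RATINGS))
```

Bob rated i1 and i2 much like Alice did, so the item he liked that Alice has not seen (i4) is suggested. Cosine over co-rated items is crude: two users who share a single rated item always get similarity 1.0, which is why practical systems often add mean-centering or a minimum-overlap rule.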
“people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past.”
Collaborative Filtering as the popular way!
One of the most important features of CF is that we do not need to analyze the features of the items. In fact, the system does not need to know what kind of item it is dealing with; it tries to discover the relationships between items and other items and users. This method is more widespread because it is often more efficient and accurate. Many different algorithms have been built on this approach.
There are two types of Collaborative Filtering:
- Neighborhood methods
- Latent factor models
Neighborhood methods
Finding neighbors! These methods focus on computing the similarity between items or users. This category has two subtypes: user-based and item-based. In these methods, the system calculates the similarity between users and/or items.
Let’s look at Yehuda Koren’s example:
Because these methods usually need the whole dataset to calculate similarities, they are known as “memory-based” methods.
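An item-based variant can be sketched the same way. The sketch below uses hypothetical data and plain cosine similarity (an assumed choice). It scans every rating to build the item-item similarity table, which is why such methods are called memory-based, and then predicts a missing rating as a similarity-weighted average of the user's own ratings.

```python
import math
from itertools import combinations

# Hypothetical user -> {item: rating} data; u4 has not rated item "b".
RATINGS = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
    "u4": {"a": 5, "c": 1},
}


def item_vectors(ratings):
    """Invert the data into item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            items.setdefault(item, {})[user] = r
    return items


def item_similarities(ratings):
    """Cosine similarity for every item pair, over users who rated both items."""
    items = item_vectors(ratings)
    sims = {}
    for i, j in combinations(sorted(items), 2):
        common = set(items[i]) & set(items[j])
        if not common:
            continue
        dot = sum(items[i][u] * items[j][u] for u in common)
        norm_i = math.sqrt(sum(items[i][u] ** 2 for u in common))
        norm_j = math.sqrt(sum(items[j][u] ** 2 for u in common))
        sims[(i, j)] = sims[(j, i)] = dot / (norm_i * norm_j)
    return sims


def predict(user, item, ratings, sims):
    """Similarity-weighted average of the user's ratings on other items."""
    num = den = 0.0
    for other, r in ratings[user].items():
        s = sims.get((item, other), 0.0)
        if s > 0:
            num += s * r
            den += s
    return num / den if den else None


sims = item_similarities(RATINGS)
print(round(predict("u4", "b", RATINGS, sims), 2))
```

The prediction comes out at about 3.39: u4 gave 5 to item a, which is very similar to b, but also gave 1 to item c, which plain cosine still treats as somewhat similar because all ratings are positive.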
Latent factor models
Modeling users and items! This type often uses Singular Value Decomposition (SVD). SVD maps users and items into a common multi-dimensional space so they can be compared directly. In Yehuda Koren's example below, each user's movie ratings are used to map (or model) users and movies into a two-dimensional space for simplicity. As you can see, Dave and Gus are similar, and so are Braveheart and Lethal Weapon.
SVD methods are based on matrix factorization. The system's input is a matrix whose rows are items and whose columns are users, and each value is the rating a user gave an item. Let's make this clearer with Ilya Grigorik's example. Assume our user-rating matrix looks like this:
In SVD algorithms, we factor the matrix as R = U·S·Vᵀ, where U has one row per item, S is a diagonal matrix of singular values in decreasing order, and V has one row per user. For our matrix, this gives:
Now we keep only the two largest singular values (so S is 2×2). Each row of U then gives an item's (X, Y) coordinates in 2D, and each row of V gives a user's coordinates in the same space.
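In symbols, using the text's conventions (rows of R are items, columns are users) and writing k for the number of singular values kept, the full and truncated factorizations read:

```latex
R = U S V^{\top}, \qquad S = \operatorname{diag}(s_1, s_2, \ldots), \quad s_1 \ge s_2 \ge \cdots \ge 0

R \approx R_k = U_k S_k V_k^{\top}, \qquad
\hat{r}_{ij} = \sum_{f=1}^{k} u_{if}\, s_f\, v_{jf}
```

Here U_k and V_k keep the first k columns of U and V, and \hat{r}_{ij} is the approximated rating of item i by user j. Keeping the k largest singular values gives the best rank-k approximation of R in the least-squares (Frobenius) sense; with k = 2, row i of U_k is item i's (X, Y) point and row j of V_k is user j's point.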
As shown in the figure, users who rate items similarly end up closer together, such as Ben and Fred. The same applies to items. Another important aspect of this algorithm is dimensionality reduction, which helps with the sparsity and scalability problems. The number of dimensions is set by how many singular values (rows of S) are kept, and this number can be chosen.
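To make this concrete without extra libraries, here is a small pure-Python sketch that computes a truncated SVD by power iteration on RᵀR with deflation (a simple numerical method chosen for illustration; real projects would use a linear-algebra library) and then compares users in the reduced 2D space. The 4×4 matrix and the user names are hypothetical, not Grigorik's data.

```python
import math

# Hypothetical ratings: rows are items, columns are users (same layout as the text).
USERS = ["ann", "ben", "cat", "dan"]
R = [
    [5, 5, 0, 1],
    [5, 4, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 5, 5],
]


def mat_vec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm else x


def truncated_svd(R, k, iters=500):
    """Top-k singular triplets of R: returns U (items x k), S (k values), V (users x k)."""
    n_users = len(R[0])
    # A = R^T R is a symmetric users x users matrix whose eigenvalues are s_f ** 2.
    A = [[sum(row[i] * row[j] for row in R) for j in range(n_users)] for i in range(n_users)]
    u_cols, s_vals, v_cols = [], [], []
    for _ in range(k):
        v = normalize([1.0 + 0.1 * i for i in range(n_users)])  # deterministic start
        for _ in range(iters):
            v = normalize(mat_vec(A, v))
        eigval = sum(a * b for a, b in zip(v, mat_vec(A, v)))  # Rayleigh quotient
        s = math.sqrt(max(eigval, 0.0))
        u = [x / s for x in mat_vec(R, v)] if s else [0.0] * len(R)
        u_cols.append(u)
        s_vals.append(s)
        v_cols.append(v)
        # Deflate: remove this component so the next pass finds the next one.
        A = [[A[i][j] - eigval * v[i] * v[j] for j in range(n_users)] for i in range(n_users)]
    return transpose(u_cols), s_vals, transpose(v_cols)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


U, S, V = truncated_svd(R, k=2)
user_points = dict(zip(USERS, V))  # each user is now an (X, Y) point
print("ann~ben:", round(cosine(user_points["ann"], user_points["ben"]), 3))
print("ann~cat:", round(cosine(user_points["ann"], user_points["cat"]), 3))
```

Users ann and ben, who rated the items alike, point in nearly the same direction in 2D (cosine close to 1), while ann and cat end up close to orthogonal (cosine close to 0).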
There are many challenges in implementing and managing these systems. We will focus on some of the most important ones here:
Sparsity: in systems with many items, products and users, most users do not want to rate items, and the system collects very little information about their interests and tendencies. This lack of data makes recommendations much less accurate.
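To put a number on sparsity, one simple measure (our choice for illustration, not from the original) is density: the share of user-item cells that actually hold a rating. The tiny dataset below is hypothetical.

```python
# Hypothetical user -> {item: rating} data: 3 users, 5 items, only 4 ratings.
RATINGS = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i3": 4},
    "u3": {"i1": 2},
}
N_USERS, N_ITEMS = 3, 5


def density(ratings, n_users, n_items):
    """Share of user x item cells that contain a rating."""
    n_ratings = sum(len(user_ratings) for user_ratings in ratings.values())
    return n_ratings / (n_users * n_items)


d = density(RATINGS, N_USERS, N_ITEMS)
print(f"density = {d:.1%}, sparsity = {1 - d:.1%}")
```

At more realistic sizes the numbers get much worse: 1,000 users and 10,000 items with 50,000 ratings give a density of only 0.5%.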
Cold start: this problem occurs when a new user or item is added to the system. A new user does not receive accurate recommendations because the system has no history for them, and a new item or product is not recommended to anyone because nobody has rated it yet.
Diversity: the more varied the recommended list, the more likely the user is to choose something from it. However, this diversity should not reduce the system's accuracy.
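The text does not say how diversity is measured. One common choice, sketched here as an assumption, is intra-list diversity: the average dissimilarity (1 minus cosine similarity of item feature vectors) over all pairs of items in a recommended list. The feature vectors are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical genre vectors; assumed feature order: [action, comedy, drama]
ITEM_FEATURES = {
    "action_1": [1, 0, 0],
    "action_2": [1, 0, 0],
    "comedy_1": [0, 1, 0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intra_list_diversity(recommended, item_features):
    """Average (1 - cosine similarity) over every pair in one recommended list."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    total = sum(1 - cosine(item_features[a], item_features[b]) for a, b in pairs)
    return total / len(pairs)


print(intra_list_diversity(["action_1", "action_2"], ITEM_FEATURES))
print(intra_list_diversity(["action_1", "comedy_1"], ITEM_FEATURES))
```

A list of two near-identical action movies scores 0, while mixing genres raises the score; in practice this number is tracked alongside an accuracy metric so that one is not improved at the other's expense.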
Scalability: with today's huge volumes of information and databases (big data), the scalability of current algorithms has become one of the most important challenges for recommendation systems. Data such as the number of products and users, and user interactions such as comments and ratings, is growing very fast, while current algorithms are usually evaluated at a much smaller scale and lose efficiency at large scales.
Scalability as a major problem
In CF methods, computing the similarity between users is very expensive because the similarity between every pair of users (or items) must be calculated. As users and items grow, the number of pairs grows quadratically, and each comparison also costs more because there are more items to compare. So most CF algorithms slow down as users and items increase and require a lot of processing power and memory.
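A quick count shows why. Comparing every pair of n users needs n(n − 1)/2 similarity computations, so ten times more users means roughly a hundred times more comparisons, before counting the per-pair cost. The helper name below is ours.

```python
def pairwise_comparisons(n_users):
    """Number of distinct user pairs whose similarity must be computed: n(n-1)/2."""
    return n_users * (n_users - 1) // 2


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} users -> {pairwise_comparisons(n):,} pairs")
```

Going from 1,000 to 100,000 users takes the count from 499,500 pairs to almost five billion.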
One of the most effective ways to address this problem is to use parallel processing methods such as MapReduce.
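MapReduce splits the work into independent map and reduce steps. The single-process sketch below (plain Python simulating the three phases, not the Hadoop API) counts how often each pair of items is used by the same user, a typical building block for item-based CF. In a real cluster each user's record could be mapped on a different machine and each item pair reduced independently. The data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: one record per user listing the items they interacted with.
USER_ITEMS = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c"],
}


def map_phase(user, items):
    """Mapper: emit ((item_i, item_j), 1) for every pair of items one user touched."""
    for pair in combinations(sorted(set(items)), 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one item pair."""
    return key, sum(values)


mapped = (kv for user, items in USER_ITEMS.items() for kv in map_phase(user, items))
co_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(co_counts)
```

Because no mapper needs another user's data and no reducer needs another pair's counts, adding machines spreads the quadratic pair work instead of piling it onto one server.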
It’s best to choose a method based on your parameters and domain and implement it in your favorite language, but there are several open source projects you could use. For example, raccoon is a Node.js library that implements CF recommendation systems on top of Redis.
For Java, there is librec, which implements many algorithms.
For Python, there is surpriselib, a scikit for building and analyzing (collaborative filtering) recommender systems.
On the Internet, where the number of choices keeps increasing, users need a filter that selects information based on their interests and how useful it is. This has become so essential that its absence significantly lowers the quality of a service and reduces user satisfaction. Recommendation systems address this problem by searching and mining the mass of information. They are now an important part of online stores, news sites, social media, movie and music services, book sites and search engines. So start using them on your own site…