Introduction to Recommender Systems: Dealing with Information Overload

Published in SFU Professional Computer Science, Feb 4, 2020

This blog is written and maintained by students in the Professional Master’s Program in the School of Computing Science at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/pmp.

Authors: Quan Yuan, Yuheng Liu, Wenlong Wu

Introduction

With the continuous development of Internet technology and the increasing popularity of smart devices, more and more data is being generated. This massive amount of data makes personalized recommendation technology easier to implement across the industry, and big tech companies such as Amazon and YouTube are undoubtedly its biggest beneficiaries.

Unlike ordinary personal items, smart devices are uniquely linked to a specific person and are rarely shared with others. As a result, the browsing, transaction, and other behavioral data on your mobile phone has significant analytical value for a recommender system.

From the perspective of an e-commerce platform, the essential goal of a recommender system is to recommend the products consumers are most likely to buy, making full use of the available data. Its accuracy improves as personal user data grows richer and techniques advance.

Methodology

Data Collection

A recommender system uses various information about users and items to produce suitable recommendations, which means a sufficient dataset is the basis for building an intelligent recommender system. A typical dataset contains attributes of users and items and, ideally, also the interactions between users and items. The following figure shows the basic approaches to linking a user with items.

Several data collection methods, and the problems we encounter when building the system, are described below.

1. Attributes

Both users and items have attributes that provide critical information for the recommender system, such as users’ gender and interests, and descriptions of items. Data mining methods can extract relevant key knowledge from these attributes. By computing the similarity between a user’s attributes and an item’s attributes, we can use the k-nearest neighbors algorithm to find the items closest to each user. This content-based approach is especially useful when interaction information between users and items is lacking. Conversely, ignoring interactions is also its limitation: in the most prevailing recommender system algorithms, interactions between users and items play an essential role.
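To make the k-nearest-neighbors idea concrete, here is a minimal sketch assuming that user interests and item attributes are already encoded on the same hypothetical axes (four genre flags), and using cosine similarity as the measure. The item names, attribute axes, and helper names are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_items(user_vec, item_vecs, k=2):
    """Return the k items whose attributes are closest to the user's interests."""
    ranked = sorted(item_vecs, key=lambda name: cosine(user_vec, item_vecs[name]), reverse=True)
    return ranked[:k]

# Hypothetical attribute axes: [action, comedy, documentary, sci-fi]
items = {
    "Movie A": [1, 0, 0, 1],
    "Movie B": [0, 1, 0, 0],
    "Movie C": [0, 0, 1, 0],
    "Movie D": [1, 0, 0, 0],
}
user = [1, 0, 0, 1]  # the user's stated interests on the same axes
print(nearest_items(user, items, k=2))  # ['Movie A', 'Movie D']
```

No interaction data is needed here, which is why this style of recommendation still works for brand-new users and items.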

2. Interaction

More specifically, interactions are the actions and behaviors of users, together with the feedback records stored on the server. User behavior provides a lot of useful information: from it we can infer users’ recent interests, their interest in a particular item, and so on. There are two types of feedback. Explicit feedback is when users state their attitude toward an item directly, for example by clicking a “like/dislike” button. However, many users simply leave the page instead. In practice, implicit feedback, the record of users’ interactions, often turns out to be more important. If a user reads an item’s description to the end, they may have some interest in it. If the user views an item for only a few seconds, we can infer that the item is not what they want at that moment. Besides user behavior, recommendations also depend on time, domain, and other contextual factors.
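One way to combine explicit and implicit feedback is to map every interaction record to a rough preference score. The sketch below assumes hypothetical field names (`explicit`, `read_to_end`, `dwell_seconds`) and arbitrary score values and thresholds; real systems tune or learn these from data.

```python
def implicit_score(event, short_view_seconds=5):
    """Map one hypothetical interaction record to a rough preference score in [-1, 1].

    Field names, thresholds, and score values are illustrative assumptions.
    """
    # Explicit feedback, when present, is taken at face value.
    if event.get("explicit") == "like":
        return 1.0
    if event.get("explicit") == "dislike":
        return -1.0
    # Otherwise fall back to implicit signals.
    if event.get("read_to_end"):
        return 0.6  # finished the description: likely interested
    if event.get("dwell_seconds", 0) < short_view_seconds:
        return -0.3  # left after a few seconds: probably not what they want now
    return 0.2  # looked for a while without finishing: weak interest

events = [
    {"item": "tent", "read_to_end": True, "dwell_seconds": 90},
    {"item": "sofa", "dwell_seconds": 3},
    {"item": "kayak", "explicit": "like", "dwell_seconds": 20},
]
scores = {e["item"]: implicit_score(e) for e in events}
print(scores)  # {'tent': 0.6, 'sofa': -0.3, 'kayak': 1.0}
```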

3. Context information

Context information, such as time, location, and mood, is also an important part of the recommender system. For example, recent interactions matter much more than older ones because they reflect users’ current needs. In a music recommender system, users choose different music depending on their mood. Before an upcoming festival, users may buy specific decorations. In effect, context adds a real-time dimension to the recommender system. Because context is so important, it is necessary to record users’ interactions and update the model frequently so that new recommendations can be generated in real time.
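A common way to make recent interactions count more than old ones is exponential time decay. The sketch below assumes a hypothetical listening history of (day, category) pairs and a half-life of 7 days, meaning an event loses half its weight every week; both the half-life and the data are illustrative.

```python
def recent_interest(events, now_day, half_life_days=7.0):
    """Sum time-decayed weights per category.

    Each event's weight halves every half_life_days (an assumed tuning value).
    """
    scores = {}
    for day, category in events:
        weight = 0.5 ** ((now_day - day) / half_life_days)
        scores[category] = scores.get(category, 0.0) + weight
    return scores

# Hypothetical listening history: (day of the event, music category)
history = [(0, "rock"), (1, "rock"), (2, "rock"), (27, "jazz"), (28, "jazz")]
scores = recent_interest(history, now_day=28)
print(max(scores, key=scores.get))  # jazz: two recent plays outweigh three old ones
```

Shortening the half-life makes the system react faster to changing needs at the cost of forgetting long-term preferences sooner.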

4. Tag system

Building on attribute-based recommendation, a tag system is a useful tool for linking users’ interests to items and is widely used on movie and video websites. On the one hand, a tag describes a user’s interest; on the other hand, it is a brief attribute of an item. To improve tag quality, some websites offer users several related tags to choose from instead of letting them improvise their own.
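Because a tag describes both a user’s interest and an item, one simple scoring scheme multiplies how often the user used a tag by how often the item carries it and sums over tags. This is a minimal sketch of that idea with hypothetical tag data, not the method of any particular website.

```python
from collections import Counter

def tag_scores(user_tags, item_tags):
    """Score each item by the overlap between the user's and the item's tag counts."""
    user_counts = Counter(user_tags)
    scores = {}
    for item, tags in item_tags.items():
        item_counts = Counter(tags)
        # Counter returns 0 for tags the user never used.
        scores[item] = sum(user_counts[t] * c for t, c in item_counts.items())
    return scores

# Hypothetical data: tags the user applied, and tags attached to each item.
user_history = ["sci-fi", "sci-fi", "space", "comedy"]
catalog = {
    "Interstellar": ["sci-fi", "space", "space"],
    "The Hangover": ["comedy"],
    "Planet Earth": ["nature", "documentary"],
}
print(tag_scores(user_history, catalog))  # {'Interstellar': 4, 'The Hangover': 1, 'Planet Earth': 0}
```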

5. Cold start problem

Cold start is a common problem in recommender systems caused by a lack of information about a user or an item. When a new user or item is added to the platform, how can the recommender system work without data? Several typical solutions exist for obtaining initial information. You have probably created a new account on a website by logging in through another service. When you sign up with an existing social networking account, the system can obtain some of your records from that account and extract useful information from them for the recommender system. Some websites ask new users to select a few tags that interest them at the start; the system then recommends popular items matching those selections as an appetizer. A random strategy is also used to address the cold start problem: recommending random items to new users, or new items to random users, lets the recommender system collect initial feedback from users’ interactions with those items.
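The tag-selection and random strategies can be combined in a few lines. In this sketch (all names and data are hypothetical), a new user’s first list contains the most popular items matching the tags they picked, plus a randomly chosen item to gather early feedback; the number of random slots is an assumed tuning knob.

```python
import random

def cold_start_recommend(selected_tags, catalog, popularity, n=3, n_random=1, seed=None):
    """First recommendations for a new user with no interaction history.

    catalog: item -> set of tags; popularity: item -> a popularity count.
    """
    rng = random.Random(seed)
    # Popular items that match at least one tag the new user picked.
    matching = [item for item, tags in catalog.items() if tags & selected_tags]
    top = sorted(matching, key=lambda item: popularity[item], reverse=True)[: n - n_random]
    # Fill the remaining slots with random items to collect initial feedback.
    rest = [item for item in catalog if item not in top]
    explore = rng.sample(rest, min(n_random, len(rest)))
    return top + explore

catalog = {"A": {"travel"}, "B": {"travel", "food"}, "C": {"sports"}, "D": {"music"}}
popularity = {"A": 50, "B": 80, "C": 90, "D": 10}
recs = cold_start_recommend({"travel"}, catalog, popularity, n=3, n_random=1, seed=0)
print(recs)  # starts with 'B', 'A'; the last item is a random pick
```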

Algorithms:

1. Collaborative Filtering

Collaborative Filtering makes predictions based on users’ past behavior, which reflects their preferences. Reviews, ratings, number of clicks, browsing duration, choice of items, and many other factors can indicate what a user is interested in and willing to spend time on. There are two main types of Collaborative Filtering: user-based and item-based.

User-Based:
By collecting user information and the items that users are interested in, the system places users with similar preferences in the same group. Suppose user A and user B are frequent Netflix users with similar tastes in movies. If Titanic is in B’s watching history and A has not watched it, Titanic will be recommended to A, because their viewing histories show that they have similar tastes.
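The Netflix example can be sketched as a user-based rating prediction: find the users most similar to A among those who rated Titanic, then average their ratings weighted by similarity. The ratings, the choice of cosine similarity over co-rated movies, and the neighborhood size `k` are all illustrative assumptions.

```python
import math

def cosine_on_common(u, v):
    """Cosine similarity of two users over the movies both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    return dot / math.sqrt(sum(u[m] ** 2 for m in common) * sum(v[m] ** 2 for m in common))

def predict_user_based(ratings, user, item, k=1):
    """Predict user's rating of item from the k most similar users who rated it."""
    neighbors = [
        (cosine_on_common(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbors.sort(reverse=True)  # most similar users first
    top = neighbors[:k]
    den = sum(sim for sim, _ in top)
    return sum(sim * rating for sim, rating in top) / den if den else None

# Hypothetical ratings on a 1-5 scale: user -> {movie: rating}
ratings = {
    "A": {"Inception": 5, "Avatar": 4, "Notting Hill": 1},
    "B": {"Inception": 5, "Avatar": 5, "Notting Hill": 1, "Titanic": 4},
    "C": {"Inception": 1, "Avatar": 2, "Notting Hill": 5, "Titanic": 2},
}
print(predict_user_based(ratings, "A", "Titanic", k=1))  # 4.0, borrowed from B
```

With `k=1` the prediction comes entirely from B, A’s closest neighbor; larger `k` blends in less similar users such as C.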

Item-Based:
Using a large amount of data, the system compares items according to how users have rated or interacted with them and determines which items are similar. It then recommends to each user items similar to the ones that user already prefers.
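For comparison, here is a minimal item-based sketch on the same kind of hypothetical ratings: each movie becomes a vector of the ratings users gave it, and A’s rating for Titanic is predicted as A’s own ratings weighted by how similar each rated movie is to Titanic. Plain cosine over co-rating users is an assumed choice; mean-centered variants are a common refinement.

```python
import math

def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, r in ratings.items():
        for item, score in r.items():
            items.setdefault(item, {})[user] = score
    return items

def cosine_on_common(u, v):
    """Cosine similarity of two items over the users who rated both."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[x] * v[x] for x in common)
    return dot / math.sqrt(sum(u[x] ** 2 for x in common) * sum(v[x] ** 2 for x in common))

def predict_item_based(ratings, user, item):
    """Weighted average of the user's own ratings, weighted by item-item similarity."""
    items = item_vectors(ratings)
    num = den = 0.0
    for rated_item, score in ratings[user].items():
        sim = cosine_on_common(items[item], items[rated_item])
        num += sim * score
        den += abs(sim)
    return num / den if den else None

ratings = {
    "A": {"Inception": 5, "Avatar": 4, "Notting Hill": 1},
    "B": {"Inception": 5, "Avatar": 5, "Notting Hill": 1, "Titanic": 4},
    "C": {"Inception": 1, "Avatar": 2, "Notting Hill": 5, "Titanic": 2},
}
print(round(predict_item_based(ratings, "A", "Titanic"), 2))
```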

As mentioned above, both types of Collaborative Filtering rely on large amounts of data, and users’ past behavior and item information play an important role in them.

By comparing the similarity among users and items, the system links users to the items they are likely to prefer. Euclidean distance is the most direct way to calculate similarity: the smaller the distance, the more similar the two items. However, this method has difficulty detecting negative correlation: if two items are highly negatively correlated, the Euclidean distance is simply large, just as it is for unrelated items. Pearson correlation is therefore a better solution, as it captures both positive and negative correlations. Cosine similarity is another effective approach: it represents the ratings as vectors and uses the cosine of the angle between two vectors as their similarity.
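The three measures can be compared on two hypothetical items rated by the same five users, where everyone who likes one item dislikes the other. Euclidean distance only reports that the items are far apart, Pearson correlation reveals the perfect negative relationship, and cosine on raw 1 to 5 ratings stays positive because all ratings are positive numbers, which is one reason mean-centered measures such as Pearson are often preferred.

```python
import math

def euclidean_distance(a, b):
    """Smaller distance means more similar rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Correlation in [-1, 1]; negative values reveal opposite tastes."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

def cosine(a, b):
    """Cosine of the angle between the two rating vectors."""
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0

# Hypothetical ratings of two items by the same five users (1-5 scale).
item_x = [1, 2, 3, 4, 5]
item_y = [5, 4, 3, 2, 1]  # users who like item_x dislike item_y
print(euclidean_distance(item_x, item_y))  # sqrt(40), about 6.32
print(pearson(item_x, item_y))             # -1.0
print(cosine(item_x, item_y))              # 35/55, about 0.64
```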

Collaborative Filtering can also make use of an important machine learning algorithm, K-Means clustering, for example to group users with similar preferences. K-Means partitions data points into clusters so that points in the same cluster are more similar to each other than to points in other clusters. First, it chooses k initial centroids at random on the plane. Then it repeats the following steps until the centroids no longer change.

a) Assign each data point to its nearest centroid, forming one cluster around each centroid.

b) Compute the center (mean) of each cluster and make it the new centroid.

The following graph shows how the algorithm works when k = 2.
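Steps a) and b) can be sketched for 2-D points in plain Python. This sketch assumes the initial centroids are k distinct data points picked at random (one common initialization), keeps a centroid unchanged if its cluster becomes empty, and caps the number of iterations as a safety guard; the sample points are hypothetical.

```python
import random

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster 2-D points into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Initialization: k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step a): assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Step b): move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped changing: converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))  # roughly [(1.33, 1.33), (8.33, 8.33)]
```

In a recommender, each point would be a user (or item) feature vector rather than a 2-D coordinate, and the resulting clusters would serve as groups of similar users.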

2. Content-based

Unlike Collaborative Filtering, content-based filtering focuses on the contents of the items. It involves data preprocessing and feature extraction. For each item, the system builds a vector of extracted features, and for each user it builds a profile based on the user’s experience and preferences. An item is then recommended to users whose profiles match its features. The difference between item-based Collaborative Filtering and content-based filtering can be confusing. The item-based algorithm relies on users’ ratings of items rather than on the content a user is interested in, whereas the content-based algorithm concentrates on the features shared by the user profile and the item profile. Because this algorithm lists the item’s features explicitly, it can better explain why an item is recommended, and it may have a better chance of matching the user’s taste. However, feature extraction is difficult, because it is hard to define which features are typical of and important to an item.
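As an illustration of item feature vectors and user profiles, the sketch below uses TF-IDF weights of words in item descriptions as the extracted features (an assumed choice; the text does not prescribe a feature type), builds the user profile as the average vector of the items the user liked, and ranks the remaining items by cosine similarity. Tokenization is plain whitespace splitting and the documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: item -> description text. Returns item -> {term: TF-IDF weight}."""
    tokenized = {item: text.lower().split() for item, text in docs.items()}
    n_docs = len(docs)
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for item, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[item] = {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
    return vectors

def cosine_sparse(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values()) * sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def recommend_by_content(docs, liked, k=1):
    vectors = tfidf_vectors(docs)
    # User profile: average of the feature vectors of the items the user liked.
    profile = Counter()
    for item in liked:
        for t, w in vectors[item].items():
            profile[t] += w / len(liked)
    candidates = [i for i in docs if i not in liked]
    return sorted(candidates, key=lambda i: cosine_sparse(profile, vectors[i]), reverse=True)[:k]

docs = {
    "Doc1": "space rocket launch mission",
    "Doc2": "rocket engine space travel",
    "Doc3": "baking bread recipe oven",
    "Doc4": "chocolate cake recipe",
}
print(recommend_by_content(docs, liked=["Doc1"], k=1))  # ['Doc2']
```

Because the match is driven by shared terms such as "space" and "rocket", the system can also explain the recommendation by listing those terms.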

Application

Here are some examples of recommender systems in real-world applications.

1. Amazon

Amazon is a multinational technology company that focuses on e-commerce. G. Linden, B. Smith, and J. York explained Amazon’s Item-to-Item Collaborative Filtering in their paper.

As shown above, the Item-to-Item Collaborative Filtering algorithm can recommend products to customers based on the items in their shopping cart. The feature resembles the impulse items displayed at a supermarket checkout, except that Amazon’s impulse items are personalized for each customer.
In fact, Item-to-Item Collaborative Filtering is a form of item-based Collaborative Filtering. Rather than matching the user to similar customers, it matches each of the user’s purchased and rated items to similar items and then combines those similar items into a recommendation list.

2. TikTok

TikTok is a video-sharing social networking service. With its advanced, highly accurate recommendation algorithm, it can keep users on the application for hours a day. In their research, Cheng et al. try to shed light on TikTok’s mysterious algorithms by introducing feature engineering.

The purpose of feature engineering is to select better features in order to generate better training data. Common feature-selection methods include the Pearson correlation coefficient, mutual information, and distance correlation. These measures help identify the most useful features, which improves the outcome. For example, if a user clicks on several travel-related videos, the system records this interest and looks for it among the features of later videos (the creator, the music ID, the gender of the person in the video) so that the following videos are very likely to be travel-related as well.
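Mutual information, one of the selection methods listed above, measures how much knowing a feature tells us about whether a video was clicked. The sketch below assumes a hypothetical log with two binary features and a click label; it scores each feature and ranks them, and the specific feature names are invented for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

def rank_features(rows, label_key):
    """Rank features by mutual information with the label, highest first."""
    labels = [r[label_key] for r in rows]
    features = [k for k in rows[0] if k != label_key]
    scores = {f: mutual_information([r[f] for r in rows], labels) for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical viewing log: one row per shown video.
clicked = [1, 1, 1, 0, 0, 0, 1, 0]
is_travel = [1, 1, 1, 0, 0, 0, 1, 0]
night_upload = [1, 0, 1, 0, 1, 0, 0, 1]
rows = [{"is_travel": t, "night_upload": u, "clicked": c}
        for t, u, c in zip(is_travel, night_upload, clicked)]
ranking = rank_features(rows, "clicked")
print(ranking[0][0])  # is_travel
```

Here `is_travel` fully determines the click, so it gets the maximum score (log 2), while `night_upload` is independent of clicks and scores 0.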

Conclusion:

Like the search engine, the recommender system is a creative solution to information overload. The difference is that a search engine requires users to have a specific target, whereas a recommender system addresses situations in which users do not know what they want at that moment. Based on a user’s interests and past behavior, the recommender system can suggest items to the user automatically. This article briefly introduced some data collection methods and prevailing algorithms used in this domain, and gave a few practical applications to illustrate how recommender systems are deployed in the real world. Nevertheless, all we discussed is just a small piece of this significant topic. In practice, each company builds its own recommender system framework around its business, and such a framework is not a simple combination of the methods mentioned here. The recommender system is an open field in which you can come up with various innovative ideas for exploring the links between users and items.

Thanks for reading! 🤗🤗🤗

References:

H. Lang, “Collaborative Filtering”, https://blog.csdn.net/hlang8160/article/details/81433356, 2018
Z. Shen, “Collaborative Filter — Data Mining”, https://blog.csdn.net/shenziheng1/article/details/89813959, 2019
P. Kordík, “Recommender systems explained”, https://medium.com/recombee-blog/recommender-systems-explained-d98e8221f468
B. Rocca, “Introduction to recommender systems”, https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada
J. Cheng, Z. Li, L. Wang and Q. Bian, “Practice of a New Model Fusion Structure in Short Video Recommendation,” 2019 International Conference on Virtual Reality and Intelligent Systems (ICVRIS), Jishou, China, 2019, pp. 27–30.
G. Linden, B. Smith and J. York, “Amazon.com recommendations: item-to-item collaborative filtering,” in IEEE Internet Computing, vol. 7, no. 1, pp. 76–80, Jan.-Feb. 2003.
J. Tanghulu, “Recommender System: Content-based & Collaborative Filtering”, https://blog.csdn.net/yinyu19950811/article/details/85697227, 2019