Introduction to Recommendation Systems

Erick Torres-Moreno
Published in Illinois Tech ACM · Mar 26, 2019

Have you ever wondered how Amazon recommends products to you?

amazon.com

Imagine you've just finished your order for that Apple Watch you've been saving up for, and suddenly Amazon recommends a good-looking band that you didn't know you wanted but are now inclined to buy. Or perhaps you just finished googling the top ten cases for your new iPhone XR, and the moment you land back on the site those same cases appear on your homepage.

Creepy? Maybe. Magic? Not really. Companies like Amazon, Spotify, Netflix, and even Facebook do this by using Recommendation Systems to help users discover items that they might like or are more likely to purchase.

Types of Recommendation Systems

https://www.cameo.com/perezhilton

There are two main types of Recommendation Systems used in practice. Note: We will use Cameo (a company that lets users buy personalized videos from celebrities) in our examples.

Content-Based Systems: These systems look at the characteristics or attributes of items and compare them to the items a user has already purchased.

We notice that a user likes a certain type of celebrity and recommend similar celebrities!

For example, if we were implementing a Content-Based System for Cameo, we would want to recommend celebrities that users might like to purchase a Cameo from. Let us imagine for a second that you are one of those users. Using this approach, we would compare the celebrities you have purchased from in the past with the celebrities Cameo has available. This could mean comparing the movies these celebrities appeared in, the genres they categorize themselves under, or even the average ratings customers give them. The more similar the celebrities are, the more likely Cameo is to recommend them to you.

Users that rated the same things highly might have other things in common!

Collaborative Filtering Systems: These systems focus more on what users have purchased, watched, or liked/rated in the past. They compare similar users to each other, or compare items that were rated similarly, in order to make recommendations.

For example, Cameo might use this approach by finding users who have purchased from the same celebrities you have. It might then notice that these other users have also bought videos from celebrities like Kendrick Lamar and Taylor Swift, but you haven't. Therefore, Cameo might recommend these two celebrities to you.

Representing the Information

Before we start diving deeper into recommendation systems it’s important to understand how the data is organized and used.

The Utility Matrix: This matrix is very important since it contains the relationship between the users and the items. The rows represent the users and the columns represent the items. The entries in the matrix can be binary (1 or 0), which may represent whether a user purchased an item or not. The entries can also be ratings (e.g., 1–10), representing the rating a user gave to a particular item.

Note: The zeros in the matrix can also represent “blank” entries, or in our example unrated/undiscovered celebrities. These “blank” entries are often what we want to predict: will the user like the items they belong to?

Utility Matrix with Binary entries for three users and three artists
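To make this concrete, here is a minimal sketch of what such a utility matrix might look like in code. The users, celebrities, and entries below are made up for illustration, not the actual data from the figure:

```python
import pandas as pd

# A tiny binary utility matrix: rows are users, columns are celebrities.
# 1 means the user purchased a Cameo from that celebrity, 0 means no
# purchase yet (a "blank" entry we may later want to predict).
utility = pd.DataFrame(
    [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 0]],
    index=["User 1", "User 2", "User 3"],
    columns=["Celebrity A", "Celebrity B", "Celebrity C"],
)
print(utility)
```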

Item Profile: This is used more in Content-Based Filtering. The Item Profile is another matrix, with a row for each available item and a column for each attribute or characteristic that describes the items. In the case of Cameo, this matrix can give us information like the genre a certain celebrity belongs to or the celebrity's hometown.

Item Profiles for these three celebrities

Notice how the entries in the Item Profile vary from numbers to words. This matters because we usually want every matrix to be uniform in type so that the matrices can be compared with each other. There are multiple ways to solve this, but I will leave it to your curiosity to uncover and learn about them.
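As one pointer, a common approach is to one-hot encode the text columns so that every entry becomes a number. A minimal sketch, assuming made-up celebrities and attributes:

```python
import pandas as pd

# Made-up item profiles: one row per celebrity, mixed text and numeric columns.
items = pd.DataFrame(
    {"genre": ["rap", "latin", "rap"],
     "avg_rating": [4.5, 4.8, 3.9]},
    index=["Celebrity A", "Celebrity B", "Celebrity C"],
)

# One-hot encode the categorical 'genre' column so every entry is numeric
# and the profiles can be compared with vector math.
item_profiles = pd.get_dummies(items, columns=["genre"])
print(item_profiles)
```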

User Profiles: This again is used more in Content-Based Filtering. This matrix represents a user's preferences for the attributes of an item. Similar to the Utility Matrix, its rows represent the users, but its columns represent the attributes of an item instead of the items themselves. This matrix can tell us information such as which genre of celebrity a user prefers.

The user profiles for two users

The above tells us that 20% of the celebrities User 1 has purchased from are in the rap genre and 30% of them are in the latin genre. This information would be compared with the item profiles in order to see whether there are any celebrities that match the attributes User 1 likes.
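As a rough sketch of how such a user profile could be built, we can combine a binary utility matrix with a one-hot genre matrix. The data below is made up purely for illustration:

```python
import pandas as pd

# Binary utility matrix (users x celebrities) and one-hot genre matrix
# (celebrities x genres). All values here are illustrative.
utility = pd.DataFrame(
    [[1, 1, 0], [0, 1, 1]],
    index=["User 1", "User 2"],
    columns=["Celebrity A", "Celebrity B", "Celebrity C"],
)
genres = pd.DataFrame(
    [[1, 0], [0, 1], [1, 0]],
    index=["Celebrity A", "Celebrity B", "Celebrity C"],
    columns=["rap", "latin"],
)

# For each user, count purchases per genre and divide by total purchases,
# giving the fraction of that user's purchased celebrities in each genre.
counts = utility.dot(genres)
user_profiles = counts.div(utility.sum(axis=1), axis=0)
print(user_profiles)
```

Each row of the resulting matrix is exactly the kind of genre-preference breakdown described above.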

Similarity Measures

So we’ve talked a lot about similar customers and similar items. But how exactly do we measure how similar these things are to each other?

Well, first we treat each row (or sometimes each column) as a vector. For example, in the case of the Utility Matrix, each vector would contain the values of every column for one user.

Representing a utility matrix with two users and two celebrities

Second, we choose a similarity measure to compare the vectors. Similarity measures determine how similar vectors are by comparing the number of common items, the distances between the vectors, or even the angle that these vectors make.

http://techinpink.com/2017/08/04/implementing-similarity-measures-cosine-similarity-versus-jaccard-similarity/

There are plenty of similarity measures to choose from but the most popular ones are Jaccard and Cosine similarity.

One of the reasons these two similarity measures are often used is that they deal with the problem of having very sparse data. For example, because Cameo has many celebrities, and people often haven't purchased from (or even heard of) many of them, there will be many 0s in our utility matrix. Those 0s are not as important because they do not provide us with much information, so we want to focus more on what people have actually bought.

It is also important to note that although these similarity measures deal with a similar problem, they do not do it in the same way and can give very different results. In fact, you could say that every similarity measure has its very own definition of “similarity”. Therefore, it is always important to understand which similarity measure you are using in your recommendation system, or be ready to test several to see if any give improved results.
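To see how the two measures differ in practice, here is a minimal sketch of Jaccard and Cosine similarity computed on two made-up rows of a binary utility matrix:

```python
import numpy as np

def jaccard_similarity(a, b):
    # For binary vectors: shared 1s divided by positions where either has a 1.
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def cosine_similarity(a, b):
    # Cosine of the angle between the two vectors.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a.dot(b) / norm) if norm else 0.0

# Two hypothetical users' rows from a binary utility matrix.
user_1 = [1, 0, 1, 1, 0]
user_2 = [1, 1, 1, 0, 0]
print(jaccard_similarity(user_1, user_2))  # 0.5
print(cosine_similarity(user_1, user_2))   # ~0.667
```

Even on the same pair of vectors the two measures return different numbers, which is exactly why the choice of measure matters.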

Evaluating our Recommender System

So now that we’ve built our awesome recommender system we need a way to measure how “correct” it is. The way we determine how good our system is will depend on how we approach our recommendation.

We can treat our recommendation as a ranking task or a rating task. In other words, are we recommending the set of items we think are most important for each user (top-K), or are we predicting the rating that a user will give to a certain item?

Treating a recommendation as a rating task requires that you have data, or a utility matrix, where users have given some sort of ratings to items, for example movie ratings or product ratings. This requires a different type of scoring metric than a ranking task, because we want to make sure the correct ratings are predicted for the items. In other words, treating it like a rating task is treating it as a prediction problem where your goal is to predict the rating. The preferred metrics for this are often RMSE or MAE.
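As a quick sketch, RMSE and MAE can be computed directly from the true and predicted ratings (the numbers below are made up):

```python
import numpy as np

def rmse(true_ratings, predicted_ratings):
    # Root Mean Squared Error: penalizes large rating errors more heavily.
    errors = np.asarray(true_ratings) - np.asarray(predicted_ratings)
    return float(np.sqrt(np.mean(errors ** 2)))

def mae(true_ratings, predicted_ratings):
    # Mean Absolute Error: the average size of the rating errors.
    errors = np.asarray(true_ratings) - np.asarray(predicted_ratings)
    return float(np.mean(np.abs(errors)))

# Hypothetical ratings a user actually gave vs. what our system predicted.
actual = [5, 3, 4, 1]
predicted = [4.5, 3.5, 4, 2]
print(rmse(actual, predicted))  # ~0.61
print(mae(actual, predicted))   # 0.5
```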

Now, when we treat our recommendation as a ranking task we often talk about Top-N recommender systems. These are systems that return recommendations in order of what the user will most likely buy or like.

Amazon uses a Top-N recommender system to recommend items you are most likely to buy

One simple way of evaluating Top-N recommender systems is to look at the hit rate: for each user, how many “hits”, or correct recommendations, did we make? We would usually take some average, and sometimes add weights so that the first recommendations count more than the last.
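A minimal sketch of a hit-rate calculation, assuming a hypothetical set of recommendations and purchases per user:

```python
def hit_rate(recommended, purchased):
    # Fraction of users for whom at least one recommendation was a "hit",
    # i.e. an item the user actually purchased (or an item held out in a test set).
    hits = sum(1 for user in recommended
               if set(recommended[user]) & set(purchased.get(user, [])))
    return hits / len(recommended) if recommended else 0.0

# Hypothetical top-3 recommendations and the items each user actually bought.
recommended = {"User 1": ["A", "B", "C"], "User 2": ["D", "E", "F"]}
purchased = {"User 1": ["B"], "User 2": ["G"]}
print(hit_rate(recommended, purchased))  # 0.5
```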

However, the most popular ranking metrics are MAP and NDCG. The use of a metric will depend on the application of your recommendation system and the data you are working with.

The MAP metric is often used when you only care about whether your recommendations are relevant or irrelevant to the user. MAP computes the mean of the Average Precision over all your users. This means that for each user you take the list or set of recommendations you gave them and compare it to the correct set. Average Precision not only rewards a “hit”, or correct recommendation, but also rewards placing the correct recommendations at the top of your list. You then add all those AP scores and divide by the total number of users (a small sketch of this computation follows the list below).

Overall MAP has three main characteristics:

  • Rewards successful recommendations
  • Rewards recommendations that are front-loaded, meaning correct recommendations that are placed at the top.
  • Does not penalize you for adding non-matching recommendations to your list. However, it prefers that you front-load the correct recommendations.
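Here is a minimal sketch of Average Precision and MAP for ranked lists with binary relevance; the recommendations and relevant items below are made up:

```python
def average_precision(recommended, relevant):
    # Precision@k averaged over the ranks where we scored a hit;
    # correct items near the top of the list earn more credit.
    relevant = set(relevant)
    hits, score = 0, 0.0
    for k, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            score += hits / k
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(recommendations, relevants):
    # MAP: average the per-user AP scores over all users.
    scores = [average_precision(recommendations[u], relevants.get(u, []))
              for u in recommendations]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical ranked recommendations and each user's truly relevant items.
recommendations = {"User 1": ["A", "B", "C"], "User 2": ["D", "E", "F"]}
relevants = {"User 1": ["A", "C"], "User 2": ["F"]}
print(mean_average_precision(recommendations, relevants))  # ~0.58
```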

Note: We did not talk about NDCG or RMSE in depth, but it is important to know that they are not useful when dealing with binary data (0 or 1). These metrics are meant for non-binary data where there's a concept of relevance scores or ratings. They often use averages or exponential/logarithmic functions to compute the evaluations, and those techniques don't make sense on binary data because they do not extract any useful information from it.

Drawbacks

Recommender Systems sound great! But they are not perfect. These systems can be very useful tools for enhancing the customer experience and increasing sales for a company because they lend themselves to customization. However, they have many drawbacks and may not be suitable for every situation.

One of the drawbacks comes from the fact that these systems require lots of data. What happens when you have new users? What if you are a new company? What if you don't require user accounts? This is known as the cold start problem, and it can be a big issue when dealing with recommender systems. It might be the case that a recommender system is not a great option, or you may need novel solutions to work around the lack of data, such as using cached data from your users or asking your users some initial questions.

Another big drawback of recommender systems is that they often don't work well with incredibly sparse data. In other words, if there are many items to recommend, users are completely different from each other, or there are many irrelevant item attributes, then it may be hard for the system to find similar vectors or patterns.

Furthermore, recommender systems can be computationally expensive in practice. If you're Amazon and you have millions of users, then comparing every pair of user vectors becomes a huge problem, especially since these vectors are always changing. You may even run into slow performance when implementing a recommendation system on data with only 1,000 rows.

Pondering these issues, and looking at other ways recommendation systems can go wrong, is important. Therefore, I challenge you to think about other problems with recommendation systems. When would a recommendation system not be a good idea? When is one type of recommendation system better than the other?

Exploring More

We have only scratched the surface of recommendation systems. There are many important topics that we have yet to explore.

As mentioned before, we did not discuss other similarity measures or even the specific differences between Jaccard and Cosine. Understanding these measures is crucial, since they may in fact give you different recommendations, and some are more appropriate in certain domains than others.

We have also yet to talk about how one can use Machine Learning methods such as decision trees and clustering. Furthermore, there are ways to use Matrix Factorization and Dimensionality Reduction, with techniques like SVD (singular value decomposition), to strip away unimportant information such as 0s or undiscovered items. These methods can sometimes improve our systems and are interesting topics to learn about.
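As a taste of what that looks like, here is a minimal sketch using a plain truncated SVD on a small, made-up rating matrix. Real systems usually handle the missing entries more carefully (for example, mean-centering or treating them as truly missing) rather than taking the 0s at face value:

```python
import numpy as np

# A small, sparse, hypothetical utility matrix (0 = unrated/unknown).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

# Truncated SVD: keep only the k strongest latent factors and rebuild the
# matrix, which fills the zero entries with estimated scores.
k = 2
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(approx, 2))
```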

Below are just a couple of resources for you to check out:

Amazon Item-to-Item Collaborative Filtering

Wikipedia’s Recommender Systems

The BellKor Solution to the Netflix Grand Prize
