Understanding Recommendation Systems

NR
Personal Project
Dec 1, 2019


A recommendation system is a nifty technique for personalizing the items you offer to your users.

Recommender what…?

To put it simply: a system that can recommend “stuff” to people based on what everybody else did. It’s like a data-analysis version of following a trend. Well-known examples of these systems are features such as “People who bought … also bought” and “Combine with…”.

Most of the time, it shows items you might want to buy or view, based on your previous behavior on the website: things you rated, voted on, clicked, or spent a lot of time watching. For websites, what you are doing right now is the strongest signal of your interest, and they try to reinforce that behavior by suggesting other, similar items.

That sounds pretty invasive, right?

Recommender systems can be seen as bad, or as an invasion of your privacy, but it’s not all bad. You can also look at them as a tool that makes it easy to discover new items, music, artists, …

Some special terms that will brighten your mind

Items

These are the entities that a system recommends. The items can be movies, restaurants, books, clothes, …

Queries

The context the system needs to make a recommendation. There are two important kinds:

  • User Query: all the information you have about the user such as likes, preferences, …
  • Additional Query: the additional information that can influence a decision such as budget, timing, devices, …

Embedding

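An embedding maps an item or a query from a high-dimensional space (for example, a vector with one entry per user) to a point in a lower-dimensional space, ideally so that similar items end up close to each other.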
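One common way to compute such embeddings is a truncated singular value decomposition (SVD) of a user-item rating matrix. The sketch below assumes NumPy and a small, made-up 4×4 rating matrix: each item starts as a 4-dimensional column (one entry per user) and ends up as a 2-dimensional vector. The value k = 2 is an arbitrary assumption, and real systems often learn embeddings with other matrix-factorization or neural methods instead.

```python
import numpy as np

# Hypothetical user-item rating matrix (rows = users, columns = items), 0 = not rated.
# Each item starts as a 4-dimensional column vector (one entry per user).
R = np.array([
    [5, 5, 0, 1],
    [5, 4, 0, 2],
    [2, 0, 4, 5],
    [0, 5, 1, 0],
], dtype=float)

k = 2  # target number of dimensions (an assumption for this sketch)

# Truncated SVD: R ~ U[:, :k] @ diag(s[:k]) @ Vt[:k]
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# One k-dimensional row per item: the item's position in the lower-dimensional space.
item_embeddings = (np.diag(s[:k]) @ Vt[:k]).T

print(item_embeddings.shape)  # (4, 2)
```

Items that were rated similarly by the same users end up with nearby vectors, which is what makes the lower-dimensional space useful for finding similar items.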

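There are two main techniques covered here: User-Based Collaborative Filtering and Item-Based Collaborative Filtering. Note that item-based filtering, as described below, still relies on users’ ratings; it is different from content-based filtering, which recommends items based on their own attributes (genre, ingredients, price, …).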

User-Based (Collaborative Filtering)

This technique is based on your past behavior and how it compares to the behavior of other users.

User-Based Collaborative Filtering starts from a user-item matrix: one row for every user who ever interacted with an item, and one column per item. Each cell holds that user’s signal for that item, such as whether they bought, viewed, or clicked it, or the rating they gave it: whatever signal you want to build your system around. Once all the data is in the matrix, you compute the similarity between different users.

An easy way to do this is to treat each row as a vector and compute the similarity between every pair of user vectors (for example, with cosine similarity). Then you sort the users by similarity score. This makes it easy to find the users most similar to you, based on their past behavior, and recommend to you the stuff they liked.
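The steps above (rows as vectors, pairwise similarity, sort by score) can be sketched in plain Python. This is a minimal illustration, assuming cosine similarity as the measure and treating an unrated item as 0; the ratings, the extra user “Sam”, the restaurant “Burger Barn”, and the function names are all hypothetical.

```python
import math

# Hypothetical user-item matrix: one dict (row) per user, one key (column) per item.
# A missing key means "no interaction" and counts as 0 in the similarity.
ratings = {
    "Marie": {"Pizza Heaven": 5, "Luigi's Dinner": 5},
    "Bob": {"Pizza Heaven": 5},
    "Sam": {"Burger Barn": 4},
}


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[item] * b[item] for item in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_users(target: str, ratings: dict) -> list:
    """Return (user, score) pairs, sorted from most to least similar to target."""
    scores = [
        (other, cosine_similarity(ratings[target], ratings[other]))
        for other in ratings
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


print(most_similar_users("Bob", ratings))
```

Here Marie comes out as Bob’s closest neighbour (a score of about 0.71), while Sam, who shares no restaurants with Bob, scores 0. Treating “not rated” as 0 is a simplification: it cannot tell “never tried it” apart from “didn’t like it”.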

An example with pizza

Marie visited the Italian restaurant Pizza Heaven and rated it 5 stars. She also likes to eat at Luigi’s Dinner, which she also rated 5 stars.

Bob went to Pizza Heaven and also rated it 5 stars, but he doesn’t know about Luigi’s Dinner.

Because they both gave Pizza Heaven 5 stars, we can assume they have similar taste, so their similarity score is high. We can then use this information to find recommendations. Bob hasn’t been to Luigi’s Dinner, but since Marie, who has similar taste, liked it, we can assume he would like it too. So we suggest Luigi’s Dinner to Bob.
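The last step of the pizza example, predicting how much Bob would like Luigi’s Dinner, can be sketched as a similarity-weighted average of his neighbours’ ratings. This is one common scoring rule, not necessarily the one any particular system uses. The function name is hypothetical, and the similarity value 0.71 is an assumption (roughly what cosine similarity gives for Bob and Marie when unrated restaurants count as 0).

```python
def recommend_from_neighbours(target_ratings, neighbours):
    """Predict ratings for items the target user has not rated yet.

    target_ratings: {item: stars} for the user we recommend to.
    neighbours: list of (similarity, {item: stars}) for similar users.
    """
    weighted_sum, weight_total = {}, {}
    for similarity, other_ratings in neighbours:
        if similarity <= 0:
            continue  # ignore users with no (or opposite) taste overlap
        for item, stars in other_ratings.items():
            if item in target_ratings:
                continue  # only suggest items the target has not rated
            weighted_sum[item] = weighted_sum.get(item, 0.0) + similarity * stars
            weight_total[item] = weight_total.get(item, 0.0) + similarity
    predictions = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    return sorted(predictions.items(), key=lambda pair: pair[1], reverse=True)


bob = {"Pizza Heaven": 5}
marie = {"Pizza Heaven": 5, "Luigi's Dinner": 5}
# 0.71 is roughly the cosine similarity between Bob's and Marie's rows
# when unrated restaurants count as 0.
print(recommend_from_neighbours(bob, [(0.71, marie)]))
```

With Marie as his only neighbour, Bob gets Luigi’s Dinner with a predicted rating of about 5 stars. With more neighbours, users who are more similar to Bob pull the prediction more strongly toward their own ratings.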

Negative Points of User-Based Collaborative Filtering

People’s tastes change. This is one of the main disadvantages of User-Based CF: people’s preferences are not stable enough to build a whole system around. For example, maybe you liked sushi for a long time, but the last time you ate it you got really sick from food poisoning. Now you are hesitant to eat sushi again, but the system doesn’t know this.

Another disadvantage is shilling attacks. It is “easy” to game the system: you can’t assume that everyone acts in good faith. If your system is based on ratings given to items, people can manipulate those ratings, either positively or negatively. For example, say you just opened a new sandwich shop and want to attract a lot of customers. What do you do? You write some excellent reviews of your shop under fake personas on different sites, and some not-so-good ones about the sandwich shop across town. This is called a shilling attack, and there is a real economic incentive to carry one out.

Another pain point is that there are usually many more users than items. More users mean more work to build the user vectors and compare them with each other, because the number of user pairs grows quadratically with the number of users. This can make the process very time-consuming.
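A quick calculation shows why the number of users matters so much: comparing every user with every other user means n(n − 1)/2 comparisons. The catalogue sizes below (one million users, ten thousand items) are hypothetical, chosen only to show the scale difference.

```python
def pairs_to_compare(n: int) -> int:
    """Number of distinct pairs among n users (or items): n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical sizes, only to show the scale difference.
users, items = 1_000_000, 10_000
print(pairs_to_compare(users))  # 499999500000 user-user pairs
print(pairs_to_compare(items))  # 49995000 item-item pairs
```

With these numbers, there are roughly ten thousand times more user pairs than item pairs to compare.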

Item-Based (Collaborative Filtering)

This technique is often more robust and resists some of the weaknesses of User-Based CF. It’s a technique Amazon developed in 1998 and uses for its well-known recommendation system.

Instead of basing our system on relationships between people, we base it on the items and the relationships between them. Some advantages of this technique are:

  • Items don’t change their nature: pants will always stay pants.
  • There are usually fewer items than people, so there is less computation.
  • It’s harder to game the system.

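Item-Based Filtering works with rating distributions per item instead of per user (as User-Based CF does). Because a single item typically collects ratings from many users, its rating distribution tends to be more stable, which makes the model more stable too.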

How it works

The system works in two phases. First, a model-building phase computes the similarity between all pairs of items. Second, a recommendation phase uses a user’s previous ratings of other products to find the items most similar to them and generate a list of recommendations.
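The recommendation phase can be sketched as follows, assuming the model-building phase has already produced an item-to-item similarity table. The scoring rule used here, a weighted average of the user’s own ratings weighted by item similarity, is one common choice rather than the only one; the similarity values, “Burger Barn”, “Sushi Place”, and the function name are hypothetical.

```python
def recommend(user_ratings, similarity, top_n=3):
    """Recommendation phase of item-based filtering (a sketch).

    user_ratings: {item: stars} for one user.
    similarity:   {item: {other_item: score}}, produced by the model-building phase.
    Returns up to top_n (item, predicted_stars) pairs, best first.
    """
    weighted_sum, weight_total = {}, {}
    for rated_item, stars in user_ratings.items():
        for other, sim in similarity.get(rated_item, {}).items():
            if other in user_ratings:
                continue  # only recommend items the user has not rated yet
            weighted_sum[other] = weighted_sum.get(other, 0.0) + sim * stars
            weight_total[other] = weight_total.get(other, 0.0) + sim
    predicted = {
        item: weighted_sum[item] / weight_total[item]
        for item in weighted_sum
        if weight_total[item] > 0
    }
    return sorted(predicted.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical similarity table and ratings, made up for illustration.
similarity = {
    "Luigi's Dinner": {"Pizza Heaven": 0.9, "Sushi Place": 0.2},
    "Burger Barn": {"Pizza Heaven": 0.2, "Sushi Place": 0.6},
}
print(recommend({"Luigi's Dinner": 5, "Burger Barn": 1}, similarity))
```

For a user who loved Luigi’s Dinner and disliked Burger Barn, Pizza Heaven ranks first (about 4.3 predicted stars) because it is most similar to the restaurant they loved, while Sushi Place ranks lower because it mostly resembles the one they disliked.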

There are many ways to implement Item-Based Filtering. Here is one:

Find every pair of restaurants that were visited by the same person. For each pair, look at every person who went to both restaurants and compare the ratings they gave each one. From this, you can compute the similarity between the two restaurants based on the ratings of the people who dined at both. If those ratings are roughly the same, we can say the restaurants are similar, because the people who dined at both rated them almost identically. Finally, sort everything by restaurant, and then by how strongly each other restaurant is similar to it. This, in short, is the technique behind “people who ate at … also ate at …”.
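The model-building steps above can be sketched in plain Python: invert the ratings to per-restaurant data, compare every pair of restaurants over the people who rated both, then sort by restaurant and by similarity. The similarity measure here, 1 / (1 + mean absolute rating difference), is a deliberately simple choice that matches the “ratings are roughly the same” idea; cosine similarity or Pearson correlation are more common in practice. The ratings, the user “Sam”, “Burger Barn”, and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical ratings, made up for illustration: user -> {restaurant: stars}
ratings = {
    "Marie": {"Pizza Heaven": 5, "Luigi's Dinner": 5, "Burger Barn": 1},
    "Bob": {"Pizza Heaven": 5, "Luigi's Dinner": 4, "Burger Barn": 2},
    "Sam": {"Pizza Heaven": 2, "Burger Barn": 5},
}


def item_similarities(ratings):
    """Model-building phase: similarity between every pair of co-rated restaurants."""
    # Invert the data to restaurant -> {user: stars}.
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, stars in items.items():
            by_item[item][user] = stars

    similar = defaultdict(list)
    for a, b in combinations(sorted(by_item), 2):
        # Only people who dined at both restaurants count.
        co_raters = by_item[a].keys() & by_item[b].keys()
        if not co_raters:
            continue
        # Identical ratings give 1.0; larger disagreements push the score toward 0.
        mean_diff = sum(abs(by_item[a][u] - by_item[b][u]) for u in co_raters) / len(co_raters)
        score = 1 / (1 + mean_diff)
        similar[a].append((b, score))
        similar[b].append((a, score))

    # Sort by restaurant, then by similarity strength (strongest first).
    return {
        item: sorted(pairs, key=lambda pair: pair[1], reverse=True)
        for item, pairs in sorted(similar.items())
    }


for restaurant, neighbours in item_similarities(ratings).items():
    print(restaurant, neighbours)
```

Pizza Heaven and Luigi’s Dinner come out as each other’s closest match, because Marie and Bob rated them almost the same, while Burger Barn was rated very differently by the same people.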

Example of item-based filtering

Another example with pizza

Marie liked both Pizza Heaven and Luigi’s Dinner and rated both of them. Bob also liked and rated both restaurants. We look at each pair of restaurants and all the users who rated both of them (here, Marie and Bob). If Marie and Bob both liked the two restaurants, or both disliked them, we can say the restaurants are similar to each other: their behavior establishes a relationship between the two restaurants. We also have Leah the Fire-Fighter, who rated Luigi’s Dinner but has never heard of Pizza Heaven.

Marie and Bob like both restaurants; Leah only knows about Luigi’s Dinner.

Thanks to the relationship between the two restaurants, derived from Marie’s and Bob’s behavior, we can now recommend Pizza Heaven to Leah.


Both techniques can give similar results, but they start from a different base (relationships between items versus relationships between users). Both remain centered on the behavior of all the people who watched/visited/voted/… the items.

So that covers it! I hope this helps anyone who wants to learn more about recommendation systems. If you want to dig deeper, I recommend (pun intended) following this course on Udemy. It gave me my first introduction to Machine Learning and Recommendation Systems.

Make sure to follow me for more problems and solutions I come across in React Native and Machine Learning!

Trying to figure things out while writing about it. Pixel-perfect friend, front-end developer and anything data-related geek