Ghost in the Machine Learning: Recommender Systems

Rodrigo Sousa Coutinho
OutSystems Engineering
Nov 18, 2016

When you’re browsing sites like Netflix or Amazon, you get recommendations on what to watch or buy next. This is actually a big deal, and companies have offered huge prizes to make these algorithms better: Netflix, for instance, paid out $1 million in its Netflix Prize competition.

One way to make these predictions is with a recommender system. There are two main types of recommender systems:

  • Collaborative filtering: These systems look for users with similar preferences. For instance, let’s say Bob and Sue have similar movie ratings on Netflix. If Bob likes a given movie, the system will assume Sue will probably like it too, even if she hasn’t rated it yet.
  • Content-based: These systems take into account the characteristics of items being recommended. So, if Bob gives high ratings to action movies, the system will recommend more action movies.

The Utility Matrix of Recommender Systems

Content-based and collaborative filtering systems both make use of a utility matrix. In a utility matrix, each value represents a user’s “degree of preference” for an item. In the case of Netflix, users can rate movies with 1 to 5 stars, so the matrix values range from 1 to 5.

       Suicide Squad   LoTR   Casablanca   Twilight
---------------------------------------------------
Bob          5           4         1
Sue                      1         5           4
Joe          4                                 1
---------------------------------------------------

The goal of the recommender system is to fill in the blank cells, that is, to predict the ratings users haven’t given yet.

Collaborative Filtering

Collaborative filtering recommender systems work by finding users that are similar to each other. So, how do we do this for Joe, for example?

One way is to use the cosine similarity of the normalized ratings between Joe and everyone else, where normalizing means subtracting each user’s average rating from their ratings and treating blank cells as 0. Sounds complicated, but the idea is simple. If you imagine each user’s ratings as a vector, similar users have vectors that point in roughly the same direction. Here’s a bit of math to blind you with science:
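Assuming the standard definition of cosine similarity, with x and y being the normalized rating vectors of two users, the formula reads:

```latex
\operatorname{sim}(\mathbf{x}, \mathbf{y})
  = \frac{\mathbf{x} \cdot \mathbf{y}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert}
  = \frac{\sum_{i} x_i \, y_i}{\sqrt{\sum_{i} x_i^{2}} \, \sqrt{\sum_{i} y_i^{2}}},
\qquad
x_i =
\begin{cases}
  r_{x,i} - \bar{r}_x & \text{if user } x \text{ rated movie } i,\\
  0 & \text{otherwise,}
\end{cases}
```

where r(x,i) is user x’s rating of movie i and r̄(x) is x’s average rating.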

This equation yields values between -1 (very different ratings) and 1 (very similar ratings).

If we apply this equation to Joe and Bob (using their normalized vectors), we get 0.4, and if we apply it to Joe and Sue, we get -0.16. This means Joe is much closer to Bob than he is to Sue, so recommending Lord of the Rings, which Bob liked, is a better idea than recommending Casablanca, which Bob disliked and Sue loved.
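Here is a minimal Python sketch of that calculation. It assumes the utility matrix above (with None for blank cells) and that “normalized” means subtracting each user’s average rating, with blanks counted as 0. All function names are illustrative, and `rank_unrated` is a hypothetical helper: it scores each movie Joe hasn’t rated with a similarity-weighted sum of the other users’ normalized ratings, which is one common choice, not necessarily the exact rule the post has in mind.

```python
import math

# Utility matrix from the example; None marks a movie the user hasn't rated.
MOVIES = ["Suicide Squad", "LoTR", "Casablanca", "Twilight"]
RATINGS = {
    "Bob": [5, 4, 1, None],
    "Sue": [None, 1, 5, 4],
    "Joe": [4, None, None, 1],
}


def normalize(ratings):
    """Subtract the user's average rating; unrated movies become 0."""
    rated = [r for r in ratings if r is not None]
    mean = sum(rated) / len(rated)
    return [r - mean if r is not None else 0.0 for r in ratings]


def cosine(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_similarity(u, v):
    """Cosine similarity between two users' normalized rating vectors."""
    return cosine(normalize(RATINGS[u]), normalize(RATINGS[v]))


def rank_unrated(user):
    """Hypothetical scoring rule: for each movie `user` hasn't rated, sum the other
    users' normalized ratings, each weighted by their similarity to `user`."""
    scores = {}
    for i, movie in enumerate(MOVIES):
        if RATINGS[user][i] is not None:
            continue  # already rated, nothing to predict
        scores[movie] = sum(
            user_similarity(user, other) * normalize(RATINGS[other])[i]
            for other in RATINGS
            if other != user
        )
    return sorted(scores, key=scores.get, reverse=True)


print(round(user_similarity("Joe", "Bob"), 2))  # 0.4
print(round(user_similarity("Joe", "Sue"), 2))  # -0.16
print(rank_unrated("Joe"))  # ['LoTR', 'Casablanca']
```

Because Sue’s similarity to Joe is negative, her high rating for Casablanca actually counts against it in this scoring, which pushes Lord of the Rings further ahead.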

Content-Based

In content-based recommender systems, the idea is to match users’ preferences with the characteristics of each item. These characteristics can be set manually (such as a movie’s genre) or extracted automatically (such as keywords pulled from movie descriptions).

Here’s an example of item profiles and a user profile based on IMDB’s genre classification.

              Action  Adventure  Fantasy  Drama  Romance  War
-------------------------------------------------------------
S. Squad        1         1         1
LoTR                      1         1       1
Casablanca                                  1       1      1
Twilight                            1       1       1
-------------------------------------------------------------
Bob            0.6       0.8       0.8    -0.5    -0.8   -0.8

Bob’s row in the matrix is built from his normalized ratings in the previous matrix: for each genre, his normalized ratings for the movies in that genre are added up and divided by the number of movies he rated. Let’s use fantasy as an example. Bob’s average rating is 3.3, and he rated 3 movies. If we apply another fancy equation, we get 0.8 as Bob’s “preference” for fantasy films.
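A formula consistent with these numbers, offered as an assumption about its exact form: for a user u who rated n(u) movies with average rating r̄(u), the preference for genre g sums u’s normalized ratings over the rated movies that have genre g and divides by n(u). For Bob and fantasy, the rated fantasy films are Suicide Squad (5) and LoTR (4); Twilight is also tagged fantasy, but Bob hasn’t rated it, so it doesn’t count.

```latex
p_{u,g} = \frac{1}{n_u} \sum_{\substack{i \text{ rated by } u \\ i \text{ has genre } g}} \left( r_{u,i} - \bar{r}_u \right)

p_{\text{Bob},\,\text{Fantasy}}
  = \frac{1}{3}\left[\left(5 - \tfrac{10}{3}\right) + \left(4 - \tfrac{10}{3}\right)\right]
  = \frac{1}{3} \cdot \frac{7}{3}
  = \frac{7}{9} \approx 0.78 \approx 0.8
```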

Another complicated equation, just to figure out how much Bob loves fantasy films.

Next, we want to decide whether to recommend Twilight to Bob. Using the same cosine similarity equation as before, this time between Bob’s profile and Twilight’s row of genres, the result is -0.16. So Twilight is not a good recommendation for Bob!
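The content-based steps can be sketched in Python as follows. This assumes the genre assignments in the table (Suicide Squad: Action, Adventure, Fantasy; LoTR: Adventure, Drama, Fantasy; Casablanca: Drama, Romance, War; Twilight: Drama, Fantasy, Romance) and a profile rule that divides by the number of movies the user rated. The function names are illustrative, not from any library.

```python
import math

# Genre flags per movie, assumed from the item-profile table.
GENRES = ["Action", "Adventure", "Fantasy", "Drama", "Romance", "War"]
ITEM_GENRES = {
    "Suicide Squad": {"Action", "Adventure", "Fantasy"},
    "LoTR": {"Adventure", "Drama", "Fantasy"},
    "Casablanca": {"Drama", "Romance", "War"},
    "Twilight": {"Drama", "Fantasy", "Romance"},
}
BOB = {"Suicide Squad": 5, "LoTR": 4, "Casablanca": 1}  # Bob hasn't rated Twilight


def item_vector(movie):
    """1.0 for each genre the movie has, 0.0 otherwise, in GENRES order."""
    return [1.0 if g in ITEM_GENRES[movie] else 0.0 for g in GENRES]


def user_profile(ratings):
    """Per genre: sum of the user's mean-centered ratings for rated movies
    in that genre, divided by the number of movies the user rated."""
    mean = sum(ratings.values()) / len(ratings)
    profile = []
    for g in GENRES:
        total = sum(r - mean for m, r in ratings.items() if g in ITEM_GENRES[m])
        profile.append(total / len(ratings))
    return profile


def cosine(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


profile = user_profile(BOB)
print([round(p, 1) for p in profile])  # [0.6, 0.8, 0.8, -0.6, -0.8, -0.8]
print(round(cosine(profile, item_vector("Twilight")), 2))  # -0.18

# Same check using the one-decimal profile values shown in the table.
table_profile = [0.6, 0.8, 0.8, -0.5, -0.8, -0.8]
print(round(cosine(table_profile, item_vector("Twilight")), 2))  # -0.16
```

Rounded to one decimal, the computed profile matches the table except for Drama (about -0.56 versus the -0.5 shown), which is why the unrounded similarity comes out near -0.18 rather than -0.16. Both values are negative, so the conclusion about Twilight stays the same.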

Conclusion

When you are able to recommend the right movie, product, or article to your users, you have a huge impact on the overall user experience. Not to mention your bottom line.

This post was just an introduction to recommender systems, but I hope it gives you an idea of how they work and what they’re capable of. Post in the comments if you have any questions or suggestions!

And don’t forget to click the little heart in the corner… you’ll be helping Medium’s recommender system. :)

Hi! I’m co-founder and Director of Data Science at OutSystems, with a passion for data, great products, and geeky stuff!