Recommender Systems — Ranking and Classifications

Carlos Pinela
4 min read · Oct 29, 2017


Since May I’ve been attending a Big Data certification course. It is organized into units, and I am currently in the unit on recommender systems, taught by Denis Parra.

I’m starting this series of posts to summarize what I have learned and share my perspective on it.

In this post we will talk about:

  • Why are we interested in Recommender Systems
  • How to sort and rank items
  • Recommender Systems Classifications
  • Brief parenthesis: How to measure prediction error

Without further ado, let’s begin.

Why are we interested in Recommender Systems

In two words: Information Overload. As humans, it is natural for us to filter everything we perceive through some criteria of importance. But there is a limit to how much we can process at a time, and to avoid being overwhelmed we need strategies that reduce the complexity of what we are perceiving. That is why it is so important, especially in this new era of Big Data, to have systems with heuristic techniques that make our selection process easier. RecSys specializes particularly in the problem of selecting items from a crowded catalog of options, items that are rated by users.

“Recommender Systems aim to help a user or a group of users to select items from a crowded item or information space.”

(McNee et al., 2006)


Now we will go into more detail on one of the basic aspects of Recommender Systems: ratings and how to sort items by them.

How to sort and rank items

Suppose we have an app where users rate items. Think of a site like Netflix or Amazon. Clearly, we need to compare all these items in order to recommend one. How do we do it? We need some kind of score.

On his blog, in the post “How Not To Sort By Average Rating”, Evan Miller describes three possible ways to score items based on their ratings: two wrong approaches and one correct solution:

1. Score = (Positive ratings) - (Negative ratings)

Why is it wrong? Because this algorithm doesn’t consider the total number of ratings. It isn’t normalized. Here is an example from Evan Miller’s post:

“Suppose one item has 600 positive ratings and 400 negative ratings: 60% positive. Suppose item two has 5,500 positive ratings and 4,500 negative ratings: 55% positive. This algorithm puts item two (score = 1000, but only 55% positive) above item one (score = 200, and 60% positive).”
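To make that concrete, here is the same arithmetic as a quick Python sanity check (a minimal sketch; the variable names are just placeholders):

```python
# Naive score: positives minus negatives, ignoring the total volume of ratings.
item_one = 600 - 400     # score 200, yet 60% positive
item_two = 5500 - 4500   # score 1000, yet only 55% positive

# Item two wrongly ranks above item one.
assert item_two > item_one
```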

2. Score = Average rating = (Positive ratings) / (Total ratings)

Why is it wrong? It doesn’t work well when we have a small number of observations. Miller gives us a clear example:

“Average rating works fine if you always have a ton of ratings, but suppose item 1 has 2 positive ratings and 0 negative ratings. Suppose item 2 has 100 positive ratings and 1 negative rating. This algorithm puts item two (tons of positive ratings) below item one (very few positive ratings).”
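Again, the failure is easy to reproduce with the numbers from the quote (a minimal sketch):

```python
# Average rating: positives over total, ignoring how many ratings there are.
item_one = 2 / (2 + 0)       # 1.00, but from only 2 ratings
item_two = 100 / (100 + 1)   # ~0.99, from 101 ratings

# Item one wrongly ranks above item two.
assert item_one > item_two
```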

3. Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

I know, right? We got fancy all of a sudden, but it is just a matter of looking at the formula for the score. I think the most important thing to understand here is that a good score needs to consider the number of positive ratings, normalized by the total number of ratings. Additionally, the score must consider how representative those values are compared to the distribution across the different items, because an item with a small number of ratings can’t earn the same score as one with a lot of ratings. It needs a weight.

In Miller’s words:

“We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the “real” fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by:

$$\frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}}$$

“(Use minus where it says plus/minus to calculate the lower bound.) Here $\hat{p}$ is the observed fraction of positive ratings, $z_{\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, and $n$ is the total number of ratings.”
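Here is a sketch of how the lower bound could be computed in Python, assuming SciPy for the normal quantile (wilson_lower_bound is a hypothetical helper name, not code from Miller’s post):

```python
from math import sqrt

from scipy.stats import norm


def wilson_lower_bound(pos, n, confidence=0.95):
    """Lower bound of the Wilson score interval for the true
    fraction of positive ratings, given pos positives out of n total."""
    if n == 0:
        return 0.0
    z = norm.ppf(1 - (1 - confidence) / 2)  # ~1.96 for 95% confidence
    phat = pos / n  # observed fraction of positive ratings
    return ((phat + z * z / (2 * n)
             - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
            / (1 + z * z / n))
```

With this score, both earlier examples sort the right way: wilson_lower_bound(600, 1000) ≈ 0.57 beats wilson_lower_bound(5500, 10000) ≈ 0.54, and wilson_lower_bound(100, 101) ≈ 0.95 beats wilson_lower_bound(2, 2) ≈ 0.34.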

After this analysis of how to compare items, we can go deeper into the different Recommender Systems that exist.

Recommender Systems Classifications

Before we get more specific, we need to understand the different classifications RecSys can have.

Considering the data used:

  1. Rule-based (Decision Tree)
  2. Content-based
  3. Collaborative Filtering

Considering the Model:

  1. Memory-based (e.g., KNN): no model is trained, but it scales badly
  2. Model-based (e.g., SVC, Logistic Regression): the model is expensive to train, but it gives quick recommendations

We will go into more detail on each of these in future posts.
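In the meantime, to make the memory-based idea tangible, here is a minimal sketch of user-based KNN prediction with cosine similarity (the rating matrix R and the predict helper are made up for illustration):

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items; 0 = unrated).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def predict(user, item, k=2):
    """Memory-based prediction: similarity-weighted average of the
    ratings the k most similar users gave to this item."""
    sims = np.array([cosine_sim(R[user], R[v]) for v in range(len(R))])
    sims[user] = -1.0  # exclude the user themself
    neighbors = sims.argsort()[-k:]  # indices of the k nearest users
    weights = sims[neighbors]
    return weights @ R[neighbors, item] / (weights.sum() + 1e-9)

print(predict(user=0, item=2))  # predicted score for an item user 0 hasn't rated
```

Notice that nothing is trained here: every prediction scans the whole user base, which is exactly why this approach scales badly as the number of users grows.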

To recap

We saw why we need Recommender Systems, and we saw the pitfalls we need to avoid to create a good rating score. We also talked about the different classifications that RecSys can have.

In the next post I will continue this series, going into more detail on User-Based Collaborative Filtering.
