The importance of filtering in Recommender Systems

Why filtering recommendations matters and what to filter

Recommender Systems, a quick intro

Recommender Systems are by far one of the most successful applications of Big Data / Machine Learning.

Recommender Systems are an integral part of the success of Amazon, where they bring in more than 30% of revenue, and of Netflix, where 75% of what people watch comes from some sort of recommendation.

The goal of Recommender Systems is to find what is likely to be of interest to the user, thus enabling personalization and tailored services.

Most Recommender Systems look at user behaviour (the input data, e.g. what the user purchased or viewed in the past) and, through different techniques (the algorithms), produce a list of relevant items for the user (the output).

Given the impact on the bottom line, there has been significant research on Recommender Systems algorithms (e.g. collaborative filtering, content-based filtering, matrix factorization), boosted by the $1 million Netflix Prize competition. Since 2007, there has even been an ACM conference fully dedicated to the topic.

The importance of filtering recommendations

While most research has focused on algorithms, other aspects that matter a great deal when implementing a recommender system have received less attention. One is the UI and UX of recommender systems; the other is the filtering of recommendations.

Filtering is the act of removing items from recommendations to increase the relevance for the users.

Deciding what to filter out of the items recommended by the algorithms is often a business decision and varies from company to company.

What to filter

Technically, everything that is included in the input data may end up being recommended.

To increase relevance, filtering should be applied to the output of recommender systems algorithms, before presenting recommendations to the user.

Here is a list of what to filter:

  • What is not an item. Entries in the input data that are not “real” items should be filtered. For example, coupons may appear among customer transactions, but they should not be part of recommendations
  • Items phased out. The input data may contain items that are no longer sold. A special case is when an item has been replaced by a newer version. If you are able to connect the old and new versions, you can recommend the new version instead of just filtering out the old one
  • Items out of stock. Similar to items phased out, but in this case they are only temporarily not available. It may be a business decision whether to give more exposure to items which can be immediately sold
  • Seasonal items. Even if a user may be interested in an air conditioner, it may be unwise to recommend it while it is snowing outside
  • Items less attractive from a business perspective. For example, items that have a low or negative sales margin. Conversely, it may make sense to promote items with higher margins or items that are relevant to the business strategy
  • Items that can offend the user. Look out for sensitive categories such as religion, health, sex, politics, etc. Consider also that the same computer may be shared by different users (e.g. a parent and a child)
  • Items already purchased. Depending on the business, there may be items that are unlikely to be purchased more than once (e.g. a book). This is usually more common in B2C than in B2B. Purchased items have more than one grey zone. For example, consumables may be treated as purchased only for a limited time period. Returned items could also be treated as purchased, but again this may vary. Different configurations of the same item could be taken into consideration as well (e.g. if a user bought a red dress, the same dress in yellow could be filtered)
  • Items in the cart. Items in the cart can be treated as items already purchased
  • Items with significantly higher value. If I am shopping for a phone, I might also need a case, but recommending a phone while I am looking at a case makes less sense
  • Items with quality issues. Items with low ratings, bad reviews, frequent returns, etc. Selling them may bring more trouble than benefit, so it is better not to recommend them
  • Items of the same or different category. If your goal is cross-selling, it makes sense to recommend only items from categories different (e.g. accessories) from the one the user is looking at. If instead you want to help the user choose an item (e.g. which TV to buy), it makes sense to filter out items of other categories and recommend only alternatives from the same category
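
To make a few of the rules above concrete, here is a minimal sketch of a post-filter applied to the ranked output of an algorithm. Everything in it is an assumption made for illustration: the `Item` and `UserContext` structures, their field names, and the `category_mode` switch are hypothetical, and a real catalog will look different. It covers only non-items, phased-out items with a known replacement, out-of-stock items, purchased or in-cart items, and the same/different category rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Item:
    # Hypothetical catalog record; field names are illustrative only.
    item_id: str
    category: str
    is_real_item: bool = True          # False for coupons and similar records
    in_stock: bool = True
    discontinued: bool = False
    replaced_by: Optional[str] = None  # id of the newer version, if known


@dataclass
class UserContext:
    purchased: Set[str] = field(default_factory=set)
    cart: Set[str] = field(default_factory=set)
    viewed_category: Optional[str] = None  # category the user is looking at


def filter_recommendations(
    ranked_ids: List[str],
    catalog: Dict[str, Item],
    ctx: UserContext,
    top_k: int = 5,
    category_mode: str = "any",  # "any", "same" (alternatives), "different" (cross-sell)
) -> List[str]:
    """Apply business filters to a ranked list, preserving the ranking order."""
    kept: List[str] = []
    for item_id in ranked_ids:
        if len(kept) >= top_k:
            break
        item = catalog.get(item_id)
        # Phased-out items: substitute the newer version when one is known.
        if item is not None and item.discontinued:
            item = catalog.get(item.replaced_by) if item.replaced_by else None
        if item is None or item.discontinued or not item.is_real_item:
            continue
        if not item.in_stock:
            continue
        # Items in the cart are treated like items already purchased.
        if item.item_id in ctx.purchased or item.item_id in ctx.cart:
            continue
        if ctx.viewed_category is not None:
            same = item.category == ctx.viewed_category
            if (category_mode == "same" and not same) or (
                category_mode == "different" and same
            ):
                continue
        if item.item_id in kept:  # a substitution may duplicate a later entry
            continue
        kept.append(item.item_id)
    return kept


catalog = {
    "coupon": Item("coupon", "promo", is_real_item=False),
    "phone_v1": Item("phone_v1", "phones", discontinued=True, replaced_by="phone_v2"),
    "phone_v2": Item("phone_v2", "phones"),
    "charger": Item("charger", "accessories", in_stock=False),
    "book": Item("book", "books"),
    "case": Item("case", "accessories"),
}
ranked = ["coupon", "phone_v1", "charger", "book", "case", "phone_v2"]
ctx = UserContext(purchased={"book"}, viewed_category="phones")

print(filter_recommendations(ranked, catalog, ctx))                             # ['phone_v2', 'case']
print(filter_recommendations(ranked, catalog, ctx, category_mode="different"))  # ['case']
```

Rules such as seasonality, margins, or sensitive categories would slot in as further checks in the same loop; which ones to enable remains a business decision.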

How to choose what to filter

Filtering the right items can make your recommendations much more useful and improve user experience.

As already stated, the main drivers in choosing what to filter are business decisions and some common sense.

Still, in many cases the best choice is not evident upfront. Running A/B tests can provide the evidence needed to decide and give a boost to results.
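
As a rough sketch of how such a test could be wired up, the snippet below assigns each user to a variant deterministically by hashing the user id, and lets the variant decide which filters are active. The experiment name, the variant names, and the filter labels are all made up for illustration; measuring the outcome (clicks, conversions, revenue) is not shown.

```python
import hashlib
from typing import Dict, Sequence, Set


def assign_variant(
    user_id: str,
    experiment: str,
    variants: Sequence[str] = ("control", "treatment"),
) -> str:
    """Map a user to a variant so that the same user always gets the same one."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Hypothetical experiment: does hiding out-of-stock items improve results?
ACTIVE_FILTERS: Dict[str, Set[str]] = {
    "control": {"already_purchased"},
    "treatment": {"already_purchased", "out_of_stock"},
}


def filters_for(user_id: str, experiment: str = "hide-out-of-stock") -> Set[str]:
    """Return the set of filter names to apply for this user."""
    return ACTIVE_FILTERS[assign_variant(user_id, experiment)]


print(filters_for("user-42"))
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user who lands in the treatment group of one test is not automatically in the treatment group of the next.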

There may also be technical constraints on what can be filtered. Filtering in real time may be expensive, making recommendations slow and unresponsive. One solution may be to filter upfront, for example by removing items from the input data.
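
Filtering upfront can be as simple as dropping interactions with excluded items before the algorithm ever sees them. The sketch below assumes the input data is a list of `(user_id, item_id)` pairs, which is a simplification; the function name and the data are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Interaction = Tuple[str, str]  # (user_id, item_id), an assumed input format


def prefilter_interactions(
    interactions: Iterable[Interaction],
    excluded_item_ids: Set[str],
) -> List[Interaction]:
    """Drop interactions with excluded items before running the algorithm."""
    return [
        (user_id, item_id)
        for user_id, item_id in interactions
        if item_id not in excluded_item_ids
    ]


interactions = [("u1", "phone"), ("u1", "coupon"), ("u2", "case"), ("u2", "coupon")]
print(prefilter_interactions(interactions, {"coupon"}))
# [('u1', 'phone'), ('u2', 'case')]
```

This works best for rules that are the same for every user and change slowly, such as non-items or phased-out items. Rules that depend on the individual user or on fast-changing state, such as items in the cart or stock levels, will probably still need a check at request time.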

The end goal is always to give the user the best possible experience.

The list of items to filter is based on my experience. Have you encountered something else? Do you treat some items differently?