Recommender systems explained

In this article, I give an overview of the broad area of recommender systems and explain how the individual algorithms work.

I will start with a definition. A recommender system is a technology deployed in an environment where items (products, movies, events, articles) are to be recommended to users (customers, visitors, app users, readers), or the other way around. Typically, there are many items and many users in the environment, making the problem hard and expensive to solve. Imagine a shop. A good merchant knows the personal preferences of her customers, and her high-quality recommendations keep customers satisfied and increase profits. In online marketing and shopping, personal recommendations can be generated by an artificial merchant: the recommender system.

To build a recommender system, you need a dataset of items and users, and ideally also the interactions of users with items. There are many application domains — typically, users are customers, items are products, and interactions are individual purchases. In the figure, users are card holders, items are card terminals and interactions are transactions. From such a dataset, rules can be generated that show how users interact with items. In this case, rules based on card transactions in the Czech Republic can be used to recommend shops to visit.

Knowledge-based recommender systems

Both users and items have attributes. The more you know about your users and items, the better results you can expect. Below is an example of item attributes relevant for recommendation:

Item: TV2305
{
  "name": "Television TV2305",
  "short description": "HD resolution LED TV",
  "long description": "Enjoy a movie with your family on the weekend with this HD television from Micromax. With an 81cm (32) display, you can view every single detail with rich detail and clarity. This LED TV produces a resolution of 1366 x 768 pixels with a refresh rate of 60Hz to display crisper images and fluid picture movement. Play HD videos and enjoy a 178 degree viewing angle so that everyone in the family, even those at the sides, can see. Connect HD devices such as BluRay players, PlayStations or HD Set Top Boxes to this television as it has an HDMI port. You can also connect an HDD or USB device to this TV via its USB port. Get a surround sound effect in your living room as this TV comes with two 8W speakers to deliver crisp sound across all your media. With a 5 band equalizer and an auto volume leveler feature, you can enjoy a movie's soundtrack or the latest hit single the way it was meant to be heard.",
  "price": 250,
  "categories": ["Electronics", "Televisions"]
}

Such attributes are very useful: data mining methods can extract knowledge in the form of rules and patterns that are subsequently used for recommendation. For example, the item above is represented by several attributes that can be used to measure the similarity of items. Even the long text description can be processed by advanced NLP tools. Recommendations are then generated based on item similarity. When users are also described by similar attributes (e.g. text extracted from the CVs of job applicants), you can recommend items based on user-item attribute similarities. Note that in this case we do not use past user interactions at all. This approach is therefore very effective for so-called "cold start" users and items: typically new users and new items.
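As a minimal sketch of this idea, attribute-based item similarity can be computed as the cosine similarity of bag-of-words vectors built from textual attributes. The item names and descriptions below are illustrative, not from a real catalog:

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Tokenize a text attribute into a lowercase word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

items = {
    "TV2305": "HD resolution LED TV with HDMI and USB ports",
    "TV9000": "4K LED TV with HDMI ports and USB playback",
    "BLEND1": "Kitchen blender with five speed settings",
}

vectors = {item: bag_of_words(desc) for item, desc in items.items()}

# The two televisions share many attribute words; the blender shares few.
print(cosine(vectors["TV2305"], vectors["TV9000"]))
print(cosine(vectors["TV2305"], vectors["BLEND1"]))
```

In practice you would weight the terms (e.g. by TF-IDF) and include structured attributes such as price and categories, but the principle stays the same.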

Content-based recommender systems

Such systems recommend items similar to those a given user has liked in the past, regardless of the preferences of other users. Basically, there are two different types of feedback.

Explicit feedback is intentionally provided by users in the form of clicking "like"/"dislike" buttons, rating an item with a number of stars, etc. In many cases, it is hard to obtain explicit feedback data, simply because users are not willing to provide it. Instead of clicking "dislike" on an item they do not find interesting, users will rather leave the web page or switch to another TV channel.

Implicit feedback data, such as "user viewed an item", "user finished reading the article" or "user ordered a product", are however often much easier to collect and can also help us compute good recommendations. Various types of implicit feedback include:

Interactions (implicit feedback):
- user viewed an item
- user viewed item's details
- user added an item to cart
- user purchased an item
- user read an article to the end

Again, you can expect better performance of a recommender system when the feedback is rich.
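One common way to use such events is to map each interaction type to a weight and aggregate the weights into a user-item interaction matrix. The weights below are illustrative choices, not universal constants:

```python
from collections import defaultdict

# Illustrative weights: stronger signals get larger values.
EVENT_WEIGHTS = {
    "view": 1.0,
    "detail_view": 2.0,
    "add_to_cart": 4.0,
    "purchase": 8.0,
}

def build_interaction_matrix(events):
    """Aggregate (user, item, event_type) triples into weighted scores."""
    matrix = defaultdict(float)
    for user, item, event_type in events:
        matrix[(user, item)] += EVENT_WEIGHTS[event_type]
    return dict(matrix)

events = [
    ("alice", "TV2305", "view"),
    ("alice", "TV2305", "add_to_cart"),
    ("alice", "TV2305", "purchase"),
    ("bob", "TV2305", "view"),
]

matrix = build_interaction_matrix(events)
print(matrix[("alice", "TV2305")])  # 13.0
print(matrix[("bob", "TV2305")])    # 1.0
```

The resulting matrix can then be fed to any of the collaborative algorithms discussed below.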

Content-based recommenders work solely with the past interactions of a given user and do not take other users into consideration. The prevailing approach is to compute the attribute similarity of the user's recent items and recommend similar items. Here I need to point out one interesting observation from our business: recommending the recent items themselves is often a very successful strategy, although it of course works only in certain domains and for certain positions.
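A minimal sketch of this approach: score each candidate item by its similarity to the user's most recent items. The Jaccard similarity over category sets used here stands in for any attribute similarity, and the item catalog is invented for illustration:

```python
def jaccard(a, b):
    """Set-overlap similarity between two attribute sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy catalog: each item is described by a set of categories.
item_categories = {
    "tv1": {"electronics", "televisions"},
    "tv2": {"electronics", "televisions", "led"},
    "mixer": {"kitchen", "appliances"},
    "toaster": {"kitchen", "appliances", "breakfast"},
}

def recommend(recent_items, n=2):
    """Rank unseen items by their best similarity to a recently seen item."""
    candidates = set(item_categories) - set(recent_items)
    scored = {
        c: max(jaccard(item_categories[c], item_categories[r]) for r in recent_items)
        for c in candidates
    }
    return sorted(scored, key=scored.get, reverse=True)[:n]

print(recommend(["tv1"]))
```

Note that only the given user's history enters the computation; no other users are consulted.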

Collaborative filtering

The last group of recommendation algorithms is based on the past interactions of the whole user base. These algorithms are far more accurate than the algorithms described in the previous sections when a "neighborhood" is well defined and the interaction data are clean.

A very simple and popular method is the neighborhood-based algorithm (k-NN) described above. To construct a recommendation for a user, the k nearest neighbor users (those with the most similarly rated items) are examined. Then, the top N additional items (not overlapping with the items rated by the user) are recommended.
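A sketch of the user-based variant on a toy rating matrix, using cosine similarity between rating vectors to pick neighbors. This omits the sparsity optimizations a production system would need:

```python
import math

# Toy user-item rating matrix, stored sparsely.
ratings = {
    "alice": {"a": 5, "b": 4, "c": 1},
    "bob":   {"a": 4, "b": 5, "d": 2},
    "carol": {"c": 5, "d": 4, "e": 5},
}

def cosine(u, v):
    """Cosine similarity of two sparse rating dicts."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend(user, k=2, n=2):
    """Recommend top-n unseen items from the k most similar users."""
    neighbors = sorted(
        (u for u in ratings if u != user),
        key=lambda u: cosine(ratings[user], ratings[u]),
        reverse=True,
    )[:k]
    scores = {}
    for u in neighbors:
        w = cosine(ratings[user], ratings[u])
        for item, r in ratings[u].items():
            if item not in ratings[user]:  # only items the user has not rated
                scores[item] = scores.get(item, 0.0) + w * r
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend("alice"))
```

Here alice is much closer to bob than to carol, so bob's items dominate the recommendation.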

This approach works perfectly fine not only for mainstream users and popular items, but also for "long-tail" users. By controlling how many neighbors are taken into consideration, one can tune the algorithm and find a balance between recommending bestsellers and niche items. A good balance is crucial for the performance of the system, which will be discussed in the second part of this article.

There are two major variants of neighborhood algorithms: item-based and user-based collaborative filtering. Both operate on a matrix of user-item ratings.

In the user-based approach, for user u, a score for an unrated item is produced by combining the ratings of users similar to u.

In the item-based approach, a rating (u, i) is produced by looking at the set of items similar to i (in terms of interaction similarity); the ratings by u of those similar items are then combined into a predicted rating.

The advantage of the item-based approach is that item similarity is more stable and can be efficiently pre-computed.

In our experience, user-based algorithms outperform item-based algorithms in most scenarios and databases. The only exception may be databases with a significantly lower number of items than users and a low number of interactions.

The k-nearest neighbor algorithm is not the only solution to the collaborative filtering problem. The rule-based algorithm above uses APRIORI to generate a set of rules from the interaction matrix. The rules with sufficient support are subsequently used to generate candidate items for recommendation.
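The core computation can be sketched by mining pairwise rules, with their support and confidence, from per-user interaction sets. This is a simplified stand-in for a full APRIORI implementation, and the baskets are invented:

```python
from itertools import permutations

# Each user's set of interacted items (e.g. visited card terminals).
baskets = [
    {"atm", "grocery", "bakery"},
    {"atm", "grocery"},
    {"atm", "bakery"},
    {"grocery", "pharmacy"},
]

def mine_rules(baskets, min_support=2, min_confidence=0.5):
    """Return {(source, target): confidence} for rules with enough support."""
    items = set().union(*baskets)
    rules = {}
    for src, dst in permutations(items, 2):
        support = sum(1 for b in baskets if src in b and dst in b)
        src_count = sum(1 for b in baskets if src in b)
        if support >= min_support:
            confidence = support / src_count
            if confidence >= min_confidence:
                rules[(src, dst)] = confidence
    return rules

rules = mine_rules(baskets)
print(rules[("atm", "grocery")])
```

Support filters out coincidental co-occurrences; confidence then ranks the surviving rules for recommendation.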

An important difference between k-NN and rule-based algorithms is the speed of learning and recall. Machine learning models operate in two phases: in the learning phase, the model is constructed, and in the recall phase, the model is applied to new data. Rule-based algorithms are expensive to train, but their recall is fast. k-NN algorithms are just the opposite, which is why they are also called lazy learners. In recommender systems, it is important to update the model frequently (after each user interaction) to be able to generate new recommendations instantly. Whereas lazy learners are easy to update, rule-based models have to be retrained, which is particularly challenging in large production environments.

At Recombee, we designed a lazy variant of rule-based recommendation that allows us to mine rules on the fly and update the model instantly with incoming interactions.

Rules generated from interactions of users with items.

Rules can be visualized, which is a great tool for inspecting the quality of the data and problems in your database. The figure shows rules with sufficient support in the interaction matrix. Each arrow is a rule (or implication) indicating that enough users interacted with the source item and subsequently with the target item. The strength of the connection is the confidence of the rule.

A detailed view of the rules above. Each rule is represented by an arrow; the size of the arrow is the confidence of the rule.

These particular rules were generated from card transaction data provided by a bank. Items are "card terminals" and users are "card holders"; interactions are individual transactions. We omit labels because the data are confidential and a lot can be derived from the rules. In the first image, clusters of rules are apparent; these are apparently card terminals that are geographically close to each other. There are also interesting rules showing the shopping habits of users. When you upload your data to our recommender (directly or using our Keboola app, which is even faster), we can generate such rules for you, and you can inspect the interesting rules in your own data.

But the primary purpose of the rules above is not data analytics but recommendation. One can generate recommendations for individual card holders based on their recent transactions (e.g. people who withdraw money from this ATM typically spend it in the following shops). A bank can build smart data products on top of such recommendations (e.g. offering a bonus for shopping in recommended places). Such data products can be created almost everywhere. Do you have an idea how recommendation-powered data products could improve your business? Let us know and we can help you validate it.


The last, and probably most interesting, class of collaborative filtering algorithms described here comprises the factorization-based approaches. Above, the matrix of interactions is factorized into two small matrices, one for users and one for items, each with a certain number of latent components (typically several hundred). The (u, i) rating is obtained by multiplying the corresponding row and column of these two small matrices. There are several approaches to decomposing the matrices and training them. The one shown above is a simple gradient descent technique. The error can be minimized by Stochastic Gradient Descent, Alternating Least Squares or the Coordinate Descent Algorithm. There are also SVD-based approaches, where the rating matrix is decomposed into three matrices.
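A minimal sketch of the SGD variant: learn latent factor vectors for users and items so that their dot products approximate the observed ratings. The data, dimensions and hyperparameters below are toy choices:

```python
import random

random.seed(0)

# Observed (user, item, rating) triples; the matrix is otherwise sparse.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 4.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2  # k = number of latent components

# Small random latent factors for users (P) and items (Q).
P = [[random.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
Q = [[random.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    """Predicted rating is the dot product of the two latent vectors."""
    return sum(P[u][f] * Q[i][f] for f in range(k))

lr, reg = 0.05, 0.01  # learning rate and L2 regularization strength
for epoch in range(200):
    for u, i, r in observed:
        err = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)   # gradient step for the user factor
            Q[i][f] += lr * (err * pu - reg * qi)   # gradient step for the item factor

rmse = (sum((r - predict(u, i)) ** 2 for u, i, r in observed) / len(observed)) ** 0.5
print(round(rmse, 3))
```

After training, `predict(u, i)` also yields scores for the unobserved cells, which is exactly what gets recommended.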

Generally, this is a very interesting and mature field of machine learning. Here is some further reading if you are interested: the Facebook approach to scalable recommendation based on matrix factorization, dealing with implicit ratings, or various metrics.

As you can see, there are plenty of algorithms, and each algorithm has parameters that help us find the right plasticity of the models. One of my further posts will discuss ensembles of recommendation models that can further improve the quality of recommendations.

How can the quality of the algorithms be measured? This is another complex issue.

“Bad” recommendations are hard to detect and prevent in general. They are often domain specific and have to be filtered out.

Offline evaluation of recommendation algorithms, showing how ALS-based matrix factorization can outperform user-based k-NN. Details in the next post.

Our next post will be about evaluation of recommender systems.

Online evaluation of quality and optimization towards better performing recommendations. Details in the next post.

There are several strategies for evaluating recommenders, both offline and online. At Recombee, accurate quality estimates help us optimize the parameters of the system automatically and improve the performance of the recommender for all scenarios.

You can find out which combination of algorithms is most efficient for your data. We have prepared a free instant account for this purpose, so you can experiment using our API or clients.

Here is how to start and build your recommender system in hours.