Recommender Systems with AWS Sagemaker
Systems that show items or products to users according to their taste are known as recommender systems. This is also referred to as personalization, as the name implies. Recommenders let users see relevant items, improving both customer experience and revenue. This text introduces the background of recommender systems and explains how to leverage AWS Sagemaker to develop and deploy one.
From the developer’s perspective, this task can be formulated in several ways. The best-known example is movies. Assume that people visit a movie website and rate the movies they watch, giving a score between 0 and 10. Naturally, this can be formulated as a regression problem. In this case, users state what they liked and disliked. This is called explicit feedback, and it provides good insight for the business. It can be shown with a user-product matrix like the one below.
In this table, rows are users and columns are products (such as movies). The values in the matrix are the ratings users gave to products. If a user did not rate a movie, that cell is blank.
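Such an explicit-feedback matrix can be sketched as a small NumPy array, with `np.nan` standing in for the blank cells. The matrix and its values below are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical explicit-feedback matrix: rows are users, columns are movies.
# Ratings are on a 0-10 scale; np.nan marks movies a user has not rated.
ratings = np.array([
    [8.0, np.nan, 3.0, np.nan],
    [np.nan, 9.0, np.nan, 2.0],
    [7.0, 6.0, np.nan, np.nan],
])

# The mask of observed entries is what a regression model is fit on.
observed = ~np.isnan(ratings)
print(observed.sum())  # → 6 known ratings
```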
In most cases, explicit feedback is not available, as most users do not rate items. The developer then has to assume that if a user interacted with or purchased an item, that counts as positive feedback. This is called implicit feedback. The user-product matrix would look like below:
In this case the target is binary, so the task can be formulated as a classification problem. Since a user interacts with several items, it is best framed as a multi-label classification problem.
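Building the implicit-feedback matrix from raw interaction logs can be sketched as below. The interaction pairs are made up for illustration; in practice they would come from purchase or click data:

```python
import numpy as np

# Hypothetical interaction log: (user_id, item_id) pairs from purchases/clicks.
interactions = [(0, 1), (0, 3), (1, 0), (2, 1), (2, 2)]

n_users, n_items = 3, 4
# Implicit-feedback matrix: 1 if the user interacted with the item, else 0.
Y = np.zeros((n_users, n_items), dtype=int)
for u, i in interactions:
    Y[u, i] = 1

# Each row of Y is a multi-label classification target for one user.
print(Y)
```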
With a problem formulation at hand, there are several algorithms to solve it, for both the regression and the classification form. Conventionally, two well-known methods stand out: content-based filtering and collaborative filtering.
Content-based filtering relies on existing product features to learn user features. This method is basically fitting a linear regression model for each user based on the user’s past purchases.
The user-product matrix can then be extended to include these features.
On the top right, highlighted with the yellow marker, are the existing product features. Depending on the context, these can be product attributes such as price, color, and size, or engineered features such as tf-idf in the case of text. On the left, highlighted with the blue marker, are the user vectors; ϴij denotes feature j of user i. It is now clearer why content-based filtering fits a separate regression model per user: each user has their own set of parameters to learn. The task is to obtain the user-product matrix, highlighted with the green marker, when the user matrix and product matrix are multiplied (dot product). The regression is fit to minimize the difference between the actual user-product matrix and this multiplication result.
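The per-user regression can be sketched as an ordinary least-squares fit against the known product features. The feature values, rated indices, and ratings below are hypothetical:

```python
import numpy as np

# Hypothetical product feature matrix X (one row per product),
# e.g. columns could be [scaled price, is_action_movie].
X = np.array([[1.0, 0.2],
              [0.5, 0.9],
              [0.8, 0.4],
              [0.1, 0.7]])

rated_idx = [0, 1, 2]            # products this particular user has rated
y = np.array([4.0, 2.0, 3.5])    # the user's ratings for those products

# Content-based filtering fits a separate regression per user:
# learn theta so that X[rated_idx] @ theta ≈ y (least squares).
theta, *_ = np.linalg.lstsq(X[rated_idx], y, rcond=None)

# Predict this user's rating for the as-yet-unrated product 3.
pred = X[3] @ theta
```

Repeating this fit for every user yields the full user matrix of ϴ vectors.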
This method’s most prominent upside is that it does not suffer from the cold-start problem for items: since the features of a new item are already known, it can be recommended to users without issue.
It has several downsides. First, it requires domain knowledge and manually created product features. Moreover, it is not suitable for recommending items beyond the user’s existing taste, as it does not consider what other users are doing.
Collaborative filtering, on the other hand, learns both user and product features from scratch, thereby resolving the most important downside of content-based filtering. One classical way to do so is the linear-algebraic technique of Singular Value Decomposition (SVD). Given a user-product matrix of shape (N x M), a truncated SVD yields a rank-d approximation: two lower-dimensional matrices of shapes (N x d) and (d x M) (with the singular values absorbed into either factor) that, when multiplied, approximately reconstruct the user-product matrix. The new formulation of the user, product, and user-product matrices is then as below:
This time both the user features ϴ and the product features x are learned together, so that their product reconstructs the user-product matrix. This scheme is known as matrix factorization. Alternatively, the factors can be learned with gradient descent, minimizing the reconstruction error on the observed entries.
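The gradient-descent variant can be sketched in a few lines of NumPy. The ratings matrix, rank d, learning rate, and regularization strength below are all illustrative choices, not prescribed values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed user-product matrix; np.nan = unobserved.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 1.0, 5.0]])
mask = ~np.isnan(R)
R0 = np.nan_to_num(R)

n_users, n_items, d = R.shape[0], R.shape[1], 2
Theta = 0.1 * rng.standard_normal((n_users, d))   # user factors
X = 0.1 * rng.standard_normal((n_items, d))       # product factors

lr, reg = 0.05, 0.01
for _ in range(2000):
    E = mask * (Theta @ X.T - R0)                 # error on observed entries only
    Theta -= lr * (E @ X + reg * Theta)           # gradient step on user factors
    X -= lr * (E.T @ Theta + reg * X)             # gradient step on product factors

# Root-mean-square error on the observed entries; small after training.
rmse = np.sqrt((E ** 2).sum() / mask.sum())
```

The key detail is the mask: the loss is computed only on cells where a rating actually exists, so the factorization is free to fill in the blanks, which are then the predictions.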
As the name suggests, collaborative filtering also learns from other users’ interactions with products, and consequently it can recommend items to a user based on other, similar users. This allows better exploration of items.
It has a few downsides as well. The first is the cold-start problem: it cannot recommend new items until some users have purchased them. Second, while it is great that it learns product features by itself, it cannot exploit existing product features such as color, size, or price.
Inference:
Both content-based filtering and collaborative filtering result in user and product vectors. Depending on the specific use case, new recommendations can be made in several ways:
- Dot product between user and product vector
- User-product similarity
- User-user similarity
- Product-product similarity
The latter three require a measure of vector similarity. Cosine similarity, Jaccard similarity, and Pearson correlation are some of the commonly used measures. The most similar vectors are then used to recommend new products.
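A minimal similarity helper can be sketched as below, here using cosine similarity; the function names and the example user vectors are hypothetical:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def most_similar(index, vectors, k=2):
    """Indices of the k vectors most similar to vectors[index]."""
    sims = np.array([cosine_similarity(vectors[index], v) for v in vectors])
    sims[index] = -np.inf                 # exclude the query vector itself
    return np.argsort(sims)[::-1][:k]

# Hypothetical user vectors (rows), learned by either method.
users = np.array([[1.0, 0.0],
                  [0.9, 0.1],
                  [0.0, 1.0]])
print(most_similar(0, users, k=1))  # → [1]
```

The same helper applies unchanged to user-product, user-user, or product-product similarity; only the matrix the rows come from differs.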
User-product similarity: In this case, the vectors to compare come from the user-product matrix. The idea is to recommend products from the rows of the other users whose user-product vectors are most similar to the target user’s. The group of vectors to be compared is shown in red below:
This is also known as a memory-based method and comes with more advanced variants such as k-nearest neighbors.
User-user similarity: In this case, the vectors to compare come from the user matrix. The idea is to recommend to the user products purchased by similar users. The vectors to be compared are shown in red below:
Product-product similarity: In this case, the vectors to compare come from the product matrix. The idea is to recommend products similar to the ones the user purchased in the past. The group of vectors to be compared is shown in red below:
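The product-product route can be sketched end-to-end: score every product by its cosine similarity to the user's past purchases, then recommend the highest-scoring unpurchased ones. The product vectors and function name below are hypothetical:

```python
import numpy as np

# Hypothetical learned product vectors (one row per product).
products = np.array([[1.0, 0.1],
                     [0.9, 0.2],
                     [0.1, 1.0],
                     [0.2, 0.9]])

def recommend_similar_products(purchased, product_vecs, k=1):
    """Recommend products most similar (by cosine) to those already purchased."""
    norms = product_vecs / np.linalg.norm(product_vecs, axis=1, keepdims=True)
    scores = np.zeros(len(product_vecs))
    for p in purchased:
        scores += norms @ norms[p]        # cosine similarity to product p
    scores[list(purchased)] = -np.inf     # never re-recommend past purchases
    return np.argsort(scores)[::-1][:k]

print(recommend_similar_products({0}, products, k=1))  # → [1]
```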
Deep Learning based recommenders: Content-based and collaborative filtering were the conventional methods that companies leveraged until the last decade. With deep learning becoming the state of the art in almost every task, the field of recommender systems has changed as well.
State-of-the-art recommender systems leverage deep learning to recommend items to users. Again, the task can be formulated as either regression or classification. The benefit of deep-learning-based recommenders is that they combine the upsides of both content-based and collaborative filtering: they can leverage existing product features while also learning product and user embeddings from scratch. Moreover, the transformations applied by several layers allow richer representations and better recommendations. After all, this is the strength of deep learning.
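The core idea can be sketched as a forward pass of a minimal neural-collaborative-filtering-style scorer in plain NumPy (a real model would be built and trained in a deep learning framework; the sizes, weights, and `predict` function here are illustrative and untrained):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 100, 50, 8

# Embedding tables learned from scratch, as in collaborative filtering.
user_emb = 0.1 * rng.standard_normal((n_users, d))
item_emb = 0.1 * rng.standard_normal((n_items, d))

# Two dense layers standing in for the deep part of the network.
W1 = 0.1 * rng.standard_normal((2 * d, 16))
W2 = 0.1 * rng.standard_normal((16, 1))

def predict(user_id, item_id):
    """Concatenate the user and item embeddings, apply an MLP,
    and squash the result to an interaction probability in (0, 1)."""
    x = np.concatenate([user_emb[user_id], item_emb[item_id]])
    h = np.maximum(0.0, x @ W1)           # ReLU hidden layer
    logit = (h @ W2)[0]
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid output

score = predict(user_id=3, item_id=7)
```

Side features (price, color, tf-idf vectors, and so on) would simply be concatenated alongside the embeddings before the MLP, which is how these models combine both worlds.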
AWS Sagemaker: Although AWS has a dedicated recommendation product called Personalize, these methods can also be developed and deployed in the cloud using AWS Sagemaker. This comes with the benefit of full customization and ownership of the system’s know-how. AWS provides documentation that explains how to develop a Neural Collaborative Filtering network using TensorFlow, as the computation graph below shows.
Moreover, they have great documentation on how to train and deploy custom TensorFlow models on AWS using Sagemaker.