The Road to Recommender Systems
In this series of articles, I’m going to share the edifying experience of mastering Recommender Systems, along with implementations ranging from basic to advanced recommendation models.
I will also briefly cover the challenges you will come across when building such a system, and what I did to overcome them.
Lastly, I will walk you through the ideas, the unsuccessful attempts, and the validation framework I used to track the performance of the machine-learning models I built, on the way to a formula that worked.
What is a Recommender system?
Recommender systems are arguably the most common application of big data: they improve individual customer experiences by suggesting relevant content on your website.
Recommender systems belong to the family of information filtering systems and Artificial Intelligence, and they aim to predict the preferences someone will have over a plethora of choices.
Where do we use Recommender systems?
The most common area of use is product recommendation: filtering and learning from customers’ preferences in order to apply that knowledge to others.
Enjoying life as I was, I decided I needed to shake things up a little.
Take on a challenge, you know.
Roam the worlds of data science and conquer AI battles.
You know, the type of thing you regret when it’s already too late.
Boy, did I take the plunge, and dove into Recommendation Systems, fought the sea monsters and here I am. Sinbad the sailor, if you will, survived to tell the story.
It must have been about a year or so ago when the Email Marketing and Automation company I work for (Moosend) assigned me to a brand new project.
The concept was to create a data-driven recommendation engine that can fit and work for every e-commerce shop. This general-purpose system has to produce personalized product recommendations from each shop’s own customer-product interactions (which we already have).
The challenge was that it had to be entirely dynamic and able to adapt to various patterns, namely seasonal purchase patterns (e.g. gift-giving periods such as Christmas, Hanukkah, Easter, etc.), but also to revenue maximization.
Understanding Recommendation Engines
Every time I get a new project the first thing I do is try to understand the basics; when and what it will be used for, the structure of a system, the variety, and the scalability that it could have.
There are different types of Recommender Systems (RecSys) and what you choose depends on the strategy you want to follow to approach your customers.
Never before has the world been so driven by data, and most of the biggest e-commerce sites now rely on data-driven decision systems to amplify their sales.
Personalized product recommendations are AI’s gift to the world of eCommerce, since they can help you increase your click-through rate (CTR) and sales.
There are 5 different types of AI recommendation systems:
Content-Based recommender systems identify similar products based on their attributes, i.e. the characteristics of each product.
We represent every product with its attributes (for example, the attributes of a mobile phone are Screen size, Price, Camera, Software, etc.) and we try to find the most similar ones.
In this way, we suggest similar “mobile phones” to someone with an interest in a phone with specific characteristics.
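As a rough sketch of the idea (the products, attributes, and values below are all made up for illustration), we can represent each product as an attribute vector and rank the candidates by cosine similarity:

```python
import numpy as np

# Hypothetical attribute vectors: [screen size (in), price (k), camera (MP)]
phones = {
    "phone_a": np.array([6.1, 0.8, 12.0]),
    "phone_b": np.array([6.0, 0.7, 12.0]),
    "phone_c": np.array([4.7, 0.3, 8.0]),
}

def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Compare every other product against the one the customer looked at
query = "phone_a"
scores = {name: cosine(phones[query], vec)
          for name, vec in phones.items() if name != query}
most_similar = max(scores, key=scores.get)   # the phone we would recommend
```

Real systems would of course normalize the attribute scales and handle categorical attributes, but the core idea stays the same.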
Collaborative Filtering engines identify the preferences of similar customers and are based on the idea that people with similar behavior share similar interests.
In these systems, we represent every customer by their interactions, and this way we predict the probability of their interest in every product, that is, the likelihood of the customer actually appreciating the product recommended to them.
As a result, we can approach a new customer by recommending products from the most similar customers.
Hybrid Systems are the combination of Content-Based and Collaborative Filtering recommender systems.
We score every product with both models and weigh each result; the final recommendations result from the linear combination of the two scores.
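The weighting step can be sketched as follows; the product scores and the 50/50 weights below are hypothetical, and in practice you would tune the weights against your own metrics:

```python
# Hypothetical scores each model assigned to the same three products
content_scores = {"p1": 0.9, "p2": 0.4, "p3": 0.7}
collab_scores  = {"p1": 0.2, "p2": 0.8, "p3": 0.6}

def hybrid(content, collab, w_content=0.5, w_collab=0.5):
    """Linear combination of the two models' scores per product."""
    return {p: w_content * content[p] + w_collab * collab[p] for p in content}

final = hybrid(content_scores, collab_scores)
# Products ranked by their combined score, best first
ranking = sorted(final, key=final.get, reverse=True)
```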
Association Rules or Market Basket Analysis engines are slightly different from the previous ones.
Given a large data set of interaction data, we can find patterns of items frequently purchased together or in sequence; for example, if someone has added coffee to their cart but hasn’t added sugar, we recommend sugar.
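A minimal co-occurrence version of this idea looks like the sketch below; the baskets are made up, and a real market-basket analysis would typically use support and confidence thresholds (e.g. the Apriori algorithm) rather than raw pair counts:

```python
from collections import Counter
from itertools import combinations

# Hypothetical carts (each set is one purchase)
baskets = [
    {"coffee", "sugar", "milk"},
    {"coffee", "sugar"},
    {"coffee", "sugar"},
    {"coffee", "milk"},
    {"bread", "butter"},
]

# Count how often each pair of items appears in the same basket
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def co_purchased(item):
    """Items ranked by how often they were bought together with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return scores

# The item most frequently bought together with coffee
top = co_purchased("coffee").most_common(1)[0][0]
```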
Repeat Purchase engines predict the specific or approximate time by which a customer will repurchase a specific product.
This algorithm uses the product’s consumption duration, the purchase history, and day statistics in order to predict the future purchase date.
For example, if someone buys monthly contact lenses, we recommend the same product every 30 days in case they have forgotten to purchase them. This way, we encourage them to continue shopping with our store.
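A naive sketch of such a prediction, assuming the average gap between a customer’s past purchases is a reasonable estimate (the dates below are illustrative):

```python
from datetime import date, timedelta

# Hypothetical purchase dates of monthly contact lenses for one customer
history = [date(2023, 1, 1), date(2023, 1, 31), date(2023, 3, 2)]

def predict_next_purchase(history):
    """Estimate the next purchase date from the mean gap between past purchases."""
    gaps = [(b - a).days for a, b in zip(history, history[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return history[-1] + timedelta(days=round(mean_gap))

next_purchase = predict_next_purchase(history)
```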
To refresh your memory, this type of recommendation engine tries to identify users with similar interests through their product interactions (purchases, product views, and add-to-cart events).
You can implement Collaborative Filtering with one of two methods: memory-based or model-based. In both, we represent customers by their interactions, as vectors forming a matrix.
In the memory-based method, you measure the distance between every pair of vectors (customers) and recommend products from the most similar customers.
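A toy version of the memory-based approach, using cosine similarity between interaction vectors (the matrix below is made up, and a real system would look at the k nearest neighbors rather than just one):

```python
import numpy as np

# Toy interaction matrix: rows = customers, columns = products, 1 = interaction
R = np.array([
    [1, 1, 0, 0],   # customer 0: the one we want recommendations for
    [1, 1, 1, 0],   # customer 1
    [0, 0, 1, 1],   # customer 2
])

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

target = 0
# Similarity of every other customer to the target
sims = {i: cosine(R[target], R[i]) for i in range(len(R)) if i != target}
neighbor = max(sims, key=sims.get)

# Recommend products the neighbor interacted with but the target hasn't
recs = np.where((R[neighbor] == 1) & (R[target] == 0))[0]
```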
In the model-based approach, widely known as the matrix factorization model, we recognize latent factors in the data.
In statistics, latent factors are not variables that we observe or measure directly, but a set of variables which explain (describe) other variables and their relationships in a lower-dimensional space without losing information.
In our case, latent factors find and decode patterns for every customer in order to identify the similarities between them.
Recommender System Model #1 (My first shot)
The first model I came up with was a standard matrix-factorization model.
In this particular one, we represent a customer by their product interactions in a 2D sparse matrix named R; a sparse matrix is a well-known, computation- and memory-efficient way to store a large amount of data, all together and ready to process.
The rows of the matrix represent your customers and the columns represent products, as vectors; we then fill each customer-product interaction cell with the value 1.
As you can imagine, the cells for products that the customer did not interact with remain empty.
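A minimal sketch of building such a matrix with SciPy’s sparse format (the interaction pairs and the 3 × 4 shape below are purely illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical interactions as (customer_index, product_index) pairs
interactions = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 3)]

rows = [c for c, _ in interactions]
cols = [p for _, p in interactions]
data = np.ones(len(interactions))        # every interaction becomes a 1

# 3 customers x 4 products; cells without interactions are never stored
R = csr_matrix((data, (rows, cols)), shape=(3, 4))
```

Only the five non-zero cells are kept in memory, which is what makes this representation scale to large customer-product matrices.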
The next step of the process is to split the R matrix into two: one for the customers (P) and one for the products (Q), each with their latent factors. Then we tune both matrices, regularized by a lambda parameter, and measure the error against the values of the original R matrix. When the error drops below a given threshold, we stop the process.
To form the R-hat matrix, we calculate the dot product of P and Q; in linear algebra, the dot product is the result of matrix multiplication.
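The whole procedure can be sketched with plain stochastic gradient descent; everything below (the factor count, learning rate, lambda, epoch budget, and the tiny R matrix) is an illustrative toy, not the production setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy R matrix: 1 = observed interaction; zeros are treated as unobserved
R = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
observed = [(i, j) for i in range(3) for j in range(3) if R[i, j] > 0]

k, lr, lam = 2, 0.05, 0.01                 # latent factors, learning rate, lambda
P = rng.uniform(0.0, 0.1, size=(3, k))     # customer latent factors
Q = rng.uniform(0.0, 0.1, size=(3, k))     # product latent factors

for _ in range(300):                       # fixed epoch budget here; a real run
    for i, j in observed:                  # stops when the error drops below
        err = R[i, j] - P[i] @ Q[j]        # the chosen threshold
        P[i], Q[j] = (P[i] + lr * (err * Q[j] - lam * P[i]),
                      Q[j] + lr * (err * P[i] - lam * Q[j]))

R_hat = P @ Q.T                            # predicted score for every cell
```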
The last step is to recommend a group of products sorted by the highest purchase probability of the specific customer.
In order to monitor the performance of our model, we have to measure the quality of the recommendations that the model produces.
In recommender systems, we measure the performance of an engine with Precision@k and Recall@k, which are widely used in information retrieval scenarios.
Precision is defined as the number of recommended items that the customer has interacted with (viewed, added to cart, etc.), divided by the number of items in the recommendation set, k.
Recall is the number of recommended items in the recommendation set k that the customer has interacted with, divided by the total number of items the customer has interacted with, including those outside the recommendation set.
We also used one more measurement for our system, called the Accuracy score, to measure overall performance. We define the accuracy score as the number of recommendation sets with which a customer has interacted (a minimum of 1 interaction per set), divided by the total number of recommendation sets produced.
In all models, we measure performance on the top 5 (k=5) recommended products.
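A quick worked example of the two metrics for a single customer (the item ids are hypothetical):

```python
# k = 5 recommended products for one customer
recommended = ["p1", "p2", "p3", "p4", "p5"]
# Every product this customer actually interacted with
interacted = {"p2", "p5", "p7", "p9"}

hits = [p for p in recommended if p in interacted]

precision_at_k = len(hits) / len(recommended)   # 2 hits / k=5  -> 0.4
recall_at_k    = len(hits) / len(interacted)    # 2 hits / 4    -> 0.5
```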
The Good, the Bad and the Ugly
The cons of the model outnumber the pros.
The good thing about this implementation is that the model and process are quite straightforward for someone who understands the basics and has some experience in the field. Also, this model keeps all the information in a single “trained” matrix, ready to serve recommendations in production.
Now, about the cons. When new shops join the recommendation engine, given the matrix sparsity, the computations and the time spent on them grow exponentially.
As a result, after dozens of sites, the system will consume a lot of memory and will take days to tune and work properly.
The most important ingredient of personalized product recommendations is data; as a case in point, a large number of small and medium shops simply do not have enough interaction data to produce personalized recommendations of their own.
Observing the disappointing results and reflecting on the purpose of the system, I decided to change course and focus on how I process and distribute the information.
That way, I could help smaller, data-poor shops and decrease the size of the interaction matrix.
In the next article, I will walk you through the way that we can automatically merge the information of products and interactions between different shops.