Deep Learning for Recommender Systems: Next basket prediction and sequential product recommendation
Accurate “next basket prediction” will enable next-generation e-commerce: predictive shopping and logistics. In this blog post, we discuss the deep learning technology behind next basket prediction and forecast its impact on online retailers and regular customers.
E-commerce is huge, and online sales are predicted to double in the next three years. The COVID-19 pandemic accelerated this trend, and now even groceries are going online. Many experts consider this shift irreversible. Groceries in particular involve many repetitive purchases, which generate temporal patterns of buyer behaviour. These patterns can be discovered by machine learning models and subsequently used to predict future purchases from the purchase history.
What is next basket prediction?
Predictive shopping, or next basket prediction, has been around for quite some time. In 2013, experts forecasted that within a year machine learning would enable next-purchase prediction accurate enough to change e-commerce forever. This forecast did not materialize, and even seven years later, machine learning models are still not accurate enough to make next basket predictions that are truly useful.
There are estimates that even 60% prediction accuracy could enable predictive shopping, and Amazon is already experimenting with so-called anticipatory shipping.
However, not every domain is suitable for experiments with predictive delivery. Fresh food, for example, cannot be returned after a wrong prediction and shipped again to another customer. Products that are normally bought only once are also hard for a predictive model to guess from purchase history alone. There have been attempts to enrich the data, e.g. with signals from social networks, but we believe such information should not be used at all, due to privacy concerns and user-protection rules. Even when users agree and actively collaborate, signals from social networks are unstructured and far from credible. The fraction of users who actively share such personal data with retailers is also generally too small.
Predictive models versus recommender systems
Retailers should utilize shopping history and web browsing behaviour to the full extent when it comes to next basket prediction. Several machine learning approaches can be applied to this task.
Regular predictive models
Standard predictive models, or classifiers, are typically trained to estimate a single output variable. In this case, the model predicts whether a particular item (product) will be present in the next basket. The input of the model has to encode the shopping history of a particular customer, or of a selected group of similar customers. Many algorithms can serve as the predictor, including recurrent neural networks, XGBoost or convolutional networks.
The advantage of this approach is that you can score all customers for a particular product and select just a few with the highest probability of purchase.
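As a toy sketch of this setup (the count-based feature encoding, the tiny dataset and the choice of plain logistic regression are illustrative assumptions, not a production recipe), one can train a classifier for a single product and then score every customer with it:

```python
import numpy as np

# Toy purchase histories: each row counts how often a customer bought
# each of 4 products in their past baskets (hypothetical encoding).
history = np.array([
    [5, 0, 2, 0],
    [4, 1, 3, 0],
    [0, 3, 0, 2],
    [0, 4, 1, 3],
], dtype=float)
# Label: did the customer buy product 0 in their *next* basket?
bought_next = np.array([1, 1, 0, 0], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain logistic regression fitted with full-batch gradient descent.
w = np.zeros(history.shape[1])
b = 0.0
for _ in range(500):
    p = sigmoid(history @ w + b)
    grad_w = history.T @ (p - bought_next) / len(bought_next)
    grad_b = np.mean(p - bought_next)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# Score all customers for this one product and keep the most likely buyers.
scores = sigmoid(history @ w + b)
top_customers = np.argsort(-scores)[:2]
```

The same scoring loop would run over the full customer base in practice, which is exactly the "score everyone for one product" pattern described above.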
The disadvantage is that you need to construct a model for each product, which is not practical for retailers with hundreds of thousands of products in their inventory. One trick is to predict a product embedding instead of a purchase probability. This trick crosses the blurry border into the realm of recommender systems.
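The embedding trick can be sketched as follows: a single model outputs a vector, and the recommended products are its nearest neighbours in the catalog of product embeddings (here the catalog embeddings are random stand-ins, and the "predicted" vector is simulated rather than produced by a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical catalog: one embedding per product, unit-normalized
# so that a dot product equals cosine similarity.
catalog = rng.normal(size=(1000, 32))
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)

def recommend(predicted_embedding, k=5):
    """Return the k catalog items closest to the predicted embedding.

    One regression model that outputs a 32-d vector replaces a
    thousand per-product classifiers: ranking becomes a
    nearest-neighbour lookup in embedding space.
    """
    q = predicted_embedding / np.linalg.norm(predicted_embedding)
    sims = catalog @ q
    return np.argsort(-sims)[:k]

# Pretend the model predicted something close to product 42's embedding.
pred = catalog[42] + 0.01 * rng.normal(size=32)
top = recommend(pred)
```

At catalog scale, the exhaustive dot product would be replaced by an approximate nearest-neighbour index, but the principle is the same.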
In contrast to standard predictive models, recommender systems are designed to learn from large matrices of user interactions with items. In many recommendation scenarios, the order in which the user interacts with items is unimportant. When it comes to next basket prediction, however, sequence-based learning is of crucial importance.
So-called session-based recommender systems have in the past been implemented using Markov chains, tensor factorization or factorization machines. These algorithms were quite limited in performance or scalability. Recently, deep learning based models have started to take the lead.
The last interesting “old-fashioned” algorithm, with the fitting name Fossil, is a combination of similarity-based methods and (high-order) Markov chains.
The same year, GRU4Rec was published; we have already mentioned this approach in our previous deep recommender blog post. The Recurrent Neural Network (RNN) architecture of GRU4Rec can be considered a “deep learning approach” to sequence-based recommendation.
Instead of an RNN, Caser uses two parallel CNN blocks to capture users’ preferences and sequential patterns at both the union level and the point level.
A Simple Convolutional Generative Network for Next Item Recommendation (NextItNet) introduces a CNN architecture better suited to next item prediction than Caser’s.
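The core building block of such generative CNNs, a dilated causal convolution, can be sketched in a few lines. This is a single untrained filter with one output channel; a real NextItNet stacks many such layers with residual connections:

```python
import numpy as np

def causal_conv1d(seq_emb, kernel, dilation=1):
    """Dilated causal 1-D convolution over a sequence of item embeddings.

    seq_emb: (T, D) item embeddings in purchase order.
    kernel:  (K, D) filter taps; position t only mixes embeddings at
             t, t-dilation, ..., t-(K-1)*dilation, so the output can
             never peek at future items -- the property generative
             CNN recommenders rely on.
    """
    T, D = seq_emb.shape
    K = kernel.shape[0]
    pad = (K - 1) * dilation
    padded = np.vstack([np.zeros((pad, D)), seq_emb])  # left-pad only
    out = np.zeros(T)
    for t in range(T):
        window = padded[t : t + pad + 1 : dilation]    # K past positions
        out[t] = np.sum(window * kernel)
    return out
```

Left-padding (rather than symmetric padding) is what makes the convolution causal: changing a future item can never alter an earlier output.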
The most recent direction of research augments neural models with attention, a trend we can also observe in language modelling.
It started with the Neural Attentive Recommendation Machine (NARM), a recurrent autoencoder equipped with attention, and the Short-Term Attention/Memory Priority Model (STAMP), which uses an external memory and attention. Self-attention is also replacing RNNs in many recent papers because it is more scalable, mirroring the trend in natural language processing. Next Item Recommendation with Self-Attention (AttRec) uses self-attention to model short-term user intent and relationships among items.
Similarly, Self-Attentive Sequential Recommendation (SASRec) shows that self-attention can model long sequences more accurately and faster than older RNN/CNN based models.
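The mechanism behind these models, scaled dot-product self-attention with a causal mask, can be sketched without any learned projections (a real SASRec-style model learns query/key/value projections and stacks several such layers with feed-forward blocks):

```python
import numpy as np

def self_attention(X, causal=True):
    """Scaled dot-product self-attention over item embeddings X (T, D).

    Q, K and V are identity projections here to keep the sketch
    minimal. The causal mask stops position t from attending to items
    bought after t, which is what makes the layer usable for
    next-item prediction.
    """
    T, D = X.shape
    scores = X @ X.T / np.sqrt(D)                     # (T, T) relevance
    if causal:
        mask = np.tril(np.ones((T, T)))               # lower triangle
        scores = np.where(mask == 1, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax rows
    return weights @ X                                # contextualized items
```

Unlike an RNN, every position is computed independently of the previous hidden state, which is why self-attention parallelizes so well over long sequences.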
Bi-directional transformers (BERT4Rec) can further improve performance, although model complexity is quite high. Alibaba reports further improvement from self-supervision in a disentangled latent space. Adding external memory, as in Memory Augmented Transformer Networks (MATN), also seems to be a promising approach.
Some recent sequential models introduced by Spotify research combine contextual information (such as time of day or device used) with a sequential recurrent neural network with attention.
Another interesting idea is to model time dependencies with parallel architectures.
The M3R model, introduced by Google in Towards Neural Mixture Recommender for Long Range Dependent User Sequences, targets sequential recommendation with long user sequences and has been tested in a production environment.
As you can see, research on sequential recommenders has almost exploded in the last few years. Major labs and companies are making fast progress in improving the performance of sequential recommenders and next item prediction in general. Those with access to large datasets of interaction history and massive computational resources have the potential to achieve a breakthrough similar to the one we have been witnessing in the NLP domain since 2019.
Why is prediction performance so important? In e-commerce, the potential revenue from successful recommendations is much higher than in media or any other domain. More precise next item prediction brings retailers much higher revenue, justifying the increased cost of recommendations.
Recombee deep learning recommendations in predictive shopping
First of all, pretrained deep convolutional neural networks can be used to generate visual item embeddings from catalog images. Often, several images of a single product are available, and it is quite challenging to utilize them in a way that aligns images of the same type (the same component of a product).
Product text descriptions can also be incorporated using neural text embeddings. Together, all attributes can be projected into a single product embedding.
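A minimal sketch of such a fusion, assuming the image and text embeddings are already computed and using naive average-and-concatenate combination (real systems typically learn the fusion instead):

```python
import numpy as np

def product_embedding(image_embs, text_emb):
    """Fuse several image embeddings and one text embedding into a
    single product vector.

    image_embs: (N, Di) embeddings of N catalog photos of one product,
                averaged so differently sized photo sets align.
    text_emb:   (Dt,) neural embedding of the product description.
    """
    img = image_embs.mean(axis=0)
    fused = np.concatenate([img, text_emb])
    return fused / np.linalg.norm(fused)   # unit norm for cosine search
```

Normalizing the fused vector keeps dot products comparable across products, which matters once these embeddings feed a nearest-neighbour lookup.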
The performance of next basket prediction depends strongly on the quality of the data. The richer the user profiles, the more repetitive the purchases, the smaller the product catalog and the better the item attributes, the higher the chance of accurate predictions.
At Recombee, we have the chance to collaborate with bofrost*, the largest distributor of frozen food in Europe.
Hundreds of millions of anonymized historical purchases (baskets) are perfect for research in next basket prediction.
Once we have quality product embeddings, we can encode them into basket embeddings.
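The simplest way to turn product embeddings into a basket embedding is mean pooling; this is only a baseline sketch, not the learned encoder described below:

```python
import numpy as np

def basket_embedding(product_embs):
    """Mean-pool product embeddings into one unit-norm basket vector.

    A learned encoder (e.g. a variational autoencoder) would replace
    the mean, but pooling already lets baskets be compared directly.
    """
    v = np.asarray(product_embs, dtype=float).mean(axis=0)
    return v / np.linalg.norm(v)

def basket_similarity(a, b):
    """Cosine similarity between two baskets of product embeddings."""
    return float(basket_embedding(a) @ basket_embedding(b))
```

Because the pooled vectors live in the same space as the product embeddings, a predicted basket embedding can be matched against candidate products or whole baskets with the same similarity measure.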
We use special variational autoencoders to regularize the latent basket embeddings. This is especially important when the model needs to generalize and include new cold-start products.
One possible deep neural architecture for next basket prediction uses recurrent networks to predict the embedding of the next basket. There are, of course, many more architectures (as explained above) that are being researched and tested in production.
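A skeletal version of that recurrent idea, with untrained, hypothetical weights standing in for a trained GRU/LSTM cell:

```python
import numpy as np

def predict_next_basket(basket_seq, Wx, Wh, Wo):
    """Roll a simple recurrent cell over a customer's basket embeddings
    and read out a predicted next-basket embedding.

    basket_seq: (T, D) basket embeddings in chronological order.
    Wx, Wh:     (H, D) and (H, H) recurrent weights (hypothetical; a
                production model would use a trained GRU or LSTM).
    Wo:         (D, H) readout mapping the final hidden state back to
                the basket-embedding space, where the prediction is
                matched against candidates by nearest-neighbour search.
    """
    h = np.zeros(Wh.shape[0])
    for x in basket_seq:
        h = np.tanh(Wx @ x + Wh @ h)   # one recurrent step per basket
    return Wo @ h
```

Training would fit the weights so that the output lands close to the embedding of the basket the customer actually buys next.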