Deep Learning for Recommender Systems: Next basket prediction and sequential product recommendation

Pavel Kordík
Recombee blog
Nov 10, 2020


Accurate “next basket prediction” will enable next-generation e-commerce: predictive shopping and logistics. In this blog post, we discuss the deep learning technology behind next basket prediction and forecast its impact on online retailers and regular customers.

E-commerce is huge and online sales are predicted to double in the next three years. The Covid-19 pandemic accelerated this trend, and now even groceries are moving online; many experts consider this shift irreversible. Groceries in particular involve many repetitive purchases, which generate temporal patterns of buyer behaviour. Machine learning models can discover these patterns and use them to predict future purchases from the purchase history.

What is next basket prediction?

Predictive shopping, or next basket prediction, has been around for quite some time. In 2013, experts forecasted that within a year machine learning would enable next-purchase prediction accurate enough to change e-commerce forever. That forecast did not materialize, and even seven years later, machine learning models are still not accurate enough to make next basket prediction truly useful.

Next basket prediction illustration. Based on previous purchases you have to predict the content of the next basket. It should work for grocery as well as for fashion or any other goods no matter being physical or virtual.

There are estimates that even 60% prediction accuracy could enable predictive shopping, and Amazon is already experimenting with so-called anticipatory shipping.

However, not every domain is suitable for experiments with predictive delivery. Fresh food, for example, cannot be returned after a wrong prediction and shipped again to another customer. Products that you normally buy only once are also hard to guess for a predictive model based on your purchase history alone. There have been attempts to enrich the data, e.g. with signals from social networks, but we believe such information should not be used at all, due to privacy concerns and user-protection rules. Even when users agree and actively collaborate, signals from social networks are unstructured and far from credible, and the fraction of users who actively share such personal data with retailers is generally too small.

Predictive models versus recommender systems

Retailers should utilize shopping history and web browsing behaviour to the full extent when it comes to next basket prediction. There are different machine learning approaches that can be utilized for this task.

Regular predictive models

Standard predictive models, or classifiers, are typically trained to estimate a single output variable. In this case, the model predicts whether a particular item (product) will be present in the next basket. The input of the model has to encode the shopping history of a particular customer, or of a selected group of similar customers. Many algorithms can serve as the predictor, including recurrent neural networks, xgboost, or convnets.

The advantage of this approach is that you can score all customers for a particular product and select just a few with the highest probability of purchase.

The disadvantage is that you need to construct a model for each product, which is not practical for retailers with hundreds of thousands of products in their inventory. One trick is to predict a product embedding instead of a purchase probability; this trick crosses the blurry border into the realm of recommender systems.
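The per-product approach can be sketched in a few lines. This is a minimal illustration, not a production model: it trains a plain logistic regression (standing in for xgboost or a neural network) for one target product, on synthetic stand-in features encoding customer histories, and then scores all customers to pick the most likely buyers.

```python
import numpy as np

# Hypothetical sketch: one binary classifier per product. Features encode a
# customer's purchase history; the label says whether the target product
# appeared in that customer's next basket. Data here is synthetic.
rng = np.random.default_rng(0)
n_customers, n_features = 200, 8
X = rng.normal(size=(n_customers, n_features))      # encoded histories
w_true = rng.normal(size=n_features)
y = (X @ w_true + rng.normal(scale=0.5, size=n_customers) > 0).astype(float)

def train_logreg(X, y, lr=0.1, steps=500):
    """Gradient-descent logistic regression for one target product."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

w = train_logreg(X, y)
scores = 1.0 / (1.0 + np.exp(-(X @ w)))  # purchase probability per customer
top_customers = np.argsort(-scores)[:5]  # most likely buyers of this product
```

Scoring every customer for one product, as here, is exactly the advantage mentioned above; the cost is repeating this training once per product in the catalog.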

Recommender systems

In contrast to standard predictive models, recommender systems are designed to learn from large matrices of user interactions with items. In many recommendation scenarios, the order in which the user interacts with items is not important. However, when it comes to next basket prediction, sequence-based learning is of crucial importance.

So-called session-based recommender systems have been implemented using Markov chains, tensor factorization, or factorization machines in the past. These algorithms were quite limited in performance or scalability. Recently, deep learning based models have started to take the lead.

Factorized Sequential Prediction with Item Similarity Models (FOSSIL) computes, for each user at a given time t, the probability of interaction with item j given the series of previously interacted items. Parameters are estimated by Bayesian Personalized Ranking.

The last interesting “old-fashioned” algorithm, with the fitting name Fossil, is a combination of similarity-based methods and (high-order) Markov chains.
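To make the Markov-chain ingredient concrete, here is a toy first-order model (an illustration of the general idea, not the actual Fossil algorithm): it counts how often item b follows item a across purchase histories and predicts the most frequent successors.

```python
from collections import Counter, defaultdict

# Toy first-order Markov chain over purchases: transitions[a][b] counts
# how often item b was bought right after item a.
def fit_markov(sequences):
    transitions = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            transitions[a][b] += 1
    return transitions

def predict_next(transitions, last_item, k=2):
    # Return the k most frequent follow-up items after `last_item`.
    return [item for item, _ in transitions[last_item].most_common(k)]

histories = [
    ["milk", "bread", "butter"],
    ["milk", "bread", "jam"],
    ["bread", "butter"],
]
model = fit_markov(histories)
print(predict_next(model, "bread"))  # -> ['butter', 'jam']
```

Fossil goes further by factorizing these transition statistics and blending in item-similarity scores, which lets it generalize across sparse histories where raw counts fail.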

The same year, GRU4Rec was published; we have already mentioned this approach in our previous deep recommender blog post. The recurrent neural network (RNN) based autoencoder architecture of GRU4Rec can be considered a deep learning approach to sequence-based recommendation.

Convolutional Sequence Embedding Recommendation (Caser) incorporates a convolutional neural network (CNN) to learn sequential features and a latent factor model (LFM) to learn user-specific features.

Instead of an RNN, Caser uses two parallel CNN blocks to capture the user’s preferences and sequential patterns at both the union level and the point level.
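The core convolutional idea can be shown in miniature. This is our simplified sketch, not Caser itself: one “horizontal” filter of height h slides over an L x d matrix of item embeddings, so it can detect a sequential pattern spanning h consecutive items, and max pooling over time keeps the strongest match.

```python
import numpy as np

# Simplified sketch of a horizontal convolution over an item sequence.
# E holds the embeddings of the last L items; F is one learned filter
# (random stand-in here) covering h consecutive positions.
rng = np.random.default_rng(1)
L, d, h = 6, 4, 2                 # sequence length, embedding dim, filter height
E = rng.normal(size=(L, d))       # embeddings of the last L items
F = rng.normal(size=(h, d))       # one horizontal convolution filter

# Slide the filter over time: one activation per window of h items.
conv = np.array([np.sum(E[t:t + h] * F) for t in range(L - h + 1)])
feature = conv.max()              # max pooling over time -> one sequential feature
```

A full Caser model learns many such filters of several heights (the union-level block), plus “vertical” filters per embedding dimension (the point-level block), and concatenates the pooled features with a user latent factor.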

A Masked Convolutional Residual Network can identify important positional and recurrent signals when modeling long-range sequence data. These signals are often lost when using max pooling or when skip connections are missing.

A Simple Convolutional Generative Network for Next Item Recommendation (NextItNet) introduces CNN architecture that is better suited for next item prediction than Caser.

The most recent research direction augments neural models with attention, a trend we can also observe in language modelling.

Importance of items in sessions for next item prediction (from NARM paper).

It started with the Neural Attentive Recommendation Machine (NARM), a recurrent autoencoder equipped with attention, and the Short-Term Attention/Memory Priority Model (STAMP), which uses an external memory and attention. Self-attention is also replacing RNNs in many recent papers because it is more scalable, a trend similar to natural language processing. Next Item Recommendation with Self-Attention (AttRec) uses self-attention to model short-term user intent and relationships among items.

Average attention weights show that the model can discover relations among similar content (SASRec).

Similarly, Self-Attentive Sequential Recommendation (SASRec) shows that it can model long sequences more accurately and faster than older RNN/CNN based models.
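The self-attention step at the heart of these models is simple to sketch. This is a bare scaled dot-product attention over an interaction sequence (real models like SASRec add learned query/key/value projections, positional embeddings, and causal masking, all omitted here): each position attends to all others, so related items receive high attention weights.

```python
import numpy as np

# Minimal self-attention sketch: each of the L interactions attends to all
# others via scaled dot-product similarity, producing a context-aware
# representation of the sequence.
def self_attention(E):
    """E: (L, d) matrix of item embeddings; returns attended (L, d) matrix."""
    d = E.shape[1]
    scores = E @ E.T / np.sqrt(d)                  # pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over positions
    return weights @ E

rng = np.random.default_rng(2)
E = rng.normal(size=(5, 8))   # 5 interactions, 8-dim embeddings (stand-ins)
out = self_attention(E)       # same shape, each row mixes related items
```

Because every position is computed independently of the others, this operation parallelizes far better than an RNN rollout, which is the scalability advantage mentioned above.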

Bidirectional transformers (BERT4Rec) can further improve performance, though model complexity is quite high. Alibaba reports further improvements from self-supervision in a disentangled latent space. Adding external memory, as in Memory Augmented Transformer Networks (MATN), also seems to be a promising approach.

Some of the recent sequential models introduced by Spotify research combine contextual information (such as time of day or the device used) with a sequential recurrent neural network with attention.

CoSeRNN model to predict next song using RNN, attention and context (Spotify research).

Another interesting idea is to model time dependencies by parallel architectures.

The M3R model was introduced by Google in Towards Neural Mixture Recommender for Long Range Dependent User Sequences for sequential recommendation over long user sequences, and was tested in a production environment.

Another interesting direction is modelling patterns with graph neural networks, such as HyperRec, or with a temporal attentive sequential recommendation model (TASER).

Next-item Recommendation with Sequential Hypergraphs (HyperRec).

As you can see, research on sequential recommenders has almost exploded in the last few years. Major labs and companies are making fast progress in improving the performance of sequential recommenders and next item prediction in general. Those with access to large datasets of interaction history and massive computational resources have the potential to achieve a breakthrough similar to what we have been witnessing in the NLP domain since 2019.

Why is prediction performance so important? In e-commerce, the potential revenue from successful recommendations is much higher than in media or any other domain. More precise next item prediction brings retailers much higher revenue, justifying the increased cost of recommendations.

Recombee deep learning recommendations in predictive shopping

Image embeddings of a product catalog sample

First of all, pretrained deep convolutional neural networks can be used to generate visual item embeddings from catalog images. Often, multiple images of a single product are available, and it is quite challenging to utilize them so that images of the same type (the same component of a product) are aligned.

Product text descriptions can also be incorporated using neural text embeddings. Together, you can project all attributes into a single product embedding.
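One simple way to fuse modalities into a single product embedding is concatenation followed by normalization. This sketch uses random vectors as stand-ins for real CNN and text-encoder outputs; it is an illustration of the idea, not Recombee's actual pipeline.

```python
import numpy as np

# Hypothetical fusion of a visual and a text embedding into one product
# embedding. The vectors below are random stand-ins for outputs of a
# pretrained CNN (image) and a neural text encoder (description).
rng = np.random.default_rng(3)
image_emb = rng.normal(size=128)   # e.g. pooled CNN features of a product photo
text_emb = rng.normal(size=64)     # e.g. embedding of the product description

product_emb = np.concatenate([image_emb, text_emb])
product_emb /= np.linalg.norm(product_emb)  # unit length, for cosine similarity
```

Normalizing to unit length makes dot products between product embeddings equal to cosine similarities, which keeps later nearest-neighbor lookups consistent across modalities.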

The performance of next basket prediction strongly depends on the quality of the data. The better the user profiles, the more repetitive the purchases, the fewer the products, and the higher the quality of item attributes, the better the chance of accurate prediction.

At Recombee, we have the chance to collaborate with bofrost*, the largest distributor of frozen food in Europe.

Hundreds of millions of anonymized historical purchases (baskets) are perfect for research in next basket prediction.

Once we have quality product embeddings, we can encode them into basket embeddings.

We use special variational autoencoders to regularize the latent basket embeddings. This is especially important when you need to generalize and include new cold-start products.
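The variational part can be sketched in a few lines. This is a generic VAE building block (the reparameterization trick and the KL regularizer), not our specific architecture: the encoder outputs a mean and log-variance, the latent basket embedding is sampled from them, and the KL term pulls the latent space toward a standard normal, which is what keeps it smooth enough to place cold-start products sensibly.

```python
import numpy as np

# Generic VAE latent step for a basket embedding. mu and logvar stand in
# for the outputs of a trained encoder network.
rng = np.random.default_rng(4)
mu = rng.normal(size=16)       # encoder output: latent mean
logvar = rng.normal(size=16)   # encoder output: latent log-variance

# Reparameterization trick: sample z = mu + sigma * eps, eps ~ N(0, I),
# so gradients can flow through mu and logvar during training.
eps = rng.normal(size=16)
z = mu + np.exp(0.5 * logvar) * eps  # sampled latent basket embedding

# KL(q(z|x) || N(0, I)): the regularizer that shapes the latent space.
kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
```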

One possible deep neural architecture for next basket prediction uses recurrent networks to predict the embedding of the next basket. There are of course many more architectures (as explained above) being researched and tested in production.
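In outline, such an architecture rolls a recurrent state over the sequence of basket embeddings, predicts the next basket embedding, and decodes it back to concrete products by nearest-neighbor search in product-embedding space. The sketch below uses a vanilla RNN cell with random untrained weights purely to show the data flow; it is our simplified illustration, not the production model.

```python
import numpy as np

# Toy data flow for next-basket-embedding prediction. All weights are
# random stand-ins for a trained model; baskets and products are random
# embeddings standing in for real (e.g. VAE-produced) vectors.
rng = np.random.default_rng(5)
d = 8
baskets = rng.normal(size=(4, d))   # embeddings of a customer's past baskets
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, d)) * 0.1
W_out = rng.normal(size=(d, d)) * 0.1

h = np.zeros(d)
for b in baskets:                   # vanilla RNN over the basket history
    h = np.tanh(W_h @ h + W_x @ b)
next_basket_emb = W_out @ h         # predicted embedding of the next basket

# Decode: score every catalog product against the predicted embedding
# and keep the top candidates for the next basket.
products = rng.normal(size=(20, d))         # catalog product embeddings
scores = products @ next_basket_emb
predicted_items = np.argsort(-scores)[:3]   # top-3 candidate products
```

In practice the RNN cell would be a GRU or LSTM (or one of the attention-based models discussed above), and the nearest-neighbor decode would run against millions of products via an approximate index.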

Recently, Recombee partnered with NVIDIA to accelerate research on sequential recommenders.

Let us know if you would like to work with us on this challenging problem. In the end, we will beat all those clever simple heuristics by a large margin.