Machine Learning for Recommender Systems — Part 2 (Deep Recommendation, Sequence Prediction, AutoML and Reinforcement Learning in Recommendation)

In the first part of our talk, we discussed basic algorithms, their evaluation, and the cold start problem. Below we show how deep learning is revolutionising the field of recommender systems.

There are several ways to utilize deep learning in recommender systems. Neural networks can be trained to predict ratings or interactions from item and user attributes. You can use deep neural nets to predict the next action based on historical actions and content. Deep autoencoders can also be used for collaborative filtering, projecting interaction vectors into a latent space similarly to matrix factorization.

One of the first papers in the field, by Google, introduced an ensemble of wide and deep regression to predict ratings. The assumption is that “wide” linear models can effectively memorize sparse feature interactions using cross-product feature transformations, while “deep” neural networks can generalize to previously unseen feature interactions through low-dimensional embeddings.
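To make the idea concrete, here is a minimal numpy sketch of the wide & deep combination. All sizes are hypothetical and the weights are randomly initialized rather than trained; a real model would learn them jointly from interaction data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 20 sparse cross-product features, 8-dim embeddings.
n_cross, emb_dim, hidden = 20, 8, 16

# Wide part: a linear model over binary cross-product features (memorization).
w_wide = rng.normal(size=n_cross)

# Deep part: a small MLP over concatenated user/item embeddings (generalization).
W1 = rng.normal(size=(2 * emb_dim, hidden))
W2 = rng.normal(size=hidden)

def wide_and_deep(cross_feats, user_emb, item_emb):
    """Sum the wide and deep logits, then squash to an interaction probability."""
    wide_logit = cross_feats @ w_wide
    h = np.maximum(0.0, np.concatenate([user_emb, item_emb]) @ W1)  # ReLU
    deep_logit = h @ W2
    return 1.0 / (1.0 + np.exp(-(wide_logit + deep_logit)))  # sigmoid

p = wide_and_deep(rng.integers(0, 2, n_cross).astype(float),
                  rng.normal(size=emb_dim), rng.normal(size=emb_dim))
```

The wide part can only score feature combinations it has seen; the deep part scores any user–item pair through the shared embedding space, which is why the two complement each other.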

The next important research direction is predicting the next items in a sequence of views or purchases. So-called sequence-based recommenders can be realized with traditional machine learning models; however, GRU or LSTM recurrent neural networks seem to be the most accurate when it comes to modeling longer-term relationships.
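A tiny numpy sketch of a GRU-based next-item scorer illustrates the mechanism; catalog size, state size and all weights are hypothetical and untrained:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, hidden = 50, 12  # hypothetical catalog and hidden-state sizes

# One embedding per item; GRU weights for update, reset and candidate gates.
E = rng.normal(scale=0.1, size=(n_items, hidden))
Wz, Wr, Wh = (rng.normal(scale=0.1, size=(2 * hidden, hidden)) for _ in range(3))
W_out = rng.normal(scale=0.1, size=(hidden, n_items))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def next_item_scores(session):
    """Run a GRU over a session (list of item ids); return a score per item."""
    h = np.zeros(hidden)
    for item in session:
        x = np.concatenate([E[item], h])
        z, r = sigmoid(x @ Wz), sigmoid(x @ Wr)              # update & reset gates
        h_tilde = np.tanh(np.concatenate([E[item], r * h]) @ Wh)
        h = (1 - z) * h + z * h_tilde                        # gated state update
    return h @ W_out

scores = next_item_scores([3, 17, 42])  # recommend the top-scoring items
```

The gates are what let the network keep or forget parts of the session history, which is why GRUs and LSTMs handle longer-term dependencies better than plain feedforward models.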

Here we share one of the session-based projects from our laboratory, where recurrent neural nets are used to model the activity of users based on their past sessions. Predicting the time of the next recommendation is quite accurate for frequent users of a music streaming service.

Autoencoders and deep autoencoders are heavily used to obtain latent representations of item attributes or interactions. In collaborative deep learning, autoencoders over item attributes are trained together with a matrix factorization of the interaction matrix, combining the collaborative and content-based approaches.
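The collaborative-filtering use of an autoencoder can be sketched in a few lines: encode a user's interaction vector into a latent code, then decode it back to reconstruction scores over the whole catalog. This is a minimal, untrained toy with hypothetical sizes and tied weights:

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, latent = 100, 10  # hypothetical catalog and latent-space sizes

# Tied-weight autoencoder: the same matrix encodes and (transposed) decodes.
W = rng.normal(scale=0.1, size=(n_items, latent))

def encode(r):
    return np.tanh(r @ W)   # latent user representation, like a factor vector

def decode(z):
    return z @ W.T          # reconstruction scores for all items, seen or not

r = np.zeros(n_items)
r[[4, 8, 15]] = 1.0         # items this user interacted with
scores = decode(encode(r))  # rank unseen items by their reconstruction score
```

Trained to reconstruct observed interactions, the latent code plays the same role as a user factor in matrix factorization, while the nonlinearity lets it capture richer structure.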

Incorporating images into recommendation

Many of our customers provide high-quality images as item attributes. Incorporating images can therefore significantly improve recommendations, as many users decide mostly based on visual information, and text descriptions can sometimes be inaccurate or misleading. In some recommendation scenarios (e.g. related products in fashion), visual similarity is crucial.

In Recombee, we use image embeddings obtained by deep convolutional networks extensively. The advantage is that we can extract both high-level (product type) and low-level (texture style) similarity of items from the deep embeddings. You can find more about deep convolutional embeddings in this presentation.
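Once embeddings are extracted (typically from the penultimate layer of a CNN), visual similarity reduces to nearest-neighbour search in embedding space. A sketch with hypothetical, randomly generated embeddings:

```python
import numpy as np

rng = np.random.default_rng(3)

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical catalog: 500 items with 128-dim CNN image embeddings.
catalog = rng.normal(size=(500, 128))
# A query image visually almost identical to item 0.
query = catalog[0] + 0.01 * rng.normal(size=128)

# The visually most similar item is the nearest neighbour by cosine similarity.
sims = np.array([cosine_sim(item, query) for item in catalog])
best = int(np.argmax(sims))
```

In production one would replace the linear scan with an approximate nearest-neighbour index, but the principle is the same.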

Here is a t-SNE projection of products based solely on their image embeddings.
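A projection like this can be produced with scikit-learn's t-SNE. This sketch uses two hypothetical clusters of synthetic 16-dim "embeddings" in place of real CNN features:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(4)

# Hypothetical image embeddings for two product types, 15 items each.
emb = np.vstack([rng.normal(0, 1, size=(15, 16)),
                 rng.normal(5, 1, size=(15, 16))])

# Project to 2-D for plotting; visually similar items land close together.
proj = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(emb)
```

Note that t-SNE is only for visualization: distances in the 2-D plot are not reliable similarity measures, so the recommender itself works in the original embedding space.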

You can see that it works great and that the CNN is capable of a higher-level understanding of objects.

At the same time, lower level features are taken into account.

Look at this cluster, what a beauty :)

And of course it works well in other domains such as recommending posters.

Let's zoom in a bit.

Or recommending beers.

Look how the cans were clustered, so that similar styles are together.

We were pretty amazed by the results, and we are working hard to further improve our deep CNNs. In some recommendation scenarios, models based solely on image embeddings outperform collaborative filtering methods. Powerful embeddings also enable completely new products — imagine taking a picture of your shoes and getting recommendations for similar pieces from the current catalog.

Predicting next purchases

Image-embedding-similarity recommendation models are great for the shopping experience, significantly boosting CTR. Higher CTR, however, doesn't always lead to an increase in conversion rate (CR), i.e. more purchases. One can combine them with the ReQL upsell, or design a more sophisticated recommendation algorithm.

With high-quality image embeddings and interaction data, one can use machine learning to predict the next purchase.

You can use feedforward neural nets with the history of purchases one-hot encoded as input, predicting the probabilities of products being purchased next.

Alternatively, you can feed in the image embeddings of the last few products purchased. The limitation of a feedforward net is that the history has to be short and of fixed length.
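The one-hot feedforward variant can be sketched as follows; the catalog size, the fixed window of three past purchases, and the untrained weights are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
n_products, window, hidden = 40, 3, 16  # hypothetical sizes; window is fixed

W1 = rng.normal(scale=0.1, size=(window * n_products, hidden))
W2 = rng.normal(scale=0.1, size=(hidden, n_products))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_next(history):
    """One-hot encode the last `window` purchases; score each next product."""
    x = np.zeros(window * n_products)
    for slot, product in enumerate(history[-window:]):
        x[slot * n_products + product] = 1.0
    h = np.maximum(0.0, x @ W1)   # ReLU hidden layer
    return softmax(h @ W2)        # probability of each product being next

probs = predict_next([7, 21, 33])
```

The fixed-length input is exactly the limitation mentioned above: a longer history must be truncated, and a shorter one padded.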

Therefore, using recurrent networks with long-term memory capabilities might be more appropriate.

Again, you can feed in image embeddings instead of one-hot product vectors.

Results suggest that the best approach is feeding image embeddings into recurrent networks, so this is one potent recommendation algorithm that can excel not only in shopping cart recommendations.
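The embedding-fed recurrent variant differs from the feedforward one mainly in accepting a variable-length purchase history. A minimal sketch with a plain (untrained) RNN cell and hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(6)
emb_dim, hidden, n_products = 32, 16, 40  # hypothetical sizes

W_in = rng.normal(scale=0.1, size=(emb_dim, hidden))
W_h = rng.normal(scale=0.1, size=(hidden, hidden))
W_out = rng.normal(scale=0.1, size=(hidden, n_products))

def predict_from_embeddings(image_embs):
    """Feed image embeddings of purchased products, oldest first, through an RNN."""
    h = np.zeros(hidden)
    for e in image_embs:               # variable-length history, unlike the MLP
        h = np.tanh(e @ W_in + h @ W_h)
    return h @ W_out                   # one score per candidate product

# Hypothetical embeddings of five previously purchased products.
scores = predict_from_embeddings(rng.normal(size=(5, emb_dim)))
```

In practice one would use GRU or LSTM cells instead of this plain recurrence, for the long-term memory reasons discussed above.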

Optimization of recommender ensembles

A single algorithm is fine, but ensembles are much more powerful. The strength of our technology at Recombee is that we can optimize the algorithms for each scenario independently.

The optimization is, however, not easy, because the real world is full of noise.

In optimization, you need to balance exploration and exploitation: exploring too much can lead to lower-quality recommendations for some users, whereas limited exploration can lead to suboptimal recommendations for all users. We run a sophisticated optimization algorithm based on evolutionary strategies and genetic programming. The fitness of the algorithms is surrogate-modelled, because their online performance is quite noisy even when you use big folds of users.
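Our actual optimizer is far more involved, but the exploration/exploitation trade-off itself can be illustrated with a much simpler epsilon-greedy bandit choosing among candidate ensembles. Everything here is hypothetical: the ensemble names, their true CTRs, and the simulated traffic:

```python
import random

random.seed(7)

# Hypothetical true CTRs of three candidate ensembles (unknown to the optimizer).
true_ctr = {"ensemble_a": 0.030, "ensemble_b": 0.045, "ensemble_c": 0.025}
clicks = {k: 0 for k in true_ctr}
shows = {k: 0 for k in true_ctr}

def empirical_ctr(arm):
    return clicks[arm] / shows[arm] if shows[arm] else 0.0

def choose(epsilon=0.1):
    """Mostly exploit the best observed ensemble; sometimes explore a random one."""
    if random.random() < epsilon or not any(shows.values()):
        return random.choice(list(true_ctr))
    return max(true_ctr, key=empirical_ctr)

for _ in range(20000):                  # simulated recommendation requests
    arm = choose()
    shows[arm] += 1
    clicks[arm] += random.random() < true_ctr[arm]

best = max(true_ctr, key=empirical_ctr)
```

Setting epsilon too high degrades recommendations for the explored users; setting it too low risks locking in a suboptimal ensemble for everyone, which is exactly the tension described above.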

The aim is to climb up the performance landscape for every recommendation scenario.

And stay in well performing regions, while avoiding low performing ones.

The challenge is which performance criterion to use. We believe that longer-term criteria such as customer lifetime value are better than short-term criteria such as CTR. However, many of our customers have their business models aligned directly with CTR (e.g. advertisement revenues).

Reinforcement learning models and evaluation allow us to optimize long-term criteria directly, and we are obtaining many interesting research results in this area. It is, however, very challenging to optimize such complex algorithms in a way that keeps the cost of a single recommendation affordable.

Real world examples

Above we have introduced many interesting machine learning algorithms. Our work is to make them fast, adaptive and scalable. We have selected two use cases to explain how you can apply them.

Recommending news and stories for one of the biggest publishing houses in the Czech Republic is challenging because of the high traffic, the constantly changing content, and the editors' ability to influence recommendations. We optimize the immediate reward (CTR), increasing advertisement revenues, and at the same time we look at longer-term performance measures such as the number of views per session, total time spent, reader value, etc. Together with Geneea, we improve the machine's capability to understand articles and their similarity.

Another example is the recommendation of movies and series for Showmax. Here, the challenge is to scale up the algorithms to understand local content and recommend for users in multiple countries. We collaborate with Showmaxlab and incorporate deep movie embeddings into the recommendations to reduce the cold start problem and improve the recommendations even further.

Recently, there was a meetup describing how to extract deep embeddings from movies. Fortunately, if you are thinking about employing recommendation in your business, you do not need deep embeddings to achieve excellent results with our recommender.

If you are a researcher or engineer and would like to work with us, get in touch. Here is a video of our talk, including the discussion: