Learning to rank at Veepee
Introduction: Veepee’s use case
Veepee is an online flash sales website which offers a large number of new sales everyday with huge discounts up to 70%. Sales are available for a very short period of time. On Veepee’s website, there are about 200 flash sales on the home page on a given day divided into several sectors like fashion, accessories, toys, watches, home appliances, sports equipment, technology, wines, travel, etc.
New sales open every day and old sales either continue or stop. To help the customer find new sales, the home page is divided in 2 sections: “Opening Sales” and “Current Sales”. These sections are divided in 4 on the homepage in a A / B / A / B manner to mix novelty and current sales.
Because the number of sales is important, users might not scroll until the end of the homepage to see all the banners and might leave Veepee if no sales at the top of the page are relevant to them.
Thus, the main goal of the homepage customization will be to rank the banners so that the most relevant active sales for a customer appear on top of the page. For that, we rely on the user’s previous orders and preferences but also on sales popularity and other global information.
With its 2 millions daily active users, every modification on the home page is critical: it has to help the user find what he’s looking for while keeping the response time low.
The algorithm: Veepee’s approach
The metric that we want to optimize is the conversion rate: it is the sum over period of daily unique buyers divided by daily unique visitors. Thus we want to train an algorithm to predict how likely a sale will generate an order based
on the sale information itself and the user preferences. Using the probabilities of each sales displayed on the homepage, we would then order all the sales for each user. Our observations are the orders made by users and all the sales that were displayed on their homepage prior to this order.
We define a positive example as the sale that generated the order, and the negative examples as all the other sales.
Pointwise, Pairwise, Listwise
Our problem is a ranking problem, similar to Google’s paper Deep Neural Networks for YouTube Recommendations.
There are three ways to handle such problem as described below (source):
Pointwise approaches look at a single document at a time in the loss function. They essentially take a single document and train a regressor on it to predict how relevant it is for the current query. The final ranking is achieved by simply sorting the result list by these document scores. For pointwise approaches, the score for each document is independent of the other documents that are in the result list for the query. All the standard regression and classification algorithms can be directly used for pointwise learning to rank.
In our case, this approach would be predicting the probability of the sale to generate an order based on the features of the user and the sale.
Pairwise approaches look at a pair of documents at a time in the loss function. Given a pair of documents, they try and come up with the optimal ordering for that pair and compare it to the ground truth. The goal for the ranker is to minimize the number of inversions in ranking i.e. cases where the pair of results are in the wrong order relative to the ground truth. Pairwise approaches work better in practice than pointwise approaches because predicting relative order is closer to the nature of ranking than predicting class label or relevance score.
For one observation we have a positive example and about 200 negative examples. In this approach, we have to create pairs with one positive and one negative as a train observation, and we want the algorithm to score the positive example higher than the negative one.
Listwise approaches directly look at the entire list of documents and try to come up with the optimal ordering for it. There are 2 main sub-techniques for doing listwise Learning to Rank:
Direct optimisation of IR measures such as NDCG. E.g. SoftRank, AdaRank
Minimise a loss function that is defined based on understanding the unique properties of the kind of ranking you are trying to achieve. E.g. ListNet, ListMLE
Listwise approaches can get fairly complex compared to pointwise or pairwise approaches.
Currently, we are using the pairwise implementation as it gives better results than the pointwise and because the metrics used in the listwise approach are not suitable for our use case. As our ranking column is mainly a “1” for the positive example and then unsorted “zeros”, the listwise approach based on a ranking metric is not the most suitable.
We use a specific network called “siamese network” which takes two inputs: a positive and a negative sale. These inputs are sharing their layers and the loss is computed using both ouputs.
The loss used then is an adapted Hinge loss with alpha equals to 1.
As described earlier, the algorithm minimize the number of inversions which gives us better results in production than the pointwise approach.
The algorithm was successfully launched in production on Veepee’s homepage in January 2019 on 20% of the traffic.
It is now used on the entire population and it continues to increase the entry rate, the conversion rate and the turnover significantly over the iterations.
After multiple enhancements and features addition, we faced a bottleneck during the training: our implementation did not allow us to train as many models as we wanted at the same time.
It slowed our release process and reduced the number of enhancements we could AB test.
In the next article, we will describe our past implementation and how we are upgrading it to a full Tensorflow pipeline saving us a lot of time and resources.
This article is written by Till, Data Scientist of the vpTech community.