Tuning related-items recommenders with Bayesian Optimization
The Personalisation team is responsible for providing recommendations as a product to Adevinta’s marketplaces and improving the user experience through data and Machine Learning. We’re currently working on a hybrid related-items recommendation approach that will soon be released in several marketplaces, including Milanuncios and Segundamano.mx. To maximise the value of the model, we needed to perform hyperparameter tuning, which can be a very tedious process when done manually or exhaustively. For that reason, we’ve selected Katib, a framework for hyperparameter tuning that’s part of Kubeflow, to automate and optimise this step using Bayesian Optimization tuning strategies.
Keep reading to find out how we were able to tune our recommendation algorithms more efficiently and effectively, reducing the number of evaluation trials needed from millions to fewer than one hundred.
The importance of hyperparameter tuning in Machine Learning
To maximise the impact of Machine Learning (ML), algorithms need to be properly tuned, which in practice means choosing the best values for the algorithm parameters so that the desired performance metric is maximised. Without proper tuning, the most advanced deep learning network could only be as effective as an algorithm making random guesses.
There are different tuning strategies for finding the best model hyperparameters. The simplest approach is Grid Search, which exhaustively tries a predefined range of values for each hyperparameter. Tuning N hyperparameters with M candidate values each results in a total of M^N trials! A more efficient option is Random Search, where only a subset of hyperparameter combinations is tried at random. However, this approach does not exploit the evaluation results of previous trials, which means it can waste time evaluating unpromising areas of the search space. In this project, we experimented with Bayesian Optimization (BO), a probabilistic informed-search approach, which chooses the next most promising hyperparameters based on the results of previous trials and can hence find the optimum in fewer trials than Grid and Random Search.
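To make the scale of the problem concrete, here is a minimal sketch contrasting the size of an exhaustive grid with a fixed random-search budget. The parameter names and value counts are hypothetical, chosen only to illustrate the M^N growth:

```python
import random

def grid_size(n_params, values_per_param):
    # Exhaustive Grid Search evaluates every combination: M^N trials.
    return values_per_param ** n_params

# e.g. 11 hyperparameters with 10 candidate values each:
print(grid_size(11, 10))  # 100000000000 trials

# Random Search instead samples a fixed budget of combinations:
random.seed(0)
search_space = {f"p{i}": [j / 10 for j in range(10)] for i in range(11)}
budget = 100
samples = [{name: random.choice(values) for name, values in search_space.items()}
           for _ in range(budget)]
print(len(samples))  # 100 trials, regardless of how large the grid is
```

Random Search caps the cost, but each sample is drawn blindly; BO spends the same budget far more selectively.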
The benefit of Bayesian Optimization for tuning our recommender models
To generate our related-items recommendations, we currently combine two different algorithms: a graph-based (collaborative filtering) approach called AdamicAdar, and a content-based approach based on the BM25F formula. While the collaborative filtering model finds similar items based on user interactions (i.e. users who viewed this car also viewed these other cars), in the content-based approach similarity is based on the content features of the ad (title, description, car-specific features, etc.). Generally, collaborative filtering approaches tend to perform better as they embrace the “wisdom of the crowd” principle. However, content-based approaches can improve the quality of recommendations by finding items with similar characteristics (e.g. brand, price). Furthermore, content-based approaches allow us to recommend items which don’t have any interaction yet.
In both algorithms, there are several hyperparameters that need to be tuned. In the AdamicAdar algorithm, there are a total of eleven main hyperparameters, and in our BM25F content-based model, a total of eight when tuned for the generalist category and seventeen when tuned for the car-specific vertical. Given that most of the hyperparameters are floating-point and integer values, this means we need to deal with very high-dimensional search spaces. Bayesian Optimization is therefore a clear winner if we want to efficiently find the optimal hyperparameters in such large spaces.
Choosing between different Bayesian Optimisation variants
In contrast to Random or Grid Search, BO approaches keep track of past evaluation results, which are used to build a probabilistic model that maps hyperparameters to a probability distribution over the objective score. BO works sequentially. First, the model evaluates some initial hyperparameter configurations. After some evaluation runs (or trials), and based on the resulting objective metric scores, the model determines the most promising hyperparameters to evaluate next. BO therefore allows us to narrow the search of the parameter space to the most promising areas, saving us from wasting time on those that are clearly underperforming.
There are several variants of BO methods depending on how the probabilistic model is built. Katib, Kubeflow’s framework for hyperparameter tuning, currently supports two variants: Gaussian Processes (GP), as implemented in the scikit-optimize library, and Tree Parzen Estimators (TPE), as implemented in hyperopt.
TPE builds a model by applying Bayes’ rule: instead of directly modelling the probability of an objective score given the hyperparameters, it models two distributions of the hyperparameters given the objective score: l(x), for configurations whose objective value falls below a chosen threshold, and g(x), for those above it. These two distributions are then used to maximise the Expected Improvement (EI) criterion, which is proportional to the ratio l(x) / g(x).
GP, in contrast, employs Gaussian Process regression as the probabilistic model, a well-established Bayesian statistical approach for modelling functions. The GP posterior provides predictions (with uncertainty estimates) at unseen hyperparameters, and EI is then commonly used as the acquisition function, weighting all possible predictions according to the computed posterior distribution.
Based on our preliminary experiments so far, TPE seems to be slightly more robust and converges faster than this particular implementation of GP, particularly because it is less prone to repeating already evaluated hyperparameters.
Katib can be integrated in a Kubernetes cluster and facilitates the orchestration of the tuning experiments. In particular, it helps to coordinate the different trials and decide what hyperparameters to try next.
As illustrated in the figure below, the Experiment Controller is responsible for identifying the next config to try from the Suggestions module via the Katib manager. This config is then used by the Trial controller when submitting the new Trial job. The metric results are finally collected by the Metrics Collector process so that they can later be used for choosing the next best config.
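To give a flavour of how such an experiment is declared, here is a sketch of a Katib Experiment manifest. The experiment name, parameter names, ranges and trial counts are illustrative assumptions, and the trial template (the Kubernetes job that runs the offline evaluation) is omitted:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: related-items-tuning      # hypothetical name
spec:
  objective:
    type: minimize
    objectiveMetricName: mom      # the metric reported by each trial
  algorithm:
    algorithmName: tpe            # or "bayesianoptimization" for the GP variant
  maxTrialCount: 100
  parallelTrialCount: 4
  maxFailedTrialCount: 3
  parameters:
    - name: k1                    # hypothetical BM25F parameter
      parameterType: double
      feasibleSpace:
        min: "0.0"
        max: "3.0"
    - name: title_boost           # hypothetical field weight
      parameterType: double
      feasibleSpace:
        min: "0.0"
        max: "5.0"
  trialTemplate:
    # omitted for brevity: a job template that runs the offline
    # evaluation with the suggested hyperparameters substituted in
```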
Multi-objective optimisation in Katib
When tuning our recommendation algorithms, we always aim to find a good trade-off between the number of recommendations we can serve and their quality. An efficient way of finding this optimal trade-off is by doing multi-objective optimisation (MOO).
MOO is the process of simultaneously optimising multiple conflicting criteria, such as accuracy and coverage. Because Katib does not support MOO by design, we extended our offline evaluation pipeline to support this functionality.
Our strategy for MOO is to use BO to find the Pareto front. This is defined as the set of hyperparameter configs that are Pareto optimal, that is, configs that cannot be improved in any of the objective metrics without degrading the others. To find the Pareto front, we define an N-dimensional multi-objective metric (MOM) that Katib has to minimise. Specifically, the MOM is defined as the square of the Euclidean distance between the optimal point, where each metric takes its maximum value, and the actual metric values of the config being evaluated. More formally, assuming that the max value of our metrics is 1, the MOM for metric values m1, …, mN is defined as follows:

MOM = (1 − m1)² + (1 − m2)² + … + (1 − mN)²
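Computed per trial, the MOM described above is just the squared distance to the ideal point where every (normalised) metric equals 1; a minimal sketch:

```python
def multi_objective_metric(metrics):
    # Squared Euclidean distance from the ideal point (1, 1, ..., 1),
    # assuming every metric is normalised to a maximum value of 1.
    return sum((1.0 - m) ** 2 for m in metrics)

print(multi_objective_metric([1.0, 1.0]))  # 0.0, the ideal point
print(multi_objective_metric([0.8, 0.5]))  # approximately 0.29
```

Minimising this single scalar lets a single-objective optimiser like Katib push trials towards the ideal corner of the metric space.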
The figure below shows a simplified example of this strategy, in which Katib aims to maximise two metrics: coverage and relevance. As Katib moves from Trial 1 to Trial 4, it progressively populates the Pareto front (blue line). Note that several models will lie on the Pareto front, and we will need to manually pick one.
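Once all trial results are collected, the Pareto front itself is easy to extract. A minimal sketch, using made-up (coverage, relevance) scores, filters out every dominated configuration:

```python
def pareto_front(points):
    # A point is dominated if another point is at least as good in every
    # metric and strictly better in at least one (maximisation).
    def dominates(q, p):
        return (all(qi >= pi for qi, pi in zip(q, p))
                and any(qi > pi for qi, pi in zip(q, p)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (coverage, relevance) results from five trials:
results = [(0.2, 0.9), (0.5, 0.7), (0.6, 0.4), (0.4, 0.6), (0.7, 0.2)]
front = pareto_front(results)
print(front)  # (0.4, 0.6) is dominated by (0.5, 0.7) and drops out
```

The remaining points are exactly the trade-off candidates from which we pick the final model by hand.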
Promising results so far
Katib is more effective and efficient for tuning our recommendation algorithms compared to our previous approach, i.e. grid search. In particular, with Bayesian Optimisation we have successfully found Pareto optimal hyperparameters for our algorithms in fewer than 100 trials out of millions of possible combinations! As such, Katib with BO is now the standard strategy we use to tune recommender algorithms in our team.
Finally, we’d like to give special thanks to the CRE team at Adevinta, who provided us with the tools and support needed to complete the Katib integration with our offline evaluation pipeline.