Notes from AISTATS 2024

The 27th International Conference on Artificial Intelligence and Statistics (AISTATS) in 2024 just finished in Valencia (Spain), and Criteo AI Lab was there!

Hugo Richard
Criteo Tech Blog
7 min readJun 20, 2024

--

This year, one of the leading machine learning conferences around the world received 1980 submissions, of which 546 were accepted. I attended the conference and had the honor of presenting two research papers.

Criteo AI Lab participation

Constant or logarithmic regret in asynchronous multiplayer bandit (By Hugo Richard, Etienne Boursier, Vianney Perchet)
https://proceedings.mlr.press/v238/richard24a/richard24a.pdf

This work is done with two Criteos (myself and Vianney Perchet) and Etienne Boursier (from University Paris Saclay, LMO, Orsay). It presents the problem of asynchronous multiplayer bandits, where multiple players play the same bandit and have a probability of being active at each round. If two active players pull the same arm none of them get any reward. We show in this work that with a logarithmic number of communications, constant regret is achievable. This study is relevant to Criteo in the context of multiplayer auctions, where a DSP decides which add campaigns to allocate to which auction.

Multi-armed bandits with guaranteeing revenue per arm (By Dorian Baudy, Mathieu Molina, Hugo Richard, and Vianney Perchet)

This work is done with two Criteos (myself and Vianney Perchet) as well as two other members of the Fairplay team (Dorian Baudry (left picture) and Mathieu Molina (right picture)). It addresses how to guarantee a certain amount of revenue to each client of a recommender system. This could allow Criteo to guarantee a certain amount of revenue to each of its clients, but it could also be used by the music or entertainment industry to ensure the fairness and viability of its platforms. In contrast to most works on safe RL, there is no safe known allocation at the beginning that would ensure that the constraints are full-filled. We also fully characterize the constraint violation and the regret paid by several algorithms.

Highlights of the conference

Oral presentations

Online learning of decision trees with Thomson Sampling (by Ayman Chaouki)

Decision trees are widely used models especially because they are interpretable but current methods are heuristic without any optimality guarantees. This innovative paper changes that and gives a way to find optimal trees by casting tree building as an RL problem. As having interpretable models is important to understand the decisions made by automated systems. I see this advancement as particularly interesting for Criteo.

Best-of-Both-Worlds Algorithms for Linear Contextual Bandits (by Yuko Kuroki)

This work considers a K-arm contextual bandit with d-dimensional stochastic contexts and with a linear loss that can be either stochastic or adversarial. They propose an algorithm that matches known regret bounds in the stochastic setting up to a factor K d and in the adversarial setting up to a factor root of d.

Best of both worlds algorithms like the one present in this paper are particularly appealing to Criteo as they achieve good performance on easy problems but also have performance guarantee if nothing is known about the data distribution. They could be of use in the design of robust recommendation systems.

Exploration via linearly perturbed loss minimisation (by David Janz)

This work solves structured bandit by playing at each round the minimizer of a linearly perturbed regularized negative log-likelihood function. It reduces to perturbed history exploration for generalized linear rewards but produces consistent estimators for more general rewards. It is also very easy to implement in practice which makes it promising to go beyond generalized linear models usually considered at Criteo.

Membership Testing in Markov Equivalence Classes via Independence Queries (By Jiaqi Zhang)

Given a class of graph and observational data generated by some unknown graph, the problem studied by this work is to determine if the graph belongs to the class. They show that the number of conditional independence tests needed are exponential in the size of the maximum undirected clique of the given class (with upper and lower bounds). Testing whether our current understanding of the causality graph assumed in our production system at Criteo is correct can now be done more efficiently.

On the Misspecification of Linear Assumptions in Synthetic Control (By Achille Nazaret)

Given observational data, to estimate treatment effects, we compare the treated unit with a linear combination of untreated units. What is the impact of misspecification ? Authors have practical results using data on tobacco consumption. At Criteo the treatment is whether an ad was shown or not to a user which makes this work on estimating treatment effects relevant to us.

An Impossibility Theorem for Node Embedding (By T. Mitchell Roddenberry)

This work studies node embedding methods and show that it is in general impossible to design a node embedding method that is self-contained, consistent and graph-aware which are 3 natural properties we would expect node embedding methods to have. Graph embeddings could be used at Criteo to represent the interaction we have with users so this impossibility result is nice to know.

Approximate Leave-one-out Cross Validation for Regression with ℓ1 Regularizers (By Arnab Auddy)

This work studies approximate leave-one-out methods for a class of problems with l1 regularizers (such as lasso). They show that when the dimension and samples grow large (but the ratio remains bounded), their approximate leave-one-out estimate converges to the true leave-one-out estimator. In practice, this work is of utmost interest in that it makes sparse regression (a useful tool for Criteo) much faster to tune.

Posters

Autoregressive Bandits (By Fransesco Bacchiocchi)

This work studies a new setting where the reward is governed by an autoregressive process of order k. They obtain upper bounds with an algorithm that resembles LinUCB (but the setting is quite different since past rewards influence future rewards). In Criteo production systems, you expect the past rewards to have an influence on the current reward. This work might help us estimate a more accurate model of our production systems.

Multitask Online Learning: Listen to the Neighborhood Buzz (By Juliette Achddou)

This work considers a multi-task online learning problem where agents can communicate with their neighbors in some given graph. The presented algorithm never does worse than doing tasks separately but does better
if the tasks are similar. In a production system such as the one used at Criteo, it is often the case that many online tasks are performed in parallel and some amount of communication is possible. The approach taken in this work could be used to obtain better results when two correlated tasks should be performed.

In the end, it was a great conference, with lots of bold ideas and interesting people. We are looking forward to AISTATS 2025.

Au revoir!

--

--