Notes from AISTATS 2024

The 27th International Conference on Artificial Intelligence and Statistics (AISTATS) in 2024 just finished in Valencia (Spain), and Criteo AI Lab was there!

Published in

Criteo Tech Blog

7 min readJun 20, 2024

This year, one of the leading machine learning conferences around the world received 1980 submissions, of which 546 were accepted. I attended the conference and had the honor of presenting two research papers.

Criteo AI Lab participation

Constant or logarithmic regret in asynchronous multiplayer bandit (By Hugo Richard, Etienne Boursier, Vianney Perchet)
https://proceedings.mlr.press/v238/richard24a/richard24a.pdf

This work is done with two Criteos (myself and Vianney Perchet) and Etienne Boursier (from University Paris Saclay, LMO, Orsay). It presents the problem of asynchronous multiplayer bandits, where multiple players play the same bandit and have a probability of being active at each round. If two active players pull the same arm none of them get any reward. We show in this work that with a logarithmic number of communications, constant regret is achievable. This study is relevant to Criteo in the context of multiplayer auctions, where a DSP decides which add campaigns to allocate to which auction.

Multi-armed bandits with guaranteeing revenue per arm (By Dorian Baudy, Mathieu Molina, Hugo Richard, and Vianney Perchet)

Multi-Armed Bandits with Guaranteed Revenue per Arm

We consider a Multi-Armed Bandit problem with covering constraints, where the primary goal is to ensure that each arm…

sciencespo.hal.science

This work is done with two Criteos (myself and Vianney Perchet) as well as two other members of the Fairplay team (Dorian Baudry (left picture) and Mathieu Molina (right picture)). It addresses how to guarantee a certain amount of revenue to each client of a recommender system. This could allow Criteo to guarantee a certain amount of revenue to each of its clients, but it could also be used by the music or entertainment industry to ensure the fairness and viability of its platforms. In contrast to most works on safe RL, there is no safe known allocation at the beginning that would ensure that the constraints are full-filled. We also fully characterize the constraint violation and the regret paid by several algorithms.

Highlights of the conference

Oral presentations

Online learning of decision trees with Thomson Sampling (by Ayman Chaouki)

Online Learning of Decision Trees with Thompson Sampling

Decision Trees are prominent prediction models for interpretable Machine Learning. They have been thoroughly…

arxiv.org

Decision trees are widely used models especially because they are interpretable but current methods are heuristic without any optimality guarantees. This innovative paper changes that and gives a way to find optimal trees by casting tree building as an RL problem. As having interpretable models is important to understand the decisions made by automated systems. I see this advancement as particularly interesting for Criteo.

Best-of-Both-Worlds Algorithms for Linear Contextual Bandits (by Yuko Kuroki)

Best-of-Both-Worlds Algorithms for Linear Contextual Bandits

We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal…

arxiv.org

This work considers a K-arm contextual bandit with d-dimensional stochastic contexts and with a linear loss that can be either stochastic or adversarial. They propose an algorithm that matches known regret bounds in the stochastic setting up to a factor K d and in the adversarial setting up to a factor root of d.

Best of both worlds algorithms like the one present in this paper are particularly appealing to Criteo as they achieve good performance on easy problems but also have performance guarantee if nothing is known about the data distribution. They could be of use in the design of robust recommendation systems.

Exploration via linearly perturbed loss minimisation (by David Janz)

Exploration via linearly perturbed loss minimisation

We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured…

arxiv.org

This work solves structured bandit by playing at each round the minimizer of a linearly perturbed regularized negative log-likelihood function. It reduces to perturbed history exploration for generalized linear rewards but produces consistent estimators for more general rewards. It is also very easy to implement in practice which makes it promising to go beyond generalized linear models usually considered at Criteo.

Membership Testing in Markov Equivalence Classes via Independence Queries (By Jiaqi Zhang)

Membership Testing in Markov Equivalence Classes via Independence Query Oracles

Understanding causal relationships between variables is a fundamental problem with broad impact in numerous scientific…

arxiv.org

Given a class of graph and observational data generated by some unknown graph, the problem studied by this work is to determine if the graph belongs to the class. They show that the number of conditional independence tests needed are exponential in the size of the maximum undirected clique of the given class (with upper and lower bounds). Testing whether our current understanding of the causality graph assumed in our production system at Criteo is correct can now be done more efficiently.

On the Misspecification of Linear Assumptions in Synthetic Control (By Achille Nazaret)

On the Misspecification of Linear Assumptions in Synthetic Control

The synthetic control (SC) method is a popular approach for estimating treatment effects from observational panel data…

arxiv.org

Given observational data, to estimate treatment effects, we compare the treated unit with a linear combination of untreated units. What is the impact of misspecification ? Authors have practical results using data on tobacco consumption. At Criteo the treatment is whether an ad was shown or not to a user which makes this work on estimating treatment effects relevant to us.

An Impossibility Theorem for Node Embedding (By T. Mitchell Roddenberry)

An Impossibility Theorem for Node Embedding

With the increasing popularity of graph-based methods for dimensionality reduction and representation learning, node…

arxiv.org

This work studies node embedding methods and show that it is in general impossible to design a node embedding method that is self-contained, consistent and graph-aware which are 3 natural properties we would expect node embedding methods to have. Graph embeddings could be used at Criteo to represent the interaction we have with users so this impossibility result is nice to know.

Approximate Leave-one-out Cross Validation for Regression with ℓ1 Regularizers (By Arnab Auddy)

Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers (extended…

The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out…

arxiv.org

This work studies approximate leave-one-out methods for a class of problems with l1 regularizers (such as lasso). They show that when the dimension and samples grow large (but the ratio remains bounded), their approximate leave-one-out estimate converges to the true leave-one-out estimator. In practice, this work is of utmost interest in that it makes sparse regression (a useful tool for Criteo) much faster to tune.

Posters

Autoregressive Bandits (By Fransesco Bacchiocchi)

Autoregressive Bandits

Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales…

arxiv.org

This work studies a new setting where the reward is governed by an autoregressive process of order k. They obtain upper bounds with an algorithm that resembles LinUCB (but the setting is quite different since past rewards influence future rewards). In Criteo production systems, you expect the past rewards to have an influence on the current reward. This work might help us estimate a more accurate model of our production systems.

Multitask Online Learning: Listen to the Neighborhood Buzz (By Juliette Achddou)

Multitask Online Learning: Listen to the Neighborhood Buzz

We study multitask online learning in a setting where agents can only exchange information with their neighbors on an…

arxiv.org

This work considers a multi-task online learning problem where agents can communicate with their neighbors in some given graph. The presented algorithm never does worse than doing tasks separately but does better
if the tasks are similar. In a production system such as the one used at Criteo, it is often the case that many online tasks are performed in parallel and some amount of communication is possible. The approach taken in this work could be used to obtain better results when two correlated tasks should be performed.

In the end, it was a great conference, with lots of bold ideas and interesting people. We are looking forward to AISTATS 2025.

Au revoir!

Notes from AISTATS 2024

The 27th International Conference on Artificial Intelligence and Statistics (AISTATS) in 2024 just finished in Valencia (Spain), and Criteo AI Lab was there!

Criteo AI Lab participation

Multi-Armed Bandits with Guaranteed Revenue per Arm

We consider a Multi-Armed Bandit problem with covering constraints, where the primary goal is to ensure that each arm…

Highlights of the conference

Oral presentations

Online Learning of Decision Trees with Thompson Sampling

Decision Trees are prominent prediction models for interpretable Machine Learning. They have been thoroughly…

Best-of-Both-Worlds Algorithms for Linear Contextual Bandits

We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal…

Exploration via linearly perturbed loss minimisation

We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured…

Membership Testing in Markov Equivalence Classes via Independence Query Oracles

Understanding causal relationships between variables is a fundamental problem with broad impact in numerous scientific…

On the Misspecification of Linear Assumptions in Synthetic Control

The synthetic control (SC) method is a popular approach for estimating treatment effects from observational panel data…

An Impossibility Theorem for Node Embedding

With the increasing popularity of graph-based methods for dimensionality reduction and representation learning, node…

Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers (extended…

The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out…

Posters

Autoregressive Bandits

Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales…

Multitask Online Learning: Listen to the Neighborhood Buzz

We study multitask online learning in a setting where agents can only exchange information with their neighbors on an…

Written by Hugo Richard