Five years of Criteo AI Lab

Integrating real AI into AdTech

Criteo R&D
Criteo Tech Blog
Dec 21, 2023


Authors: Romain Lerallut & Liva Ralaivola

Photo by fabio on Unsplash

Five years ago, we set out to do something unheard of in the French tech market — starting a state-of-the-art AI Lab at Criteo.

AI plays a critical role in advertising. It's the difference between a tailored ad that provides real value and an irrelevant ad that turns people off a brand forever. AI is the catalyst that converts raw data into tangible results, making people feel that an ad was created just for them and their needs. AI enables marketers to find the right people to engage instead of blindly investing in mass audiences based purely on reach. For marketers and media owners, this means better performance and, ultimately, the difference between growth and decline.

To those outside Criteo, the concept of an "AI Lab" may suggest a group of tech nerds philosophizing about tech or building robots à la 'Ex Machina'. While we do love geeking out on tech, the Criteo AI Lab (a.k.a. CAIL) is much more than that: it takes our boldest aspirations and turns them into tangible reality.

AI is the foundation of everything we have built at Criteo, including how Criteo got its name. For those of you who may not know the story, our founders named the company 'Criteo' after the Greek word for "I decide", which is fitting given that our engine makes 100 million prediction decisions per second.

Criteo & AI: A Long Story

Criteo has been making AI-driven predictions from day one. It started as a movie recommendation engine, but our founders soon saw the opportunity to apply the technology to the larger needs of the advertising industry through retargeting. We then expanded our capabilities to serve full-funnel performance marketing and retail media, broadening our scope to the larger opportunity of commerce media.

Our organization and infrastructure are set up to deliver the best technology to our clients through a philosophy of continuous, AI-driven improvement, so that we can keep advancing our technological innovation to power the future of commerce.

To be truly successful in AI, you need three things:

  • Data
    Good data, and the more the better! Closed-loop data, where you can make a prediction and then observe the outcome, is the best data of all. In retailer-based advertising, thanks to our deep connections with our partners, Criteo has access to some of the best commerce data.
  • AI models
    Great AI models that are designed, built, and improved by top-class AI researchers and engineers.
  • Scalability
    A platform that can execute with great precision, extremely low latency (on the order of milliseconds), and high throughput (more than 100M AI predictions per second). This requires building and maintaining a system that can reliably do so trillions of times a day, 24/7.
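To put these scalability figures in perspective, here is a back-of-the-envelope sketch in Python. It assumes a constant rate of 100 million predictions per second (the lower bound quoted above) and a 1 ms response time as a round stand-in for "on the order of milliseconds"; real traffic and latency vary over the day, and the function names are hypothetical. The second function applies Little's law (average requests in flight = arrival rate × time each request spends in the system).

```python
# Back-of-the-envelope scale estimates (illustrative only).
# Assumptions: a constant rate of 100M predictions/second and a 1 ms
# response time; real traffic and latency vary throughout the day.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def predictions_per_day(rate_per_second: int) -> int:
    """Predictions served in one day at a constant per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


def predictions_in_flight(rate_per_second: int, latency_ms: int) -> int:
    """Little's law: average concurrency = arrival rate x time in system.

    Latency is given in whole milliseconds so the arithmetic stays in integers.
    """
    return rate_per_second * latency_ms // 1000


if __name__ == "__main__":
    rate = 100_000_000  # hypothetical constant rate: 100M predictions/s
    daily = predictions_per_day(rate)
    print(f"{daily:,} predictions per day")  # 8,640,000,000,000
    print(f"~{daily / 1e12:.2f} trillion per day")  # ~8.64 trillion per day
    print(f"{predictions_in_flight(rate, 1):,} predictions in flight at 1 ms")  # 100,000
```

Under these assumptions a single day already exceeds eight trillion predictions, and even a 1 ms response time means roughly a hundred thousand predictions are being computed at any given instant.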

Criteo is a performance advertising company at its core, so we've found ways to leverage AI for better campaign performance, from using deep neural networks to help our clients uncover their best audiences, to activating massive amounts of first-party data to maintain addressability for our clients.

It takes a village

This diagram, taken from "Hidden Technical Debt in Machine Learning Systems" (Sculley et al., Google, NeurIPS 2015), illustrates the rationale at the heart of the CAIL setup.

The black box at the center is the AI that the press talks about: the science, the models, and all the cool stuff live in that box.

The rest is the landscape required to actually do something with that AI: data flow, quality control, live monitoring, configuration, model management, and more. All these elements are needed to go from a great idea to an excellent product.

Some of these boxes are provided by our brothers and sisters in the Infrastructure and Platform teams, for which we are so thankful (may the ground ever be soft under their feet). Many others fall within the scope of the engineering teams developing the components of our products.

The unique idea behind CAIL was, instead of having a lab separate from the rest of R&D, to bring it together with engineering teams under the same roof. This arrangement effectively supports research and, conversely, enables fast deployment of cutting-edge technologies into our products. Criteo truly has Research and Development.

As a result, CAIL owns most components of the Criteo platform that have an AI-based core. This proximity of AI researchers and engineers (with many dual-skilled profiles) allows us to move seamlessly from research to prototyping to production. It ensures a smooth pipeline from conception to production, while keeping production constraints in mind during our exploratory work.

Alone you go faster, together you go further

Just as CAIL is fully integrated into Criteo's R&D, it is also actively involved in energizing the local AI ecosystem. We jointly supervise PhD students with academia, and since 2021 we have had a renewable 5-year partnership with INRIA (the French Research Institute for Applied Mathematics and Computer Science). Within this framework, in 2022 we created a joint team, FAIRPLAY, that tackles a topic critical to the Open Internet: "Coopetitive AI: Fairness, Privacy, Incentivization". The team investigates how to develop new AI models that contribute to fair, transparent, and secure marketplaces, where agents may be buyers or sellers.

CAIL makes serious contributions to the foundations of ML research. Each year, we publish numerous papers in the most prestigious machine learning venues, including NeurIPS, ICML, AISTATS, and ICLR. We consistently target venues that ensure articles, code, and datasets are available to the scientific community at large.

The progress of AI science cannot be sustained if knowledge is confined to a happy few. At the Criteo AI Lab, we are committed to devoting time to education: we teach AI classes not only to our own engineers and leadership team but also to master's students at many French universities (Sorbonne University, ENSAE, Paris Science et Lettres University, University of Saint-Etienne, and more).

In addition, we organize hackathons with scientific and business schools (ESSEC, ENSIIE, Doctoral School of Paris Saclay). These interactions allow us not only to share our AI expertise but also to do so with the real-world constraints of deploying a massive AI engine in mind.

Open Software, Open Data, Open Internet

Criteo believes in the Open Internet, a place of cooperation and competition not ruled by a single entity. It is no surprise that R&D also believes strongly in open source.

We use, support, and contribute to that ecosystem: see https://criteo.github.io/ and, more specifically, https://github.com/criteo-research for forks as well as original contributions. All the papers published by CAIL at the prominent ML venues can be accessed and downloaded from HAL, and science tales (lessons from conferences, non-technical digests of scientific papers and their uses) are happily shared with the ML and tech communities in AI-dedicated posts on Medium.

CAIL also has an additional opportunity to contribute to the research community: we have a lot of data, and we are often willing and able to share it. Since 2014, when we released our first open datasets, we have regularly contributed problems and datasets to the global research community, without distinguishing between academic and industrial researchers. Our datasets are available to anyone in the community ( https://ailab.criteo.com/ressources/ ) under a Creative Commons licence (CC BY-NC-SA).

What’s next?

Given the pace at which AI has been evolving over the last few years, from both a purely scientific and a technological perspective, almost anything seems possible for the future of AI in the commerce realm. Imagination will be the only limit.

Our mission is to imagine the impossible, put North Stars in the sky, and shoot for them. We see a few directions that will be key differentiators.

First and foremost, the goal is to build a first commerce "foundation model". Foundation models are deep learning models trained on vast amounts of data, from which they can "extract" hidden and essential knowledge; the engine under the hood of ChatGPT, for instance, is a foundation model for language, trained on vast amounts of text scraped from the web. With daily access to terabytes of data and unmatched expertise in machine learning, Criteo is well placed to embark on such an ambitious endeavour. Our DeepKNN embeddings, critical to the quality of our AI engine, are a first step towards a commerce foundation model and evidence that we are already on our way. These models will enable more refined (and possibly intelligible) bidding, recommendation, audience understanding, pricing, and more.

The industry may also move towards greater adoption of Generative AI, pending the development of appropriate ways to overcome its known limitations (e.g. hallucinations, inaccuracies). This might lead to the deployment of media planning assistants, which would create elaborate budget allocations for marketers from a prompt as simple as "I would like to spend 10,000€ on this brand for one month; what is your recommendation?" (the commerce foundation model could play a role in answering this). Of course, the most exciting prospect for the commerce industry and the AdTech ecosystem is the deployment of the first on-the-fly, hyper-personalized creatives, adapted to contextual information, first-party data, and any other signals that help make the messaging as targeted as possible.

Photo by Steve Johnson on Unsplash

It seems clear that, whatever the industry, Artificial Intelligence is bound to disrupt it, and commerce (not only online!) will be no exception. As more powerful AI technology becomes commoditized, our role will remain at the junction of science, technology, and business: bringing the best technology to our clients and, ultimately, to everyone.


The R&D team building the Commerce Media Platform for the Open Internet.