Building Criteo API, One Step at a Time

Scott McCord
Published in Criteo R&D Blog
10 min read · Oct 27, 2022

From a frozen SOAP monolith to an ever-evolving REST ecosystem.

Photo by Omar Flores on Unsplash

Criteo is an open platform for advertising and commerce media. For the past several years, developers at Criteo have been hard at work building an API infrastructure that lets external companies benefit from this platform in a programmatic and self-service way.

A substantial portion of our clients now prefer API integrations to going through our web applications. On average, 1,500 clients call the API every day. Given the growing popularity of the API, I thought it would be interesting to understand how the API system was originally built and to see how it has evolved to support present-day traffic.

I’m a software engineer at Criteo working on the API Developer Experience team, so for the past few years, I’ve had firsthand knowledge of how Criteo API works. I even worked with a small team to build several parts of the core API infrastructure. But when I began, I didn’t have a strong understanding of several design choices that were made, or comprehend the history of legacy applications that the new infrastructure replaced.

So I decided to dive into Criteo’s API history by sifting through archived Confluence pages, speaking with Criteo old-timers, and parsing long-deprecated code. I’m excited to share my findings with you and to show how Criteo’s API offerings have evolved over the past several years. This blog post will focus on the past, while two subsequent posts will focus on the present and the future. I hope you enjoy it, and thanks for following along with me.

Humble Beginnings

When Criteo was founded in 2005, the underlying API infrastructure was a simple SOAP application. During the first five years, a few endpoints may have been exposed publicly, but there wasn’t any concerted effort to offer an API as a service. Advertisers could interact with Criteo through its Self Service front-end platform, which would then call Criteo’s internal back-end services.

SOAP, or Simple Object Access Protocol, is a messaging protocol specification for information transfer over a network. You may be more familiar with REST, the architectural style that has overtaken SOAP in popularity over the last ten years. SOAP was developed by Microsoft in 1998 and uses XML exclusively for its message format. Nowadays, if I mention SOAP in the office, several engineers turn their heads and glare.

The Criteo Performance and Optimization Platform

It wasn’t until October 2010 that the earliest iteration of an external API began to emerge, as part of the new Criteo Performance and Optimization Platform (CPOP). Although CPOP was a client-facing front-end platform, several endpoints of its SOAP back-end application were exposed to external users. CPOP’s API gave advertisers the ability to perform basic functions such as retrieving campaign categories, changing CPCs (costs per click), and viewing statistics on how their campaigns were faring.

These endpoints weren’t specifically built for external users, so features were limited and restricted to the needs of the front-end CPOP application. Authentication for the first API was token-based: clients would create a login and password pair and use it to retrieve a session token. When calling the API, you would provide the client and authorization tokens, along with the action you wanted to perform.
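To make that flow concrete, here is a minimal in-memory sketch of session-token authentication. It is an illustration only, not the CPOP API itself: the class, the method names, and the `changeCpc` action are all hypothetical, and the real service exchanged SOAP XML messages rather than Python calls.

```python
import secrets


class SessionTokenApi:
    """Toy stand-in for a session-token API (all names are hypothetical)."""

    def __init__(self, credentials):
        self._credentials = dict(credentials)  # login -> password
        self._sessions = {}                    # session token -> login

    def client_login(self, login, password):
        """Exchange a login/password pair for a session token."""
        if login not in self._credentials or self._credentials[login] != password:
            raise PermissionError("invalid login or password")
        token = secrets.token_hex(16)
        self._sessions[token] = login
        return token

    def call(self, token, action, **params):
        """Run an action on behalf of the client that owns the token."""
        login = self._sessions.get(token)
        if login is None:
            raise PermissionError("unknown or expired session token")
        # A real back end would dispatch to business logic here.
        return {"client": login, "action": action, "params": params}


api = SessionTokenApi({"advertiser-42": "s3cret"})
token = api.client_login("advertiser-42", "s3cret")
result = api.call(token, "changeCpc", campaign_id=7, cpc=0.35)
```

Every call after the login carries the token instead of the password, which is the essence of the scheme described above.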

In 2013, the second version of the CPOP API was released in order to expose several new statistics endpoints. The biggest issue with the SOAP API was that it was extremely static. SOAP APIs don’t support versioning or changes, so if we wanted to introduce a new feature, or even something as simple as a new Enum value in a list (for example, a new category to classify a campaign), we were forced to introduce a new endpoint altogether. Even for non-breaking changes (changes that don’t break our clients’ integrations), a new endpoint was still necessary. This added extra cost to maintaining the back end, as the code had to handle both versions of the API, and later versions could not rely on earlier versions for their implementations.
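One plausible way to picture the problem is a client generated from a frozen contract. The sketch below is an assumption on my part rather than Criteo’s actual schema: the category names and the `parse_v1_response` helper are hypothetical. It shows how a strictly typed client can reject a value that was added after the contract was published.

```python
from enum import Enum


class CampaignCategoryV1(Enum):
    """Categories frozen into a hypothetical v1 contract."""
    RETAIL = "Retail"
    TRAVEL = "Travel"


def parse_v1_response(raw_category):
    """Strict client built from the v1 contract: unknown values fail."""
    return CampaignCategoryV1(raw_category)


known = parse_v1_response("Travel")  # the value is part of the contract

try:
    parse_v1_response("Classifieds")  # a value the v1 contract never listed
    client_broke = False
except ValueError:
    client_broke = True
```

With clients this strict, the safe way to ship the new value is a second endpoint with its own contract, which is how the number of endpoints keeps growing.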

Birth of the Marketing API

Given these issues, developing external APIs within Criteo was hard. For several years, no substantial development was done on the external API. However, as clients and the product team began placing more emphasis on offering an updated external API service, a better solution was needed. The team at the time decided that switching to a REST API, while costly upfront, would help the Criteo API ecosystem grow and substantially decrease maintenance costs. This application would be known as the Marketing API (MAPI), and would be built to support the new front-end application, the Criteo Performance Platform (CPP), that advertisers used to manage their campaigns.

Eventually, MAPI would replace all of the legacy endpoints that were available in the CPOP API. Contributing teams could implement new endpoints and update contracts without having to expand the ever-increasing number of supported endpoints.

At the same time, Criteo decided to make a major shift in the statistics database underpinning the reporting service. I won’t go into too much detail here (the migration alone would be a topic for a whole other blog post), but the idea was to transition to a more reactive and resilient storage system for statistics that would be called by MAPI’s reporting endpoints.

As the migration plans developed, one sizeable hitch became apparent: there needed to be a way to maintain backwards compatibility for our CPOP API users as we supported them through the migration.

The Platform API — A step backwards to move forwards

This hitch was mainly a technical one. The older statistics database and the new system (built using Kafka and Druid, Apache’s real-time statistical database) produced different results when called by the same endpoints. The team couldn’t simply have the CPOP API call the new database; everything would break.

Photo by CHUTTERSNAP on Unsplash

To ensure backwards compatibility, a new version of that API would have to be introduced to replicate the functionality of the legacy API on top of the new database system. And so in early 2017, while MAPI was being developed, the team introduced another significant API system, known as PAPI.

PAPI, or the Platform API, was never meant to be permanent. Instead, it was built solely to be used by clients still using the SOAP API until they could be transitioned to MAPI. In total, it took clients a few years to complete the transition from PAPI’s SOAP API to MAPI’s REST API. During this time, MAPI was able to evolve and introduce new features, but PAPI, as designed, remained static.

MAPI: Successes and Failures

Switching to REST had the intended effect: internal teams within Criteo began exposing more of their services externally. MAPI became a proxy application where external users could call MAPI’s endpoints, and the application would redirect users’ calls to the appropriate internal service. MAPI’s versioning strategy meant that Criteo could introduce new endpoints more easily than with CPOP and PAPI. In short, the immediate switch was a success.
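The proxy idea can be sketched in a few lines. This is a simplified illustration under my own assumptions, not MAPI’s implementation: the paths, the service functions, and the routing table are hypothetical, and a real gateway would forward HTTP requests instead of calling local functions.

```python
def campaigns_service(path):
    """Stand-in for an internal service that owns campaign endpoints."""
    return f"campaigns-service handled {path}"


def statistics_service(path):
    """Stand-in for an internal service that owns statistics endpoints."""
    return f"statistics-service handled {path}"


# Hypothetical routing table: public path prefix -> internal service.
ROUTES = {
    "/v1/campaigns": campaigns_service,
    "/v1/statistics": statistics_service,
}


def gateway(path):
    """Forward a public call to the internal service that owns its prefix."""
    for prefix, handler in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return handler(path)
    raise LookupError(f"no route for {path}")


response = gateway("/v1/campaigns/123")
```

External users only ever see the gateway, while each internal team owns the service behind its prefix.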

But over time, it was clear that the design wasn’t ideal. There was a single team in charge of MAPI, the predecessor to my current team, called Webapps Engineering. As more teams wanted to contribute to MAPI, management of the repository became challenging. Teams were spread out across the globe, so code reviews often took days. Teams also wanted to control their own release cycles, so at least one team forked the application (while sharing the core MAPI library). This worked to an extent, but deviated from the universal API idea on which MAPI was founded.

MAPI was built on .NET, one of the standard platforms used at Criteo. However, there’s a sizeable population of API developers who use Scala. And even though MAPI was meant to be primarily a gateway application, it still required Scala developers to write some C# code to ensure their endpoints were exposed.

The API was also designed on the assumption that the client would be the primary user. This was reflected in the way that authentication and authorization were managed. MAPI required user tokens to access protected endpoints, and clients would generate those tokens with their Criteo username and password. This meant that if clients wanted to share access to their data with third parties (like agencies), they would have to share their credentials.

And finally, while introducing changes to the API was easier for developers, the versioning strategy wasn’t coherent. Decommissioning endpoints was a challenge, and there was no contract between Criteo and API users on how long an endpoint or version should last.

It became clear that if Criteo wanted to have a first-class API ecosystem, MAPI was not the answer.

Asynchronous -> Synchronous -> Asynchronous

A funny anecdote to emerge from my research shows that even in software development, what goes around can always come back around. In the original CPOP application, the statistics API relied on a huge SQL database that was quickly overloaded. Given this problem, the team decided to make the request asynchronous. The client would send the initial request, but would not immediately get the response. Instead, they would poll the API, and, when they received a response stating that their request was ready, they would make a new request to get the results.
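Here is a minimal sketch of that request, poll, and download pattern. It is a toy model under my own assumptions: the `ReportJobs` class, its method names, and the number of polls before a report is ready are all hypothetical, and the job simply becomes ready after a fixed number of status checks.

```python
import itertools


class ReportJobs:
    """Toy asynchronous statistics API (all names are hypothetical)."""

    def __init__(self, polls_until_ready=2):
        self._ids = itertools.count(1)
        self._jobs = {}  # job id -> job state
        self._polls_until_ready = polls_until_ready

    def schedule(self, query):
        """Step 1: accept the request and return a job id, not the data."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"query": query, "remaining": self._polls_until_ready}
        return job_id

    def status(self, job_id):
        """Step 2: the client polls until the job reports 'ready'."""
        job = self._jobs[job_id]
        if job["remaining"] > 0:
            job["remaining"] -= 1
            return "pending"
        return "ready"

    def download(self, job_id):
        """Step 3: a separate request fetches the finished report."""
        job = self._jobs[job_id]
        if job["remaining"] > 0:
            raise RuntimeError("report is not ready yet")
        return {"query": job["query"], "rows": []}


def fetch_report(api, query, max_polls=10):
    """Client side: schedule, poll for readiness, then download."""
    job_id = api.schedule(query)
    for _ in range(max_polls):
        if api.status(job_id) == "ready":
            return api.download(job_id)
        # A real client would sleep between polls to avoid hammering the API.
    raise TimeoutError("report did not become ready in time")


report = fetch_report(ReportJobs(), "clicks by campaign")
```

The client pays for extra round trips, but the server is free to compute the report at its own pace instead of holding a connection open.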

With the migration to MAPI, the Druid storage system proved to be so powerful that requests that previously took minutes now took only a few milliseconds. This improvement let the team transition the statistics endpoints from the asynchronous pattern described above to a synchronous request (i.e. the data is returned on the first API call made by the user).

But there was one more migration to be had. Eventually, the team introduced a new piece of technology for statistics storage: Vertica. It was far more flexible, but less performant. Because of this, the team had the same overloading issues it faced with the first storage solution. In the end, the statistics API is being migrated back to an asynchronous pattern and the synchronous endpoints will eventually be removed.

Birth of Criteo API

As MAPI adoption rates increased, there was an interesting shift in Criteo’s business strategy. In early 2019, the founder and CEO at the time, Jean-Baptiste Rudelle, committed to significantly expanding the number of clients working with Criteo across the ad-tech space. The idea was to fight against the walled-garden approach to advertising and to encourage a more open ecosystem. At the foundation of this strategy was the theory that, with a first-class API on offer, stakeholders across the industry would be more likely to adopt Criteo services if they weren’t forced to go through Criteo’s web applications.

And although MAPI was a vast improvement over the old SOAP APIs, it would not be able to deliver on the scope and the functionality of the new vision for Criteo API. And so, in the summer of 2020, teams began developing what is now known as Criteo API.

Takeaways and Lessons Learned

When I first started researching and combing through Confluence documents to learn more about API development in Criteo, I expected to find a fairly straightforward, linear progression of API services. I was surprised to find what is likely a pretty common evolution for a technology company that focuses on innovation and moving quickly.

Photo by Ben White on Unsplash

At the end of my research, I’ve come away with these key findings:

  • SOAP vs. REST: Although SOAP was very much à la mode in 2010 when Criteo began to take external APIs more seriously, its immutable nature makes it a very difficult candidate for an API that’s meant to evolve progressively. While it does offer several benefits (which admittedly I didn’t cover in this blog post), REST is critical to Criteo API’s current success.
  • Refactoring Costs: The introduction of PAPI brought significant overhead to the team in charge of managing Criteo API. During the past year alone, my team has been interrupted by failing PAPI tests as changes by developers outside our team break the contract between PAPI and MAPI. Therefore, it’s important to consider not only the immediate costs of big architectural changes, but also the ongoing challenges they create down the line.
  • Consistency is Key: Consistency is one of the core pillars that make up a good API. From CPOP to MAPI, we were relying on people (developers, product managers, etc.) to enforce the consistency of the API. This approach was not sustainable, and resulted in different user experiences depending on which endpoints you called. It was clear that tooling, not people, should be responsible for enforcing a consistent API ecosystem.
  • Undocumented History is Hard to Retell: Maybe not surprisingly, it was difficult to research the evolution of APIs at Criteo. There was little documentation still available that described the earliest beginnings of Criteo’s external endpoints, and hardly any documentation left that covered the big migrations. The biggest trove of information was internal Criteos, so thank you to all the Criteo long-timers who gave me invaluable insight into the context and impact of past design decisions.

Now that we’ve uncovered the past, I’m excited to turn my gaze to the present. I look forward to walking you through how we designed the current Criteo API ecosystem, and how developers are using it today to take advantage of Criteo’s advertising services.
