Takeaways of building a business-critical low-latency microservice at scale
This article describes how and why microservices have been introduced at Teads and outlines a concrete business feature we implemented this way.
While this is not a complete guide, you will find helpful information if your company has reached the limits of a monolithic architecture, both from an organizational and a technical point of view.
Teads’ ad server history: a monolith
First, a few words about Teads’ ad server. Teads connects the best publishers in the world with the most premium advertisers to monetize their inventory. Teads’ real-time marketplace is powered by a back-end service that we will call the ad server. When a user visits one of Teads’ publishers’ pages, our ad server receives an ad request. An auction is then run on Teads’ side to find a campaign to deliver and provide the best ad experience to the user.
We’ve come a long way since we built Teads’ ad server ten years ago. It was originally a single service living in a single GitHub repository, with a two-pizza team delivering features at the speed of light. This is what Teads, a promising ad-tech startup, needed at that time.
Then, Teads’ engineering team grew, and so did the ad server team. We switched to a “feature teams” organization model, with an increasing number of engineers contributing to the ad server codebase. Later, when two pizzas were no longer enough to feed our team (hungry developers that we are), we split the ad server team into lanes, which themselves became feature teams.
Nowadays, there are about ten teams who can potentially implement features on a single service, on a single GitHub repository. The limits have been reached, both in terms of agility and technology:
- Development: With hundreds of thousands of lines of code, loading and compiling the project from scratch takes several minutes, and running the tests takes even longer. Even though the code is well modularized, there are still co-owned parts and potentially multiple reviewers from various teams.
- Deployment: With dozens of engineers working more or less closely on the ad server service, we sometimes need to manage several multi-commit deployments a day. And the bigger the service, the harder it is to deploy.
- Performance, observability, optimizations: Finding a bottleneck in a monolith is like finding a needle in a haystack. To keep our ad server instances stable, we cap their CPU usage at 50–60%, which leaves significant headroom unused.
- Complexity, lack of flexibility and modularity, weak fault isolation, poor resilience, and difficulty of testing are among the other issues we face, but plenty of articles already describe the drawbacks of monoliths, so we won’t cover them exhaustively here.
Of course, microservices also come with their own set of pitfalls and we will later detail which of these we encountered and how we mitigated them.
First microservices and the introduction of gRPC
A few years ago, we started to extract some business logic from the ad server codebase and created a few microservices using HTTP + Protobuf.
As our ad server was already parallelizing most of the business logic, the latency induced by extracting it outside of the monolith was easily mitigated. These first microservices proved to be very useful in solving the agility and technological issues described above.
The long-term goal became clear: In order to improve the platform, only the core ad server features should remain in the main service.
In the meantime, another team created Teads’ first gRPC back-office API and invested in some tooling to make the protocol adoption easier for any new service.
It was widely adopted and proved itself: it was the right time to generalize the usage of gRPC for all internal server-to-server communication at Teads.
For the high scale and low latency an ad server needs, the promises of this new protocol were very attractive. To avoid feeding the codebase with new complex processes, we started asking ourselves, for every new feature, whether it should live within the monolith or outside of it.
That’s the question we raised once we started working on a new project to improve the ad server ad delivery accuracy.
A critical business requirement: fulfill our clients’ needs
Advertisers have requirements that must be fulfilled when delivering their ads on our platform. Some of them want their own dataset to be used to ensure their ads do not appear on specific pages or content. This can represent a significant number of elements, sometimes over a million, which we must systematically verify, as we don’t tolerate a single ad being delivered off the mark.
While various tools in place allowed a high level of conformity with these requirements, some anomalies still occurred. We needed to ensure that the advertisers’ needs were met 100% of the time, so we started working on a new feature, let’s call it Fulfill Advertisers’ Requirements (FAR for short).
Microservices and high SLO: a complex match
So we have a new well-scoped business case materialized by millions of data points and a hard constraint to avoid ads being delivered off the mark. In order to avoid feeding the monolith with this potentially resource-intensive feature, we created a new microservice to answer this need.
The FAR goal is simple. For every ad request the ad server receives, it will ask the FAR microservice for the list of advertisers that disallow the context of this ad request.
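Conceptually, the contract between the two services could be sketched as a gRPC service definition like the one below. This is purely illustrative: the actual message and service names used at Teads are not public, and the context fields shown here are assumptions.

```protobuf
// Hypothetical sketch of the FAR contract, for illustration only.
syntax = "proto3";

package far.v1;

// Context of the incoming ad request (fields are assumptions).
message AdRequestContext {
  string page_url = 1;
  repeated string content_topics = 2;
}

message DisallowedAdvertisersRequest {
  AdRequestContext context = 1;
}

message DisallowedAdvertisersResponse {
  // Advertisers whose requirements disallow this context.
  repeated int64 advertiser_ids = 1;
}

service Far {
  rpc GetDisallowedAdvertisers(DisallowedAdvertisersRequest)
      returns (DisallowedAdvertisersResponse);
}
```

The ad server then simply excludes the returned advertisers from the auction for that ad request.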
Implementing the FAR feature as a microservice brings a limitation. A microservice is an independent component that comes with its own SLA. Furthermore, it is reached over the network, which itself introduces some unreliability.
While the overall SLO will be very high, above 99.99%, it is still not 100%. When the ad server is up and delivering ads but the FAR microservice is down, we need a fallback mechanism to avoid delivering off the mark.
In technical terms, we call it a false negative when the system says the context is allowed by an advertiser when, in fact, it is not. To avoid false negatives, we store directly in the ad server, via an in-memory cache, the short list of all advertisers that have a registered FAR requirement. This way, when the FAR microservice is unavailable (rare, but it happens), we simply and safely block all the sensitive ads.
However, relying on this fallback potentially introduces false positives: we may block an ad when we shouldn’t have. This follows one of Teads’ guidelines: we prefer to miss a potential delivery rather than deliver off the mark.
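The fail-safe decision can be sketched as follows. This is an illustrative Python sketch of the idea, not Teads’ actual code; the `far_client` API and variable names are hypothetical.

```python
# Illustrative sketch of the fail-safe decision described above.
# `far_client` and `sensitive_advertisers` are hypothetical names.

def disallowed_advertisers(far_client, sensitive_advertisers, context):
    """Return the set of advertiser ids that must not be delivered.

    `sensitive_advertisers` is the in-memory fallback: the short list
    of all advertisers that registered a FAR requirement.
    """
    try:
        # Nominal path: ask the FAR microservice for the exact answer.
        return far_client.get_disallowed(context)
    except ConnectionError:
        # Degraded path: FAR is unreachable. Blocking every sensitive
        # advertiser trades false positives (missed deliveries) for a
        # guarantee of zero false negatives (no off-the-mark ads).
        return set(sensitive_advertisers)
```

The key property is that the degraded path can only over-block, never under-block.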
We decided to retrieve this fallback from the FAR microservice itself for separation of concern reasons: only this service owns the business logic, thus, it must be the only one with access to the data source.
The FAR microservice is prone to failure, that’s why we designed a fallback. But the fallback is only retrievable via the FAR microservice: isn’t this a snake biting its own tail?
No. The main FAR microservice route is barely cacheable because the cardinality of the requirement data points is high, so we need to call it very often.
However, the fallback route is fully cacheable, so we only need to call it once every few minutes. This way, when failures happen, we use the cached fallback. At ad server boot time, before receiving any request, we try to load this fallback, with retries in case of failure. If the failure persists despite the retries, the ad server still boots, but in degraded mode: missing the FAR feature, it can deliver ads off the mark, potentially impacting our SLO.
To mitigate this as quickly as possible, our 24/7 on-call team is immediately alerted and required to intervene.
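The boot sequence above can be sketched as follows. This is a hypothetical sketch: the function name, retry count, and delay are illustrative assumptions, not Teads’ actual values.

```python
# Hypothetical sketch of the boot-time fallback load described above;
# names and retry parameters are illustrative, not Teads' actual code.
import time

def load_fallback_with_retries(fetch_fallback, retries=3, delay_s=1.0,
                               sleep=time.sleep):
    """Fetch the fully cacheable fallback route, retrying on failure.

    Returns (fallback, degraded): on persistent failure the ad server
    still boots, but in degraded mode, and the on-call team is alerted.
    """
    for attempt in range(retries):
        try:
            return fetch_fallback(), False
        except ConnectionError:
            if attempt < retries - 1:
                sleep(delay_s)
    # Persistent failure: boot anyway, flag degraded mode for alerting.
    return set(), True
```

Booting in degraded mode rather than refusing to start keeps ad delivery running while humans restore the FAR protection.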
Takeaways from implementing our first gRPC microservice
On paper, many successful companies have moved from a monolith to microservices, and multiple great articles exist on this topic. But what happened to us in real life?
Before creating a new service, we must configure all the tools and infrastructure the service needs to be built, run, and operated: a git repository, the CI/CD pipeline, the instances, the network, the load balancer, the autoscaling policies, the alarms, the monitoring dashboards, the logs infrastructure… That’s a lot of moving parts!
Being well-tooled at the company and infrastructure level is very important, as it helps bootstrap, run, and operate microservices with ease, avoiding cumbersome and dangerous copy/pasting. Tools such as infrastructure as code and container orchestration systems are helpful here.
Cloud provider readiness
At Teads, we run on AWS. Our internal server-to-server communications previously used plain HTTP, but AWS load balancers only support gRPC over HTTPS. While switching to HTTPS wasn’t an issue for us, it is still very important to check your cloud provider’s readiness before committing to such a project.
Caching
When properly used and sized, caches can be a game changer for achieving good performance and scaling your business. If possible, use a cache before each service call. In our case, we added:
- a cache in the ad server before calling the FAR microservice
- a cache in the FAR microservice before calling our database
Before using a cache, there are some questions to answer:
- Are the calls idempotent?
- Is eventual consistency acceptable?
- How many different keys are there?
These will help you define if caching is possible and choose the best caching strategy and sizing for your case.
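As a concrete illustration of these trade-offs, here is a minimal in-memory TTL cache placed in front of a service call. This is a generic sketch, not Teads’ implementation: it assumes the calls are idempotent and that eventual consistency within the TTL is acceptable.

```python
# Minimal TTL cache sketch illustrating the questions above.
# Assumes idempotent downstream calls and tolerance for eventual
# consistency within `ttl_s` seconds.
import time

class TtlCache:
    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get_or_load(self, key, load):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the downstream call
        value = load(key)    # cache miss: call the downstream service
        self._entries[key] = (now + self.ttl_s, value)
        return value
```

The number of distinct keys (the third question above) determines how much memory such a cache needs and, with the TTL, bounds its achievable hit rate.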
We achieve a 70% cache hit rate in the ad server and a 99.9% cache hit rate in the FAR microservice. In the end, on the critical path, the database is only called a few times every ten thousand ad server requests: the caches allow the microservice to sustain a high workload without suffering high latencies or overwhelming the database.
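The two hit rates combine multiplicatively, since the database is only reached when both caches miss. A quick back-of-the-envelope check of the figures above:

```python
# Back-of-the-envelope check of the cache figures above.
ad_server_hit_rate = 0.70  # cache in the ad server
far_hit_rate = 0.999       # cache in the FAR microservice

# The database is called only when both caches miss.
db_call_rate = (1 - ad_server_hit_rate) * (1 - far_hit_rate)
print(round(db_call_rate * 10_000, 1))  # 3.0 calls per 10,000 requests
```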
Why do these hit rates differ?
First, the FAR microservice takes a composite key as its request payload and applies some additional business logic; when querying its database, it uses a single key with lower cardinality. Also, since the cache lives inside a dedicated microservice, we can make it larger: it has fewer memory constraints than it would inside the monolith. In our case, it caches almost all the database calls.
Ultimately, the performance penalty of using a microservice is limited thanks to these caches.
Instance and pool sizing
As each microservice comes with its own instances and environment, running such an architecture can consume a lot of infrastructure resources. While this is true, microservices also allow us to make better use of them.
To effectively size instances for your application, there are some questions to answer: Is it CPU intensive? What is its memory usage pattern? Is it network-demanding?
After some testing, we settled on 4 vCPU / 8 GB instances. If you are using a cloud provider, follow its best practices for service availability. In our case, as we run on AWS spot instances, we selected multiple highly available instance types matching our resource usage and tested them to ensure the correct execution of our application.
In addition to sizing your instances, correctly sizing their pool is crucial:
- A failing instance will have a higher impact on your platform if you are running too few of them (spot instance reclaims will have the same effect).
- In addition to higher infrastructure costs and energy consumption, having too many instances increases the share of resources dedicated solely to running the application’s environment rather than serving traffic.
While the does-it-all ad server was capped at around 50–60% CPU usage, the FAR microservice can run above 80% CPU usage without any negative performance impact. We settled on a 75% CPU usage autoscaling threshold to leave some margin in case of a sudden increase in load.
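The margin matters because, with a small pool, losing a single instance (a spot reclaim, for example) redistributes a large share of load onto the survivors before autoscaling can react. A quick illustration, assuming load spreads evenly:

```python
# Why a 75% CPU target leaves margin with a small instance pool.
def cpu_after_losing_one(instances, target_cpu):
    """Per-instance CPU after one of `instances` nodes disappears,
    assuming the load spreads evenly over the survivors."""
    return target_cpu * instances / (instances - 1)

print(cpu_after_losing_one(5, 0.75))  # 0.9375: survivors stay < 100%
```

At a 90% target the same event would push the survivors past saturation, which is one way to reason about the threshold choice.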
Performance & latency
For the ad experience to be optimal, the complete ad server process needs to finish within roughly a tenth of a second. Thus, when no locally cached data is available for the microservices gravitating around it, their latency must in turn be minimal.
For this reason, even though the database receives a lower workload thanks to the multiple caching layers, it is still critical to choose a performant one. In our case, we settled on Redis, which achieves a 1 ms p99 latency.
It is important to be mindful of the overhead brought by your communication protocol and frameworks. As an example, we are using akka-http as our server software. Its default behavior is to treat incoming gRPC requests as a stream. In our case, as our payloads are small, this was not useful. By enforcing eager request processing, we were able to significantly improve the service latency.
To enable the behavior described above, we needed to override the default value of the corresponding akka-http setting. See Akka’s article and documentation on this topic.
More generally, make sure not to blindly reuse the settings you’ve copied and pasted from an existing service, but instead, challenge them.
From the ad server’s perspective, we achieve a p50 latency of around one millisecond and a p99 below 5 ms, which is acceptable for our needs.
At peak traffic, and with 5 of the instances described in the previous paragraph, we are able to process over a million requests per minute coming from hundreds of ad server instances.
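Put differently, the figures above imply the following per-instance throughput at peak:

```python
# Implied per-instance throughput at peak, from the figures above.
requests_per_minute = 1_000_000
instances = 5

total_rps = requests_per_minute / 60
per_instance_rps = total_rps / instances
print(round(total_rps), round(per_instance_rps))  # 16667 3333
```

Over three thousand requests per second on a single 4 vCPU instance is what the caching and eager request processing described earlier make sustainable.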
Testing
During our CI pipeline, integration tests are run on the ad server codebase. For this purpose, a stack with all its dependencies is spun up, which includes the various microservices it is linked to.
A simple way to handle this would be to spin up the actual microservices so that the test behavior mirrors production. However, this quickly leads to too many components to run, as the microservices themselves depend on other services and middleware. It would also introduce a dependency between the service under test and its downstream services, which doesn’t fit the separation of concerns principle.
As the FAR microservice codebase already comes with a test suite ensuring its non-regression, it isn’t necessary to spin up the actual microservice during the ad server integration tests to validate its behavior. Instead, we can mock the microservice, greatly speeding up the test stack and reducing dependencies.
What ensures that the contract between the ad server and the FAR microservice remains valid?
While testing the behavior can be left to the service’s own test suite, mocking the microservice in the integration tests of the services that depend on it comes at a price: the contract between the two components is no longer validated by these tests. Contract testing aims to solve this issue.
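One way to picture the idea: both the real service and the mock used in the ad server’s integration tests are validated against the same shared contract. The sketch below is a deliberately simplified illustration of the principle; the contract shape, names, and values are hypothetical, and real setups would typically use a dedicated contract-testing tool.

```python
# Illustrative contract-testing idea: the mock used in integration
# tests and the real FAR service are both checked against the same
# contract. Names and shapes are hypothetical.

CONTRACT = {
    "request": {"context": str},
    "response": {"advertiser_ids": list},
}

def respects_contract(handler):
    """Check that a FAR implementation (real or mock) honours the
    agreed request/response shape."""
    response = handler({"context": "some-page-context"})
    return (set(response) == set(CONTRACT["response"])
            and isinstance(response["advertiser_ids"], list))

def far_mock(request):
    # Deterministic stub used in the ad server's integration tests.
    return {"advertiser_ids": [42]}
```

Running the same contract check against the production service in its own CI keeps the mock from silently drifting away from reality.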
Results
From a business perspective, the introduction of the feature ensured 100% fulfillment of the advertisers’ requirements handled by the FAR microservice. After a year of continuous operation, no delivery anomaly has occurred.
On the technical side, the feature was implemented without a negative impact on the performance of our platform. Having it running as a standalone service allowed us to quickly iterate and detect potential issues.
While the boilerplate needed for this microservice was quite heavy, it has served as a blueprint for subsequent microservices built by the team. Multiple new components have been built this way since then, and they can now be bootstrapped in a couple of days.
As our team grew, we decided to transfer the ownership of the FAR topic to a different team. Having introduced a strong separation of concerns, both on the codebase and infrastructure resources, between the ad server and the FAR microservice allowed this change to happen in a matter of a few lines of code, without any production change.
We are now tackling more in-depth topics such as distributed tracing and streamlining A/B testing over our fleet of services. Our journey into the microservice realm is only beginning!
Thanks to all of our reviewers from Teads’ innovation team!
If you are interested in building software at scale and are looking for a new challenge, have a look at our engineering job opportunities.
By Timothy Cabaret and Thomas Mouron