Building a rate limiter capable of handling hundreds of thousands of requests per second

Published in

AppsFlyer Engineering

Jun 23, 2021

At the beginning of May, AppsFlyer’s Protect360 introduced “Click blocking threshold”, our anti-fraud solution that blocks non-human-generated clicks by limiting the number of clicks each ad network is allowed to send per day. Limiting clicks at the entry point to AppsFlyer has several advantages: it protects our customers from click flooding, helps them optimize their campaigns correctly, and reduces AppsFlyer’s operational costs.

In a nutshell, the system has three parts:

An offline job that calculates the daily click limit for each ad network
A service that filters all clicks in real time, handling ~450K events per second, counting clicks and marking those that exceed the daily threshold. Valid clicks are sent via one Kafka topic to AppsFlyer’s Real-Time-Attribution system and storage, and blocked clicks are sent via another Kafka topic to short-retention storage
A management service that monitors the counters and limits, manages a block-list of ad networks, and sends alerts

Business Background

AppsFlyer is an MMP (Mobile Measurement Partner) providing mobile ad attribution. Each day our services:

Receive tens of billions of ad clicks and impressions from our ad network partners
Receive hundreds of millions of installs from our advertising customers
Match them in near real-time to attribute the correct ad spend to the user acquisition

Fraudsters sometimes attempt to flood our systems with many millions of clicks. These clicks have extremely low conversion rates: fewer than one in several thousand clicks is attributed, and in most cases our anti-fraud algorithms mark that attribution as fraud.
Still, some of these fraudulent clicks get past our anti-fraud algorithms and succeed in securing attribution money for the fraudster. In addition, these fake clicks consume expensive computing and storage resources.

To control this, we decided to limit the number of clicks each ad network is allowed to send every day. The limit depends on the ad network’s history of legitimate installs and a reasonable conversion rate.
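As an illustration of how such a rule could work, here is a minimal Go sketch. It assumes the limit is the ad network’s average daily legitimate installs multiplied by the number of clicks allowed per install (the inverse of an acceptable conversion rate). The post does not publish the actual formula; `dailyClickLimit`, `clicksPerInstall`, and the `minLimit` floor are hypothetical.

```go
package main

import "fmt"

// dailyClickLimit is a hypothetical version of the limit rule.
// The "reasonable conversion rate" is expressed as clicksPerInstall
// (1,000 means 1 install per 1,000 clicks, a 0.1% rate), and minLimit is an
// assumed floor so that networks with little install history are not blocked outright.
func dailyClickLimit(avgDailyInstalls, clicksPerInstall, minLimit int64) int64 {
	limit := avgDailyInstalls * clicksPerInstall
	if limit < minLimit {
		return minLimit
	}
	return limit
}

func main() {
	// 500 legitimate installs a day at 1 install per 1,000 clicks.
	fmt.Println(dailyClickLimit(500, 1000, 10000)) // 500000
}
```

Integer arithmetic keeps the result exact; a real job would also have to decide how far back "history" reaches and how to treat brand-new ad networks.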

The database

To achieve this, we needed a very fast in-memory database to hold the counters centrally, so we chose AWS ElastiCache for Redis as our backend.

The DB holds the following data:

Block-list: a Redis SET holding the keys to block (a key is an ad network ID)
Daily counters: a HASH named counters-yyyy-MM-dd, in the format {key: counter}
Limits: a HASH holding the calculated daily limits of all ad networks, in the format {key: limit}

Click capping calculator job

All our raw data, including clicks and installs, is kept in a data lake of Parquet files in AWS S3 buckets. We developed a Spark job that looks at the historical performance of each ad network and calculates its daily click quota. The job runs every hour, so the limits follow changing trends.

The job then calls a REST API on the click-capping management service to persist the newly calculated limits in Redis.

Click capping management service

The management service is responsible for several tasks:

Exposing a RESTful API that the hourly click-capping-calculator job uses to set the limits for ad networks
Every minute a scheduler does the following:
Reads the limits and counters from Redis
Sends alerts by email and Slack to the relevant stakeholders when an ad network passes 70% of its daily click limit, and when we start blocking its clicks
Updates the block-list in Redis with the ad networks that have passed their daily limit

Click capping rate limiter

This is the heart of the system. It is written in Go, runs on about twenty c5.2xlarge EC2 Spot Instances, and handles around 450K requests per second (up to 15K requests per second per server) with an average response time of 0.3 milliseconds.

We decided to develop the rate limiter as a standalone library that other AppsFlyer services could reuse. Here is how it works:

The rate limiter holds an in-memory cache of the block-list, which a scheduled timer refreshes from Redis once a minute.
Our rate limiter can afford to be lenient: we accept that several thousand clicks may pass the daily quota, since the cached block-list and the shared counters are only synchronized with Redis periodically.

The main function of the rate limiter receives the ad network ID (the “key”) and does the following:

Check whether the key exists in the cached block-list; if it does, the click should be blocked
Write the key to an in-memory channel
Return the answer immediately
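The three steps map onto a very small hot path. A minimal Go sketch with hypothetical names. It assumes the click is counted even when blocked (following the order of the steps) and that a full channel drops the count instead of delaying the request; the post states neither explicitly.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// limiter sketches the request hot path: one lookup and one non-blocking
// channel send, with no Redis round trip per click.
type limiter struct {
	isBlocked func(adNetworkID string) bool // e.g. the cached block-list lookup
	events    chan string                   // drained by a separate counting goroutine
	dropped   atomic.Uint64                 // clicks not counted because the channel was full
}

// Allow reports whether the click should pass. Every click is handed to the
// counter, whether or not it is blocked.
func (l *limiter) Allow(adNetworkID string) bool {
	blocked := l.isBlocked(adNetworkID)
	select {
	case l.events <- adNetworkID:
	default:
		// Assumption: under back-pressure, skip counting rather than add latency.
		l.dropped.Add(1)
	}
	return !blocked
}

func main() {
	blockList := map[string]struct{}{"net_x": {}}
	l := &limiter{
		isBlocked: func(id string) bool { _, ok := blockList[id]; return ok },
		events:    make(chan string, 2),
	}
	fmt.Println(l.Allow("net_a"), l.Allow("net_x"), l.Allow("net_a"), l.dropped.Load())
	// true false true 1
}
```

Because the answer never waits on Redis, response time stays bounded by an in-process lookup, which is consistent with the sub-millisecond averages reported above.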

A goroutine consumes the keys from the in-memory channel and updates the in-memory counters.

An internal scheduler flushes the in-memory counters to the Redis daily-counters HASH every 30 seconds and then resets them. We use Redis pipelining with HINCRBY commands for top performance: flushing a few thousand counters to Redis takes about one millisecond.

Another scheduler refreshes the cached block-list from Redis every minute.

Wrapping up

After a month in production, the click-capping system proved to be a huge success. During this time we saw a significant decrease in click flooding, and the number of clicks that reached our Real Time Attribution engine was reduced from almost 25 billion per day to around 15 billion, without reducing the number of valid installs or losing attribution.

At the same time, we saw an increase in impressions (ad views), and we suspect that some fraudsters shifted their tactics from click flooding to impression flooding. Coming soon: an impression-capping system based on our rate limiter.