High-performance rate limiting

Yunjing Xu
Smyte Blog
Oct 12, 2016

At Smyte we spend a lot of time stopping spam, scams, credit card fraud and online harassment. We’re open-sourcing one of the fundamental tools we use to stop malicious user behavior: our high-performance rate limiter.

What’s a rate limiter?

At a high level, a rate limiter limits the number of events an entity can perform in a certain time period. For example:

  • “A single IP can only create 20 accounts per day”.
  • “Devices are allowed 5 failed credit card transactions per day”.
  • “Users may only send 1 message per day with risky keywords”.

The industry-standard algorithm for rate limiting is the token bucket, sometimes called a “leaky bucket”. Each bucket has a string key and initially contains the maximum number of tokens. Every time an event occurs, you check if the bucket contains enough tokens and reduce the number of tokens in the bucket by the requested amount. After a period of time called the refill time, the number of tokens in the bucket is increased by the refill amount. Over time, these refills will fill up the bucket to the maximum number of tokens.

Token buckets with customizable refill amounts let you express more complex rate limits like:

  • “Each user account is allowed a maximum of 10 failed logins per hour, which refill at a rate of 1 login per hour”.
  • “Each geocoded shipping address is allowed a maximum of $200 worth of purchases per day from mail.ru email addresses, which refills at a rate of $50 per day”.

As you can see, these rate limits allow a lot of freedom for legitimate user behavior, but quickly clamp down on repeated violations.
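To make this concrete, the first rule above maps onto the bucket parameters defined in the next section roughly as follows (the mapping is ours, for illustration):

maxAmount    = 10    # up to 10 failed logins in a burst
refillTime   = 3600  # one refill per hour (in seconds)
refillAmount = 1     # each refill restores a single login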

The algorithm

Each bucket has a few properties.

  • key: a unique byte string that identifies the bucket
  • maxAmount: the maximum number of tokens the bucket can hold
  • refillTime: the amount of time between refills
  • refillAmount: the number of tokens that are added to the bucket during a refill
  • value: the current number of tokens in the bucket
  • lastUpdate: the last time the bucket was updated

There are two operations. get() simply returns the current value; it isn’t actually used very often at Smyte, since we generally want to decrement the rate limiter and return its current value in one low-latency operation. That operation is called reduce(tokens).

First, if the bucket doesn’t exist, we create it:

bucket.value = bucket.maxAmount
bucket.lastUpdate = now()

Then we refill the bucket if there are any pending refills:

refillCount = floor((now() - bucket.lastUpdate) / bucket.refillTime)

bucket.value = min(
  bucket.maxAmount,
  bucket.value + refillCount * bucket.refillAmount
)
bucket.lastUpdate = min(
  now(),
  bucket.lastUpdate + refillCount * bucket.refillTime
)

Next, we check if the bucket holds enough tokens: if tokens > bucket.value, we bail out.

Finally, we take the tokens out of the bucket:

bucket.value -= tokens

You can see a simple Python implementation in this gist. As the example shows, it’s very easy to implement this algorithm in memory, and it can be quite cheap, since we only store two integers for each bucket. One problem with a purely in-memory implementation is persistence: we may have very long-lived rate limits in our application, and we can’t afford to lose them if we take down or move a rate limit shard.
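For reference, here is a minimal in-memory sketch of the algorithm in Python. The class name, the (allowed, remaining) return value, and the optional now parameter are our own choices for illustration, not the gist’s API:

import math
import time

class TokenBucketLimiter:
    """In-memory token buckets, keyed by string. All buckets created by one
    limiter share the same maxAmount/refillTime/refillAmount parameters."""

    def __init__(self, max_amount, refill_time, refill_amount):
        self.max_amount = max_amount        # maximum tokens a bucket can hold
        self.refill_time = refill_time      # seconds between refills
        self.refill_amount = refill_amount  # tokens added per refill
        self.buckets = {}                   # key -> (value, last_update)

    def reduce(self, key, tokens=1, now=None):
        if now is None:
            now = time.time()
        # If the bucket doesn't exist, create it full of tokens.
        value, last_update = self.buckets.get(key, (self.max_amount, now))
        # Apply any pending refills, capped at max_amount, and advance
        # last_update by whole refill periods only.
        refills = math.floor((now - last_update) / self.refill_time)
        value = min(self.max_amount, value + refills * self.refill_amount)
        last_update = min(now, last_update + refills * self.refill_time)
        # Bail out if the bucket doesn't hold enough tokens.
        if tokens > value:
            self.buckets[key] = (value, last_update)
            return False, value
        # Otherwise take the tokens out of the bucket.
        value -= tokens
        self.buckets[key] = (value, last_update)
        return True, value

limiter = TokenBucketLimiter(max_amount=2, refill_time=60, refill_amount=2)
limiter.reduce("TwoPerMin")  # (True, 1)
limiter.reduce("TwoPerMin")  # (True, 0)
limiter.reduce("TwoPerMin")  # (False, 0)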

Given that the data fits in memory and that low latency and persistence are hard requirements, Redis seems like the natural choice for this application.

Why not Redis?

The main reason is that our team has a lot of experience scaling Redis at Instagram and at Smyte, and we’re keenly aware of all of the issues with scaling Redis. While Redis is great for quickly taking a prototype to production, it has a lot of downsides:

  • It’s hard to scan over subsets of the keyspace.
  • No built-in key compression. For applications that have many keys that share the same prefix and relatively small values (like our rate limiter), this results in much higher storage requirements.
  • Redis in production isn’t easy. We decided not to trust the new sentinel mode just yet, and the standard leader-follower replication is far from failsafe.
  • BGSAVE is problematic. It is required for leader-follower replication and uses the fork() syscall, after which the parent process’s memory is copy-on-write. For write-heavy workloads such as rate limiting, nearly every write will cause an entire page to be copied for the BGSAVE process, which means the operation might not complete unless you have double the amount of memory available.

To run Redis safely we have a very complicated setup. Even with this setup, each piece of data is replicated twice in memory (which is expensive) and we have to run our boxes at < 80% memory utilization to avoid BGSAVE disasters.

For these reasons we decided to move most core services — including rate limiting — off of Redis. This is part of a larger effort to reduce the number of storage engines used at Smyte, and we’ve chosen RocksDB as our primary data store. We’ve migrated all stateful OLTP services to RocksDB with the exception of full-text search.

Our implementation

Our implementation is open-source on GitHub and implements the algorithm in C++ on top of RocksDB. We used Facebook’s wangle and folly to implement a multithreaded, pipelined server that speaks Redis’s simple yet fairly efficient protocol. To run the service we leveraged Kubernetes for service discovery and persistent volumes on Google Cloud persistent SSDs for storage.

This architecture gave us a lot of benefits for free:

  • RocksDB’s compaction filters let us expire data that hasn’t been accessed recently.
  • There is a lot of great tooling, tuning and knowledge around RocksDB which is reusable for all of our RocksDB-based services.
  • We didn’t need to keep all of our data in memory.
  • Kubernetes made operations about as easy as any of our stateless services.
  • Leveraging the Redis protocol meant we didn’t need to change our clients, and we have high-quality client implementations in every language we use.

We considered alternatives to the Redis protocol, including gRPC and Thrift. Our choice not to use them was primarily motivated by the lack of mature clients and servers in Python and Node.js, as well as the complexity of implementing them ourselves. In the future we’re planning on migrating over to gRPC, and would accept patches that implement it without breaking the Redis protocol implementation.

One RocksDB feature we weren’t able to take advantage of was merge operators. This would normally be a great use case for merge operators, except that we needed to read our own writes to return a result from the reduce() operation. We ended up using a mutex pool to manage concurrency instead of merge operators.
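For illustration, here’s a minimal sketch of the striped-lock idea in Python (our server does this in C++; the pool size, hash function, and names here are our own):

import threading
import zlib

class MutexPool:
    """A fixed pool of locks. Each key hashes to one stripe, so concurrent
    operations on the same bucket always serialize, while operations on
    different buckets rarely contend."""

    def __init__(self, size=1024):
        self.locks = [threading.Lock() for _ in range(size)]

    def lock_for(self, key):
        return self.locks[zlib.crc32(key.encode()) % len(self.locks)]

pool = MutexPool()
with pool.lock_for("TwoPerMin"):
    pass  # read-modify-write the bucket in RocksDB under the lock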

Additionally, we stored the bucket parameters maxAmount, refillAmount and refillTime as part of the key so clients would not accidentally use different parameters for the same bucket. Since these parameters are often shared between many buckets, RocksDB’s key prefix compression made this effectively free.
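The exact encoding lives in the repo; a hypothetical layout that captures the idea:

# Hypothetical key layout: the parameters prefix the user-supplied key, so
# two clients using different parameters can never hit the same bucket, and
# the shared prefix compresses away in RocksDB.
def bucket_key(key, max_amount, refill_amount, refill_time):
    return "%d:%d:%d:%s" % (max_amount, refill_amount, refill_time, key)

bucket_key("TwoPerMin", 2, 2, 60)  # '2:2:60:TwoPerMin'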

Since our rate limiter speaks Redis protocol it’s very easy to use from the command line:

$ docker run -d -p 9049:9049 smyte/ratelimit
$ redis-cli -p 9049 RL.REDUCE TwoPerMin 2 60
(integer) 2
$ redis-cli -p 9049 RL.REDUCE TwoPerMin 2 60
(integer) 1
$ redis-cli -p 9049 RL.REDUCE TwoPerMin 2 60
(integer) 0

Our Extensions

The canonical token bucket algorithm solves most of our problems, but we also extended it to meet a few practical production needs.

Strict mode

One extension we’ve made is the concept of strict rate limits. Often when a rate limit is triggered we’ll want to continue blocking traffic until the user stops sending traffic. Adding STRICT to the end of REDUCE commands will enable this behavior by updating lastUpdate even if the quota is empty.
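In terms of the Python sketch above, strict mode is a small change in the bail-out path of reduce() (the strict flag is our own naming):

# Inside reduce(), when the bucket doesn't hold enough tokens:
if tokens > value:
    if strict:
        # Keep pushing last_update forward so continued traffic
        # delays the next refill until the sender backs off.
        last_update = now
    self.buckets[key] = (value, last_update)
    return False, value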

Client timestamps

The default implementation uses the current wall clock time to calculate when to refill. In practice, we want our rate limiter to work on both real-time and historical data. To support that, the client can send a timestamp as a parameter, which can be either the current time or a timestamp extracted from historical data.
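In terms of the sketch above, which already accepts an optional now argument, the two modes look like this (the timestamp value is made up):

limiter.reduce("TwoPerMin")                  # real time: use the wall clock
limiter.reduce("TwoPerMin", now=1476230400)  # replay: timestamp from the event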

We hope that you find this rate limiter helpful, easier to operate, and more efficient than the alternatives. Be sure to check out the repo, and thanks for reading!

Interested in this stuff? Check out our site or our jobs repo.
