Keeping Our Services Healthy: Rate Limiting

César Lawall
CODE + CONTOUR by IPSY
9 min read · Feb 15, 2023

When serving users at scale, it's essential to keep our services robust and ready to handle bursts of high user traffic from time to time. At BFA we often face moments where user interest spikes and our ecosystem is put to the test. User satisfaction is at the top of our priorities, so our architecture is designed to handle these sudden bursts seamlessly.

One of the techniques we use to manage bursts of high user activity is Rate Limiting, which controls the number of requests a client can make within a predefined period of time. This ensures that a service's load is closely watched, giving it more room to adjust to changes in that rate.

This can be easily done with Redis.

But… What Is Redis?

According to redis.io, Redis is an open source (BSD-licensed), in-memory data structure store used as a database, cache, message broker, and streaming engine.

It works as a NoSQL database with a key-value approach, where keys are strings and values can be of various types such as strings, hashes, lists, sets, and sorted sets.

So, we can think about Redis as a Swiss-army knife, which is to say that we have many ways of using it. Not only can we employ it as a regular, old-fashioned cache, but we can also implement queues, session management, streams and, of course, a Rate Limiter.
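To make the key-value model concrete, here is a minimal sketch using Spring Data Redis. It assumes a StringRedisTemplate is available (the example repository may wire Redis differently), and the key names are made up purely for illustration:

import java.time.Duration;

import org.springframework.data.redis.core.StringRedisTemplate;

public class RedisBasics {

    private final StringRedisTemplate redisTemplate;

    public RedisBasics(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public void demo() {
        // SET: store a plain string value under a key
        redisTemplate.opsForValue().set("user:42:name", "Ada");

        // GET: read it back
        String name = redisTemplate.opsForValue().get("user:42:name");
        System.out.println(name);

        // Values can also be richer structures, e.g. a list used as a simple queue
        redisTemplate.opsForList().rightPush("jobs", "send-welcome-email");

        // EXPIRE: any key can be given a TTL, which is what makes Redis handy as a cache
        redisTemplate.expire("user:42:name", Duration.ofMinutes(10));
    }
}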

Hands-on

We created an example service that you can clone from this repository and follow along. It’s a simple Spring Boot MVC application that starts an embedded Tomcat on port 8080. To show the Limiter functionalities, we’ll run a load test with limited memory available for the service (spoiler alert: we expect it to crash 😄).

Let's start with a single Controller with an unprotected GET endpoint, which just waits for 10 seconds before returning the String "OK!" and a 200 status (as shown below):

@RequestMapping(value = "/", method = RequestMethod.GET)
public ResponseEntity<String> myEndpoint() {
    try {
        Thread.sleep(10000);
    } catch (InterruptedException e) {}

    return ResponseEntity.ok("OK!");
}

If we call the “/” endpoint from the browser, we can see that the request waits a little over 10 seconds and returns correctly. We can call it as many times as we want without error. But what if we have 50 users calling it simultaneously?

Graph A: 50 requests performed and all received 200 as the response status.

No problems so far, so let's complicate the scenario a bit: now we'll have 10 times more users, each continuously requesting more data for five minutes straight.

Graph B: 500 requests performed with some successful status, along with some timeout, reset connection, and broken pipe responses.

Things have gotten a little complicated, right? If we have a controlled scenario like an internal client, we could use strategies like Circuit Breakers to give the service more time to recover, but what if the requests are coming from the internet? Maybe we have an Open API and a third-party service as the client. We cannot expect every client to always maintain a healthy relationship with our APIs — and that’s where the Rate Limiter comes in.

Rate Limiting is a pretty common standard and most companies implement it for open APIs. As an example, here's Facebook's documentation for its Graph API: https://developers.facebook.com/docs/graph-api/overview/rate-limiting/

To enable Redis, we have some boilerplate code that we're not going to bother you with (though you can find it in the example application). After everything is set up, we can update our Controller's logic:

@RequestMapping(value = "/", method = RequestMethod.GET)
public ResponseEntity<String> myEndpoint(HttpServletRequest request) {
    // 1) First, let's check if the client is already limited (throttled)
    if (rateLimitService.isThrottled(request.getRemoteAddr())) {
        // If so, we'll return HTTP status 429, which is used for rate limiting
        return ResponseEntity
                .status(HttpStatus.TOO_MANY_REQUESTS)
                .header("x-rate-limit-reset",
                        rateLimitService.getExpireInSeconds(request.getRemoteAddr())
                                .toString())
                .build();
    } else {
        // 2) We are not limited, so we just increment the number of requests
        rateLimitService.incrementAndThrottle(request.getRemoteAddr());
        // And then we can run our endpoint's actual logic
        try {
            Thread.sleep(10000);
        } catch (InterruptedException e) {}
        return ResponseEntity.ok("OK!");
    }
}

There are some interesting things to note here! Let’s break down our logic into two steps:

First, we validate whether the user is already flagged as 'limited' (also known as throttled). We do this by capturing their IP address from the request (the getRemoteAddr() function) and passing it to another class that does the actual validation. If the user is flagged, we simply return HTTP status 429, which is designed exactly for this kind of scenario. It's also common practice to return a header indicating how many seconds remain until the client is allowed to send requests again. There is no official standard for this header, so we can call it whatever we want; we are using a name similar to the ones used by services such as GitHub: "x-rate-limit-reset".

HTTP Status 429 is used to tell clients that they have exceeded their request quota. Together with headers such as "x-rate-limit-reset", the client can implement logic to determine exactly how long it has to wait before its requests will succeed again.
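The example repository contains the actual implementation of these checks; here is a minimal sketch of what isThrottled and getExpireInSeconds could look like. The StringRedisTemplate dependency and the threshold of 10 requests per hour are assumptions for illustration (the increment side is shown later in this article):

import java.util.concurrent.TimeUnit;

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class RateLimitService {

    // Assumed threshold: 10 requests per hour, matching the load test further below
    private static final long REQUEST_LIMIT = 10;

    private final StringRedisTemplate redisTemplate;

    public RateLimitService(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    // A client is considered throttled once its counter reaches the limit
    public boolean isThrottled(String originIP) {
        String counter = redisTemplate.opsForValue().get(originIP);
        return counter != null && Long.parseLong(counter) >= REQUEST_LIMIT;
    }

    // The TTL left on the counter key is the number of seconds until the client is unblocked
    public Long getExpireInSeconds(String originIP) {
        Long ttl = redisTemplate.getExpire(originIP, TimeUnit.SECONDS);
        return (ttl != null && ttl > 0) ? ttl : 0L;
    }
}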

If our user is not yet limited (or throttled), we can proceed with our logic. Since we need to ensure that all requests are accounted for, we decided to add this as the first instruction (it could also be done in a "finally" block, since this placement is not an actual requirement). Before we show the actual code, let's talk a little about our limiting logic and Redis:

To limit users, we need to store a counter for each request coming from them. This counter may use any identifier and it doesn’t even need to be unique — we could use application tokens to limit requests for all its users, or apply user tokens to limit requests from any device that a user has — it all depends on your requirements. For this article, we have chosen the user’s IP address.

Let's say that each time we receive a request, we check a map in memory to see if the user is limited. If they are not, we increment their request count, and once it reaches a threshold (for example, 10 requests in one hour), we block them from reaching our endpoint's logic again. The logic seems valid, but would it work? The answer, unfortunately, is no. First, we are storing counters in memory: not only could our service crash if that map held too many items, but we are probably not running a single instance. That means each instance would keep its own counter, effectively giving our users "number of instances * request limit" chances to send requests. We would also need a way to "unblock" users once the lock period ends (see the sketch of this naive approach below). This is where Redis comes in.
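For reference, here is roughly what that naive in-memory version would look like. This is a hypothetical sketch (it is not part of the example repository), and the 10-request threshold is just the example value mentioned above:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical naive limiter: counters live inside a single JVM only
public class InMemoryRateLimiter {

    private static final long REQUEST_LIMIT = 10;

    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    public boolean isThrottled(String originIP) {
        AtomicLong counter = counters.get(originIP);
        return counter != null && counter.get() >= REQUEST_LIMIT;
    }

    public void increment(String originIP) {
        counters.computeIfAbsent(originIP, ip -> new AtomicLong()).incrementAndGet();
    }

    // Problems: the map grows forever, there is no TTL to unblock users, and with
    // N service instances each user effectively gets N * REQUEST_LIMIT requests.
}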

How Redis Works

Since Redis is an in-memory database, it's blazing fast, with query performance comparable in complexity to a "get" on a hash map (they use similar algorithms under the hood, but for now we'll focus on its application in rate limiting). By using Redis as a separate service, we can treat it as a kind of "shared memory", and all counters will be stored in the same place.

Warning: It's important to note that we have traded one problem for another here. We solved the shared-memory problem, but at the expense of response time: as fast as Redis is, it cannot make network latency disappear. Always keep the fallacies of distributed computing in mind!

Single read-and-write operations (such as GET and INCR) on Redis are atomic, which means you don't have to worry about race conditions with them. But to implement the "unblock" feature, we also need a second operation: the EXPIRE command, which sets a Time To Live (TTL) on a Redis key, causing it to be automatically deleted once the time expires. Since we are now running multiple commands, we need to enclose them in a single transaction to ensure they run as a unit. Let's look at the code:

public void incrementAndThrottle(String originIP) {
    // The MULTI command starts a transaction
    redisTemplate.multi();

    // The INCR command increments a counter and returns its updated value
    Long currentValue = redisTemplate.opsForValue().increment(originIP);

    if (currentValue == 1) {
        // The EXPIRE command sets a TTL, here we are setting it to 1 hour
        redisTemplate.expire(originIP, Duration.ofHours(1));
    }

    // And finally, the EXEC command commits the whole operation
    redisTemplate.exec();
}

We treat it much like a SQL transaction: with the MULTI command we start a new transaction, then we queue the operations we want (here, the INCR and EXPIRE commands), and finally commit the transaction with EXEC. We only want to set the TTL once, so we only call EXPIRE when the counter has the value of 1. If we called it on every request, we would merely keep resetting the TTL to one hour, keeping active users blocked indefinitely.

One addendum: A transaction's operations must run on the same connection. With some Java drivers there is a possibility that Spring's RedisTemplate tries to run the MULTI and EXEC commands on different connections, which would throw an exception with the message "ERR EXEC without MULTI". If that is the case, you can encapsulate your code in a callback, which instructs RedisTemplate to run everything on the same connection. Here is the modified code:

public void incrementAndThrottle(String originIP) {
    redisTemplate.execute((RedisCallback<String>) connection -> {
        // The WATCH command keeps track of changes on a key
        connection.watch(originIP.getBytes());
        // The MULTI command starts a transaction
        connection.multi();
        // Write operations
        Long currentValue = redisTemplate.opsForValue().increment(originIP);
        if (currentValue == 1) {
            // The EXPIRE command sets a TTL, here we are setting it to 1 hour
            redisTemplate.expire(originIP, Duration.ofHours(1));
        }
        // And finally, the EXEC command commits the whole operation
        connection.exec();
        return null;
    });
}

With all this in place, we can run our load-test scenario again. First, we need to wait until all clients are unblocked (which happens automatically after one hour). Alternatively, you can access your Redis instance using its CLI, redis-cli, and run the FLUSHALL command to delete all of its keys (WARNING: this is a destructive operation with no rollback, so only do it if you know what you are doing).

Let’s run the script again, and here are the results:

Graph C: After 10 requests that returned 200 as the response status, the user started to receive 429.

The graph shows that, after 10 successful requests (the ones that got the status 200 and the message "OK!"), our clients started receiving errors: all requests returned 429 until the script finished. This is expected, since we block our users for one hour and the script only waits for 10 minutes. We still have one more piece of information to show you:

Graph D: Latency stayed at around 10012 milliseconds for successful requests, but dropped to approximately 5 milliseconds when throttled.

Cool, right? We can see that each client's first 10 requests take about 10 seconds to return, which is expected since we wait roughly that long before answering, and after that all requests have a near-instant response. This shows that the response time of our Redis query (the one where we check if the user is blocked) is so low that it barely hurts our overall response time (again, don't forget the fallacies of distributed computing 😄).

Rate Limiting is just one of many techniques you can use to keep your services healthy. In system architecture there is hardly ever a one-size-fits-all solution, so as an engineer it is important to keep several options in your toolbox. Redis is a really powerful tool, and if you look into it further, you will see that several other techniques can be implemented with its aid.

At BFA, we are always looking for improvements to better serve our users, and it’s always good to share knowledge and solutions that we find. This is a small sample from our challenges and we hope you find it useful!

PS: All tests were run with JMeter on a MacBook Pro with an M2 Max chip and 32 GB of RAM. Everything was running locally: the application, Redis, and JMeter.
