Rate Limiting Google Cloud Tasks

Patricia Rozario
Bluecore Engineering
5 min read · Apr 6, 2020

Bluecore is responsible for sending large volumes of uniquely computed emails daily. In high-throughput systems, sharing resources is essential: there is always some upper limit on the amount of external resources that can be used at any given time. Because software runs in the cloud, it’s easy to imagine an infinite pool of resources. In reality, there is a finite number of servers, disks, and processors available at any moment, so sharing them is critical to scalable architectures. That is why rate limiting is such a canonical problem in distributed systems, and there are several different approaches to handling this kind of traffic flow.

We decided to build a Go service to control the rate at which email deliveries take place. Making it an independent service keeps it easy to manage and extend. Go provides a quasi-standard rate limiting package (golang.org/x/time/rate) that uses a token bucket algorithm under the hood, and the rate limiter service uses this package internally. The beauty of this algorithm is that it is simple to understand and easy to implement.

The token bucket algorithm is based on the metaphor of a bucket filled with tokens (see the Wikipedia entry on token buckets). Each bucket has three characteristics: the maximum number of tokens it can hold, the number of tokens currently available, and the rate at which tokens are refilled. Every time a request arrives, a token is removed from the bucket. If no tokens are available, the request is rejected. Meanwhile, the bucket is refilled at a constant rate.
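To make the moving parts concrete, here is a minimal illustration of those three characteristics using Go’s rate package. This is a toy example, not our production code:

```go
package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	// A bucket holding at most 10 tokens, refilled at 5 tokens per second.
	limiter := rate.NewLimiter(rate.Limit(5), 10)

	for i := 0; i < 15; i++ {
		// Allow removes one token if available; otherwise the
		// request is rejected because the bucket is empty.
		if limiter.Allow() {
			fmt.Printf("request %d: allowed\n", i)
		} else {
			fmt.Printf("request %d: rejected, bucket empty\n", i)
		}
	}
}
```

Because the loop runs far faster than the refill rate, the first 10 requests drain the bucket and the rest are rejected.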

In the context of email delivery, we use this algorithm to generate timestamps for when emails can be delivered. If there are enough tokens in the bucket, the timestamp is set to the current time. If there are not enough tokens, a future timestamp is computed from the refill rate. Generating a timestamp, rather than simply accepting or rejecting a request, gives us more flexibility in how we rate limit sends.
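Go’s rate package supports this pattern directly through reservations. The following is a simplified sketch of the idea; the helper and its parameters are illustrative, not our actual service code:

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// etaFor returns the earliest time at which n emails may be delivered
// without exceeding the limiter's rate. If enough tokens are available now,
// the timestamp is (approximately) the current time; otherwise it is pushed
// into the future based on the refill rate.
func etaFor(limiter *rate.Limiter, n int) (time.Time, error) {
	res := limiter.ReserveN(time.Now(), n)
	if !res.OK() {
		// n is larger than the bucket itself and can never be satisfied.
		return time.Time{}, fmt.Errorf("cannot reserve %d tokens", n)
	}
	return time.Now().Add(res.Delay()), nil
}

func main() {
	// 100 emails per second, with a bucket capacity of 100.
	limiter := rate.NewLimiter(rate.Limit(100), 100)
	for i := 0; i < 3; i++ {
		eta, _ := etaFor(limiter, 75)
		fmt.Println("batch", i, "deliverable at", eta.Format(time.StampMilli))
	}
}
```

The first batch is deliverable immediately; subsequent batches receive timestamps pushed progressively further into the future as the reservation debt grows.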

At Bluecore, we rely heavily on Google Cloud Task Queues, and much of our pipeline can be thought of as a sequence of tasks. When a marketing campaign begins, a task fetches all the emails associated with that campaign. Further tasks are then created to validate the send, generate the email HTML, and actually deliver the email. A task also carries metadata that enables the cloud provider to identify and manage it accordingly. One of these metadata fields is called eta, and it signals to the system when to start processing a task. The rate limiter service intelligently determines the eta for each task by generating a timestamp that directs the platform to deliver emails at the specified rate.
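For illustration, here is roughly how a delivery timestamp can be attached to a task with the Cloud Tasks v2 Go client, where the eta concept is exposed as the ScheduleTime field. This is a sketch rather than our actual pipeline code, and the queue path and handler URL are placeholders:

```go
package delivery

import (
	"context"
	"time"

	cloudtasks "cloud.google.com/go/cloudtasks/apiv2"
	"cloud.google.com/go/cloudtasks/apiv2/cloudtaskspb"
	"google.golang.org/protobuf/types/known/timestamppb"
)

// enqueueDelivery creates a delivery task that Cloud Tasks will not
// dispatch before eta.
func enqueueDelivery(ctx context.Context, client *cloudtasks.Client, queuePath string, eta time.Time) error {
	_, err := client.CreateTask(ctx, &cloudtaskspb.CreateTaskRequest{
		Parent: queuePath, // "projects/<project>/locations/<loc>/queues/<queue>"
		Task: &cloudtaskspb.Task{
			// ScheduleTime is the v2 API's name for the eta described
			// above: the earliest time the task may be dispatched.
			ScheduleTime: timestamppb.New(eta),
			MessageType: &cloudtaskspb.Task_HttpRequest{
				HttpRequest: &cloudtaskspb.HttpRequest{
					HttpMethod: cloudtaskspb.HttpMethod_POST,
					Url:        "https://example.com/deliver-email", // placeholder handler
				},
			},
		},
	})
	return err
}
```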

We interact with the rate limiter service by specifying a key (an internal identifier) and the number of emails that we want to rate limit. Every key has a configured rate limit of x emails per second, and a single request to the service can cover anywhere from 1 to x emails. The more emails we cover per request, the fewer requests are needed. By rate limiting in the task that retrieves emails, we have context on all the emails for a campaign and can cover multiple emails per request. Compare this to enforcing rate limiting in the email delivery task: each delivery task only has context on a single email, so rate limiting there would mean one email (the minimum) per request. Strategically placing the rate limiting logic drastically reduces the load on the rate limiter service, which in turn decreases the CPU, memory, and instances needed and further reduces cost.
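As a rough sketch, the service’s interface boils down to a key and a count. These shapes and field names are illustrative, not our actual API:

```go
package ratelimiter

import "time"

// ReserveRequest asks the rate limiter service to account for a batch of
// emails under a given key.
type ReserveRequest struct {
	Key   string `json:"key"`   // internal identifier with a configured limit of x emails/second
	Count int    `json:"count"` // number of emails to cover, between 1 and x
}

// ReserveResponse carries the timestamp at which the covered emails may be
// delivered.
type ReserveResponse struct {
	ETA time.Time `json:"eta"`
}
```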

This approach of generating timestamps and strategically placing the rate limiting logic was shown to lower costs and simplify maintenance. The real challenge became apparent when we tried to rate limit very large sends. The figure below shows a send of more than 150,000 emails. Because delivery times were predetermined, any latency in our system would cause a backlog of sends that exceeded the rate limit. The findings show a large initial spike that converges to the configured rate over time. We believe the spike is caused by the delay in ramping up resources on Google Cloud Platform (our cloud provider) to handle the large workload. It became apparent that trade-offs needed to be made between performance and control.

Sometimes the solutions to intractable problems don’t have to be overly complicated. Our workaround was to artificially lower the rate limit slightly in order to absorb these spikes: internally, we configure each limit with a rate that is 30% lower than the one requested. This was simple to implement and proved to honor the configured rate limits. The results from the scaled-down rate limiter are shown below.
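In code, the adjustment amounts to a one-line change when constructing each limiter. A sketch, where the burst parameter is an assumption for illustration:

```go
package ratelimiter

import "golang.org/x/time/rate"

// newScaledLimiter builds a limiter that runs 30% below the configured
// rate, leaving headroom for the startup spike described above.
func newScaledLimiter(configuredPerSecond float64, burst int) *rate.Limiter {
	const safetyFactor = 0.70 // 30% below the configured rate
	return rate.NewLimiter(rate.Limit(configuredPerSecond*safetyFactor), burst)
}
```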

The trade-off with this approach is that it lowers throughput. That was a consequence we were willing to accept given our business and clients’ needs. Supporting workflows on external providers allows us to attract more clients, and these third-party email providers usually enforce a rate limit. Fortunately for us, clients on these providers oftentimes transition into using Bluecore as their primary email service provider. When clients deliver emails through us, they are not rate limited, so throughput is not affected.

Rate limiting is simple and complicated at the same time. It requires a certain level of finesse and is a recurring need in many software systems. Understanding trade-offs and handling edge cases are key to building a reliable and efficient rate limiting solution.
