I’ve been doing some interesting work with the team at MFloW writing HTTP clients that consume financial data, and it’s been eye-opening to see how differently API platforms choose to protect their resources. Best practices for client-side rate limiting seem to be scarce on the web compared with those for the server side. So here are my thoughts on the subject and some code samples.
Understanding server-side rate-limiting
Most API endpoints implement resource consumption quotas in the form of rate limits. This is generally done either to protect their servers from being abused by too many requests or to monetize the endpoints for more frequent updates. For example, Yahoo Finance (RIP) enforces a rate limit of 2000/hr. Another platform, IndependentReserve, enforces a rate limit of 1/sec. If you pay close attention, these two limits differ in the way the quota is managed over the timeframe. In the case of Yahoo Finance, you can consume the whole allocation of 2000 requests in 5 minutes if you wish, which means bursty, high-volume requests are allowed. In the case of IndependentReserve, you can call the public APIs only once a second, which leaves no room for bursts; I would assume this is in place to safeguard their servers from being flooded.
A quick detour from client-side rate limiting: I came across a lot of articles talking about ways to implement rate limiting in the server application logic. I strongly discourage this, since you will potentially have to replicate the logic in many different applications and it will become a code maintenance nightmare. Traffic control and shaping should also be the function of a perimeter device, so implement an API gateway instead and offload this function to it. I’d recommend using a gateway such as Google Apigee or Kong.
Some API platforms are kind enough to report the remaining request quota in the HTTP response header “RateLimit-Remaining”. This allows the consuming client to implement logic that plays well with the resource server. From my experience, these headers are treated as a courtesy and not as a standard that everyone follows. In most cases you will need a smart HTTP client, so keep reading :)
Here is a sample RateLimit-Remaining HTTP response:
HTTP/1.1 200 OK
When you exceed your rate limit you will generally receive an HTTP 429 Too Many Requests. At this stage you will need to wait for a bit before you fire off the next request and this is where the complexity arises. How long would you wait when an endpoint allows for bursty consumption of its resources over a large period of time?
What not to do
- Break down long timeframes into manageable ones: I’ve seen programmers take rate limit quotas such as the Yahoo Finance API’s (2000/hr) and break them down into requests per minute or second. In this case, it would translate to roughly 33 requests/min. This approach will work, but it will not allow the client application to make use of the bursty nature of the Yahoo Finance API when required.
- Backoff on exhaustion: I detest this approach the most. Programmers use up the allocation without keeping track of the quota, and when they encounter an HTTP 429 they fall back on an implicit sleep that works on an incremental exponential backoff. This approach should be avoided like the plague, since debugging issues is more difficult and the resulting code is not deterministic. The problem is further compounded when the HTTP client object is shared by multiple subroutines.
Repeat HTTP 429 offenders could be banned
Another problem with the second approach is that it assumes the API server will tolerate repeated requests after it has returned an HTTP 429. Some platforms, such as Binance, could temporarily or permanently ban a client that keeps sending requests after receiving an HTTP 429. This could happen if the incremental backoff algorithm doesn’t back off for long enough.
Client-side rate limiting — The duty of every HTTP client developer
Now that I’ve set some context, let’s jump into the code and see how it’s all done. Go’s standard library is extremely comprehensive, but the answer to our problem lives just outside it, in the package “golang.org/x/time/rate”, one of the supplementary golang.org/x packages maintained by the Go project. To be more precise, it’s in the “type Limiter” implemented in this package.
A Limiter controls how frequently events are allowed to happen. It implements a “token bucket” of size b, initially full and refilled at rate r tokens per second. Informally, in any large enough time interval, the Limiter limits the rate to r tokens per second, with a maximum burst size of b events.
Here is a simple Go HTTP client to demonstrate rate limiting. Let’s make a few calls to BTCMarkets to get the Market Ticker data for Bitcoin (BTC-AUD).
The above code, when executed, should run without ever hitting the rate limit of 50 requests every 10 seconds. To test it out, try commenting out lines 20 to 24 and rerun the code.
In simple terms “Token Buckets” act just like a token dispenser you might have seen at the bank or any other place where you need to queue in a line with a token to be serviced. If the dispenser is empty you don’t get serviced. This is an effective method for crowd control and this approach is also used in packet-switched computer networks.
Using the token bucket implementation, we can make sure our code doesn’t send a request unless it has a token. The bucket itself is configured to be replenished at the rate at which the API server will accept requests.
The retrieval of a token is a blocking call, so no request is dispatched to the network until a token has been made available, which happens before (*http.Client).Do(*http.Request) is called.
Congratulations on making it to the end! You can now go be a good citizen of the internet who writes maintainable code that honors API rate-limits. To see a more comprehensive example take a look at https://github.com/MflowAU/btcmarkets