Go | Rate Limiting
Rate limiting allows you to place constraints on how many events can occur within a defined amount of time. If you explained this to your end users, they probably would not want it, because they expect something to happen immediately when requested. However, if you are building an API for a service, some part of the system may need to perform database calls, disk reads/writes, or network calls to a legacy system, all of which take time. Your end user would much prefer a task to complete successfully, rather than receive an error because your system could not cope with all the requests arriving at once.
With rate limits in place you can understand the performance and stability of your system, so you know what to expect and can expand the constraints in a controlled manner. There is also a security aspect to consider: you don't want to leave your system vulnerable to a malicious user who can hit it as fast as their resources allow.
To understand rate limiting further, I am first going to create an application that accesses an API without any rate limiting. I'm going to keep the API very simple and expose just two functions.
The application to access the API is below.
Running this application now, I can see from the console that the API requests execute simultaneously and the end user can access the system as frequently as they want.
Implementing rate limits in Go can be done using the golang.org/x/time/rate package, which uses an algorithm called the token bucket. The theory behind the token bucket is that you need an access token to be able to utilize a resource; without a token, the request is denied. Tokens are stored in a bucket, which holds a finite number of them. Every time you access a resource, you spend an access token, until there are no tokens left in the bucket. At that point, any further requests must either be queued or denied. Tokens are added back to the bucket at a specified rate.
The main function I will use to interact with the rate package is the NewLimiter function, which takes two parameters: the first allows events up to the rate r, and the second is the bucket depth discussed above. I will modify the Open function so that it sets a rate limit of one event per second.
I will then modify the functions the API offers so that they call the Wait function on the rate limiter. This ensures that we have an access token before our API completes the request.
If I run the application again, I can see that all our requests are still issued simultaneously, but they now complete one per second.
Now we have our simple rate limiter working, let us consider a more complex requirement we may have. We may want to define limits based on a particular resource in the system, or within multiple different time frames. For example, we may want to control the number of requests our API will process within a minute, but within that minute also limit by the second, to stop the system from being overloaded by many requests at once.
To achieve this, I will create a simple aggregate rate limiter, so that we can define multiple time-based limits for each resource and aggregate them into one limiter that handles the functionality for us.
Firstly, I am going to introduce a new interface and a struct that will allow me to create a MultiLimiter instance holding multiple limiters.
I will then create the functions that implement the interface I have defined. The Wait function iterates through all the limiters and calls Wait on each of them; some of these calls may block, but we need to notify every rate limiter of each request that executes, so that its token bucket is decremented. The MultiLimiter function sorts the rate limiters, and the Limit function returns the most restrictive limit, which ensures that when we wait, we wait for the longest interval.
When I execute the code now, I get the following output.
Notice from the console output that we log the read message every five seconds, as defined by the limit set on the dbLimit. The apiLimit allows five requests every minute, and within that minute it is restricted to a maximum of two requests per second. So in the console output we can see that the first five messages acquire the five access tokens allowed per minute by the apiLimit, but they are further limited to a maximum of two requests per second within that minute.
The sixth request to the Resolve function occurs twelve seconds after the first, which may seem incorrect, as we should only be allowed to process five requests per minute. But I would like the apiLimit to be considered over a sliding window of time, which is why I have created the Per function. By the time of the sixth request, the access token consumed by the first request has been returned to the bucket.
I hope you found this post useful. Thanks for reading.