Rate Limiting Service Calls in Go

Kevin Hoffman
Sep 5, 2017 · 7 min read

We’ve all seen enough “hello world” samples to make our eyes bleed. Every time we pick up a new programming language, one of the first things many of us do these days is google up a sample of how to make a client call to a remote service. Whether it be an RPC call or a HTTP/REST call, figuring out how to get one service to talk to another service is a really common task.

Most of the samples for this service-to-service communication pattern are completely synchronous, or asynchronous with only limits imposed by the OS/environment. A scenario where this pattern could be utilized might be creating a dashboard service where one of the resources invokes a downstream weather service.

A synchronous HTTP request handler might look something like this:

func (h *requestHandler) handleWeatherRequest(req *WeatherRequest) (response *WeatherResponse) {
   response = h.callWeatherService(req)

This is a pretty big oversimplification, but the pattern should be familiar: expose one endpoint that consumes some other service before returning a value.

Now let’s assume that you’re paying to use this backing weather service and you’re on the ultra cheap plan — you can only make 1 HTTP request to the service per second. You’re only running 1 instance of your service so we don’t need to worry about distributed rate limiting (which is an entirely different can of demonic worms and a topic for another day).

But, you still have to worry about multiple simultaneous inbound calls to your one service. How do you ensure that even if you’re getting 100 requests per second for weather data, you only make 1 call per second to the overpriced weather service?

Thankfully Go makes this relatively easy with the use of channels. The pattern involves the use of a throttle channel that only ticks every time you’re allowed to process a single request. You then have a requests channel from which you pull dispatched work when allowed. Once you receive a request from the request channel, you then block on the throttle channel, waiting until you’re allowed to process a single item. In the case of this sample, processing an item involves making a call to the weather service.

Let’s take a look at how to set this up:

rate := time.Second // handle 1 tx per second
requestsCh := make(chan *WeatherRequest, 1000)
handler := &weatherHandler{rate: rate, requests: requestsCh}
go handler.processRequests()

The rate variable here controls the rate limit or the throttling rate. If we want to handle 2 transactions per second, we’d set it to time.Second / 2 , 10 TPS would be time.second / 10 , etc. We don’t want to set it to too fast a rate, otherwise it’ll be roughly indistinguishable from un-throttled usage.

The way this throttle channel implementation works is that for any given request, the consumer might have to wait up to rate (e.g. 500ms, 250ms) before the request processing even begins. If you have exorbitantly large forced wait time SLAs from your backing service in the minutes or hours, this isn’t really the ideal solution. I find this particular solution is well-suited to rates limited to transactions per second rather than seconds (or minutes) per transaction.

If we’re in a throttled environment, losing a few hundred ms to induced latency is probably not going to be a cause for alarm.

Next, we make a buffered channel so that we can store more requests than we’re processing. This allows a large number of requests/second to come into the service while only 1 TPS calls out to the weather service. If we exceed 1000 pending requests (the requests channel buffer size), then the individual HTTP requests to the service will simply block, waiting for processing to occur. Getting the balance right between the buffer size and the TPS fulfillment rate is usually an ugly dance only resolved with lots of experimentation.

Here’s the dispatcher function running as a goroutine that makes calls to the weather service after waiting for the throttle to elapse:

func (h *requestHandler) processRequests() {
  throttle := time.Tick(h.rate)
  for req := range h.requests {
    go h.fulfillRequest(req)

It’s worth noting here that the fulfillment of the request is dispatched asynchronously. At first glance it looks like this might allow requests to be serviced faster than the throttle rate. What it actually does is allow us to dispatch n transactions per second, even if some of those transactions take longer than rate to fulfill. In other words, we will be sending weather requests to the service at a fixed rate, but they will be coming back to us after a duration that might exceed the rate. If we’re servicing 2 weather requests per second, and sometimes the weather service takes 750ms to give us an answer, we don’t want to slow down the dispatcher for this.

So we’ve got asynchronous processing of weather requests, but how do we front that with a call that appears synchronous to our REST or RPC clients? The answer is again through the beauty of channels:

func (h *requestHandler) handleWebRequest(req *WeatherRequest) (resp *WeatherResponse) {
  c := make(chan *WeatherResponse)
  h.requests <- &WeatherRequest{ZipCode: req.ZipCode, respCh: c}
  defer close(c)  select {
    case r := <-c:
      resp = r
    case <-time.After(5 * time.Second): 
      resp = &WeatherResponse{Error: "timeout waiting"}

The service consumer will experience only a time delay equivalent to the weather service call plus latency plus whatever remains of the most recent tick if there are no other pending requests. If there are other pending requests, the consumer will wait while the request is satisfied.

In the preceding code, if the request waits more than 5 seconds then the attempt to make the weather service call will “time out” and we bail, returning a failure code to the consumer.

If you’re paying close attention, you may notice that we’re going to do something that’ll cause our process to panic. There’s a defer close(c) in the web request handler. This means that when this function goes out of scope due to a timeout, the channel will be closed and some indeterminate amount of time later, the work item processor will attempt to submit a reply on this closed channel.

Sending a value to a closed channel in Go panics — game over. So how do we tell our background request handler to not bother giving us a reply because no one is waiting for it? For good reason, there is no way to interrogate whether a channel is in the closed state. Doing so would just lead to more problems — the channel can close between the time you interrogate and the time you act, still panicking.

Context to the rescue! Let’s modify the web request handler to create a new context:

func (h *requestHandler) handleWebRequest(req *WeatherRequest) (resp *WeatherResponse) {
  c := make(chan *WeatherResponse)
  ctx, cancel := context.WithCancel(context.Background())
  defer close(c)
  h.requests <- &WeatherRequest{ZipCode: req.ZipCode, respCh: c, ctx: ctx}  select {
  case r:= <-c:
    resp = r
  case <-time.After(5 * time.Second):
    resp = &WeatherResponse{Error: "timeout"}

In this new version of the function, we’re calling cancel() to indicate that the context of this particular chain of function calls is done. In a real server process we might already have a context but for the sample, I’m creating one derived from context.Background().

cancel() doesn’t terminate any go routines, nor does it force our code to stop running. We need to explicitly check whether the context is still live. The code that fulfills individual requests by calling the weather service needs to be context-aware:

func (h *requestHandler) fulfillRequest(req *WeatherRequest) {
  r := rand.New(rand.NewSource(time.Now().UnixNano()))

  // randomly take longer than the timeout period
  chance := r.Intn(10)
  if chance > 8 {
   time.Sleep(time.Second * 6)
  }  resp = callService(...)  select {
    case <-req.ctx.Done():
      return // context aborted, don't send response
      req.respCh <- resp // context is alive, submit response

The Done() function on a context returns a channel. If it’s closed and we read from it, then we know that the context has been finished or aborted.

Now if I run this sample code that randomly fakes an extra-long call, I’ll see that I don’t panic and if I were to profile it I would see that I’m not leaking goroutines or channels — everything is neatly cleaning up after itself.

To recap, we’re limiting the rate at which a backing service can be called regardless of the rate at which new requests come into our REST service. We accomplish this with a throttle channel controlling the dispatch rate, a requests buffered channel that holds the incoming requests to the service, and a context which can be canceled to prevent our background calls from writing to a closed channel.

If you’re more familiar with the concept of promises than with channels, here’s an analogy: you can think of the concept of passing a channel to a function in order to get a reply in the future as being very similar to the concept behind promises and futures in other languages.

In a couple places in the above code I’ve created a new pointer to a struct when I could’ve re-used an existing reference. I did that to be explicit about where values are going and what information certain functions need — there’s plenty of room for optimization.

What strikes me about this solution is how simple and clean it is. In other languages that don’t have the concept of channels, orchestrating a flow like this would be annoying and error prone.

Happy throttling!

Kevin Hoffman

Written by

In relentless pursuit of elegant simplicity. Tinkerer, writer of tech, fantasy, and sci-fi. Converting napkin drawings into code for Capital One.