Go: Avoid duplicate requests with sync/singleflight

Kale Blankenship
Oct 11, 2016

Have you ever had the situation where multiple requests might come in to your application for the same resource? And did that resource need to call an expensive function (read a file, access the network, perform a calculation) to process the request? If you said “yes”, sync/singleflight is here to help.

singleflight isn’t new per se. The logic has been in the Go standard library for quite a while, but as an internal package. (For the uninitiated, you can’t import from the internal directory of an external package.) It’s been copied to various repositories, but those copies weren’t kept in sync with the standard library version and don’t give you the warm fuzzies of importing a golang.org/x package.

Enough talk, let’s get to a concrete example. The most obvious use might be to ensure that multiple DNS lookups for the same domain name don’t cause duplicate simultaneous network requests. I’m not going to show that though, because the standard library already does it; it’s why singleflight was created. Instead, we’ll look at a contrived example of an HTTP server that needs to make external API requests.

In the example, you can see that we have a simple handler which makes a request to GitHub and returns the API status to our client as a text string.
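The listing itself isn’t reproduced here, so what follows is only a minimal sketch of such a handler under a few assumptions: githubStatus() sleeps for one second and returns a canned "good" instead of really calling GitHub, and the names githubHandler, upstreamCalls and fire are hypothetical. The fire helper calls the handler directly from ten goroutines so the effect is visible without a running server.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// upstreamCalls counts how often githubStatus runs.
var upstreamCalls int64

// githubStatus is a stand-in for the real API call (assumption): it sleeps
// for one second instead of contacting GitHub and returns a canned status.
func githubStatus() (string, error) {
	atomic.AddInt64(&upstreamCalls, 1)
	log.Println("Making request to GitHub API")
	defer log.Println("Request to GitHub API Complete")
	time.Sleep(time.Second)
	return "good", nil
}

// githubHandler calls githubStatus once for every incoming request.
func githubHandler(w http.ResponseWriter, r *http.Request) {
	status, err := githubStatus()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	log.Printf("/github handler request: status %q", status)
	fmt.Fprintf(w, "GitHub Status: %q", status)
}

// fire sends n concurrent requests straight to the handler (no listener
// needed) and returns how many upstream calls they triggered.
func fire(n int) int64 {
	before := atomic.LoadInt64(&upstreamCalls)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			req := httptest.NewRequest(http.MethodGet, "/github", nil)
			githubHandler(httptest.NewRecorder(), req)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&upstreamCalls) - before
}

func main() {
	fmt.Println("upstream calls for 10 requests:", fire(10)) // 10
}
```

To drive it with a load tester instead, you could register the handler with http.HandleFunc("/github", githubHandler) and serve it with http.ListenAndServe(":8080", nil).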

Now we’ll make a few simultaneous requests and look at what happens. There are many ways to do this, but I’m going to use vegeta, an HTTP load testing tool written in Go.

# echo "GET http://localhost:8080/github" | vegeta attack -duration=1s -rate=10 | vegeta report
Requests [total, rate] 10, 11.11
Duration [total, attack, wait] 2.011072002s, 899.999ms, 1.111073002s
Latencies [mean, 50, 95, 99, max] 1.269280654s, 1.222384648s, 1.445123078s, 1.445123078s, 1.526538858s
Bytes In [total, mean] 210, 21.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:10
Error Set:

This command instructs vegeta to send 10 GET requests per second to our endpoint for 1 second.

From the server logs we can see that the githubStatus() function was called 10 times, once for each status request.

# go run e1_without_singleflight.go
2016/10/11 12:36:30 Making request to GitHub API
2016/10/11 12:36:30 Making request to GitHub API
2016/10/11 12:36:30 Making request to GitHub API
2016/10/11 12:36:30 Making request to GitHub API
2016/10/11 12:36:30 Making request to GitHub API
2016/10/11 12:36:30 Making request to GitHub API
2016/10/11 12:36:30 Making request to GitHub API
2016/10/11 12:36:30 Making request to GitHub API
2016/10/11 12:36:30 Making request to GitHub API
2016/10/11 12:36:31 Making request to GitHub API
2016/10/11 12:36:31 Request to GitHub API Complete
2016/10/11 12:36:31 /github handler requst: status “good”
2016/10/11 12:36:31 Request to GitHub API Complete
2016/10/11 12:36:31 /github handler requst: status “good”
2016/10/11 12:36:31 Request to GitHub API Complete
2016/10/11 12:36:31 /github handler requst: status “good”
2016/10/11 12:36:31 Request to GitHub API Complete
2016/10/11 12:36:31 /github handler requst: status “good”
2016/10/11 12:36:31 Request to GitHub API Complete
2016/10/11 12:36:31 /github handler requst: status “good”
2016/10/11 12:36:31 Request to GitHub API Complete
2016/10/11 12:36:31 /github handler requst: status “good”
2016/10/11 12:36:31 Request to GitHub API Complete
2016/10/11 12:36:31 /github handler requst: status “good”
2016/10/11 12:36:32 Request to GitHub API Complete
2016/10/11 12:36:32 /github handler requst: status “good”
2016/10/11 12:36:32 Request to GitHub API Complete
2016/10/11 12:36:32 /github handler requst: status “good”
2016/10/11 12:36:32 Request to GitHub API Complete
2016/10/11 12:36:32 /github handler requst: status “good”

Let’s think about what’s happening here. 10 requests come in, roughly simultaneously, and we then make 10 separate HTTP requests for the same information. This means our application has to work harder, we use more bandwidth, and we’re putting unnecessary load on GitHub’s API. Bad times all around.

Let’s see what singleflight can do for us.

And the results after hitting our endpoint with vegeta?

# go run e2_with_singleflight.go
2016/10/11 13:02:49 Making request to GitHub API
2016/10/11 13:02:51 Request to GitHub API Complete
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true
2016/10/11 13:02:51 /github handler requst: status “good”, shared result true

Much better. This time we made only one request to GitHub, and all the requests shared the result.

This is a good time to point out that singleflight only helps with in-progress requests; it’s not going to cache the result after the function call completes. We can demonstrate this by running vegeta for 2 seconds (remember that there’s an artificial delay of 1 second in the githubStatus() function).

# echo "GET http://localhost:8080/github" | vegeta attack -duration=2s -rate=10 | vegeta report
Requests [total, rate] 20, 10.53
Duration [total, attack, wait] 2.609915466s, 1.899999924s, 709.915542ms
Latencies [mean, 50, 95, 99, max] 807.15366ms, 809.915542ms, 1.373483043s, 1.373483043s, 1.473610875s
Bytes In [total, mean] 420, 21.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:20
Error Set:
2016/10/11 13:17:07 Making request to GitHub API
2016/10/11 13:17:08 Request to GitHub API Complete
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 /github handler requst: status “good”, shared result true
2016/10/11 13:17:08 Making request to GitHub API
2016/10/11 13:17:10 Request to GitHub API Complete
2016/10/11 13:17:10 /github handler requst: status “good”, shared result true
2016/10/11 13:17:10 /github handler requst: status “good”, shared result true
2016/10/11 13:17:10 /github handler requst: status “good”, shared result true
2016/10/11 13:17:10 /github handler requst: status “good”, shared result true
2016/10/11 13:17:10 /github handler requst: status “good”, shared result true

We can see that after the first request to GitHub completes, another is dispatched. To reiterate, singleflight is for avoiding multiple inflight requests, not caching.

Keys

We skipped over the usage of the key. It’s pretty straightforward. The key is used to differentiate separate functions. Let’s add a BitBucket API status endpoint.

Other than small differences in the JSON structure, the GitHub and BitBucket handlers are nearly identical. As noted in the code, a different key is used to differentiate the inflight functions in the singleflight.Group.

echo "GET http://localhost:8080/github\nGET http://localhost:8080/bitbucket" | vegeta attack -duration=1s -rate=10 | vegeta report
Requests [total, rate] 10, 11.11
Duration [total, attack, wait] 1.554732379s, 899.999947ms, 654.732432ms
Latencies [mean, 50, 95, 99, max] 1.069608701s, 984.508873ms, 1.384495166s, 1.384495166s, 1.5546515s
Bytes In [total, mean] 320, 32.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:10
Error Set:
# go run e3_bitbucket.go
2016/10/11 13:27:48 Making request to GitHub API
2016/10/11 13:27:48 Making request to BitBucket API
2016/10/11 13:27:49 Request to BitBucket API Complete
2016/10/11 13:27:49 /bitbucket handler requst: status “All Systems Operational”, shared result true
2016/10/11 13:27:49 /bitbucket handler requst: status “All Systems Operational”, shared result true
2016/10/11 13:27:49 /bitbucket handler requst: status “All Systems Operational”, shared result true
2016/10/11 13:27:49 /bitbucket handler requst: status “All Systems Operational”, shared result true
2016/10/11 13:27:49 /bitbucket handler requst: status “All Systems Operational”, shared result true
2016/10/11 13:27:49 Request to GitHub API Complete
2016/10/11 13:27:49 /github handler requst: status “good”, shared result true
2016/10/11 13:27:49 /github handler requst: status “good”, shared result true
2016/10/11 13:27:49 /github handler requst: status “good”, shared result true
2016/10/11 13:27:49 /github handler requst: status “good”, shared result true
2016/10/11 13:27:49 /github handler requst: status “good”, shared result true

We can see that the githubStatus() and bitbucketStatus() functions were each called once and the status strings are obviously different.

For fun let’s see what it would look like if we accidentally used the same key.

# go run e4_same_key.go
2016/10/11 13:34:39 Making request to GitHub API
2016/10/11 13:34:40 Request to GitHub API Complete
2016/10/11 13:34:40 /github handler requst: status “good”, shared result true
2016/10/11 13:34:40 /github handler requst: status “good”, shared result true
2016/10/11 13:34:40 /github handler requst: status “good”, shared result true
2016/10/11 13:34:40 /bitbucket handler requst: status “good”, shared result true
2016/10/11 13:34:40 /bitbucket handler requst: status “good”, shared result true
2016/10/11 13:34:40 /bitbucket handler requst: status “good”, shared result true
2016/10/11 13:34:40 /github handler requst: status “good”, shared result true
2016/10/11 13:34:40 /github handler requst: status “good”, shared result true
2016/10/11 13:34:40 /bitbucket handler requst: status “good”, shared result true
2016/10/11 13:34:40 /bitbucket handler requst: status “good”, shared result true

As you might have expected, the BitBucket handler ends up returning the result for GitHub. (Or vice versa, depending on which handler is executed first.)

Forgetting a Key

In some situations you may want to forget a key. For our example, let’s say that we only want to share the response for requests that come in within 250ms of the first request being initiated.

# echo "GET http://localhost:8080/github" | vegeta attack -duration=1s -rate=10 | vegeta report
Requests [total, rate] 10, 11.11
Duration [total, attack, wait] 2.007352469s, 899.999899ms, 1.10735257s
Latencies [mean, 50, 95, 99, max] 1.214591847s, 1.13383041s, 1.432634042s, 1.432634042s, 1.53263476s
Bytes In [total, mean] 210, 21.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:10
Error Set:
# go run e5_forget.go
2016/10/11 13:45:21 Making request to GitHub API
2016/10/11 13:45:21 Deleting “github” key
2016/10/11 13:45:21 Making request to GitHub API
2016/10/11 13:45:21 Deleting “github” key
2016/10/11 13:45:21 Making request to GitHub API
2016/10/11 13:45:22 Deleting “github” key
2016/10/11 13:45:22 Making request to GitHub API
2016/10/11 13:45:22 Deleting “github” key
2016/10/11 13:45:22 Request to GitHub API Complete
2016/10/11 13:45:22 /github handler requst: status “good”, shared result true
2016/10/11 13:45:22 /github handler requst: status “good”, shared result true
2016/10/11 13:45:22 /github handler requst: status “good”, shared result true
2016/10/11 13:45:22 Request to GitHub API Complete
2016/10/11 13:45:22 /github handler requst: status “good”, shared result true
2016/10/11 13:45:22 /github handler requst: status “good”, shared result true
2016/10/11 13:45:22 /github handler requst: status “good”, shared result true
2016/10/11 13:45:22 Request to GitHub API Complete
2016/10/11 13:45:22 /github handler requst: status “good”, shared result true
2016/10/11 13:45:22 /github handler requst: status “good”, shared result true
2016/10/11 13:45:22 /github handler requst: status “good”, shared result true
2016/10/11 13:45:23 Request to GitHub API Complete
2016/10/11 13:45:23 /github handler requst: status “good”, shared result false

We can see that four separate requests were made, roughly one every 250ms, though the log may not make that as clear as we’d like; such is the way of concurrency.

Note that deleting a key does not affect the current inflight requests, only requests that happen after the key is deleted.

DoChan

As an alternative to Do(), singleflight provides DoChan(). It’s essentially the same thing, but the result is delivered via a channel, which is useful if you want to implement a timeout with a select statement.

As written, this will always result in a timeout. I encourage you to play around with the example if you’re unfamiliar with timeouts using select statements.

# echo "GET http://localhost:8080/github" | vegeta attack -duration=1s -rate=10 | vegeta report
Requests [total, rate] 10, 11.11
Duration [total, attack, wait] 1.403848483s, 899.99994ms, 503.848543ms
Latencies [mean, 50, 95, 99, max] 506.630505ms, 506.02365ms, 509.411055ms, 509.411055ms, 511.554041ms
Bytes In [total, mean] 250, 25.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 0.00%
Status Codes [code:count] 504:10
Error Set:
504 Gateway Timeout

vegeta reports a 504 Gateway Timeout

# go run e6_dochan.go
2016/10/11 14:14:32 Making request to GitHub API
2016/10/11 14:14:32 /github handler timed out
2016/10/11 14:14:33 /github handler timed out
2016/10/11 14:14:33 /github handler timed out
2016/10/11 14:14:33 /github handler timed out
2016/10/11 14:14:33 /github handler timed out
2016/10/11 14:14:33 /github handler timed out
2016/10/11 14:14:33 /github handler timed out
2016/10/11 14:14:33 /github handler timed out
2016/10/11 14:14:33 /github handler timed out
2016/10/11 14:14:33 /github handler timed out
2016/10/11 14:14:34 Request to GitHub API Complete

All the requests sharing the result are timed out.

I hope this shed a little light on a new(ish) package. singleflight probably isn’t something you’re going to reach for every day, but it is another tool for your programming toolbox.
