Channel Direct Send

Phil Pearl
Aug 3, 2017

I just read some fantastic slides on channels from what sounds like an excellent GopherCon (and the video is now available too). I was very intrigued by the concept that unbuffered channels implement a “Direct Send”, where the calling goroutine in some sense calls directly into the reading goroutine. Now, back in the ’90s the company I worked for wrote software in a non-blocking message queuing environment based on CSP, and I remember that when we implemented a similar-sounding “Direct Send” it made a significant positive impact.

So I thought I’d try it out.

I wrote myself a benchmark that sends integers to a background goroutine that adds them up. I made 3 versions, one with zero buffering for the channel, one with a buffer of 1 and one with a buffer of 10.
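A minimal sketch of a benchmark along these lines might look like the following. The helper name `sumViaChannel` and the overall structure are assumptions for illustration; only the benchmark and sub-benchmark names are taken from the results in this post.

```go
package main

import "testing"

// sumViaChannel sends the integers 0..n-1 over a channel with the given
// buffer size to a background goroutine that adds them up, then returns
// the total computed by that goroutine.
// (Hypothetical helper, not the original benchmark code.)
func sumViaChannel(buf, n int) int {
	in := make(chan int, buf)
	result := make(chan int)

	// Background goroutine: add up everything received until the
	// channel is closed, then report the total.
	go func() {
		total := 0
		for v := range in {
			total += v
		}
		result <- total
	}()

	for i := 0; i < n; i++ {
		in <- i
	}
	close(in)
	return <-result
}

// BenchmarkChannelDirectSend runs the same workload with a buffer of
// zero, one and ten.
func BenchmarkChannelDirectSend(b *testing.B) {
	cases := []struct {
		name string
		buf  int
	}{
		{"direct", 0},
		{"buffered", 1},
		{"buffered-10", 10},
	}
	for _, c := range cases {
		b.Run(c.name, func(b *testing.B) {
			sumViaChannel(c.buf, b.N)
		})
	}
}
```

Saved in a `_test.go` file, this can be run with the `go test` command used for the results in this post.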

So, if I’ve understood, then zero buffering should be faster than a buffer of 1 because the direct send path will cut down some of the switching overhead to the receiving goroutine.

I’ve restricted the test to 1 CPU.

go test -run ^$ -bench BenchmarkChannelDirectSend -cpu 1
BenchmarkChannelDirectSend/direct         10000000    225 ns/op
BenchmarkChannelDirectSend/buffered       10000000    173 ns/op
BenchmarkChannelDirectSend/buffered-10    20000000    97.4 ns/op

Ah, looks like the buffered case is faster, and that more buffering improves things even more.

Now, I think I understand why a buffer of 10 is better than a buffer of 1. With a buffer of 10, I imagine the sending goroutine will run for 10 loops, then the receiver will run for 10 loops, so we only switch goroutines every 10 loops.

But I don’t understand why a buffer of 1 beats a zero buffer. Hmmm... Ah, perhaps my test isn’t quite right. What if we use the channel to send to a goroutine, then immediately ask for a response back? In this case I really can’t gain from the “buffering” aspect of a buffered channel. Perhaps the unbuffered channel will shine in this case?
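A request/response version could be sketched roughly as below. Again, the helper name `roundTrips` and the choice to reply with a running total are assumptions made for illustration, not the original code.

```go
package main

import "testing"

// roundTrips sends n integers to a background goroutine and waits for a
// reply to each one before sending the next. The goroutine replies with
// its running total, so the last reply is the sum of 0..n-1.
// (Hypothetical helper, not the original benchmark code.)
func roundTrips(buf, n int) int {
	req := make(chan int, buf)
	resp := make(chan int, buf)

	go func() {
		total := 0
		for v := range req {
			total += v
			resp <- total
		}
	}()

	last := 0
	for i := 0; i < n; i++ {
		req <- i      // send the request...
		last = <-resp // ...and block until the reply arrives
	}
	close(req)
	return last
}

// BenchmarkChannelDirectSend2 measures one send plus one reply per
// iteration for each buffer size.
func BenchmarkChannelDirectSend2(b *testing.B) {
	cases := []struct {
		name string
		buf  int
	}{
		{"direct", 0},
		{"buffered", 1},
		{"buffered-10", 10},
	}
	for _, c := range cases {
		b.Run(c.name, func(b *testing.B) {
			roundTrips(c.buf, b.N)
		})
	}
}
```

Because the sender always waits for the reply, at most one value is ever in flight in each direction, so extra buffer capacity goes unused.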

And here’s the result running on one CPU.

go test -run ^$ -bench BenchmarkChannelDirectSend2 -cpu 1
BenchmarkChannelDirectSend2/direct         5000000    379 ns/op
BenchmarkChannelDirectSend2/buffered       5000000    388 ns/op
BenchmarkChannelDirectSend2/buffered-10    5000000    383 ns/op

The direct send version is minutely faster in this case: about 9 ns/op faster than a buffer of 1, and about 4 ns/op faster than a buffer of 10. This is quite a small improvement, but it seems like a real effect.

If I allow the test to run on 2 CPUs the advantage disappears, and everything is slower. I’d guess the code is switching between OS threads, and this is slowing things down. I’d hoped this is where we’d see a big improvement from using unbuffered channels, but it doesn’t seem to be there.

go test -run ^$ -bench BenchmarkChannelDirectSend2 -cpu 2
BenchmarkChannelDirectSend2/direct-2         3000000    445 ns/op
BenchmarkChannelDirectSend2/buffered-2       3000000    437 ns/op
BenchmarkChannelDirectSend2/buffered-10-2    3000000    443 ns/op

What else could we try? Well, the “direct send” case can happen with buffered channels too: if there’s a receiver already waiting when you send on a channel, the direct send path kicks in. So perhaps that was happening in the buffered cases in the previous tests. We could try to eliminate it altogether: for our buffered case we could do all the sends before the receives.
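One way to sketch this, assuming a channel with room for every value and a single goroutine that fills it and then drains it (the helper name `sendThenReceive` is hypothetical, and the original may well have been structured differently):

```go
package main

import "testing"

// sendThenReceive queues n integers on a channel big enough to hold
// them all, and only then receives and adds them up. No receiver is
// ever waiting while a send happens, so every value goes through the
// channel's buffer.
// (Hypothetical helper, not the original benchmark code.)
func sendThenReceive(n int) int {
	ch := make(chan int, n)

	// All the sends first...
	for i := 0; i < n; i++ {
		ch <- i
	}
	close(ch)

	// ...then all the receives.
	total := 0
	for v := range ch {
		total += v
	}
	return total
}

// BenchmarkChannelDirectSend3 measures one buffered send plus one
// receive per iteration, with no goroutine switches in between.
func BenchmarkChannelDirectSend3(b *testing.B) {
	sendThenReceive(b.N)
}
```

Note that this version allocates a buffer of `b.N` integers, so the measured time includes that allocation as well as the channel operations.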

And the results:

BenchmarkChannelDirectSend3-8           20000000         64.7 ns/op

So, that’s quite a bit quicker than any of our unbuffered/direct send cases. So what can we conclude? Well, the unbuffered cases, and the cases where there are many switches between goroutines, are slower than when we buffer everything and reduce context switches as much as we can. If direct send is a useful optimisation, it isn’t so good that you should reduce buffering just to take advantage of it. In fact, the strongest lesson is that larger buffers reduce context switches and improve performance.

But just maybe, in those cases where context switches are inevitable, direct send makes things better than they would otherwise be.

That’s about as far as I can get without hacking the direct send out of the Go runtime. Perhaps next time? Remember, every time you press the heart a gopher gets its wings.
