The channel mechanism in Go is powerful, and understanding how it works internally lets you get even more out of it. In particular, choosing a buffered or an unbuffered channel changes both the behavior and the performance of an application.
An unbuffered channel is a channel that needs a receiver as soon as a message is emitted to the channel. To declare an unbuffered channel, you just don’t declare a capacity. Here is an example:
The first goroutine blocks when it sends the message foo, since no receiver is ready yet. This behavior is well explained in the specification:
If the capacity is zero or absent, the channel is unbuffered and communication succeeds only when both a sender and receiver are ready.
Effective Go is also very clear about that:
If the channel is unbuffered, the sender blocks until the receiver has received the value.
The internal representation of a channel could give more interesting details on this behavior.
The channel struct hchan is available in chan.go from the runtime package. The structure contains the attributes related to the buffer of the channel, but in order to illustrate the unbuffered channel, I will omit those attributes, which we will see later. Here is the representation of the unbuffered channel:
The channel keeps pointers to a list of receivers, recvq, and a list of senders, sendq. Each one is a linked list of sudog structs, which contain pointers to the next and previous elements along with the information related to the goroutine that handles the receiver or sender. With this information, it becomes easy for Go to know when a channel should block a receiver if a sender is missing, and vice versa.
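The two wait queues can be modelled with a small doubly linked list. This is a simplified sketch, not the runtime's code: the field names follow chan.go, but the goroutine is replaced by a plain string and the element by an interface value, both of which are assumptions made to keep the example runnable.

```go
package main

import "fmt"

// sudog is a simplified stand-in for runtime.sudog: one waiting goroutine.
type sudog struct {
	g    string      // assumption: the runtime keeps a pointer to the goroutine here
	elem interface{} // assumption: the runtime keeps a raw pointer to the value here
	next *sudog
	prev *sudog
}

// waitq is a doubly linked list of waiting goroutines.
type waitq struct {
	first *sudog
	last  *sudog
}

// hchan keeps only the two queues discussed above; buffer fields are omitted.
type hchan struct {
	recvq waitq // waiting receivers
	sendq waitq // waiting senders
}

// enqueue appends s at the tail of the queue.
func (q *waitq) enqueue(s *sudog) {
	s.next = nil
	s.prev = q.last
	if q.last == nil {
		q.first = s
	} else {
		q.last.next = s
	}
	q.last = s
}

// dequeue removes and returns the head of the queue, or nil if it is empty.
func (q *waitq) dequeue() *sudog {
	s := q.first
	if s == nil {
		return nil
	}
	q.first = s.next
	if q.first == nil {
		q.last = nil
	} else {
		q.first.prev = nil
	}
	s.next = nil
	return s
}

func main() {
	var c hchan
	c.sendq.enqueue(&sudog{g: "goroutine A", elem: "foo"})
	c.sendq.enqueue(&sudog{g: "goroutine B", elem: "bar"})
	for s := c.sendq.dequeue(); s != nil; s = c.sendq.dequeue() {
		fmt.Println(s.g, "was waiting to send", s.elem)
	}
}
```

Waiters leave the queue in the order they entered it, so the goroutine that has been blocked the longest is served first.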
Here is the workflow of our previous example:
- The channel is created with empty lists of receivers and senders.
- Our first goroutine sends the value foo to the channel, line 16.
- The channel acquires a sudog struct from a pool to represent the sender. This structure keeps a reference to the goroutine and to the value foo.
- This sender is now enqueued in the sendq list.
- The goroutine moves into a waiting state with the reason “chan send”.
- Our second goroutine reads a message from the channel, line 23.
- The channel dequeues the sendq list to get the waiting sender, represented by the struct seen in step 3.
- The channel uses the memmove function to copy the value sent by the sender, wrapped in the sudog struct, to the variable that reads from the channel.
- Our first goroutine, parked in step 5, can now resume and releases the sudog acquired in step 3.
As the workflow shows again, the goroutine has to switch to a waiting state until a receiver is available. However, if needed, this blocking behavior can be avoided with buffered channels.
I will slightly modify the previous example in order to add a buffer:
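A minimal sketch of a buffered channel, assuming a capacity of 2 (the capacity and the helper name fill are illustrative). As long as the buffer has room, the send returns immediately even though nobody is receiving:

```go
package main

import "fmt"

// fill sends n values into a channel of the given capacity without any
// receiver, then reports the channel's length and capacity.
// n must not exceed capacity, otherwise the last send would block forever.
func fill(capacity, n int) (int, int) {
	c := make(chan string, capacity) // a capacity is declared: the channel is buffered
	for i := 0; i < n; i++ {
		c <- "foo" // does not block while the buffer has room
	}
	return len(c), cap(c)
}

func main() {
	length, capacity := fill(2, 1)
	fmt.Printf("len=%d cap=%d\n", length, capacity) // prints "len=1 cap=2"
}
```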
Let’s now analyze the struct hchan with the fields related to the buffer according to this example:
The buffer is made of five attributes:
- qcount stores the current number of elements in the buffer
- dataqsiz stores the maximum number of elements in the buffer
- buf points to a memory segment that contains space for the maximum number of elements in the buffer
- sendx stores the position in the buffer for the next element to be received by the channel
- recvx stores the position in the buffer for the next element to be returned by the channel
With sendx and recvx, the buffer works like a circular queue.
The circular queue allows us to maintain an order in the buffer without needing to keep shifting the elements when one of them is popped out from the buffer.
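The circular queue can be sketched with the same five field names. This is a simplified model, assuming a slice of int in place of the raw memory segment the runtime points to; the type ring and its methods are hypothetical:

```go
package main

import "fmt"

// ring models the five buffer fields of hchan with a slice of int.
type ring struct {
	qcount   int   // current number of elements in the buffer
	dataqsiz int   // maximum number of elements in the buffer
	buf      []int // assumption: the runtime uses a raw pointer to memory here
	sendx    int   // position where the next sent element is stored
	recvx    int   // position of the next element to hand to a receiver
}

func newRing(size int) *ring {
	return &ring{dataqsiz: size, buf: make([]int, size)}
}

// push stores v at sendx and reports false when the buffer is full.
func (r *ring) push(v int) bool {
	if r.qcount == r.dataqsiz {
		return false
	}
	r.buf[r.sendx] = v
	r.sendx++
	if r.sendx == r.dataqsiz {
		r.sendx = 0 // wrap around instead of shifting elements
	}
	r.qcount++
	return true
}

// pop returns the element at recvx and reports false when the buffer is empty.
func (r *ring) pop() (int, bool) {
	if r.qcount == 0 {
		return 0, false
	}
	v := r.buf[r.recvx]
	r.recvx++
	if r.recvx == r.dataqsiz {
		r.recvx = 0
	}
	r.qcount--
	return v, true
}

func main() {
	r := newRing(3)
	r.push(1)
	r.push(2)
	r.push(3)
	r.pop()   // returns 1 and frees slot 0
	r.push(4) // reuses slot 0
	fmt.Println(r.buf, r.sendx, r.recvx) // prints "[4 2 3] 1 1"
}
```

After the wrap-around, the oldest element (2) sits at recvx and the next free slot is at sendx, so the order is preserved without moving any element.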
Once the limit of the buffer is reached, the goroutine that tries to push an element into the buffer is moved to the sender list and switched to the waiting state, as we have seen in the previous section. Then, as soon as the program reads from the buffer, the element at position recvx is returned, the waiting goroutine resumes, and its value is pushed into the buffer. These priorities allow the channel to keep a First In First Out behavior.
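The ordering guarantee can be checked with a short sketch. It assumes a buffer of size 2 and one extra sender that has to wait for a free slot; the helper name drain is hypothetical:

```go
package main

import "fmt"

// drain fills a buffered channel, lets one extra sender wait for a free
// slot, and returns the values in the order they are received.
func drain() []int {
	c := make(chan int, 2)
	c <- 1
	c <- 2 // the buffer is now full

	go func() {
		c <- 3 // waits until a receive frees a slot
		close(c)
	}()

	var got []int
	for v := range c {
		got = append(got, v)
	}
	return got
}

func main() {
	fmt.Println(drain()) // prints "[1 2 3]"
}
```

The buffered values come out first and the value of the waiting sender comes last, whatever the scheduling of the two goroutines.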
Latencies due to under-sized buffer
The buffer size we define when creating the channel can drastically impact performance. I will use the fan-out pattern, which uses channels intensively, to see the impact of different buffer sizes. Here are some benchmarks:
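A rough sketch of such a fan-out workload, assuming one producer, ten workers and a shared total updated atomically. The function name fanOut, the buffer size of 50 and the timing loop in main are illustrative; measurements meant for comparison would normally be written as testing.B benchmarks and run with go test -bench.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// fanOut has one producer send the integers 1..n to `workers` goroutines
// through a channel with the given buffer size. The workers add every value
// they receive to a shared total, which is returned once all are done.
func fanOut(bufferSize, workers, n int) int64 {
	c := make(chan int, bufferSize)
	var total int64

	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for v := range c {
				atomic.AddInt64(&total, int64(v))
			}
		}()
	}

	for i := 1; i <= n; i++ {
		c <- i
	}
	close(c)
	wg.Wait()
	return total
}

func main() {
	const workers, n = 10, 1000000
	// Buffer sizes: none, one, equal to the workers, larger than the workers.
	for _, size := range []int{0, 1, workers, 50} {
		start := time.Now()
		total := fanOut(size, workers, n)
		fmt.Printf("buffer=%-3d total=%d elapsed=%v\n", size, total, time.Since(start))
	}
}
```

The total is the same for every buffer size; only the elapsed time changes, and the exact figures depend on the machine.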
In our benchmark, one producer injects one million integers into the channel while ten workers read them and add them to a single result variable.
I will run them ten times and analyze the results:
WithNoBuffer-8 306µs ± 3%
WithBufferSizeOf1-8 248µs ± 1%
WithBufferSizeEqualsToNumberOfWorker-8 183µs ± 4%
WithBufferSizeExceedsNumberOfWorker-8 134µs ± 2%
A well-sized buffer can really make your application faster! Let’s analyze the traces of our benchmarks to confirm where the latencies are.
Tracing the latency
Tracing your benchmarks gives you access to a synchronization blocking profile that shows where goroutines block while waiting on synchronization primitives. Goroutines spend 9ms blocked in synchronization waiting for a value from the unbuffered channel, while with a 50-sized buffer they only wait for 1.9ms.
Thanks to the buffer, the latency when sending a value is divided by five.
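One way to capture such a trace from code is the runtime/trace package. The sketch below records a trace of a small fan-out run into memory; the helper name traceFanOut and the workload are assumptions made for illustration, and a real run would write the trace to a file instead.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/trace"
	"sync"
)

// traceFanOut records an execution trace while one producer feeds n values
// to `workers` goroutines through a channel with the given buffer size.
// It returns the size in bytes of the recorded trace.
func traceFanOut(bufferSize, workers, n int) (int, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return 0, err
	}

	c := make(chan int, bufferSize)
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for range c {
				// discard the value: only the channel traffic matters here
			}
		}()
	}
	for i := 0; i < n; i++ {
		c <- i
	}
	close(c)
	wg.Wait()

	trace.Stop() // flushes the remaining trace data into buf
	return buf.Len(), nil
}

func main() {
	size, err := traceFanOut(0, 10, 10000)
	if err != nil {
		fmt.Println("could not start tracing:", err)
		return
	}
	fmt.Println("trace size in bytes:", size)
}
```

When the trace is written to a file, go tool trace opens it in a viewer that includes the synchronization blocking profile. For benchmarks, go test -bench=. -trace trace.out produces the same kind of file without any code change.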
This confirms what the benchmarks suggested: the size of the buffer can play an important role in the performance of our application.