Heartbeats in Golang
What are Heartbeats?
Heartbeats are a way for concurrent processes to signal life to outside parties. They get their name from human anatomy wherein a heartbeat signifies life to an observer. Heartbeats have been around since before Go, and remain useful within it.
There are a few reasons heartbeats are interesting for concurrent code. They give us insight into our system, and they can make testing the system deterministic when it might otherwise not be.
There are two different types of heartbeats:
- Heartbeats that occur at a time interval.
- Heartbeats that occur at the beginning of a unit of work.
Heartbeats that occur on a time interval are useful for concurrent code that might be waiting for something else to happen for it to process a unit of work. Because you don’t know when that work might come in, your goroutine might be sitting around for a while waiting for something to happen. A heartbeat is a way to signal to its listeners that everything is well and that silence is expected.
The following code demonstrates a goroutine that exposes a heartbeat:
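A minimal sketch of such a goroutine is given here. The names doWork, pulseInterval and workGen, the struct{} pulse type, and the simulated work source that ticks at twice the pulse interval are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// doWork emits a pulse roughly every pulseInterval and a result roughly every
// 2*pulseInterval until done is closed. The simulated work source (workGen)
// is an assumption made for this sketch.
func doWork(done <-chan struct{}, pulseInterval time.Duration) (<-chan struct{}, <-chan time.Time) {
	heartbeat := make(chan struct{})
	results := make(chan time.Time)

	go func() {
		defer close(heartbeat)
		defer close(results)

		pulse := time.NewTicker(pulseInterval)
		defer pulse.Stop()
		workGen := time.NewTicker(2 * pulseInterval)
		defer workGen.Stop()

		// sendPulse never blocks: if nobody is listening, the pulse is dropped.
		sendPulse := func() {
			select {
			case heartbeat <- struct{}{}:
			default:
			}
		}

		// sendResult keeps pulsing while it waits for a receiver.
		sendResult := func(r time.Time) {
			for {
				select {
				case <-done:
					return
				case <-pulse.C:
					sendPulse()
				case results <- r:
					return
				}
			}
		}

		for {
			select {
			case <-done:
				return
			case <-pulse.C:
				sendPulse()
			case r := <-workGen.C:
				sendResult(r)
			}
		}
	}()

	return heartbeat, results
}

func main() {
	done := make(chan struct{})
	defer close(done)

	heartbeat, _ := doWork(done, 100*time.Millisecond)
	for i := 0; i < 3; i++ {
		<-heartbeat // blocks until the next pulse arrives
		fmt.Println("pulse")
	}
}
```

The default case in sendPulse is the important design choice: a heartbeat that nobody reads must never stall the worker.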
Notice that because we might be sending out multiple pulses while we wait for input, or multiple pulses while waiting to send results, all the select statements need to be within for loops. Looking good so far; how do we utilize this function and consume the events it emits? Let’s take a look:
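One way to consume the two channels is sketched here. The helpers consume and run are hypothetical names, and the block includes a compact doWork so that it compiles on its own; stopping after a fixed number of results is also an assumption of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// doWork pulses every pulseInterval and produces a result every 2*pulseInterval.
func doWork(done <-chan struct{}, pulseInterval time.Duration) (<-chan struct{}, <-chan time.Time) {
	heartbeat := make(chan struct{})
	results := make(chan time.Time)
	go func() {
		defer close(heartbeat)
		defer close(results)
		pulse := time.NewTicker(pulseInterval)
		defer pulse.Stop()
		workGen := time.NewTicker(2 * pulseInterval)
		defer workGen.Stop()

		sendPulse := func() {
			select {
			case heartbeat <- struct{}{}:
			default: // no listener: drop the pulse
			}
		}
		for {
			select {
			case <-done:
				return
			case <-pulse.C:
				sendPulse()
			case r := <-workGen.C:
				// Keep pulsing until the result has been handed over.
				for sent := false; !sent; {
					select {
					case <-done:
						return
					case <-pulse.C:
						sendPulse()
					case results <- r:
						sent = true
					}
				}
			}
		}
	}()
	return heartbeat, results
}

// consume prints every pulse and result until n results have arrived or a
// channel closes, and returns how many of each it saw.
func consume(heartbeat <-chan struct{}, results <-chan time.Time, n int) (pulses, got int) {
	for got < n {
		select {
		case _, ok := <-heartbeat:
			if !ok {
				return pulses, got
			}
			pulses++
			fmt.Println("pulse")
		case r, ok := <-results:
			if !ok {
				return pulses, got
			}
			got++
			fmt.Printf("results %v\n", r.Second())
		}
	}
	return pulses, got
}

// run wires the two together and stops the worker once n results are in.
func run(pulseInterval time.Duration, n int) (pulses, got int) {
	done := make(chan struct{})
	defer close(done)
	heartbeat, results := doWork(done, pulseInterval)
	return consume(heartbeat, results, n)
}

func main() {
	pulses, got := run(200*time.Millisecond, 3)
	fmt.Printf("%d pulses, %d results\n", pulses, got)
}
```

Checking the ok value on each receive lets the consumer stop cleanly when the worker closes its channels.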
Running this code produces:
You can see that we receive about two pulses per result as we intended.
Now in a properly functioning system, heartbeats aren’t that interesting. We might use them to gather statistics regarding idle time, but the utility for interval-based heartbeats really shines when your goroutine isn’t behaving as expected.
Consider the next example. We’ll simulate an incorrectly written goroutine, one that dies as if from a panic, by stopping it after only two iterations and never closing either of its channels. Let’s have a look:
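A sketch of this failure mode follows. The names brokenWork and watch are hypothetical, and the two-second timeout with a pulse interval of half the timeout is an assumption chosen to match the discussion.

```go
package main

import (
	"fmt"
	"time"
)

// brokenWork simulates a faulty goroutine: it handles two events and then
// exits without closing either channel.
func brokenWork(done <-chan struct{}, pulseInterval time.Duration) (<-chan struct{}, <-chan time.Time) {
	heartbeat := make(chan struct{})
	results := make(chan time.Time)
	go func() {
		pulse := time.NewTicker(pulseInterval)
		defer pulse.Stop()
		workGen := time.NewTicker(2 * pulseInterval)
		defer workGen.Stop()

		for i := 0; i < 2; i++ {
			select {
			case <-done:
				return
			case <-pulse.C:
				select {
				case heartbeat <- struct{}{}:
				default:
				}
			case r := <-workGen.C:
				select {
				case <-done:
					return
				case results <- r:
				}
			}
		}
		// Bug on purpose: the goroutine stops here and leaves both channels open.
	}()
	return heartbeat, results
}

// watch consumes both channels and reports healthy == false if nothing
// arrives within timeout.
func watch(timeout time.Duration) (pulses int, healthy bool) {
	done := make(chan struct{})
	defer close(done)

	heartbeat, results := brokenWork(done, timeout/2)
	for {
		select {
		case _, ok := <-heartbeat:
			if !ok {
				return pulses, true
			}
			pulses++
			fmt.Println("pulse")
		case r, ok := <-results:
			if !ok {
				return pulses, true
			}
			fmt.Printf("results %v\n", r.Second())
		case <-time.After(timeout):
			fmt.Println("worker goroutine is not healthy!")
			return pulses, false
		}
	}
}

func main() {
	watch(2 * time.Second)
}
```

Because time.After is re-armed on every loop iteration, the timeout measures silence since the last message rather than total running time.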
and the result is:
Beautiful! Within two seconds our system realizes something is amiss with our goroutine and breaks the for-select loop. By using a heartbeat, we have successfully avoided a deadlock, and we remain deterministic by not having to rely on a longer timeout.
Also, note that heartbeats help with the opposite case: they let us know that long-running goroutines remain up, but are just taking a while to produce a value to send on the values channel.
Now let’s shift over to looking at heartbeats that occur at the beginning of a unit of work. These are extremely useful for tests. Here’s an example that sends a pulse before every unit of work:
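A minimal sketch of this style follows. The names doWork and collect, the loop of ten work units, and the use of the loop index as the result are assumptions for illustration.

```go
package main

import "fmt"

// doWork sends a pulse before each of ten units of work.
func doWork(done <-chan struct{}) (<-chan struct{}, <-chan int) {
	// A buffer of one keeps a pulse available even if nobody is listening yet.
	heartbeat := make(chan struct{}, 1)
	results := make(chan int)
	go func() {
		defer close(heartbeat)
		defer close(results)
		for i := 0; i < 10; i++ {
			select {
			case heartbeat <- struct{}{}:
			default: // previous pulse still unread: skip rather than block
			}
			select {
			case <-done:
				return
			case results <- i:
			}
		}
	}()
	return heartbeat, results
}

// collect reads both channels until one of them closes and counts what it saw.
func collect() (pulses, got int) {
	done := make(chan struct{})
	defer close(done)

	heartbeat, results := doWork(done)
	for {
		select {
		case _, ok := <-heartbeat:
			if !ok {
				return pulses, got
			}
			pulses++
			fmt.Println("pulse")
		case r, ok := <-results:
			if !ok {
				return pulses, got
			}
			got++
			fmt.Printf("results %v\n", r)
		}
	}
}

func main() {
	pulses, got := collect()
	fmt.Printf("%d pulses, %d results\n", pulses, got)
}
```

The pulse and the work are sent in separate select blocks so that a missing heartbeat listener cannot hold up a result. In this sketch a pulse is skipped whenever the previous one is still unread, so the reader sees at most one pulse per result.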
and you can see the result is:
You can see in this example that we receive one pulse for every result, as intended.
Where this technique really shines is in writing tests. Interval-based heartbeats can be used in the same fashion, but if you only care that the goroutine has started doing its work, this style of heartbeat is simple. Consider the following code:
The DoWork function is a pretty simple generator that converts the numbers we pass in into a stream on the channel it returns. Let’s try testing this function. Here’s an example of a bad test:
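The generator and the fragile timeout-based check might look like the sketch below. The startupDelay variable and the checkWithTimeout helper are hypothetical; in a real test file the check would be the body of a function taking *testing.T and would call t.Fatal instead of returning an error.

```go
package main

import (
	"fmt"
	"time"
)

// startupDelay stands in for slow goroutine startup (a hypothetical knob for
// this sketch).
var startupDelay = 2 * time.Second

// DoWork streams nums on the returned int channel, pulsing before each send.
func DoWork(done <-chan struct{}, nums ...int) (<-chan struct{}, <-chan int) {
	heartbeat := make(chan struct{}, 1)
	intStream := make(chan int)
	go func() {
		defer close(heartbeat)
		defer close(intStream)

		time.Sleep(startupDelay)

		for _, n := range nums {
			select {
			case heartbeat <- struct{}{}:
			default:
			}
			select {
			case <-done:
				return
			case intStream <- n:
			}
		}
	}()
	return heartbeat, intStream
}

// checkWithTimeout is the fragile approach: it guesses how long a value may
// take to arrive and fails if the guess is too short.
func checkWithTimeout(timeout time.Duration) error {
	done := make(chan struct{})
	defer close(done)

	want := []int{0, 1, 2, 3, 5}
	_, results := DoWork(done, want...)
	for i, expected := range want {
		select {
		case r := <-results:
			if r != expected {
				return fmt.Errorf("index %d: expected %d, got %d", i, expected, r)
			}
		case <-time.After(timeout):
			return fmt.Errorf("timed out waiting for index %d", i)
		}
	}
	return nil
}

func main() {
	// The goroutine needs 2s to start but the check waits only 1s, so it fails.
	fmt.Println(checkWithTimeout(1 * time.Second))
}
```

Whether this check passes depends entirely on how the timeout compares with the goroutine's startup time, which is exactly what makes it unreliable.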
and running this test produces:
This test is bad because it’s non-deterministic. In our example function, I’ve ensured this test will always fail, but if I were to remove the time.Sleep, the situation actually gets worse: this test will pass at times, and fail at others.
This is an awful, awful position to be in. Once the team no longer knows whether it can trust a test failure, people begin ignoring failures and the whole endeavor begins to unravel.
Fortunately, with a heartbeat, this is easily solved. Here is a test that is deterministic:
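A sketch of the heartbeat-based check is given here. As before, startupDelay and checkWithHeartbeat are hypothetical names, and the check returns an error where a real test would call methods on *testing.T.

```go
package main

import (
	"fmt"
	"time"
)

// startupDelay stands in for slow goroutine startup (a hypothetical knob).
var startupDelay = 2 * time.Second

// DoWork streams nums on the returned int channel, pulsing before each send.
func DoWork(done <-chan struct{}, nums ...int) (<-chan struct{}, <-chan int) {
	heartbeat := make(chan struct{}, 1)
	intStream := make(chan int)
	go func() {
		defer close(heartbeat)
		defer close(intStream)

		time.Sleep(startupDelay)

		for _, n := range nums {
			select {
			case heartbeat <- struct{}{}:
			default:
			}
			select {
			case <-done:
				return
			case intStream <- n:
			}
		}
	}()
	return heartbeat, intStream
}

// checkWithHeartbeat waits for the first pulse instead of guessing a timeout.
func checkWithHeartbeat() error {
	done := make(chan struct{})
	defer close(done)

	want := []int{0, 1, 2, 3, 5}
	heartbeat, results := DoWork(done, want...)

	<-heartbeat // block until the goroutine signals it has begun working

	i := 0
	for r := range results {
		if i >= len(want) {
			return fmt.Errorf("received more values than expected")
		}
		if r != want[i] {
			return fmt.Errorf("index %d: expected %d, got %d", i, want[i], r)
		}
		i++
	}
	if i != len(want) {
		return fmt.Errorf("expected %d values, got %d", len(want), i)
	}
	return nil
}

func main() {
	if err := checkWithHeartbeat(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

The outcome no longer depends on how long the goroutine takes to start, only on the values it sends.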
and the result is:
Because of the heartbeat, we can safely write our test without timeouts. The only risk we run is of one of our iterations taking an inordinate amount of time. If that’s important to us, we can utilize the safer interval-based heartbeats and achieve perfect safety.
Here is an example of a test utilizing interval-based heartbeats:
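One possible shape for this is sketched below, assuming a DoWork variant that takes a pulse interval. The startupDelay variable and the checkWithIntervalHeartbeat helper are hypothetical, and the check returns an error where a real test would use *testing.T.

```go
package main

import (
	"fmt"
	"time"
)

// startupDelay stands in for slow goroutine startup (a hypothetical knob).
var startupDelay = 2 * time.Second

// DoWork streams nums and pulses every pulseInterval while waiting to send.
func DoWork(done <-chan struct{}, pulseInterval time.Duration, nums ...int) (<-chan struct{}, <-chan int) {
	heartbeat := make(chan struct{}, 1)
	intStream := make(chan int)
	go func() {
		defer close(heartbeat)
		defer close(intStream)

		time.Sleep(startupDelay)

		pulse := time.NewTicker(pulseInterval)
		defer pulse.Stop()

	numLoop:
		for _, n := range nums {
			for {
				select {
				case <-done:
					return
				case <-pulse.C:
					select {
					case heartbeat <- struct{}{}:
					default:
					}
				case intStream <- n:
					continue numLoop
				}
			}
		}
	}()
	return heartbeat, intStream
}

// checkWithIntervalHeartbeat fails only if the worker is silent for timeout.
func checkWithIntervalHeartbeat(timeout time.Duration) error {
	done := make(chan struct{})
	defer close(done)

	want := []int{0, 1, 2, 3, 5}
	heartbeat, results := DoWork(done, timeout/2, want...)

	<-heartbeat // wait for the first pulse: the goroutine has entered its loop

	i := 0
	for {
		select {
		case r, ok := <-results:
			if !ok {
				if i != len(want) {
					return fmt.Errorf("expected %d values, got %d", len(want), i)
				}
				return nil
			}
			if i >= len(want) || r != want[i] {
				return fmt.Errorf("index %d: unexpected value %d", i, r)
			}
			i++
		case <-heartbeat:
			// A pulse arrived; looping again re-arms the timeout below.
		case <-time.After(timeout):
			return fmt.Errorf("timed out after %d values", i)
		}
	}
}

func main() {
	if err := checkWithIntervalHeartbeat(2 * time.Second); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

Setting the pulse interval to half the timeout gives the worker room for one late pulse before the check gives up.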
Running this test produces:
You’ve probably noticed that this version of the test is much less clear. The logic of what we’re testing is a bit muddled. For this reason, if you’re reasonably sure the goroutine’s loop won’t stop executing once it’s started, I recommend only blocking on the first heartbeat and then falling into a simple range statement. You can write separate tests that specifically test for failing to close channels, loop iterations taking too long, and any other timing-related issues.
Resource: Concurrency in Go: Tools and Techniques for Developers by Katherine Cox-Buday