Does the Go race detector catch all data race bugs?
TL;DR: no, it detects data races when they actually occur during an execution.
The Data Race Detector is a great, invaluable feature of the Go tooling. Today I want to clarify what it can and cannot detect.
What is a data race?
A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write.
Data races exist in Go
It is possible to write incorrectly synchronized code and compile it. The compiler regards it as valid and doesn’t emit any warnings or errors for data race bugs.
However, these bugs are real and lead to undefined behavior, which can be a crash, or a silent incorrect behavior, or the silent correct behavior “by chance”. Here are a few examples.
It is fundamental to understand that a race condition is a property of a particular execution, and a synchronization bug is a property of the program.
A race condition is always caused by a bug, but a bug may remain silent (not triggering any race condition) for many executions.
The race detector doesn’t catch bugs at compile time
$ go build -race mycmd
This command does not mean “Compile mycmd, and report the synchronization bugs you’ve detected during compilation”.
What it means is “Compile a special, instrumented version of mycmd. When mycmd is executed, if a data race occurs during the execution, report a warning on stderr, and at the end of the program exit with code 66”.
The instrumented version bears a runtime overhead, i.e. is slower and uses more memory.
The race detector reported a warning but I don’t think my code actually has a bug. Can I ignore it?
The race detector doesn’t have false positives.
When it emits a warning, it always means that a race condition occurred.
When a race condition occurs, it always means that the program has a bug.
If you strongly believe that you witnessed a false positive, then report a bug for the race detector. If you have good reasons to believe that the race condition was caused by the standard library or by the runtime (rather than your own code), then report a bug for the standard library or the runtime.
The race detector reported a warning in a non-critical part of my code. Can I ignore it?
I ran my program, and the race detector didn’t detect anything. Does this mean that my code has no data race bugs?
This means that, during this execution of your program, the specific interleaving of reads and writes that actually occurred did not violate the Memory Model. This does not imply that another execution of the same program won’t encounter a data race condition.
Many factors may cause the exact execution trace to differ from one execution to another.
In the following program, we have 3 counters.
2 goroutines each increment a counter, and the main goroutine waits for them to complete. But the 2 goroutines may choose to increment the same counter, anarchically:
Now consider these 2 executions:
$ go run -race racy3.go
[1 0 1]
$ go run -race racy3.go
WARNING: DATA RACE
Read at 0x00c4200b8000 by goroutine 7:
/home/deleplace/racy3.go:30 +0x66
Previous write at 0x00c4200b8000 by goroutine 6:
/home/deleplace/racy3.go:24 +0x87
Goroutine 7 (running) created at:
/home/deleplace/racy3.go:28 +0x1d0
Goroutine 6 (finished) created at:
[0 0 2]
Found 1 data race(s)
In the first run, the random numbers got the values 0 and 2, thus the goroutines incremented different counters.
In the second run, the random numbers got the values 2 and 2, thus the goroutines incremented the same counter without proper synchronization (no happens-before relation), which is forbidden and is a serious problem.
Detecting the synchronization bugs for all possible executions of a given Go program would amount to solving the halting problem, which is beyond the scope of the race detector.
If a data race condition occurs during an execution of my program, will the race detector catch it?
Yes, with very high probability.
By instrumenting the generated binary to check every memory write and every memory read, the race detector catches the improperly synchronized memory accesses, when they actually occur.
In rare cases, the information necessary for the detection is overwritten over time, leading to a false negative (a race occurred but was not detected). It is recommended to test execution scenarios many times, first to ensure that the hypothetical race happens (that part may be deterministic or not, depending on your code), and second to ensure it didn’t go unnoticed as a rare false negative.
- The race detector is easy to use. I strongly recommend it.
- It can’t tell you all of the bugs lurking in your code.
- But it shouts a WARNING for every data race that occurs.
- Almost all of the races: very few false negatives!
- Only true races: no false positives!
Edit: I initially stated that no false negative could ever happen, but this is not 100% true.
Follow-up: I wrote Race-free doesn’t mean deterministic to clarify that processes may “race” for a resource, with proper synchronization.