Why Concurrency is Hard
Editor’s Note: Concurrency is one of the hardest concepts for many developers to grasp, yet it is essential to modern software development. In this excerpt from the first chapter of her book Concurrency in Go, Katherine Cox-Buday discusses one of the most common issues with concurrent programming: race conditions.
Concurrent code is notoriously difficult to get right. It usually takes a few iterations to get it working as expected, and even then it’s not uncommon for bugs to exist in code for years before some change in timing (heavier disk utilization, more users logged into the system, etc.) causes a previously undiscovered bug to rear its head. Indeed, for this very book, I’ve gotten as many eyes as possible on the code to try and mitigate this.
Fortunately, everyone runs into the same issues when working with concurrent code. Because of this, computer scientists have been able to label the common issues, which allows us to discuss how and why they arise, and how to solve them.
Race Conditions
A race condition occurs when two or more operations must execute in the correct order, but the program has not been written so that this order is guaranteed to be maintained.
Most of the time, this shows up in what’s called a data race, where one concurrent operation attempts to read a variable while at some undetermined time another concurrent operation is attempting to write to the same variable.
Here’s a basic example:
1 var data int
2 go func() { // In Go, you can use the go keyword to run a
              // function concurrently. Doing so creates
              // what’s called a goroutine.
3     data++
4 }()
5 if data == 0 {
6     fmt.Printf("the value is %v.\n", data)
7 }
Here, lines 3 and 5 are both trying to access the variable data, but there is no guarantee what order this might happen in. There are three possible outcomes to running this code:
- Nothing is printed. In this case, line 3 was executed before line 5.
- “the value is 0” is printed. In this case, lines 5 and 6 were executed before line 3.
- “the value is 1” is printed. In this case, line 5 was executed before line 3, but line 3 was executed before line 6.
As you can see, just a few lines of incorrect code can introduce tremendous variability into your program.
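The three outcomes listed above can be reproduced deterministically by replaying each ordering by hand. The following is a minimal sketch, not part of the original example: the names simulate and schedules and the step labels "inc", "check", and "print" are hypothetical stand-ins for lines 3, 5, and 6 of the listing. It also assumes each line runs as one indivisible step, which is a simplification; a real data race gives no such guarantee.

```go
package main

import "fmt"

// simulate replays the three steps of the example in the given order and
// returns what would be printed ("" if nothing is printed).
// Hypothetical step labels, mapped to the numbered listing:
//
//	"inc"   = line 3 (data++)
//	"check" = line 5 (if data == 0)
//	"print" = line 6 (fmt.Printf)
func simulate(order []string) string {
	data := 0
	passed := false // whether the check on line 5 succeeded
	out := ""
	for _, step := range order {
		switch step {
		case "inc":
			data++
		case "check":
			passed = data == 0
		case "print":
			// Line 6 only runs if the check on line 5 passed.
			if passed {
				out = fmt.Sprintf("the value is %v.", data)
			}
		}
	}
	return out
}

// schedules lists every ordering in which line 5 comes before line 6.
func schedules() [][]string {
	return [][]string{
		{"inc", "check", "print"}, // line 3 before line 5
		{"check", "print", "inc"}, // lines 5 and 6 before line 3
		{"check", "inc", "print"}, // line 3 between lines 5 and 6
	}
}

func main() {
	for _, s := range schedules() {
		fmt.Printf("%v -> %q\n", s, simulate(s))
	}
}
```

Each ordering is valid from the scheduler's point of view, which is why the original program cannot promise any one of them.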
Most of the time, data races are introduced because the developers are thinking about the problem sequentially. They assume that because a line of code appears before another, it will run first. They assume the goroutine above will be scheduled and execute before the data variable is read in the if statement.
When writing concurrent code, you have to meticulously iterate through the possible scenarios. Unless you’re utilizing some of the techniques we’ll cover later in the book, you have no guarantees that your code will run in the order it’s listed in the source code. I sometimes find it helpful to imagine a large period of time passing between operations. Imagine an hour passes between the time when the goroutine is invoked and when it is run. How would the rest of the program behave? What if it took an hour between the goroutine executing successfully and the program reaching the if statement? Thinking in this manner helps me because to a computer, the scale may be different, but the relative time differentials are more or less the same.
Indeed, some developers fall into the trap of sprinkling sleeps throughout their code exactly because it seems to solve their concurrency problems. Let’s try that in the preceding program:
var data int
go func() { data++ }()
time.Sleep(1 * time.Second) // This is bad!
if data == 0 {
    fmt.Printf("the value is %v.\n", data)
}
Have we solved our data race? No. In fact, it’s still possible for all three outcomes to arise from this program, just increasingly unlikely. The longer we sleep between invoking our goroutine and checking the value of data, the closer our program gets to achieving correctness, but that probability only asymptotically approaches logical correctness; the program will never be logically correct.
In addition to this, we’ve now introduced an inefficiency into our algorithm. We now have to sleep for one second to make it more likely we won’t see our data race. If we utilized the correct tools, we might not have to wait at all, or the wait could be only a microsecond.
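As a rough illustration of waiting only as long as necessary, here is a minimal sketch that replaces the sleep with sync.WaitGroup from Go's standard library. The excerpt does not say which tool the book goes on to use, so choosing WaitGroup is an assumption, and the helper name incrementThenRead is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// incrementThenRead is a hypothetical helper: it starts the goroutine from
// the example and reads data only after the goroutine has finished.
func incrementThenRead() int {
	var data int
	var wg sync.WaitGroup

	wg.Add(1) // one goroutine to wait for
	go func() {
		defer wg.Done() // signal completion when the goroutine returns
		data++
	}()

	// Wait blocks until Done has been called, however long or short that
	// takes, instead of sleeping for a fixed duration.
	wg.Wait()
	return data
}

func main() {
	data := incrementThenRead()
	if data == 0 {
		fmt.Printf("the value is %v.\n", data)
	} else {
		fmt.Printf("data was incremented to %v before the check.\n", data)
	}
}
```

Because the read happens only after the goroutine signals that it is done, the order of the write and the read is guaranteed rather than merely likely, and the program waits no longer than the increment actually takes.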
The takeaway here is that you should always target logical correctness. Introducing sleeps into your code can be a handy way to debug concurrent programs, but they are not a solution.
Race conditions are one of the most insidious types of concurrency bugs because they may not show up until years after the code has been placed into production. They are usually precipitated by a change in the environment the code is executing in, or an unprecedented occurrence. In these cases, the code seems to be behaving correctly, but in reality, there’s just a very high chance that the operations will be executed in order. Sooner or later, the program will have an unintended consequence.
Katherine is a computer scientist currently working at DigitalOcean. Her hobbies include software engineering, creative writing, Go (igo, baduk, weiquei), and music, all of which she pursues intermittently and with various levels of dedication.