Yet Another Go Concurrency Post
Channels or sync primitives (mutexes et al.)? What to do?
I don’t have much Go experience. So, recently, when a company I’m considering joining gave me interview homework, I took the opportunity to practice my Go. It’s one of the primary languages in their toolkit (or so it seems from the outside), and interview homework is typically a boring exercise, so why not make it interesting?
The problem was pretty straightforward — read from a file being tailed, parse the bytes, and print key statistics about the bytes in that file on a regular interval.
In Go, the natural way to do this is with multiple goroutines. The “print at regular intervals” routine must be scheduled separately from the “read from a file” routine. If nothing else, the file may receive no new bytes for a long time, and the “print at regular intervals” routine should still fire while the “read from a file” routine is blocked.
So how should the two coordinate? It would be weird if, during a particular “print” cycle, the key statistics were to change because bytes were read. Imagine printing three statistics: total number of entries in the file, number of unparseable entries, and number of valid entries. If the total were printed first, then the latter two, but the statistics changed in the middle, there would be a situation where “unparseable + valid != total”. That’s inconsistent. And indeed, that’s the term we would use: inconsistent reads. We want consistency, even in this trivial and contrived situation. How do we achieve it?
Well, the problem comes down to shared state: two routines access the key statistics and may do so at the same time. This is not limited to the single-processor case, where preemptive scheduling can pause a thread that has not explicitly yielded. Most systems are now multiprocessor, and Go programs take full advantage. The two routines described above, implemented as goroutines, may be scheduled onto two different OS threads running in parallel, each reading and writing the same shared memory.
Go offers two solutions:
- Share, but with mutual exclusion (mutex) on the shared resource.
- Don’t share, using the communicating sequential processes (CSP) model.
Mutexes require the user of a shared resource to acquire a lock (atomically) before accessing that resource, and unlock it when the user is done. If every user plays nice, it works well. The problem is when some poor software engineer forgets the rules of engagement. After all, there’s nothing stopping the engineer from writing code that simply doesn’t lock and unlock. The obvious fix is to hide the lock inside the resource’s own methods, but that runs into a second problem: locks aren’t composable. For example, the lock and unlock cannot simply be placed inside the “read total number” method of the key statistics. Yes, this would ensure the user engineer can’t forget. But if reading “total number” and “unparseable number” as part of the same atomic operation is desired, a special method for that case would have to be written. (Yes, in this degenerate case, a reentrant lock would do, but lock composition in general is impossible. And besides, Go’s standard library has no reentrant locks.)
So how about CSP? Go offers channels — coordinating goroutines via message passing. How would that work here? Rather than sharing state, a single goroutine in a loop would receive messages from other goroutines. One may be a request to atomically read the three statistics referenced above. Another may be a request to atomically add new information to the key statistics. There’s no sharing — only one routine touches the memory at a time; indeed, only one routine touches the memory ever. In this way, it’s completely safe — the user engineer of such an interface is forced to communicate by channel, so they cannot forget to be concurrency-safe. Great!
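As a rough illustration of that design, here is a minimal sketch, assuming hypothetical names (`StatsServer`, `Counts`, `Record`, `Snapshot`). One goroutine owns the counters; everyone else talks to it over unbuffered channels, with a reply channel carrying the atomic three-value read.

```go
package main

import "fmt"

// Counts is a consistent copy of the three statistics.
// All names in this sketch are hypothetical.
type Counts struct {
	Total, Valid, Unparseable int
}

// StatsServer hands requests to the one goroutine that owns the counters.
type StatsServer struct {
	record chan bool        // "add one entry" requests
	read   chan chan Counts // "give me a snapshot" requests
}

// NewStatsServer starts the owning goroutine. For brevity it runs for
// the life of the program; real code would likely add a way to stop it.
func NewStatsServer() *StatsServer {
	s := &StatsServer{
		record: make(chan bool),
		read:   make(chan chan Counts),
	}
	go s.loop()
	return s
}

// loop is the only code that ever touches c, so no lock is needed.
func (s *StatsServer) loop() {
	var c Counts
	for {
		select {
		case parsedOK := <-s.record:
			c.Total++
			if parsedOK {
				c.Valid++
			} else {
				c.Unparseable++
			}
		case reply := <-s.read:
			reply <- c // send a copy; the caller never sees c itself
		}
	}
}

// Record asks the owner to count one entry.
func (s *StatsServer) Record(parsedOK bool) { s.record <- parsedOK }

// Snapshot asks the owner for all three counters at once.
func (s *StatsServer) Snapshot() Counts {
	reply := make(chan Counts)
	s.read <- reply
	return <-reply
}

func main() {
	s := NewStatsServer()
	s.Record(true)
	s.Record(false)
	s.Record(true)
	fmt.Printf("%+v\n", s.Snapshot()) // prints "{Total:3 Valid:2 Unparseable:1}"
}
```

Each kind of message is one atomic operation. Anything that needs to combine operations atomically has to become a new message type handled inside `loop`.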
Or is it? I wrote this post because I had a lot of trouble deciding which method to use. There are significant drawbacks to the CSP / channel option as well:
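- Performance. Normally, I would say that this is a red herring, but if someone is using Go, performance is probably a factor. Locking and unlocking an uncontended mutex is generally much cheaper than a round trip through a channel, reportedly by around an order of magnitude, though the gap depends on contention and workload.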
- Composability. Because each message must be treated as the signal to perform an atomic operation (if you want to stay sane…), the same problem arises as with mutexes. In complex situations, the only solution is to write a query DSL. Right. Which leads us to:
- Tight coupling. Rather than trusting user engineers to lock and unlock and do whatever they want in the middle, in the channel model, any set of operations that must be composed into a single atomic operation must be implemented in the resource’s designated goroutine code. That’s bad enough when all of the code is in the same project. But how does one possibly use libraries in this way?
Please keep in mind that I have minimal Go experience, and CSP is new to me. So perhaps my assessment is incorrect. Please contact me and tell me so. (@alyssackwan on Twitter.)
I ultimately decided to use mutexes, mostly because I understand them better. I eagerly await feedback from the company. In the meantime, I’d really like to better understand CSP and how and when to use channels.