Go Philosophy: Share Memory by Communicating
This is an important point, for the Go language and beyond. Whenever you write code that may run in a concurrent or threaded context, it is imperative to consider what happens when multiple copies of it run at once. How are you accessing data? Can it cause race conditions? Do you need to lock a resource before you can read or write it? Is it okay for me to steal the Go blog's title?
These are often very abstract concepts, though, and easy to lose sight of in the weeds of day-to-day coding. To shed some more practical light on this, let’s look at a real bit of code that has probably been implemented thousands of times in various projects: an ID generator.
The requirement is simple: we need a bit of code that provides a simple, concise way to generate a new unique identifier (a string of random characters) of a particular pre-configured length.
For example, in Python, it might look like this:
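The original snippet isn't included in this excerpt, but a minimal sketch (class and method names are my assumptions, not necessarily the author's) could look like:

```python
import random
import string


class IDGenerator:
    """Hands out unique random IDs of a fixed length (illustrative sketch)."""

    def __init__(self, length):
        self.length = length
        self.generated = set()  # IDs we've already handed out

    def get_unique_id(self):
        # Keep drawing random strings until we find one we haven't used.
        while True:
            candidate = "".join(
                random.choices(string.ascii_lowercase + string.digits, k=self.length)
            )
            if candidate not in self.generated:
                self.generated.add(candidate)
                return candidate
```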
Simple! Store the length, store a set of IDs that already exist, and call get_unique_id() whenever you need a new ID.
But I mentioned Go. To start with, why not do a simple translation? Not every concept translates seamlessly due to significant design differences between the two languages, but it’s close enough:
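The Go snippet is also missing from this excerpt; a direct translation might look like the following (names chosen to mirror the Python sketch, so treat them as assumptions). Go has no built-in set type, so a map stands in for one:

```go
package main

import (
	"fmt"
	"math/rand"
)

const charset = "abcdefghijklmnopqrstuvwxyz0123456789"

// IDGenerator hands out unique random IDs of a fixed length.
// Note: this version is NOT safe for concurrent use.
type IDGenerator struct {
	length    int
	generated map[string]bool // IDs already handed out
}

func NewIDGenerator(length int) *IDGenerator {
	return &IDGenerator{length: length, generated: make(map[string]bool)}
}

func (g *IDGenerator) GetUniqueID() string {
	for {
		b := make([]byte, g.length)
		for i := range b {
			b[i] = charset[rand.Intn(len(charset))]
		}
		candidate := string(b)
		if !g.generated[candidate] {
			g.generated[candidate] = true
			return candidate
		}
	}
}

func main() {
	gen := NewIDGenerator(8)
	fmt.Println(gen.GetUniqueID())
}
```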
Still pretty simple, right? Well, it’s less readable than Python, but what isn’t?
There’s a problem, though.
Suppose we were to stress test it. We'll take 3 goroutines (Go's lightweight concurrency primitive) and have each of them generate 1000 IDs at once.
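The stress test itself isn't shown in this excerpt; a self-contained sketch (with the naive generator repeated, all names assumed) could be the following. Running it will usually abort with the fatal error shown next, though the exact interleaving depends on the scheduler:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

const charset = "abcdefghijklmnopqrstuvwxyz0123456789"

type IDGenerator struct {
	length    int
	generated map[string]bool // plain map: not safe for concurrent use
}

func NewIDGenerator(length int) *IDGenerator {
	return &IDGenerator{length: length, generated: make(map[string]bool)}
}

func (g *IDGenerator) GetUniqueID() string {
	for {
		b := make([]byte, g.length)
		for i := range b {
			b[i] = charset[rand.Intn(len(charset))]
		}
		if id := string(b); !g.generated[id] {
			g.generated[id] = true
			return id
		}
	}
}

func main() {
	gen := NewIDGenerator(8)
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				gen.GetUniqueID() // unsynchronized map access from 3 goroutines
			}
			fmt.Println("Generated 1000 IDs")
		}()
	}
	wg.Wait()
}
```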
fatal error: concurrent map read and map write
goroutine 5 [running]:
And worse, if we do the same thing in Python (start 3 threads generating 1000 IDs each), it works just fine!
Tragedy! Fun fact: Go maps are not thread-safe when writing to them and never will be. Our current ID generator implementation simply does not work in a multi-threaded context.
So, how do we fix it? I mentioned it in the article title: communicating. Go features a native channel type (chan) for cross-goroutine communication. It acts as a “socket” of sorts, and can be blocking or non-blocking depending on whether it was initialized with a buffer and whether it's read using a select statement. It is the core building block of the Go philosophy: “Do not communicate by sharing memory; instead, share memory by communicating.”
Since our map is not thread-safe, the simplest way to stop it from breaking is to put all code that accesses it in a single goroutine and, instead of exposing a GetUniqueID() function, expose a channel that delivers ready-made unique IDs.
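A sketch of that design follows (field and function names are assumptions based on the surrounding text). The map lives entirely inside one goroutine, so it never needs locking:

```go
package main

import (
	"fmt"
	"math/rand"
)

const charset = "abcdefghijklmnopqrstuvwxyz0123456789"

// IDGenerator delivers unique IDs through a channel; only the internal
// goroutine ever touches the map, so no locking is needed.
type IDGenerator struct {
	length    int
	UniqueIDs chan string
}

func NewIDGenerator(length int) *IDGenerator {
	gen := &IDGenerator{length: length, UniqueIDs: make(chan string)}
	go gen.generateInfiniteIDs()
	return gen
}

// generateInfiniteIDs runs forever, blocking on the unbuffered channel
// until someone receives the next ID.
func (g *IDGenerator) generateInfiniteIDs() {
	generated := make(map[string]bool) // only this goroutine touches it
	for {
		b := make([]byte, g.length)
		for i := range b {
			b[i] = charset[rand.Intn(len(charset))]
		}
		if id := string(b); !generated[id] {
			generated[id] = true
			g.UniqueIDs <- id // blocks until the ID is consumed
		}
	}
}

func main() {
	gen := NewIDGenerator(8)
	fmt.Println(<-gen.UniqueIDs)
}
```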
To clarify what's going on: upon creation of the generator in NewIDGenerator(), a separate goroutine is also started, running generateInfiniteIDs(). This goroutine runs forever, finding unique IDs and tossing them into the unique-ID channel. Because of how unbuffered Go channels work, this send blocks until someone reads the ID from the channel; at that point the function can continue, generate another ID, block until that one is used… and so on.
With this update, whenever we want to get a new unique ID, instead of calling gen.GetUniqueID(), we receive from the channel: <-gen.UniqueIDs. Making that change to our little stress test from above yields:
Generated 1000 IDs
Generated 1000 IDs
Generated 1000 IDs
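For completeness, the whole fixed stress test might look like this (all names here are assumptions, not the author's original code); running it prints the three lines above:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

const charset = "abcdefghijklmnopqrstuvwxyz0123456789"

type IDGenerator struct {
	length    int
	UniqueIDs chan string
}

func NewIDGenerator(length int) *IDGenerator {
	gen := &IDGenerator{length: length, UniqueIDs: make(chan string)}
	go gen.generateInfiniteIDs()
	return gen
}

func (g *IDGenerator) generateInfiniteIDs() {
	generated := make(map[string]bool) // confined to this goroutine
	for {
		b := make([]byte, g.length)
		for i := range b {
			b[i] = charset[rand.Intn(len(charset))]
		}
		if id := string(b); !generated[id] {
			generated[id] = true
			g.UniqueIDs <- id
		}
	}
}

func main() {
	gen := NewIDGenerator(8)
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				<-gen.UniqueIDs // safe: only the generator goroutine touches the map
			}
			fmt.Println("Generated 1000 IDs")
		}()
	}
	wg.Wait()
}
```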
There's one more problem, though. If someone else were to use or maintain this code, they might find it confusing: using channels in this way is unconventional, and it exposes weird behavior if someone writes to the channel externally. So, why not combine this with our original approach and mask the channel behind a friendly GetUniqueID() method?
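One way to sketch that wrapper (again, all names are assumptions): make the channel an unexported field so outside code can't touch it, and expose a method that simply receives from it.

```go
package main

import (
	"fmt"
	"math/rand"
)

const charset = "abcdefghijklmnopqrstuvwxyz0123456789"

type IDGenerator struct {
	length    int
	uniqueIDs chan string // unexported: outside code can't misuse it
}

func NewIDGenerator(length int) *IDGenerator {
	gen := &IDGenerator{length: length, uniqueIDs: make(chan string)}
	go gen.generateInfiniteIDs()
	return gen
}

func (g *IDGenerator) generateInfiniteIDs() {
	generated := make(map[string]bool)
	for {
		b := make([]byte, g.length)
		for i := range b {
			b[i] = charset[rand.Intn(len(charset))]
		}
		if id := string(b); !generated[id] {
			generated[id] = true
			g.uniqueIDs <- id
		}
	}
}

// GetUniqueID hides the channel behind a conventional method call,
// restoring the original API while keeping the concurrency-safe internals.
func (g *IDGenerator) GetUniqueID() string {
	return <-g.uniqueIDs
}

func main() {
	gen := NewIDGenerator(8)
	fmt.Println(gen.GetUniqueID())
}
```

Callers now see the same interface as the naive version, but every ID still flows through the single generator goroutine.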
Now that that’s working, let’s circle back: why was this a problem in Go, but not in our Python example?
The answer is Python's global interpreter lock. Needed to keep CPython's memory management safe, the GIL ensures that only one thread executes Python bytecode at a time. This results in terrible threading performance in many use cases but, as a side effect, means that concurrent reads and writes to built-in structures are generally safe.
Go has no such restriction; its runtime uses a concurrent garbage collector that doesn't need a global lock. This means its goroutines (which run on multiple OS threads where possible) can perform far better than Python threads, but they get no built-in collision protection. The same lack of protection exists in C/C++, Java, and a number of other languages.
That's why we had to focus on sharing memory by communicating, not vice versa. A similar mindset can also help in Python, where communicating between processes is central when using multiprocessing to achieve truly parallel code.
That's far from the whole story, and that's okay. This article only explains one limited technique for making concurrent programming more manageable. And hey, at least you have a graduation cap now. 🎓