Go’s Extended Concurrency: Semaphores (Part 1)

Ralph Caraveo
11 min read · Sep 4, 2018


Gophers waiting in line to go to the discotheque

Goroutines, Channels, and Mutexes: if you’ve spent any significant time with Go’s tried-and-true synchronization primitives, you should be well accustomed to seeing these concepts expressed in everyday code. From very large open-source projects to quick-and-dirty scripts, Go’s model of concurrency is not only powerful but a first-class concept in the language.

Today I’d like to give an honorable mention to another powerful Go synchronization primitive out in the wild that I find myself reaching for more and more. The semaphore: a construct that can be used to constrain or control access to a shared resource from multiple threads or goroutines.

Semaphores in the real world

Are you hungry? Do you want to come with me to In-N-Out to go and get a burger? Imagine if you joined me along with 2000 other readers RIGHT NOW and we all went to get a burger…RIGHT NOW.

Now imagine if In-N-Out didn’t have the concept of lines and we all just tried to order at the same time! What would happen? Chaos would happen! We would all enter In-N-Out at once, pushing and stomping on each other. We’d all be trying to order at the same time and probably shouting over one another. Meanwhile, the employees would be confused, stressed out, and probably crying as they scrambled to fulfill all of our orders at the EXACT SAME TIME.

Assuming the employees gritted their teeth and attempted to work through this calamity, what would happen? Well, it’s very likely that our orders would take hours to fulfill. It’s also very likely that some orders would go missing. Some orders would be slightly wrong, and some would be vastly wrong. What the heck? I didn’t order this half-cooked 4x4 animal style with grilled onions. I don’t even like grilled onions, and why is this burger only half-cooked?

Okay, you get the point. Luckily, In-N-Out operates in a civilized fashion these days. The restaurant has around 3 or 4 lines where customers stand and wait their turn to order. Assuming In-N-Out has 4 lines, with 4 registers and 4 employees to receive orders, this really means In-N-Out has a system in place that acts as a semaphore. The semaphore is the concept that allows In-N-Out to receive 4 orders concurrently (actually in parallel) while everyone else sits and waits. Of course, there is likely a 5th order occurring at the drive-thru, but that is beside the point.

This system works as long as everybody plays by the rules. This system allows In-N-Out to conduct business in a sane way. To take orders fast enough to keep the business moving but not so fast that chaos occurs.

An aside: aren’t we really just talking about queues, another well-known data structure that allows FIFO access? It’s true that the scenario above has a relationship to queues, and yes, each line can be thought of as a small queue. However, in the example above I’m less concerned with the actual lines/queues and more concerned with what’s happening at the head of all 4 lines. You see, In-N-Out can only ever truly allow access to 4 registers at a time. People standing in line may go to the bathroom, jump to a shorter line, or let their friends cut. The primary concern, again, is that we only ever have 4 cash registers, so how do we enforce a system where we never allow more than 4 people to order at a time? In this real-life example, perhaps the manager comes out and sends away anyone trying to order out of turn: “Hey, we can only take 4 orders at a time, so get in line, buddy…” This means the manager can be thought of as enforcing the semaphore logic.

This is why you need to know and understand semaphores.

Building a more technical mental model

Let us think about what the point of a semaphore is and why we’d ever want to use one. First, don’t let the name throw you off too much. Suppose you have a shared resource that can be freely accessed by one or more threads. This shared resource could be a remote server, a shared in-memory datastore, or a complex video-transcoding pipeline. It doesn’t really matter what is being shared. What matters is that a shared system should be shared in a reliable way that doesn’t allow for bombardment of the system.

Specifically, if you were trying to control access to a pipeline, it would be a shame if anyone could access your pipeline and kick off 10,000 jobs to transcode some video. Such a situation could bring your transcoding pipeline to an immediate halt, which would be very bad for business.

Think of a mutex, which controls multi-threaded access to code. The mutex guards a critical section, only ever allowing a single thread exclusive access to that code. This is important to avoid data races and helps to avoid all sorts of problems.
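To make that concrete, here’s a minimal sketch of a mutex guarding a shared counter (the counter here is purely illustrative):

import "sync"

var (
    mu      sync.Mutex
    counter int
)

// increment is safe to call from many goroutines: Lock/Unlock
// ensure only one goroutine at a time executes the critical section.
func increment() {
    mu.Lock()
    defer mu.Unlock()
    counter++
}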

Now let’s compare a mutex to a semaphore. If a mutex is concerned with ensuring that only a single thread ever accesses a critical section at a time, a semaphore is concerned with ensuring that at most N threads do so. This really just means that a semaphore is a more generalized version of a mutex (mostly).

With that out of the way, you might be wondering: what’s the point of giving access to N threads? The point is that you are purposefully constraining access to a resource, thereby protecting it from overuse.
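To see the “generalized mutex” idea in code: a buffered channel of capacity 1 behaves just like a lock, and bumping the capacity to N turns it into a semaphore. A quick sketch (the names here are made up):

// lock is effectively a binary semaphore: with a capacity of 1,
// at most one goroutine can hold it at a time, just like a mutex.
// Change the capacity to N and it admits up to N goroutines.
var lock = make(chan struct{}, 1)

func criticalSection() {
    lock <- struct{}{}        // acquire: blocks while another goroutine holds it
    defer func() { <-lock }() // release
    // ... exclusive access here ...
}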

One important point to understand is that this control and protection are only good if each thread plays the game properly. Each thread must be aware of the semaphore and must not do anything to circumvent it; otherwise you’ll end up back at square one. This is why it’s sometimes a good idea to abstract away such details behind an API. The API acts as the only conduit into a service, and within that service access is limited via a semaphore.
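As a sketch of that idea, the semaphore can live as a private field of a service type so that every caller is forced through it (Transcoder and its method are hypothetical names):

// Transcoder is the only conduit into the transcoding pipeline.
// Because sem is unexported, callers cannot circumvent it.
type Transcoder struct {
    sem chan struct{}
}

func NewTranscoder(limit int) *Transcoder {
    return &Transcoder{sem: make(chan struct{}, limit)}
}

// Transcode blocks until one of the limited slots frees up.
func (t *Transcoder) Transcode(job string) {
    t.sem <- struct{}{}
    defer func() { <-t.sem }()
    // ... kick off the expensive work here ...
}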

Let’s recap before we continue:

  1. mutex: constrains access to a single thread, to guard a critical section of code.
  2. semaphore: constrains access to at most N threads, to control/limit concurrent access to a shared resource.

When using a semaphore, how do you figure out the value of N for how many threads to limit? Unfortunately, there is no hard-and-fast rule, and the final value of N is going to depend on many factors. A good place to start is by benchmarking: actually hit your shared resource to see where it starts to fall over in terms of performance and latency. I can certainly tell you right now that creating a semaphore allowing access to up to 1 million goroutines is probably a bad idea. Forgive the silliness, but the point is that you don’t want to reach for a semaphore if all you’re going to do is allow wide-open access.

Now that we have some definitions out of the way, let’s see how we can use a semaphore in some very basic examples. In the following two sections, notice how minimal the code is for such a powerful construct.

Bounding concurrency with the use of semaphores

Go semaphores!

Problem: Code that utilizes many goroutines runs the risk of bombarding some remote resource with hundreds or even thousands of concurrent calls. A bombarded system can quickly become overloaded to the point where you experience extreme latency spikes or, worse, service outages.

// remoteDeleteEmployeeRPC will delete an employee over the network.
func remoteDeleteEmployeeRPC(id int) {
    // ...
}

func main() {
    for _, employee := range employeeList {
        go remoteDeleteEmployeeRPC(employee.ID)
    }
}

First, the code above is not quite realistic or robust. I’m purposefully not bothering with error handling, I’m not handling any return value, and one could argue that such an operation could be done in a batched fashion.

The point I want to illustrate, however, is that the code above can easily get out of hand depending on the size of the employeeList variable. If employeeList holds 5, 10, or even 50 employee records, you may find that the remoteDeleteEmployeeRPC invocations don’t even break a sweat.

You may also find that this code causes a catastrophic outage simply because the concurrency is potentially unbounded. You might say, “but it is bounded by the size of the employeeList slice,” which is technically accurate. However, unless you can always guarantee that employeeList will only ever be a size that downstream services can handle gracefully, you can practically assume that this code is dangerous in the long term. As your system grows and complexity increases, it’s not unheard of to suddenly find yourself faced with a runaway storm of concurrent calls to your remote endpoints. Such an issue can occur in a single-instance scenario where one client bombards a remote resource. Now consider an army of clients executing unbounded calls: this is a recipe for disaster.

Option 1: Bounded channel

// remoteDeleteEmployeeRPC will delete an employee over the network.
func remoteDeleteEmployeeRPC(id int) {
    // ...
}

// sem is a channel that will allow up to 10 concurrent operations.
var sem = make(chan int, 10)

func main() {
    for _, employee := range employeeList {
        sem <- 1
        // Pass the ID as an argument so each goroutine gets its own
        // copy rather than closing over the shared loop variable.
        go func(id int) {
            remoteDeleteEmployeeRPC(id)
            <-sem
        }(employee.ID)
    }
}

The technique above requires nothing more than a buffered channel to accomplish the goal. In this particular case, we’re exploiting the properties of a buffered channel to limit concurrency. For each iteration of the loop over employeeList, an attempt is made to place a single integer in the sem channel. If there is room, progress is made and the goroutine is immediately kicked off. Once the remoteDeleteEmployeeRPC call returns, a single integer is drained from sem, allowing yet another iteration to proceed.

So what happens when sem is full? Let’s think about what’s happening for a minute. If sem is full, we must have 10 active goroutines trying to make progress over the network. When an 11th iteration attempts to send a single integer into the bounded sem channel, the code blocks. This blocking is a fundamental property of the bounded channel, and it’s a good thing! That 11th iteration is effectively held there until a slot opens up in the channel. A slot will eventually open when one of the original 10 RPC calls returns and drains a single integer off the channel; a remote RPC call should return upon success, an error, or a timeout. At that point, the previously blocked 11th iteration can make forward progress.
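One caveat worth flagging: as written, main can return while some goroutines are still in flight. A common companion trick with the channel approach is to fill the channel back up to capacity after the loop; each send only succeeds once an in-flight goroutine has drained its slot, so once all cap(sem) sends complete, no work remains outstanding. A sketch:

// Wait for all outstanding goroutines: once we can place cap(sem)
// values into the channel, no goroutine still holds a slot.
for i := 0; i < cap(sem); i++ {
    sem <- 1
}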

Please note that in the example above, the buffered channel is used in a way that exploits its blocking behavior; we’re not using it to send meaningful data. Additionally, we’re able to use sem from multiple goroutines because, after all, it is a channel, and channels are inherently thread-safe. This works very well, but let’s move on to the big-brother version of the semaphore.

Option 2: Weighted Semaphore
Implementation: https://github.com/golang/sync/tree/master/semaphore

This implementation of a weighted semaphore is quite a bit more sophisticated than a bounded channel. In fact, the bounded-channel approach works well and wonderfully illustrates how a semaphore truly works, but it was really just a segue into understanding this package.

How is the weighted semaphore different? It has a few tricks up its sleeve, but at its core it’s the same key concept of limiting concurrent execution.

First and foremost, the weighted semaphore must be brought in as a dependency and imported appropriately.

import "golang.org/x/sync/semaphore"

Next we need to instantiate the weighted semaphore in order to use it.

var sem = semaphore.NewWeighted(int64(10))

Next we need to know how to increment/decrement the semaphore to actually control and limit our concurrency.

// Acquiring is as easy as:
sem.Acquire(ctx, 1) // analogous to sem <- 1 in the channel approach (note: it also returns an error)
// Releasing is as easy as:
sem.Release(1) // analogous to <-sem in the channel approach

Now let’s highlight some key differences in this implementation. The weighted semaphore additionally has a TryAcquire method, which will attempt to acquire room to execute some operation but, unlike Acquire, will not block if the semaphore could not be acquired. This behavior allows your code to do other things instead of entering a blocked state.
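A sketch of what that might look like inside our delete loop (the log call is just illustrative):

if sem.TryAcquire(1) {
    go func(id int) {
        defer sem.Release(1)
        remoteDeleteEmployeeRPC(id)
    }(employee.ID)
} else {
    // The semaphore is saturated: rather than blocking, skip the
    // work, queue it for later, or record a metric.
    log.Printf("too busy, skipping employee %d for now", employee.ID)
}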

Also, you’ll notice that Acquire, Release, and TryAcquire all allow you to specify how many units of concurrency you want to consume. This is why this implementation is known as a weighted semaphore: it gives your code the ability to control concurrent resource usage with finer granularity. Perhaps you want to bound your concurrency across a number of remote RPC calls where not all invocations are equivalent in cost.
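For example, a heavier RPC could acquire more units than a light one. A hedged sketch (deleteWithCost and the cost values are made up for illustration):

// deleteWithCost acquires cost units before doing the work, so a
// few heavy calls consume as much of the budget as many light ones.
func deleteWithCost(ctx context.Context, sem *semaphore.Weighted, cost int64, id int) error {
    if err := sem.Acquire(ctx, cost); err != nil {
        return err
    }
    defer sem.Release(cost)
    remoteDeleteEmployeeRPC(id)
    return nil
}

Here a cheap delete might acquire 1 unit while an expensive, cascading delete acquires 3; with a limit of 10, at most three expensive deletes run at once.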

The last two differences are somewhat minor but important to call out: the weighted semaphore can return an error, and it accepts a context.Context argument. Practically all modern Go code uses context.Context and proper error handling as best practice with networked calls. I’ll leave it as an exercise for the reader to learn more about these practices.
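To sketch the context interaction: if the context is cancelled or hits its deadline while Acquire is blocked, Acquire gives up and returns the context’s error instead of waiting forever (assuming the time and log packages are imported; the timeout value here is arbitrary):

ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()

// If no unit frees up within 2 seconds, Acquire returns ctx.Err()
// (context.DeadlineExceeded) rather than blocking indefinitely.
if err := sem.Acquire(ctx, 1); err != nil {
    log.Printf("could not acquire semaphore: %v", err)
    return
}
defer sem.Release(1)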

To wrap up the weighted semaphore, let’s pull everything together for a quick view of what usage might look like with our existing code.

import (
    "context"

    "golang.org/x/sync/semaphore"
)

// remoteDeleteEmployeeRPC will delete an employee over the network.
func remoteDeleteEmployeeRPC(id int) {
    // ...
}

// sem is a weighted semaphore allowing up to 10 concurrent operations.
var sem = semaphore.NewWeighted(int64(10))

func main() {
    // A context is required by the weighted semaphore package.
    ctx := context.Background()
    for _, employee := range employeeList {
        if err := sem.Acquire(ctx, 1); err != nil {
            // Handle the error and maybe break out of the loop.
            break
        }
        go func(id int) {
            defer sem.Release(1)
            remoteDeleteEmployeeRPC(id)
        }(employee.ID)
    }
}

Consider for a minute that the use of the weighted semaphore almost parallels the use of a bounded channel. The truth is that the APIs for using them are nearly identical, and both do a wonderful job of limiting access.
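One more parallel worth noting: just as you can fill the channel to capacity to wait for outstanding work, you can acquire every unit of the weighted semaphore to block until all in-flight goroutines have released. A sketch, assuming the limit of 10 from above:

// Block until all 10 units are free, i.e. every in-flight
// goroutine has called Release.
if err := sem.Acquire(ctx, 10); err != nil {
    // Handle the error (e.g. the context was cancelled).
}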

Using a semaphore boils down to the following:

  1. You need a way to instantiate a semaphore with some defined limit.
  2. You need a way to Acquire() every time access to the shared resource is requested.
  3. You need a way to Release() every time access to the shared resource is finished.

And so ends Part 1 on semaphores. Do not underestimate the power of controlling your concurrency by limiting it. Don’t let your concurrent goroutines escape you and your system, only to bring down another system. A semaphore is a great way to add resiliency to your application, and bounding concurrency like this is a very common practice.

Well, this wraps up Part 1 of Go’s Extended Concurrency story. If you enjoyed this article and found it remotely useful, I’d love to hear from you. If you found it too basic or too advanced, I’d love to hear from you. If you found it inaccurate or confusing, I’d really love to hear from you. Finally, if you learned anything at all, please clap and subscribe to my feed, as I periodically publish articles like this one.
