Dissecting Golang sync.Once

Denis Shilkin
Published in Gett Tech
4 min read · Jul 6, 2023

Concurrency is what makes our software more performant and scalable. At the same time, it spawns problems of synchronisation and orchestration. A missed bottleneck merely costs performance, but the wrong use of concurrency can cause data races, deadlocks, or even memory corruption.

Together with you, I’m going to dissect the synchronisation primitives kindly provided to us by the developers of the Golang standard library.

In this article, I refer to building blocks such as atomic operations and mutexes, but without going deep into them. I’m also using the term ‘thread’ in the sense of a ‘logical thread’, i.e. some flow of execution that can be interrupted by the scheduler in favour of some other ‘logical thread’. These threads may be executed in parallel or sequentially; they may be operating system threads or coroutines. The nature of these threads makes no difference when it comes to the general ideas.

For the sake of simplicity, all source code below is Golang-like pseudo code.

sync.Once

Let’s start with a pretty simple synchronisation feature — sync.Once. As its name suggests, it allows you to run some piece of code exactly once.

// to do something exactly once
var once sync.Once

someWork := func() {
	foo()
	bar()
}

once.Do(someWork) // 1st call executes the function
once.Do(someWork) // 2nd call does nothing

When is this useful? For instance, when you want to initialise some common state in a lazy manner.

You can try to come up with your own implementation of this feature.

func Do(f func()) {
	// ?
}

I’m pretty sure that most of you would easily propose some straight-forward implementation.

We can check the value of the ‘done’ flag: if it’s equal to zero, we execute the target function ‘f’ and then set the flag to one; otherwise we just quit the ‘Do’ function.

To be more precise, we can atomically COMPARE (C) the value of the ‘done’ flag AND (A), if it’s equal to zero, SET (S) it to one. Compare-And-Set is one of the well-known atomic operations, which in itself is out of the scope of this article.

// straight-forward implementation v1
func Do(f func()) {
	// atomically compare-and-set the value of done
	if CAS(done, 0, 1) {
		f()
	}
}

This seems to work, but it doesn’t. Why is this implementation incorrect? Let’s imagine two threads concurrently calling the function Do. The first thread wins the CAS and starts executing ‘f’; the second thread loses the CAS and returns from ‘Do’ immediately, while ‘f’ is still running.

We see that the second thread quits the ‘Do’ function BEFORE ‘f’ has finished, whereas the real ‘Do’ guarantees it returns only AFTER ‘f’ has finished. Indeed, if we consider the use case we started from — the lazy initialisation of some common state — the second thread is going to use an uninitialised or partially initialised state, which in the best case leads to a panic and at worst to unexpected behaviour further on.

Let’s try to implement a synchronisation guarantee that sync.Once provides. Obviously we need to somehow make all except one thread wait until ‘f’ has finished.

// straight-forward implementation v2
func Do(f func()) {
	mutex.Lock()
	defer mutex.Unlock()

	if done == 0 {
		defer func() { done = 1 }()
		f()
	}
}

This is semantically correct: after the first thread locks the mutex, all the others wait and only then quit the ‘Do’ function. Since we came up with a working solution, let’s think about whether we could make it better. “Better in what sense?” you might be thinking. Now we are coming to the question of performance.

Assume we have some structure ‘Foo’ using some shared state under the hood.

var shared state
var once sync.Once

type Foo struct{}

func (f Foo) Bar() {
	// lazy initialisation
	once.Do(shared.initialise)

	// the use
	this = shared.GetThis()
	that = shared.GetThat()
}

Now, if we’re calling ‘foo.Bar’ concurrently from hundreds or even thousands of threads, we are all competing for one single mutex. We keep locking this mutex even after the shared state has been initialised: the higher the load, the narrower the bottleneck. Maybe using atomic operations was not such a bad idea after all? It wasn’t.

Let’s combine performance and correctness.

func Do(f func()) {
	// atomically read the value of done
	if LOAD(done) == 0 {
		doSlow(f)
	}
}

func doSlow(f func()) {
	mutex.Lock()
	defer mutex.Unlock()

	if done == 0 {
		// atomically change the value of done
		defer STORE(done, 1)
		f()
	}
}

Now ‘done’ means exactly that: the function ‘f’ is done. All threads reading a ‘done’ flag equal to one quit the ‘Do’ function safely. This implementation also gracefully handles the case when several threads try to change the value of the ‘done’ flag simultaneously, and all subsequent calls take the fast path, so no mutex is involved.

Practical aspects

Let’s touch on some aspects of using this primitive in real source code. They may seem obvious, but they are worth mentioning.

  1. Do not use sync.Once by default. In most cases this means you’re going to use a global variable, which in itself is almost never a good idea, since it brings extra coupling and potential side effects. Consider using composition and dependency injection instead.
  2. If you have thought twice and still want to go on with a global variable, then make it private and leave a good comment. Encapsulate as much as you can: once exported, these variables will start their own lives outside of your package. This especially hurts when it happens in a reusable library.
  3. Don’t copy sync.Once, since it contains an internal flag. Be especially attentive if you use sync.Once as part of the state of a struct that can be copied.
