Spin lock in Modern C++ with atomics, memory barriers and exponential backoff

This article is a fast but deep dive into low-level concurrency.

Locking mechanisms are an essential part of concurrent programming, allowing multiple threads to access shared resources without interfering with each other. One popular type of lock is the spin lock, which uses busy-waiting to repeatedly check if a lock is available. However, this approach can waste precious CPU cycles: a thread that keeps looping on the lock occupies a core while doing no useful work. To address this issue, we can use an approach called exponential backoff, which introduces a progressively increasing delay between attempts to avoid wasting resources.

In this article, we’ll implement our own toy spin lock with exponential backoff. We’ll start by discussing the basic idea behind spin locks and the problem of busy-waiting, then we’ll introduce the concept of exponential backoff and discuss how it can improve the efficiency of spin locks. Next we’ll cover what atomics are and what they’re used for, followed by an explanation of memory barriers, since the two work in tandem. We’ll then go through a sample implementation of the spin lock with exponential backoff, discuss its advantages and limitations, and finally code a driver test program to check that everything works. Let’s get started!

The Spin Lock

A spin lock is a type of lock that uses busy-waiting to repeatedly check if a lock is available. The basic idea is simple: when a thread wants to access a shared resource, it checks if the lock is available. If the lock is not available, the thread keeps spinning in a loop and checking the lock until it becomes available.
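In its most naive form, the idea can be sketched in a few lines using C++ atomics (which we’ll explain properly below); this is a toy version, not the final implementation:

    #include <atomic>

    // Naive spin lock: lock() busy-waits until the flag is free.
    class NaiveSpinLock {
    public:
        void lock() {
            // test_and_set() returns the previous value, so the loop
            // exits only once we were the ones to set the flag.
            while (flag_.test_and_set(std::memory_order_acquire)) {
                // busy-wait: burn CPU cycles until unlock() clears the flag
            }
        }

        void unlock() {
            flag_.clear(std::memory_order_release);
        }

    private:
        std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    };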

While spin locks can be efficient when the lock is only held for very short periods of time, they can be wasteful and inefficient if the lock is held for longer. In that case, the waiting thread can burn a lot of CPU cycles spinning in a loop and checking the lock, which can lead to poor performance and high power consumption.

The Problem of Busy-Waiting

Busy-waiting is a common problem in concurrent programming, where a thread repeatedly checks a condition in a loop and wastes CPU cycles while waiting for the condition to become true. In the case of spin locks, busy-waiting occurs when a thread repeatedly checks the lock in a loop and wastes CPU cycles while waiting for the lock to become available.

Busy-waiting can lead to poor performance, high power consumption, and reduced responsiveness. To address this issue, we can use a technique called exponential backoff.

Exponential Backoff

Exponential backoff is a technique used in concurrent programming to reduce the amount of time a thread spends waiting for a condition to become true. The basic idea is to progressively increase the amount of time a thread waits after each unsuccessful attempt, rather than repeatedly checking the condition in a loop.

In the context of spin locks, exponential backoff can be used to reduce the amount of time a thread spends spinning in a loop and checking the lock. Instead of repeatedly checking the lock, the thread waits for an increasing amount of time after each unsuccessful attempt, which can reduce power consumption and improve performance.
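As a rough sketch, the delay can simply double after every failed attempt; the helper name and the cap below are illustrative, and the final implementation refines this by yielding the time slice first:

    #include <algorithm>
    #include <chrono>
    #include <thread>

    // Sleep for 1, 2, 4, 8, ... microseconds depending on the attempt number.
    // The cap keeps the shift well-defined and bounds the maximum delay.
    inline void exponential_backoff(int attempt) {
        auto delay = std::chrono::microseconds(1LL << std::min(attempt, 20));
        std::this_thread::sleep_for(delay);
    }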

Atomics

Atomics are a type of object in concurrent programming that allow multiple threads to access and modify a variable in a thread-safe way. They provide the necessary synchronization and memory barriers (see below for an explanation of what they are) to ensure that all threads see a consistent view of shared memory.

In C++11 and later versions of the language, the std::atomic template is provided to make it easy to use atomics. Here's an example of how to use an atomic int variable:
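The snippet below is a minimal sketch along those lines, assuming each call performs a single increment:

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<int> counter{0};

    // Atomically add one to the shared counter.
    void increment_counter() {
        counter.fetch_add(1);
    }

    int main() {
        std::thread t1(increment_counter);
        std::thread t2(increment_counter);
        std::thread t3(increment_counter);

        t1.join();
        t2.join();
        t3.join();

        std::cout << "counter = " << counter.load() << '\n'; // prints 3
        return 0;
    }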

In this example, we use an atomic int variable called counter to keep track of a shared count. We then define a function called increment_counter() that increments the counter by one. We create three threads that each call increment_counter(), and then we join the threads and display the value of the counter.

The std::atomic template provides a variety of methods to access and modify atomic variables, such as load(), store(), fetch_add(), and compare_exchange_strong(). These methods ensure that operations on atomic variables are performed atomically and with the necessary memory barriers to ensure consistency between threads.
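For instance, compare_exchange_strong() replaces the stored value only if it still holds what we expect, which makes it handy for lock-free updates. A toy illustration (the function name is just for the example):

    #include <atomic>

    std::atomic<int> value{0};

    // Atomically set value to 42, but only if it is still 0.
    // Returns true if this call performed the swap.
    bool set_once() {
        int expected = 0;
        // On failure, 'expected' is updated with the value actually seen.
        return value.compare_exchange_strong(expected, 42);
    }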

Memory barriers

A memory barrier (also known as a memory fence) is a hardware or software synchronization mechanism that ensures that memory accesses on a particular processor are executed, and become visible, in a specific order. At their strongest, memory barriers can enforce sequential consistency, a consistency model in concurrent programming in which all threads observe the operations of a program in the same order.

Memory barriers are used to prevent the reordering of memory access instructions by a processor or compiler, which can lead to inconsistent behavior in concurrent programs. For example, if two threads are accessing a shared variable, a memory barrier can ensure that the order of the read and write operations is consistent across all threads.

There are several types of memory barriers, which provide different levels of synchronization and ordering guarantees. The most common types of memory barriers are:

  1. Compiler Barriers: Compiler barriers are directives that prevent the compiler from reordering code across them. They ensure that the generated instructions follow the order in which the code is written, which an optimizing compiler might otherwise change. By themselves they emit no hardware instructions, so they constrain only the compiler, not the processor.
  2. Processor Barriers: Processor barriers are instructions that are issued by a processor to ensure that memory operations are executed in a specific order. They prevent the processor from executing instructions out of order, which could lead to incorrect behavior in a concurrent program. Processor barriers are typically implemented using special instructions or cache coherency protocols.
  3. Store Barrier: A store barrier (also known as a write barrier) is a memory barrier that ensures that all memory stores issued before the barrier are visible to other threads before any stores issued after it. Store barriers prevent a processor from reordering memory stores, which could otherwise make a later write visible before an earlier one.
  4. Load Barrier: A load barrier (also known as a read barrier) is a memory barrier that ensures that all memory loads issued before the barrier complete before any loads issued after it. Load barriers prevent a processor from reordering memory loads, which could otherwise let a thread read stale data.
  5. Full Barrier: A full barrier (also known as a total barrier or memory fence) is a memory barrier that ensures that all memory operations before the barrier are visible to other threads, and all memory operations after the barrier are visible to the current thread. Full barriers provide the strongest level of synchronization and ordering guarantees, and are typically used when maximum synchronization is required.

In C++11 and later, memory barriers are provided by the std::memory_order enum, which is used with atomic operations to provide the necessary synchronization and memory barriers. The std::memory_order enum provides several options for memory ordering, including memory_order_relaxed, memory_order_acquire, memory_order_release, memory_order_acq_rel, and memory_order_seq_cst.

Here’s an example of how to use memory barriers with atomic operations:
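The snippet below is a minimal sketch, reusing the shared counter from before:

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<int> counter{0};

    void increment_counter() {
        // Relaxed ordering: the increment itself is atomic, but it
        // imposes no ordering on surrounding memory operations.
        counter.fetch_add(1, std::memory_order_relaxed);
    }

    int main() {
        std::thread t1(increment_counter);
        std::thread t2(increment_counter);

        t1.join();
        t2.join();

        std::cout << "counter = " << counter.load() << '\n'; // prints 2
        return 0;
    }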

In this example, we use the fetch_add() method with std::memory_order_relaxed to increment the counter variable. The std::memory_order_relaxed option indicates that no memory barriers are required, as we don't need to ensure any specific ordering or synchronization between threads.
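By contrast, when one thread publishes data for another to read, the ordering does matter. Here’s a minimal release/acquire message-passing sketch (the variable names are illustrative):

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<bool> ready{false};
    int payload = 0; // plain, non-atomic data published via 'ready'

    void producer() {
        payload = 42;                                 // write the data...
        ready.store(true, std::memory_order_release); // ...then publish it
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the flag is set
        }
        assert(payload == 42); // guaranteed by the release/acquire pair
    }

    int main() {
        std::thread t1(producer);
        std::thread t2(consumer);
        t1.join();
        t2.join();
        return 0;
    }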

Implementation of Spin Lock with Exponential Backoff

Here’s a sample implementation of the spin lock with exponential backoff:
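The code below is a sketch consistent with the walkthrough that follows; the retry threshold and the delay cap are tuning choices, not fixed requirements:

    #include <algorithm>
    #include <atomic>
    #include <chrono>
    #include <thread>

    class SpinLock {
    public:
        void lock() {
            int retries = 0;
            // Acquire ordering: once we own the lock, all writes made
            // by the previous holder are visible to us.
            while (flag_.test_and_set(std::memory_order_acquire)) {
                backoff(retries++);
            }
        }

        void unlock() {
            // Release ordering: our writes inside the critical section
            // are visible to the next thread that acquires the lock.
            flag_.clear(std::memory_order_release);
        }

    private:
        static constexpr int max_retries = 8; // tuning choice

        void backoff(int retries) {
            if (retries < max_retries) {
                // Light contention: just give up our time slice.
                std::this_thread::yield();
            } else {
                // Heavy contention: sleep for an exponentially increasing
                // time (1, 2, 4, ... microseconds), capped to stay bounded.
                auto delay = std::chrono::microseconds(
                    1LL << std::min(retries - max_retries, 20));
                std::this_thread::sleep_for(delay);
            }
        }

        std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    };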

In this implementation, we use std::atomic_flag as the underlying storage for the lock, which provides atomicity and ensures that multiple threads can access and modify the flag without interfering with each other.

We use the test_and_set() method with std::memory_order_acquire to acquire the lock; the acquire barrier ensures that all writes made by the previous lock holder are visible to the thread once it holds the lock. We then use clear() with std::memory_order_release to release the lock; the release barrier ensures that all writes made inside the critical section are visible to the next thread that acquires the lock.

In the backoff() method, which lock() calls inside the body of its while loop, we first give up our current quota of CPU time by simply yielding our time slice with std::this_thread::yield(), until we reach a hard-coded maximum number of retries. Once max_retries is reached, we instead sleep for an exponentially increasing number of microseconds before trying again. Bear in mind that, since I assume you’re not running a real-time operating system, these delays are only approximate.

Testing our Spin lock

We code a driver program that creates 100 threads, each performing 10,000 operations that increment a shared counter, and at the end we check that the counter value equals the number of threads times the number of iterations.
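A driver along those lines, reusing the SpinLock class from above, might look like this:

    #include <cassert>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        constexpr int num_threads = 100;
        constexpr int iterations = 10000;

        SpinLock spin_lock;
        long long counter = 0; // plain variable, protected by the spin lock

        std::vector<std::thread> threads;
        threads.reserve(num_threads);
        for (int i = 0; i < num_threads; ++i) {
            threads.emplace_back([&] {
                for (int j = 0; j < iterations; ++j) {
                    spin_lock.lock();
                    ++counter;
                    spin_lock.unlock();
                }
            });
        }
        for (auto& t : threads) {
            t.join();
        }

        // 100 threads x 10,000 iterations = 1,000,000
        assert(counter == static_cast<long long>(num_threads) * iterations);
        std::cout << "counter = " << counter << '\n';
        return 0;
    }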

Conclusion

In this article, we implemented our own spin lock, discussed atomics, memory barriers, and the problem of busy-waiting in concurrent programming, and introduced exponential backoff as a way to reduce the time spent waiting for a condition to become true. We’ve seen that exponential backoff uses progressively longer delays to improve efficiency and reduce power consumption, and finally we coded a test driver to check the correctness of our implementation.

While spin locks with exponential backoff can be more efficient than simple spin locks in some scenarios, they can still be less efficient than “normal” locks like mutexes in scenarios where the lock is held for long periods of time or if many threads are contending for the same lock. As always, the choice of locking mechanism depends on the specific use case and the expected duration of the lock.

Please bear in mind that this code is not meant to be used in production environments; it’s just an educational implementation. Getting these things right is very tricky, as you can guess. Anyway, I hope you enjoyed it or learned something new! I did too! Happy coding, and if you have any comments, positive or negative, leave them below! Thank you!

Note: C++ Concurrency in Action, 2nd Edition by Anthony Williams is a great book to get your hands dirty and try to understand this difficult topic.
