Limiting simultaneous tasks using distributed semaphores

Ali Şimşek
Picus Security Engineering
10 min read · Dec 6, 2023

Definition of distributed semaphore

A distributed semaphore is a synchronization mechanism that enables processes or threads in a distributed system (i.e., spread across multiple machines or nodes) to coordinate access to a shared resource or to synchronize their actions.

The “distributed” keyword might suggest otherwise, but a distributed semaphore can be implemented using a centralized approach (like a single server managing the semaphore) or a decentralized approach (like a consensus protocol among multiple nodes).

The “distributed” aspect of the semaphore refers to the distributed nature of the processes or threads that are coordinating or synchronizing their actions via the semaphore, not necessarily the location or structure of the semaphore’s implementation.

If multiple processes or threads across different machines or nodes are coordinating their access to shared resources through a semaphore, and that semaphore is managed by a centralized unit (e.g., a Redis instance), then the semaphore serves a distributed purpose and can be termed a “distributed semaphore.”

Application scenario

In the context of an application utilizing a microservice architecture, automatic scaling is a common feature. This scaling makes it challenging to rely solely on the resources of individual containers to limit access and prevent the exhaustion of system resources. To address this, a third-party application becomes essential for managing these resources effectively. By implementing a distributed semaphore through such an application, we can efficiently enforce limitations on task execution. This approach ensures that as the system scales, resource allocation remains controlled and balanced, safeguarding the system’s stability and performance.

Possible approaches

There are various ways to implement a distributed semaphore, but I will examine three common approaches in this blog post.

1-) Databases with transactions

You can implement the idea using any database that supports transactions, but to give a concrete example for better understanding, I will discuss PostgreSQL here.

Step 1: Define the Semaphore Table

We’ll create a table to represent the semaphores and individual permits.
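A minimal sketch of such a table (the table and column names here are illustrative, not prescribed):

```sql
CREATE TABLE semaphore_permits (
    permit_id   SERIAL PRIMARY KEY,          -- one row per permit
    name        TEXT NOT NULL,               -- which semaphore the permit belongs to
    acquired    BOOLEAN NOT NULL DEFAULT FALSE,
    acquired_at TIMESTAMPTZ                  -- when the permit was last acquired
);
```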

Here, each row represents a single permit. The acquired column indicates whether the permit is currently held, and acquired_at notes the timestamp when it was acquired.

Step 2: Initialize the Semaphore

Assuming you want a semaphore called ‘my_semaphore’ with 5 permits, you would insert five rows like this:
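For example, with the illustrative `semaphore_permits` table:

```sql
INSERT INTO semaphore_permits (name)
SELECT 'my_semaphore' FROM generate_series(1, 5);
```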

This uses generate_series to create multiple rows in a single command.

Step 3: Acquiring a Permit

To acquire a permit with a timeout, you’d check for an available permit (not acquired) or a permit that was acquired longer than the timeout period ago.
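One way to sketch this, assuming a permits table like `semaphore_permits(permit_id, name, acquired, acquired_at)`; `FOR UPDATE SKIP LOCKED` supplies the row-level lock so concurrent callers cannot grab the same row:

```sql
BEGIN;
UPDATE semaphore_permits
SET acquired = TRUE, acquired_at = NOW()
WHERE permit_id = (
    SELECT permit_id
    FROM semaphore_permits
    WHERE name = 'my_semaphore'
      AND (acquired = FALSE
           OR acquired_at < NOW() - INTERVAL '5 minutes')
    ORDER BY permit_id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING permit_id;
COMMIT;
```

If no row is returned, no permit was available and the caller can retry or back off.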

This transaction tries to update a permit that is either not acquired or has been acquired for more than the timeout period (5 minutes in this example). It returns the permit_id of the permit it managed to acquire. We also need a row-level lock in order to prevent race conditions.

Step 4: Releasing a Permit

To release a permit, you would set acquired to false:
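A sketch, again against the illustrative `semaphore_permits` table:

```sql
UPDATE semaphore_permits
SET acquired = FALSE, acquired_at = NULL
WHERE permit_id = $1;  -- the permit_id returned at acquisition time
```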

Step 5: Handling Timeouts (Cleanup Job)

You can create a cleanup job that resets any permits that have been acquired for too long without being manually released:
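A sketch of the cleanup statement, using the same 5-minute timeout as the acquisition query:

```sql
UPDATE semaphore_permits
SET acquired = FALSE, acquired_at = NULL
WHERE acquired = TRUE
  AND acquired_at < NOW() - INTERVAL '5 minutes';
```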

You would run this job periodically, perhaps as a scheduled task within the application, or as a cron job, or even as a scheduled event within PostgreSQL.

2-) Custom solutions with message queues

Implementing a distributed semaphore using message queues is a bit unconventional, since message queues are typically used for asynchronous message passing rather than for synchronization primitives like semaphores. However, it is possible to leverage the properties of message queues to simulate a semaphore’s behavior, especially if you already use a message queue in your infrastructure and don’t want to maintain additional applications for the sole purpose of using distributed semaphores.

I will use RabbitMQ as an example in this section since it is one of the most widely used message queue solutions. With RabbitMQ, you could use a queue to represent the semaphore. Each “permit” could be a message in the queue. Acquiring a permit would involve consuming a message from the queue, and releasing a permit would involve publishing a message back to the queue.

Step 1: Initialize the Semaphore

  1. Create a queue in RabbitMQ. This will be your semaphore queue. The durability of the queue should be set to true to survive broker restarts.
  2. Publish messages to the queue. Each message represents a permit. For a semaphore with N permits, publish N messages to the queue. These messages should also be marked as persistent if you want the permits to survive a broker restart.

Step 2: Acquire a Permit

  1. Start by consuming a message from the semaphore queue. This act is equivalent to acquiring a semaphore permit. A consumer takes one permit by reading a message from the queue.
  2. Proceed with the execution of your critical section, the part of your code that requires controlled access facilitated by the semaphore. The permit you’ve acquired ensures that this section is accessed in a controlled and safe manner.
  3. After you have completed the critical section, acknowledge the consumed message. This acknowledgment is crucial as it informs RabbitMQ that the permit has been effectively used. In case of a failure before the acknowledgment, the unacknowledged message can be handled as outlined in Step 4.

Step 3: Release a Permit

  1. Publish a message back to the semaphore queue. When the work for which the permit was acquired is complete, the consumer should send a message back to the semaphore queue, effectively returning the permit for others to use.

Step 4: Handling Failures

  1. Implement a dead-letter exchange with a message TTL. This is necessary to handle situations where a consumer might fail to process a message and does not acknowledge it. The message TTL will ensure that if a message is not acknowledged within a certain time frame, it will be republished to the queue, thus releasing the permit back into the pool.
  2. Configure the queue to use the dead-letter exchange. This tells RabbitMQ where to send messages that are not acknowledged within the TTL.
  3. Configure the TTL to be long enough. If a consumer acknowledges a message late and releases the permit after the TTL has already elapsed, the same permit can be added to the queue by both the consumer and the dead-letter exchange, inflating the total number of permits.

Step 5: Consumer Cleanup

  1. Consumers should be set up to cancel the consume operation if they terminate, which would trigger the message to be requeued or sent to the dead-letter exchange, depending on the message acknowledgment status.

3-) Distributed caching systems

Distributed caching is an essential component in modern distributed systems, used to improve performance by reducing the load on databases and speeding up data retrieval. There are many caching systems we can use to implement a distributed semaphore. Redis is particularly well-suited for this task due to its high performance and support for atomic operations, which are crucial for ensuring the consistency and reliability of the semaphore in a distributed environment.

We’re going to examine two approaches built on two different Redis features: the ZSET data type and the BLPOP command.

Building a distributed semaphore with ZSET:

ZSET Operations

  • ZSET is a data structure in Redis that holds a sorted set of strings, each associated with a floating-point score.
  • Atomicity of ZSET Operations: Operations on ZSET, such as ZADD, ZREM, ZRANK, ZRANGE, etc., are atomic. For example, when you add or remove elements from a sorted set, the entire operation is performed as a single, indivisible step.

This approach is proposed in the book Redis in Action by Josiah L. Carlson. ZSETs allow us to store multiple semaphore holders in a single structure. For each process that attempts to acquire the semaphore, we’ll generate a unique identifier. This identifier will be a member of a ZSET, and for the score we’ll use the timestamp of when the process attempted to acquire the semaphore. The entries in the ZSET are therefore ordered by this timestamp.

Figure: Semaphore structure with ZSET
Figure: Semaphore implementation with ZSET

Semaphore acquisition:

  • First, generate an identifier and add it to the ZSET with the current timestamp as its score.
  • Check the identifier’s rank in the ZSET.
  • If the rank is lower than the total number of permits, the caller has acquired the semaphore.
  • Otherwise, the caller does not hold the semaphore and must delete its identifier from the ZSET.

To handle timeouts, before adding an identifier to the ZSET, we first clear out any entries whose timestamps are older than our timeout value. To release the semaphore, simply remove the identifier from the ZSET.

This is a very easy and fast solution. However, in the case of multiple hosts with different system times, it can cause problems. For example, if we had two systems A and B, where A ran 10 milliseconds faster than B, then if A got the last semaphore, and B tried to get a semaphore within 10 milliseconds, B would actually “steal” A’s semaphore without A knowing it. This is called an unfair semaphore.

To minimize problems with inconsistent system times, we’ll add a counter and a second ZSET. The counter creates a mechanism that ensures that whoever increments the counter first should be the one to get the semaphore.

We make sure that the clients who are first in line to get the semaphore actually receive it. We do this by using a counter value as a score in the second ZSET. Then, we check where each client’s identifier ranks in this ZSET to decide which client gets the semaphore.

We continue to handle timeouts the same way as our basic semaphore, by removing entries from the system time ZSET. We propagate those timeouts to the new owner ZSET.

Building a distributed semaphore with BLPOP:

BLPOP

  • BLPOP is a blocking list pop operation. It removes and returns the first element of a list, or blocks until one is available.
  • Atomicity of BLPOP: When BLPOP is executed, it atomically pops an element from the list. If the list is empty, the client is blocked until another client performs an LPUSH or RPUSH operation on the list. The whole process is atomic, meaning no other Redis operation can interfere between the checking of the list's state and the popping of the element or the blocking of the client.

The basic idea is to use list elements as semaphore tokens: the presence of an element in the list represents an available permit, and the absence of elements means all permits are taken. This approach also serves clients more fairly, since clients blocked on the BLPOP command are served in FIFO order.

We can store randomly generated UUIDs as resources. Whenever a client takes a resource from the list, it appends the current timestamp to the resource and adds the result to a second, locked-resources list. This second list is necessary to prevent clients from holding the semaphore indefinitely. The timestamped token is returned to the client, and when the client is done, it releases the semaphore by passing that token to the unlock function. This lets us find the relevant token in the locked-resources list, strip the timestamp, and put the original UUID back in the resources list.

To handle timeouts, before a client attempts to take a resource from the list, it should check for expired resources by comparing the timestamps in the locked-resources list with the current time. Since expired tokens are removed from the locked-resources list, a client releasing an expired semaphore has no effect, because no matching token will be present. This ensures that the total number of resources always remains the same.

1. Semaphore Initialization

  • Create a List: Initialize a Redis list with a predefined number of elements, where each element represents a semaphore permit.
  • Create Locked Resources List: This list will be empty initially. New elements are added to this list at each lock operation.

2. Acquiring a Permit

  • To acquire a permit, clients use the BLPOP command on the semaphore list. BLPOP is a blocking operation that pops the first element of the list or blocks until an element is available.
  • Specify a timeout with BLPOP to prevent the client from waiting indefinitely.

3. Releasing a Permit

  • LPUSH/RPUSH Command: To release a permit, a client pushes an element back into the list using LPUSH or RPUSH.
  • Adding an element to the list will unblock one of the clients waiting on BLPOP, effectively passing the permit to that client.

Which method to choose?

The selection of an appropriate method for implementing distributed semaphores is contingent upon specific project requirements and circumstances. Overall, distributed caching systems often emerge as the preferable option, primarily due to their efficiency in comparison to database operations. While message queue solutions can be viewed as a viable alternative, they are often considered a makeshift response to the limitations of database systems.

However, the most critical factor in this decision-making process is the alignment with the existing infrastructure. Utilizing already deployed and familiar systems not only streamlines development but also optimizes resource utilization. In our specific context, our codebase is predominantly developed in Go, with Redis and PostgreSQL being integral components of our existing infrastructure. In light of this, we have chosen to adopt an approach based on Redis BLPOP, as demonstrated by an open-source Go project redis-semaphore-go. This decision was influenced by our existing technological ecosystem, ensuring that our solution is both efficient and cohesive with our current systems.

Conclusion:

In summary, this blog post has discussed how distributed semaphores are essential for managing shared resources in distributed systems. We explored three approaches: using databases with transactions, message queues, and distributed caching systems, each with its own benefits and challenges. The choice depends on the specific needs of the project and the existing technology infrastructure.

For our project, we chose Redis BLPOP for its efficiency and compatibility with our Go-based codebase. This decision highlights the importance of selecting a method that aligns with both the project requirements and the existing technological setup.
