Let’s Synchronize Threads in Python
Because synchrony is harmony
It was a magical “aha!” moment when I first learned about multithreading. The fact that I could ask my computer to do several things in parallel delighted me. (Strictly speaking, things don’t run in parallel on a single-core computer. More importantly, Python threads don’t truly execute in parallel because of CPython’s Global Interpreter Lock.) Multithreading opens new dimensions for computing. But with power comes responsibility.
There are obvious troubles one can imagine with multithreading: many threads trying to access the same piece of data can lead to problems, like making data inconsistent or getting garbled output (like having HWeolrldo in place of Hello World on your console). Such problems can arise when we don’t tell the computer how to manage threads in an organized manner.
But how can we “tell” the computer to keep the threads of our program in synchrony? We do so by using synchronization primitives. These are simple software mechanisms that ensure your threads run in a harmonious manner with each other.
This post presents some of the most popular synchronization primitives in Python, defined in its standard threading module. Most of the blocking methods of these primitives (i.e., the methods which block execution of a particular thread until some condition is met) accept an optional timeout, but I haven’t included it here for simplicity. Also, I’ve only included the principal functionalities of these objects, again for the sake of simplicity. This post assumes you have a basic knowledge of implementing multithreading using Python.
We’ll be learning about Locks, RLocks, Semaphores, Events, Conditions and Barriers. Of course, you can construct your own custom synchronization primitives by subclassing these classes. We’ll start with Locks as they are the simplest primitives, and gradually move on to more sophisticated ones.
Locks are perhaps the simplest synchronization primitives in Python. A Lock has only two states: locked and (surprise) unlocked. It is created in the unlocked state and has two principal methods, acquire() and release(). If the Lock is unlocked, the acquire() method locks it and returns True immediately. If the Lock is already locked, acquire() blocks execution until a release() call in some other thread sets it to unlocked; then it locks the Lock again and returns True. The release() method should only be called in the locked state; it sets the state to unlocked and returns immediately. If release() is called in the unlocked state, a RuntimeError is raised.
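The two states and the error case can be checked in a few lines. This is only a minimal single-threaded sketch; the variable names (acquired, state_while_held, error) are made up for illustration.

```python
from threading import Lock

lock = Lock()                       # created in the unlocked state

acquired = lock.acquire()           # locks the Lock and returns True
state_while_held = lock.locked()    # True while the Lock is held
lock.release()                      # back to unlocked, returns immediately

# Releasing a Lock that is already unlocked raises RuntimeError.
try:
    lock.release()
    error = None
except RuntimeError as exc:
    error = exc

print(acquired, state_while_held, type(error).__name__)
```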
Here’s the code which uses a Lock primitive for securely accessing a shared variable:
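A minimal sketch of such a program, assuming two hypothetical worker functions, add_one and add_two, that both update a global variable g from separate threads:

```python
from threading import Lock, Thread

lock = Lock()
g = 0  # shared variable modified by both threads


def add_one():
    """Add 1 to g while holding the lock."""
    global g
    lock.acquire()
    try:
        g += 1
    finally:
        lock.release()  # always release, even if the update fails


def add_two():
    """Add 2 to g while holding the lock."""
    global g
    lock.acquire()
    try:
        g += 2
    finally:
        lock.release()


threads = [Thread(target=func) for func in (add_one, add_two)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # wait for both updates to finish

print(g)
```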
This simply gives an output of 3, but now we are sure that the two functions are not changing the value of the global variable g simultaneously, although they run on two different threads. Thus, Locks can be used to avoid inconsistent output by allowing only one thread to modify data at a time.
The standard Lock does not know which thread is currently holding it. If the lock is held, any thread that attempts to acquire it will block, even if that same thread is the one already holding the lock.
In such cases, RLock (re-entrant lock) is used. You can extend the code in the following snippet by adding output statements to demonstrate how RLocks can prevent unwanted blocking.
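A minimal sketch of the difference, using hypothetical names (plain_lock, reentrant_lock, num). A second blocking acquire() on the plain Lock would hang this script forever, so the sketch probes it with a non-blocking call instead:

```python
import threading

num = 0

# Plain Lock: the holder cannot acquire it a second time.
plain_lock = threading.Lock()
plain_lock.acquire()
num += 1
# A blocking acquire() here would deadlock; blocking=False just reports failure.
second_try = plain_lock.acquire(blocking=False)
plain_lock.release()

# RLock: the owning thread may acquire it again without blocking.
reentrant_lock = threading.RLock()
reentrant_lock.acquire()
num += 3
reentrant_lock.acquire()  # does not block, the same thread already owns it
num += 4
reentrant_lock.release()
reentrant_lock.release()  # one release() per acquire()

print(second_try, num)
```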
One good use case for RLocks is recursion, where a parent call of a function would otherwise block its nested call. Thus, the main use for RLocks is nested access to shared resources.
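The recursion case might look like the following sketch. The function add_up_to and the shared variable total are hypothetical; the point is only that each nested call re-acquires a lock its parent call still holds.

```python
import threading

rlock = threading.RLock()
total = 0  # shared resource guarded by rlock


def add_up_to(n):
    """Recursively add n, n-1, ..., 1 to total under the same RLock."""
    global total
    with rlock:  # re-acquired on every nested call by the same thread
        if n == 0:
            return
        total += n
        add_up_to(n - 1)  # with a plain Lock this call would block forever


worker = threading.Thread(target=add_up_to, args=(4,))
worker.start()
worker.join()

print(total)
```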
Semaphores are simply advanced counters. An acquire() call to a semaphore will block only after a given number of threads have acquire()ed it. The associated counter decreases with each acquire() call and increases with each release() call. With a BoundedSemaphore, a ValueError is raised if a release() call tries to increment the counter beyond its assigned maximum value (which is the number of threads that can acquire() the semaphore before blocking occurs). The following code demonstrates the use of semaphores in a simple producer-consumer problem:
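A minimal sketch of one way to set this up, assuming the semaphore's counter stands for the number of items in a container that starts out full. The names (container, stats, MAX_ITEMS, N_LOOPS) and the short sleeps are illustrative choices, and the exact counts vary from run to run:

```python
import time
from threading import BoundedSemaphore, Thread

MAX_ITEMS = 3
N_LOOPS = 5
container = BoundedSemaphore(MAX_ITEMS)  # counter starts at MAX_ITEMS (full)
# Each key is written by only one of the two threads.
stats = {"produced": 0, "full": 0, "consumed": 0, "empty": 0}


def producer():
    for _ in range(N_LOOPS):
        time.sleep(0.01)
        try:
            container.release()  # add an item; ValueError if already full
            stats["produced"] += 1
        except ValueError:
            stats["full"] += 1  # container full, skip this round


def consumer():
    for _ in range(N_LOOPS):
        time.sleep(0.01)
        if container.acquire(blocking=False):  # take an item without waiting
            stats["consumed"] += 1
        else:
            stats["empty"] += 1  # container empty, skip this round


threads = [Thread(target=producer), Thread(target=consumer)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(stats)
```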
The threading module also provides the simple Semaphore class. A Semaphore provides an unbounded counter, which allows you to call release() any number of times to increment it. However, to avoid programming errors, it’s usually the safer choice to use BoundedSemaphore, which raises an error if a release() call tries to increase the counter beyond its maximum size.
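The difference between the two classes can be seen in a short single-threaded sketch (the variable names are made up for illustration):

```python
from threading import BoundedSemaphore, Semaphore

# Plain Semaphore: an extra release() silently raises the counter to 2.
plain = Semaphore(1)
plain.release()
first = plain.acquire(blocking=False)   # succeeds, counter 2 -> 1
second = plain.acquire(blocking=False)  # succeeds, counter 1 -> 0
third = plain.acquire(blocking=False)   # fails, counter is already 0

# BoundedSemaphore: the same extra release() is reported as an error.
bounded = BoundedSemaphore(1)
try:
    bounded.release()
    over_release_detected = False
except ValueError:
    over_release_detected = True

print(first, second, third, over_release_detected)
```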
Semaphores are typically used for limiting a resource, such as limiting a server to handle only 10 clients at a time. In such a case, multiple thread connections compete for a limited resource (in our example, it is the server).
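As a rough sketch of that idea, the following code lets eight hypothetical client threads compete for three slots. The names (handle_client, MAX_CLIENTS, peak) are assumptions, and the sleep merely stands in for real work:

```python
import threading
import time

MAX_CLIENTS = 3
slots = threading.BoundedSemaphore(MAX_CLIENTS)
state_lock = threading.Lock()  # protects the two counters below
active = 0  # clients being served right now
peak = 0    # highest value of active seen so far


def handle_client():
    global active, peak
    with slots:  # blocks while MAX_CLIENTS clients are already being served
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # pretend to serve the client
        with state_lock:
            active -= 1


clients = [threading.Thread(target=handle_client) for _ in range(8)]
for client in clients:
    client.start()
for client in clients:
    client.join()

print("peak concurrent clients:", peak)
```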
The Event synchronization primitive acts as a simple communicator between threads. It is based on an internal flag which threads can set() or clear(). Other threads can wait() for the internal flag to be set(). The wait() method blocks until the flag becomes true. The following snippet demonstrates how Events can be used to trigger actions.
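A minimal sketch, assuming one hypothetical waiter thread whose action is triggered when the main thread sets the flag:

```python
import threading

flag = threading.Event()
log = []  # only the waiter thread appends to this list


def waiter():
    log.append("waiting for the flag")
    flag.wait()  # blocks until the flag becomes true
    log.append("flag is set, running the action")


worker = threading.Thread(target=waiter)
worker.start()
flag.set()  # wakes up every thread blocked in wait()
worker.join()

was_set = flag.is_set()  # True at this point
flag.clear()             # reset the flag so the Event can be reused

print(log)
```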
A Condition object is simply a more advanced version of the Event object. It too acts as a communicator between threads and can be used to notify() other threads about a change in the state of the program. For example, it can be used to signal the availability of a resource for consumption. A thread must acquire() the condition before calling notify(), and other threads must also acquire() the condition (and thus its related lock) before wait()ing for the condition to be satisfied. Also, a thread should release() a Condition once it has completed the related actions, so that other threads can acquire the condition for their purposes. The following code demonstrates the implementation of another simple producer-consumer problem with the help of the Condition object:
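A minimal sketch of such a setup, assuming one producer, one consumer and a shared deque; all names here are illustrative. The with statement acquires the condition on entry and releases it on exit:

```python
import threading
from collections import deque

N_ITEMS = 5
condition = threading.Condition()
queue = deque()  # shared buffer guarded by the condition's lock
consumed = []


def producer():
    for item in range(N_ITEMS):
        with condition:          # acquire() the condition
            queue.append(item)
            condition.notify()   # signal that an item is available
        # leaving the with block release()s the condition


def consumer():
    for _ in range(N_ITEMS):
        with condition:
            while not queue:      # re-check the state after every wake-up
                condition.wait()  # releases the lock while waiting
            consumed.append(queue.popleft())


threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(consumed)
```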
There are other uses of Conditions. I think they will be useful when you need to develop a streaming API that notifies a waiting client when a piece of data is available.
A barrier is a simple synchronization primitive which can be used by different threads to wait for each other. Each thread tries to pass a barrier by calling the wait() method, which will block until all of the threads have made that call. As soon as that happens, the threads are released simultaneously. The following snippet demonstrates the use of Barriers:
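A minimal sketch with two hypothetical threads, server and client, that record when they reach and pass the barrier:

```python
import threading

barrier = threading.Barrier(2)  # released once two threads have called wait()
log = []  # list.append is thread-safe in CPython, so no extra lock is used


def server():
    log.append("server: initialized")
    barrier.wait()  # blocks until the client also reaches the barrier
    log.append("server: passed the barrier")


def client():
    log.append("client: ready")
    barrier.wait()  # blocks until the server also reaches the barrier
    log.append("client: passed the barrier")


threads = [threading.Thread(target=server), threading.Thread(target=client)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(log)
```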
Barriers can have many uses. One of them is synchronizing a server and a client, as the server has to wait for the client after initializing itself.
With that, we have reached the end of our discussion on synchronization primitives in Python. I wrote this post as a solution to an exercise in the book “Core Python Applications Programming” by Wesley Chun.
I’m new to blogging, so constructive criticism is not only welcomed, but very much wanted!