Parallel Programming in Python — Lesson 1. Introduction

Avner Ben
Jul 30 · 14 min read

This is the first in a series of lessons covering some of the facilities that the Python programming language offers for parallel programming, and the motivation for using each of them. The series concentrates on the use of parallelism as a software design idiom (as in real-time and system applications), intentionally avoiding technical uses of parallelism, such as performance optimization (which is widely covered elsewhere).

Sections in this lesson:

  1. The Motivation for Event-driven, Parallel and Concurrent Design
  2. An example: “Terminator UI”
  3. An example: “Timer” Thread

Software spends most of its time consuming input in order to produce output. Algorithms live somewhere in the middle (and give insight into the transformation; we have little use for algorithms that do nothing, leaving everything behind as they found it).

Not surprisingly, the prevailing programmatic model is of a processor that consumes a sequence of discrete inputs, one at a time, leaving behind a trail of outputs. Each time the processor consumes input, it consults its current state to decide what to do with it, possibly producing output, and possibly switching to a different internal state (which encodes what to do next, with this and other input). This programmatic model is rigorous, simple, easy to implement and reliable: it will always work exactly as expected. But it has one problem: it is based upon the assumption that input is blocking. (Precisely: processing it blocks everything else.) It is synchronous. Pending input is suspended until the processor is through with the current input. The processor does not handle two inputs at the same time (among other things, because it is governed by one internal state).
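This model can be sketched as a toy state machine (the states, inputs and outputs here are invented for illustration):

```python
# A toy synchronous processor: one discrete input at a time, one internal state.
def process(inputs):
    state = "idle"
    outputs = []  # the trail of outputs left behind
    for token in inputs:  # handling each input blocks the next
        if state == "idle" and token == "start":
            state = "running"  # switch internal state...
            outputs.append("started")  # ...and produce output
        elif state == "running" and token == "stop":
            state = "idle"
            outputs.append("stopped")
        # any other input is simply ignored in the current state
    return outputs

print(process(["start", "data", "stop"]))  # → ['started', 'stopped']
```

Note that the processor is entirely deterministic: given the same input sequence, it always produces the same outputs, because nothing else can happen while an input is being handled.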

Unfortunately, this model does not always agree with the real, chaotic world in which we happen to live, where many inputs coexist and do not necessarily block! Consequently, the job of designing programmatic solutions to real-world problems often involves an additional challenge: to find smart ways to come to terms with our legacy synchronous infrastructure. Precisely: how to implement the processing of non-blocking input, given a synchronous processing infrastructure.

The obvious solution, reached independently by many programmers and wearing various specific forms, is the discontinued process: to somehow divide the processing among so many synchronous partial processes. (And how to control each of these, in turn, is something that we already know.) The design challenge of identifying the synchronous process parts is often surprisingly easy. The genuine problem is how to coordinate them. Or precisely, how to synchronize them, since this is all about timing. It is important to note that synchronization is rarely about partial processes unfolding at the same time, but rather about controlling each partial process to proceed (and suspend) at exactly the right timing, contributing to unfold the exact use-case scenarios we have in mind.

Once the relationship between inputs and outputs (and the transformations involving them) has been analyzed, we are left with the design challenge of orchestrating so many partial synchronous procedures (typically, programmatic functions), each responsible for producing some output and/or consuming some input, with parallelism being the exception rather than the rule! (Because it introduces complexity, and that is to be avoided unless strictly necessary!) In the trivial case, the design consists of so many challenges of the form: how to couple the input of function B with the output of function A? With luck, this challenge is solved by the introduction of a function C that invokes A and then B (or by calling B at the end of A, or calling A at the start of B), with the I/O somehow channeled between them. Where this straightforward, synchronous solution works, then by all means take it: no parallelism needed! Parallel programming devices are required when, for various reasons, this will not do. To find that out, we must analyze the complexity of the relationship between producers (of output) and consumers (of input). But there is no need to rush. Often, what may appear to the naked eye to invite parallelism may turn out to be solved trivially in a synchronous process. Consider, for example, the “game loop” challenge.

A “first person shooter” video game presents a three-dimensional maze, in which so many monsters are progressing, each at its own pace, and they are shot at by the user (whose back we see), also progressing at his own pace, with projectiles going all over (also at their own pace). In addition, these moving objects can collide, and with much ceremony. One would expect the design to consist of dozens of threads (one for each monster, projectile and the user), but this is rarely necessary. It would be, had the main design challenge been how to coordinate the movement of so many moving objects. But this is a relatively technical issue, involving conventional geometry and physics, that we know how to solve procedurally. This would still be the case had the problem domain been linear, but it is not! With current computer video technology, this problem domain is digitized in quanta of frames (say, 20 per second or more). In this problem domain, the moving elements are not required to output motion, but rather the exact opposite: still frames (precisely, their part of the general frame, and in the correct covering order). So, the problem is reduced to calculating the position of each moving element by the time of the next frame (“game tick”), detecting collisions, solving them and accumulating the next frame in the correct covering order (and on time). (In a more sophisticated solution, the moving objects accumulate their data into a three-dimensional database, e.g. the “opengl” library, and the frame is rendered from there.) So the design challenge is reduced to so many producers (emitting partial frames) that are blocked by a single consumer (accumulating the next frame). And this bottleneck topology dictates the design. Luckily, in this problem domain, the inputs are flexible (their preparation is completely determined by the game tick), so, in the absence of good reason otherwise, parallelism is not needed. All frame-contributing functions may be invoked sequentially from a game-loop function, of course accounting for the added complexity of collisions and their consequences.
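Such a sequential game loop might be sketched as follows (the frame representation, the contributor functions and the tick rate are all invented for illustration):

```python
import time

def render(frame):
    # stand-in for actual video output: one text line per frame
    print(" | ".join(frame))

def monster(tick):  # a frame contributor: emits its part of the next still frame
    return f"monster at x={tick * 2}"

def projectile(tick):  # another contributor, moving at its own pace
    return f"projectile at x={tick * 10}"

def gameLoop(contributors, ticks, tickSeconds=0.05):
    for tick in range(ticks):
        frame = []  # the next still frame, accumulated in covering order
        for contributor in contributors:  # invoked sequentially: no parallelism
            frame.append(contributor(tick))
        render(frame)
        time.sleep(tickSeconds)  # wait for the next game tick

if __name__ == "__main__":
    gameLoop([monster, projectile], ticks=3)
```

The single consumer (the loop accumulating and rendering the frame) blocks all the producers by construction, which is exactly the bottleneck topology described above.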

Indeed, many discontinued processes do require an event-driven design, but not parallelism. The archetypal example is GUI (Graphic User Interface — i.e. windowing) systems, where the user is allowed to manipulate the screen in any order, but the software that responds to these events (obviously, consisting of so many independent event-handler functions) lives in a single thread. In this problem domain, the software is not required to respond to any two events at once! (While such an alternative architecture could, arguably, improve user experience, it is not worth the expense! Window-system users are used to having some of their input ignored. They will try again…)

Event-driven design is a prerequisite for parallel design, for at least the following reasons:

  • It shares a design pattern with parallelism: the callback. This involves the paradigm shift of treating a “function” (a sequence of capabilities to be fulfilled) as an object, which may be referenced by a variable, passed as a parameter (i.e. sent to another function for deferred invocation), or returned from a function (the result of another function invocation). One registers event-handler functions at a central dispatcher, which runs the event loop where, in each cycle, the next event is discovered (if any; e.g. mouse press in the GUI loop, game tick in the game loop) and the appropriate handler function for the event (if indeed registered) is invoked, usually blocking.
  • Often, careful analysis of a problem that seems to suggest parallelism turns out to call for event-driven design (in a single thread). The obvious examples are the GUI message pump and the “game loop” discussed above. (In the latter, the game loop may be implemented by registration of frame-contributor functions. This “open” design allows adding and removing moving-object types in each release, as well as adding and removing moving-object instances during run time, without having to rewrite switch/case constructs or maintain the burden of complex logic in the wrong place!)
  • In addition, dispatch mechanisms may be used to implement a “soft” version of (as if) parallelism, where the latter is not really needed but its interface is! See the Python async facility, to be discussed in another lesson.

Let us make sure that we know what we are talking about, with a brief reminder of event-driven design, using registration of callbacks.

Here is a simple “scrolling menu” example (written in Python 3.9, comments below):
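One possible rendering, consistent with the numbered notes that follow (the specific menu functions and their names are assumptions):

```python
from typing import Callable, List

def doMenu(options: List[Callable[[], None]]) -> None:  # 1. a list of callables
    while True:  # 2. the event loop repeats virtually forever
        for number, option in enumerate(options, 1):
            print(f"{number}. {option.__doc__}")  # 3. option title: the doc string
        print("99. exit")  # 4. the exit option gets procedural treatment
        try:
            selection = int(input())  # 5. the next event: a number from the keyboard
        except ValueError:
            continue  # 6. invalid (non-numeric) input is safely disposed of
        if selection == 99:
            break  # 7. the exit event terminates the loop
        if not 1 <= selection <= len(options):
            continue  # 8. out-of-range input is safely disposed of
        callback = options[selection - 1]  # 9. retrieve the selected callback
        callback()  # 10. invoke it; any callable object will do
        # 11. without the explicit variable: options[selection - 1]()

def printDocument():  # 12. a menu function; nothing marks it as a callback
    "Print"
    print("Print logic to come here...")

def editDocument():
    "Edit"
    print("Edit logic to come here...")

def eraseDocument():
    "Erase"
    print("Erase logic to come here...")

if __name__ == "__main__":
    doMenu([printDocument, editDocument, eraseDocument])  # 13. functions as objects
```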

Notes (corresponding to numbered comments in code above):

  1. The doMenu function takes a list of functions (“callable” objects), each taking no arguments (indicated by the empty square brackets) and returning nothing.
  2. An event loop repeats virtually forever (unless stopped from inside, by the appropriate closing event or an exception).
  3. The menu takes the option titles from the provided functions — their doc strings. A function, like any object, features member functions and attributes. The doc string (the triple-quoted string at the top of the function body) is kept as an attribute of the function object.
  4. The exit option gets procedural treatment. It is not implemented by a callback (though such alternative design is possible and legitimate), because its number (99) is intentionally out of bounds. (So, extra logic is required, one way or the other).
  5. The dispatcher attempts to detect (and, in this case, actively obtain) the next event: precisely: a number, typed at the keyboard.
  6. Invalid input (non-numeric) is identified and safely disposed.
  7. The exit event (99) gets procedural treatment. The loop terminates.
  8. More invalid input (numeric, but out of range) is identified and safely disposed.
  9. The callback indexed by the numeric selection is retrieved into the variable “callback”.
  10. The callback is called. We do not have to know that the “called” object is actually a function. Any object that responds to the function call (round brackets) operator will do!
  11. The explicit variable is there just for convenience. The same code may be expressed directly (giving some headache to the uninitiated): options[selection - 1]()
  12. A menu function. It is a normal Python function. Nothing indicates that it is going to be used as someone’s callback.
  13. Functions passed as arguments (collected in a list) to the menu function. In Python, there is no need to take the address of the function or “wrap” it in any way. The function, which was declared in module scope, is registered in the module’s dictionary (obtained by the built-in “globals”) as an object by that name. (Only the name is registered; no argument types. Python does not support function overloading.)


1. Print 
2. Edit
3. Erase
99. exit
Print logic to come here...

And now, to a problem that genuinely requires parallel design: the simple Stop-compile UI. In this case, interrupting another process (which may be doing something at the very same time) is the name of the game!

Problem: The user shall be capable of stopping the compile process, within acceptable delay, via the UI. The UI (User Interface) is displayed by the compiler, allowing the user to abort the compilation. Note that the compiler’s logic is given. We may have to make some adjustments to compiler code, but we do not intend to rewrite the compiler to suit the UI! So, we have two functional requirements (compilation, and key-press leading to abort) that seem pretty simultaneous. Are they indeed?

Proposed solution: The user is presented with the prompt “press any key to stop…”. On the other end, the compiler is provided with an “abort” flag. Pressing any key signals (turns on) the abort flag. (While the correct way to do this is through a GUI, displaying a button for the user to push, the present solution is purposely downgraded, to simplify the code example.)

Obviously, the “stop compile UI” problem calls for a real parallel solution, because it involves a producer (terminator) and consumer (compiler) progressing at different resolutions! The terminator responds to its input (key press) immediately. But the compiler responds to its relevant input (abort signaled) in its own time, which is completely out of the control of the terminator. Obviously, a procedural solution will not do: the terminator cannot control the compiler, the compiler cannot control the terminator, and both cannot be controlled from above! In other words, this problem involves input that must not block. It is asynchronous.

The following use-case scenario must unfold:

The required use-case scenario


  1. Exactly how to do this (listen) is to be defined. (And what is the problem?)
  2. Non-blocking output. The Terminator writes the stop-flag and does not stop to wait for anyone to read it.
  3. Non-blocking input. The compiler consults the stop-flag, if it is there, without stopping to wait for anyone to write it.

Of course, we cannot implement this design literally using procedural code! The first step (compiler initialization) always occurs at the start of the use case. The fourth step (compiler finalization) always comes in the end, and the third step (compiler-functions execution loop) in the middle. The second step (terminator initialization) may come either before or after compiler initialization (but before the compiler-function loop). But we have a problem positioning the main function of the second part: signaling abort may occur at any time within the third step. (Actually, it may also occur before the end of compiler initialization, and during finalization. Doing that would do the user no good, but the program must be ready to cope with it.) This analysis leaves no choice but to deploy the two functions to separate threads of control!

So, here is a parallel design: It features the same required functions, but injects synchronization. Note that it makes the somewhat counter-intuitive decision to invoke the terminator in the main thread and launch the compiler in a side thread. This design decision acknowledges a limitation imposed by Python’s implementation of standard I/O that favors the main thread.

A parallel design


  1. The compiler is launched in a parallel thread of control.
  2. The requirement to poll the stop-flag is implicit.
  3. The compiler actually signals the terminator to stop listening to the keyboard. (The compiler is already stopping.)
  4. The Terminator blocks the main program.
  5. The iteration is required to enable the Terminator itself to be terminated prematurely (e.g. by the compiler ending normally). The other option (to sit down and wait for the key to be pressed, signal abort and go home) would freeze the program in the case of normal termination.
Here is a Python implementation (discussion follows):
  1. IDEs (such as IDLE, PyCharm, Wing, etc.) replace the standard input and output with their own console, which is unlikely to behave as expected here. Even if you use an IDE to save this code, you must run it from the operating system console (or double-click its entry in the file explorer)!
  2. The program imports the threading library
  3. Python does not support non-blocking keyboard input out of the box, so we resort to the msvcrt module (Microsoft C runtime), available in the standard library of the MS-Windows implementation of Python. (The UNIX alternative is somewhat more complex, involving select, so we skip it for convenience.)
  4. The compiler abort flag is implemented, in this simple example, as a global variable. (In a more robust design, one would expect it to be encapsulated in the Compiler object and accessed through a getter).
  5. The terminator lives in the terminate function. It iterates on polling the keyboard for input, in 1 second resolution, querying kbhit (is the keyboard hit?).
  6. The terminator prepares for the possibility that the compiler is no longer active (having already signaled the abort flag on its own, before leaving).
  7. Its job done, the terminator signals abort and returns, releasing the main program to end.
  8. But before departing, the terminator “eats” the character pressed by the user (to prevent it from being echoed unexpectedly when the user finally presses enter).
  9. The do-nothing function represents the various compiler activities, for the sake of this simple example. It outputs a progress indication every second, five times in a row, unless it finds out that abort has been signaled, which makes it stop and return prematurely.
  10. The dot output is flushed to the screen. (By default, the standard output is line-buffered: the dots would be issued all at once, only when the newline finally arrives.) This is why the standard output is used directly, rather than through the built-in print.
  11. Our dummy compiler goes through a sequence of six activities, fulfilled by the do-nothing function.
  12. The compiler iterates through its sequence of activities, guarded by the abort flag.
  13. Compiler functions done, and in the lack of user intervention, the compiler itself signals the abort flag, to release the terminator from blocking the program.
  14. Context-aware prompts are issued from the main program. The compiler knows about the abort flag, which is enough. What makes it tick (key press, in the present case) is beyond the compiler’s scope (and susceptible to change!). It is in the scope of the main program, which ties all ends together.
  15. A Python thread takes a function and, optionally, arguments to pass to it. The thread is as yet inactive (the function is yet to be invoked). The thread function is invoked when the thread object is explicitly requested to start.
  16. Once the compiler is allowed to proceed on its own, the terminator is invoked, blocking the main thread. It will return when the compiler is done, either normally or by user abort (which does not concern the main program). Note that starting the compiler thread does not guarantee that the thread function has already been invoked when the terminator starts. But this does not matter (in this case). The user may press the keyboard even before the compiler exists. Still, the terminator will do its job (giving the compiler a short life indeed, when it finally starts).
  17. Finally, proper keyboard input (enter-terminated) is requested, for a number of reasons: (1) to keep the terminal window from closing prematurely in some runtime and debug configurations, and (2) to give the compiler thread time to exit gracefully.


Press any key to abort compilation! 
Dummy Compiler v1.0
building memory database.....
checking dependencies.....
generating code.....
Compilation successful!
Press [enter] to close
Press any key to abort compilation!
Dummy Compiler v1.0
Aborted by user!
Press [enter] to close

Deploying the compiler to a minor thread and the terminator to the main thread has functional justification (the terminator buffers the application from the compiler, by blocking it until the latter terminates, one way or another). Still, it is rather counter-intuitive! One would expect the compiler, playing the main role, to occupy the main thread, and the terminator, playing a minor and disposable role, to be exiled to a side thread. Alas, the implementation platform stands in our way! But nothing stands in the way of the simple timer, which we are going to use as a more intuitive minor-thread example.

  • Problem: A “Timer” implements the requirement to do something after a given time interval (or at a given absolute time), regardless of what the requester is doing in the meantime.
  • Proposed solution: The obvious programmatic implementation is a thread taking a “callback” and interval. The thread’s job is rather straightforward: it sleeps the given interval, invokes the callback and expires.

Here is a Python implementation (discussion follows):


Oops, woke up!
  1. In this simple example, the flag to signal is, again, global.
  2. The timer thread waits for the specified time and invokes the callback. (Note that this optimistic design does not allow cancelling the timer prematurely. A more realistic implementation would iterate on sleeping for a quantum of time, polling the exit condition.)
  3. All the callback does is to signal the exit condition.
  4. The Sleeper turns on the alarm clock before going to sleep (naturally).
  5. The Sleeper iterates on snoring until signaled to wake up.

Admittedly, from a programmatic point of view, this algorithm could as well be written without a timer, by simply iterating on snoring and computing how much time is left. However, this makes less sense in the problem domain. You cannot require the Sleeper to consult his/her watch and make computations every second while asleep. The impulse to wake up must come from the outside! (But it is correct to expect the Sleeper to snore while sleeping…) The current design is more realistic, and therefore extensible, scalable, etc.
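As note 2 above suggests, a more realistic timer would iterate on sleeping a quantum of time, polling a cancel condition. A minimal sketch of that variant (the class and parameter names are assumptions; Python’s standard library also offers a ready-made threading.Timer with a cancel method):

```python
import threading
import time

class CancellableTimer:
    # iterates on sleeping a quantum of time, polling the cancel condition
    def __init__(self, interval, callback, quantum=0.1):
        self.cancelled = False
        self.thread = threading.Thread(
            target=self._run, args=(interval, callback, quantum))

    def start(self):
        self.thread.start()

    def cancel(self):
        self.cancelled = True  # noticed within one quantum, at most

    def _run(self, interval, callback, quantum):
        deadline = time.monotonic() + interval
        while time.monotonic() < deadline:
            if self.cancelled:
                return  # expire without invoking the callback
            time.sleep(quantum)
        callback()
```

The quantum trades responsiveness to cancellation against the overhead of waking up to poll.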

In the next lesson, we are going to study the Python thread in detail. Next, we shall meet the classical Producer/Consumer example and iterate on implementing it using various Python facilities, from “primitive” to advanced.

Lessons in this course:

  1. Introduction (you are here!)
  2. The Thread
  3. Synchronization Primitives (Multi-threading)
  4. Synchronization Primitives (Multi-processing)
  5. Cooperative Processing — synchronous
  6. Cooperative Processing — asynchronous

