Published in CodeX

Parallel Programming in Python — Lesson 1. Introduction

This is the first in a series of lessons, covering some facilities that the Python programming language offers for parallel programming and the motivation for using each of them. The series concentrates on the use of parallelism as a software design idiom, (as in real-time and system applications), intentionally avoiding technical usage of parallelism, such as performance optimization (which is widely covered elsewhere).

Sections in this lesson:

1. The Motivation for Event-driven, Parallel and Concurrent Design
2. The “Terminator UI” Example
3. Timer Thread

1. The Motivation for Event-driven, Parallel and Concurrent Design

Software spends most of its time consuming input in order to produce output. Algorithms live somewhere in the middle, giving insight into the transformation. (We have little use for algorithms that do nothing, leaving everything as they found it.)

Not surprisingly, the prevailing programmatic model is that of a processor that consumes a sequence of discrete inputs, one at a time, leaving behind a trail of outputs. Each time the processor consumes an input, it consults its current state to decide what to do with it, possibly producing output, and possibly switching to a different internal state (which embodies the knowledge of what to do next, with this and other input). This programmatic model is rigorous, simple, easy to implement and reliable: it will always work exactly as expected. But it has one problem: it is based upon the assumption that input is blocking. (Precisely: processing an input blocks everything else.) It is synchronous. Pending input is suspended until the processor is through with the current input. The processor does not handle two inputs at the same time (among other things, because it is governed by a single internal state).

Unfortunately, this model does not always agree with the real, chaotic world in which we happen to live, where many inputs coexist and do not necessarily block! Consequently, the job of designing programmatic solutions to real world problems often involves an additional challenge: to find smart ways to come to terms with our legacy synchronous infrastructure. Precisely: How to implement the processing of non-blocking input, given a synchronous processing infrastructure.

The obvious solution, reached independently by many programmers and wearing various specific forms, is the discontinued process: to somehow divide the processing among several synchronous partial processes. (And how to control each of these, in turn, is something we already know.) The design challenge of identifying the synchronous process parts is often surprisingly easy. The genuine problem is how to coordinate them. Or, precisely, how to synchronize them, since this is all about timing. It is important to note that synchronization is rarely about partial processes unfolding at the same time, but rather about controlling each partial process to proceed (and suspend) at exactly the right moment, so as to unfold exactly the use-case scenarios we have in mind.

Once the relationship between inputs and outputs (and the transformations involving them) has been analyzed, we are left with the design challenge of orchestrating several partial synchronous procedures (typically, programmatic functions), each responsible for producing some output and/or consuming some input, with parallelism being the exception rather than the rule! (Because it introduces complexity, and that is to be avoided unless strictly necessary!) In the trivial case, the design consists of several challenges of the form: how to couple the input of function B with the output of function A? With luck, this challenge is solved by introducing a function C that invokes A and then B (or by calling B at the end of A, or calling A at the start of B), with the I/O somehow channeled between them.

Where this straightforward, synchronous solution works, then by all means take it; no parallelism needed! Parallel programming devices are required when, for various reasons, this will not do. To find that out, we must analyze the complexity of the relationship between producers (of output) and consumers (of input). But there is no need to rush. Often, what may appear to the naked eye to invite parallelism may turn out to be solved trivially by a synchronous process. Consider, for example, the “game loop” challenge.

A “first-person shooter” video game presents a three-dimensional maze in which many monsters are progressing, each at its own pace. They are shot at by the user (whose back we see), also progressing at his own pace, with projectiles flying all over (again, at their own pace). In addition, these moving objects can collide, with much ceremony. One would expect the design to consist of dozens of threads (one for each monster, projectile and the user), but this is rarely necessary. It would be, had the main design challenge been how to coordinate the movement of so many moving objects. But this is a relatively technical issue, involving conventional geometry and physics, which we know how to solve procedurally. This would still be the case had the problem domain been linear, but it is not! With current computer video technology, this problem domain is digitized in quanta of frames (say, 20 per second or more).

In this problem domain, the moving elements are not required to output motion, but rather the exact opposite: still frames (precisely, their part of the overall frame, in the correct covering order). So, the problem is reduced to calculating the position of each moving element by the time of the next frame (“game tick”), detecting collisions, resolving them and accumulating the next frame in the correct covering order (and on time). (In a more sophisticated solution, the moving objects accumulate their data into a three-dimensional database, e.g. the “OpenGL” library, and the frame is rendered from there.) So the design challenge is reduced to many producers (emitting partial frames) that are blocked by a single consumer (accumulating the next frame). And this bottleneck topology dictates the design. Luckily, in this problem domain, the inputs are flexible (their preparation is completely determined by the game tick), so, in the absence of good reason otherwise, parallelism is not needed. All frame-contributing functions may be invoked sequentially from a game-loop function, of course accounting for the added complexity of collisions and their consequences.
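To make the point concrete, here is a minimal sketch of such a sequential game loop (the class, names and frame rate are illustrative assumptions, not taken from any actual game; collision detection and covering order are omitted):

```python
import time

FRAME_RATE = 20  # game ticks per second

class Projectile:
    """A moving object: a producer of partial frames."""

    def __init__(self, position, speed):
        self.position = position
        self.speed = speed

    def advance(self, dt):
        # Calculate the position by the time of the next game tick.
        self.position += self.speed * dt

    def draw(self, frame):
        # Emit a still partial frame - not motion!
        frame.append(f"projectile at {self.position:.2f}")

def game_loop(objects, ticks, render=print):
    """Invoke all frame-contributing producers sequentially,
    accumulating the next frame (the single consumer), once per tick."""
    dt = 1.0 / FRAME_RATE
    for _ in range(ticks):
        frame = []
        for obj in objects:   # every producer, invoked sequentially
            obj.advance(dt)
            obj.draw(frame)
        render("; ".join(frame))
        time.sleep(dt)        # wait for the next game tick

if __name__ == "__main__":
    game_loop([Projectile(0.0, 1.0), Projectile(5.0, -1.0)], ticks=3)
```

Note that the loop owns the timing: the moving objects are passive producers, invoked once per tick, and the single frame they fill is the bottleneck consumer described above.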

Indeed, many discontinued processes do require an event-driven design, but not parallelism. The archetypal example is GUI (Graphical User Interface, i.e. windowing) systems, where the user is allowed to manipulate the screen in any order, but the software that responds to these events (obviously, consisting of many independent event-handler functions) lives in a single thread. In this problem domain, the software is not required to respond to any two events at once! (While such an alternative architecture could, arguably, improve user experience, it is not worth the expense! Window-system users are used to having some of their input ignored. They will try again…)

Event-driven design is a prerequisite for parallel design.

Let us make sure that we know what we are talking about, with a brief reminder of event-driven design, using registration of callbacks.

Here is a simple “scrolling menu” example (written in Python 3.9, comments below):
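The sketch below reconstructs such a menu using registration of callbacks, consistent with the output that follows (the handler names and their stub bodies are illustrative; only the “Print” handler is exercised in the transcript):

```python
# The callbacks: one independent handler function per menu entry.
def print_doc():
    print("Print logic to come here...")

def edit_doc():
    print("Edit logic to come here...")

def erase_doc():
    print("Erase logic to come here...")

# The event table: each menu key is registered with a title and a callback.
handlers = {
    "1": ("Print", print_doc),
    "2": ("Edit", edit_doc),
    "3": ("Erase", erase_doc),
}

EXIT_KEY = "99"

def menu_loop():
    """Display the menu, then dispatch each selection to its
    registered callback, until the exit key is entered."""
    for key, (title, _) in handlers.items():
        print(f"{key}. {title}")
    print(f"{EXIT_KEY}. exit")
    while True:
        selection = input("select: ")
        if selection == EXIT_KEY:
            break
        entry = handlers.get(selection)
        if entry:
            entry[1]()  # fire the registered callback

if __name__ == "__main__":
    menu_loop()
```

The point of the design is that the menu loop knows nothing about printing, editing or erasing; it only fires whatever callback was registered for the key it received.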


Output:

1. Print 
2. Edit
3. Erase
99. exit
select:
1
Print logic to come here...
99

2. The “Terminator UI” Example

And now, to a problem that genuinely requires parallel design: the simple Stop-compile UI. In this case, interrupting another process (which may be doing something at the very same time) is the name of the game!

Problem: The user shall be able to stop the compilation process, within an acceptable delay, via the UI. The UI is displayed by the compiler, allowing the user to abort the compilation. Note that the compiler’s logic is given. We may have to make some adjustments to the compiler code, but we do not intend to rewrite the compiler to suit the UI! So, we have two functional requirements (compilation, and a key press leading to abort) that seem pretty simultaneous. Are they indeed?

Proposed solution: The user is presented with the prompt “press any key to stop…”. On the other end, the compiler is provided with an “abort” flag. Pressing any key signals (turns on) the abort flag. (While the correct way to do this is through a GUI, displaying a button for the user to push, the present solution is purposely downgraded, to simplify the code example.)

Obviously, the “stop compile UI” problem calls for a genuinely parallel solution, because it involves a producer (the Terminator) and a consumer (the compiler) progressing at different resolutions! The Terminator responds to its input (a key press) immediately. But the compiler responds to its relevant input (abort signaled) in its own time, which is completely out of the Terminator’s control. Obviously, a procedural solution will not do: the Terminator cannot control the compiler, the compiler cannot control the Terminator, and both cannot be controlled from above! In other words, this problem involves input that must not block. It is asynchronous.

The following use-case scenario must unfold:

The required use-case scenario

1. The compiler initializes.
2. The Terminator initializes, prompting the user to press any key to abort.
3. The compiler executes its functions in a loop; at any time during this step, the Terminator may signal abort, upon which the compiler stops.
4. The compiler finalizes.


Of course, we cannot implement this design literally using procedural code! The first step (compiler initialization) always occurs at the start of the use case. The fourth step (compiler finalization) always comes at the end, and the third step (the compiler-function execution loop) in the middle. The second step (Terminator initialization) may come either before or after compiler initialization (but before the compiler-function loop). But we have a problem positioning the Terminator’s main function: signaling abort may occur at any time within the third step. (Actually, it may also occur before the end of compiler initialization and during finalization. Doing that would do the user no good, but the program must be ready to cope with it.) This analysis leaves no choice but to deploy the two functions to separate threads of control!

So, here is a parallel design: It features the same required functions, but injects synchronization. Note that it makes the somewhat counter-intuitive decision to invoke the terminator in the main thread and launch the compiler in a side thread. This design decision acknowledges a limitation imposed by Python’s implementation of standard I/O that favors the main thread.

A parallel design
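Here is a sketch of that design, consistent with the output below (the phase names, timings and number of work units per phase are illustrative assumptions; the abort flag is realized as a `threading.Event`):

```python
import threading
import time

abort = threading.Event()  # the "abort" flag, shared with the compiler

PHASES = ["initializing", "parsing", "building memory database",
          "checking dependencies", "generating code", "finalizing"]

def compile_project():
    """Dummy compiler (side thread): executes its phases in sequence,
    polling the abort flag between units of work. This polling is the
    only adjustment made to the given compiler logic."""
    print("Dummy Compiler v1.0")
    for phase in PHASES:
        print(phase, end="", flush=True)
        for _ in range(5):         # each dot stands for one unit of work
            if abort.is_set():
                print("\nAborted by user!")
                return
            time.sleep(0.2)
            print(".", end="", flush=True)
        print()
    print("Compilation successful!")

def terminate_ui():
    """Terminator (main thread): blocks on standard input; any key
    (followed by [enter]) signals the abort flag."""
    compiler = threading.Thread(target=compile_project)
    print("Press any key to abort compilation!")
    compiler.start()
    input()        # returns upon key press
    abort.set()    # signal abort (harmless if compilation already ended)
    compiler.join()

if __name__ == "__main__":
    terminate_ui()
    input("Press [enter] to close")
```

Note that the compiler never learns who aborted it: it merely polls the shared flag at its own resolution, which is exactly the decoupling the analysis above demands.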


Output:

Press any key to abort compilation! 
Dummy Compiler v1.0
initializing.....
parsing.....
building memory database.....
checking dependencies.....
generating code.....
finalizing.....
Compilation successful!
Press [enter] to close
Press any key to abort compilation!
Dummy Compiler v1.0
initializing.....
parsing....
Aborted by user!
Press [enter] to close

3. Timer Thread

Deploying the compiler to a minor thread and the Terminator to the main thread has functional justification (the Terminator buffers the application from the compiler, by blocking it until the latter terminates, one way or another). Still, it is rather counter-intuitive! One would expect the compiler, playing the main role, to occupy the main thread, and the Terminator, playing a side and disposable role, to be exiled to a side thread. Alas, the implementation platform stands in our way! But nothing stands in the way of the simple timer, which we are going to use as a more intuitive example of a minor thread.

Here is a Python implementation (discussion follows):
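The sketch below is a minimal timer-based Sleeper, consistent with the output that follows (the ten-second duration and one-second snore interval are assumptions made to match the transcript):

```python
import threading

def sleep_and_snore(duration=10.0, snore_interval=1.0):
    """The Sleeper snores once per interval, until the timer thread
    fires and signals the wake-up event - from the outside."""
    awake = threading.Event()
    timer = threading.Timer(duration, awake.set)  # the minor thread
    timer.start()
    # Event.wait() returns False on timeout (keep snoring), True once set.
    while not awake.wait(snore_interval):
        print("Snoring...")
    print("Oops, woke up!")

if __name__ == "__main__":
    sleep_and_snore()
```

`threading.Timer` is itself a thread that waits out the duration and then invokes the callback; the Sleeper merely reacts to the event it signals.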

Output:

Snoring... 
Snoring...
Snoring...
Snoring...
Snoring...
Snoring...
Snoring...
Snoring...
Snoring...
Snoring...
Oops, woke up!

Admittedly, from a programmatic point of view, this algorithm could just as well be written without a timer, by simply iterating on snoring and computing how much time is left. However, this makes less sense in the problem domain. You cannot require the Sleeper to consult his/her watch and make computations every second while sleeping. The impulse to wake up must come from the outside! (But it is correct to expect the Sleeper to snore while sleeping…) The current design is more realistic and, therefore, extensible, scalable, etc.

In the next lesson, we are going to study the Python thread in detail. Next, we shall meet the classical Producer/Consumer example and iterate on implementing it using various Python facilities, from “primitive” to advanced.

Avner Ben

Born 1951. Active since 1983 as programmer, instructor, mentor in object-oriented design/programming in C++, Python etc. Author of DL/0 design language