Python for Youngsters

(And Anyone Else Who Wants to Learn Programming)

Gregory Terzian
Python for Youngsters
Mar 23, 2023


Part 3: Processes and Concurrency

Introduction to the Third Part

Up to now, we have run code in the interactive shell. From now on, we will write our code in a file of source code and run it as a separate Python process.

3.1 A Python Process

If you are still in the Python interactive shell, now is the time to exit it by typing the following and pressing enter.

exit()

You are now back in your terminal shell. Type the following command, and press enter.

touch main.py

This creates a file named main.py, in the current directory.

Now, open the file with any text editor, copy the code below, paste it into the file, and save the changes.

class WordCombiner:
    def __init__(self, spacing, final_punctuation):
        self.final_punctuation = final_punctuation
        self.spacing = spacing
        self.words = []

    def add_word(self, word):
        self.words.append(word)

    def combine_words(self):
        combined_words = self.spacing.join(self.words)
        return combined_words + self.final_punctuation

combiner = WordCombiner(", ", ".")
combiner.add_word("one")
combiner.add_word("two")
combiner.add_word("three")
print(combiner.combine_words())

This is the familiar code of our word combiner. To execute it, type python main.py in the terminal and press return.

Python will start a process, run the code in the file, and finally quit the process, returning control to the terminal window. Before the Python process exits, you should see “one, two, three.” printed in the terminal.

This works fine, but we can improve it by organizing our Python files as we would in a real project.

Let’s create a second file, named word_combiner.py. From the terminal, type the following, and press return.

touch word_combiner.py 

You can view the files in the current directory with the following command.

ls

If you type it and hit return, you should see two files: main.py, and word_combiner.py.

First, cut all the code from main.py and paste it into word_combiner.py, leaving main.py empty. Then, add the code below to main.py.

from word_combiner import WordCombiner

combiner = WordCombiner(", ", ".")
combiner.add_word("one")
combiner.add_word("two")
combiner.add_word("three")
print(combiner.combine_words())

Here, we import the WordCombiner class into our main.py file. Another name for a file of Python source code is a module. Our project now has two modules, and one imports code from the other.

You can now run this code as before, by giving the following terminal command.

python main.py

You should see the following printed.

one, two, three.
one, two, three.

Why do we see this printed twice? Because when you import code from one module into another, all the code in the module you import from is run. Here, we only want to import the WordCombiner class; we do not actually want to run the code in word_combiner.py that creates an instance and combines words. There is a way to prevent that code from running. In word_combiner.py, replace the code below the WordCombiner class with the following.

if __name__ == "__main__":
    combiner = WordCombiner(", ", ".")
    combiner.add_word("one")
    combiner.add_word("two")
    combiner.add_word("three")
    print(combiner.combine_words())

And now, run the following command again.

python main.py

You should see “one, two, three.” printed only once.

We have introduced a conditional in our code: the if statement, which only allows the indented code block below it to run if the condition is true. Here, it is a way of saying: only run this code if the __name__ attribute of the module equals the string “__main__”.

For example, if you run the word_combiner.py module directly, the code inside the conditional will run.

python word_combiner.py

When you give a module directly to the “python” command, Python sets the __name__ attribute of that module to “__main__”. When a module is imported instead, its __name__ is set to the module's own name, such as “word_combiner”. By using the conditional as we did above, you can run code only when its module was given directly to the “python” command.
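To see the two cases side by side, here is a small sketch. Save it in a file of its own (name_demo.py, say; the name is only an example, but avoid naming it after an existing module such as string.py) and run it with the python command. It imports the string module, which comes with Python, only because we need some module to import; any other would do.

```python
import string  # any module that we import would do

# An imported module's __name__ is simply that module's own name.
print(string.__name__)  # prints: string

# The module given directly to the python command is named "__main__".
# If this file were imported by another module instead, this line
# would print the file's own module name.
print(__name__)
```

Run with python name_demo.py, it should print string, followed by __main__.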

Therefore, even though main.py is only ever meant to be run directly, it is good practice to put its code into a conditional block as well. So, let's rewrite our code as a professional would.

In word_combiner.py, keep only the code below.

class WordCombiner:
    def __init__(self, spacing, final_punctuation):
        self.final_punctuation = final_punctuation
        self.spacing = spacing
        self.words = []

    def add_word(self, word):
        self.words.append(word)

    def combine_words(self):
        combined_words = self.spacing.join(self.words)
        return combined_words + self.final_punctuation

In main.py, change the code to the following.

from word_combiner import WordCombiner

if __name__ == "__main__":
    combiner = WordCombiner(", ", ".")
    combiner.add_word("one")
    combiner.add_word("two")
    combiner.add_word("three")
    print(combiner.combine_words())

With this setup, when you run the python main.py command, Python imports the code of our word combiner and makes it available to the code in main.py, which then runs and prints “one, two, three.” to the terminal. As importing makes clear, Python modules are a bit like classes and functions: a way to organize and share code.

So far, everything happened in a single Python process, running one instruction at a time. We can also create multiple processes, directly from our Python code.

3.2 Multiple Python Processes

From our main.py module, we will now start a second Python process, and in that process we will run the code that combines words. Change the code in main.py to the following.

from word_combiner import WordCombiner
from multiprocessing import Process


def combine_words(words):
    combiner = WordCombiner(", ", ".")
    for word in words:
        combiner.add_word(word)
    print(combiner.combine_words())

def main():
    p = Process(target=combine_words, args=(["one", "two", "three"],))
    p.start()

if __name__ == "__main__":
    main()

There are four new things in this code.

One, we define a new main function and call it inside the conditional code block at the bottom. Remember that each function's body has its own scope? Any variable we define outside of a function becomes part of the global scope: it becomes available to all the code in the module. The example below shows the problem.

def use_global_variable():
    print(p)

if __name__ == "__main__":
    p = "this is global"
    use_global_variable()

The p variable, because it is defined in the global scope (outside of any function or class), is available from within the use_global_variable function without having to be passed to it as an argument. That makes it easy for a programmer to make a mistake by using the wrong variable. We can prevent future confusion by defining a main function that comes with its own scope: good programming prevents future programmers, including your future self, from making mistakes.
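Here is a small sketch of the fix, under the assumption that we rewrite the example above with a main function. The show function is a name made up for this illustration. Once p is defined inside main, it belongs to main's scope only: another function can use it only if we hand it over explicitly as an argument, and a function that tried to reach p on its own, like use_global_variable above, would now stop with a NameError instead of silently using the wrong variable.

```python
def show(message):
    # message is a parameter: its value is handed over explicitly.
    print(message)


def main():
    # p now lives only inside main's scope, not in the global scope.
    p = "this is local to main"
    # To use p in another function, we pass it as an argument.
    show(p)


if __name__ == "__main__":
    main()
```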

Two, we import a class named Process from a module called multiprocessing. Where is this code coming from? The answer is: from the Python standard library. Just as you can organize your code into modules and import what one module needs from another, you can also import code from the standard library; think of it as a bundle of code that comes baked into the language. The standard library is an extensive collection of code for common tasks: communicating over the internet, working with sound, processing text, making calculations, and much more. Above, we use the multiprocessing module. As its name suggests, it is used to work with Python processes.
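A few lines are enough to try the standard library for yourself. The sketch below picks two modules more or less at random: math, for making calculations, and textwrap, for processing text. Nothing needs to be installed, since both come with Python.

```python
import math      # making calculations
import textwrap  # processing text

# The square root of 16.
print(math.sqrt(16))  # prints: 4.0

# Shorten a sentence to fit in 15 characters, marking the cut with "[...]".
print(textwrap.shorten("one two three four five", width=15))  # prints: one two [...]
```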

Three, we introduce a new built-in class: the tuple. Like the list, it is a type of sequence, but it is immutable: once you've created an instance, it cannot be changed. Above, we create an instance of a tuple with a single element, a list of three strings: (["one", "two", "three"],). The trailing comma is necessary: without it, Python would treat the parentheses as mere grouping, and args would be the list itself rather than a tuple containing it. Such is the tyranny of programming languages.
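The difference a single comma makes can be checked directly. This sketch builds the same list as in our main.py, then wraps it in parentheses with and without a trailing comma, and asks Python for the type of each result.

```python
words = ["one", "two", "three"]

# Without a trailing comma, the parentheses only group: this is still the list.
not_a_tuple = (words)
# With the trailing comma, this is a tuple holding one element: the list.
a_tuple = (words,)

print(type(not_a_tuple))  # prints: <class 'list'>
print(type(a_tuple))      # prints: <class 'tuple'>
print(len(a_tuple))       # prints: 1

# Tuples are immutable. Uncommenting the next line would stop the program
# with a TypeError, because a tuple's elements cannot be replaced.
# a_tuple[0] = ["four", "five"]
```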

Four, we use the Process class, creating an instance by calling the class with two arguments: target and args. The target argument must be something that can be called, such as a function or a class. The args argument is a tuple whose elements will be passed as arguments to the target. In our case, we want to pass a single argument, a list of three strings, to the combine_words function. Calling the start method of the Process instance starts a new process, in which our target, the combine_words function, is called with that single argument.
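With more than one argument, args simply holds one element per parameter of the target, in order. The sketch below shows this with two made-up functions, greet and make_greeting. It also uses two things from the Process class that our main.py does not: the join method, which makes the main process wait until the new process has finished, and the exitcode attribute, which is 0 when the process finished without an error.

```python
from multiprocessing import Process


def make_greeting(name, punctuation):
    """Builds a greeting from a name and a punctuation mark."""
    return "Hello, " + name + punctuation


def greet(name, punctuation):
    """Prints a greeting; below, this runs in a separate process."""
    print(make_greeting(name, punctuation))


def main():
    # args holds one element per parameter of greet, in order:
    # "world" is passed as name, and "!" is passed as punctuation.
    p = Process(target=greet, args=("world", "!"))
    p.start()
    # join makes the main process wait until the new process has finished.
    p.join()
    # An exit code of 0 means the process finished without an error.
    print("exit code:", p.exitcode)


if __name__ == "__main__":
    main()
```

Run as a file with the python command, it should print Hello, world! from the new process, followed by exit code: 0 from the main process.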

Now run the following command again.

python main.py

You should see the familiar word combination printed to the terminal. Even though we have created a separate Python process to run the word combination code, the result is the same as if we just did everything in one process. While the process running the combine_words function worked, the main process idled. What if multiple processes could communicate, and work together?

3.3 Communicating Processes

Change the code in main.py to the following.

from word_combiner import WordCombiner
from multiprocessing import set_start_method, Process, Queue


def combine_words(in_queue, out_queue):
    """Combines words received via `in_queue`, sends the result on `out_queue`.

    This function runs in a separate process.
    """
    combiner = WordCombiner(", ", ".")

    # Loop, receiving a word on a queue,
    # and adding it to the words to be combined,
    # until a stop signal is received.
    while True:
        word = in_queue.get()
        if word == Stop:
            break
        combiner.add_word(word)

    # Combine words, and send the result on a queue.
    result = combiner.combine_words()
    out_queue.put(result)

class Stop:
    """An empty class, used as a stop signal for queue consumers."""
    pass

def main():
    # Configure sub-processes to be forked, a configuration detail.
    # The "fork" method is only available on Unix-like systems (Linux, macOS).
    set_start_method("fork")

    # Create three queues:
    # - Two `in_queue`s, one for each process to receive words.
    # - One `out_queue`, for the main process to receive word combination results.
    first_process_queue = Queue()
    second_process_queue = Queue()
    out_queue = Queue()

    # Start the first process.
    first_process = Process(target=combine_words, args=(first_process_queue, out_queue))
    first_process.start()

    # Start the second process.
    second_process = Process(target=combine_words, args=(second_process_queue, out_queue))
    second_process.start()

    # Create two lists of words.
    first_list = ["one", "two", "three", "four", "five"]
    second_list = ["six", "seven", "eight", "nine", "ten"]

    # Send each word in the first list to the first process,
    # each word in the second list to the second process,
    # in a single loop iterating over the two lists zipped together.
    for first, second in zip(first_list, second_list):
        first_process_queue.put(first)
        second_process_queue.put(second)

    # Send the stop signal to both processes.
    first_process_queue.put(Stop)
    second_process_queue.put(Stop)

    # Get the results and print them.
    # The two results may arrive in either order.
    print(out_queue.get())
    print(out_queue.get())

if __name__ == "__main__":
    main()

This is the most complicated piece of code in this book, so there are a full five new things to discuss.

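One, we configure processes to start using the fork method. This is a configuration detail that you can safely ignore for now, with one caveat: the fork method is only available on Unix-like systems such as Linux and macOS, so on Windows this code stops with an error.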

Two, we start using a Queue class, imported from the multiprocessing module. This queue is similar to a list, but it can be shared between processes. Items can be added to the queue by one process, using the put method, and taken out by another process, using the get method; if the queue is empty, get waits until an item arrives. Unlike with a list, you cannot reach any item you like by its index: items come out of the queue in the order they went in, following the principle of “first in, first out”. A queue is a communication device between processes. Above, we use two queues to send words to be combined from the main process to the two subordinate processes, one queue for each; a third queue is used to send the resulting combinations of words back to the main process.
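To see the “first in, first out” ordering without the extra complexity of several processes, the sketch below puts and gets items within a single process. With one process putting and another getting, as with each of our in_queues, the items also come out in the order they went in.

```python
from multiprocessing import Queue

queue = Queue()

# Put three words in, one after the other.
for word in ["one", "two", "three"]:
    queue.put(word)

# Take them out again: the first word put in is the first one to come out.
received = [queue.get(), queue.get(), queue.get()]
print(received)  # prints: ['one', 'two', 'three']
```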

Three, we add a new class called Stop. This class contains no methods: it does nothing. Its only use is that of a signal, sent on a queue, telling the subordinate processes to stop their work. We could have used something else, for example the string “STOP”, but this would have meant that this particular string could never be included in one of our word combinations. The Stop class has no use besides being a unique thing that the code logic understands as a signal to stop.
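The sketch below shows what the Stop class buys us. Its words_until_stop function is made up for this illustration and reads from a plain list rather than a queue, to keep things short. The string “STOP” is treated like any other word, and only the class itself means “stop”. The check uses is, which asks whether two things are the very same object; for a class like Stop, the == used in our program gives the same answer.

```python
class Stop:
    """An empty class, used as a stop signal for queue consumers."""


def words_until_stop(items):
    """Collects items until the Stop class is found, then stops."""
    words = []
    for item in items:
        # `is` checks for this very class; no word can ever be the Stop class.
        if item is Stop:
            break
        words.append(item)
    return words


# The string "STOP" is an ordinary word; only the class ends the loop.
print(words_until_stop(["STOP", "go", Stop, "never reached"]))  # prints: ['STOP', 'go']
```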

Four, we use a while loop, another form of iteration: one that continues until the expression after while becomes false, or until told to stop by the break statement. In the code above, that expression is True, which never becomes false, so on its own the loop would go on forever. At each iteration we receive a word, until the item received on the queue is the Stop class, at which point we break out of the loop and continue below it, where the words are combined.
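Here are both ways of ending a while loop side by side. In this small sketch, the first loop ends because its expression becomes false; the second, like the loop in combine_words, can only end through break.

```python
# Ends when the expression after while becomes false.
countdown = []
number = 3
while number > 0:
    countdown.append(number)
    number = number - 1
print(countdown)  # prints: [3, 2, 1]

# while True never becomes false, so only break can end this loop.
total = 0
while True:
    total = total + 5
    if total >= 20:
        break
print(total)  # prints: 20
```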

Five, we start commenting and documenting our code. The documentation consists of the two strings explaining how something works, one for the combine_words function and one for the Stop class; such strings are called docstrings. The triple quotes enable these strings to span multiple lines. The comments are the sentences preceded by #, written alongside the code to explain what it does. Documentation and comments are usually written in English, the lingua franca of programmers, and should be written and punctuated as such. Documenting your code is useful: it can make it even easier for the reader to understand what is going on, and documentation can also be extracted by automatic tools and turned into web pages. But documentation is not an escape hatch for complicated code, nor is it a place to pour out your opinions and feelings. You should write code that is as clear as possible, and then make it even easier for the reader to follow along with short and factual documentation and comments.
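Unlike comments, which Python ignores, docstrings are kept around while the program runs, and that is what lets tools find them. The sketch below, with a combine function made up for the example, reads them back through the __doc__ attribute. Typing help(Stop) in the interactive shell shows the same text, along with other information about the class.

```python
class Stop:
    """An empty class, used as a stop signal for queue consumers."""


def combine(words):
    """Joins `words` with a comma and a space between each of them."""
    # This is a comment: Python ignores it, so it is not part of the docstring.
    return ", ".join(words)


# Python keeps each docstring in the __doc__ attribute.
print(Stop.__doc__)     # prints: An empty class, used as a stop signal for queue consumers.
print(combine.__doc__)  # prints: Joins `words` with a comma and a space between each of them.
```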

Congratulations, you have now begun to understand your first multiprocess program. Concurrent computing, which includes multiprocessing and its siblings, multithreaded and distributed computing, has a bright future. If you've bought a computer in the last few years, you may have noticed that what keeps increasing, both in number and diversity, are the processing units of the machine, the so-called cores. For a program to take advantage of these cores, it must be designed to run concurrently. Ever wondered why that spinning wheel still shows up on that brand new powerhouse of yours? Often, it is because the design of the program failed to take advantage of all the power at hand.
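You can ask Python how many processing units your own machine has. In this sketch, os.cpu_count from the standard library counts logical processors, which can be more than the number of physical cores, for example when each core can run two threads at once. It returns None if the number cannot be determined.

```python
import os

# The number of logical processors Python can find on this machine,
# or None if it cannot be determined.
cores = os.cpu_count()
print("This machine has", cores, "logical processors.")
```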

But isn't making a program run faster an optimization, something that can easily be done later, as I said in the introduction? No, because in this case the performance benefits emerge from the design itself. On a machine with a single core, a well-designed concurrent program will run about as fast as the equivalent sequential program, or even slower. However, once the concurrent program is run on a multi-core machine, it can outrun the sequential one. Code optimization is a local effort: a given for-loop can be restructured, a data structure can be swapped for a more efficient one. Design is about the system as a whole. Artificial intelligence will be able to rewrite a for-loop for you and make it run faster; the design of the system will be left for humans to toil with.

The benefits of concurrent programs go beyond performance alone. For a few examples, let's go back to our program combining words. By using multiple processes, and by running a word combiner in each process, we can, in theory, perform the word combining work faster. But by separating the work from the main process, we also ensure that the program remains responsive. Let's say that the program came with a user interface: if we did the word combination work right in the main process, the user interface would freeze each time some work was done. By moving the work to other processes, the main process can remain responsive to user input. By using multiple processes, our program would also be able to survive the failure of some of its parts: if one of the word combining processes were to crash, the program as a whole could continue to run. Finally, if the combining of words were a matter of life and death, we could use multiple processes to catch hardware failures: run the same code in multiple processes, perhaps on different machines, and only accept results agreed on by a majority of processes (this was done on the Space Shuttle).
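The voting idea can be sketched in a few lines. The majority_result function below is made up for this illustration and only does the counting: it assumes the results have already been collected, for example from an out_queue like ours, and accepts a result only if more than half of them agree. It uses Counter from the standard library's collections module.

```python
from collections import Counter


def majority_result(results):
    """Returns the result reported by more than half of `results`, else None."""
    if not results:
        return None
    # most_common(1) gives the most frequent result and how often it appears.
    result, votes = Counter(results).most_common(1)[0]
    if votes > len(results) // 2:
        return result
    return None


# Three processes combined the same words; one of them got it wrong.
print(majority_result(["one, two.", "one, two.", "one, tow."]))  # prints: one, two.
# No result is agreed on by a majority.
print(majority_result(["a.", "b.", "c."]))  # prints: None
```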

Armed with your newfound knowledge of the basics of programming, you can start adding to it, using all that is available online. To ensure progress, do something a little bit harder, but not too hard, at each step of the way. On your journey, whenever in doubt, open a Python shell and type “import this” — the words of wisdom that will appear on your screen should put you back on the right track.

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
