Multithreading & Multiprocessing in Python3

Mehul Rathod
Mindful Engineering
9 min read · Apr 9, 2020

What Do You Need to Know?

Multitasking, in general, is the capability of performing multiple tasks simultaneously. In technical terms, multitasking refers to the ability of an operating system to perform different tasks at the same time.

For instance, you might be downloading something on your PC, listening to songs, and playing a game, all at the same time; all of this is performed by the same operating system (OS). This is nothing but multitasking. There are two types of multitasking in an OS:

Process-Based: Multiple processes running on the same OS simultaneously. Example: downloading a file, listening to songs, and playing a game.

Thread-Based: A single process made up of separate tasks (threads). Example: a game like Vice City consists of various threads.

Key Concepts:

  • Threading
  • Multithreading and Multiprocessing
  • GIL
  • Global variable in Multiprocessing
  • Queue

What is Threading?

It’s an independent flow of execution. A single process can consist of multiple threads, and each thread in a program performs a particular task.

Example: in the game Vice City, the game as a whole is a single process, but it consists of several threads responsible for playing music, taking input from the user, running the opponents, and so on. All of these are separate tasks managed by threads.

one program with multiple threads

How to Create Threads in Python?

First, import the threading module. Then define a function that performs some task, and create a Thread object, assigned to a variable, whose target is that function.

Let’s get started:

import threading


def new():
    for x in range(6):
        print("Child Thread Here!!")


t1 = threading.Thread(target=new)

The start() method begins the thread. We also use join(), because without it the main thread may run its code first; join() means “wait for this thread/process to complete”, so the main function’s print only runs after the child thread finishes.

t1.start()
t1.join()
print("Main Thread Here!!")
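For comparison, here’s a minimal sketch (reusing the new() function defined above, not part of the original listing) of what can happen when join() is skipped: the main thread’s print may appear before, or interleaved with, the child’s output.

t2 = threading.Thread(target=new)
t2.start()
print("Main Thread Here!!")   # without join() first, this may print before the child finishes
t2.join()                     # joining afterwards still makes the program wait before exiting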

How to Achieve Multithreading in Python3?

Multithreading in Python is achieved with the threading module, which is part of the standard library, so no extra installation is required.

import threading
import time

Before moving on to creating threads, let me address a key question: when should you use multithreading in Python?

Multithreading is very useful for saving time and improving performance, but it cannot be applied everywhere. In the previous Vice City example, the music thread is independent of the thread taking input from the user; if these tasks were interdependent, multithreading could not be used.

In real life, you might be calling a web service through an API or waiting for a packet on a network socket; during that wait your CPU is doing nothing. Multithreading tries to utilize this idle time so the CPU gets some other work accomplished.

So, let’s use multithreading to overlap that idle time and make the program run much faster.
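As a rough sketch of the idea (the URLs below are placeholders I picked, not from the article), several downloads can run in parallel threads so the network waits overlap:

import threading
import urllib.request

urls = [
    "https://www.example.com",    # placeholder URLs, for illustration only
    "https://www.python.org",
]

def fetch(url):
    # the thread spends most of its time waiting on the network here
    with urllib.request.urlopen(url) as resp:
        print(url, len(resp.read()), "bytes")

threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()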

How to Create Multithreading in Python3?

If you’ve never seen if __name__ == '__main__': before, it's a way to make sure the code that's nested inside it will only run if the script is run directly (not imported).

import time
import threading


def calc_square(numbers):
    print("Calculate square numbers: ")
    for i in numbers:
        time.sleep(2)  # artificial time-delay
        print('square: ', str(i * i))


def calc_cube(numbers):
    print("Calculate cube numbers: ")
    for i in numbers:
        time.sleep(2)
        print('cube: ', str(i * i * i))


if __name__ == "__main__":
    arr = [2, 3, 8, 9]
    t1 = threading.Thread(target=calc_square, args=(arr,))
    t2 = threading.Thread(target=calc_cube, args=(arr,))
    # creating two threads here: t1 & t2
    t1.start()
    t2.start()
    # starting both threads with start()
    t1.join()
    # this join() will wait until calc_square() is finished
    t2.join()
    # this join() will wait until calc_cube() is finished
    print("Success!")

The reason I use a time delay here is just to demonstrate a scenario where multithreading is useful: we calculate squares and then cubes, and each time.sleep() call leaves the CPU idle, doing nothing for 2 seconds.
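To see the saving in numbers, here’s a small timing sketch (not part of the original listing) that assumes arr, calc_square, and calc_cube from the code above are in scope and is run under the same __main__ guard. Sequentially, the eight 2-second sleeps take roughly 16 seconds; with two threads the program finishes in roughly 8 seconds because the sleeps overlap.

start = time.time()
calc_square(arr)                  # run the two functions one after the other
calc_cube(arr)
print("sequential:", round(time.time() - start, 2), "seconds")   # ~16 s

start = time.time()
t1 = threading.Thread(target=calc_square, args=(arr,))
t2 = threading.Thread(target=calc_cube, args=(arr,))
t1.start()
t2.start()
t1.join()
t2.join()
print("threaded:  ", round(time.time() - start, 2), "seconds")   # ~8 s, the sleeps overlap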

Why do we need to use Multi-Processing? How is it different from Multi-Threading?

In multi-processing, different programs, or processes, run on your computer. Each process has its own virtual memory, or address space, and it can create multiple threads inside it.

If the processes have to communicate with each other, they use inter-process communication techniques such as a file on disk, shared memory (e.g. a Queue), or a message pipe.

The benefit of multi-processing is that an error or memory leak in one process won’t hurt the execution of another process.

Multi-threading lives within a single process. Threads share the process’s address space; each thread has its own instruction stream and its own stack, and each performs its specific task, but because the address space is shared, a global variable defined in your program can be accessed by all of the threads.

The key difference is that threads are lightweight and processes are heavyweight; however, an error or memory leak in one thread can potentially impact the entire process, harming the other threads and the parent process as well.
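Here is a minimal sketch of that shared address space (the counter and Lock names are my own, not from the article): two threads increment the same global variable, and a Lock guards each update so they don’t race.

import threading

counter = 0                 # global state shared by every thread in the process
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:          # guard the read-modify-write so the threads don't race
            counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)              # 200000: both threads saw and updated the same variable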

What is the GIL? Why is it important?

The GIL (Global Interpreter Lock) in Python is a process-wide lock, or mutex, that protects access to Python objects by preventing multiple threads from executing Python bytecode at once. To keep the interpreter thread-safe, CPython uses the GIL to prevent race conditions on its internal state, which effectively means only one thread can run Python code at a time.

Multiprocessing allows you to create programs that run truly in parallel (bypassing the GIL) and use all of your CPU cores. The multiprocessing library gives each process its own Python interpreter and its own GIL. If you want to make use of multiple CPU cores in your application, use the multiprocessing module instead.

Because of this, the usual problems associated with threading (such as data corruption and deadlocks) are no longer an issue: since the processes don’t share memory, they can’t modify the same memory concurrently.
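One convenient way to spread work across several cores, sketched here with a hypothetical cpu_heavy function of my own rather than anything from the article, is a process pool:

from multiprocessing import Pool

def cpu_heavy(n):
    # purely CPU-bound work; each call runs in its own process, with its own GIL
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:                   # 4 worker processes
        results = pool.map(cpu_heavy, [10**6] * 4)    # the four calls run on separate cores
    print(results)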

Python’s Global Interpreter Lock

CPython (the standard Python implementation) manages memory with a technique known as reference counting. The problem is that the reference count variable is prone to race conditions, like any other shared state. To solve this problem, the developers of Python decided to use the GIL. The GIL prevents two threads from executing Python bytecode simultaneously in the same process; two threads can still run concurrently in the sense that one runs code while the other is waiting (for example, on I/O).

The GIL limits parallel programming in Python out of the box.
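To feel that limit, here is a rough sketch (with an illustrative countdown function, not from the article): a CPU-bound loop split across two threads takes about as long as running it in one thread, because the GIL lets only one thread execute bytecode at a time, while the same split across two processes roughly halves the wall-clock time on a multi-core machine.

import time
import threading
import multiprocessing

def countdown(n):
    while n > 0:            # pure CPU work, no I/O to release the GIL
        n -= 1

if __name__ == "__main__":
    N = 50_000_000

    start = time.time()
    ts = [threading.Thread(target=countdown, args=(N // 2,)) for _ in range(2)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    print("two threads:  ", round(time.time() - start, 2), "s")   # about the same as one thread

    start = time.time()
    ps = [multiprocessing.Process(target=countdown, args=(N // 2,)) for _ in range(2)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    print("two processes:", round(time.time() - start, 2), "s")   # roughly half, on multiple cores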

What is Multi-Processing?

Multiprocessing is the ability of a system to use more than one processor at the same time. With the multiprocessing module, processes are spawned by creating a Process object and then calling its start() method.

import time
import multiprocessing


def calc_square(numbers):
    for i in numbers:
        time.sleep(3)  # artificial time-delay
        print('square: ', str(i * i))


def calc_cube(numbers):
    for i in numbers:
        time.sleep(3)
        print('cube: ', str(i * i * i))


if __name__ == "__main__":
    arr = [2, 3, 8, 9]
    p1 = multiprocessing.Process(target=calc_square, args=(arr,))
    p2 = multiprocessing.Process(target=calc_cube, args=(arr,))
    # creating two Processes here: p1 & p2
    p1.start()
    p2.start()
    # starting both processes with start()
    p1.join()
    # this join() will wait until calc_square() is finished
    p2.join()
    # this join() will wait until calc_cube() is finished
    print("Success!")

If you want to confirm that the processes are running, open your task manager and go to the Details tab; you will see three Python processes running in parallel: the main process and the two sub-processes.
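If you’d rather confirm it from inside the code, here is a small sketch of my own (not in the original listing) that prints each process ID with os.getpid():

import os
import multiprocessing

def report():
    print("child pid:", os.getpid())      # each child process has its own PID

if __name__ == "__main__":
    print("main pid:", os.getpid())
    workers = [multiprocessing.Process(target=report) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # three distinct PIDs appear: the main process plus the two children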

Global Variable

Instead of printing the result, let’s store the results in a global variable. Here we import the multiprocessing module, create a global variable results = [], and then define a function that calculates the square of each number in a list.

Let’s get started:

import multiprocessing

results = []   # creating a global variable


def calc_square(numbers):
    global results
    for i in numbers:
        print('square: ', str(i * i))
        results.append(i * i)
    print('within a result: ' + str(results))

Here we create another process, p1, that passes the list of numbers arr = [2, 3, 8, 9] to the child function calc_square, which loops over the list, calculates each square, and appends it to the global variable results = [].

if __name__ == "__main__":
    arr = [2, 3, 8, 9]
    p1 = multiprocessing.Process(target=calc_square, args=(arr,))
    # creating one Process here: p1
    p1.start()
    # starting the process with start()
    p1.join()
    # this join() will wait until calc_square() is finished
    print('result: ' + str(results))
    # this print shows an empty list; the squares exist only in the child's copy,
    # so we have to print them within the process
    print("Success!")

When we create a new process, it gets a copy of the existing results global variable. The child’s copy is separate from the parent’s, and the values never move back to the parent’s global variable, so we have to fetch the data within the process itself for it to work.

When the global results is copied into process p1, the new address space gets its own copy, so the result stays separate. Whenever you want to communicate between two processes or share data, you need to use one of the IPC techniques, such as a Queue.

Every process has its own address space (virtual memory), so a program’s variables are not shared between two processes. You need inter-process communication (IPC) techniques if you want to share data between two processes.
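Besides the Queue covered next, another option worth sketching (the parameter names here are my own, not from the article) is multiprocessing.Value and multiprocessing.Array, which place data in shared memory that both the parent and the child can see:

import multiprocessing

def calc_square(numbers, result, total):
    for idx, n in enumerate(numbers):
        result[idx] = n * n          # writes land in shared memory, visible to the parent
    total.value = sum(result[:])     # result[:] copies the shared array into a normal list

if __name__ == "__main__":
    arr = [2, 3, 8, 9]
    result = multiprocessing.Array('i', 4)    # shared array of 4 C ints
    total = multiprocessing.Value('i', 0)     # shared int
    p = multiprocessing.Process(target=calc_square, args=(arr, result, total))
    p.start()
    p.join()
    print(result[:], total.value)             # [4, 9, 64, 81] 158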

Sharing Data Between Processes Using Queue

As we have seen, multiple processes each have their own address space and do not share it, which is the source of the problem; to share data between processes, you need a dedicated technique.

A multiprocessing queue gives the processes a shared place to put data, so we are going to use a queue to store the results of the square calculation.

A queue has a method called put(). If you have studied data structures, this is the familiar FIFO (First-In-First-Out) structure: you insert data at the back of the queue and pull data from the front. Below is the list of operations used to manage a Queue (a short demonstration follows the list):

  • get(): To get an item from the front of the queue.
  • put(): To insert an item at the end of the queue.
  • qsize(): To find the number of items in the queue.
  • empty(): Returns a boolean telling whether the queue is empty.
  • full(): Returns a boolean telling whether the queue is full.
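Here is a quick sketch of those operations on a multiprocessing.Queue (the values are chosen just for illustration):

import multiprocessing

q = multiprocessing.Queue(maxsize=3)
q.put(4)
q.put(9)
print(q.qsize())    # 2 items currently in the queue (may raise NotImplementedError on macOS)
print(q.full())     # False: capacity is 3
print(q.get())      # 4, removed from the front (FIFO)
print(q.get())      # 9
print(q.empty())    # True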

Here we insert the data at the end of the queue.

Let’s get started:

import multiprocessing


def calc_square(numbers, q):   # child function
    for i in numbers:
        q.put(i * i)
        print('inside process, queued: ' + str(i * i))

Let’s create a variable q using the multiprocessing Queue class: import the multiprocessing module first, then call multiprocessing.Queue(); q is then passed to the calc_square function.

if __name__ == "__main__":   # main function
    arr = [2, 3, 8, 9]
    q = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=calc_square, args=(arr, q))
    p1.start()
    p1.join()
    while not q.empty():
        print(q.get())

After the process is done, we check q.empty(), a method that tells you whether the queue is empty or not. We then iterate, getting elements one by one from the front of the queue and printing them.

NOTE: Python Queue and Multiprocessing Queue

Python has a module called queue, and its Queue class is different from the multiprocessing Queue, so it is worth noting the difference between the two.

When using the multiprocessing module we use multiprocessing.Queue(), and with the queue module we use queue.Queue(); the former shares data between processes, while the latter shares data between threads within a single process.
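For contrast, here is a minimal thread-side sketch using queue.Queue (the names are my own): it plays the same role between threads that multiprocessing.Queue plays between processes.

import queue
import threading

q = queue.Queue()

def worker(numbers):
    for n in numbers:
        q.put(n * n)      # threads share the process memory, so a plain queue.Queue is enough

t = threading.Thread(target=worker, args=([2, 3, 8, 9],))
t.start()
t.join()
while not q.empty():
    print(q.get())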

Conclusion:

  • The program completes much faster with multiprocessing because the sub-processes run concurrently on separate cores.
  • If your code has a lot of I/O or network usage, multithreading is your best bet because of its low overhead.
  • If your code is CPU bound, you should use multiprocessing (if your machine has multiple cores).
  • Multiprocessing was helpful for this CPU-intensive task because we could benefit from using multiple cores and avoid the Global Interpreter Lock.

For further queries, you can connect on LinkedIn

Happy Coding!
