Multithreading and Multiprocessing in Python

James Han
Published in Analytics Vidhya
May 30, 2021 · 4 min read

Process vs. Thread

A process is an execution environment of a computer program (e.g. a Python script). Multiple processes can be running the same program, but they can use different data and compute resources.

A thread is a unit of execution in a process. Threads can only execute instructions serially, but a process can have multiple threads running concurrently, taking on different parts of the task.

Global Interpreter Lock (GIL)

The concept of the Global Interpreter Lock (GIL) is crucial to understanding multithreading and multiprocessing in Python. The GIL is a process-wide lock that prevents multiple threads from executing Python bytecode simultaneously. Even though multiple threads can be running concurrently in a process, only one thread can be executing code at any given time; the rest must wait.

Multithreading

Multithreading means having the same process run multiple threads concurrently, sharing the same CPU and memory. However, because of the GIL in Python, not all tasks can be executed faster by using multithreading. Multiple threads cannot execute code simultaneously, but while one thread is idly waiting, another thread can start executing code.

This is why multithreading in Python is well suited for I/O bound tasks, which are tasks whose execution time is primarily bound by the time spent waiting for input and output. Examples of tasks that can be greatly sped up by multithreading include downloading data from the Internet and writing data to files.

In the example Python code below, both threads are performing an I/O bound task of sleeping for 1 second. By using multithreading, the second task will start without waiting for the first task to finish, and therefore the entire process takes just over 1 second to execute instead of taking 2 seconds.

import time
import threading

def some_task():
    time.sleep(1)
    print("Finished task")

if __name__ == "__main__":
    start = time.time()
    # Create two threads
    t1 = threading.Thread(target=some_task)
    t2 = threading.Thread(target=some_task)
    # Start running both threads
    t1.start()
    t2.start()
    # Block until both threads have finished
    t1.join()
    t2.join()
    end = time.time()
    print(f"Finished process in {end - start} seconds")

Multiprocessing

Multiprocessing is when multiple processes are spawned from the main process, each with its own memory space and its own Python interpreter, able to run on a separate CPU core. Each process also has its own GIL, which means concurrent processes can execute code simultaneously.

Multiprocessing in Python is well suited for CPU bound tasks, which are tasks whose execution time is primarily bound by the speed of the CPU. Tasks with high CPU utilization can be sped up by multiprocessing, because the workload is spread among multiple CPU cores.

In the example Python code below, both processes are performing a CPU bound task of computing 1+1 a hundred million times. By using multiprocessing, they will execute simultaneously and only take roughly half the time to complete.

import time
import multiprocessing

def some_task():
    for _ in range(100_000_000):
        x = 1 + 1
    print("Finished task")

if __name__ == "__main__":
    start = time.time()
    # Create two processes
    p1 = multiprocessing.Process(target=some_task)
    p2 = multiprocessing.Process(target=some_task)
    # Start running both processes
    p1.start()
    p2.start()
    # Block until both processes have finished
    p1.join()
    p2.join()
    end = time.time()
    print(f"Finished process in {end - start} seconds")

concurrent.futures

Python 3.2 introduced the concurrent.futures module, which provides a simpler, unified interface on top of the threading and multiprocessing modules. Its ThreadPoolExecutor and ProcessPoolExecutor classes manage pools of threads and processes, and they share largely the same interface, making it easy to switch between multithreading and multiprocessing. Interface aside, the concurrent.futures module is conceptually the same as the threading and multiprocessing modules.
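As a minimal sketch of how the shared interface works, the sleep example from earlier can be rewritten with ThreadPoolExecutor; swapping in ProcessPoolExecutor (and guarding the code with `if __name__ == "__main__":`) is essentially the only change needed to switch to multiprocessing. The task function and its arguments here are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def some_task(n):
    # Simulate an I/O bound task, then return a result
    time.sleep(1)
    return n * 2

# The executor manages the thread pool; map() submits the tasks
# and returns their results in input order.
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(some_task, [1, 2]))

print(results)  # [2, 4]
```

Unlike the raw threading module, the executor also returns each task's result, so there is no need to collect output through shared state.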

Shared Memory and Race Conditions

The process has some global state that can be shared among all threads, and each thread can also have its own local state.

Since threads share the same global variables, if a global variable is accessed by multiple threads concurrently, it's important to use locks (a.k.a. mutexes) to prevent a race condition. The threading.Lock class is one way to implement this: a thread acquires the lock before accessing the shared variable and releases it afterward. While one thread holds the lock, any other thread that tries to acquire it must wait until the lock is released.

Different processes cannot share the same global variables — each process actually makes its own copy of a global variable if it tries to access it. If processes need to share data with each other, they can use shared memory queues. The multiprocessing module provides a Queue class that closely resembles Python's queue.Queue class, a FIFO data structure. Different processes can put and get data with multiprocessing.Queue in the same way a single process can with queue.Queue. Because multiprocessing.Queue uses locks internally on the shared memory, users don't have to worry about race conditions when using it.
