Understanding Concurrency and Parallelism in Python: A Comparative Guide with Java and C++

Sejal Jagtap
Published in Python’s Gurus
6 min read · Jun 20, 2024
This image is the cover of the book ‘Learning Concurrency in Python’ by Elliot Forbes

They say Python doesn’t support true multithreading, yet Python’s standard library includes a threading module. Recently, I got to work on concurrent log handling, where we built a library for a concurrent log handler. This experience inspired me to write about what I learned, hoping it might help others because, oh boy, concurrency is a really cool thing!

Now, I’m still on this learning rollercoaster, so if I’ve missed a few loops (pun intended :p) or twists here, let’s discuss! It would be so, so cool if this article helps folks out there who are scratching their heads over this topic, just like I was.

Multithreading in Python (? or .)

First, let’s clear up the confusion about Python’s multithreading. Python does have a threading module for creating and managing threads. However, because of the Global Interpreter Lock (GIL), only one thread executes Python bytecode at a time in CPython, the standard interpreter. As a result, the threading module doesn't provide true parallelism for CPU-bound tasks. It does work well for I/O-bound tasks such as reading files, network operations, and other activities that spend most of their time waiting on external resources, because a thread releases the GIL while it blocks on I/O, letting other threads run in the meantime.
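To make the I/O-bound case concrete, here is a minimal sketch that uses only the standard library. It assumes time.sleep is a fair stand-in for a blocking I/O call, since in CPython it also releases the GIL while waiting. The function name fake_io is hypothetical.

```python
import threading
import time

def fake_io(task_id, results):
    # time.sleep stands in for a blocking I/O call (file read, HTTP request, ...).
    # Like real blocking I/O, it releases the GIL while it waits.
    time.sleep(0.2)
    results[task_id] = f"task {task_id} finished"

results = {}
threads = [threading.Thread(target=fake_io, args=(i, results)) for i in range(5)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish
elapsed = time.perf_counter() - start

# The five 0.2 s waits overlap, so the total is roughly 0.2 s rather than 1 s.
print(f"{len(results)} tasks in {elapsed:.2f}s")
```

If you swap the sleep for a CPU-heavy pure-Python loop, the threads stop overlapping, because only one of them can hold the GIL at a time.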

What is Concurrency? How is it different from Parallelism?

It’s essential to distinguish between concurrency and parallelism:

  • Concurrency means dealing with many things at once.
  • Parallelism means doing many things at once.

In Python, concurrency can be achieved through threading, multiprocessing, and asynchronous programming (asyncio). Threading and asyncio give you concurrent execution inside a single process. Multiprocessing bypasses the GIL and achieves parallelism by running separate processes, each with its own memory space.
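A quick way to see the "separate memory spaces" point is the sketch below. The counter and increment names are only for illustration. A thread's change to a global variable is visible to the parent, while a child process only changes its own copy.

```python
import multiprocessing
import threading

counter = 0

def increment():
    global counter
    counter += 1

if __name__ == "__main__":
    # A thread shares the parent's memory, so its change is visible here.
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    print("after thread: ", counter)  # 1

    # A process works on its own copy of memory, so the parent's counter is untouched.
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    print("after process:", counter)  # still 1
```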

Creating a Concurrent Log Handler

For my gig, the main use case was a Python app that runs as multiple processes of the same script, and even on multiple hosts connected by a shared network drive. We needed to write all log events to one central log file per application and to rotate these logs based on size and/or time (for example, daily or hourly rotations).
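Within a single process, the standard library already covers both rotation styles. logging.handlers.RotatingFileHandler rotates by size, and TimedRotatingFileHandler rotates by time (for example when="midnight" or when="H"). The sketch below shows size-based rotation with the 512K, 5-copy policy mentioned in the snippet further down. The helper make_rotating_logger is hypothetical. Note that the Python logging docs say logging to a single file from multiple processes isn't supported, and that is exactly the gap a concurrent handler has to fill.

```python
import logging
import logging.handlers
import os
import tempfile

def make_rotating_logger(path, max_bytes=512 * 1024, backup_count=5, name="rotating_demo"):
    """Single-process, size-based rotation with the stdlib handler.

    Rolls the file over once it would exceed max_bytes and keeps backup_count
    old copies (path.1 ... path.N). Not safe for several processes sharing a file.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "app.log")
    # A tiny max_bytes so the demo actually rolls over a few times.
    logger, handler = make_rotating_logger(path, max_bytes=1024)
    for i in range(200):
        logger.info("message %d", i)
    handler.close()
    print(sorted(os.listdir(os.path.dirname(path))))
```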

Our Concurrent Log Handler used a QueueHandler and QueueListener setup to log asynchronously in the background. This way, the thread or process making the log call doesn’t have to wait for the write to complete.
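Both classes live in the standard library's logging.handlers module. Here is a minimal single-process sketch of the pattern, not our actual handler. ListHandler is a hypothetical stand-in for the real file handler, used so the result is easy to inspect.

```python
import logging
import logging.handlers
import queue

class ListHandler(logging.Handler):
    """Hypothetical stand-in for a file handler: keeps formatted messages in memory."""
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))

log_queue = queue.Queue(-1)  # unbounded queue between the app and the listener
sink = ListHandler()
# The listener owns the "slow" handler and drains the queue on a background thread.
listener = logging.handlers.QueueListener(log_queue, sink)

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))  # log calls only enqueue
logger.propagate = False

listener.start()
logger.info("Here is a very exciting log message, just for you")
listener.stop()  # processes records still on the queue, then joins the thread
print(sink.messages)
```

In real use you would pass a file handler (or several handlers) to QueueListener instead of ListHandler.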

Here is a short snippet showing basic usage of the concurrent log handler we implemented:

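from logging import getLogger, INFO
from BaseClass import ConcurrentFileHandler  # from our own library
import os

log = getLogger(__name__)
# Without this, INFO messages are dropped (the default effective level is WARNING).
log.setLevel(INFO)
logfile = os.path.abspath("dealer_gamma_log.log")
# Intended policy: rotate after reaching 512K, keep 5 old copies.
# (The handler's rotation arguments are not shown in this basic version.)
rotateHandler = ConcurrentFileHandler(logfile)
log.addHandler(rotateHandler)

log.info("Here is a very exciting log message, just for you")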

Under the hood, we used asynchronous background logging. When a logging statement is made, the record is added to a background queue instead of being written to the file immediately and synchronously. This queue can span multiple processes, whether they are started with multiprocessing or concurrent.futures. File locking on the shared log file keeps writers on different hosts from clobbering each other.
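The cross-process part can be sketched with a pattern along the lines of the Python logging cookbook. Every worker process gets only a QueueHandler that feeds a shared multiprocessing.Queue, and a single QueueListener in the parent owns the file. This is an illustration under those assumptions, not our library's code, and it doesn't cover cross-host file locking. The worker and run names are hypothetical.

```python
import logging
import logging.handlers
import multiprocessing
import os
import tempfile

def worker(log_queue, worker_id):
    # In each worker, the only handler is a QueueHandler: log calls just enqueue.
    logger = logging.getLogger("worker")
    logger.setLevel(logging.INFO)
    logger.handlers[:] = [logging.handlers.QueueHandler(log_queue)]
    logger.propagate = False
    logger.info("hello from worker %d", worker_id)

def run(logfile, n_workers=3):
    log_queue = multiprocessing.Queue(-1)
    # Only the parent process writes to the file, through this handler.
    file_handler = logging.FileHandler(logfile)
    file_handler.setFormatter(logging.Formatter("%(processName)s: %(message)s"))
    listener = logging.handlers.QueueListener(log_queue, file_handler)

    procs = [multiprocessing.Process(target=worker, args=(log_queue, i))
             for i in range(n_workers)]
    for p in procs:
        p.start()
    # Start the listener thread after the workers exist, so no extra thread is running at fork time.
    listener.start()
    for p in procs:
        p.join()
    listener.stop()  # drains the queue, then stops the background thread
    file_handler.close()

if __name__ == "__main__":
    logfile = os.path.join(tempfile.mkdtemp(), "central.log")
    run(logfile)
    with open(logfile) as f:
        print(f.read())
```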

How Do Concurrency and Parallelism Compare Across Programming Languages Like Python, Java, and C++?

Now, let’s dive deeper into how Python, Java, and C++ manage concurrency and parallelism.

Python

Concurrency in Python:

  • Threading: The threading module is great for I/O-bound tasks but doesn't give you true parallelism due to the GIL. Think of it as multitasking but with a bit of a bottleneck.
  • Asyncio: The asyncio module is perfect for handling lots of I/O-bound tasks at once without the overhead of threads. It runs coroutines on a single thread, and its event loop switches to another task whenever the current one awaits I/O, which makes it efficient.
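Here is a minimal asyncio sketch, assuming asyncio.sleep stands in for a real network call. The name fetch is hypothetical. Three coroutines wait at the same time on one thread, so the whole run takes about as long as the slowest one.

```python
import asyncio
import time

async def fetch(name, delay):
    # asyncio.sleep stands in for awaiting a real network response.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # gather runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(fetch("a", 0.3), fetch("b", 0.3), fetch("c", 0.3))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.3 s, not 0.9 s
```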

Parallelism in Python:

  • Multiprocessing: This module runs work in separate processes, each with its own Python interpreter, memory space, and GIL, which makes it ideal for CPU-bound tasks. Each process runs independently, so tasks truly run in parallel across CPU cores.
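# Example of multiprocessing
import multiprocessing

def worker(num):
    print(f'Worker: {num}')

# The guard is required where child processes re-import this module (e.g. Windows, macOS).
if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()  # launch the child process
    for p in processes:
        p.join()  # wait for every child to exit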
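The example above only prints. For CPU-bound work you usually want the results back, and multiprocessing.Pool takes care of starting the processes and collecting their results. Here is a sketch, with cpu_heavy as a hypothetical stand-in for real work:

```python
import multiprocessing

def cpu_heavy(n):
    # A deliberately CPU-bound task: the sum of squares below n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [10_000, 20_000, 30_000, 40_000]
    # Pool() defaults to one worker process per CPU core.
    with multiprocessing.Pool() as pool:
        # map splits the inputs across workers and returns results in input order.
        results = pool.map(cpu_heavy, inputs)
    print(results)
```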

Java

Java does a great job with both concurrency and parallelism:

Concurrency in Java:

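  • Threading: Java has built-in support for threading. The Thread class and the java.util.concurrent package make it easy to create and manage threads. Because the JVM has no global interpreter lock, Java threads can run on multiple cores at the same time.
  • Executors: The executor framework (an ExecutorService, usually created through the Executors factory class) takes it up a notch by managing a pool of worker threads, so you don’t have to handle the details yourself.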
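For comparison, Python's closest standard-library counterpart to a Java thread pool (for example one created with Executors.newFixedThreadPool) is concurrent.futures.ThreadPoolExecutor. This is a minimal sketch. The download function is hypothetical, time.sleep stands in for network I/O, and the example.com URLs are placeholders that are never actually fetched.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def download(url):
    # Hypothetical I/O-bound task; time.sleep simulates waiting on the network.
    time.sleep(0.1)
    return f"fetched {url}"

urls = [f"https://example.com/page/{i}" for i in range(5)]  # placeholder URLs

# Like a fixed-size Java thread pool: at most 3 tasks run at once.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(download, url): url for url in urls}
    # as_completed yields each Future as soon as it finishes, in completion order.
    results = {futures[f]: f.result() for f in as_completed(futures)}

print(results)
```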

Parallelism in Java:

  • Fork/Join Framework: This framework is awesome for breaking down tasks into smaller chunks, processing them in parallel, and then combining the results. It’s like having a team that divides and conquers the work.
// Example of ForkJoinPool
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.ForkJoinPool;

class Sum extends RecursiveTask<Integer> {
    private final int[] array;
    private final int low;
    private final int high;

    Sum(int[] array, int low, int high) {
        this.array = array;
        this.low = low;
        this.high = high;
    }

    @Override
    protected Integer compute() {
        // Small ranges are summed directly instead of being split further.
        if (high - low <= 10) {
            int sum = 0;
            for (int i = low; i < high; i++) {
                sum += array[i];
            }
            return sum;
        } else {
            int mid = (low + high) / 2;
            Sum left = new Sum(array, low, mid);
            Sum right = new Sum(array, mid, high);
            left.fork();                        // schedule the left half asynchronously
            int rightResult = right.compute();  // compute the right half in this thread
            int leftResult = left.join();       // wait for the left half's result
            return leftResult + rightResult;
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        Sum sumTask = new Sum(array, 0, array.length);
        int result = pool.invoke(sumTask);
        System.out.println("Sum: " + result);
    }
}
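Python has no Fork/Join framework, but the same split, compute, and combine idea can be sketched with concurrent.futures.ProcessPoolExecutor. This is a plain chunked map followed by a sum, not work stealing. The helpers chunked and parallel_sum are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor

def chunked(data, size):
    # "Split": cut the list into consecutive slices of at most `size` items.
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, chunk_size=25):
    with ProcessPoolExecutor() as pool:
        # "Compute": each chunk is summed in a worker process.
        partial_sums = pool.map(sum, chunked(data, chunk_size))
        # "Combine": add up the partial results in the parent.
        return sum(partial_sums)

if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # 5050
```

A handful of large chunks works better here than many tiny ones. Each chunk has to be pickled and sent to a separate worker process, which costs much more than forking a ForkJoinTask inside one JVM.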

C++

C++ is super powerful for both concurrency and parallelism, thanks to its low-level capabilities.

Concurrency in C++:

  • Threading: C++11 introduced a solid threading library. You can create and manage threads easily with the <thread> header.
  • Async Tasks: Using std::async and std::future (both from the <future> header), you can run tasks asynchronously and collect their results later, making concurrency more manageable.

Parallelism in C++:

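  • Parallel Algorithms: C++17 added parallel versions of the standard algorithms. Passing an execution policy such as std::execution::par (from the <execution> header) to algorithms like std::sort or std::transform_reduce lets them run in parallel.
  • OpenMP: OpenMP is a popular API for parallel programming on multicore processors. Compiler directives such as #pragma omp parallel for split a loop's iterations across threads.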
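// Example of threading in C++
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex cout_mutex;  // serializes output so lines from different threads don't interleave

void worker(int id) {
    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << "Worker: " << id << std::endl;
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 5; ++i) {
        threads.emplace_back(worker, i);  // starts a thread running worker(i)
    }
    for (auto& th : threads) {
        th.join();  // wait for each thread to finish
    }
    return 0;
}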

So, in C++, creating threads is straightforward, and managing them manually gives you a lot of control. It also gives you responsibility: every std::thread must be joined or detached before it is destroyed, or the program calls std::terminate. Plus, C++’s manual memory management and direct hardware access make it an excellent choice for high-performance applications.

Key Takeaways

Each language handles concurrency and parallelism in ways that play to its strengths. Here’s a quick wrap-up:

1. Python:

  • Concurrency: Use threading for I/O-bound tasks and asyncio for highly efficient asynchronous I/O operations.
  • Parallelism: Use multiprocessing to bypass the GIL for CPU-bound tasks, allowing true parallel execution.

2. Java:

  • Concurrency: Use Thread and Executors for easy-to-manage multithreading. Executors are particularly handy for managing thread pools.
  • Parallelism: The Fork/Join framework is great for breaking tasks into smaller chunks and processing them in parallel.

3. C++:

  • Concurrency: Use <thread> for threads, and std::async with std::future (from <future>) for asynchronous tasks. The low-level control in C++ is powerful for managing concurrency.
  • Parallelism: Take advantage of parallel algorithms introduced in C++17 and use OpenMP for writing parallel code on multicore processors.

Conclusion

While Python’s GIL limits true parallelism in multithreading, the multiprocessing module provides a workaround for CPU-bound tasks. Java and C++ offer more straightforward and powerful tools for concurrency and parallelism. Understanding the strengths and limitations of each language's concurrency model is crucial for choosing the right tool for the job.

If you’re working primarily with Python and need to run CPU-bound tasks in parallel, the multiprocessing module is well worth exploring. For more intensive or performance-critical applications, Java or C++ might be more suitable due to their robust threading and concurrency support.

Again, I would love to hear any input from you, and I would be happy to incorporate those changes into this article. Happy coding, guys!

Sejal Jagtap

Software Engineer | AWS SAA | NYU | Capstone Investment Advisors | HSBC | Fidelity National Information Services