Parallelism (and concurrency) in Node.js, Python, and Golang, and a comparison of them — Part 1

Saeed Oraji
Published in Analytics Vidhya
8 min read · Mar 28, 2020

Parallel and concurrent programming can look difficult to learn and to use properly. There are many topics to cover, and that can be intimidating, so I want to walk through the important spots. This article will help you use parallelism and concurrency properly: which topics should you be aware of, and how can you write an efficient program? At the end, we will look at these concepts in Node.js, Python, and Golang, and then compare them.

Let’s take a brief look at the topics we are going to cover:

  • Multiprocessors structure
  • Process
  • Threads
  • Coroutines
  • Parallel vs Concurrency
  • Memory organization
  • Parallel programming models
  • Parallelism(and concurrency) in Node.js
  • Parallelism(and concurrency) in Python
  • Parallelism(and concurrency) in Golang

Multiprocessors structure

First, let’s take a look at the CPU.

The central processing unit is responsible for executing processes. At a low level, each process consists of specific instructions that are sent to the CPU for execution (you could also say that the CPU picks them up and runs them — that works too :) ).

Fetch: the CPU fetches data and instructions from memory into registers (small, fast storage inside the CPU)

Decode: the CPU decodes the instructions

Execute: the instruction is carried out on the data, and the result of the operation is stored in another register

Each processor can execute instructions according to its category. Flynn’s taxonomy classifies processors into four categories:

  • SISD (Single Instruction, Single Data)
    The processor handles one instruction and one piece of data at a time. This structure doesn’t support parallel programming.
  • SIMD (Single Instruction, Multiple Data)
    The same instruction is applied to many pieces of data at once. This structure is well suited to data-parallel structures like arrays and lists.
  • MISD (Multiple Instructions, Single Data)
    Multiple instructions operate on a single stream of data. This structure is rarely used in practice.
  • MIMD (Multiple Instructions, Multiple Data)
    Independent instructions run on independent data. This is the most suitable structure for parallelism.

Note: if you are writing a parallel program, you should be aware of your processor’s category.

Process

A process is a live instance of your code running in memory. In OOP, a class is inert until you create an instance of it, which brings it to life. Think of your code as the class and a process as an instance of that class.

Threads

A thread is a lightweight unit of execution inside a process; each process can create many threads, and the resources allocated to the parent process are shared between its threads. If the number of threads exceeds a threshold, the efficiency of the program drops: the CPU cannot make real progress because it spends its time creating threads and switching between them.
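
As a small sketch of that sharing, the threads below all write into one list that lives in the parent process’s memory, using Python’s standard threading module (the variable names are just for illustration):

```python
# A minimal sketch of threads sharing their parent process's memory,
# using Python's standard threading module.
import threading

results = []                      # shared between all threads of this process

def work(name):
    results.append(name)          # every thread writes into the same list

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                      # wait for all threads to finish
```

No copying is needed: unlike separate processes, all four threads see the same `results` object.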

Topics you ought to know about when working with threads:

Thread Safe

A single thread is inherently safe, but when you create more than one thread and they want to communicate with each other, they need shared memory (I will cover this in the inter-process communication section). When more than one thread accesses shared memory, a race condition must not occur; code is thread safe when all threads remain safe with respect to race conditions.

Native Thread

Threads created by processes are native threads by default, because they use the capabilities of the OS. There is still a way to create a thread when the OS doesn’t support threads; I’ve explained it in the next section.

Green Thread

To use native threads, your OS must support them. If you want to use threads without OS support, the host language you are writing in can create and schedule threads itself. These threads are called green threads.

Note: Green threads are slower than native threads

Fiber Thread

A fiber is a kind of green thread created by the user, which can yield, stop, and resume. It doesn’t block the rest of the code and executes cooperatively. You can start more than one fiber at a time, and they will run concurrently.

Coroutine

A coroutine is similar to a fiber, but it doesn’t resemble a thread; think of it as a function. When you invoke it, control of the program transfers to the coroutine until it finishes or yields.
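
Python generators give one concrete form of this idea; the sketch below (names are just for illustration) shows control bouncing between the caller and the coroutine at each yield:

```python
# A minimal sketch of a coroutine as "a function that can yield":
# control transfers into the coroutine when resumed, and back to the
# caller at every yield. Implemented as a plain Python generator.
def counter():
    total = 0
    while True:
        step = yield total        # pause here; hand `total` to the caller
        total += step             # resume when the caller sends a value

coro = counter()
first = next(coro)                # run up to the first yield
second = coro.send(5)             # resume, add 5, yield the new total
third = coro.send(2)              # resume again, add 2
```

Notice there is no thread involved at all: the caller and the coroutine take turns on a single thread, exactly like a function that can be suspended mid-body.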

Parallel vs Concurrency

A process can be created manually or dynamically, which we will discuss in the rest of the article. Suppose you run two instances of your code: these processes are running at the same time, but not necessarily in parallel or even concurrently; it depends on how the CPU handles them. The CPU uses scheduling algorithms to pick a process and run it for a specific slice of time called a quantum; when the quantum ends, the CPU switches to another process.

Let’s look at an example of understanding the differences between parallel and concurrency.

While you are reading this article, you might be listening to music at the same time (parallelism). Suddenly you get a message on your phone: you have to turn your head and check it, and now you cannot read the article and check your phone at the same time. This is concurrency. When processes (or threads) run concurrently, the processor switches between them so fast that they seem to be running at the same time.

Your head has two independent parts at work, your eyes and your ears, which is why you could read and listen at the same time. Likewise, we need more than one processor, or one processor with multiple cores, to support parallelism.

Memory organization

Access to memory!

Processes need data to run, and this data is stored in memory. Suppose only one process is running: it can access memory without worrying about whether the memory is available. But when more than one process is run by the processors, each process runs only when the scheduler gives it a turn. If they run concurrently and need to cooperate, you must provide shared memory that all of them can access without race conditions. The problem appears when more than one process is running: we must be aware of race conditions (processes accessing shared memory), or else give each process its own local memory, called distributed memory.

Note: be aware that memory speed should keep pace with CPU speed; memory is slower than the CPU, just think about it…

Shared Memory

An area shared between processes, which they can access and change (inter-process communication). A race condition can arise in this case.

There are many ways processes can pass data to each other through shared memory; keep reading, we are going to talk about them.

Distributed Memory

Local memory that belongs to its respective process.

Inter-Process Communication

Processes and threads need to communicate with each other: to pass data, pause each other, lock each other; in other words, to manage each other.

Processes communicate with each other through shared memory, and because they run at the same time, this can cause data inconsistency and race conditions. Shared memory should be protected while a process is using it. The following methods can handle this issue:

  • Lock
    This method is used when two processes want to access shared memory. When p1 accesses the shared memory, it sets the lock to true; if p2 then needs the shared memory, it is rejected because the lock is true and waits in a loop (a spinlock) until p1 sets the lock back to false.
    Used carelessly, locks can put processes in deadlock.
set lock to false
create thread1
create thread2
allocate shared memory
define x in shared memory and assign 0 to it
thread1 is going to access x in shared memory:
    if lock == false
        set lock to true
    else
        wait until lock switches to false
thread2 is going to access x in shared memory:
    if lock == false
        set lock to true
    else
        wait until lock switches to false
  • Semaphores
    A lock is a boolean variable with two values; if you change it to a number (or any value that can take more than two states), you get a semaphore. Semaphores are managed by the operating system at a low level and have two parts: a semaphore value, initialized to the number of processes allowed in at once, and a semaphore queue managed by the operating system.
    Semaphore abstract object: {queue: [], value: n, acquire: Function, release: Function}
    Consider n processes that want to access shared memory. p1 is the first to send the instruction to access it, so it invokes acquire. acquire is a method that first decrements the value and grants the shared memory to the process if the value is non-negative; otherwise the process waits in the queue. When p1 has finished, it has to invoke release, a method that increments the value and runs the first process waiting in the queue. acquire and release must run atomically, meaning the CPU cannot switch to another process while executing them.
semaphore = ask host language to give you a Semaphore object
create thread1
create thread2
create thread3
create thread4
allocate shared memory
define x in shared memory and assign 0 to it
thread1 is going to access x in shared memory:
    invoke acquire()
    once it has finished manipulating data in shared memory
    invoke release()
thread2 is going to access x in shared memory:
    invoke acquire()
    once it has finished manipulating data in shared memory
    invoke release()
...
  • Queue
    This method is used when the number of processes is more than two. Each process stores its correlation id in the queue, and when it reaches the front of the queue it is allowed to access the shared memory. While one process is running, the other processes in the queue are suspended until it has finished.
  • Socket
    This is a platform for transmitting data over the network. Consider processes hosted on different computers, or even the same computer, that want to send data to each other; they can find each other over the network through a socket.

Overview

We explored fundamental topics in parallelism and concurrency: threads and the different types of threads, shared memory, and how we can protect access to it to prevent race conditions. Inter-process communication has more methods, such as message passing; let me cover them in the next parts, where I will also talk about parallel programming models and concurrency design patterns, especially in the Golang part.

Next Part

In the next part, I’m going to look at parallelism (and concurrency!) in Node.js. You’ve probably heard that Node.js is single-threaded; you are right, but not entirely. I’m going to dive into the Event Loop, Cluster, Child_Process, Worker_Thread, and Pool in Node.js.

Next Next … Part

This article has five sections:

  1. Fundamental terms in parallelism — part1
  2. Parallelism(and concurrency) in Node.js — part2
  3. Parallelism(and concurrency) in Python — part3
  4. Parallelism(and concurrency) in Golang — part 4
  5. Comparison of Parallelism(and concurrency) in Node.js, Python, and Golang — part5
