**Parallel Programming**

Parallel programming is a computing paradigm that helps process large and complex problems faster. It is mainly used in large, scalable systems where even a simple "for" loop takes significant time to complete.

The concept of parallel programming can be visualized in a very simple way. Suppose a 100 m × 100 m wall needs to be built in 1 hour. A person laying 1 brick per minute might not be able to finish the task in 1 hour, but 100 people doing the same task in parallel with that person would definitely finish it within the stipulated time.

A similar analogy can be drawn with the delivery of letters. Suppose 100 letters have to be delivered to 100 different people. One person delivering all 100 letters would definitely take more time than 2 people delivering 50 letters each.

Diving into a more technical aspect: in computations where recursion is involved, processing each call one by one can be slow and can actually result in **time-out errors** on large inputs. This happens because each recursive call takes its own time, and those times keep adding up.

Suppose we have to find the sum of all the elements of an array. Normally we would do it in a single pass using one loop. But with very large arrays, even a sequential O(N) pass can be too slow. Hence, we use parallel programming in such areas.

Before going any further, we need to make ourselves familiar with two terms that form the basics of parallel programming.

1. **Async** - It marks a task that may run asynchronously, i.e. in parallel with other tasks in the program.

2. **Finish** - It marks a segment that contains one or more async tasks and defines the scope at which all of those async tasks must have completed.

This figure describes the usage of the above terms. Here, S2 and S3 run in parallel with each other, while S1 and S4 run sequentially with respect to everything else.

Taking a more "programmer-friendly" example, suppose we have to find the sum of a very large array (say of size 100000000 or more). Normally we would do it with a single pass using one loop, but at this scale even that O(N) sequential pass may be too slow. The total work of summing cannot drop below O(N), but with divide and conquer the two halves can be summed in parallel, shrinking the critical path of the computation. So, for a simple sum using divide and conquer:
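The original code listing did not survive here, so below is a minimal sketch of what such a divide-and-conquer sum could look like (the class and method names are illustrative, not the author's original listing): the range is split in half, each half is summed recursively, and the two partial sums are combined.

```java
// Hypothetical sketch: divide-and-conquer sum of an array, still sequential.
public class SumExample {
    // Sums the elements of a[lo..hi) by splitting the range in half.
    static long sum(int[] a, int lo, int hi) {
        if (hi - lo <= 1) {
            return hi - lo == 0 ? 0 : a[lo];   // base case: 0 or 1 element
        }
        int mid = (lo + hi) / 2;
        long left = sum(a, lo, mid);            // sum of the left half
        long right = sum(a, mid, hi);           // sum of the right half
        return left + right;                    // combine the partial sums
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        System.out.println(sum(a, 0, a.length)); // 15
    }
}
```

As written, the two recursive calls still run one after the other; the point of the next step is to let them run in parallel.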

The above is a simple Java program that uses divide and conquer to find the sum of the array elements. Here, we create a class **Sum** and compute the sum via recursive calls to its **compute** method. Now, the calls to **l.compute()** and **r.compute()** can be wrapped in an imaginary **finish** block: open the **finish** block before **l.compute()**, mark both **l** and **r** as **async**, and end the **finish** block before the partial sums are added into the sum variable.

In **Java**, this async/finish pattern can be realized with **Fork/Join**, an API that helps us do the same thing as above. Instead of async, we call **fork()** on a subtask (e.g. **l.fork()**) to start it in parallel, and **l.join()** marks the scope of the finish block: it waits for the forked task to complete before the sums are combined. In effect, everything between the fork and the join runs in parallel with **l.compute()**. **The more statements between fork and join, the more work runs in parallel with the forked task.**
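The fork/join version of the sum described above can be sketched as follows; this is a reconstruction under the assumptions stated in the text (a **Sum** class with a **compute** method), not the author's original listing. It uses the standard `java.util.concurrent` Fork/Join API:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch of the Sum class the text describes, using Java's Fork/Join API.
public class Sum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this, sum sequentially
    private final int[] array;
    private final int lo, hi;

    public Sum(int[] array, int lo, int hi) { // constructor initializes the passed values
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {            // base case: a plain loop
            long s = 0;
            for (int i = lo; i < hi; i++) s += array[i];
            return s;
        }
        int mid = (lo + hi) / 2;
        Sum l = new Sum(array, lo, mid);
        Sum r = new Sum(array, mid, hi);
        l.fork();                              // ASYNC: left half runs in parallel
        long rightSum = r.compute();           // this thread handles the right half
        long leftSum = l.join();               // FINISH: wait for the forked task
        return leftSum + rightSum;
    }

    public static void main(String[] args) {
        int[] a = new int[1_000_000];
        for (int i = 0; i < a.length; i++) a[i] = 1;
        long total = new ForkJoinPool().invoke(new Sum(a, 0, a.length));
        System.out.println(total); // 1000000
    }
}
```

Note the common idiom here: fork one half and compute the other on the current thread, so the calling thread does useful work instead of just waiting.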

The above example is just one case and a rough explanation of the core of the actual program. There are many ways to implement the same divide-and-conquer sum, but this version seems the easiest and hence is explained here. Also, the **Sum** constructor is used to initialize the passed values into the fields.

**Graphical Representation of Parallelism**

Parallelism, more than anything, is a visual concept, and hence graphs can help us easily understand the process. Take the example from the picture above (Fig 1.0).

Here there are 4 processes, S1…S4, of which S2 and S3 run in parallel with each other. From the code alone it can be confusing to tell which processes run in parallel with one another, so to remove this confusion we use graphical methods.

The above picture displays the graphical method of representing this process. The edge from S1 to S2 is called the **FORK** edge, while the edge from S2 to S4 is called the **JOIN** edge. The remaining edges (through S1, S3, and S4) are ordinary edges that represent the continuous flow of the program. Now, if there exists a path of directed edges between two nodes, they are not in parallel; whereas if there is no directed path between two processes in either direction, they are said to be in parallel. Here, as there is no path between S2 and S3, they are in parallel.
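This "no directed path in either direction" rule can be checked mechanically. Below is a small sketch (my own illustration, not from the article) that models the figure's graph as an adjacency list and tests parallelism via reachability:

```java
import java.util.*;

// Sketch: model the computation graph of Fig 1.0 (S1 forks into S2 and S3,
// which join into S4) and test parallelism as "no directed path either way".
public class CompGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Depth-first search over directed edges (fine for an acyclic graph).
    boolean reaches(String from, String to) {
        if (from.equals(to)) return true;
        for (String next : edges.getOrDefault(from, List.of()))
            if (reaches(next, to)) return true;
        return false;
    }

    // Two nodes are in parallel iff neither can reach the other.
    boolean inParallel(String a, String b) {
        return !reaches(a, b) && !reaches(b, a);
    }

    public static void main(String[] args) {
        CompGraph g = new CompGraph();
        g.addEdge("S1", "S2"); // FORK edge
        g.addEdge("S1", "S3");
        g.addEdge("S2", "S4"); // JOIN edge
        g.addEdge("S3", "S4");
        System.out.println(g.inParallel("S2", "S3")); // true
        System.out.println(g.inParallel("S1", "S4")); // false (path S1 -> S2 -> S4)
    }
}
```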

**Characteristics of Representation Graph**

**1.** **SPAN:** — The longest path from the start process to the end process (in the above example, either S1–S3–S4 or S1–S2–S4, whichever takes longer).

**2.** **WORK:** — The sum of the execution times of all the processes/nodes.
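The two metrics can be computed directly on the example graph. The execution times below are hypothetical values I have assumed for illustration; WORK is the sum of all node times, and SPAN is the time along the longest path:

```java
import java.util.*;

// Sketch: compute WORK and SPAN for the example graph, with
// hypothetical execution times assumed for S1..S4.
public class SpanWork {
    // SPAN from a node: its own time plus the longest span among successors.
    static int span(String node, Map<String, List<String>> succ, Map<String, Integer> time) {
        int best = 0;
        for (String next : succ.get(node))
            best = Math.max(best, span(next, succ, time));
        return time.get(node) + best;
    }

    public static void main(String[] args) {
        Map<String, Integer> time = Map.of("S1", 1, "S2", 3, "S3", 2, "S4", 1);
        Map<String, List<String>> succ = Map.of(
            "S1", List.of("S2", "S3"),
            "S2", List.of("S4"),
            "S3", List.of("S4"),
            "S4", List.of());

        int work = time.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println("WORK = " + work);                 // 1+3+2+1 = 7
        System.out.println("SPAN = " + span("S1", succ, time)); // S1-S2-S4 = 5
    }
}
```

Here the longer branch (S1–S2–S4) determines the span: even with unlimited processors, the computation cannot finish in less than 5 time units.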

The above two metrics together determine the parallelism in a program: the ideal parallelism is the **WORK/SPAN** ratio. If this ratio is close to 1 (i.e., the span is nearly as long as the total work), the program is essentially sequential and parallel execution gains little; the larger the ratio, the more speedup parallelism can deliver.

The max speed, or the best performance that can be attained by the machine while doing a task with P processors, is SpeedUp(P). Formulaically, SpeedUp(P) = T1/TP, which is <= the number of processors P and also <= the ideal parallelism, i.e. WORK/SPAN.

Where T1 = the time taken with 1 processor and TP = the time taken with P processors.
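As a small worked illustration of these bounds (with assumed, not measured, values of WORK and SPAN):

```java
// Sketch: upper bound on speedup with P processors is min(P, WORK/SPAN).
public class Speedup {
    static double speedupBound(double work, double span, int p) {
        return Math.min(p, work / span);
    }

    public static void main(String[] args) {
        double work = 100, span = 10; // hypothetical measurements
        // With few processors the bound is P; with many, WORK/SPAN caps it.
        System.out.println(speedupBound(work, span, 4));    // 4.0  (processor-limited)
        System.out.println(speedupBound(work, span, 1000)); // 10.0 (parallelism-limited)
    }
}
```

So beyond 10 processors, this hypothetical program cannot get any faster: the span of 10 time units is a hard floor on its running time.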

Parallelism is a very broad and deep concept, and I hope this article helps you get a rough idea of it.

**Thanks for reading!**