In this post, we will look at parallel streams in Java.
Java 8 introduced parallel streams for parallel processing. Because multi-core CPUs are now common and cheap, parallel processing can be used to perform operations faster.
Let’s understand this with the help of a simple example:
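A minimal sketch of such a program follows. It assumes an int array of 1 to 10 and prints each element next to the name of the thread that processed it. The class name `ParallelStreamDemo` and the helper `threadNames` are hypothetical, chosen for illustration; the helper also returns the set of thread names so the difference between the two modes is easy to check.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class ParallelStreamDemo {

    // Hypothetical helper: runs forEach over the array and records the name
    // of every thread that processed an element.
    static Set<String> threadNames(int[] array, boolean parallel) {
        // Thread-safe set, because a parallel forEach calls the lambda concurrently.
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream stream = Arrays.stream(array);
        if (parallel) {
            stream = stream.parallel();
        }
        stream.forEach(n -> {
            String name = Thread.currentThread().getName();
            names.add(name);
            System.out.println(n + " " + name);
        });
        return names;
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

        System.out.println("Using Sequential Stream");
        System.out.println("Threads used: " + threadNames(array, false));

        System.out.println("Using Parallel Stream");
        System.out.println("Threads used: " + threadNames(array, true));
    }
}
```

With the sequential stream, every element is handled by the calling thread (`main`). With the parallel stream, the elements are spread across the calling thread and worker threads of the common ForkJoinPool, and the print order is no longer guaranteed.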
When you run the above program, you will get the following output:
Using Sequential Stream
Using Parallel Stream
As the output shows, the main thread does all the work in the case of the sequential stream: it waits for the current iteration to complete and then works on the next one.
In the case of the parallel stream, 4 threads work simultaneously, and internally the stream uses the Fork/Join framework to create and manage threads. Parallel streams obtain their ForkJoinPool instance via a static method.
A parallel stream takes advantage of all available CPU cores and processes tasks in parallel. If the number of tasks exceeds the number of cores, the remaining tasks wait for the currently running tasks to complete.
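To see how many threads a parallel stream can use on your machine, you can compare the number of available processors with the parallelism of the shared pool returned by `ForkJoinPool.commonPool()`, which is where parallel streams run by default. The sketch below is illustrative; the class name `CommonPoolInfo` and the helper `commonPoolParallelism` are hypothetical. By default the common pool's parallelism is one less than the number of available processors (at least 1), and the thread that calls the terminal operation also takes part in the work. The default can be overridden with the system property `java.util.concurrent.ForkJoinPool.common.parallelism`.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {

    // Hypothetical helper: the parallelism level of the shared common pool.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available cores: " + cores);
        // Typically cores - 1 by default, since the calling thread also does work.
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

On a 4-core machine with default settings, this would typically report a parallelism of 3, which together with the calling thread gives the 4 threads seen in the parallel output.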
Parallel streams are cool, so should you always use them?
A big no!
Just because it is easy to convert a sequential stream to a parallel stream by adding .parallel() does not mean you should always do it.
There are many factors to consider when using parallel streams; otherwise, you may suffer from their negative impacts.
A parallel stream has much higher overhead than a sequential stream, and coordinating work between threads takes a significant amount of time.
You should consider a parallel stream only if:
- You have a large dataset to process.
- Your source can be split efficiently. Java uses ForkJoinPool to achieve parallelism: the source stream is forked into parts that are submitted for execution, so the source should be easy to split.
ArrayList is very easy to split, because its middle element can be found by index, but LinkedList is hard to split and does not perform well in most cases.
- You are actually suffering from performance issues.
- You have made sure that all resources shared between threads are properly synchronized; otherwise, the program might produce unexpected results.
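The last point is easy to get wrong. The sketch below, with hypothetical class and method names, compares three ways of gathering the results of a parallel stream into a list. Adding to a plain `ArrayList` from `forEach` is a data race; wrapping the list with `Collections.synchronizedList` makes it correct but forces every thread through one lock; letting the stream collect the results avoids shared mutable state altogether.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SharedStateDemo {

    // UNSAFE: several threads call add() on a plain ArrayList at the same time.
    // Elements can be lost, or an ArrayIndexOutOfBoundsException can be thrown.
    static List<Integer> unsafeCollect(int n) {
        List<Integer> result = new ArrayList<>();
        IntStream.range(0, n).parallel().forEach(result::add);
        return result;
    }

    // Correct, but every add() contends for the same lock.
    static List<Integer> synchronizedCollect(int n) {
        List<Integer> result = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(result::add);
        return result;
    }

    // Preferred: no shared state. The stream builds partial lists per task
    // and merges them, preserving encounter order.
    static List<Integer> collectWithoutSharing(int n) {
        return IntStream.range(0, n).parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int n = 100_000;
        try {
            System.out.println("Unsafe size: " + unsafeCollect(n).size());
        } catch (RuntimeException e) {
            System.out.println("Unsafe version failed: " + e);
        }
        System.out.println("Synchronized size: " + synchronizedCollect(n).size());
        System.out.println("Collector size: " + collectWithoutSharing(n).size());
    }
}
```

The unsafe version may report a size smaller than 100000 or fail outright, and the result can change from run to run, which is exactly the kind of "unexpected result" the list warns about.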
The simplest rule of thumb for judging whether parallelism will pay off is the “NQ” model that Brian Goetz described in his presentation:
N × Q > 10,000
N = number of items in the dataset
Q = amount of work per item
This means that if you have a large dataset and little work per item (for example, a sum), parallelism might help your program run faster. The reverse is also true: if you have a small dataset but a lot of work per item (some heavy computation), parallelism can still help you get results faster.
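As a worked illustration of the rule, the hypothetical helper below simply encodes N × Q > 10,000. Q is only a rough, unitless estimate of the cost per item (here 1 stands for a trivial operation such as an addition), so the helper is a back-of-the-envelope check, not a replacement for measuring.

```java
class NqModel {

    // Hypothetical helper encoding the rule of thumb N x Q > 10,000.
    // n: number of items in the dataset; q: rough amount of work per item.
    static boolean nqSuggestsParallel(long n, long q) {
        return n * q > 10_000;
    }

    public static void main(String[] args) {
        // Large dataset, cheap work per item (e.g. a sum): 1,000,000 x 1
        System.out.println(nqSuggestsParallel(1_000_000, 1)); // prints true
        // Small dataset, expensive work per item: 100 x 500 = 50,000
        System.out.println(nqSuggestsParallel(100, 500));     // prints true
        // Small dataset, cheap work per item: 1,000 x 1
        System.out.println(nqSuggestsParallel(1_000, 1));     // prints false
    }
}
```

The first two cases correspond to the two situations described above; only the last one, with both a small N and a small Q, falls below the threshold.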
Let’s see with the help of another example.
In this example, we will see how the CPU behaves when you perform long computations with a parallel stream versus a sequential stream. We do some arbitrary calculations to keep the CPU busy.
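A minimal sketch of this kind of experiment is shown below. The calculation in `busyWork` is an arbitrary stand-in for "some arbitrary calculations", not the original code, and the class and method names are hypothetical. Switching between sequential and parallel execution only requires the `.parallel()` call.

```java
import java.util.stream.LongStream;

class CpuBoundDemo {

    // Hypothetical busy work: an arbitrary calculation that keeps the CPU busy.
    static double busyWork(long i) {
        double x = 0;
        for (int k = 1; k <= 1_000; k++) {
            x += Math.sqrt(i * k) / k;
        }
        return x;
    }

    static double compute(long n, boolean parallel) {
        LongStream stream = LongStream.rangeClosed(1, n);
        if (parallel) {
            stream = stream.parallel(); // the only change needed to switch modes
        }
        return stream.mapToDouble(CpuBoundDemo::busyWork).sum();
    }

    public static void main(String[] args) {
        long n = 200_000;
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            double result = compute(n, parallel);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println((parallel ? "Parallel" : "Sequential")
                    + ": result=" + result + ", time=" + millis + " ms");
        }
    }
}
```

Watching a CPU monitor while this runs should show roughly one busy core for the sequential pass and all cores busy for the parallel pass. The timings printed here are rough (the first pass also pays for JIT warm-up, and floating-point sums can differ in the last digits between the two modes); for trustworthy numbers, a benchmark harness such as JMH is a better choice.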
When you run the above program, you will get the following output:
Time taken to complete: 6 minutes
However, we are not interested in the output here, but in how the CPU behaved while the above operation was running.
As you can see, the CPU is not fully utilized with the sequential stream.
Let’s change line 16 to make the stream parallel and run the program again.
You will get the following output when you run the stream in parallel:
Time taken to complete: 3 minutes
Let’s check the CPU history for the run that used a parallel stream.
As you can see, the parallel stream used all 4 CPU cores to perform the computation.
That’s all about parallel streams in Java.
Originally published at https://java2blog.com on April 23, 2019.