Internals Of Java Parallel Streams

Why Parallel Streams May Not Be Always Faster

Thameena S
Geek Culture
4 min read · Jun 3, 2021


The Stream API was introduced in Java 8 as an efficient way to operate on collections. Parallel streams were introduced as part of it to enable parallel processing and make applications run faster.
Although parallel streams are meant to improve performance by splitting a task across multiple threads and completing it faster than sequential execution, they can sometimes slow down the entire application.

Threads In Sequential And Parallel Execution

Consider an example where a stream runs over a list of numbers, transforming each number by multiplying it by 10 and returning the result. The name of the thread in use is printed inside transform().
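The embedded code is not shown in this export; a minimal sketch of such a program might look like the following (the class name and the exact list of numbers are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class StreamDemo {

    // Multiplies the number by 10 and prints which thread did the work
    static int transform(int n) {
        System.out.println(Thread.currentThread().getName() + " -> " + n);
        return n * 10;
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);

        // Sequential: every element is processed on the main thread
        List<Integer> sequential = numbers.stream()
                .map(StreamDemo::transform)
                .collect(Collectors.toList());

        // Parallel: work is split between main and ForkJoinPool.commonPool() workers
        List<Integer> parallel = numbers.parallelStream()
                .map(StreamDemo::transform)
                .collect(Collectors.toList());

        System.out.println(sequential);
        System.out.println(parallel);
    }
}
```

Note that even in the parallel case, collect() preserves the encounter order of the source list; only the thread names printed by transform() reveal the parallelism.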

Running the above program produces the following output:

Output of sequential and parallel execution

The sequential execution uses only the main thread, whereas parallel execution uses both the main thread and threads from the ForkJoinPool.

Fork Join Pool

The fork/join framework, introduced in Java 7, speeds up parallel processing by attempting to use all available processor cores through a divide-and-conquer approach. The fork step splits a task into smaller subtasks, which are executed concurrently by different threads. After all the subtasks finish, the join step combines their results into one result. These steps add considerable overhead to parallel execution compared to sequential execution.

The fork/join framework uses a pool of threads managed by a ForkJoinPool. When tasks run in parallel using Java parallel streams, they internally use threads from ForkJoinPool.commonPool(), a static thread pool shared across the entire JVM.

How many threads are available in the commonPool for parallel execution?

The number of threads in ForkJoinPool.commonPool() is one less than the number of logical CPU cores available in your machine.

Code to get the available processors on the machine and the amount of parallelism provided by ForkJoinPool
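The embedded snippet is also missing from this export; a sketch that prints both values might look like this (the class name is illustrative):

```java
import java.util.concurrent.ForkJoinPool;

public class PoolInfo {
    public static void main(String[] args) {
        // Number of logical CPU cores visible to the JVM
        int cores = Runtime.getRuntime().availableProcessors();

        // Worker threads in the common pool (by default, cores - 1)
        int parallelism = ForkJoinPool.commonPool().getParallelism();

        System.out.println("Available processors: " + cores);
        System.out.println("commonPool parallelism: " + parallelism);
    }
}
```

The default can be overridden with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property, so the printed value may differ on a tuned JVM.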

The number of logical CPUs on my machine is 12, and the number of threads that the ForkJoinPool can execute in parallel is 11. So in total, 12 tasks can execute in parallel here (main + ForkJoinPool), using all 12 CPU cores. A parallel stream is configured to use as many threads as there are cores on the computer or VM on which the program runs.

To illustrate this, consider the same program above run with 15 numbers in the list, i.e., 15 tasks submitted for parallel execution. Since only 12 tasks can run in parallel on my machine, the first 12 tasks start immediately and the remaining 3 wait for running threads to finish.
In the output given below, the threads marked T1, T2, and T3 are each used twice. Hence, if the number of tasks to be executed exceeds the number of threads in the common pool, the remaining tasks wait for the running tasks to complete.

Output of parallel execution with 15 numbers in list

Side-Effects Of Parallel Streams

Suppose a parallel stream is run on a CPU-intensive or blocking task. The ForkJoinPool threads used for that operation are not released back into the common pool until the task completes. So on a server that handles thousands of requests per second and runs many parallel streams concurrently, a small number of requests can occupy all the threads in the common pool, causing the remaining requests to queue up and slowing down the entire application.
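To make this concrete, here is a sketch of a parallel stream whose body blocks (the sleep is a stand-in for slow I/O, and the class name is illustrative). While these tasks sleep, the common-pool workers they hold are unavailable to every other parallel stream in the same JVM:

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BlockingDemo {
    public static void main(String[] args) {
        // Each element simulates a blocking call (e.g. a slow HTTP request).
        // The commonPool workers running these lambdas cannot serve any other
        // parallel stream until the sleep returns.
        List.of(1, 2, 3, 4).parallelStream().forEach(n -> {
            try {
                TimeUnit.MILLISECONDS.sleep(200); // stand-in for blocking I/O
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println(Thread.currentThread().getName() + " finished " + n);
        });
    }
}
```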

When using ForkJoinPool threads, tasks should never block and should finish in a reasonable time. Hence, parallel streams should always be used with caution in applications that need to handle many requests simultaneously.
