The unparalleled design of Eclipse Collections

Donald Raab
Published in Javarevisited · Oct 31, 2021

Exploring the evolutionary design of parallelism in Eclipse Collections


Plenty of data, memory, and cores

In 2006, I was working on a proprietary distributed Java caching architecture in a large financial services company. We cached and indexed all of the firm’s trades and positions in a complex object graph in memory so we could slice and dice through it at high speed. By this time, I had already built and was extensively leveraging the serial eager API of an internal library named Caramel. This library would eventually become what we now know as Eclipse Collections.

In the area where I was working in 2006, we had many caches, tens of gigabytes in size, networked together. There were many large collections (millions of objects) that we would iterate through in memory to perform various pipelined calculations, which most often involved grouping, netting, and aggregating various balances.

Around this time, I would discover the Fork/Join framework from Doug Lea in the original EDU.oswego concurrent utility library. The Fork/Join framework was initially left out of java.util.concurrent when Java 5 was released. It would later be included in Java 7. I learned about the Fork/Join framework by reading “Concurrent Programming in Java” (CPJ) and the paper “A Java Fork/Join Framework,” both by Doug Lea.

I went looking for my copy of CPJ on my bookshelf as I was writing this blog, and when I couldn’t find it, I bought myself a brand new copy. This book is a must-read for all aspiring senior Java developers.

Eager Serial, Parallel and Fork/Join

The initial methods in the Caramel library were all serial and eager. I would occasionally add fused methods that combine a couple of operations to increase performance. For example, the method collectIf is a combination of select (inclusive filter) and collect (map or transform). It would be a few years before we would add lazy methods to the API in Eclipse Collections.
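
collectIf fuses the two steps into a single pass over the data. Here is a minimal sketch of the difference, assuming current Eclipse Collections APIs:

import org.eclipse.collections.api.list.MutableList;
import org.eclipse.collections.impl.factory.Lists;

public class CollectIfExample
{
    public static void main(String[] args)
    {
        MutableList<Integer> numbers = Lists.mutable.of(1, 2, 3, 4, 5, 6);

        // Two passes: filter first, then transform
        MutableList<String> twoPasses = numbers
                .select(each -> each % 2 == 0)
                .collect(String::valueOf);

        // One fused pass: filter and transform in a single iteration
        MutableList<String> onePass = numbers
                .collectIf(each -> each % 2 == 0, String::valueOf);

        System.out.println(twoPasses.equals(onePass)); // true
    }
}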

For more information and examples of fused methods, please check out the following blog.

The initial implementation of the parallel API in Caramel was eager. A utility class that provides this eager parallel functionality is still available in Eclipse Collections today. The class is named ParallelIterate. The class initially used the Fork/Join framework from the EDU.oswego concurrent library. It was later converted to use Executor after the java.util.concurrent package was introduced in Java 5 without the Fork/Join framework. When the Fork/Join framework was added in Java 7, a new utility class named FJIterate was added to GS Collections (the predecessor of Eclipse Collections). FJIterate has existed since mid-2013, two years after Java 7 was released (July 2011). It is included in its own module in Eclipse Collections, is distributed as a separate jar file, and requires an extra Maven dependency if you want to use it:

<dependency>
    <groupId>org.eclipse.collections</groupId>
    <artifactId>eclipse-collections-forkjoin</artifactId>
    <version>${eclipse-collections.version}</version>
</dependency>

The methods available on ParallelIterate and FJIterate are almost the same. The implementations are fairly similar, with the primary difference being that ParallelIterate uses Executor and FJIterate uses the Fork/Join framework. Using Executor makes ParallelIterate more suitable for some tasks. For raw, in-memory compute performance, FJIterate is sometimes the better option.

Both ParallelIterate and FJIterate were built for the same reason. We wanted raw parallel performance for large in-memory data sets backed by arrays. Both classes will parallelize execution of a fixed set of algorithms for any Iterable type. The primary workhorse for both ParallelIterate and FJIterate is a parallel forEach. All of the other parallel algorithms are implemented using parallel forEach. There are twelve overloaded forms of forEach on ParallelIterate and FJIterate. The overloads take different parameters to give as much control as possible to developers. The design rationale for this is simple: we believed that if someone was able to prove that they would benefit from parallelism, then they would be in the best position to decide how to tune the various parameters to squeeze as much performance as possible out of the hardware for their parallel use cases.
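
Here is a minimal sketch of parallel forEach using the simplest overload and one of the tuned overloads. The exact overload signatures are assumed from current Eclipse Collections, and the batch and pool sizes shown are arbitrary illustrations:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

import org.eclipse.collections.api.list.MutableList;
import org.eclipse.collections.impl.list.Interval;
import org.eclipse.collections.impl.parallel.ParallelIterate;

public class ParallelForEachExample
{
    public static void main(String[] args)
    {
        MutableList<Integer> numbers = Interval.oneTo(1_000_000).toList();

        // Simplest overload: library-chosen defaults for batching and threads
        LongAdder sum = new LongAdder();
        ParallelIterate.forEach(numbers, sum::add);

        // Tuned overload: the caller controls the batch size and the Executor
        ExecutorService executor = Executors.newFixedThreadPool(4);
        LongAdder tunedSum = new LongAdder();
        ParallelIterate.forEach(numbers, tunedSum::add, 50_000, executor);
        executor.shutdown();

        System.out.println(sum.sum() == tunedSum.sum()); // true
    }
}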

ParallelIterate

Here are the methods available on ParallelIterate.

FJIterate

Here are the methods available on FJIterate.

ParallelIterate vs. FJIterate

The symmetric difference and intersection of the methods on the two utility classes are as follows.

Symmetric Difference and Intersection of ParallelIterate and FJIterate methods

The biggest difference is that several sumBy methods were added to ParallelIterate but not ported over to FJIterate.

The Futility of Utility

The downside of utility classes like ParallelIterate and FJIterate is that methods can only return a single type. You only get one choice, so if you want to return different implementations from a method based on the input parameter type, you have to choose a common parent type. Methods on ParallelIterate and FJIterate take any java.lang.Iterable as an input parameter, and methods that return collections (e.g. select, reject, collect) unfortunately have to return java.util.Collection. Developers can control the return type by using overloaded methods with the same name that take a target collection as a parameter. If the Iterable used as the source parameter implements the BatchIterable interface, it will be optimized for parallelism by both ParallelIterate and FJIterate. If the source implements neither BatchIterable nor java.util.List, both utility classes will default to copying the elements to an array before parallelizing.

Here are some examples of ParallelIterate using the basic form of select (inclusive filter) and the overloaded form of select that takes a target collection.
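
A minimal sketch of the two forms follows. The overload taking a target collection and an allowReorderedResult flag is assumed from current Eclipse Collections signatures:

import java.util.Collection;

import org.eclipse.collections.api.list.MutableList;
import org.eclipse.collections.api.set.MutableSet;
import org.eclipse.collections.impl.factory.Sets;
import org.eclipse.collections.impl.list.Interval;
import org.eclipse.collections.impl.parallel.ParallelIterate;

public class ParallelSelectExample
{
    public static void main(String[] args)
    {
        MutableList<Integer> list = Interval.oneTo(1_000_000).toList();

        // Basic form: the return type is the overly abstract Collection
        Collection<Integer> evens =
                ParallelIterate.select(list, each -> each % 2 == 0);

        // Overloaded form: the caller chooses the target collection type
        MutableSet<Integer> evenSet = ParallelIterate.select(
                list,
                each -> each % 2 == 0,
                Sets.mutable.empty(),
                true); // allow reordered results for better parallel throughput

        System.out.println(evens.size() == evenSet.size()); // true
    }
}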

The ParallelIterate select method returns a type similar to the Iterable type that is passed in. For a List, a List implementation will be returned. For a Set, a Set implementation is returned. Unfortunately, since there can only be one signature for this method, the return type has to be the most abstract type, which is Collection. Collection as an interface is not terribly useful. If I ever get around to refactoring the utility, I will return MutableCollection or RichIterable instead of Collection. This will make the utility methods a lot more useful, and maybe just slightly less futile.

Lazy asParallel

We took a different approach when it came to designing and implementing the lazy parallel API in Eclipse Collections. We decided we would require developers to provide an Executor and a batch size instead of offering up multiple combinations of knobs and switches to configure via overloads as we did with our parallel utility classes. Based on our experience with supporting eager parallelism for seven years, these two parameters seemed to be the most important configuration options that developers wanted control over. This makes the lazy parallel API in Eclipse Collections slightly harder to use than parallelStream in the JDK. This is by design. It should be harder for a developer to write parallel code, because they first need to determine if using a parallel algorithm will speed up or slow down an operation. If a developer understands exactly what they are doing, because they ran benchmarks and have proven parallelism will indeed help the performance of a specific use case, then they will be in the best position to determine how to configure the parallelism for optimal performance.

The other more important difference between the eager and lazy parallel API, is that the algorithms are available through data structures themselves for lazy, instead of being located in a utility class like ParallelIterate.

Here’s an example that takes a million integers and filters all of the prime values into a Bag. I show a fluent approach first, and then break the fluent calls into their intermediate results to show the intermediate return types.
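
A minimal sketch of that example follows. The isPrime helper here is hypothetical, and the pool and batch sizes are arbitrary choices:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.eclipse.collections.api.bag.MutableBag;
import org.eclipse.collections.api.list.MutableList;
import org.eclipse.collections.api.list.ParallelListIterable;
import org.eclipse.collections.impl.list.Interval;

public class AsParallelExample
{
    // Hypothetical helper: naive trial-division primality test
    private static boolean isPrime(int n)
    {
        if (n < 2)
        {
            return false;
        }
        for (int i = 2; (long) i * i <= n; i++)
        {
            if (n % i == 0)
            {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        int batchSize = 10_000;
        MutableList<Integer> integers = Interval.oneTo(1_000_000).toList();

        // Fluent: nothing executes until the terminal toBag() call
        MutableBag<Integer> primes = integers
                .asParallel(executor, batchSize)
                .select(AsParallelExample::isPrime)
                .toBag();

        // The same calls, broken apart to show the intermediate lazy types
        ParallelListIterable<Integer> parallel = integers.asParallel(executor, batchSize);
        ParallelListIterable<Integer> selected = parallel.select(AsParallelExample::isPrime);
        MutableBag<Integer> bag = selected.toBag();

        executor.shutdown();
        System.out.println(primes.equals(bag)); // true
    }
}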

Notice there is a very specific type named ParallelListIterable that is returned from asParallel. This type is lazy, so no real computation occurs until a terminal operation is called. The same type is returned after calling select. The method toBag is a terminal operation and results in a MutableBag being created. Now let’s look at what happens to our types if the initial collection is a MutableSet instead of a MutableList.

Notice the return type for asParallel for a MutableSet is ParallelUnsortedSetIterable.
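
A minimal sketch with a set source (pool and batch size arbitrary, as before):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.eclipse.collections.api.set.MutableSet;
import org.eclipse.collections.api.set.ParallelUnsortedSetIterable;
import org.eclipse.collections.impl.list.Interval;

public class SetAsParallelExample
{
    public static void main(String[] args)
    {
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        MutableSet<Integer> integers = Interval.oneTo(1_000_000).toSet();

        // A set source returns a set-shaped parallel lazy type
        ParallelUnsortedSetIterable<Integer> parallelSet =
                integers.asParallel(executor, 10_000);

        System.out.println(parallelSet.select(each -> each % 2 == 0).toBag().size());
        executor.shutdown();
    }
}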

ParallelListIterable vs. ParallelIterate

If we compare the methods available on ParallelListIterable with the methods available on ParallelIterate, it will become evident that a lot more investment has been made in growing the parallel lazy API in Eclipse Collections. The following shows the symmetric difference and intersection of methods available between both.

Symmetric Difference and Intersection of ParallelListIterable and ParallelIterate methods

JDK stream vs. parallelStream

Have you ever noticed that the return type of stream and parallelStream in the JDK is the same type? They both return Stream. You might think that the implementations returned by the two methods are different classes implementing the same interface, but they are not. Both stream and parallelStream return a new instance of ReferencePipeline.Head. The difference between them is a boolean parameter named parallel. What this means is that the serial and parallel code paths are mixed together, usually splitting on a boolean expression involving a call to a method named isParallel wherever parallelism might choose a different path. I searched for usages of isParallel in AbstractPipeline and found 48 usages across the parent class and four subclasses (ReferencePipeline, IntPipeline, LongPipeline, DoublePipeline).
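
A quick sketch you can run to observe the shared implementation and its parallel flag (isParallel is public on BaseStream):

import java.util.List;
import java.util.stream.Stream;

public class StreamParallelFlagExample
{
    public static void main(String[] args)
    {
        List<Integer> list = List.of(1, 2, 3, 4, 5);

        Stream<Integer> serial = list.stream();
        Stream<Integer> parallel = list.parallelStream();

        System.out.println(serial.isParallel());   // false
        System.out.println(parallel.isParallel()); // true

        // The flag can even be flipped mid-pipeline on the same implementation
        long count = list.stream().parallel().filter(i -> i > 2).count();
        System.out.println(count); // 3
    }
}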

The upside here is that the serial lazy and parallel lazy APIs in the JDK streams are identical. Having a single implementation class for both serial and parallel guarantees this, as they share the exact same code paths. The downside is that the code paths are hard to understand just by reading the code, and are very difficult to trace, even with the help of a debugger.

Eclipse Collections asLazy vs. asParallel

We’ve already seen that the return types for asParallel are covariant for the types the method is defined on. The return type will always be a subtype of ParallelIterable. ParallelIterable has no direct relation to RichIterable. The method asLazy, which is defined on RichIterable returns a LazyIterable. LazyIterable extends RichIterable.
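
A minimal sketch contrasting the two entry points (the pool and batch size are arbitrary):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.eclipse.collections.api.LazyIterable;
import org.eclipse.collections.api.list.MutableList;
import org.eclipse.collections.api.list.ParallelListIterable;
import org.eclipse.collections.impl.factory.Lists;

public class LazyVersusParallelExample
{
    public static void main(String[] args)
    {
        MutableList<Integer> list = Lists.mutable.of(1, 2, 3, 4, 5);

        // asLazy returns a LazyIterable, which is itself a RichIterable
        LazyIterable<Integer> lazy = list.asLazy();

        // asParallel returns a ParallelIterable subtype, outside the RichIterable hierarchy
        ExecutorService executor = Executors.newFixedThreadPool(2);
        ParallelListIterable<Integer> parallel = list.asParallel(executor, 2);

        System.out.println(lazy.select(i -> i > 2).size());              // 3
        System.out.println(parallel.select(i -> i > 2).toList().size()); // 3
        executor.shutdown();
    }
}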

The following class diagram shows the inheritance relationships between RichIterable, LazyIterable and ParallelIterable.

RichIterable, LazyIterable and ParallelIterable Interfaces

RichIterable is the parent type for LazyIterable and all of the container types in Eclipse Collections (e.g. MutableList/ImmutableList, MutableSet/ImmutableSet, etc.). LazyIterable provides serial lazy algorithm implementations.

ParallelIterable is the parent type for all corresponding parallel lazy adapters. There is a distinct split between serial and parallel API in Eclipse Collections. This means there is asymmetry between LazyIterable and ParallelIterable. This allows us to limit the parallel API to only those algorithms where there would be a reasonable performance benefit if parallelized. This also allows the serial implementations to be as simple as possible, and the parallel implementations can be optimized for specific types (e.g. Bag, List, Set).

LazyIterable vs. ParallelIterable

There are a lot more methods available on LazyIterable than on ParallelIterable. This can change over time if we determine that there is a need and a benefit to implementing a parallel version of a lazy serial method.

Symmetric Difference and Intersection of LazyIterable and ParallelIterable methods

Performance Benchmarks

I wrote some benchmarks a few years ago comparing filter, map, reduce, and filter+map+reduce for combinations of serial, parallel, eager, lazy, object, and primitive types. The code and the results for the benchmarks were captured in the following blog. As you’ll see in the blog, I ran the benchmarks on JDK 8.

When I started writing this blog, I decided I wanted to write new benchmarks. I wanted to run the benchmarks on JDK 17 so I could see how the older eager parallel and fork/join utility classes held up with all of the optimizations that have arrived in the last nine versions of the JDK. I also wanted to make the benchmark code immediately available in open source so developers can experiment with it and arrive at their own conclusions on their own hardware. The benchmarks are part of the JMH Kata module in the BNYM Code Katas repo. This time I focused on a use case for filter+map. There is a fused method for filter+map on the ParallelIterate and FJIterate utility classes named collectIf. This method is also available on the serial API for Eclipse Collections containers.

The JMH Kata is what I refer to as a “sandbox kata”. You can use it as a sandbox to run your own experiments and try out your own benchmarks. It’s set up to run a few starter JMH benchmarks, and saves you the time of setting up a project to do the same.

Hardware

I used my Mac Pro “trash can” with the following hardware specs to measure the benchmarks:

Processor Name: 12-Core Intel Xeon E5
Processor Speed: 2.7 GHz
Number of Processors: 1
Total Number of Cores: 12
L2 Cache (per Core): 256 KB
L3 Cache: 30 MB
Memory: 64 GB

Software and Benchmark Configuration

I used OpenJDK 17 with Java Streams and Eclipse Collections 11.0.0.M2. I used JMH version 1.33 as the benchmark harness for the tests. I ran with 10 warmup iterations, and 10 measurement iterations with 2 forks. The warmup time and measurement time are both 5 seconds. I am using Mode.Throughput with the tests so they are easy to read. The numbers are in Operations per Second. The bigger the number, the better the result.
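
Here is a sketch of a benchmark class using that harness configuration. The class name and benchmark method are illustrative stand-ins, not the actual kata code:

import java.util.concurrent.TimeUnit;

import org.eclipse.collections.api.list.MutableList;
import org.eclipse.collections.impl.list.Interval;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(2)
public class FilterMapBenchmark
{
    private final MutableList<Integer> data = Interval.oneTo(1_000_000).toList();

    // Eager serial fused filter + map
    @Benchmark
    public MutableList<String> eagerSerialCollectIf()
    {
        return this.data.collectIf(each -> each % 2 == 0, String::valueOf);
    }
}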

Data

I ran the benchmarks using a simple class named Person. Person has a name (String), an age (int), a heightInInches (double), and a weightInPounds (double). A sketch of the class appears after the list below. I ran the benchmarks for the following data sizes.

  • 10,000 (filters and maps 4,995 values)
  • 100,000 (filters and maps 49,942 values)
  • 1,000,000 (filters and maps 499,615 values)
  • 8,675,309 (filters and maps 4,337,179 values)
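
Here is a sketch of Person as a JDK 17 record; the actual class in the kata repo may differ:

// Field names taken from the description above; a record keeps the sketch short
public record Person(String name, int age, double heightInInches, double weightInPounds)
{
}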

The Charts

I sorted the columns in each chart from least to greatest, so it is easy to find the slowest (far left) and fastest (far right) results. Be aware that the column order may differ between data sizes.

Results — 10K People

Fastest Parallel: Eclipse Collections Eager Parallel (ParallelIterate)
Slowest Parallel: JDK Parallel Stream.toList()

Fastest Serial: Eclipse Collections Eager Serial
Slowest Serial: JDK Serial Stream Collectors.toList()

Filter / Map — Ops per second — 10,000 People

Results — 100K People

Fastest Parallel: Eclipse Collections Eager Parallel (ParallelIterate)
Slowest Parallel: JDK Parallel Stream Collectors.toList()

Fastest Serial: Eclipse Collections Eager Serial
Slowest Serial: JDK Serial Stream Collectors.toList()

Filter / Map — Ops per second — 100,000 People

Results — 1 Million People

Fastest Parallel: Eclipse Collections Eager Fork/Join (FJIterate)
Slowest Parallel: Eclipse Collections Lazy asParallel()

Fastest Serial: Eclipse Collections Eager Serial
Slowest Serial: JDK Serial Stream Collectors.toList()

Filter / Map — Ops per second — 1,000,000 People

Results — 8,675,309 People

Fastest Parallel: Eclipse Collections Eager Fork/Join (FJIterate)
Slowest Parallel: JDK Parallel Stream Collectors.toList()

Fastest Serial: Eclipse Collections Eager Serial
Slowest Serial: JDK Serial Stream Collectors.toList()

Filter / Map — Ops per second — 8,675,309 People

Results — JMH Output

Below is the raw consolidated JMH output used in the graphs above. There are also three mega sizes I tested with (25M, 50M, 100M) that I have not included graphs for. I had to switch from operations per second to milliseconds per operation for those sizes, and I didn’t want mixed units to make the graphs confusing. For the mega sizes, smaller numbers are better. The results with the mega sizes were consistent, with Eclipse Collections Eager Fork/Join (FJIterate) being the fastest for parallel. Eclipse Collections Eager Serial was the fastest for serial in all but the largest test, where JDK Serial Stream.toList() came out on top.

Some Lessons Learned

After more than 15 years of building parallel eager utility classes in Eclipse Collections, I’ve learned a few things. I had forgotten some of the lessons I learned along the way, but writing this blog has helped me re-discover some of them while poring over the code. Writing efficient parallel algorithms is extremely hard work, and you will spend a lot of time running and re-running benchmarks. It is a rabbit hole, and you will lose days or weeks of your life if you fall into it.

You can sometimes tune performance for specific eager algorithms so that maybe you will get a 5%, 10% or maybe even 20% speedup over more general lazy algorithms. If performance is really important to you, then you may find implementing specific use cases with lower level frameworks like Fork/Join or Executors will be beneficial. Sometimes even hand coding an algorithm using a higher level construct like a parallel forEach with an efficient concurrent data structure will give good returns.

In 2013, buying a personal desktop machine with a decent number of cores and RAM that I could run parallel benchmarks on seemed like a good long-term investment for Eclipse Collections. In retrospect, I think it was a good investment, as I have used the machine to prepare benchmarks for various talks and blogs over the years. My plan has been to not even look at upgrading my personal desktop until 10 years have passed. Surprisingly, even with all the promise of multiple cores showing up in laptops and desktop machines, it hasn’t been until relatively recently that we’ve seen a decent uptick in cores and RAM for less than the totally outrageous price I paid for my Mac Pro “trash can” in 2013.

Even though I have run a lot of benchmarks on the Mac Pro over the years, I haven’t actually done much tuning of the parallel algorithms in Eclipse Collections. I had previously tested Eclipse Collections on an extremely large machine at my previous employer (24 cores, 256GB RAM). We were already seeing good speedups for many of the parallel eager and lazy algorithms we implemented. As I mentioned above, our parallel lazy algorithms were implemented more recently than the parallel eager ones, but also haven’t really been tuned much since late 2014. Craig Motlin gave a great talk in June 2014 on the Eclipse Collections parallel lazy implementation approach. It has some great explanations and lessons on how three different implementations (Java 8, Scala, and Eclipse Collections, previously GS Collections) were tuned for specific parallel algorithms. I will link to it here for anyone who is looking to learn some great lessons about optimization strategies for parallel algorithms.

The Future

Now that JDK 17 is released, and there are new, cheaper machines with more cores available on the market, it might be worthwhile testing and tuning the parallel algorithms in Eclipse Collections again. It might also be useful to expand on the current parallel lazy implementation. Java Streams seem to be improving for some parallel use cases, and can probably still benefit from approaches that Eclipse Collections uses to optimize specific algorithms like filter and map. Craig describes the approach we use in his talk above, so it is definitely worth watching. Often the future can benefit from lessons learned in the past.

I would like to refactor and clean up the parallel eager implementations in Eclipse Collections and improve the symmetry between ParallelIterate and FJIterate. The biggest change I would like to make is to change the return type from Collection to either RichIterable or MutableCollection in ParallelIterate and FJIterate.

I would also like to see some folks opportunistically pick up and continue the work on the parallel lazy implementation of Eclipse Collections. There are a lot of methods that have not been added yet, as I illustrated above in the difference between LazyIterable and ParallelIterable. There is a cost and a benefit to improving symmetry. So far, the cost of adding more parallel methods has outweighed the benefits, which is why we haven’t done much more work in this space. But for the right person, who might be looking to cut their teeth on parallel programming, and maybe test out all the cores in their newly purchased MacBook Pro with an M1 Max, the benefits of learning how to build optimized parallel algorithms might outweigh the costs.

I believe that parallel programming will become increasingly important for a larger population of Java developers. Learning how to program and tune parallel algorithms effectively will be something many developers will need to learn. The knowledge and experience from books like CPJ from Doug Lea and “Java Concurrency in Practice” (JCIP) from Brian Goetz will become important and popular again. Now that I have my second brand new copy of CPJ, and my previously signed copy of JCIP, I am ready to re-learn the lessons of concurrency and parallelism all over again.

Final Thoughts

My goal for this blog was to share some lessons I learned from the past 15 years that otherwise might have gone undiscovered or completely forgotten. I doubt most developers who use Eclipse Collections will have dug into the parallel algorithms available in the library before reading this blog. I hope some Java developers read this blog and find it useful for helping them learn more about parallel programming approaches they may not have been previously aware of. If you read it and liked it, you can let me know by clapping and/or commenting. I generally dislike including micro-benchmarks in blogs, but I think folks find them interesting enough to start investigating and learning more. Take my benchmarks with a huge grain of salt and two pounds of skepticism. I wouldn’t recommend basing any decisions on them. I highly recommend writing your own application benchmarks for your own specific use cases and determining for yourself whether a particular approach will help you achieve better or worse performance. As I’ve recommended in my previous blog linked above:

Prove it before going Parallel.

Thanks for reading!

I am a Project Lead and Committer for the Eclipse Collections OSS project at the Eclipse Foundation. Eclipse Collections is open for contributions. If you like the library, you can let us know by starring it on GitHub.

Java Champion. Creator of the Eclipse Collections OSS Java library (https://github.com/eclipse/eclipse-collections). Inspired by Smalltalk. Opinions are my own.