The 4am Jamestown-Scotland ferry and other optimization strategies

When performance is important, so is understanding your available options.

Shortcuts sometimes aren’t.

Happy New Year!

I thought I would start out 2018 with a performance optimization story from 2017.

Takeaways from this blog

  • Java Iteration Pattern Optimization Strategies
  • A few Eclipse Collections and Java Stream Iteration Pattern options.
  • Recommendations at the end

A shortcut with a twist

On January 2nd 2017, I sat with my family in our Honda Pilot on a pier at 3:30am for a half hour waiting for the 4am Jamestown-Scotland ferry to arrive. I had come to the literal end of the road on a shortcut that wasn’t exactly as I had expected. I decided to take the shorter distance route on my car’s Nav System to avoid having to go north on Interstate 95 only then to have to go south to get to Williamsburg, Virginia. I’ve gotten stuck in bumper to bumper traffic in Virginia on Route 95 on late night rides coming back from Florida a few times before. When we got to the end of the road on our shorter route, the Nav System indicated the next turn was to get on the ferry (see picture above).

I was willing to take slower local roads, especially since it was early in the morning and there would be no traffic on them. We discovered too late that the path that our chosen path included a ferry ride. At this point, we only had two options. We could wait for the ferry and hope it was running, or turn around and add another 3 to 4 hours to our trip. A classic Hobson’s Choice. We waited for the ferry. It turned out to be a fun experience once we parked our car on the ferry, but I would have preferred an alternative at 4am after driving 14 hours.

“Two roads diverged in a wood…” — Robert Frost

I certainly took the one less traveled by. I did learn a new route that I didn’t know before for getting to Williamsburg from Orlando, as well as the planning required to optimize that route with the ferry schedule.

What does this trip have to do with Eclipse Collections, you may ask? Well, the path I took was the Serial (one lane Colonial era roads), Lazy (ferry does the work until you get to the dock), and Boxed (your car is literally boxed in on the ferry by other cars) — just one of many options you can choose with Eclipse Collections and Java Streams.

“Premature optimization is the root of all evil” — Donald Knuth

Readability should be prioritized above performance when writing code. However, it helps to know what your available performance optimization options are, before you discover last minute your only option is stopping and waiting for the next ferry. You may actually be able to achieve better performance without sacrificing readability. In fact, there may be options you were unaware of previously that improve both readability and performance.

There is a set of Iteration Pattern Optimization Strategies that I believe all developers should become aware of so they can appropriately tune their code for the best performance.

Don’t guess when optimizing code. First prove you have a problem that needs to be fixed. Then benchmark any solutions you think may help to prove that they actually do.

Travelers Beware: You can lose many hours of your life measuring performance optimization benefits. The tests I have run below take 45–50 minutes to run each time. I had to run them several times along with unit tests to validate that the results were the same across all similar tests. When you see the charts, you may be at first compelled by the graphs in terms of wanting to change your code to be more “optimal”. Optimal may not equate to noticeably faster in terms of your application’s overall performance. Each of these tests take at most hundreds of milliseconds to run. They are all “fast”, because they are all in memory. The optimal solutions may only accumulate savings over a large number of executions. If you happen to see a more readable solution you were not aware of here, go for that one.

Iteration Pattern Optimization Strategies

Do you know how to leverage all of these strategies separately and together to increase performance without sacrificing readability?

  • Eager — executes immediately with potential optimizations specific to each algorithm and data structure. Eager algorithms are as close to a hand coded for-loop as you will get, so they are easy to understand and debug. I prefer eager as the default option for iterating over collections. It is the simplest and usually most succinct and readable solution available. I consider every other solution a potential optimization, which may prove pre-mature.
  • Primitive — If you can avoid boxing primitives, you can reduce memory cost and potentially increase performance. I always use primitive collections and algorithms when I can.
  • Lazy — executes only when a terminal operation is called. Optimizations include reducing the amount of memory required and total computation when multiple operation are executed. Short-circuiting effects can really help performance when run lazily. I prefer lazy as soon as I am executing multiple operations that would result in temporary collections being created.
  • Parallel — It costs more to run in parallel. You need the right data size, algorithm and multiple cores. If you have all of these, you may benefit from running in parallel. Measure it to prove it.

Eager vs. Lazy — Understanding how they work

Let’s take a list of five integers and perform a filter, map, and reduce set of operations both eagerly and lazily.

@Test
public void eagerVsLazy()
{
long eagerSum = Lists.mutable.with(1, 2, 3, 4, 5)
.tap(i -> System.out.println("eager select: " + i))
.select(i -> i % 2 == 0)
.tap(i -> System.out.println("eager collect: " + i))
.collectInt(i -> i * 2)
.tap(i -> System.out.println("eager sum: " + i))
.sum();
System.out.println(eagerSum);

long lazySum = Lists.mutable.with(1, 2, 3, 4, 5)
.asLazy()
.tap(i -> System.out.println("lazy select: " + i))
.select(i -> i % 2 == 0)
.tap(i -> System.out.println("lazy collect: " + i))
.collectInt(i -> i * 2)
.tap(i -> System.out.println("lazy sum: " + i))
.sum();
System.out.println(lazySum);

Assert.assertEquals(eagerSum, lazySum);
}

Except for the additional call to asLazy in the lazy example, the code should look identical. The printed results are as follows:

eager select: 1
eager select: 2
eager select: 3
eager select: 4
eager select: 5
eager collect: 2
eager collect: 4
eager sum: 4
eager sum: 8
12
lazy select: 1
lazy select: 2
lazy collect: 2
lazy sum: 4
lazy select: 3
lazy select: 4
lazy collect: 4
lazy sum: 8
lazy select: 5
12

Notice how the order of execution changes on the lambdas in the lazy case. In the eager case, two additional lists are created as intermediate results during the execution. A List of Integer with two Integers (2, 4) and then an IntList with two ints (4, 8) are created before the final call to sum. In the lazy case, there are no intermediate collections created. This results in less garbage being generated. This is why I prefer lazy execution when there are multiple operations involved. If there was a single operation involved, then I would default to using the eager solution.

If we look at the serial Stream solution, it’s execution order will be the same as the lazy Eclipse Collections solution.

@Test
public void stream()
{
int streamSum = Lists.mutable.with(1, 2, 3, 4, 5)
.stream()
.peek(i -> System.out.println("stream filter: "+ i))
.filter(i -> i % 2 == 0)
.peek(i -> System.out.println("stream map: "+ i))
.mapToInt(i -> i * 2)
.peek(i -> System.out.println("stream sum: "+ i))
.sum();
System.out.println(streamSum);
}

Here is the output:

stream filter: 1
stream filter: 2
stream map: 2
stream sum: 4
stream filter: 3
stream filter: 4
stream map: 4
stream sum: 8
stream filter: 5
12

Lazy + Parallel = Harder to Follow

Using Eclipse Collections lazy parallel with a batch size of one so we can see the results for a very small list.

@Test
public void parallel()
{
long parallelSum = Lists.mutable.with(1, 2, 3, 4, 5)
.asParallel(Executors.newWorkStealingPool(), 1)
.select(i -> {
System.out.println("parallel select: " + i);
return i % 2 == 0;
})
.collect(i -> {
System.out.println("parallel collect: " + i);
return i * 2;
})
.sumOfInt(i -> {
System.out.println("parallel sum: " + i);
return i;
});
System.out.println(parallelSum);
}
Run 1:
parallel select: 2
parallel select: 1
parallel select: 4
parallel collect: 4
parallel select: 3
sum: 8
parallel select: 5
parallel collect: 2
sum: 4
12
Run 2:
parallel select: 1
parallel select: 3
parallel select: 2
parallel select: 5
parallel select: 4
parallel collect: 2
parallel collect: 4
parallel sum: 4
parallel sum: 8
12
Run 3:
parallel select: 4
parallel select: 2
parallel collect: 2
parallel select: 5
parallel select: 3
parallel select: 1
parallel sum: 4
parallel collect: 4
parallel sum: 8
12

The result is consistent between runs, but the order of execution of lambdas is not guaranteed nor consistent.

Using parallel Streams:

@Test
public void parallelStream()
{
int streamSum = Interval.oneTo(5).toList()
.parallelStream()
.peek(i -> System.out.println("stream filter: "+ i))
.filter(i -> i % 2 == 0)
.peek(i -> System.out.println("stream map: "+ i))
.mapToInt(i -> i * 2)
.peek(i -> System.out.println("stream sum: "+ i))
.sum();
System.out.println(streamSum);
}
Run 1:
stream filter: 4
stream filter: 1
stream map: 4
stream filter: 2
stream sum: 8
stream filter: 3
stream filter: 5
stream map: 2
stream sum: 4
12
Run 2:
stream filter: 5
stream filter: 1
stream filter: 3
stream filter: 2
stream filter: 4
stream map: 2
stream map: 4
stream sum: 4
stream sum: 8
12
Run 3:
stream filter: 2
stream filter: 4
stream map: 2
stream map: 4
stream sum: 8
stream filter: 1
stream filter: 3
stream filter: 5
stream sum: 4
12

Measure, Execute and Repeat.

I am going to show different options and their performance characteristics for a set of use cases using a million randomly generated integers stored in Lists. These are not likely to be the use cases you will encounter in production code, but they should hopefully illustrate some options you may not have been aware of next time you find a bottleneck you were not expecting in your basic Java data structures and algorithms. I will demonstrate the performance differences between using object and primitive lists, eager and lazy APIs, with both serial and parallel execution, with four different use cases.

In each use case, I share what I observed — expected and unexpected. I only observed. I have not dug into the why the results were what they were. “The why” perhaps is a topic for another blog.

Use Cases — Filter, Map, Reduce, and Filter/Map/Reduce

1. Filter even integers into a List
2. Multiply the integers by 2 and storing the result in a List
3. Sum all the integers into a long
4. Filter/Map/Reduce (Filter Evens, Multiply x 2, Sum into long)

The Data — 1,000,000 Integers

private List<Integer> jdkList;
private MutableList<Integer> ecList;
private IntList ecPrimitiveList;
private ExecutorService executorService;
@Setup
public void setup()
{
PrimitiveIterator.OfInt intGenerator =
new Random(1L).ints(-1000, 1000).iterator();
this.ecList =
FastList.newWithNValues(1_000_000, intGenerator::nextInt);
this.jdkList = new ArrayList<>(1_000_000);
this.jdkList.addAll(this.ecList);
this.ecPrimitiveList =
this.ecList.collectInt(i -> i, new IntArrayList(1_000_000));
this.executorService = Executors.newWorkStealingPool();
}

Hardware

I will be using a MacPro with the following hardware specs to measure the benchmarks:

Processor Name: 12-Core Intel Xeon E5
Processor Speed: 2.7 GHz
Number of Processors: 1
Total Number of Cores: 12
L2 Cache (per Core): 256 KB
L3 Cache: 30 MB
Memory: 64 GB

Software

To illustrate the different options that are available for these particular use cases, I will be using JDK 1.8.0_152 with Eclipse Collections and Streams.

Benchmarking

I am using JMH version 1.19 as the benchmark harness for my tests. I am running 30 warmup iterations, and 20 measurement iterations with 2 forks. I am using Mode.Throughput with the tests so they are easy to read. The numbers are in Operations per Second. The bigger the number, the better the result.

public static void main(String[] args) throws RunnerException
{
Options options = new OptionsBuilder()
.include(".*" + IntListJMHTest.class.getSimpleName() + ".*")
.forks(2)
.mode(Mode.Throughput)
.timeUnit(TimeUnit.SECONDS)
.warmupIterations(30)
.build();
new Runner(options).run();
}

I will highlight in dark green the best overall result in the run. I will highlight in light green the best serial execution result. Where I use EC in a label in the chart it stands for a solution using Eclipse Collections. Where I used JDK, the solution uses a standard JDK approach.

Filter even integers

Filter even numbers from a List of 1,000,000 Integers

Expected:

  • I expected ECParallelEager to perform better.
  • I expected primitive collections to outperform boxed collections.
  • I expected serial eager to outperform serial lazy.

Unexpected:

  • I did not expect parallel streams to perform this poorly.
@Benchmark
public MutableList<Integer> filterECBoxedEager()
{
return this.ecList.select(i -> i % 2 == 0);
}
@Benchmark
public MutableList<Integer> filterECBoxedLazy()
{
return this.ecList
.asLazy()
.select(i -> i % 2 == 0)
.toList();
}
@Benchmark
public MutableList<Integer> filterECParallelEager()
{
return ParallelIterate.select(
this.ecList,
i -> i % 2 == 0,
new CompositeFastList<>(),
false);
}
@Benchmark
public MutableList<Integer> filterECParallelLazy()
{
return this.ecList
.asParallel(this.executorService, 50_000)
.select(i -> i % 2 == 0)
.toList();
}
@Benchmark
public IntList filterECPrimitiveEager()
{
return this.ecPrimitiveList.select(i -> i % 2 == 0);
}
@Benchmark
public IntList filterECPrimitiveLazy()
{
return this.ecPrimitiveList
.asLazy()
.select(i -> i % 2 == 0)
.toList();
}
@Benchmark
public List<Integer> filterJDKBoxedParallelStream()
{
return this.jdkList
.parallelStream()
.filter(i -> i % 2 == 0)
.collect(Collectors.toList());
}
@Benchmark
public List<Integer> filterJDKBoxedStream()
{
return this.jdkList
.stream()
.filter(i -> i % 2 == 0)
.collect(Collectors.toList());
}

Map each integer x 2

Multiply times two, each integer in a List of 1,000,000 Integers

Expected:

  • I expected primitive collections to outperform boxed collections.
  • I expected serial eager to outperform serial lazy.

Unexpected:

  • I did not expected ECParallelLazy to perform so poorly.
  • I did not expect either Stream solutions to perform so poorly.
@Benchmark
public MutableList<Integer> mapECBoxedEager()
{
return this.ecList.collect(i -> i * 2);
}
@Benchmark
public MutableList<Integer> mapECBoxedLazy()
{
return this.ecList
.asLazy()
.collect(i -> i * 2)
.toList();
}
@Benchmark
public MutableList<Integer> mapECParallelEager()
{
return ParallelIterate.collect(
this.ecList, i -> i * 2,
new CompositeFastList<>(),
false);
}
@Benchmark
public MutableList<Integer> mapECParallelLazy()
{
return this.ecList
.asParallel(this.executorService, 50_000)
.collect(i -> i * 2)
.toList();
}
@Benchmark
public IntList mapECPrimitiveEager()
{
return this.ecPrimitiveList
.collectInt(i -> i * 2, IntLists.mutable.empty());
}
@Benchmark
public IntList mapECPrimitiveLazy()
{
return this.ecPrimitiveList
.asLazy()
.collectInt(i -> i * 2)
.toList();
}
@Benchmark
public List<Integer> mapJDKBoxedParallelStream()
{
return this.jdkList
.parallelStream()
.mapToInt(i -> i * 2)
.boxed()
.collect(Collectors.toList());
}
@Benchmark
public List<Integer> mapJDKBoxedStream()
{
return this.jdkList
.stream()
.mapToInt(i -> i * 2)
.boxed()
.collect(Collectors.toList());
}

Sum all integers

Sum 1,000,000 Integers

Expected:

  • I expected primitive collections to outperform boxed collections.
  • I expected little benefit from parallelization here. Summing ints is a very fast operation. I expected eager primitive to be faster than most of the parallel options.

Unexpected:

  • I did not expect serial streams to get crushed. There seems to have been an improvement made in Java 9. I ran the benchmarks again with Java 9 and this particular benchmark improved by ~7–8x.
@Benchmark
public long sumECBoxedEager()
{
return this.ecList.sumOfInt(Integer::intValue);
}
@Benchmark
public long sumECBoxedLazy()
{
return this.ecList
.asLazy()
.sumOfInt(Integer::intValue);
}
@Benchmark
public long sumECParallelEager()
{
return ParallelIterate.sumByInt(
this.ecList,
i -> Integer.valueOf(0),
Integer::intValue).get(0);
}
@Benchmark
public long sumECParallelLazy()
{
return this.ecList
.asParallel(this.executorService, 50_000)
.sumOfInt(Integer::intValue);
}
@Benchmark
public long sumECPrimitiveEager()
{
return this.ecPrimitiveList.sum();
}
@Benchmark
public long sumECPrimitiveLazy()
{
return this.ecPrimitiveList
.asLazy()
.sum();
}
@Benchmark
public long sumJDKBoxedParallelStream()
{
return this.jdkList
.parallelStream()
.mapToLong(Integer::longValue)
.sum();
}
@Benchmark
public long sumJDKBoxedStream()
{
return this.jdkList
.stream()
.mapToLong(Integer::longValue)
.sum();
}

Filter, Map, Sum

Filter even integers, multiply remaining x 2 and return their sum

Expected:

  • I expected lazy operations to outperform eager.
  • I expected primitive lazy would outperform all of the other serial operations.
  • I expected JDKBoxedParallelStream would perform well with this use case.

Unexpected:

  • I did not expect ECParallelEager to do as well as or better than ECParallelLazy, even though it was optimized.
  • I did not expect JDKBoxedParallelStream to do better than ECParallelLazy.
@Benchmark
public long filterMapSumECBoxedEager()
{
return this.ecList
.select(i -> i % 2 == 0)
.sumOfInt(i -> i * 2);
}
@Benchmark
public long filterMapSumECBoxedLazy()
{
return this.ecList
.asLazy()
.select(i -> i % 2 == 0)
.sumOfInt(i -> i * 2);
}
@Benchmark
public long filterMapSumECOptimizedParallelEager()
{
return ParallelIterate.sumByInt(
this.ecList,
i -> i % 2,
i -> i * 2).get(0);
}
@Benchmark
public long filterMapSumECOptimizedParallelLazy()
{
return this.ecList
.asParallel(this.executorService, 50_000)
.sumOfInt(i -> i % 2 == 0 ? i * 2 : 0);
}
@Benchmark
public long filterMapSumECParallelLazy()
{
return this.ecList
.asParallel(this.executorService, 50_000)
.select(i -> i % 2 == 0)
.sumOfInt(i -> i * 2);
}
@Benchmark
public long filterMapSumECPrimitiveEager()
{
return this.ecPrimitiveList
.select(i -> i % 2 == 0)
.collectInt(i -> i * 2, IntLists.mutable.empty())
.sum();
}
@Benchmark
public long filterMapSumECPrimitiveLazy()
{
return this.ecPrimitiveList
.asLazy()
.select(i -> i % 2 == 0)
.collectInt(i -> i * 2)
.sum();
}
@Benchmark
public long filterMapSumJDKBoxedParallelStream()
{
return this.jdkList
.parallelStream()
.filter(i -> i % 2 == 0)
.mapToLong(i -> (long) (i * 2))
.sum();
}
@Benchmark
public long filterMapSumJDKBoxedStream()
{
return this.jdkList
.stream()
.filter(i -> i % 2 == 0)
.mapToLong(i -> (long) (i * 2))
.sum();
}

Congratulations!

I hope you enjoyed the blog and learned some new things about Iteration Pattern Options and Optimization Strategies using Eclipse Collections and Java Streams. If your only tool is a hammer, everything else is a nail. Knowing your available options before you get started on your journey and adapting as needs arise is one of the keys to writing better and more responsive applications. This can also help you execute a less stressful trip from Orlando to Williamsburg, if ever that occasion happens to arise.

Recommendations

  • Prefer Primitives over Boxing.
  • Prefer Eager iteration for single or fused operations.
  • Prefer Lazy iteration for multi-step operations.
  • Prove it before going Parallel.
  • Try Eclipse Collections if you want more than Hobson’s Choice.

Eclipse Collections is open for contributions. If you like the library, you can let us know by starring it on GitHub.