Cross library Stream benchmarking : Playing scrabble with Shakespeare
Writing a fair / unbiased benchmark across different libraries is a tough challenge. Library authors may create benchmarks to help them decide how to optimize their libraries (along lines of operation that are important to them). When library authors publish benchmarks one of the major unspoken purposes is to promote their library. That is, the benchmark serves as a propaganda organ for the library (and — meta-alert — this article ultimately is no different).
David Karnok, lead developer of RxJava and the contributor of major enhancements to Pivotal’s Reactor library maintains a set of benchmarks of the relative performance of various streaming libraries . For each library he has implemented a version of José Paumard’s Shakepeare plays scrabble Stream based puzzle solver. The implementation is pretty involved but David has a good overview here : http://akarnokd.blogspot.ie/2016/12/the-reactive-scrabble-benchmarks.htm
In all fairness to David, he has created these benchmarks in response to RxJava being tested against Java 8 Streams a manner that may, or may not, have made sense for that library.
An apples-to-apples comparison
In David’s benchmark cyclops-react performs quite poorly (or more precisely ReactiveSeqImpl performs quite poorly), but most of the implementations of Shakespeare plays scrabble, across divergent libraries are radically different from each other (that is they use different operators, and at each point in the algorithm — some do more work, others less).
In the cyclops-react project we have included two sets of strictly equivalent implementations of this benchmark. The original Java 8 Streams version, and a direct conversion of that implementation to cyclops-react (replacing java.util.stream.Stream with ReactiveSeq) and also we have converted the cyclops-react implementation from David’s benchmarks directly to IxJava. For 2.0.0-MI2 of cyclops-react, 1.0.0-RC5 of IxJava along with 1.8.074 JDK we get the following the results.
In the graph below the results with the same colour code have exactly the same operators (or their direct equivalents).
Most of the very small gap between cyclops-react and Java 8 Stream can be explained by the fact the Shakespeare Plays Scrabble benchmark builds a large number of very small short-lived Streams. ReactiveSeq is marginally less efficient when operating in that manner due to the fact that data is copied to support replayable Streams. It’s not shown in the graph, but cyclops-react actually performs slightly better than Streams at the higher percentiles (and slightly worse at the lower ones)
cyclops-react and Java 8 Streams
When David found out about cyclops-react he added a benchmark for it’s ReactiveSeqImpl type which is a sequential synchronous extended JDK 8 Stream type that adds a lot of ‘reactive’ style operators found in libraries such as RxJava. The history of this particular implementation is that it is predated by SimpleReact’s LazyFutureStream which is an Stream of asychronously executed tasks. It was purposely created to provide a more performant option for synchronous CPU bound tasks within the library and was benchmarked against Java 8 Streams. In v1.x of cyclops-react the sequential ReactiveSeq implementation simply delegates to a Java 8 Stream instance for most it’s operations. I was certainly very surprised when David found that the cyclops Stream performed around 4 times worse than the implementation used for Java 8 Streams.
As ReactiveSeq is a Java 8 Stream, my first reaction was simply to replace references to Java 8 Stream with ReactiveSeq — in so doing the performance gap disappeared almost entirely (and unfortunately we did have some communication issues over this).
The benchmark *is* useful
In cyclops-react 2.x the Streaming engine has been rewritten entirely, Streams are now replayable, operators can be fused together and many micro-benchmarks show significant performance improvements over JDK 8 Streams. In other cases the additional overhead of efficiently copying information so that replaying Streams works properly causes the ReactiveSeq version to be marginally slower but still roughly equivalent.
When 2.0.0-MI1 was released, the Shakespeare Plays Scrabble benchmark gave a very clear warning there was a performance problem in efficient replayable Stream construction this has been fixed in 2.0.0-MI2.
Similarly we had a bug in our scanLeft implementation that was fixed in the 1.0.5 release and so have derived significant benefits from this benchmark.
But not useful in the *way* many might think
David does spell this out in his article on the benchmarks
I think that is not quite how most people who see those graphs with the libraries names next to them will interpret it.
Propaganda again — what does this graph say to you?
I think most people seeing this would ignore any caveats and decide cyclops-react is faster than IxJava. Which means that either
a. We need to be careful, cautious and aware of bias when publishing benchmarks (given competition between projects this is somewhat unlikely).
b. We all publish differing benchmarks with opposing stats and be damned (where we are today).
Inherent bias in benchmarks
Even when libraries are roughly equivalent a single macro-benchmark like this may not give an accurate reading as to which library will perform best for you, the user.
Why not? Some very salient reasons from my perspective include
- A single slow operator is included. In Streaming libraries a large number of operators are implemented independently, thus a library may be very fast but perform slowly on a macro benchmark because of one inefficient operator.
- Different levels of expertise of the benchmark author (this can mean the author chooses ultra-efficient operators from libraries they have expert level knowledge in and less efficient operators from others)
- Benchmark authors can choose what data to display and what to not show, even when no conscious choice has been made they are likely to be led by their interests and expertise
- Different levels of motivation and focus of both benchmark authors and library authors
Why ReactiveSeq is probably not 30% faster than IxJava
This last reason (in the list above), jumps out in particular, because I suspect it explains why cyclops-react scored ~30% performance advantage over IxJava in the benchmark result shown above.
IxJava’s author, David Karnok was probably completely unaware I was going to run this benchmark. I, on the other hand was using this exact benchmark to squeeze further optimizations out of the new Streaming engine powering the sequential ReactiveSeq implementation in cyclops-react 2.x (in particular for the 2.0.0-MI2 release).
Does it really mean cyclops-react ReactiveSeq is 30% faster than IxJava? In practice probably not— there are a lot of operators in each, and a lot of potential remaining enhancements that could be made. (Equally important is that there are potentially also a lot of design trade-offs that could be made, and while I’ve found this benchmark genuinely very useful, making the wrong trade-offs to score higher on a single benchmark would be very bad indeed).
The code for my benchmark implementation is here
cyclops-react - An advanced, but easy to use, platform for writing functional applications in Java 8github.com
If I have missed anything, let me know and I’ll update (or submit a PR).
IdenticalToStream converts any reference to java.util.stream.Stream to a ReactiveSeq implementation. ReactiveSeq in cyclops-react 2.x has 3 implementations — StreamX, OneShotStreamX and FutureStreamImpl. We are using the first StreamX which has it’s own custom, sequential, synchronous Streaming engine. The goal of this benchmark is to be an exact conversion of the ShakespearePlaysScrabbleWithStreams benchmark.
ScanLeftTakeRight is the cyclops-react benchmark created by David Karnok, IxScanLeftTakeRight is intended to be the direct conversion of this benchmark to IxJava which is the Rx world’s analogue of StreamX.