Slow like a stream, fast like a loop
Or why your Java is getting slower. Or why you should think about the code you type. Or why you shouldn't let fads affect your coding.
Together with lambdas, streams were one of the few cool new features Java 8 brought us. The first thing that made streams attractive to the cool kids was their prose-like style compared to the old-fashioned loop: you could now read your code aloud to your granny. Even more exciting, you could pretend you were doing concurrent programming just by calling parallelStream() instead of stream(). All loops had to be replaced! Loops are evil! Amid this hype it was hard to find any resources comparing stream performance with loops. You don't speak out against Java 8's coolest feature.
The excitement is long gone and everybody is looking at the upcoming Java 9, so I feel now is the right time to give streams a more sober look. I believe any experienced programmer senses that something is not quite right with streams. Why hide the complexity of a loop? And at what cost? What is so complex about the loop in the first place, the first control flow statement you learn about, that it needs hiding? Since when are one-liners considered more readable? How do you debug streams? What is the actual cost of constructing tons of temporary objects, using lambdas (remember: every lambda is backed by a class generated at run time, and a capturing lambda typically allocates a new object each time it is evaluated), implementing control flow with virtual method invocations and sinking deep into the call stack, compared to the few bytecodes generated by a loop statement? The JIT will take care of it, you say? The JIT takes care of loops and does a good job there. It doesn't work miracles with method invocations. Once you admit that streams and lambdas can produce significant computational overhead, you will easily spot them in your profiler output. That is, if you were foolish enough (like me) to use them in your hot code. Because you do profile, right?
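To make the point about lambdas concrete, here is a minimal sketch (the class and method names are made up for illustration). As a mental model only, a capturing lambda behaves like an instance of an anonymous class that stores the captured variable in a field. HotSpot does not literally compile it that way, since lambdas are linked through invokedynamic and their classes are generated at run time, but an object carrying the captured value typically still has to be created.

```java
import java.util.function.Predicate;

// Hypothetical illustration: a capturing lambda next to a hand-written equivalent.
public class LambdaModel {

    // Capturing lambda: 'target' is captured, so a new Predicate object is
    // typically created each time this method runs (the JLS permits reuse,
    // and the JIT's escape analysis may remove the allocation).
    static Predicate<Integer> viaLambda(Integer target) {
        return value -> value.equals(target);
    }

    // Rough equivalent: an anonymous class that holds the captured value.
    static Predicate<Integer> viaAnonymousClass(final Integer target) {
        return new Predicate<Integer>() {
            @Override
            public boolean test(Integer value) {
                return value.equals(target);
            }
        };
    }

    public static void main(String[] args) {
        Predicate<Integer> a = viaLambda(7);
        Predicate<Integer> b = viaAnonymousClass(7);
        System.out.println(a.test(7) + " " + b.test(7)); // true true
    }
}
```

Either way, every stream stage that takes a capturing lambda receives an object reference and calls test() through an interface, which is the virtual dispatch the paragraph above refers to.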
I wrote a benchmark using JMH. It measures the performance of a simple operation on a java.util.ArrayList: a sequential search for a specific value that is not present in the list. I used Java 8u131 and the current Java 9 Mercurial tip, running on an 8-core Intel i7 CPU @ 2.50GHz.
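The benchmark source is not part of this post. As a hypothetical sketch of the operations being compared (class and method names are illustrative, and this is not a JMH harness, so it measures nothing by itself), the three variants could look like this, assuming a List<Integer> and a target value that is absent:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the compared operations; not the original benchmark code.
public class SearchVariants {

    // Enhanced for loop: compiled to an Iterator-based loop.
    static boolean containsLoop(List<Integer> list, Integer target) {
        for (Integer value : list) {
            if (value.equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Sequential stream: anyMatch stops at the first match.
    static boolean containsStream(List<Integer> list, Integer target) {
        return list.stream().anyMatch(value -> value.equals(target));
    }

    // Parallel stream: the list is split and searched on the common ForkJoinPool.
    static boolean containsParallelStream(List<Integer> list, Integer target) {
        return list.parallelStream().anyMatch(value -> value.equals(target));
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            list.add(i);
        }
        Integer missing = -1; // not in the list, so every variant scans all elements
        System.out.println(containsLoop(list, missing));           // false
        System.out.println(containsStream(list, missing));         // false
        System.out.println(containsParallelStream(list, missing)); // false
    }
}
```

Because the target is missing, anyMatch never gets to short-circuit, so every variant has to visit every element: the worst case for each of them.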
The graph shows throughput in operations per second; higher is better.
I know I could get much better results with an index-based loop instead of the enhanced for loop (which compiles to an iterator-based loop), but that would be an unfair comparison. Streams and enhanced for loops work on any kind of collection, while index-based loops need a List and are efficient only on random-access ones such as ArrayList.
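A sketch of the two loop forms discussed above (names are hypothetical). The first method spells out roughly what the compiler generates for an enhanced for loop over a List; the second is the index-based loop. The contains helper picks the index-based form only for lists that implement the java.util.RandomAccess marker interface, a check the JDK's own Collections utilities also make in several places.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical illustration of iterator-based vs index-based loops.
public class LoopForms {

    // Roughly what the compiler generates for: for (Integer v : list) { ... }
    static boolean containsIterator(List<Integer> list, Integer target) {
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next().equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Index-based loop: cheap on ArrayList, but each get(i) on a LinkedList
    // walks the nodes, which makes the whole loop quadratic.
    static boolean containsIndexed(List<Integer> list, Integer target) {
        for (int i = 0, n = list.size(); i < n; i++) {
            if (list.get(i).equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Use the index-based form only when the list advertises fast random access.
    static boolean contains(List<Integer> list, Integer target) {
        return (list instanceof RandomAccess)
                ? containsIndexed(list, target)
                : containsIterator(list, target);
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < 100; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
        System.out.println(contains(arrayList, 42));  // true
        System.out.println(contains(linkedList, -1)); // false
    }
}
```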
Streams have an advantage only on Java 9, on a large list, in parallel mode. Even that is arguable, given that a parallel stream keeps all 8 cores busy and so consumes roughly 8 times more power. Java 9 also behaves slightly differently in general: sometimes slower, sometimes faster. That could be due to the different default GC (Java 9 switched the default from Parallel GC to G1), but it may be something else as well; I wouldn't speculate.
Now ask yourself:
- Do you care about performance?
- Are you willing to sacrifice some code coolness for performance?
- What is the average size of the lists you run stream operations on?
- How likely is the list to be empty, or to hold just a few elements?
- Is cached data lookup the backbone of your business logic?
It turns out I often deal with applications that do nothing but lookups on small or empty in-memory lists. I got nice performance boosts just by rewriting streams (back) into loops.
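A hypothetical example of such a rewrite (the Customer type and the method names are invented for illustration): a stream-based lookup by id and the equivalent loop. The loop keeps the Optional-returning signature, so callers do not change; it simply skips building a stream pipeline and a capturing lambda on every call, a fixed overhead that is most noticeable when the list is small or empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical example: Customer and the method names are made up for illustration.
public class LookupRewrite {

    static final class Customer {
        final String id;
        final String name;

        Customer(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Stream version: creates the pipeline stages and a capturing lambda
    // on every call, even when the list is empty.
    static Optional<Customer> findByIdStream(List<Customer> customers, String id) {
        return customers.stream()
                .filter(c -> c.id.equals(id))
                .findFirst();
    }

    // Loop version: same signature, so callers stay unchanged.
    static Optional<Customer> findByIdLoop(List<Customer> customers, String id) {
        for (Customer c : customers) {
            if (c.id.equals(id)) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Customer> customers = new ArrayList<>();
        customers.add(new Customer("a1", "Alice"));
        customers.add(new Customer("b2", "Bob"));

        System.out.println(findByIdLoop(customers, "b2").map(c -> c.name).orElse("none")); // Bob
        System.out.println(findByIdLoop(new ArrayList<>(), "b2").isPresent());             // false
    }
}
```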