Use Stream API simpler (or don’t use it at all)

Tagir Valeev
Sep 9, 2017

The Java 8 Stream API allows programmers to express their thoughts in much less code than before. However, it appears that even with the Stream API some developers write longer code than necessary. Sometimes this makes the code not only more confusing and harder to understand, but also less performant. It's often unclear why people do this. Perhaps they have read only a small part of the documentation and are not aware of other Stream API features. Or perhaps they did not read any documentation at all, just saw some sample and wanted to do something similar. Sometimes the code resembles the old joke about reducing a problem to one that has already been solved.

Here I have collected code samples I encountered in practice. Hopefully this post will help people write code that is a little more beautiful and runs a little faster. A good IDE can warn you about many of these things, but remember that no IDE will replace your head.

1. Stream created from collection without intermediate operations is usually redundant

If you have no operations like map or filter, usually you don't need a stream.

1.1. collection.stream().forEach()

Want to do something for every collection element? Great. Why do you need a stream for this? Simply write collection.forEach(). In most cases it does the same thing, but it's shorter and produces less garbage. Some people fear that there's some difference in functionality, but cannot explain what it is. I've heard something like "forEach does not guarantee an order". Yes, stream forEach does not guarantee it (by specification; in practice it's ordered if the stream is not parallel), but collection forEach guarantees it for ordered collections. If you use stream.forEach(), then you don't care about order, so you will be fine if an order guarantee appears. I know of only one difference in the Java standard library: synchronized collections, created via Collections.synchronizedXyz(). In this case collection.forEach() synchronizes the whole operation, while collection.stream().forEach() does not synchronize anything. If you are using a synchronized collection, you likely want the synchronization, so removing stream() will make things better.
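A minimal sketch of the replacement; the class and method names (`ForEachExample`, `joinInOrder`) are made up for illustration. It calls `forEach` on the collection itself and also shows the synchronized-wrapper case, where the wrapper's `forEach` holds its lock for the whole traversal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ForEachExample {
    // Iterable.forEach on the collection itself: no Stream is created, and for
    // an ordered collection such as a List the elements arrive in list order.
    static String joinInOrder(List<String> items) {
        StringBuilder sb = new StringBuilder();
        items.forEach(sb::append); // instead of items.stream().forEach(sb::append)
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinInOrder(Arrays.asList("a", "b", "c"))); // abc

        // A synchronized wrapper's forEach holds the wrapper's lock for the whole
        // traversal; shared.stream().forEach(...) would take no lock at all.
        List<String> shared = Collections.synchronizedList(new ArrayList<>(Arrays.asList("x", "y")));
        shared.forEach(System.out::println);
    }
}
```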

1.2. collection.stream().collect(Collectors.toList())

Want to transform some collection into a list? Fine. You have been able to do it since Java 1.2: new ArrayList<>(collection) (okay, okay, there were no generics before Java 5 and no diamond operator before Java 7). This is not only shorter, but also faster and, again, produces less garbage. Sometimes much less, as new ArrayList allocates an array of the proper size in advance, while the stream adds elements one by one, resizing the array when necessary. Similarly, instead of stream().collect(toSet()) use new HashSet<>(collection), and stream().collect(toCollection(TreeSet::new)) should be replaced with new TreeSet<>(collection).
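The three copy-constructor replacements side by side, as a small sketch with hypothetical helper names. Each comment names the stream form it replaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CopyCollectionExample {
    // new ArrayList<>(c) copies c.toArray() in one step, so its array has the
    // right size from the start; no Stream or Collector is involved.
    static List<String> copyToList(Collection<String> source) {
        return new ArrayList<>(source);   // instead of source.stream().collect(toList())
    }

    // HashSet's copy constructor also sizes its table from source.size().
    static Set<String> copyToSet(Collection<String> source) {
        return new HashSet<>(source);     // instead of source.stream().collect(toSet())
    }

    static TreeSet<String> copyToSortedSet(Collection<String> source) {
        return new TreeSet<>(source);     // instead of collect(toCollection(TreeSet::new))
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(copyToList(source));        // [pear, apple, pear, fig]
        System.out.println(copyToSet(source).size());  // 3
        System.out.println(copyToSortedSet(source));   // [apple, fig, pear]
    }
}
```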

1.3. collection.stream().toArray(String[]::new)

A fancy new way of transforming a collection into an array is no better than the good old collection.toArray(new String[0]). Again, you introduce fewer abstractions, so this conversion will likely be more efficient. At the very least you don't need a Stream object.
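A minimal sketch (hypothetical class and method names). `Collection.toArray(T[])` allocates a correctly sized array of the same runtime type when the one passed in is too small, so an empty array works as a type token.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class ToArrayExample {
    // The zero-length array only tells toArray which array type to create.
    static String[] toStringArray(Collection<String> collection) {
        return collection.toArray(new String[0]); // instead of stream().toArray(String[]::new)
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("x", "y", "z");
        System.out.println(Arrays.toString(toStringArray(names))); // [x, y, z]
    }
}
```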

1.4. collection.stream().max(Comparator.naturalOrder()).get()

There's a nice method, Collections.max, but unfortunately many people forget about it for some reason. A Collections.max(collection) call does the same thing while producing less garbage. If you have your own comparator, use Collections.max(collection, comparator). Collections.max() may not be suitable if you want to handle an empty collection differently (it throws NoSuchElementException): a collection.stream().max(comparator).orElse(null) call chain looks better than collection.isEmpty() ? null : Collections.max(collection, comparator).
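A sketch covering all three cases from the paragraph; the names `MaxExample` and `BY_LENGTH` are illustrative, not from any library.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class MaxExample {
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    // Natural order; like stream().max(naturalOrder()).get(), this throws
    // NoSuchElementException when the collection is empty.
    static String lastAlphabetically(Collection<String> words) {
        return Collections.max(words);
    }

    // Same idea with a custom comparator.
    static String longest(Collection<String> words) {
        return Collections.max(words, BY_LENGTH);
    }

    // When an empty collection should simply yield null, the stream form reads better.
    static String longestOrNull(Collection<String> words) {
        return words.stream().max(BY_LENGTH).orElse(null);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("kiwi", "banana", "fig");
        System.out.println(lastAlphabetically(words));                      // kiwi
        System.out.println(longest(words));                                 // banana
        System.out.println(longestOrNull(Collections.<String>emptyList())); // null
    }
}
```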

1.5. collection.stream().count()

This is just bad: there is collection.size()! In Java 9, count() returns quickly when the size is known in advance, but in Java 8 this call always enumerates the whole collection, even if the size is obviously known. Don't do this.

1.6. collection.stream().anyMatch(foo::equals)

Want to check whether collection contains an element? Don’t forget about good old collection.contains(foo). This is not only shorter, but could be much faster if your collection is a Set. You will actually need an anyMatch though if your predicate is more complex.
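A small sketch with hypothetical names: plain `contains` for equality, `anyMatch` only when the predicate is richer than equality (here, a case-insensitive comparison).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class ContainsExample {
    // Plain membership test: a HashSet answers by hashing, whereas
    // tags.stream().anyMatch(target::equals) always scans element by element.
    static boolean hasTag(Collection<String> tags, String target) {
        return tags.contains(target);
    }

    // anyMatch is still the right tool when the predicate is more than equality.
    static boolean hasTagIgnoringCase(Collection<String> tags, String target) {
        return tags.stream().anyMatch(target::equalsIgnoreCase);
    }

    public static void main(String[] args) {
        Set<String> tags = new HashSet<>(Arrays.asList("java", "stream", "api"));
        System.out.println(hasTag(tags, "stream"));             // true
        System.out.println(hasTagIgnoringCase(tags, "STREAM")); // true
    }
}
```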

2. Looking for an element

2.1. stream.filter(condition).findFirst().isPresent()

Such code appears surprisingly often. You want to filter the stream by a condition, find the first element and check whether it exists. There's a dedicated method for exactly this purpose: stream.anyMatch(condition). Why do you need an Optional?
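A minimal sketch of the rewrite, using a hypothetical "has a negative number" condition.

```java
import java.util.Arrays;
import java.util.List;

class AnyMatchExample {
    // Stops at the first negative number and returns a plain boolean; replaces
    // numbers.stream().filter(n -> n < 0).findFirst().isPresent().
    static boolean hasNegative(List<Integer> numbers) {
        return numbers.stream().anyMatch(n -> n < 0);
    }

    public static void main(String[] args) {
        System.out.println(hasNegative(Arrays.asList(3, -1, 4))); // true
        System.out.println(hasNegative(Arrays.asList(3, 1, 4)));  // false
    }
}
```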

2.2. !stream.anyMatch(condition)

Some people may disagree, but I think the dedicated method stream.noneMatch(condition) is more expressive. And if the condition contains a negation as well, as in !stream.anyMatch(x -> !condition(x)), then a dedicated method is undoubtedly better: stream.allMatch(x -> condition(x)). Whoever reads the code will thank you.
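Both rewrites in one sketch; the conditions (blank line, non-empty line) are arbitrary examples chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

class MatchExample {
    // "No line is blank"; replaces !lines.stream().anyMatch(s -> s.trim().isEmpty()).
    static boolean noneBlank(List<String> lines) {
        return lines.stream().noneMatch(s -> s.trim().isEmpty());
    }

    // "Every line is non-empty"; replaces the double negation
    // !lines.stream().anyMatch(s -> !(s.length() > 0)).
    static boolean allNonEmpty(List<String> lines) {
        return lines.stream().allMatch(s -> s.length() > 0);
    }

    public static void main(String[] args) {
        System.out.println(noneBlank(Arrays.asList("a", " ")));  // false
        System.out.println(allNonEmpty(Arrays.asList("a", "b"))); // true
    }
}
```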

2.3. stream.map(condition).anyMatch(b -> b)

This is actually strange, but people write it for some reason. If you see this, just know that it’s simply stream.anyMatch(condition). Some variations of this case include stream.map(condition).noneMatch(Boolean::booleanValue), stream.map(condition).allMatch(Boolean.TRUE::equals) or stream.filter(condition).anyMatch(b -> true).
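A sketch of the simplification, with a hypothetical `isEven` condition standing in for `condition`: the predicate goes straight into the matching operation instead of first producing a `Stream<Boolean>`.

```java
import java.util.Arrays;
import java.util.List;

class PredicateExample {
    // Hypothetical condition used for illustration.
    static boolean isEven(int n) {
        return n % 2 == 0;
    }

    // Replaces numbers.stream().map(n -> isEven(n)).anyMatch(b -> b).
    static boolean containsEven(List<Integer> numbers) {
        return numbers.stream().anyMatch(n -> isEven(n));
    }

    // Replaces numbers.stream().map(n -> isEven(n)).noneMatch(Boolean::booleanValue).
    static boolean containsNoEven(List<Integer> numbers) {
        return numbers.stream().noneMatch(n -> isEven(n));
    }

    public static void main(String[] args) {
        System.out.println(containsEven(Arrays.asList(1, 3, 4)));   // true
        System.out.println(containsNoEven(Arrays.asList(1, 3, 5))); // true
    }
}
```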

3. Stream creation

3.1. Collections.emptyList().stream()

Do you want an empty stream? Why not. Just use the dedicated method Stream.empty(). The performance is the same, but it's shorter and simpler. The same goes for emptySet().

3.2. Collections.singleton(x).stream()

Similarly, if you need a one-element stream, just use Stream.of(x). There's no difference between singleton and singletonList here: if your stream contains one element, nobody cares whether it's ordered or not.
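Sections 3.1 and 3.2 together, as a sketch built around a hypothetical helper `nameStream` that yields zero or one element without creating any intermediate collection.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SmallStreamExample {
    // Hypothetical helper: replaces Collections.emptyList().stream() and
    // Collections.singleton(name).stream() with the dedicated factories.
    static Stream<String> nameStream(String name) {
        return name == null ? Stream.empty() : Stream.of(name);
    }

    public static void main(String[] args) {
        List<String> none = nameStream(null).collect(Collectors.toList());
        List<String> one = nameStream("Ann").collect(Collectors.toList());
        System.out.println(none + " " + one); // [] [Ann]
    }
}
```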

3.3. Arrays.asList(array).stream()

Need a stream from an array? Some people write this, even though Arrays.stream(array) or Stream.of(array) does it better (Stream.of works this way only for arrays of objects: for an int[] it yields a one-element Stream<int[]>). To stream several explicit elements, don't use Arrays.asList(x, y, z).stream(); Stream.of(x, y, z) is fine. A similar case is EnumSet.of(x, y, z).stream(). If you need a stream, don't create an intermediate collection; just create a stream.
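A sketch with hypothetical method names showing the direct factories, including the primitive-array case where `Arrays.stream` returns an `IntStream`.

```java
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArrayStreamExample {
    // Streams the array directly; replaces Arrays.asList(words).stream().
    static long countLongWords(String[] words, int minLength) {
        return Arrays.stream(words).filter(w -> w.length() >= minLength).count();
    }

    // A few explicit values; replaces Arrays.asList(x, y, z).stream().
    static String joinThree(String x, String y, String z) {
        return Stream.of(x, y, z).collect(Collectors.joining(","));
    }

    // For a primitive array, Arrays.stream returns an IntStream, whereas
    // Stream.of(numbers) would be a one-element Stream<int[]>.
    static int sum(int[] numbers) {
        return Arrays.stream(numbers).sum();
    }

    public static void main(String[] args) {
        System.out.println(countLongWords(new String[] {"a", "abcd", "abc"}, 3)); // 2
        System.out.println(joinThree("x", "y", "z"));                           // x,y,z
        System.out.println(sum(new int[] {1, 2, 3}));                           // 6
    }
}
```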

3.4. Collections.nCopies(N, "ignored").stream().map(ignored -> new MyObject())

Need a stream of N identical objects? Then nCopies() is your choice. However if you need a stream of N objects created in the same way (but not identical), it's much more beautiful (and a little bit more performant) to use Stream.generate(() -> new MyObject()).limit(N).
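A sketch of the distinction, with `MyObject` as a hypothetical empty class standing in for the one in the heading: `Stream.generate` creates a fresh object per element, while `nCopies` repeats one reference.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GenerateExample {
    // Hypothetical stand-in for MyObject from the text.
    static final class MyObject {
    }

    // N distinct objects, each created by the supplier; replaces
    // Collections.nCopies(n, "ignored").stream().map(ignored -> new MyObject()).
    static List<MyObject> freshObjects(int n) {
        return Stream.generate(MyObject::new).limit(n).collect(Collectors.toList());
    }

    // N references to one and the same object: here nCopies is the right tool.
    static List<MyObject> sameObject(int n) {
        return Collections.nCopies(n, new MyObject());
    }

    public static void main(String[] args) {
        List<MyObject> fresh = freshObjects(2);
        List<MyObject> same = sameObject(2);
        System.out.println(fresh.get(0) == fresh.get(1)); // false
        System.out.println(same.get(0) == same.get(1));   // true
    }
}
```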

3.5. IntStream.range(from, to).mapToObj(idx -> array[idx])

Want to stream an array slice? There is a dedicated method, Arrays.stream(array, from, to). Again, it's shorter and produces less garbage. As an additional bonus, your array is no longer captured in a lambda, so it doesn't need to be effectively final. Of course, if from == 0 and to == array.length, you just need Arrays.stream(array). In this case the code will look better even if mapToObj contains a more complex transformation. E.g. IntStream.range(0, strings.length).mapToObj(idx -> strings[idx].trim()) can easily be converted into Arrays.stream(strings).map(String::trim).

A trickier case: IntStream.range(0, Math.min(array.length, max)).mapToObj(idx -> array[idx]). After thinking a little, one can see that it's just Arrays.stream(array).limit(max).
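The three array-slice rewrites from this section as a sketch (helper names are hypothetical). Note that `limit` throws `IllegalArgumentException` for a negative argument, so the last method assumes `max >= 0`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ArraySliceExample {
    // Elements [from, to); replaces IntStream.range(from, to).mapToObj(idx -> array[idx]).
    static List<String> slice(String[] array, int from, int to) {
        return Arrays.stream(array, from, to).collect(Collectors.toList());
    }

    // Whole array with a transformation; replaces
    // IntStream.range(0, strings.length).mapToObj(idx -> strings[idx].trim()).
    static List<String> trimmed(String[] strings) {
        return Arrays.stream(strings).map(String::trim).collect(Collectors.toList());
    }

    // At most max leading elements (assumes max >= 0); replaces
    // IntStream.range(0, Math.min(array.length, max)).mapToObj(idx -> array[idx]).
    static List<String> firstAtMost(String[] array, int max) {
        return Arrays.stream(array).limit(max).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] words = {"a", " b ", "c", "d"};
        System.out.println(slice(words, 1, 3));     // [ b , c]
        System.out.println(trimmed(words));         // [a, b, c, d]
        System.out.println(firstAtMost(words, 10)); // [a,  b , c, d]
    }
}
```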

4. Unnecessary or too complex collectors

Sometimes people learn collectors and try to do everything with collectors. But you don't always need them.

4.1. stream.collect(Collectors.counting())

Some collectors are designed to be used as downstream collectors in cascaded operations like groupingBy. The counting() collector is one of them. Why not just write stream.count()? Remember that in Java 9 count() is fast when the size is known in advance, while any collector will enumerate all the elements in any case. In Java 8 the counting() collector also boxes every value for no reason (I fixed this in Java 9). Similarly, don't use the maxBy() and minBy() collectors directly (there are max() and min() terminal operations). Instead of reducing() use reduce() or map().reduce(). Instead of mapping() use an intermediate map(), then the downstream collector directly. Java 9 adds the filtering() and flatMapping() collectors, which also duplicate intermediate operations.
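A sketch with hypothetical method names pairing each downstream-style collector with the plain stream operation that replaces it when used at the top level.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class TerminalOpsExample {
    // Replaces words.stream().collect(Collectors.counting()).
    static long countWords(List<String> words) {
        return words.stream().count();
    }

    // Replaces words.stream().collect(Collectors.maxBy(comparingInt(String::length))).
    static Optional<String> longest(List<String> words) {
        return words.stream().max(Comparator.comparingInt(String::length));
    }

    // Replaces words.stream().collect(Collectors.mapping(String::length, Collectors.toList())):
    // an intermediate map() followed by the downstream collector itself.
    static List<Integer> lengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "ccc", "bb");
        System.out.println(countWords(words)); // 3
        System.out.println(longest(words));    // Optional[ccc]
        System.out.println(lengths(words));    // [1, 3, 2]
    }
}
```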

4.2. groupingBy(classifier, collectingAndThen(maxBy(comparator), Optional::get))

Often people want to group elements by a classifier, selecting the maximal element in each group. It's pretty straightforward in SQL: SELECT classifier, MAX(...) FROM ... GROUP BY classifier. Probably because of their SQL experience, developers try to solve this problem with groupingBy in the Stream API. It looks like this should work: groupingBy(classifier, maxBy(comparator)), but the maxBy collector returns an Optional, so we get a Map<K, Optional<V>>. We don't need an Optional here, because every group contains at least one element, so this Optional is never empty. To unwrap it, developers add an ugly collectingAndThen step, and the whole construction starts looking monstrous.

However, if you step back, you can see that groupingBy is unnecessary here. Another great collector, toMap, suits perfectly. We just want to collect elements into a Map where the key is our classifier and the value is the element itself. If elements have duplicate keys, we resolve the collision by taking the bigger one. For this purpose there's a special method, BinaryOperator.maxBy(comparator), which you can import statically instead of the maxBy collector. Finally we have: toMap(classifier, identity(), maxBy(comparator)). This version is also faster, as it does not require a non-trivial finisher.

If you want to use groupingBy and your downstream collector is maxBy, minBy or reducing (possibly with an intermediate mapping), take a look at the toMap collector; it may make things easier.
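Both versions side by side, as a sketch. `Employee`, its department and salary, and the `BY_SALARY` comparator are hypothetical; the collectors used (`groupingBy`, `collectingAndThen`, `maxBy`, `toMap`) and `BinaryOperator.maxBy` are standard JDK methods.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class MaxPerGroupExample {
    // Hypothetical element type used only for this illustration.
    static final class Employee {
        private final String department;
        private final int salary;

        Employee(String department, int salary) {
            this.department = department;
            this.salary = salary;
        }

        String department() { return department; }
        int salary() { return salary; }
    }

    static final Comparator<Employee> BY_SALARY = Comparator.comparingInt(Employee::salary);

    // groupingBy + maxBy: every value is an Optional that has to be unwrapped.
    static Map<String, Employee> viaGroupingBy(List<Employee> staff) {
        return staff.stream().collect(Collectors.groupingBy(
                Employee::department,
                Collectors.collectingAndThen(Collectors.maxBy(BY_SALARY), Optional::get)));
    }

    // toMap: key is the department, value is the employee itself; on a key
    // collision BinaryOperator.maxBy keeps the higher-paid employee.
    static Map<String, Employee> viaToMap(List<Employee> staff) {
        return staff.stream().collect(Collectors.toMap(
                Employee::department, Function.identity(), BinaryOperator.maxBy(BY_SALARY)));
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
                new Employee("dev", 100), new Employee("dev", 150), new Employee("ops", 90));
        System.out.println(viaToMap(staff).get("dev").salary());      // 150
        System.out.println(viaGroupingBy(staff).get("dev").salary()); // 150
    }
}
```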

5. Do not count if you don’t need it

5.1. listOfLists.stream().flatMap(List::stream).count()

This sample is similar to 1.5. The goal is to calculate the total number of elements in nested collections. It seems pretty logical: we flatten the nested collections into a single stream using flatMap, then count. However, in most cases the size of every nested list is already calculated and stored in some private field inside that list, readily accessible via a size() call. A simple modification may greatly improve performance: listOfLists.stream().mapToInt(List::size).sum(). If you expect that the int sum may overflow, mapToLong works too.
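A sketch of the size-summing version and its `long` variant (hypothetical class and method names).

```java
import java.util.Arrays;
import java.util.List;

class NestedSizeExample {
    // Adds up the sizes each list already stores; replaces
    // listOfLists.stream().flatMap(List::stream).count(), which visits every element.
    static int totalElements(List<List<String>> listOfLists) {
        return listOfLists.stream().mapToInt(List::size).sum();
    }

    // Same, but accumulates in a long in case the int total could overflow.
    static long totalElementsLong(List<List<String>> listOfLists) {
        return listOfLists.stream().mapToLong(List::size).sum();
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        System.out.println(totalElements(data));     // 3
        System.out.println(totalElementsLong(data)); // 3
    }
}
```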

5.2. if(stream.filter(condition).count() > 0)

One more funny way to write stream.anyMatch(condition). However, unlike 2.1, here you lose short-circuiting: all elements will be enumerated and tested against the condition even if the first one already satisfies it. Similarly, instead of filter(condition).count() == 0, it's better to write noneMatch(condition).
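A small sketch that makes the lost short-circuiting visible by counting how often the condition runs; the counter and method names are illustrative only. For the list `[1, 2, 3, 4]` and the condition `n > 0`, `anyMatch` stops after the first element, while `filter(...).count()` tests all four.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ShortCircuitExample {
    // Returns how many times the condition was evaluated with anyMatch.
    static int testsWithAnyMatch(List<Integer> numbers) {
        AtomicInteger tests = new AtomicInteger();
        boolean found = numbers.stream().anyMatch(n -> {
            tests.incrementAndGet();
            return n > 0;
        }); // the result itself is not needed here
        return tests.get();
    }

    // Same check written as filter(...).count() > 0: every element is tested.
    static int testsWithFilterCount(List<Integer> numbers) {
        AtomicInteger tests = new AtomicInteger();
        boolean found = numbers.stream().filter(n -> {
            tests.incrementAndGet();
            return n > 0;
        }).count() > 0; // the result itself is not needed here
        return tests.get();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(testsWithAnyMatch(numbers));    // 1
        System.out.println(testsWithFilterCount(numbers)); // 4
    }
}
```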

5.3. if(stream.count() > 2)

This case is trickier. Now you want to know whether you have more than two elements. If you care about performance, stream.limit(3).count() > 2 will probably be better. After all, once you know there are at least three elements, you don't care exactly how many there are.
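A sketch with a hypothetical helper. Because `limit(3)` stops pulling after three elements, the check terminates even on an infinite stream.

```java
import java.util.stream.Stream;

class MoreThanTwoExample {
    // At most three elements are ever pulled from the source; replaces stream.count() > 2.
    static boolean hasMoreThanTwo(Stream<?> stream) {
        return stream.limit(3).count() > 2;
    }

    public static void main(String[] args) {
        System.out.println(hasMoreThanTwo(Stream.of("a", "b")));           // false
        System.out.println(hasMoreThanTwo(Stream.of("a", "b", "c")));      // true
        System.out.println(hasMoreThanTwo(Stream.iterate(0, i -> i + 1))); // true
    }
}
```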

6. Miscellaneous

6.1. stream.sorted(comparator).findFirst()

What's written here? Sort the stream and take the first element. That's the same as taking the minimal element: stream.min(comparator). Sometimes you can see stream.sorted(comparator.reversed()).findFirst(), which is the same as stream.max(comparator). The Stream API implementation will not optimize this for you (though such an optimization would be possible). It will actually drain the whole stream into an intermediate array, then sort it and return the first element. Without sorting you get much better performance and much less memory pressure. And, of course, the replacement is significantly clearer.
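A sketch of both replacements, using a hypothetical by-length comparator: one pass over the elements and no intermediate array.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class MinInsteadOfSortExample {
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    // Replaces words.stream().sorted(BY_LENGTH).findFirst().
    static Optional<String> shortest(List<String> words) {
        return words.stream().min(BY_LENGTH);
    }

    // Replaces words.stream().sorted(BY_LENGTH.reversed()).findFirst().
    static Optional<String> longest(List<String> words) {
        return words.stream().max(BY_LENGTH);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ccc", "a", "bb");
        System.out.println(shortest(words));                                // Optional[a]
        System.out.println(longest(words));                                 // Optional[ccc]
        System.out.println(shortest(Collections.<String>emptyList()));      // Optional.empty
    }
}
```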

6.2. stream.map(x -> {counter.addAndGet(x);return x;})

Some people try to create a side effect during stream execution. Usually this is a sign: either you are doing something wrong, or your problem is not suitable for the Stream API and you should use a simple loop. However, if you still want to stick with a stream, use the special method for side effects, peek: write stream.peek(counter::addAndGet). Beware, however, that a stream is not obliged to execute your operation on every stream element. If it finds that processing of some elements could be skipped (e.g. you have a short-circuiting operation), you may get surprising results.
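A sketch of `peek` with an `AtomicInteger` counter (method names are hypothetical). The second method demonstrates the caveat: with the short-circuiting `anyMatch`, only the elements actually pulled from the source are seen by `peek`, so for `[1, 2, 3]` and the condition `x > 1` the "total" is 3, not 6.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class PeekExample {
    // Accumulates a running total as a side effect while the elements flow through;
    // replaces map(x -> { total.addAndGet(x); return x; }).
    static List<Integer> copyAndTotal(List<Integer> values, AtomicInteger total) {
        return values.stream()
                .peek(total::addAndGet)
                .collect(Collectors.toList());
    }

    // With a short-circuiting terminal operation, peek sees only the elements
    // processed before anyMatch finds a match, so the total is incomplete.
    static int partialTotal(List<Integer> values) {
        AtomicInteger total = new AtomicInteger();
        values.stream().peek(total::addAndGet).anyMatch(x -> x > 1);
        return total.get();
    }

    public static void main(String[] args) {
        AtomicInteger total = new AtomicInteger();
        copyAndTotal(Arrays.asList(1, 2, 3), total);
        System.out.println(total.get());                           // 6
        System.out.println(partialTotal(Arrays.asList(1, 2, 3))); // 3
    }
}
```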

6.3. stream.map(x -> x.getSomeIntValue()).reduce(0, (a, b) -> a + b)

Don't forget that there are primitive streams! Here you box every int value returned by the getSomeIntValue method and box every intermediate sum. Using stream.mapToInt(x -> x.getSomeIntValue()).sum() you get better performance, less garbage, and shorter, cleaner code.
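A sketch comparing the two forms. `Item` is a hypothetical class whose `getSomeIntValue` mirrors the method name used in the heading.

```java
import java.util.Arrays;
import java.util.List;

class PrimitiveStreamExample {
    // Hypothetical element type; getSomeIntValue mirrors the name used in the text.
    static final class Item {
        private final int value;

        Item(int value) {
            this.value = value;
        }

        int getSomeIntValue() {
            return value;
        }
    }

    // Stays in int throughout: no Integer boxing for values or partial sums.
    static int total(List<Item> items) {
        return items.stream().mapToInt(Item::getSomeIntValue).sum();
    }

    // The boxed version from the heading, kept for comparison: each value and
    // each partial sum is an Integer.
    static int totalBoxed(List<Item> items) {
        return items.stream().map(Item::getSomeIntValue).reduce(0, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(new Item(2), new Item(3), new Item(5));
        System.out.println(total(items));      // 10
        System.out.println(totalBoxed(items)); // 10
    }
}
```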

That’s all for now. Please write in the comments if you’ve encountered other strange or inefficient ways of using Stream API.
