Benchmarking Kotlin Sequences

AJ Alt
AJ Alt
Sep 19, 2018 · 5 min read

Kotlin 1.2.70 was just released. If you’ve upgraded to it, you might have noticed some new IntelliJ warnings telling you that “Call chain on collection should be converted into ‘Sequence’”. Applying the suggestion adds a couple of extra function calls to the call chain. Is this extra code useful?

The release announcement says this:

Using Sequences helps avoid unnecessary temporary allocations overhead and, may significantly improve performance of complex processing pipelines

“significantly improve performance”? Claims like that make me skeptical when they don’t have any measurement to back them up, so let’s figure out if it’s true.

Before we dig in, what do you think the answer is? Do you think that Sequences are always faster? Always slower? is there some size of list or number of chained operations where you should start using Sequences? Let’s find out.

What are Sequences?

If you’re already familiar with Kotlin’s collection extensions and Sequences, feel free to skip ahead to the next section.

The Kotlin standard library includes a bunch of useful extension functions that you can use transform collections in a functional style. Every time you call a collection extension, it immediately runs over the entire collection. Operations like filter and map create a new list every time they run. If you chain multiple calls together, each one will create an intermediate list that is thrown away after the next step.

Sequences were created to allow you to perform the same operations lazily, so that no intermediate lists need to be created. Instead, intermediate iterators are created, and all the iterators run on each element of the input before adding the final result to a list. Lazy Sequences are especially useful if you don’t need to output an entire list, but instead just want to find a certain element.

For example, if you use map followed by find. With a Sequence, the map transform will only need to be applied to elements until the find predicate matches. The rest of the input is ignored, since a match is already found. Without Sequences, the entire input will be mapped to a new list, and find will run on that.

Benchmarking setup

I wrote a large suite of benchmarks using JMH. Since the examples in release announcement and YouTrack issue focus on filter and map, I did as well.

I tested performance of various numbers of chained operations, with and without Sequences at a variety of list sizes. To focus on the overhead of the Sequence calls, none of the operations change the list.

For example, here’s how I tested a chain of four operations without a Sequence:

list.filter { true }.map { it }.filter { true }.map { it }

and with a Sequence:

list.asSequence()
.filter { true }.map { it }.filter { true }.map { it }
.toList()

Longer “chain lengths” add more pairs of filter and map.

You can find the benchmark code here.

Results

Let’s try to answer the question I asked in the intro: is there some length of list or number of chained operations where Sequences are faster or slower?

Here’s a plot with number of chained operations on the x axis, list length on the y axis, and total running time on the z.

Time to perform a number of operations on a list of integers. All three axes are log-scale. Lower is better. “chain length” is the number of filter and map operations chained together.

Even with all the overhead removed, the points are very close together. However, it seems like Sequences actually get slower as the number of operations increases. Let’s zoom in on one size of list (1 million items), so we can see the difference more easily.

Time per item to perform a number of operations on a list of 1 million ints. Lower is better.

And here’s the same graph, but with the difference between list and Sequence times:

Time per item for lists minus time per item for Sequences. Positive values mean that lists are slower, and negative values mean the Sequences are slower.

So Sequences are slightly faster for small numbers of operations, and noticeably slower with larger numbers of operations. That certainly disagrees with the common idea that allocating lists is expensive, and that Sequences would be faster.

But these results are for operations that are as fast as possible. What if your map or filter functions take slightly longer to run?

Here’s a graph or total running time to do a single filter and map, where the `map` takes 5µs to run.

list.filter { true }.map { Thread.sleep(0, 5000); it }
Total time to run with a map function that takes 5µs

In this case, the differences disappear entirely. 5µs is a small amount of time, but it’s still orders of magnitude larger than the tiny overhead from using Sequences or not.

There is one case where Sequences significantly outperform list operations: when lazyness matters. If you use a function list first or find, then the Sequence version will only run until an item is found. the list version still needs to run over the entire list.

Here’s the above graph, but with first used with a predicate that matches after the second item:

list.map { Thread.sleep(0, 5000); it }.first { it > 2 }

In this case, the Sequences clearly outperform the list version.

Conclusion

The fact that Sequences are slower for longer chains of operations is probably counter-intuitive for most people. But JVM performance often is. Modern compilers, and especially JITs, are extremely good at optimizing code in most cases. The JVM is very fast at allocating memory, and the simple inline functions used for list operations are likely good targets for optimization. On the other hand, the implementation of Sequences is comparatively complex, with multiple allocations necessary to produce the iterators that are eventually used. It’s probably the case that the JIT has a harder time with these.

But as with all performance questions, the answer is not to speculate, but to measure. Maybe you’re running on a device where allocation is more expensive, or the JIT makes different decisions. The only way to know if it matters is to measure your code.

What’s my recommendation for Sequences? Use them if lazyness matters. Otherwise do whatever makes you happy, because they aren’t going to make your code faster.