Collections and sequences in Kotlin
Working with collections is a common task and the Kotlin Standard Library offers many great utility functions. It also offers two ways of working with collections based on how they’re evaluated: eagerly — with Collection
s, and lazily — with Sequence
s. Continue reading to find out what’s the difference between the two, which one you should use and when, and what the performance implications of each are.
Collections vs sequences
The difference between eager and lazy evaluation lies in when each transformation on the collection is performed.
Collections are eagerly evaluated — each operation is performed when it’s called and the result of the operation is stored in a new collection. The transformations on collections are inline functions. For example, looking at how map
is implemented, we can see that it’s an inline
function, that creates a new ArrayList
:
public inline fun <T, R> Iterable<T>.map(transform: (T) -> R): List<R> {
return mapTo(ArrayList<R>(collectionSizeOrDefault(10)), transform)
}
Sequences are lazily evaluated. They have two types of operations: intermediate and terminal. Intermediate operations are not performed on the spot; they’re just stored. Only when a terminal operation is called, the intermediate operations are triggered on each element in a row and finally, the terminal operation is applied. Intermediate operations (like map
, distinct
, groupBy
etc) return another sequence whereas terminal operations (like first
, toList
, count
etc) don’t.
Sequences don’t hold a reference to the items of the collection. They’re created based on the iterator of the original collection and keep a reference to all the intermediate operations that need to be performed.
Unlike transformations on collections, intermediate transformations on sequences are not inline functions — inline functions cannot be stored and sequences need to store them. Looking at how an intermediate operation like map
is implemented, we can see that the transform function is kept in a new instance of a Sequence
:
public fun <T, R> Sequence<T>.map(transform: (T) -> R): Sequence<R>{
return TransformingSequence(this, transform)
}
A terminal operation, like first
, iterates through the elements of the sequence until the predicate condition is matched.
public inline fun <T> Sequence<T>.first(predicate: (T) -> Boolean): T {
for (element in this) if (predicate(element)) return element
throw NoSuchElementException(“Sequence contains no element matching the predicate.”)
}
If we look at how a sequence like TransformingSequence
(used in the map
above) is implemented, we’ll see that when next is called on the sequence iterator, the transformation stored is also applied.
internal class TransformingIndexedSequence<T, R>
constructor(private val sequence: Sequence<T>, private val transformer: (Int, T) -> R) : Sequence<R> {override fun iterator(): Iterator<R> = object : Iterator<R> {
…
override fun next(): R {
return transformer(checkIndexOverflow(index++), iterator.next())
}
…
}
Independent on whether you’re using collections or sequences, the Kotlin Standard Library offers quite a wide range of operations for both, like find
, filter
, groupBy
and others. Make sure you check them out, before implementing your own version of these.
Collections and sequences
Let’s say that we have a list of objects of different shapes. We want to make the shapes yellow and then take the first square shape.
Let’s see how and when each operation is applied for collections and when for sequences
Collections
map
is called — a new ArrayList is created. We iterate through all items of the initial collection, transform it by copying the original object and changing the color, then add it to the new list.first
is called — we iterate through each item until the first square is found
Sequences
asSequence
— a sequence is created based on the Iterator of the original collectionmap
is called — the transformation is added to the list of operations needed to be performed by the sequence but the operation is NOT performedfirst
is called — this is a terminal operation, so, all the intermediate operations are triggered, on each element of the collection. We iterate through the initial collection applying map and then first on each of them. Since the condition from first is satisfied by the 2nd element, then we no longer apply the map on the rest of the collection.
When working with sequences no intermediate collection is created and since items are evaluated one by one, map is only performed on some of the inputs.
Performance
Order of transformations
Independent of whether you’re using collections or sequences, the order of transformations matters. In the example above, first
doesn’t need to happen after map
since it’s not a consequence of the map
transformation. If we reverse the order of our business logic and call first on the collection and then transform the result, then we only create one new object — the yellow square. When using sequences — we avoid creating 2 new objects, when using collections, we avoid creating an entire new list.
Because terminal operations can finish processing early, and intermediate operations are evaluated lazily, sequences can, in some cases, help you avoid doing unnecessary work compared to collections. Make sure you always check the order of the transformations and the dependencies between them!
Inlining and large data sets consequences
Collection operations use inline functions, so the bytecode of the operation, together with the bytecode of the lambda passed to it will be inlined. Sequences don’t use inline functions, therefore, new Function
objects are created for each operation.
On the other hand, collections create a new list for every transformation while sequences just keep a reference to the transformation function.
When working with small collections, with 1–2 operators, these differences don’t have big implications so working with collections should be ok. But, when working with large lists the intermediate collection creation can become expensive; in such cases, use sequences.
Unfortunately, I’m not aware of any benchmarking study done that would help us get a better understanding on how the performance of collections vs sequences is affected with different sizes of collections or operation chains.
Collections eagerly evaluate your data while sequences do so lazily. Depending on the size of your data, pick the one that fits best: collections — for small lists or sequences — for larger ones, and pay attention to the order of the transformations.