The little gems of Scala standard library

Published in

Wix Engineering

6 min readApr 18, 2019

Scala programming language has a really rich standard library, especially when it comes to collection related functionality. There are many features that are often overlooked, but if applied, can greatly simplify your code. In this post I will try to introduce some of the lesser known, but really useful methods, with the goal of writing cleaner, more robust and efficient code.

Let’s start by building ourselves a small sequence of integers for experimenting:

val xs = Seq(4, 5, 2, -1, -3, 4)

It will be used in most of the examples in this post. You can try all of the code snippets presented here in Scala worksheet of IntelliJ IDEA or Scala IDE for Eclipse (or online, using Scastie).

And, here we go (in no particular order):

partition

partition allows to split a collection into two collections based on a given predicate. So instead of:

val even = xs.filter(_ % 2 == 0)
val odd = xs.filterNot(_ % 2 == 0)

you can write:

val (even, odd) = xs.partition(_ % 2 == 0)

It is cleaner, has less room for errors, and does the job in one pass over the original collection. All you need to remember is that in the resulting Tuple, the first element contains all the items that match your predicate, and the second one— the ones that do not.

minBy / maxBy

You might be tempted to sort the whole collection just to get the smallest (or the largest) value of it. This is a very redundant and inefficient way of getting the job done.

For example, let’s say you want to get the number that has the smallest absolute value. Instead of:

val minByAbs = xs.sortBy(Math.abs).head

You can write:

val minByAbs = xs.minBy(Math.abs)

In fact, IntelliJ IDEA suggests you such refactoring automatically. Unless you go nuts and do something like

val maxByAbs = xs.sortBy(Math.abs).reverse.head

in which case even IntelliJ will not be able to help you…

Just keep in mind that both minBy and maxBy would throw an UnsupportedOperationException in case they are called on an empty collection (and so would min and max). head and last, however, throw aNoSuchElementException in those cases (which, strictly speaking, makes the expressions above non equivalent).

collect / collectFirst

Sometimes you want to both filter the collection and perform some mapping on its values. For example:

val doublesOfPositive = xs.filter(_ > 0).map(2 * _)

This can be nicely expressed with collect (if you need to filter and then process) or collectFirst (if you need to process just the first value matching the predicate):

val doublesOfPositive = xs.collect {
  case x if x > 0 => 
    2 * x
}

collectFirst has the same syntax, but it returns an Option (which is None in case there were no values matching the predicate) and is safe to use on empty collections (just like find).

splitAt

splitAt is a nice way to split a collection into two contiguous parts on the given index. So instead of

val leftHalf = xs.take(3)
val rightHalf = xs.drop(3)

you can write:

val (leftHalf, rightHalf) = xs.splitAt(3)

This is similar to partition, the difference is that you split by index and not by some boolean predicate. Note: it is safe to call splitAt with an index exceeding the size of your collection. In such case, you will get the entire original collection in the leftHalf, and the rightHalf will be empty.

grouped

grouped(n) splits the collection into collections with n elements each (the last one may contain less than n elements, depending on the size of the original collection):

xs.grouped(3).toList // List(List(4, 5, 2), List(-1, -3, 4))

For example, it can be used to process the data in batches of limited size. Note: grouped returns an iterator over resulting collections, so I had to invoke toList in order to force its evaluation and get the actual result.

count

Quite often you need to count the number of elements in a collection that are matching some predicate. Instead of filtering the matching elements and then taking the size of resulting collection, as in:

val evenCount = xs.filter(_ % 2 == 0).size

You can write:

val evenCount = xs.count(_ % 2 == 0)

(this is also suggested as an automatic refactoring by IntelliJ).

exists

Need to check for the existence of some element that matches the given predicate? Instead of

xs.find(_ % 42 == 0) != None

xs.count(_ % 42 == 0) > 0

You can write:

xs.exists(_ % 42 == 0)

(or let IntelliJ fix it for you, again).

last

last does just that — returns the last element of the collection, so no need for .reverse.head (or for accessing the last element by the index).

And for the dessert, my personal favourite:

indexOfSlice / lastIndexOfSlice

This mostly concerns performance (in terms of Big O notation). Let’s say you need to find the given substring within some string. If you are an experienced Java developer, you’ll tell immediately: use String.indexOf(substring). Sure, being a method of java.lang.String, it is also available to us in Scala. But wait a second! What if I (or some malicious user of your publicly available API) will try to provide you with some “inconvenient” inputs? For example:

val string = "a" * 2000000
val substring = "a" * 1000000 + "b"

string.indexOf(substring)

(in case you wonder, “foo” * x is a nice way of getting string “foo” repeated x times in Scala).

Just don’t hold your breath while trying to run this code! You will lose it long before the code completes. This happens because Java implementation of String.indexOf uses a naive algorithm of substring matching — for each occurrence of first character of search substring ('a') it tries to compare each subsequent character of the substring until a mismatch is found (or the substring is fully matched). This particular input that I gave is designed to reveal the quadratic (O(nm)) worst time complexity (n is the length of the main string, and m — the length of the search substring) of this implementation: one million successful matches of character'a' are done before the first mismatch on 'b' is encountered. And this is repeated roughly a million times! But how can Scala standard library help us here? Let’s try:

val string = "a" * 2000000
val substring = "a" * 1000000 + "b"

string.indexOfSlice(substring)

This code does the same, but it finishes immediately! The only change is replacement of indexOf with indexOfSlice (the method that comes from SeqLike trait in Scala standard library). But why does it perform so much better on this input? If we dig deeper, under the hood of this method we can find the implementation of Knuth–Morris–Pratt algorithm which is a really clever algorithm for substring search. It has a linear (O(n + m)) complexity even in the worst case (meaning — for any input possible). If you try to benchmark Java indexOf vs Scala indesOfSlice on random inputs, I’m sure Java indexOf would win. Simply because it specializes on Strings, while Scala indexOfSlice supports sequences of generic type. Also, the implementation of the naive search algorithm has less overhead. However, indexOfSlice has advantage whenever you have to be defensive against inputs that you do not control — for example, when your users can provide you with sufficiently large strings and trigger the search on them. Also, linear vs quadratic complexity really matters in Competitive Programming.

Part of my current job is introducing new team members to Scala. One of the things I ask them to do is to go over the documentation of a typical Scala collection class (such as IndexedSeq) a few times, in order to learn about the methods that can be useful in a day to day work, like the ones we have just discussed. But I hope that even experienced Scala developers have learned something new from this blog post.

Do you think I have missed some useful methods of Scala collections? Please share your favourite ones in the comments!

The little gems of Scala standard library

partition

minBy / maxBy

collect / collectFirst

splitAt

grouped

count

exists

last

indexOfSlice / lastIndexOfSlice

Written by Linas Medžiūnas