The Scala Collections Library

Muhammad Tabaza
Pragma
Published in
9 min readSep 14, 2019

--

One of my favorite features of Scala is the amazingly rich standard collections library. Normally, languages try not to make any assumptions about your preferences of the algorithms used to implement certain operations on data structures, so they don’t provide implementations for these operations at all. Consequently, you either implement your own structures/operations and endure the debt of maintaining them, or you use an external library that implements its own data structures. But what’s the point of having a standard library of collections if the mismatch between everyone’s most basic data representations is inevitable?

Luckily, the Scala collections library contains a plethora of data structures and generic operations that are hard to find in the standard libraries of other languages. Combined with the expressive nature and type safety of Scala, manipulating complex data is a blast.

This article assumes the reader is using Scala 2.13.0

Overview of scala.collection

The scala.collection package contains definitions of many abstract structures, such as Seq (sequence) and Map , and Iterable, among some generic operations. It also contains mutable and immutable sub-packages which contain concrete implementations of these abstract data structures. For example, immutable.List is an extension of Seq, and immutable.HashMap is an extension of Map. We’ll discuss these types of collections later on.

Iterable

Iterable is the base of all collections, which has one abstract method: iterator, and one type argument: A , the type of elements in the collection. It defines many useful operations such as map, filter, and reduce, so you can expect these operations to be present for all collections. This makes it really easy to define functions of Iterables which can operate on many types of collections. For example:

scala> // Takes an iterable of Stringsscala> def longerThan5(it: Iterable[String]) = it.filter(_.length>5)
longerThan5: (it: Iterable[String])Iterable[String]
scala> // Can be applied to a List[String]scala> longerThan5(List("John","Alice","Edward","Victoria"))
res0: Iterable[String] = List(Edward, Victoria)
scala> // Can be applied to a Set[String]scala> longerThan5(Set("John","Alice","Edward","Victoria"))
res1: Iterable[String] = Set(Edward, Victoria)

The type argument A of Iterable is covariant, meaning that Iterable[Int] is a subtype of Iterable[Number] for example.

Values of type Iterable can be created using Iterable.apply:

scala> val iterable = Iterable(1,2,3)
iterable: Iterable[Int] = List(1, 2, 3)

Notice how the actual value of iterable has the type List[Int]. This is the default type that is given to values constructed using Iterable.apply (since Iterable is abstract, and List is a concrete implementation of it.) We’ll take a look at Lists shortly.

We can iterate over any Iterables using for:

scala> for(n <- Iterable(1,2,3)) print(s"$n ")
1 2 3

You can also use for comprehensions to transform and filter the values in the collection and return the resulting collection:

scala> for(n <- Iterable(1,2,3); if n % 2 == 0) yield s"$n is even!"
res5: Iterable[String] = List(2 is even!)

for comprehensions are actually syntactic sugar for chains of map, filter, and foreach calls. So the code above is equivalent to:

scala> Iterable(1,2,3).filter(_ % 2 == 0).map(n => s"$n is even!")
res6: Iterable[String] = List(2 is even!)

Check out function shorthand syntax if _ % 2 == 0 looks alien to you.

Any type that implements foreach can be iterated over using for. This gives you the ability to iterate over and transform your own types using very clean syntax. You can also enable the use of yield and if guards if you implement map and filter methods for the type.

You can concatenate Iterables using the ++ operator:

scala> Iterable(1,2,3) ++ Iterable(4,5,6)
res0: Iterable[Int] = List(1, 2, 3, 4, 5, 6)

Iterables also have head, tail and isEmpty methods that make recursion pretty convenient:

scala> val ns = Iterable(1,2,3)
ns: Iterable[Int] = List(1, 2, 3)
scala> ns.head
res0: Int = 1
scala> ns.tail
res1: Iterable[Int] = List(2, 3)
scala> def sum(ns: Iterable[Int]): Int =
| if(ns.isEmpty) 0 else ns.head + sum(ns.tail)
sum: (ns: Iterable[Int])Int

Iterables actually have a sum method that sums any collection of numbers. How convenient is that?

When using head and tail (and some other methods,) you should keep in mind that the collection might be empty. Calling head or tail on an empty collection results in a NoSuchElementException or an UnsupportedOperationException respectively. A type-safe way to get around this problem is to use headOption and match its result to know whether to call tail or not. Calling tail on a collection with one element will result with an empty collection:

scala> def sum(ns: Iterable[Int]): Int =
| ns.headOption match {
| case None => 0
| case Some(n) => n + sum(ns.tail)
| }
sum: (ns: Iterable[Int])Int

The Option type encodes a value that might not exist, such as the head of an empty collection in this example.

This pattern of fooOption appears in many places in the collections API. For example, the reduce method assumes the first element of the collection to be the initial value for the accumulator, which is returned at the end:

scala> Iterable(1,2,3).reduce((a, b) => a + b)
res0: Int = 6

The definition of reduce is similar to:

def reduce[A](it: Iterable[A], op: (A, A) => A): A = 
if(it.size == 1) it.head
else reduce(Iterable(op(it.head,it.tail.head))++it.tail.tail,op)

Which doesn’t take into account that the collection can be empty. The reduceOption method returns None if the collection is empty. The fold method is similar to reduce, but it takes the initial value of the accumulator as an argument:

scala> Iterable(1,2,3).fold(1)((a, b) => a * b)
res0: Int = 6

Iterables also support drop, take, dropWhile, and takeWhile:

scala> val ns = Iterable(1,5,10,12,20,50)
ns: Iterable[Int] = List(1, 5, 10, 12, 20, 50)
scala> ns.take(2)
res0: Iterable[Int] = List(1, 5)
scala> ns.drop(2)
res1: Iterable[Int] = List(10, 12, 20, 50)
scala> ns.takeWhile(_ < 20)
res2: Iterable[Int] = List(1, 5, 10, 12)
scala> ns.dropWhile(_ < 20)
res3: Iterable[Int] = List(20, 50)

Iterables have many convenience methods, such as groupBy, which groups elements of the collection based on some attribute (user age in this example):

scala> case class User(name: String, age: Int)
defined class User
scala> val users = Iterable(
| User("John", 21),
| User("Jane", 30),
| User("Alice", 21))
users: Iterable[User] = List(User(John,21), User(Jane,30), User(Alice,21))
scala> users.groupBy(_.age)
res1: scala.collection.immutable.Map[Int,Iterable[User]] = HashMap(21 -> List(User(John,21), User(Alice,21)),
30 -> List(User(Jane,30)))

We can’t cover all the methods of Iterable (or any subtype of it) in this article, but I encourage you to use the API documentation for reference.

Seq

The Seq type represents an order-preserving sequence of elements. Any subtype of Seq has an apply method that takes an integer index (starting at zero) and returns the element at that location:

scala> val s = Seq(1, 2, 3, 1, 2, 3)
s: Seq[Int] = List(1, 2, 3, 1, 2, 3)
scala> s(2)
res0: Int = 3
scala> // But be careful not to access elements out of boundsscala> s(100)
java.lang.IndexOutOfBoundsException: 100
at scala.collection.LinearSeqOps.apply(LinearSeq.scala:118)
at scala.collection.LinearSeqOps.apply$(LinearSeq.scala:115)
at scala.collection.immutable.List.apply(List.scala:82)
... 36 elided
scala> // Use lift to be safescala> s.lift(100)
res2: Option[Int] = None

All Iterables have a size method, which returns the number of elements in the Iterable. Seqs have another method: length, which does the same thing. Seqs define the size method as an alias to the length method, so they’re exactly the same thing:

scala> val s = Seq(1, 2, 3)
s: Seq[Int] = List(1, 2, 3)
scala> s.length
res0: Int = 3
scala> s.size
res1: Int = 3

You can update the element at a particular index:

scala> Seq('a','b','c').updated(1,'z')
res2: Seq[Char] = List(a, z, c)

Seqs can be sorted if there’s an implementation of Ordering for its elements in scope:

scala> Seq(10,2,4,8).sorted
res0: Seq[Int] = List(2, 4, 8, 10)
scala> Seq("bananas","apples","oranges").sorted
res1: Seq[String] = List(apples, bananas, oranges)

You can append and prepend elements to a Seq:

scala> val seq = Seq(1,2,3)
seq: Seq[Int] = List(1, 2, 3)
scala> 0 +: seq
res0: Seq[Int] = List(0, 1, 2, 3)
scala> seq :+ 4
res1: Seq[Int] = List(1, 2, 3, 4)
scala> seq ++ Seq(4,5,6)
res2: Seq[Int] = List(1, 2, 3, 4, 5, 6)
scala> Seq(-2,-1,0) ++: seq
res3: Seq[Int] = List(-2, -1, 0, 1, 2, 3)

The performance of these operations depends on the concrete implementation of Seq (the actual value.)Lists are implemented as linked lists in Scala, so they support constant-time prepending of single elements. On the other hand, they take O(n1) time to prepend a list of n1 elements to another list. Accessing elements by index in a List of n elements also takes O(n) time. It’s useful to know this to be able to determine the best data structure to use.

Vectors are Seqs that support constant-time random access/updates of elements, so they’re a good choice if you do a lot more than just iterating over the sequence and prepending elements to the collection.

List

Lists are implemented as cons lists, which can be defined as:

sealed trait ConsList[+A]case class Cons[A](head: A, tail: ConsList[A]) extends ConsList[A]case object Empty extends ConsList[Nothing]

And values of ConsList can be constructed like so:

scala> Cons(1, Cons(2, Cons(3, Empty)))
res3: Cons[Int] = Cons(1,Cons(2,Cons(3,Empty)))

But what’s cool is that Scala support syntax that allows a value constructor like Cons to be a symbol like ::, so Lists can be constructed in this fashion:

scala> 1 :: 2 :: 3 :: Nil
res4: List[Int] = List(1, 2, 3)

Where Nil represents the empty list. This is cool because it allows us to match patterns with lists, for example:

scala> def sum(xs: List[Int]): Int = xs match {
| case Nil => 0
| case head :: tail => head + sum(tail)
| }
sum: (xs: List[Int])Int

Of course, you can always just use List.apply to construct and match values:

scala> List(1,2,3)
res5: List[Int] = List(1, 2, 3)

LazyList

A LazyList is a List that lazily evaluates its elements. Which means that elements of a LazyList would never be evaluated unless they are used:

scala> LazyList(1,2,3)
res0: scala.collection.immutable.LazyList[Int] =
LazyList(<not computed>)
scala> LazyList(1+2<3,false,Seq.range(1,100000).length>300)
res1: scala.collection.immutable.LazyList[Boolean] =
LazyList(<not computed>)
scala> // But will be evaluated when neededscala> LazyList(1,2,3).sum
res3: Int = 6

Set

A Set is a collection of unique elements that can be constructed using Set.apply:

scala> Set(1,2,3,1,2,3)
res0: scala.collection.immutable.Set[Int] = Set(1, 2, 3)

Sets have an apply method that takes an element and returns whether the element is a member of the set or not:

scala> val ns = Set(10,20,30)
ns: scala.collection.immutable.Set[Int] = Set(10, 20, 30)
scala> ns(20)
res1: Boolean = true
scala> ns(1)
res2: Boolean = false

Performing any of the Iterable operations will preserve the uniqueness of the elements in the set. Additionally, Sets support + and methods that perform element insertion and extraction respectively:

scala> Set(1,2,3) + 4
res0: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)
scala> Set(1,2,3) - 3
res2: scala.collection.immutable.Set[Int] = Set(1, 2)

They also support union and intersect methods:

scala> Set(1,2).union(Set(2,3))
res4: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
scala> Set(1,2).intersect(Set(2,3))
res5: scala.collection.immutable.Set[Int] = Set(2)

Sets don not preserve the order in which elements were inserted. But you can use ListSet if order matters to you.

See also: HashSet, TreeSet.

Map

A Map is a set of key-value pairs that can be created using Map.apply:

scala> Map(5 -> "Hello", 11 -> "Collections")
res0: scala.collection.immutable.Map[Int,String] =
Map(5 -> Hello, 11 -> Collections)

-> is actually just a method that returns a tuple of the two operands, so 'a' -> 1 returns ('a', 1).

You can use the apply method to get the value of a key:

scala> val lengths = Map(5 -> "Hello", 11 -> "Collections")
lengths: scala.collection.immutable.Map[Int,String] = Map(5 -> Hello, 11 -> Collections)
scala> lengths(5)
res2: String = Hello
scala> lengths(100)
java.util.NoSuchElementException: key not found: 100
at scala.collection.immutable.Map$Map2.apply(Map.scala:274)
... 36 elided

Similarly to Seqs, you can use lift to avoid NoSuchElementExceptions:

scala> lengths.lift(11)
res6: Option[String] = Some(Collections)
scala> lengths.lift(99999)
res7: Option[String] = None

Maps also support + and operators:

scala> lengths + (4 -> "Maps")
res9: scala.collection.immutable.Map[Int,String] =
Map(5 -> Hello, 11 -> Collections, 4 -> Maps)
scala> lengths - 5
res11: scala.collection.immutable.Map[Int,String] =
Map(11 -> Collections)

Adding a new key-value pair will overwrite any existing pairs with the same key:

scala> lengths + (5 -> "Scala")
res13: scala.collection.immutable.Map[Int,String] =
Map(5 -> Scala, 11 -> Collections)

You can iterate over the key-value pairs of a map using for(or any method on Iterable):

scala> for((k,v) <- lengths) println(s"$k\t->\t$v")
5 -> Hello
11 -> Collections

See also: ListMap, HashMap, TreeMap.

So far, we’ve looked at List, LazyList, Map and Set, which are all immutable collections. The other side of the collections library: scala.collection.mutable, contains data structures similar to these(plus some new ones, like Stack,) but ones you can mutate in place (without creating new collections.) We also haven’t looked at Queues, which are also in collection.immutable. It would be difficult to cover all of this in one article (or one book,) but again, I urge you to refer to the API documentation for a complete view of the collections library.

I hope you found this introduction to be helpful. My aim was to help anyone new to Scala get started using some of the most useful functionality of the collections library (and also for me to gush over some of its really cool features.)

--

--