Functional Style List Manipulation: Scala Vs Java 8 Vs Groovy

How does the Java streams API compare with Scala and Groovy?

Source: Toptal

Let’s get started.

Assume a simple Book class.

public class Book {

private String name;
private String author;
private int numberOfPages;

public Book(String name, String author, int numberOfPages) {
this.name = name;
this.author = author;
this.numberOfPages = numberOfPages;
}

public String getName() {
return name;
}

public void setName(String name) {
this.name = name;
}

public String getAuthor() {
return author;
}

public void setAuthor(String author) {
this.author = author;
}

public int getNumberOfPages() {
return numberOfPages;
}

public void setNumberOfPages(int numberOfPages) {
this.numberOfPages = numberOfPages;
}

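@Override
public boolean equals(Object o) {
// Guard against null and non-Book arguments before casting
if (!(o instanceof Book)) return false;
return name.equals(((Book) o).name);
}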

@Override
public int hashCode() {
return name.hashCode();
}

@Override
public String toString() {
return this.name + " : " + this.author + " : " + this.numberOfPages;
}
}

In Scala:

case class Book(name: String, author: String, numberOfPages: Int)

Consider the following list of books as our test data:

//Scala
val books = List[Book](
new Book("Clean Code", "Bob", 100),
new Book("Refactoring", "Martin", 300),
new Book("Extreme Programming", "Bob", 200),
new Book("TDD", "Kent", 250)
)
//Java
List<Book> books = new ArrayList<Book>() {
{
add(new Book("Clean Code", "Bob", 100));
add(new Book("Refactoring", "Martin", 300));
add(new Book("Extreme Programming", "Bob", 200));
add(new Book("TDD", "Kent", 250));
}
};
//Groovy
def books = [
new Book("Clean Code", "Bob", 100),
new Book("Refactoring", "Martin", 300),
new Book("Extreme Programming", "Bob", 200),
new Book("TDD", "Kent", 250)
]

Before we begin, a note on the testing frameworks used in the examples: JUnit is used to write the assert statements for Java 8, ScalaTest (http://www.scalatest.org/) for Scala, and Spock (http://spockframework.org/) for Groovy.

Let the fun begin.

1. Filter / FindAll

Find all the books written by author ‘Bob’

//Scala
val booksByBob = List(new Book("Clean Code", "Bob", 100), new Book("Extreme Programming", "Bob", 200))
assert(books.filter((book: Book) => book.author == "Bob") == booksByBob)
assert(books.filter(book => book.author == "Bob") == booksByBob)
assert(books.filter(_.author == "Bob") == booksByBob)
//Java
List<Book> booksByBob = Arrays.asList(
new Book("Clean Code", "Bob", 100),
new Book("Extreme Programming", "Bob", 200)
);
assertThat(books.stream()
.filter((Book book) -> book.getAuthor().equals("Bob"))
.collect(Collectors.toList()),
is(booksByBob));
assertThat(books.stream()
.filter(book -> book.getAuthor().equals("Bob"))
.collect(Collectors.toList())
, is(booksByBob));
//Groovy
def booksByBob = [new Book("Clean Code", "Bob", 100), new Book("Extreme Programming", "Bob", 200)]
assert books.findAll { Book book -> book.author == "Bob" } == booksByBob
assert books.findAll { book -> book.author == "Bob" } == booksByBob
assert books.findAll { it.author == "Bob" } == booksByBob

Scala: In Scala the predicate passed to the filter function can be written as a lambda expression in several forms. Scala uses a fat arrow (=>) for lambda expressions, as opposed to Java 8 and Groovy, which use a thin arrow (->). Also note that specifying the type of the lambda expression parameter is completely optional.

Scala supports an implicit variable name _ for the lambda expression input parameter, which makes the syntax look compact and neat:

assert(books.filter(_.author == "Bob") == booksByBob)

Java: In Java 8, filtering can be performed using the Stream API.

The Stream API in Java 8 supports various stream operations which help in writing code in a functional style.

Like Scala and Groovy, Java 8 can infer the type of the lambda expression’s input parameter, so it can be omitted. If you do specify the type, the parameter must be enclosed in parentheses, which makes the syntax more cluttered.

Unlike Scala and Groovy, Java 8 does not have an implicit parameter name for lambda expressions. Having to name the parameter for even the simplest operations feels like overkill.

Notice the .collect(Collectors.toList()) part. Since the filter operation on a stream returns a stream and not a List, we need to convert the stream back into a List. This can be done using the collect operation on the stream. The collect operation takes an argument of type Collector. There are many built-in collectors available in Java 8, and you can write your own by implementing the Collector interface.
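As a rough sketch of what a hand-written collector can look like, the snippet below builds one with the Collector.of factory method rather than implementing the interface by hand. The class name CollectorSketch and the helper toArrayList are made up for this illustration, and plain author strings stand in for Book objects to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CollectorSketch {

    // Hypothetical hand-written collector that gathers elements into an ArrayList.
    static <T> Collector<T, List<T>, List<T>> toArrayList() {
        return Collector.<T, List<T>>of(
                () -> new ArrayList<>(),            // supplier: creates the empty container
                (list, item) -> list.add(item),     // accumulator: adds one element
                (left, right) -> {                  // combiner: merges two partial containers
                    left.addAll(right);
                    return left;
                });
    }

    // Filtering with the built-in collector.
    static List<String> withBuiltIn(List<String> authors) {
        return authors.stream()
                .filter(author -> author.equals("Bob"))
                .collect(Collectors.toList());
    }

    // The same pipeline, ending in the hand-written collector.
    static List<String> withCustom(List<String> authors) {
        return authors.stream()
                .filter(author -> author.equals("Bob"))
                .collect(toArrayList());
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("Bob", "Martin", "Bob", "Kent");
        System.out.println(withBuiltIn(authors));
        System.out.println(withCustom(authors));
    }
}
```

Both methods should return the same list; in everyday code the built-in Collectors.toList() is the simpler choice.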

Groovy: Groovy has a findAll method to find all the elements matching the predicate. The predicate is specified using a closure.

Groovy closures use an implicit parameter name ‘it’, and specifying the parameter type is completely optional.

Looking at the syntax of all three languages, the Java 8 syntax still looks a little verbose.

2. Find

Find a book with name ‘TDD’

//Scala
assert(books.find(_.name == "TDD") == Option(new Book("TDD", "Kent", 250)))
//Java
assertThat(books.stream()
.filter(book -> book.getName().equals("TDD"))
.findFirst(),
is(Optional.of(new Book("TDD", "Kent", 250))));
//Groovy
assert books.find { it.name == "TDD" } == new Book("TDD", "Kent", 250)

Scala and Groovy have a method named ‘find’ to find the first element matching the predicate.

Java 8 does not have a ‘find’ method on streams that takes a predicate. Instead, it uses the filter operation in conjunction with the findFirst or findAny operation.

Stream operations are either intermediate (e.g. filter, map, sorted) or terminal (findFirst, forEach, collect etc). Intermediate operations return a stream so we can chain multiple intermediate operations without using semicolons. Terminal operations are either void or return a non-stream result.

In this example filter is an intermediate operation and findFirst is a terminal operation.

An important characteristic of intermediate operations is laziness. An intermediate operation is never executed if there is no terminal operation in the chain.
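A small sketch can make this laziness visible. It counts how often a filter predicate runs, with and without a terminal operation at the end of the chain. The class name LazinessSketch and the counter are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazinessSketch {

    // Returns how many times the filter predicate was executed.
    static int predicateCalls(boolean addTerminalOperation) {
        List<String> names = Arrays.asList("Clean Code", "Refactoring", "Extreme Programming", "TDD");
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline does not run the predicate yet.
        Stream<String> pipeline = names.stream()
                .filter(name -> {
                    calls.incrementAndGet();
                    return name.length() > 3;
                });

        if (addTerminalOperation) {
            // Only a terminal operation such as forEach pulls elements through the chain.
            pipeline.forEach(name -> { });
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(predicateCalls(false)); // prints 0
        System.out.println(predicateCalls(true));  // prints 4
    }
}
```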

Looking at the Java find example, it gives the impression that it first filters the whole collection based on the predicate and then returns the first of the filtered elements. Too many iterations, huh? But that is not what happens. Stream operations in Java 8 are not evaluated horizontally (one whole operation at a time) but vertically, i.e. each element of the stream moves down the chain on its own. This can reduce the number of iterations significantly if the operations are performed in the right order.

This is the best part of the Stream API, because the code can be written as a chain of operations which naturally reveals its intent, making the code more readable. We will see some examples at the end of the post.
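To check this claim, the following sketch counts how many elements the filter actually inspects before findFirst returns on a sequential stream. The class name FindFirstSketch and the counter are assumptions made for this illustration, and book names stand in for Book objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class FindFirstSketch {

    // Returns how many elements the filter looked at before findFirst stopped.
    static int elementsInspected(String wanted) {
        List<String> names = Arrays.asList("Clean Code", "Refactoring", "Extreme Programming", "TDD");
        AtomicInteger inspected = new AtomicInteger();

        names.stream()
                .filter(name -> {
                    inspected.incrementAndGet();
                    return name.equals(wanted);
                })
                .findFirst(); // short-circuits as soon as one element passes the filter

        return inspected.get();
    }

    public static void main(String[] args) {
        System.out.println(elementsInspected("Refactoring")); // prints 2
        System.out.println(elementsInspected("TDD"));         // prints 4
    }
}
```

Searching for the second book inspects only two elements; the remaining two are never passed to the predicate.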

It’s worth noting that the find method in Scala does not return a Book but an Option[Book]. The same is the case with findFirst in Java 8, which returns an Optional<Book>. Groovy’s find, on the other hand, returns the Book itself, or null if nothing matches.

An Option is used to denote an optional value, i.e. it may contain an object or may be empty. The find operation returns an Option containing the Book if one matches the predicate, or else an empty Option.
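On the Java side, the returned Optional can be handled without explicit null checks. The sketch below is one possible way to do it; the class name OptionalSketch and its helper methods are made up for this illustration, and book names stand in for Book objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class OptionalSketch {

    // Returns the first matching name, or an empty Optional if there is none.
    static Optional<String> findName(List<String> names, String wanted) {
        return names.stream()
                .filter(name -> name.equals(wanted))
                .findFirst();
    }

    // Turns the Optional into a message, covering both the present and the empty case.
    static String describe(Optional<String> found) {
        return found.map(name -> "Found " + name)
                .orElse("No match");
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Clean Code", "Refactoring", "Extreme Programming", "TDD");
        System.out.println(describe(findName(names, "TDD")));     // prints Found TDD
        System.out.println(describe(findName(names, "Missing"))); // prints No match
    }
}
```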

3. Limit / Take

Find only two books with more than 100 pages

//Scala
assert(books.filter { _.numberOfPages > 100 }.take(2) == List(new Book("Refactoring", "Martin", 300), new Book("Extreme Programming", "Bob", 200)))
//Java
assertThat(books.stream()
.filter(book -> book.getNumberOfPages() > 100)
.limit(2)
.collect(Collectors.toList())
, is(Arrays.asList(new Book("Refactoring", "Martin", 300), new Book("Extreme Programming", "Bob", 200))));
//Groovy
assert books.findAll { it.numberOfPages > 100 }.take(2) == [new Book("Refactoring", "Martin", 300), new Book("Extreme Programming", "Bob", 200)]

Scala and Groovy use the ‘take’ method to limit the number of elements to be returned.

Java 8 has a limit operation on streams to limit the number of elements returned.

4. ForEach / Each

Print the name of each book

//Scala
books.foreach(book => println(book.name))
//Java 8
books.stream()
.map(Book::getName)
.forEach(System.out::println);
//Groovy
books.each { println(it.name) }

A double colon operator (::), also known as the method reference operator, can be used instead of a lambda for simple expressions like:

book -> book.getName()
book -> System.out.println(book);

Refer: http://www.baeldung.com/java-8-double-colon-operator for more on double colon operator
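As a quick check that the two notations are interchangeable, the sketch below maps the same list once with a lambda and once with a method reference. The class name MethodReferenceSketch is made up for this illustration, and String::length stands in for a getter such as Book::getName.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MethodReferenceSketch {

    // Mapping with an explicit lambda.
    static List<Integer> lengthsWithLambda(List<String> names) {
        return names.stream()
                .map(name -> name.length())
                .collect(Collectors.toList());
    }

    // The same mapping written as a method reference.
    static List<Integer> lengthsWithMethodReference(List<String> names) {
        return names.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Clean Code", "Refactoring", "Extreme Programming", "TDD");
        System.out.println(lengthsWithLambda(names));          // prints [10, 11, 19, 3]
        System.out.println(lengthsWithMethodReference(names)); // prints [10, 11, 19, 3]
    }
}
```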

More explicit syntax in Java 8 can also be written as:

books.stream()
.forEach(book-> System.out.println(book.getName()));

Note that in the first example (the one using map and forEach) the operations are evaluated in vertical order, so it takes only one iteration to print all the names; the map operation does not need an additional iteration to transform the books into book names.

5. Map / Collect

Get all the authors

//Scala
assert(books.map(_.author) == List("Bob", "Martin", "Bob", "Kent"))
//Java
assertThat(books.stream()
.map(Book::getAuthor)
.collect(Collectors.toList()),
is(Arrays.asList("Bob", "Martin", "Bob", "Kent")));
//Groovy
assert books.collect{it.author} == ["Bob", "Martin", "Bob", "Kent"]

The Map or Collect operation transforms each element of the collection. The transformation is specified as a function or a closure.

6. Distinct / Unique

Get names of all the distinct authors

//Scala
assert(books.map(_.author).distinct == List("Bob", "Martin", "Kent"))
//Java
assertThat(books.stream()
.map(Book::getAuthor)
.distinct()
.collect(Collectors.toList()),
is(Arrays.asList("Bob", "Martin", "Kent")));
//Groovy
assert books.collect{it.author}.unique() == ["Bob", "Martin","Kent"]

Distinct/Unique operation returns distinct elements in the collection.

7. Any / Exists / anyMatch

Is any book written by ‘Bob’?

//Scala
assert(books.exists(_.author == "Bob"))
//Java
assertTrue(books.stream()
.anyMatch(book -> book.getAuthor().equals("Bob")));
//Groovy
assert books
.any { it.author == "Bob" }

Any or Exists or AnyMatch returns true if any of the elements matches the predicate. The predicate is specified as a lambda expression.

8. ForAll or All or Every

Does every book have more than 50 pages?

//Scala
assert(books.forall(_.numberOfPages > 50))
//Java
assertTrue(books.stream()
.allMatch(book -> book.getNumberOfPages() > 50));
//Groovy
assert books.every { it.numberOfPages > 50 }

Returns true if all the elements fulfil the specified criteria.

9. Fold or Reduce or Inject

Get comma separated book names as a string

//Scala
assert(books.map(_.name).reduce(_ + ", " + _) == "Clean Code, Refactoring, Extreme Programming, TDD")
assert(books.foldLeft("")((a, b) => a + ", " + b.name)
.replaceFirst(", ", "") == "Clean Code, Refactoring, Extreme Programming, TDD")

//Java
assertThat(books.stream()
.map(Book::getName)
.reduce((a, b) -> a + ", " + b).get(),
is("Clean Code, Refactoring, Extreme Programming, TDD"));
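assertThat(books.stream()
.reduce("", (a, b) -> a + ", " + b.getName(),
// combiner: used only by parallel streams; every partial result already
// starts with ", ", so plain concatenation keeps the output correct
(a, b) -> a + b
)
.replaceFirst(", ", ""),
is("Clean Code, Refactoring, Extreme Programming, TDD"));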

//Groovy
assert books.collect { it.name }.inject {a, b -> a + ", " + b} == "Clean Code, Refactoring, Extreme Programming, TDD"
assert books.inject("") {a, b -> a + ", " + b.name}
.replaceFirst(', ', '') == "Clean Code, Refactoring, Extreme Programming, TDD"

Looping is nice, but sometimes it is necessary to combine or examine every element in a collection and produce a single value as a result. Max, sum and average are a few examples. Converting a collection into a comma-separated string is another common operation, which can be performed using reduce or fold.

There are three forms of reduce operation in Java 8

//With identity and accumulator
T reduce(T identity, BinaryOperator<T> accumulator);
//With only accumulator
Optional<T> reduce(BinaryOperator<T> accumulator);
//With identity, accumulator and combiner
<U> U reduce(U identity,
BiFunction<U, ? super T, U> accumulator,
BinaryOperator<U> combiner);

In our example above we have used the 2nd and 3rd forms.

So let’s understand why there are three different forms. Before that, we must understand the meaning of identity, accumulator and combiner:

  1. identity: the identity value for the accumulating function. Identity acts as an initial value for the accumulator function.
  2. accumulator: The accumulator function is evaluated for every element in the stream, and the result is stored in a variable which is returned when the reduce operation is over. In pseudocode:
T result = identity;
for (T element : this stream)
result = accumulator.apply(result, element);
return result;

  3. combiner: The combiner is meant to be used with parallel streams. In parallel streams the accumulator function is evaluated on multiple partial streams, and the combiner combines the results of those partial accumulations.
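The three forms can be tried side by side on simple data. The sketch below uses a list of page counts and a list of book names instead of Book objects; the class name ReduceFormsSketch and its method names are made up for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceFormsSketch {

    // Form with identity and accumulator: input and result share one type.
    static int sumWithIdentity(List<Integer> pages) {
        return pages.stream().reduce(0, (a, b) -> a + b);
    }

    // Form with accumulator only: the result is an Optional, since the stream may be empty.
    static Optional<Integer> sumWithoutIdentity(List<Integer> pages) {
        return pages.stream().reduce((a, b) -> a + b);
    }

    // Form with identity, accumulator and combiner: the result type (Integer)
    // differs from the element type (String).
    static int totalLength(List<String> names) {
        return names.stream().reduce(0,
                (total, name) -> total + name.length(), // accumulator: Integer + String -> Integer
                Integer::sum);                          // combiner: merges two partial totals
    }

    public static void main(String[] args) {
        List<Integer> pages = Arrays.asList(100, 300, 200, 250);
        List<String> names = Arrays.asList("Clean Code", "Refactoring", "Extreme Programming", "TDD");
        System.out.println(sumWithIdentity(pages));                                   // prints 850
        System.out.println(sumWithoutIdentity(Collections.<Integer>emptyList()).isPresent()); // prints false
        System.out.println(totalLength(names));                                       // prints 43
    }
}
```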

Now coming back to our problem: Get comma separated book names as a string

The problem is that our collection is a collection of Book objects and not of strings. The first and second forms of the reduce operation take an accumulator function which operates on values of one type and returns that same type. So we cannot use these forms directly on a stream of Books, because the accumulator would then have to return a Book, whereas we want a String.

This problem is solved by first mapping the books to a stream of strings and then performing the reduce operation with an accumulator which operates on strings and returns a string.

Hence, as seen in the example, the first way of solving the problem is a map followed by a reduce whose accumulator combines two strings into one comma-separated string.

Why is the combiner required?

As mentioned earlier, the combiner is meant to be used with parallel streams, where it combines the results of the partial accumulations. In addition, because the accumulator of this form can map and accumulate in a single step, the reduce operation can sometimes be performed more efficiently than with separate map and reduce functions.

As per the Javadoc:

Many reductions using this form can be represented more simply
by an explicit combination of map and reduce operations.
The accumulator function acts as a fused mapper and accumulator,
which can sometimes be more efficient than separate mapping and reduction,
such as when knowing the previously reduced value allows you to avoid
some computation.
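The sketch below shows the combiner at work on a parallel stream. It is an illustration under stated assumptions: the class name ParallelReduceSketch and its nested Book class (a minimal stand-in holding only the name) are made up here, and the combiner simply concatenates partial results, because every partial string already starts with the separator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class ParallelReduceSketch {

    // Minimal stand-in for the article's Book class; only the name is needed here.
    static final class Book {
        private final String name;

        Book(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    static List<Book> sampleBooks() {
        return Arrays.asList(new Book("Clean Code"), new Book("Refactoring"),
                new Book("Extreme Programming"), new Book("TDD"));
    }

    static String joinNames(List<Book> books, boolean parallel) {
        Stream<Book> stream = parallel ? books.parallelStream() : books.stream();
        return stream
                .reduce("",
                        // accumulator: fused map (Book -> name) and accumulate
                        (joined, book) -> joined + ", " + book.getName(),
                        // combiner: each partial result starts with ", ", so concatenate
                        (left, right) -> left + right)
                .replaceFirst(", ", ""); // drop the leading separator
    }

    public static void main(String[] args) {
        System.out.println(joinNames(sampleBooks(), false));
        System.out.println(joinNames(sampleBooks(), true));
    }
}
```

Both calls should produce the same string, however the parallel stream happens to split the list.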

Note that the accumulator function in the 3rd form of reduce takes two parameters of different types. The type of the first parameter is the same as the return type, i.e. the type of the identity, and the type of the second parameter is the type of the objects in the stream.

<U> U reduce(U identity,
BiFunction<U, ? super T, U> accumulator,
BinaryOperator<U> combiner);

In the example mentioned above, U = String and T = Book.

The reduce statement, when rewritten with explicit types, would look like this:

books.stream()
// the combiner only concatenates: each partial result already starts with ", "
.reduce("", (String a, Book b) -> a + ", " + b.getName(), (String a, String b) -> a + b)
.replaceFirst(", ", "")

Here the accumulator function takes a String and a Book and returns a String, and the combiner then combines the strings produced by the partial accumulations. Hence the input and return types of the combiner are the same as the return type of the accumulator and of the reduce operation itself.

Unless you are getting a significant performance gain, separate map and reduce operations should be used, as they are more readable.
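For this particular task there is an even more readable alternative that the examples above do not use: mapping to the names and collecting with the built-in Collectors.joining collector. The sketch below shows it; the class name JoiningSketch and its nested Book class (a minimal stand-in holding only the name) are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoiningSketch {

    // Minimal stand-in for the article's Book class; only the name is needed here.
    static final class Book {
        private final String name;

        Book(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Separate map and collect steps; joining inserts the separator only between elements.
    static String commaSeparatedNames(List<Book> books) {
        return books.stream()
                .map(Book::getName)
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(new Book("Clean Code"), new Book("Refactoring"),
                new Book("Extreme Programming"), new Book("TDD"));
        System.out.println(commaSeparatedNames(books)); // prints Clean Code, Refactoring, Extreme Programming, TDD
    }
}
```

Since the collector puts the separator only between elements, no replaceFirst clean-up is needed afterwards.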

Scala

Note that in Scala and Groovy the combiner is not used.

Scala has two sets of functions to perform similar operations.

  1. Reduce: Reduce operations in Scala do not take an initial value.

The return type and the input types of both arguments of the accumulator function for plain reduce are the same.

There are also reduceLeft and reduceRight operations, whose accumulator takes the running result (which may be a supertype of the element type) and an element.

  2. Fold: Fold operations in Scala take an initial value. There are foldLeft and foldRight operations as well.

Reduce / Fold is a powerful operation which can perform versatile tasks. It is a huge topic and deserves a separate post. Stay tuned. For examples of reduce in Scala, refer to: https://oldfashionedsoftware.com/2009/07/30/lots-and-lots-of-foldleft-examples/

10. Sum

Get total number of pages of all books

//Scala
assert(books.map(_.numberOfPages).sum == 850)
//Java
assertThat
(books.stream()
.mapToInt(Book::getNumberOfPages)
.sum(), is(850));
//Groovy
assert books.sum {it.numberOfPages} == 850

In Scala and Java, we need to transform the collection into a collection of numbers first to be able to call the sum operation. Groovy, on the other hand, applies the closure to every element and returns the sum of the resulting values.
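Once the stream has been mapped to numbers, Java can also compute the sum, minimum, maximum and average in a single pass with summaryStatistics. The sketch below uses a plain list of page counts; the class name PageStatisticsSketch is made up for this illustration.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

class PageStatisticsSketch {

    // Maps to an IntStream and gathers sum, min, max and average together.
    static IntSummaryStatistics pageStatistics(List<Integer> pages) {
        return pages.stream()
                .mapToInt(Integer::intValue)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = pageStatistics(Arrays.asList(100, 300, 200, 250));
        System.out.println(stats.getSum());     // prints 850
        System.out.println(stats.getMax());     // prints 300
        System.out.println(stats.getAverage()); // prints 212.5
    }
}
```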

11. Max

Which book has maximum number of pages?

//Scala
assert(books.maxBy(_.numberOfPages) == new Book("Refactoring", "Martin", 300))
//Java
assertThat
(books.stream()
.max(Comparator.comparingInt(Book::getNumberOfPages)).get(),
is(new Book("Refactoring", "Martin", 300)));
//Groovy
assert books.max {it.numberOfPages} == new Book("Refactoring", "Martin", 300)

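Finding the maximum number of pages is simple, but finding the book with the maximum number of pages is a little tricky.

Scala has a maxBy method, which takes a function that extracts the value to compare by and returns the element with the largest such value.

In Java, a comparator can be passed to the max function. Groovy’s max accepts a closure that serves the same purpose as the function passed to maxBy.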
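Note that max in Java returns an Optional, because an empty stream has no maximum. The sketch below shows both cases, using the length of a book name as the value to compare by; the class name MaxSketch is made up for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class MaxSketch {

    // Returns the longest name, or an empty Optional for an empty list.
    static Optional<String> longest(List<String> names) {
        return names.stream()
                .max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Clean Code", "Refactoring", "Extreme Programming", "TDD");
        System.out.println(longest(names).get());                                   // prints Extreme Programming
        System.out.println(longest(Collections.<String>emptyList()).isPresent());  // prints false
    }
}
```

Calling get() directly, as in the assertion above it, is fine when the list is known to be non-empty; otherwise orElse or isPresent is the safer route.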

Laziness

Operations on Java 8 streams are lazy in nature. The intermediate operations are not evaluated unless a terminal operation exists on the stream.

Let’s have some fun.

Problem: Print, in capitals, the names of the authors who have written a book of at least 200 pages

books.stream()
.filter(book -> book.getNumberOfPages() >= 200)
.map(Book::getAuthor)
.map(String::toUpperCase)
.forEach(System.out::println);

Looking at the code, it seems to be doing a lot of work: one pass over the collection for filtering based on the number of pages, another pass for mapping to the author name, another one for converting the author name to upper case, and finally another pass to print the values.

That is 4 passes in total to do the operation. Seems like a lot, right?

Well, let’s peek into the operations to see what’s happening. Java 8 has a special stream operation named peek, which is primarily meant for debugging. It’s like forEach but is an intermediate operation, and hence can be introduced anywhere in the chain.

books.stream()
.filter(book -> book.getNumberOfPages() >= 200)
.peek(e -> System.out.println("Filtered book: " + e))
.map(Book::getAuthor)
.peek(e -> System.out.println("Mapped author: " + e))
.map(String::toUpperCase)
.peek(e -> System.out.println("Mapped to upper case: " + e))
.forEach(System.out::println);
//Output
Filtered book: Refactoring : Martin : 300
Mapped author: Martin
Mapped to upper case: MARTIN
MARTIN
Filtered book: Extreme Programming : Bob : 200
Mapped author: Bob
Mapped to upper case: BOB
BOB
Filtered book: TDD : Kent : 250
Mapped author: Kent
Mapped to upper case: KENT
KENT

Surprised by the output? You might have expected it to print the ‘Filtered book’ statement for all 3 books, then the ‘Mapped author’ statement for all 3 authors, and so on…

The stream operations in Java 8 are not evaluated horizontally but vertically, i.e. each element of the stream moves down the chain on its own. That’s why, when a book matches the filter criteria, it is passed straight to the next operation in the chain, and the evaluation of the other stream elements is delayed. This even allows us to work with streams of infinite elements, since the elements are not all loaded at once and are not all evaluated for every operation in the chain. When working with bigger data, this can be a performance boon.
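A short sketch of such an infinite stream: Stream.iterate produces an endless sequence of integers, and limit cuts the pipeline off once enough results have come through. The class name InfiniteStreamSketch and the even-squares task are made up for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreamSketch {

    // Squares of the first few even numbers, taken from an endless source.
    static List<Integer> firstEvenSquares(int howMany) {
        return Stream.iterate(1, n -> n + 1)   // infinite: 1, 2, 3, ...
                .filter(n -> n % 2 == 0)       // keep even numbers
                .map(n -> n * n)               // square them
                .limit(howMany)                // stop once enough elements have arrived
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstEvenSquares(3)); // prints [4, 16, 36]
    }
}
```

Without the limit operation this pipeline would never finish, so the short-circuiting step is essential here.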

Are operations on Scala / Groovy List evaluated lazily?

No. But like Java, Scala has streams. A Stream is like a List, except that its elements are computed lazily.

In fact, it’s evident that Java streams (and many other features ;) are inspired by Scala.

A stream in Scala can be obtained using the toStream method on List:

books.toStream.map(_.name).toList

What about Groovy?

No. Groovy also does not support lazy evaluation of operations directly on lists. But hey, here is some good news! There is a library called Groovy Stream (http://timyates.github.io/groovy-stream/) which provides support for lazy iterators. Also, Groovy 2.3+ supports JDK 8, and hence Java 8 streams can easily be used in Groovy. In fact, most Java-style code runs perfectly fine in Groovy.

Conclusion

The Java Stream API is very similar to that of Scala and can do almost everything that can be done using Scala or Groovy. It’s designed in such a way that code can be written in a more readable and natural way. It reveals its intent clearly and is optimised for performance.

It’s good to see that Java 8 has incorporated some amazing features from functional languages, which gives Java developers the comfort of writing functional-style code without the complexity of learning a new language or paradigm.

That’s it for now, folks! See you soon with another amazing article.

All the examples in the article can be downloaded from github: https://github.com/sujeet100/StreamTest