Functional Java: Collectors

In my last post on Functional Java: Streams, I left this challenge (slight edit for improved wording):

You are given a finite input stream that you need to parse and return a collections of Foo objects. If you see a line of all equal signs, you’re to create a new Foo object. For all other lines, append that to Foo’s output string field.

Let’s start by defining the Foo class and give a simple example input:

data class Foo {val output: String}
# example input below:
==================
some example output
like this
====================
new object
is created here
========
one final object
# result:
3 Foo instances with output:
"some example output like this"
"new object is created here"
"one final object"

I’ll show how a functional solution with code written in Kotlin using the Collector API in Java 8.

Procedural Solution

But first let’s show the procedural solution so we have something to compare against:

fun main(args: Array<String>) {
println(parseFile(File("src/input.txt").inputStream()))
}

fun parseFile(input: InputStream): List<Foo> {
val result = ArrayList<Foo>()
// convert inputstream to a list of lines
val lines = BufferedReader(InputStreamReader(input)).readLines()

var stringBuilder = StringJoiner(" ")
for (line in lines) { // iterate through the list
if (createNewObject(line)) {
if (stringBuilder.length() > 0) {
result.add(Foo(stringBuilder.toString()))
}
stringBuilder = StringJoiner(" ")
} else {
stringBuilder.add(line)
}

}
if (stringBuilder.length() > 0) {
result.add(Foo(stringBuilder.toString()))
}
return result
}
fun createNewObject(line: String): Boolean {
for (c in line) {
if (c != '=') {
return false
}
}
return true
}

In the parseFile method, we read through the InputStream line by line. Storing output lines in a StringJoiner (more useful than StringBuilder in this case by making it easier to add padding between lines), and creating a new foo object when we encounter a line of all ‘=’ signs. When we create a new object we also reset the StringJoiner to allow it to hold strings for the next Foo object..

For simple logic like this it’s fairly readable. However you can imagine if the business logic we need is more complicated like tracking several state changes based on the lines we’ve encountered. Then the main loop of the method can get complicated fast.

A programmer reading through the code will have to keep track of what state the method is in. This solution can quickly increase in length and complexity without obvious places split into smaller self-contained pieces.

Functional Solution: Collector

Let’s compare with a functional solution using Collectors. The Java Collectors class contains built-in collectors you can use for a variety of situations. The documentation includes examples for collecting a stream into a List or more complicated uses like returning a Map through the use of a groupingBy collector:

// Group employees by department
Map<Department, List<Employee>> byDept
= employees.stream().collect(Collectors.groupingBy(Employee::getDepartment));

Our case is too complicated to be handle by an existing collector so we’d need to write a custom collector that implements the Collector interface. This requires we implement 5 methods that uses 3 generics:

public interface Collector<T, A, R> {

Supplier<A> supplier();

BiConsumer<A, T> accumulator();

BinaryOperator<A> combiner();

Function<A, R> finisher();

Set<Characteristics> characteristics();
}

The 3 generics (T, A, R) can be a bit confusing on first glance but let’s go through the methods one by one and see how they’re used.

supplier needs to return a supplier of type A. Where A is defined to be “the mutable accumulation type of the reduction operation (often hidden as an implementation detail)” We’re reading lines and need to group lines of the file into a list of lines for each Foo object.

So the result would be a List<List<String>>. However the supplier needs to return a mutable version of the accumulation type because the collector will be adding to the variable as it processes each line.

So we need a supplier of MutableList<MutableList<String>>, thus we override Supplier like this:

override fun supplier() = Supplier<MutableList<MutableList<String>>> { ArrayList() }

combiner combines 2 results together. Each result is a list (of lists of strings) so combining them is adding 2 lists together.

override fun combiner() = BinaryOperator<MutableList<MutableList<String>>> { t, u ->
t.apply {
addAll(u)
}
}

accumulator takes a value (in our case a string) and folds it into a mutable result container (a mutable list of mutable list of string). This method holds our business logic and is most similar to the procedural code. The logic is:

If the line is empty, ignore

If the line is all ‘=’, create a new List to hold lines for the next Foo object

otherwise, append the line to the last list of string on our results container.

override fun accumulator() = BiConsumer<MutableList<MutableList<String>>, String> { stringGroups, line ->
// looks for a specific line, creates a new MutableList<String> and append it to the List of Lists, otherwise add non-empty lines to the last List of strings
if (line.isEmpty()) return@BiConsumer
if (line.all { it == '=' }) {
stringGroups.add(ArrayList())
} else {
stringGroups.last().add(line)
}
}

Almost there, we have 2 methods left and they’re related: characteristics controls if finisher is invoked.

characteristics there are 3 values you can return under Collector.Characteristics.

IDENTITY_FINISH means the collector’s mutable container and the result type are the same (so A and R are same). In this case the finisher would be the identity function (no-op)

The other 2 values signals the behavior of the finisher method. In our case we return CONCURRENT because our result has order.

Our finisher method then converts from our accumulation data type (A: MutableList<MustableList<Foo>>) into the result data type (R: List<Foo>):

override fun characteristics() = setOf(Collector.Characteristics.CONCURRENT)
override fun finisher(): Function<MutableList<MutableList<String>>, List<Foo>> {
return Function { input -> input.map { parsedLines -> Foo(parsedLines.joinToString(" ")) } }
}

The final code:

fun parseFilesFunc(input: InputStream): List<Foo> {
val lines = BufferedReader(InputStreamReader(input)).lines()
return lines.collect(FooCollector)
}

object FooCollector : Collector<String, MutableList<MutableList<String>>, List<Foo>> {
override fun accumulator() = BiConsumer<MutableList<MutableList<String>>, String> { stringGroups, line ->
// looks for a specific line, creates a new MutableList<String> and append it to the List of Lists, otherwise add non-empty lines to the last List of strings
if (line.isEmpty()) return@BiConsumer
if (line.all { it == '=' }) {
stringGroups.add(ArrayList())
} else {
stringGroups.last().add(line)
}

}

override fun combiner() = BinaryOperator<MutableList<MutableList<String>>> { t, u ->
t.apply {
addAll(u)
}
}

override fun characteristics() = setOf(Collector.Characteristics.CONCURRENT)

override fun supplier() = Supplier<MutableList<MutableList<String>>> { ArrayList() }
override fun finisher(): Function<MutableList<MutableList<String>>, List<Foo>> {
return Function { input -> input.map { parsedLines -> Foo(parsedLines.joinToString(" ")) } }
}
}

The parseFilesFunc method is just reading the input stream and returning the result of the collector.

The business logic is isolated to accumulator method on how to handle the lines of input and organize them into individual lists to be turned into Foo instances by the Finisher.

The other 2 methods combiner and supplier are used to get a new accumulator object or combine results together.

So by creating a custom collector. We can improve our code by leveraging the collector API to write a few small methods that each server a specific purpose. Another programmer familiar with the API can easily understand the code.

There is little state to track and API limits mutability and side-effects. The collector API is very powerful. It allows you to do a lot of reduction that doesn’t seem possible at first glance using only map, filter, reduce.

Hope you found this article interesting and learned something new. All the code can be found in an executable format at this gist. If you have any feedback, you can find me on twitter: @KelvinHMa

Like what you read? Give Kelvin Ma a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.