Know Thy flatMap
Job titles morph constantly. As in the medical community, where the focus may be broader or more specialized, some of us who were once just programmers are now filling other job titles. One of the newest specialized disciplines is data engineer. The data engineer shepherds in the data, building pipelines, filtering data, transforming it, and molding it into what they or others need to make real-time business decisions with stream processing.
Both the general programmer and the data engineer must master flatMap, one of the most important tools in any functional-capable language like our beloved Java, and also in big data frameworks and streaming libraries. flatMap, like its partners map and filter, applies to anything that is a “container of something”, for example Stream<T> and CompletableFuture<T>. If you want to look beyond the standard library, there is also Observable<T> (RxJava) and Flux<T> (Project Reactor).
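To make the "container of something" idea concrete outside of Stream, here is a minimal sketch using CompletableFuture<T>, where the flatMap-style operation is named thenCompose and the map-style operation is named thenApply. The two lookup methods, findUserId and findDepartment, are hypothetical stand-ins invented for this illustration.

```java
import java.util.concurrent.CompletableFuture;

public class FutureFlatMap {
    // Hypothetical async lookup: derives a fake id from the name's length.
    static CompletableFuture<Integer> findUserId(String name) {
        return CompletableFuture.supplyAsync(() -> name.length());
    }

    // Hypothetical async lookup: builds a fake department label from the id.
    static CompletableFuture<String> findDepartment(int userId) {
        return CompletableFuture.supplyAsync(() -> "dept-" + userId);
    }

    static String departmentOf(String name) {
        // thenApply here would yield CompletableFuture<CompletableFuture<String>>;
        // thenCompose flattens the nested future, just as flatMap flattens nested streams.
        return findUserId(name)
                .thenCompose(FutureFlatMap::findDepartment)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(departmentOf("alice")); // prints dept-5
    }
}
```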
In Java, we will use Stream<T>. The idea behind map is simple: take every element of a stream or collection and apply a function to each one:
Stream.of(1, 2, 3, 4).map(x -> x * 2).collect(Collectors.toList())
This produces:
[2, 4, 6, 8]
What happens if we do the following?
Stream.of(1, 2, 3, 4)
    .map(x -> Stream.of(-x, x, x + 1))
    .collect(Collectors.toList())
Unfortunately, we get a List of Stream pipelines:
[java.util.stream.ReferencePipeline$Head@3532ec19, java.util.stream.ReferencePipeline$Head@68c4039c, java.util.stream.ReferencePipeline$Head@ae45eb6, java.util.stream.ReferencePipeline$Head@59f99ea]
But, thinking about it, of course we do: for every element of the Stream we’re creating another Stream. Take a closer look at map(x -> Stream.of(...)). For every singular element, we’re creating a plural. If you perform a map with a plural, it’s time to break out flatMap:
Stream.of(1, 2, 3, 4)
    .flatMap(x -> Stream.of(-x, x, x + 1))
    .collect(Collectors.toList())
That will produce what we were aiming for:
[-1, 1, 2, -2, 2, 3, -3, 3, 4, -4, 4, 5]
The opportunities for using flatMap are immense.
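As one small illustration of those opportunities, the sketch below flattens lines of text into individual words. The class and method names are made up for this example, and splitting on a single space is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapWords {
    // Each line (singular) becomes several words (plural), so flatMap is the fit.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words(List.of("know thy", "flatMap"))); // prints [know, thy, flatMap]
    }
}
```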
Let’s move on to something more challenging that is apt for any functional programming or data engineering task. Consider the following relationship, where getters, setters, and toString are elided:
class Employee {
    private String firstName;
    private String lastName;
    private Integer yearlySalary;
    // getters, setters, toString
}

class Manager extends Employee {
    private List<Employee> employeeList;
    // getters, setters, toString
}
Suppose we are given only a Stream<Manager> and our goal is to determine the total salary of all employees, including Managers and their Employees. We might be tempted to jump right to forEach and start digging through those salaries. This, unfortunately, would model our code on the structure of the data and would cause needless complexity. A better solution would be to go the opposite way and structure the data to fit our code. That is where flatMap comes in.
List.of(manager1, manager2).stream()
    .flatMap(m -> Stream.concat(m.getEmployeeList().stream(), Stream.of(m)))
    .distinct()
    .map(Employee::getYearlySalary)
    .mapToInt(i -> i)
    .sum();
This code takes every manager and returns a plural: the manager and their employees. flatMap then merges these collections into one Stream. We then perform a distinct to filter out all duplicates. Now we can treat them all as one collection. The rest is easy: perform a map that extracts each yearlySalary. Next is a Java-specific call, mapToInt, which turns a Stream<Integer> into an IntStream, a specialized Stream type for integers. Finally, we sum the stream. Concise code.
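For readers who want to run the pipeline, here is a self-contained sketch. The constructors, the withReports helper, the demoTotal method, and the sample names and salaries are all assumptions added for illustration, since the original elides them. Note that distinct relies on equals, which in this sketch is the default identity comparison, so a duplicate is removed only when the same Employee object appears under more than one manager.

```java
import java.util.List;
import java.util.stream.Stream;

public class SalaryTotal {
    static class Employee {
        private final String firstName;
        private final String lastName;
        private final Integer yearlySalary;

        // Constructor is an assumption; the original elides it.
        Employee(String firstName, String lastName, Integer yearlySalary) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.yearlySalary = yearlySalary;
        }

        Integer getYearlySalary() { return yearlySalary; }
    }

    static class Manager extends Employee {
        private final List<Employee> employeeList;

        Manager(String firstName, String lastName, Integer yearlySalary,
                List<Employee> employeeList) {
            super(firstName, lastName, yearlySalary);
            this.employeeList = employeeList;
        }

        List<Employee> getEmployeeList() { return employeeList; }
    }

    // Hypothetical helper: one manager becomes a plural (reports plus the manager).
    static Stream<Employee> withReports(Manager m) {
        return Stream.concat(m.getEmployeeList().stream(), Stream.of(m));
    }

    static int totalSalary(List<Manager> managers) {
        return managers.stream()
                .flatMap(SalaryTotal::withReports)
                .distinct()                          // drops the shared Employee instance
                .mapToInt(Employee::getYearlySalary) // unboxes Integer into an IntStream
                .sum();
    }

    // Sample data invented for the demo: "shared" reports to both managers.
    static int demoTotal() {
        Employee alice = new Employee("Alice", "A", 100);
        Employee shared = new Employee("Bob", "B", 200);
        Manager m1 = new Manager("Carol", "C", 300, List.of(alice, shared));
        Manager m2 = new Manager("Dave", "D", 400, List.of(shared));
        return totalSalary(List.of(m1, m2));
    }

    public static void main(String[] args) {
        System.out.println(demoTotal()); // prints 1000; Bob is counted once
    }
}
```

This version passes the getter straight to mapToInt, which folds the map and mapToInt steps of the original pipeline into one call; the result is the same.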
Whether you use Stream or another kind of C<T>, where C is any stream or collection, keep processing your data using map, filter, flatMap, or groupBy before reaching for forEach or any other terminal operation like collect. If you reach for the terminal operation prematurely, you’ll lose any laziness and optimization that Java Streams, streaming libraries, or big data frameworks grant you.
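The laziness mentioned above can be observed with a small sketch that records which source elements a sequential pipeline actually visits. The class and method names are hypothetical, and peek is used here only as a way to watch the traversal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyPipeline {
    // Doubles each element, keeps values above 4, and stops at the first match.
    // Every source element the pipeline pulls is appended to 'seen'.
    static List<Integer> firstDoubledAboveFour(List<Integer> seen) {
        return Stream.of(1, 2, 3, 4, 5, 6)
                .peek(seen::add)
                .map(x -> x * 2)
                .filter(x -> x > 4)
                .limit(1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        List<Integer> result = firstDoubledAboveFour(seen);
        System.out.println(result); // prints [6]
        System.out.println(seen);   // prints [1, 2, 3]; 4, 5, and 6 were never pulled
    }
}
```

Because the intermediate operations stay in one pipeline until the terminal collect, the stream stops after the third element. Collecting into a list first and filtering afterwards would have visited all six.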