Know Thy flatMap

Daniel Hinojosa
Published in 97 Things, Jun 11, 2019


Job titles morph constantly. As in the medical community, where the focus may be broader or more specialized, some of us who were once just programmers are now filling other job titles. One of the newest specialized disciplines is data engineer. The data engineer shepherds in the data, building pipelines, filtering data, transforming it, and molding it into what they or others need to make real-time business decisions with stream processing.

Both the general programmer and the data engineer must master flatMap, one of the most important tools in any functional-capable language, such as our beloved Java, as well as in big data frameworks and streaming libraries. flatMap, like its partners map and filter, applies to anything that is a “container of something”, for example Stream<T> and CompletableFuture<T>. If you want to look beyond the standard library, there are also Observable<T> (RxJava) and Flux<T> (Project Reactor).
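To make the “container of something” idea concrete beyond Stream, here is a minimal sketch using two standard-library containers. Optional has a method literally named flatMap, while on CompletableFuture the flatMap-style operation is named thenCompose. The class name ContainerFlatMap and the helpers parse, parseOptional, doubleAsync, and doubleTwice are hypothetical names invented for this illustration.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

class ContainerFlatMap {
    // Hypothetical helper: parses text into an Integer, or empty if it is not a number.
    static Optional<Integer> parse(String text) {
        try {
            return Optional.of(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // flatMap keeps the result flat; map would yield Optional<Optional<Integer>>.
    static Optional<Integer> parseOptional(Optional<String> input) {
        return input.flatMap(ContainerFlatMap::parse);
    }

    // Hypothetical asynchronous step that doubles a value.
    static CompletableFuture<Integer> doubleAsync(int x) {
        return CompletableFuture.supplyAsync(() -> x * 2);
    }

    // thenCompose plays the role of flatMap here; thenApply would yield
    // CompletableFuture<CompletableFuture<Integer>>.
    static CompletableFuture<Integer> doubleTwice(int x) {
        return doubleAsync(x).thenCompose(ContainerFlatMap::doubleAsync);
    }

    public static void main(String[] args) {
        System.out.println(parseOptional(Optional.of("21")));  // Optional[21]
        System.out.println(parseOptional(Optional.of("abc"))); // Optional.empty
        System.out.println(doubleTwice(5).join());             // 20
    }
}
```

In both cases the function we pass in returns another container, and the flatMap-style call collapses the two layers into one.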

In Java, we will use Stream<T>. The idea of map is simple: take every element of a stream or collection and apply a function to each one:


Stream.of(1, 2, 3, 4).map(x -> x * 2).collect(Collectors.toList())

This produces:

[2, 4, 6, 8]

What happens if we do the following?

Stream.of(1, 2, 3, 4)
    .map(x -> Stream.of(-x, x, x + 1))
    .collect(Collectors.toList())

Unfortunately, we get a List of Stream pipelines:

[java.util.stream.ReferencePipeline$Head@3532ec19, java.util.stream.ReferencePipeline$Head@68c4039c, java.util.stream.ReferencePipeline$Head@ae45eb6, java.util.stream.ReferencePipeline$Head@59f99ea]

But on reflection, that is exactly what we asked for: for every element of the Stream, we create another Stream. Take a closer look at map(x -> Stream.of(...)). For every singular element, we are creating a plural. When the function you map with returns a plural, it’s time to break out flatMap:

Stream.of(1, 2, 3, 4)
    .flatMap(x -> Stream.of(-x, x, x + 1))
    .collect(Collectors.toList())

That will produce what we were aiming for:

[-1, 1, 2, -2, 2, 3, -3, 3, 4, -4, 4, 5]

The opportunities for using flatMap are immense.
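For readers who want to run the snippets above, here is a minimal self-contained sketch that puts the three pipelines side by side. The class name MapVersusFlatMap and its method names are hypothetical, chosen only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MapVersusFlatMap {
    // map with a singular result: one output element per input element.
    static List<Integer> doubled() {
        return Stream.of(1, 2, 3, 4)
            .map(x -> x * 2)
            .collect(Collectors.toList());
    }

    // map with a plural result: we end up with a List of Streams.
    static List<Stream<Integer>> nested() {
        return Stream.of(1, 2, 3, 4)
            .map(x -> Stream.of(-x, x, x + 1))
            .collect(Collectors.toList());
    }

    // flatMap with a plural result: the inner Streams are merged into one.
    static List<Integer> flattened() {
        return Stream.of(1, 2, 3, 4)
            .flatMap(x -> Stream.of(-x, x, x + 1))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubled());       // [2, 4, 6, 8]
        System.out.println(nested().size()); // 4 (four separate Streams)
        System.out.println(flattened());     // [-1, 1, 2, -2, 2, 3, -3, 3, 4, -4, 4, 5]
    }
}
```

Note the return types: map leaves us with List<Stream<Integer>>, while flatMap gives the List<Integer> we wanted.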

Let’s move on to something more challenging that is apt for any functional programming or data engineering task. Consider the following relationship, where getters, setters, and toString are elided:

class Employee {
    private String firstName;
    private String lastName;
    private Integer yearlySalary;
    // getters, setters, toString
}

class Manager extends Employee {
    private List<Employee> employeeList;
    // getters, setters, toString
}

Suppose we are given only a Stream<Manager> and our goal is to total the salaries of all employees, including the Managers and their Employees. We might be tempted to jump right to forEach and start digging through those salaries. This, unfortunately, would shape our code to the structure of the data and cause needless complexity. A better solution is to go the opposite way and shape the data to fit our code. That is where flatMap comes in.

List.of(manager1, manager2).stream()
    .flatMap(m -> Stream.concat(m.getEmployeeList().stream(), Stream.of(m)))
    .distinct()
    .map(Employee::getYearlySalary)
    .mapToInt(i -> i)
    .sum();

This code takes every manager and returns a plural: the manager and their employees. flatMap then merges these per-manager streams into one Stream. Next, distinct filters out duplicates (it compares elements with equals), so anyone reachable through more than one manager is counted once. Now we can treat them all as one collection. The rest is easy: perform a map that extracts each yearlySalary. Next is a Java-specific call, mapToInt, which turns a Stream<Integer> into an IntStream, a specialized Stream type for int values. Finally, we sum the stream. Concise code.
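Here is a runnable sketch of the salary pipeline with the elided parts filled in. The constructors, the sample people, and their salary figures are assumptions made up for this illustration, as are the class name SalaryTotal and the helpers totalSalary and sampleManagers. One employee deliberately reports to both managers so that distinct has something to do.

```java
import java.util.List;
import java.util.stream.Stream;

class Employee {
    private final String firstName;
    private final String lastName;
    private final Integer yearlySalary;

    Employee(String firstName, String lastName, Integer yearlySalary) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.yearlySalary = yearlySalary;
    }

    Integer getYearlySalary() { return yearlySalary; }
}

class Manager extends Employee {
    private final List<Employee> employeeList;

    Manager(String firstName, String lastName, Integer yearlySalary,
            List<Employee> employeeList) {
        super(firstName, lastName, yearlySalary);
        this.employeeList = employeeList;
    }

    List<Employee> getEmployeeList() { return employeeList; }
}

class SalaryTotal {
    // Same pipeline as in the article, wrapped in a method.
    static int totalSalary(Stream<Manager> managers) {
        return managers
            .flatMap(m -> Stream.concat(m.getEmployeeList().stream(), Stream.of(m)))
            .distinct() // Employee does not override equals, so this is identity-based
            .map(Employee::getYearlySalary)
            .mapToInt(i -> i)
            .sum();
    }

    // Hypothetical sample data: carol reports to both managers.
    static List<Manager> sampleManagers() {
        Employee bob = new Employee("Bob", "Brown", 50_000);
        Employee carol = new Employee("Carol", "Clark", 60_000);
        Employee erin = new Employee("Erin", "Evans", 55_000);
        Manager alice = new Manager("Alice", "Adams", 100_000, List.of(bob, carol));
        Manager dave = new Manager("Dave", "Davis", 90_000, List.of(carol, erin));
        return List.of(alice, dave);
    }

    public static void main(String[] args) {
        System.out.println(totalSalary(sampleManagers().stream())); // 355000
    }
}
```

Without the distinct step, carol would be counted twice and the total would come out as 415000. If equal employees can be represented by different objects, Employee would also need equals and hashCode for distinct to catch them.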

Whether you use Stream or another kind of C<T>, where C is any stream or collection, keep processing your data using map, filter, flatMap, or groupBy before reaching for forEach or any other terminal operation such as collect. If you reach for the terminal operation prematurely, you’ll lose the laziness and optimizations that Java Streams, streaming libraries, or big data frameworks grant you.
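The laziness mentioned above can be observed directly. In this minimal sketch, a counter records how often the mapping function runs on a sequential Stream: nothing happens until the terminal operation, and the short-circuiting findFirst stops as soon as it has an answer. The class name LazyPipeline and the method demo are hypothetical names for this illustration.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyPipeline {
    // Returns {map calls before the terminal operation, first result, map calls after}.
    static int[] demo() {
        AtomicInteger mapCalls = new AtomicInteger();

        // Building the pipeline only describes the work; no element is processed yet.
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4, 5)
            .map(x -> {
                mapCalls.incrementAndGet();
                return x * 2;
            })
            .filter(x -> x > 4);

        int before = mapCalls.get();

        // The terminal operation pulls elements through one at a time
        // and stops at the first one that passes the filter.
        Optional<Integer> first = pipeline.findFirst();

        return new int[] { before, first.get(), mapCalls.get() };
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println(result[0]); // 0: nothing ran before findFirst
        System.out.println(result[1]); // 6: first doubled value greater than 4
        System.out.println(result[2]); // 3: only 1, 2, and 3 were mapped
    }
}
```

Had we collected into a List first and then searched it, all five elements would have been mapped before we looked at any of them.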


Daniel Hinojosa is a programmer, consultant, instructor, speaker, and author.