Java Streams

Purnima Kamath
May 21, 2018

Modern data engineering technologies, like Kafka or Spark, rely on the concept of streaming data for processing. Stream APIs follow a functional programming style, and a great way to get a head start on the concept is to use Java Streams. Java 8 introduced features that fit functional programming into an object-oriented world: lambda expressions and functional interfaces make it easy to express complex behaviour in easily readable code.

Lambda expressions & Functional Interfaces

Lambda expressions and functional interfaces are an integral part of writing code using streams. In a nutshell, functional interfaces (in the java.util.function package) are Java’s way of adapting to the functional programming paradigm. They provide target types for lambda expressions and method references. The easiest way to think about lambda expressions is as inline code snippets. They reduce the overall amount of code you need to write, especially where you would otherwise implement an anonymous inner class.

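There are four major “categories” or “shapes” of functional interfaces:

  • Function: accepts one argument and returns a result
  • Predicate: accepts one argument and returns a boolean
  • Consumer: accepts one argument and returns nothing
  • Supplier: accepts no arguments and returns a result

This means that if you spot a method in an API which accepts, say, a Predicate, you can write an inline lambda expression which accepts one argument and returns a boolean. One example is strings.removeIf(), called on a list of strings.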

According to the API, ArrayList.removeIf() accepts a Predicate. This means it will accept either a method reference or an inline lambda expression. The lambda takes one argument, and its body returns a boolean that decides whether that element is removed.
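A minimal sketch of such a call, assuming a list named strings that holds a few fruit names plus some blank entries (the class, method names and data are illustrative only):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveIfExample {

    // Removes blank strings in place; the lambda is the Predicate passed to removeIf().
    static List<String> removeBlanks(List<String> strings) {
        // One argument in, a boolean out: the shape of Predicate<String>.
        strings.removeIf(s -> s.trim().isEmpty());
        return strings;
    }

    // Same idea with a method reference: String::isEmpty also fits Predicate<String>.
    static List<String> removeEmpty(List<String> strings) {
        strings.removeIf(String::isEmpty);
        return strings;
    }

    public static void main(String[] args) {
        // removeIf() mutates the list, so it must be modifiable (hence the ArrayList copy).
        List<String> strings = new ArrayList<>(Arrays.asList("apple", "", "  ", "mango"));
        System.out.println(removeBlanks(strings)); // [apple, mango]
    }
}
```

The lambda and the method reference are interchangeable wherever their argument and return types match the Predicate.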

The inline lambda expressions for the rest of the functional interfaces look similar. There are various other flavours of Function, Predicate, Consumer and Supplier, each catering to various return types and arguments.
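To make the shapes concrete, here is a small sketch with one lambda per functional interface. The constant and method names are made up for illustration; only the interfaces themselves come from java.util.function.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

class FunctionalShapes {

    // Function<T, R>: one argument in, one result out.
    static final Function<String, Integer> LENGTH = s -> s.length();

    // Predicate<T>: one argument in, a boolean out.
    static final Predicate<String> IS_LONG = s -> s.length() > 5;

    // Supplier<T>: no argument, one result out.
    static final Supplier<String> DEFAULT_FRUIT = () -> "apple";

    // Consumer<T>: one argument in, nothing returned; it works through a side effect.
    static String shout(String fruit) {
        StringBuilder out = new StringBuilder();
        Consumer<String> appendUpper = s -> out.append(s.toUpperCase());
        appendUpper.accept(fruit);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(LENGTH.apply("mango"));   // 5
        System.out.println(IS_LONG.test("banana"));  // true
        System.out.println(DEFAULT_FRUIT.get());     // apple
        System.out.println(shout("mango"));          // MANGO
    }
}
```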

Streams

Whenever you hear “streams”, think — Filter; Map; Collect.

A stream is a sequence of elements. You can conditionally filter the elements of a stream, process or enrich each element using map, and collect (or reduce) the stream to use the result in other parts of your program. A stream builder helps build streams, and the Java API also provides many convenient methods to create and operate on streams.

  • Streams are computed on-demand and do not really store anything (unlike collections)
  • Streams are lazy! Nothing gets computed until a terminal operation is called
  • Streams cannot be reused. Once a terminal operation has been called, the stream is consumed; create a new stream from the source to process the data again.
  • Streams support both sequential and parallel processing
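The laziness and single-use points above can be observed directly. This is a small sketch with made-up data: the first method logs when the filter actually runs, and the second checks that a second terminal operation on the same stream is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreams {

    // Records, in order, when the pipeline is built and when the filter really runs.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream<String> stream = Stream.of("apple", "kiwi", "mango")
                .filter(s -> {
                    log.add("filter " + s);
                    return s.length() > 4;
                });
        log.add("pipeline built"); // nothing has been filtered yet
        List<String> result = stream.collect(Collectors.toList()); // terminal operation
        log.add("collected " + result);
        return log;
    }

    // Returns true if a second terminal operation on the same stream fails.
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("apple", "kiwi");
        stream.count(); // first terminal operation consumes the stream
        try {
            stream.count(); // the stream has already been operated upon
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
        System.out.println(secondUseFails()); // true
    }
}
```

In the trace, "pipeline built" is logged before any "filter ..." entry, because the filter only runs once collect() is called.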

Java 8 introduced the java.util.stream package for working with streams. It provides APIs for each stage of stream creation and operation. Every stream goes through three stages:

  1. Build: To create a new stream
  2. Operate: Intermediate operations like filtering
  3. Terminate: Actions which signal the stream to start executing its operations and produce an output
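As a rough illustration, the three stages can be labelled on a tiny pipeline. The method name and the numbers are invented for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ThreeStages {

    // Squares the even numbers of the input, with each stage on its own line.
    static List<Integer> evenSquares(List<Integer> numbers) {
        Stream<Integer> built = numbers.stream();       // 1. Build
        Stream<Integer> operated = built
                .filter(n -> n % 2 == 0)                // 2. Operate (intermediate)
                .map(n -> n * n);                       // 2. Operate (intermediate)
        return operated.collect(Collectors.toList());   // 3. Terminate
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(Arrays.asList(1, 2, 3, 4))); // [4, 16]
    }
}
```

In everyday code the three stages are usually written as one chained expression; they are split into variables here only to label them.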

Building streams

Streams can be built from collections, from files or using Stream.Builder. The java.io APIs have been updated to return streams when reading from files (instead of, say, byte arrays). The example in this section assumes a file called “fruits.txt” exists on the classpath.
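A minimal sketch of such a read, assuming a hypothetical Fruit class with a single name field and one fruit name per line. To stay self-contained it reads from a StringReader rather than the real fruits.txt; the comment in main shows how a classpath reader might be obtained instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FruitReader {

    // Hypothetical stand-in for the article's Fruit type.
    static class Fruit {
        final String name;

        Fruit(String name) {
            this.name = name;
        }
    }

    // Reads one fruit name per line and builds a Stream<Fruit> with Stream.Builder.
    static Stream<Fruit> fruits(Reader source) {
        Stream.Builder<Fruit> builder = Stream.builder();
        try (BufferedReader bufferedReader = new BufferedReader(source)) {
            // lines() returns a Stream<String>; each non-blank line becomes a Fruit.
            bufferedReader.lines()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .forEach(line -> builder.add(new Fruit(line)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        // For a file on the classpath, the reader could instead be created with
        // new InputStreamReader(FruitReader.class.getResourceAsStream("/fruits.txt")).
        Reader source = new StringReader("apple\nmango\n\nkiwi\n");
        List<String> names = fruits(source).map(f -> f.name).collect(Collectors.toList());
        System.out.println(names); // [apple, mango, kiwi]
    }
}
```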

bufferedReader.lines() returns a Stream<String>. The code snippet above converts this to a Stream<Fruit> using Stream.Builder.

There are a few other cool ways to create streams. I find the range method on numeric streams very useful (it works much like range in Python).
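A few of these factory methods, sketched with made-up values. All of the calls are standard java.util.stream and java.util.Arrays methods; the wrapper class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamFactories {

    // Stream.of(): a stream from explicit values.
    static List<String> fromValues() {
        return Stream.of("apple", "mango").collect(Collectors.toList());
    }

    // Arrays.stream(): a stream over an existing array.
    static List<String> fromArray() {
        return Arrays.stream(new String[] {"kiwi", "plum"}).collect(Collectors.toList());
    }

    // IntStream.range(): start inclusive, end exclusive, like Python's range(0, 5).
    static List<Integer> fromRange() {
        return IntStream.range(0, 5).boxed().collect(Collectors.toList());
    }

    // IntStream.rangeClosed(): both ends inclusive.
    static List<Integer> fromRangeClosed() {
        return IntStream.rangeClosed(1, 5).boxed().collect(Collectors.toList());
    }

    // Stream.iterate(): an infinite stream, so it needs limit() before collecting.
    static List<Integer> fromIterate() {
        return Stream.iterate(1, n -> n * 2).limit(5).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromRange());       // [0, 1, 2, 3, 4]
        System.out.println(fromRangeClosed()); // [1, 2, 3, 4, 5]
        System.out.println(fromIterate());     // [1, 2, 4, 8, 16]
    }
}
```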

Operating on streams

Whenever you think of streams, think Filter;Map;Collect (or Filter;Map;Reduce if you like). Operations are run on a stream after it is created. They can be chained, and they are applied in the order in which they are chained. There are two types of operations on streams: intermediate operations and terminal operations. Intermediate operations, like filter() or map(), help during processing (e.g. massaging or enriching the elements of a stream). All intermediate operations are lazy: they are executed only when a terminal operation is called. Terminal operations, like forEach() or collect(), produce an output (e.g. printing or other I/O).
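Here is a sketch of such pipelines, assuming a hypothetical Fruit class with a name and a colour; the data and method names are invented for illustration. The first method filters, maps and collects; the second uses flatMap() to follow every fruit name with a beer mug emoji (U+1F37A, written as a Unicode escape so the source stays ASCII); main prints with forEach().

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FruitPipeline {

    // Hypothetical stand-in for the article's Fruit type.
    static class Fruit {
        final String name;
        final String colour;

        Fruit(String name, String colour) {
            this.name = name;
            this.colour = colour;
        }
    }

    static final String BEER = "\uD83C\uDF7A"; // beer mug emoji, U+1F37A

    static final List<Fruit> FRUITS = Arrays.asList(
            new Fruit("apple", "red"),
            new Fruit("banana", "yellow"),
            new Fruit("cherry", "red"));

    // Filter;Map;Collect: keep the red fruits, take their names, gather them in a list.
    static List<String> redFruitNames() {
        return FRUITS.stream()
                .filter(f -> f.colour.equals("red"))   // intermediate, takes a Predicate
                .map(f -> f.name)                      // intermediate, takes a Function
                .collect(Collectors.toList());         // terminal
    }

    // flatMap: each name becomes a two-element stream (name, beer), then all are flattened.
    static List<String> fruitBeers() {
        return FRUITS.stream()
                .map(f -> f.name)
                .flatMap(name -> Stream.of(name, BEER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        redFruitNames().forEach(System.out::println);  // terminal, with a method reference
        fruitBeers().forEach(System.out::println);
    }
}
```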

The intermediate operations in the examples above are filter(), map() and flatMap(). A few points worth noting:

  • filter() accepts a Predicate, as described in the Functional Interfaces section above, so the lambda passed to it must return a boolean.
  • map() & flatMap() accept a Function
  • flatMap() appends a 🍺 (because fruit beer is a thing!) to the output list.
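  • The difference between map() and flatMap() is that map() is one-to-one: it can change the type of the elements, but every input element produces exactly one output element, so the size of the stream stays the same. flatMap(), on the other hand, maps each element to a stream of zero or more elements and flattens the results, so it can change both the size and the type of the output. Conceptually, if each element is itself a list, map() gives you a ‘list of lists’, while flatMap() gives you one flattened list of all the elements in the ‘list of lists’.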
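The ‘list of lists’ picture can be sketched as follows, using made-up baskets of fruit names. map() returns one value per basket, while flatMap() returns one value per fruit.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class MapVsFlatMap {

    // Three baskets holding two, one and zero fruits.
    static final List<List<String>> BASKETS = Arrays.asList(
            Arrays.asList("apple", "mango"),
            Arrays.asList("kiwi"),
            Collections.<String>emptyList());

    // map(): one output per input. Three baskets in, three sizes out;
    // the element type changes from List<String> to Integer.
    static List<Integer> basketSizes() {
        return BASKETS.stream()
                .map(List::size)
                .collect(Collectors.toList());
    }

    // flatMap(): each basket becomes a stream of its fruits, and the streams are flattened.
    // Three baskets in, three fruits out (the empty basket contributes nothing).
    static List<String> allFruits() {
        return BASKETS.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(basketSizes()); // [2, 1, 0]
        System.out.println(allFruits());   // [apple, mango, kiwi]
    }
}
```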

The terminal operations in the example above are forEach() and collect(). A few points are also worth noting on terminal operations:

  • forEach() used a ‘Method Reference’ to print to the console, using the ‘::’ operator
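  • Java provides the Collectors utility class to create the collectors used in conjunction with collect(). Collectors accumulate the elements of a stream into a result such as a List, Set or Map, or summarise them into a single value such as a joined String.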
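A few common collectors, sketched on a made-up list of fruit names. The Collectors methods are standard; the wrapper class and its method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class CollectorExamples {

    static final List<String> FRUITS = Arrays.asList("apple", "avocado", "banana", "cherry");

    // toSet(): the distinct name lengths.
    static Set<Integer> lengths() {
        return FRUITS.stream().map(String::length).collect(Collectors.toSet());
    }

    // joining(): reduces the stream to a single String.
    static String joined() {
        return FRUITS.stream().collect(Collectors.joining(", "));
    }

    // groupingBy(): a Map from the first letter to the fruits starting with it.
    static Map<Character, List<String>> byInitial() {
        return FRUITS.stream().collect(Collectors.groupingBy(s -> s.charAt(0)));
    }

    public static void main(String[] args) {
        System.out.println(joined());               // apple, avocado, banana, cherry
        System.out.println(byInitial().get('a'));   // [apple, avocado]
    }
}
```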

You can execute the code above by cloning the GitHub project at https://github.com/pkamath2/streamstress. Also, look at the FruitColorPrinter which prints to bash in ANSI colour! Plus, we got 🍺 on bash!

I hope you have found this introduction to Streams useful (it just scratches the surface of the topic). Reading javadocs (especially package summaries) has always helped me gain a better understanding. Please comment below if you see any issues with the content and also connect via GitHub to indicate errors in the code.
