Understanding the Power of Streams in Java

masooria
Published in The Startup
7 min read, Sep 30, 2020
Going With the Flow

I became very intrigued by the idea of functional programming after I came across Uncle Bob’s talk “The Failure of State” on YouTube. I work in the DevOps team at an MNC and was eager to apply these principles in my day-to-day work.

We have a big monolithic application which takes customer requests to create service instances and takes care of their life cycle operations. Written in Java using Spring, of course. It is written in an imperative style, with methods spanning 500 to 600 lines. I even saw some methods with 1450 lines. But it looked normal to me. As a novice engineer looking at production code for the first time, I thought this was how production-grade code looked all over the industry. For every bug we saw in production, we had sleepless nights debugging the code. Methods mix multiple levels of abstraction: a for loop inside a for loop, inside another for loop. It was very hard to test the code. As the methods have countless side effects, writing unit tests was very challenging, and sometimes we were not even sure what a particular method was responsible for. And changes keep coming every week, adding more conditional branches and loops.

But when I saw people talking about functional programming, pure functions and how they make the debugger obsolete, I naturally felt very intrigued. I was really inspired by the articles by Eric Elliott and Brian Lonsdorf. After reading the book “Composing Software” by Eric Elliott, I took one particular module of the application, the one responsible for customer updates, and tried to rewrite it in Node as a microservice. It was a beautiful journey, applying everything I had learnt to the code I wrote. I wrote the whole Node application without a single assignment operation, no imperative structures and no shared mutations. It took me one whole week to understand, apply and test the whole application.

But when I went to the management with this idea, it was immediately shot down. The reason? We are not sure what will happen if we change any of the working code. The team is terrified to touch or refactor any part of it. The reason? It was not developed using TDD, so there are no unit tests for the methods. This is when I was told the mantras of the software industry: “If it is working, don’t touch it” and “If it is stupid but works, it ain’t stupid”. But it bothered me so much. Moreover, I was told we could not just switch over to JavaScript (Node) for one particular module, as it would be hard to maintain. I was asked if I could refactor it in Java!

“Well”, I thought, “this is not going to happen. The Java I know is an object-oriented, imperative language.” (I am not a Java developer. The only Java I know is from academics.) I thought I could never bring the beauty of pure functions and immutable data structures into Java. That was until I came across the presentations given by the wonderful gentleman Venkat Subramaniam on YouTube, mainly the presentation on lambdas and collectors.

https://www.youtube.com/watch?v=1OpAgZvYXLQ&ab_channel=Devoxx

After watching almost all of his presentations on YouTube, I felt very comfortable rewriting the whole thing functionally in Java (sadly we could not use Java 11 and had to settle for Java 8). As the data we are dealing with is JSON, I used only one third-party package, org.json, written by Douglas Crockford himself.

I just want to give a glimpse of my journey from imperative Java to functional Java using Streams. (Well, I can’t say that lambdas or collectors make Java functional in a purist sense. But they at least make Java code look much cleaner and more understandable.)

Let me take one part of the logic. Below is the structure of the JSON input we get from an API.

{
  "root": [
    {
      "id": "6301",
      "someBoolean": true,
      "Resources": [
        {
          "name": "Credit",
          "value": 0,
          "startFrom": "2020-08-10",
          "endOn": "2023-08-09"
        }
      ]
    },
    {
      "id": "6302",
      "someBoolean": true,
      "Resources": [
        {
          "name": "DB_CALLS",
          "value": 1000,
          "startFrom": "2022-08-10",
          "endOn": "2023-08-09"
        }
      ]
    },
    {
      "id": "6303",
      "someBoolean": false,
      "Resources": [
        {
          "name": "DB_CALLS",
          "value": 800,
          "startFrom": "2020-08-10",
          "endOn": "2021-08-09"
        },
        {
          "name": "Store_Views",
          "value": 1000,
          "startFrom": "2020-08-10",
          "endOn": "2021-08-09"
        },
        {
          "name": "API_CALLS",
          "value": 2000,
          "startFrom": "2020-08-10",
          "endOn": "2021-08-09"
        }
      ]
    },
    {
      "id": "6304",
      "someBoolean": true,
      "Resources": [
        {
          "name": "DB_CALLS",
          "value": 600,
          "startFrom": "2021-01-10",
          "endOn": "2022-03-09"
        },
        {
          "name": "API_CALLS",
          "value": 3000,
          "startFrom": "2021-01-10",
          "endOn": "2022-03-09"
        }
      ]
    }
    ....
  ]
}

Well, this is not a very complex structure, but it has one level of nesting. Given this “root” JSONArray, our method “pickResources()” should return a List of all the JSONObjects inside the “Resources” JSONArrays, with the “id” field of the enclosing record added to each of them. We should also ignore the resources which have the name “Credit”. The output should look like:

[
  {
    "startFrom": "2022-08-10",
    "name": "DB_CALLS",
    "endOn": "2023-08-09",
    "id": "6302",
    "value": 1000
  },
  {
    "startFrom": "2020-08-10",
    "name": "DB_CALLS",
    "endOn": "2021-08-09",
    "id": "6303",
    "value": 800
  },
  {
    "startFrom": "2020-08-10",
    "name": "Store_Views",
    "endOn": "2021-08-09",
    "id": "6303",
    "value": 1000
  },
  {
    "startFrom": "2020-08-10",
    "name": "API_CALLS",
    "endOn": "2021-08-09",
    "id": "6303",
    "value": 2000
  },
  {
    "startFrom": "2021-01-10",
    "name": "DB_CALLS",
    "endOn": "2022-03-09",
    "id": "6304",
    "value": 600
  },
  {
    "startFrom": "2021-01-10",
    "name": "API_CALLS",
    "endOn": "2022-03-09",
    "id": "6304",
    "value": 3000
  }
]

The imperative solution we had for this is:
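A minimal sketch of what an imperative pickResources() of this shape might look like. It is an illustration, not the production code. To stay runnable without the org.json dependency, it assumes each JSON object is modeled as a Map<String, Object> and each JSON array as a List. The class name ImperativePicker is made up. It also copies each resource before adding the id, which the original may not have done.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: plain Maps and Lists replace org.json's JSONObject and JSONArray.
class ImperativePicker {

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> pickResources(List<Map<String, Object>> root) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map<String, Object> item : root) {                       // outer loop: top-level records
            List<Map<String, Object>> resources =
                    (List<Map<String, Object>>) item.get("Resources");
            if (resources != null) {                                  // conditional branch
                for (Map<String, Object> resource : resources) {      // inner loop: nested resources
                    if (!"Credit".equals(resource.get("name"))) {
                        // Copy first so the input is left untouched, then attach the parent id.
                        Map<String, Object> copy = new LinkedHashMap<>(resource);
                        copy.put("id", item.get("id"));
                        records.add(copy);
                    }
                }
            }
        }
        return records;
    }
}
```

Calling ImperativePicker.pickResources(root) with the sample data above would return one map per non-Credit resource, each carrying the id of its parent.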

This works. I know people could even say this is simple. (Paraphrasing Venkat Subramaniam here.) I would say this is familiar, not simple. The two are not the same. A for loop inside a conditional branch, which is inside another for loop. It is not easy to see what this is doing at a single glance.

As Martin Fowler famously says

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”

So I wanted to refactor the above code a little bit using Java 8 Streams.

http://memegenerator.net/instance/61059091/finding-nemo-birds-stream-stream

Just using the basic map, filter and forEach, the refactored version looked like below:

The toStream() is a custom method to convert a JSONArray (org.json) to a Java 8 Stream with the help of java.util.stream.StreamSupport.
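A sketch of what such a first refactoring might look like, under the same assumption as before: Maps and Lists stand in for org.json's JSONObject and JSONArray so the example runs on the JDK alone. The class name FirstStreamAttempt is invented, and toStream() here accepts any Iterable rather than a JSONArray. The shape matters more than the details: multi-line lambdas, two throwaway locals, and a forEach that adds to a list declared outside the pipeline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical illustration of the "old wine in a new bottle" version.
class FirstStreamAttempt {

    // Stand-in for the custom toStream(): wraps any Iterable in a sequential Stream.
    static <T> Stream<T> toStream(Iterable<T> items) {
        return StreamSupport.stream(items.spliterator(), false);
    }

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> pickResources(List<Map<String, Object>> root) {
        List<Map<String, Object>> records = new ArrayList<>();   // shared list, mutated below
        toStream(root).forEach(item -> {
            // Throwaway locals inside a multi-line lambda.
            List<Map<String, Object>> resources =
                    (List<Map<String, Object>>) item.get("Resources");
            Object id = item.get("id");
            toStream(resources)
                    .filter(resource -> !"Credit".equals(resource.get("name")))
                    .map(resource -> {
                        Map<String, Object> copy = new LinkedHashMap<>(resource);
                        copy.put("id", id);
                        return copy;
                    })
                    .forEach(records::add);                      // side effect on the shared list
        });
        return records;
    }
}
```

Run sequentially, this gives the same result as the loop version. The trouble only starts once the stream is allowed to run in parallel.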

This already looks clean (or does it not?). Well, the number of lines went down, and it uses the cool lambdas. I was satisfied with this initially.

But there are a few problems with the above piece of code.

  • The first, obvious thing is that this is just old wine in a new bottle. It still has an imperative structure. We still tell it both what to do and how to do it.
  • The lambdas are more than one line long (this is not recommended).
  • We have created a few garbage (throwaway) variables inside the lambdas:
    JSONArray Resources and String id.
  • But the main problem with the code is that it violates the one thing functional programming stands for. It mutates a shared data structure.

We declared a records List and the lambda adds to the records List.

List<JSONObject> records = new ArrayList<>();
...
...
forEach(records::add);

This could result in bugs which are extremely hard to find and debug if, in the toStream method, we pass true as the second argument to make the stream parallel, like below:

static Stream<JSONObject> toStream(JSONArray arr) {
    return StreamSupport.stream(arr.spliterator(), true) // <= true requests a parallel stream
            .map(JSONObject.class::cast);
}

When a stream is allowed to run in parallel, there could be a race condition between threads adding data to the list, and we could potentially lose some data. Never mutate a shared data structure.
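To make the hazard concrete, here is a small sketch that is independent of the JSON code. It contrasts adding to a shared ArrayList from a parallel stream with letting a collector build the list. The class and method names are made up for illustration. The unsafe variant fails nondeterministically, so it may well appear to work on a given run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original application.
class ParallelCollectDemo {

    // Unsafe: several threads call add() on one unsynchronized ArrayList.
    // Elements may be lost, or an ArrayIndexOutOfBoundsException may be thrown.
    static List<Integer> viaSharedList(int n) {
        List<Integer> shared = new ArrayList<>();
        IntStream.range(0, n).parallel().boxed().forEach(shared::add);
        return shared;
    }

    // Safe: the collector accumulates into per-thread containers and merges them,
    // so no list is shared between threads while elements are being added.
    static List<Integer> viaCollector(int n) {
        return IntStream.range(0, n).parallel().boxed().collect(Collectors.toList());
    }
}
```

With a large n, viaCollector(n) always returns n elements in encounter order, while viaSharedList(n) gives no such guarantee.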

So the above code is no improvement over the imperative one. I had to truly use the power of collectors in Java 8 to cope with this issue.

The code I ended up with looked like below:
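A sketch of a collector-based version in that spirit. It is not the original listing. As an assumption to keep it dependency-free, Maps and Lists again stand in for org.json types. The names StreamPicker, resourcesOf and withId are hypothetical helpers. withId copies the resource instead of modifying it in place, so the input is never mutated.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical illustration: pickResources() is one expression with no shared mutable state.
class StreamPicker {

    // Stand-in for the custom toStream(): wraps any Iterable in a sequential Stream.
    static <T> Stream<T> toStream(Iterable<T> items) {
        return StreamSupport.stream(items.spliterator(), false);
    }

    // Stream of one record's resources, minus "Credit", each tagged with the record's id.
    @SuppressWarnings("unchecked")
    static Stream<Map<String, Object>> resourcesOf(Map<String, Object> item) {
        return toStream((List<Map<String, Object>>) item.get("Resources"))
                .filter(resource -> !"Credit".equals(resource.get("name")))
                .map(resource -> withId(resource, item.get("id")));
    }

    // Returns a copy of the resource with the id added; the argument is left untouched.
    static Map<String, Object> withId(Map<String, Object> resource, Object id) {
        Map<String, Object> copy = new LinkedHashMap<>(resource);
        copy.put("id", id);
        return copy;
    }

    static List<Map<String, Object>> pickResources(List<Map<String, Object>> root) {
        return toStream(root)
                .flatMap(StreamPicker::resourcesOf)   // one stream per record, flattened into one
                .collect(Collectors.toList());        // the collector builds the result list
    }
}
```

Because the result list is created by the collector rather than captured from outside, switching toStream() to a parallel stream would not introduce the race described earlier.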

This does not have any throwaway variables and no shared mutable data structures. Just one big expression returned out of the method. Sigh!

What is it doing?

  • toStream takes the JSONArray and converts it to a stream.
  • flatMap first applies the mapping function and then flattens the result. In this case, given the stream of JSON records from toStream, the mapping function gets the JSONArray “Resources” of each record and converts it into another stream.
    At this point we would have a Stream<Stream<JSONObject>>, so flatMap converts (flattens) it to a Stream<JSONObject>. (Java’s flatMap only flattens nested streams, not just any Iterable.)
  • Next we apply the filter to remove any records with the name “Credit”.
  • We put the id field into the nested record. Note that we are not sharing any data structure here: each element flows through the pipeline on its own and nothing outside the pipeline is modified, so this can be happily parallelised.
  • Then we collect each record into a list.
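The difference between map and flatMap in the steps above can be seen in isolation with a tiny sketch. The class FlatMapDemo and its methods are made up for illustration and use plain lists of strings rather than JSON.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class contrasting map and flatMap on a nested list.
class FlatMapDemo {

    // With map, each inner list becomes one inner Stream, giving a Stream<Stream<String>>.
    // Counting its elements therefore counts the inner streams, not the strings.
    static long countWithMap(List<List<String>> nested) {
        return nested.stream().map(List::stream).count();
    }

    // With flatMap, the inner streams are concatenated into a single Stream<String>.
    static List<String> flatten(List<List<String>> nested) {
        return nested.stream().flatMap(List::stream).collect(Collectors.toList());
    }
}
```

For a nested list holding one list of one name and another list of two names, countWithMap sees two elements, while flatten returns all three names in order.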

If you are new to the Streams API in Java, this could seem a bit weird and even hard to understand. But once we play around with lambdas and collectors, this looks very fluent and pleasant.

I know I have just scratched the surface of the stream and collectors API in the above example. I will try to write about how I used other awesome collectors like groupingBy in my next post.

That’s it for now. Thanks for reading. Let me know if there are any issues in the code samples used above.
