How to say more, while writing less!
Let me walk you through a real problem solved at Booking.com, which will be a good background to show the original Java code improving, getting smaller and more expressive as it gets more functional. The knowledge shown here should be mostly language agnostic, since most languages have some type of FP (Functional Programming) support. By the end, I hope to stoke your curiosity to learn more, including how FP can help to write cleaner and better code.
A brief description of the problem being solved, while not the focus of the article, understanding it will help with the enjoyment. A concrete example is used to make the presented concepts less abstract and clearer.
What was our challenge:
Prices have a start and end day, representing a day epoch and are a closed interval. In the database, the prices are always stored with the biggest possible interval that have the same properties to save storage. When displayed back to the user, we want the prices to be split in a way that we can group into date intervals that don’t overlap with each other.
Please check the spreadsheet below for prices 1 to 4. The numbers on top are the epochs and can be converted to calendar dates.
A correct but suboptimal way would be grouping the prices per day:
But we want to group in the most optimal way, in a way that we have the least amount of non overlapping interval groups, as shown in the first output.
In the UI is shown something like this (let’s say epoch 1 is 01–01–2021):
From 01–01–2021 to 02–01–2021
From 03–01–2021 to 06–01–2021
Price 1, Price 4, Price 5
For example, Price 1 dates are stored as a single entry in the DB with a range of 01–01–2021 to 14–01–2021, but it is split to fit the top level grouping of dates.
The whiteboard above shows the initial state of the problem.
The input is a list of prices, with the end date as an exclusive date (half open interval). We used in the example the ones in the first table. From it, we build 3 data structures:
The start map uses the prices start day as key, and all prices that start at that day as values.
The end map uses the prices end day (exclusive) as key, and all prices that end at that day as values.
The sorted events list contains all the days that an event (a price starting or ending) happens. The values are just the union of the start and end map keys.
For every day something happens (event) we:
- Copy all active prices to the output. Replacing the start date to the previous event date, and the end date to the current date.
- Remove all prices ending on the event date from the active set.
- Add all prices starting on the event date to the active set.
After all events are processed, the output is returned, without any overlap between groups.
While not required for continuing the article, there is a quick step by step whiteboard run of the algorithm for the example input on the article Appendix A (end of article).
Let’s start by defining the signature and the first 2 maps, we convert the Price to a half open interval:
The first thing to notice is that the code looks awfully similar. But how to reuse it? There is one extra issue, prices are not defined on this project, they are provided in a common shared library, which are java classes generated from a protobuf definition and cannot be modified.
There is also a second problem, we need two distinct ways to extract the key from price, so even if you could modify the Price class, and make it implement an interface like KeyExtrable that gives you a getKey method, which key would it give, the start or the end epoch?
A standard object oriented approach is clearly not ideal.
What we want is to pass the behaviour of extracting a price key, while not changing the passed prices. Lets try again:
Much better, using the power of a lambda function we can reuse the code with almost no code overhead and without creating new object instances. But why should this code be about prices? Given any iterable, if I provide a keyExtractor, it can be grouped. Let’s write groupByKey again:
Without increasing a single line of code, the functional approach allows this code to be fully reused, and could be moved without problem to any utility class / library. No changes need to be made to the methods callers.
Can we improve on it? Of course we can!
A Map allows us to provide a function to compute a missing value. Meaning, it will run a function to compute the value of a missing key, providing also laziness, an ArrayList will only be created if the key is missing. The code can be further simplified:
I came this far to show how more expressive and composable a code can be in a functional style. Java actually provided us out of the box the stream package, which uses a functional style to deal with collections. groupBy can be written with one line of code, so we don’t need to reinvent the wheel. Let’s also define the eventsDates list, a sorted union of both maps keys that also benefits from the stream syntax:
Using a functional style, the algorithm shown in the whiteboard (Appendix A) can be solved in 15 lines of code (Appendix C), without the need of a single if statement. A flatter code usually is easier to reason about, without requiring thinking about the possible paths. At the end of the article, there is also the full Old School implementation in Appendix B with its glorious 45 lines of code.
It has ⅓ of the size of its O.O. counterpart and is much more expressive. This is a functional style code, but this is not true functional code, and side effects leak everywhere. I will try to go back to explore this in more detail in a next article, while showing how a true functional language solves all of these problems.
A brief introduction to Typeclasses:
We have not needed to generify this code yet in production, but still, we are still going to propose a functional solution to accomplish it, using a typeclass. Complementing the reasoning of providing a behavior, instead of extending a class, to reuse code. The asDisjointPrices code should work with anything that represents a Half Closed Interval.
A typeclass (or at least its Java analog) is just a way to compose these behaviors without ever having to extend Price itself. Its instances should give us a group of stateless functions that provide those behaviors, and can be defined in any module. We just need a single instance of it for the class we are defining the behavior (Price). This is my interpretation of how to make them in Java. It should be self-explanatory what is being achieved from the article itself, but for more details see: http://learnyouahaskell.com/types-and-typeclasses/ .
It is worth noting that we don’t care at all for the start and end of the interval to be integers, it could be anything that can be compared. The intervals could be represented by Long, unbounded BigInteger, an enum of weekdays, etc… We are going to generify this as well.
A half closed interval definition:
And a Price instance for it:
And asDisjointPrices is not about prices anymore, so we can change its signature to:
It does not care about what type I (interval) is, but only if we have an instance of something that can give the behavior of a half closed interval.
Wait, we are forgetting something very important, we must guarantee that my type “A” is comparable. Unfortunately Java will not give me the power to express this easily. I don’t want to resort to OO to express that (A extends comparable), I want to say that someone can give my type A the behavior of being compared, the same way I provide the behavior of price of being a half closed interval. We define the new interface signature:
and implement as:
The new generic method, that deals with intervals, is kept the same size, the beginning of it can be seen below and a full implementation is on Appendix D.
Weak Points of the Java solution, before we move on let’s address some pitfalls of the code:
Some methods require diving into the Javadoc. Getting a keyset from a map is backed by the map itself, ( startMap.keySet() ) that is why I cannot copy one keyset into another to create my event list. This cannot be inferred only by the method signature.
Every event index interaction produces side effects outside of it, and the active set exists in an intermediary state, coping the intervals from it needs to come first, because the next operations are destructive.
The setters of Price have the nice convenience of returning the instance after setting a value,
so many updates can be chained into the same line. While convenient, the signature now is misleading, since it is a side effect only operation.
As shown above, mutability almost always makes the reasoning about method signature worse, requiring inspection / reading of what is actually being done. My implementation uses the copy method that will make a deep copy of Price (luckily it’s not a very big object). If Price was immutable, a deep copy would never be required, since all unchanged properties could always be reused. Another possible optimisation is that classes with the same values could be internally a reference to the same instance. Integer class does something along these lines to reuse instances, see Appendix E.
Just as a thought exercise, consider a world where Integer is indeed mutable, and a method receiving an Integer can mutate its value.
What is Next?
There is a follow up to this article, it solves the same problem in a pure functional language (Haskell), and shows how to tackle each of the mentioned weak points. You could always try to write pure functional code in Java, but the language is clearly not meant for that, and requires a lot of work.
A | Whiteboard resolution:
B | Java Old School:
C | Java Functional Solution:
D | Java with Typeclass:
E | Instance Reuse of Integer: