Using Applicative Style to Increase Compositionality — Part II
In the first part of this series we looked at how to compose different reductions over a single report to efficiently calculate multiple summaries of a report in a single pass. In this blog post we continue with the same main strategy, this time articulating a very explicit applicative style to pull structured data from a relatively flat format. In this way, we achieve flexible, maintainable, and composable code to wrap a fairly traditional Java library API.
Study Two: Composable Excel Sheet Processing
Let’s suppose we want — for whatever reason — to parse an Excel spreadsheet. On the JVM there are options of libraries that provide an API allowing us to access the contents of Excel spreadsheets but many of them are, we might say, venerable.
Their APIs are traditional, by and large; sometimes they initialize a fair amount of state, and they handle errors via exceptions. It can be painful to integrate such code into a more functional code base without some effort. Using Scala’s type system we can apply similar ideas as our previous post in a rigorous way to get a flexible and safe API.
Let’s approach this study in a similar fashion as to our last one. Here’s some example code that is meant to parse a single row using Apache POI, one example of an Excel integration library for the JVM:
We notice immediately that, just like our last study, this code certainly works, but it manages to combine walking the iterator with other concerns. More importantly, it has some other troubles. This code is wildly unsafe, in the sense that getSheet
, getCell
, toInt
, and toDouble
can all throw exceptions! Since functional programs don’t make a habit of catching and handling exceptions with the traditional machinery, this leads to one bad record knocking over our entire application. Thankfully, in Scala we have tools for handling these issues.
Let’s repeat our procedure from the last post of focusing in closely on one element and attempting to generalize from there.
Here’s a function that gets the ID field from a row:
As noted before, this is both unsafe and not obviously composable.
We could do this, but it has a few downsides:
It’s not quite clear how this is meant to compose with other decoders, and we have to force some value to model failure. The type of this function also does not reflect it’s error handling! We could use a type like Option
or Either
and manually return from the branches, but we can do something more idiomatic.
In order to handle the errors, we can use the Try
abstraction that captures errors and then convert to Either
so we can access those errors:
Looks good. Entirely safe for our program. But again, how will we combine these? Afterall, we’ll want to get more than one cell from each row!
Let’s take a step back. We need a function of type Row => Either[Throwable, A]
in order to parse a value of type A
from a row.
Let’s capture that in an abstract notion, same as we did in the last post:
Now we have a named notion of what it is to decode something out of a row of an Excel spreadsheet.
What we want is a function with the following signature:
If you’re familiar with functional programming, it immediately occurs to you that we’re talking about monoidal functors again (or applicative functors).
Let’s look at what we need to implement to get a monoidal functor for our RowDecoder
type:
- A way to make a decoder that just always returns the value we ask of it (
pure)
. - A way to “post process” the result of decoding something from a row (
map
). - A way to combine two decoders into one decoder that returns both values (
product
).
How can we do that? Well, this will work:
Now we can use it to combine our decoders! Monoidal
let’s us define some goodies. If the definitions are opaque, do not worry, they are generally provided in libraries and one just pays attention to what they do:
What do these functions mean in terms of our use case?map2
let's us process the output of two different decoders, failing if either does.map3
let’s us process the output of three different decoders after all of them produce, failing only when any one of them does.
Note that if a bad row is passed in it fails on the first decoding step to go wrong, and returns that error, going from left to right essentially. We could collect all errors using slightly different data structures, but for now let’s just sit with keeping only the first thing that goes wrong.
But now we have exactly what we needed in order to parse each row into our record type! Let’s see what this new API looks like:
Et voila! We have ourselves a clean, compositional, and safe procedure for pulling data from an Excel spreadsheet with very clear division of responsibilities. We’ve already gone a nice distance towards maintainable and safe code!
On the other hand, things always change. Imagine our Project Manager passes us the new requirement that we need to parse nextPaymentDate
as well, so we enrich our models:
No problem, right? Now we just make a map4
following the same pattern as map2
and map3
. But wait, looking at the file we realize that there are multiple date formats for the next payment date field! Both “yyyy/mm/dd” and “yyyy-mm-dd” at the very least.
Oh well, it’s the rare set of documents that never varies from a fixed schema. We don’t yet have any way to combine our decoders to capture choices between multiple possible strategies or shapes. Are we out of luck?
Let’s contemplate the problem in front of us:
We need a way to say “or” for decoders. How hard can it be?
It turns out if we extend the notions we’ve been working with a bit, we have just enough power to do this.
We have two new functions to write:
empty
, which takes any type and gives us a decoder of that typecombineK
, which takes any two decoders of the same time, and lets us try them both
So now that we know what we need, how do we get it for our RowDecoder
? Try it out:
The function empty
always fails to parse anything it is given. This makes sense because with no information about the type we want to decode things into, what else could it do?
On the other hand, combineK
says “try the first parsing strategy, and — if that fails — try the second!” In other words, combineK
lets us prioritize different possible ways of parsing the same kind of value so that we can make choices between different possible suitable inputs.
In order to make our code a little more intimidating (and standard!), we can define some operators:
Great, so now let’s see it in action:
Nice. Now we have error handling built into our parser notion. We can account for any collection of variations!
Let’s take that literally and make this easier on ourselves, because it’s going to get annoying if we have to write combine
and <+>
over and over for longer ways of combining things:
Even better. And clearly we could do the same thing with pretty much any collection. Now we can encode relatively complex handling fairly straightforwardly inside this parser layer with power very near to that of boolean algebra. AND and OR go a pretty long way. After all, that’s pretty much all you get with computers!
Coda
Thanks for reading. I hope this trial of parsing or decoding with applicative style (and it’s fancier extensions) was worth your time. If your interest was piqued and you missed part one, you can find it here.
If you like this stuff, Earnest is hiring and we’d love to talk!
If you want to read more about these topics, here are some classic resources on the use of applicative style in decoding and deserializing:
- A Tale on Semirings by Luka Jacobowitz
- Deciphering Haskell’s Applicative and Monadic Parsers by Eli Bendersky
- Applicative Parsing: Building the Foundation by Monday Morning Haskell
- Atto by tpolecat
- Circe by Travis Brown
- Validated from Cats
And, of course, you can always reach out to me @Slazasaurus on Twitter. Thanks for reading!