Using Applicative Style to Increase Compositionality — Part II

Stephen Lazaro
Engineering at Earnest
6 min readNov 14, 2018
Photo by Tomas Sobek on Unsplash

In the first part of this series we looked at how to compose different reductions over a single report to efficiently calculate multiple summaries of a report in a single pass. In this blog post we continue with the same main strategy, this time articulating a very explicit applicative style to pull structured data from a relatively flat format. In this way, we achieve flexible, maintainable, and composable code to wrap a fairly traditional Java library API.

Study Two: Composable Excel Sheet Processing

Let’s suppose we want — for whatever reason — to parse an Excel spreadsheet. On the JVM there are options of libraries that provide an API allowing us to access the contents of Excel spreadsheets but many of them are, we might say, venerable.

Their APIs are traditional, by and large; sometimes they initialize a fair amount of state, and they handle errors via exceptions. It can be painful to integrate such code into a more functional code base without some effort. Using Scala’s type system we can apply similar ideas as our previous post in a rigorous way to get a flexible and safe API.

Let’s approach this study in a similar fashion as to our last one. Here’s some example code that is meant to parse a single row using Apache POI, one example of an Excel integration library for the JVM:

Dead simple Excel sheet parsing

We notice immediately that, just like our last study, this code certainly works, but it manages to combine walking the iterator with other concerns. More importantly, it has some other troubles. This code is wildly unsafe, in the sense that getSheet, getCell, toInt, and toDouble can all throw exceptions! Since functional programs don’t make a habit of catching and handling exceptions with the traditional machinery, this leads to one bad record knocking over our entire application. Thankfully, in Scala we have tools for handling these issues.

Let’s repeat our procedure from the last post of focusing in closely on one element and attempting to generalize from there.

Here’s a function that gets the ID field from a row:

Parsing a single cell from a row

As noted before, this is both unsafe and not obviously composable.

We could do this, but it has a few downsides:

Obviously, we try to catch errors.

It’s not quite clear how this is meant to compose with other decoders, and we have to force some value to model failure. The type of this function also does not reflect it’s error handling! We could use a type like Option or Either and manually return from the branches, but we can do something more idiomatic.

In order to handle the errors, we can use the Try abstraction that captures errors and then convert to Either so we can access those errors:

Catch errors in a data structure.

Looks good. Entirely safe for our program. But again, how will we combine these? Afterall, we’ll want to get more than one cell from each row!

Let’s take a step back. We need a function of type Row => Either[Throwable, A] in order to parse a value of type A from a row.

Let’s capture that in an abstract notion, same as we did in the last post:

We capture the notion of a decoder in a data type.

Now we have a named notion of what it is to decode something out of a row of an Excel spreadsheet.

What we want is a function with the following signature:

We want to combine decoders.

If you’re familiar with functional programming, it immediately occurs to you that we’re talking about monoidal functors again (or applicative functors).

Let’s look at what we need to implement to get a monoidal functor for our RowDecoder type:

Monoidal functors.
  1. A way to make a decoder that just always returns the value we ask of it (pure).
  2. A way to “post process” the result of decoding something from a row (map).
  3. A way to combine two decoders into one decoder that returns both values (product).

How can we do that? Well, this will work:

Combining decoders, finally!

Now we can use it to combine our decoders! Monoidal let’s us define some goodies. If the definitions are opaque, do not worry, they are generally provided in libraries and one just pays attention to what they do:

Various post-processing idioms for decoders.

What do these functions mean in terms of our use case?
map2 let's us process the output of two different decoders, failing if either does.
map3 let’s us process the output of three different decoders after all of them produce, failing only when any one of them does.

Note that if a bad row is passed in it fails on the first decoding step to go wrong, and returns that error, going from left to right essentially. We could collect all errors using slightly different data structures, but for now let’s just sit with keeping only the first thing that goes wrong.

But now we have exactly what we needed in order to parse each row into our record type! Let’s see what this new API looks like:

Parsing the sheet with our new machinery

Et voila! We have ourselves a clean, compositional, and safe procedure for pulling data from an Excel spreadsheet with very clear division of responsibilities. We’ve already gone a nice distance towards maintainable and safe code!

On the other hand, things always change. Imagine our Project Manager passes us the new requirement that we need to parse nextPaymentDate as well, so we enrich our models:

Loans, now with payment dates!

No problem, right? Now we just make a map4 following the same pattern as map2 and map3. But wait, looking at the file we realize that there are multiple date formats for the next payment date field! Both “yyyy/mm/dd” and “yyyy-mm-dd” at the very least.

Oh well, it’s the rare set of documents that never varies from a fixed schema. We don’t yet have any way to combine our decoders to capture choices between multiple possible strategies or shapes. Are we out of luck?

Let’s contemplate the problem in front of us:

The problem of choice.

We need a way to say “or” for decoders. How hard can it be?

It turns out if we extend the notions we’ve been working with a bit, we have just enough power to do this.

A slightly stronger strategy!

We have two new functions to write:

  • empty, which takes any type and gives us a decoder of that type
  • combineK, which takes any two decoders of the same time, and lets us try them both

So now that we know what we need, how do we get it for our RowDecoder? Try it out:

Our new powers for decoders. The comments are literal.

The function empty always fails to parse anything it is given. This makes sense because with no information about the type we want to decode things into, what else could it do?

On the other hand, combineK says “try the first parsing strategy, and — if that fails — try the second!” In other words, combineK lets us prioritize different possible ways of parsing the same kind of value so that we can make choices between different possible suitable inputs.

In order to make our code a little more intimidating (and standard!), we can define some operators:

Adding some cool syntax cause we’re cool.

Great, so now let’s see it in action:

Combining two strategies of date parsing.

Nice. Now we have error handling built into our parser notion. We can account for any collection of variations!

Let’s take that literally and make this easier on ourselves, because it’s going to get annoying if we have to write combine and <+> over and over for longer ways of combining things:

Finally using our multiple decoding strategies.

Even better. And clearly we could do the same thing with pretty much any collection. Now we can encode relatively complex handling fairly straightforwardly inside this parser layer with power very near to that of boolean algebra. AND and OR go a pretty long way. After all, that’s pretty much all you get with computers!

Coda

Thanks for reading. I hope this trial of parsing or decoding with applicative style (and it’s fancier extensions) was worth your time. If your interest was piqued and you missed part one, you can find it here.

If you like this stuff, Earnest is hiring and we’d love to talk!

If you want to read more about these topics, here are some classic resources on the use of applicative style in decoding and deserializing:

And, of course, you can always reach out to me @Slazasaurus on Twitter. Thanks for reading!

--

--