Using Applicative Style to Increase Compositionality — Part I

Stephen Lazaro
Engineering at Earnest
6 min read · Oct 10, 2018

As developers, especially back end developers, we often find ourselves writing code that will need to be maintained and revisited for months or even years. We’d like to have clear, easy to rework code. It’s not fun to revisit code and find we have no idea how to change it!

Each component should have a clear, single responsibility. It should also be easy to extend and integrate with new functionality. In this blog post and its sequel, we examine functional programming’s applicative style, which we employ here at Earnest to increase compositionality, simplify responsibilities, and provide greater extensibility.

In this post we’ll work in JavaScript and in the next we’ll work in Scala, both languages in use at Earnest, to emphasize that applicative style is portable between languages.

Study One: Report Aggregation

Let’s suppose that we want to produce one or many summary statistics from a report. The report is a series of record objects with some fixed schema (it’s not particularly important what it is for this purpose). Let’s assume that the report is an iterator or streaming interface for now. It might not all fit in memory at once, and it may not be possible to re-run. We need to consume the report in one go.

The first approach that occurs to most people is to mutate some variables defined outside of the iteration loop, taking a generally imperative approach:

Mutably produce aggregates out of band
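A minimal sketch of this imperative approach, assuming records carry a numeric `balance` field (the post deliberately leaves the schema unspecified, so the field name and the particular aggregates here are illustrative):

```javascript
// Hypothetical record shape: { balance: number }.
// Every aggregate is updated inside the same loop, preserving the
// single pass over the (possibly non-replayable) report.
function getAggregates(report) {
  let totalBalance = 0;
  let recordCount = 0;
  for (const record of report) {
    totalBalance += record.balance;
    recordCount += 1;
  }
  return { totalBalance, recordCount };
}
```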

There’s certainly nothing wrong with this code from a strictly technical standpoint. It’s clear, it works, and it only consumes the iterator once to produce these aggregates — so it’s largely efficient.

On the other hand, there are some possible rough edges here. First, getAggregates is responsible for calculating multiple aggregates “in parallel”, but each of these aggregates is calculated in the same loop. If we wanted to reuse just one of those calculations elsewhere, we would pretty much have to write it again. Building our aggregates happens in the same place as traversing the large report, so we don’t have as clean a separation of responsibilities as we might like.

More importantly, partly because of the conflation of responsibilities, this code is unnecessarily difficult to extend in a maintainable fashion. Imagine that rather than just two aggregates, we needed to calculate 25.

Things… get a little out of hand.

This is not even to raise the specter of a possibly unbounded number of aggregates that we don’t know beforehand! This function could grow quite large and become pretty intimidating.

How can we improve this? Well, there is a lot in common with each of these aggregates, so maybe we need to pull them apart and understand their parts.

To start, let’s isolate one aggregate in its own function and try to understand its parts:

Single aggregate via mutation
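One such aggregate, isolated, might look like the following sketch (again assuming a `balance` field on each record):

```javascript
// Sum of balances, computed via local mutation over any iterable report.
function sumBalance(report) {
  let total = 0;
  for (const record of report) {
    total += record.balance;
  }
  return total;
}
```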

Now the ceremony around names and variables sticks out like a sore thumb — we don’t need any of that! We can refactor this to be immutable and declaration-free using a reduce:

No names!
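The reduce version might look like this (note that `Array.prototype.reduce` assumes an in-memory array; a streaming report would need an equivalent fold over its own interface):

```javascript
// The same aggregate with no mutable bindings and no loop ceremony.
const sumBalance = (report) =>
  report.reduce((total, record) => total + record.balance, 0);
```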

At the risk of seeming overly fussy, we can separate out yet another pair of responsibilities: traversing the report and building the aggregate.

A single aggregate all cleaned up.
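Separating the two might be sketched like so, with a generic fold owning all the traversal and a pure combining function owning the arithmetic (`foldReport` and `addBalance` are illustrative names, not from the original):

```javascript
// The combining step, isolated from any iteration concerns.
const addBalance = (total, record) => total + record.balance;

// A generic single-pass fold over any iterable report: traversal
// lives here, and nowhere else.
const foldReport = (step, initial, report) => {
  let acc = initial;
  for (const record of report) {
    acc = step(acc, record);
  }
  return acc;
};

const sumBalance = (report) => foldReport(addBalance, 0, report);
```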

Overall, this code is pretty nice looking! We’ve broken each responsibility into its own component, but this doesn’t solve our initial problem. After all, we wanted to build multiple aggregates in a single pass. Right now it looks like we’re only further from our goal.

We could go back to the loop approach by applying our functions like sumBalance, but then we’re stuck mixing iteration with calculation. We still have to know every aggregate beforehand. Not much of an improvement!

Let’s take a step back. Now, it looks like to build an aggregate we need two things:
1. An initial value to start from.
2. A way to combine each record into the prior calculation.

We need different initial values so we can try to calculate things like products, or more sophisticated statistical functions.

You may recognize that we’re talking about monoids, but we don’t need to know anything about that to structure this code.

Let’s encode this abstraction:

Creating a notion for aggregates!
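An aggregation, then, is just an object bundling those two ingredients: an initial value and a way to combine one record into the running result. A sketch, with illustrative field and record names:

```javascript
// An aggregation: { initial, combine }.
const sumBalanceAggregation = {
  initial: 0,
  combine: (total, record) => total + record.balance,
};

const countRecordsAggregation = {
  initial: 0,
  combine: (count, _record) => count + 1,
};
```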

Great, we have a notion for aggregations, so we have an interface or shape to code against. That’ll make it a lot easier to write code that works on any aggregation.

We want to combine those aggregate objects. How can we do this? Well, it turns out there’s only one way to do it in general:

Combining two aggregates and running them in one pass.
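Given the `{ initial, combine }` shape above, pairing two aggregations essentially writes itself: pair the initial values, and route each record through both combine functions. A sketch (names and the `balance` record field are illustrative):

```javascript
// Pair two aggregations into a single aggregation over a pair of
// accumulators.
const combineTwoAggregations = (aggA, aggB) => ({
  initial: [aggA.initial, aggB.initial],
  combine: ([accA, accB], record) => [
    aggA.combine(accA, record),
    aggB.combine(accB, record),
  ],
});

// Single-pass runner over any iterable report.
const runAggregation = ({ initial, combine }, report) => {
  let acc = initial;
  for (const record of report) acc = combine(acc, record);
  return acc;
};

// Example aggregations, assuming records shaped like { balance }.
const sumBalanceAggregation = { initial: 0, combine: (t, r) => t + r.balance };
const countRecordsAggregation = { initial: 0, combine: (c, _r) => c + 1 };
```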

Alright, cool! We can combine two aggregates and have our clean distribution of responsibilities. If you’re versed in functional programming lingo, you might notice that we’re talking about something a lot like semigroupal functors here.

We’re still not quite there though. Two is good, but can we do unbounded numbers of aggregates? It turns out we can, we just have to lean on arrays using the exact same machinery!

Combining arbitrary aggregates in an aggregation!
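Reducing over an array of aggregations with the pairwise idea, flattening the accumulators into a single array, might look like this (the spread-based implementation is an assumption on my part, consistent with the naming and cost discussion later in the post):

```javascript
// Fold an array of aggregations into one aggregation whose
// accumulator is a flat array with one slot per aggregation.
const combineAggregationsReduction = (aggregations) =>
  aggregations.reduce(
    (combined, agg) => ({
      initial: [...combined.initial, agg.initial],
      combine: (accs, record) => [
        // All but the last slot are handled by the combination so far...
        ...combined.combine(accs.slice(0, accs.length - 1), record),
        // ...and the last slot by the newly added aggregation.
        agg.combine(accs[accs.length - 1], record),
      ],
    }),
    { initial: [], combine: (_accs, _record) => [] }
  );

// Example aggregations, assuming records shaped like { balance }.
const sumBalanceAggregation = { initial: 0, combine: (t, r) => t + r.balance };
const countRecordsAggregation = { initial: 0, combine: (c, _r) => c + 1 };
```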

This trick is known as using a traversable functor (the array in this case) in common functional programming parlance, but again we don’t need to know that to use the design.

Notice that it’s now trivial to extend this with more aggregates: we just write a new aggregation and add it to the array of aggregates to apply. Even better, in principle, we can accept any array of aggregations and apply them. If we wanted, we could now dynamically determine at runtime which aggregations to apply and just pass them in.

The final version of our (now relatively abstract and generic) machinery then looks like:

Final result. Not bad!
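Assembled end to end, the machinery might be sketched as follows (the `{ balance }` record shape and the particular aggregations remain illustrative assumptions):

```javascript
// Run one aggregation over any iterable report in a single pass.
const runAggregation = ({ initial, combine }, report) => {
  let acc = initial;
  for (const record of report) acc = combine(acc, record);
  return acc;
};

// Fold an array of aggregations into one; the accumulator is a flat
// array with one slot per aggregation.
const combineAggregationsReduction = (aggregations) =>
  aggregations.reduce(
    (combined, agg) => ({
      initial: [...combined.initial, agg.initial],
      combine: (accs, record) => [
        ...combined.combine(accs.slice(0, accs.length - 1), record),
        agg.combine(accs[accs.length - 1], record),
      ],
    }),
    { initial: [], combine: (_accs, _record) => [] }
  );

// Example aggregations.
const sumBalanceAggregation = { initial: 0, combine: (t, r) => t + r.balance };
const countRecordsAggregation = { initial: 0, combine: (c, _r) => c + 1 };

// One pass, arbitrarily many aggregates.
const aggregates = runAggregation(
  combineAggregationsReduction([sumBalanceAggregation, countRecordsAggregation]),
  [{ balance: 10 }, { balance: 20 }, { balance: 30 }]
);
// aggregates → [60, 3]
```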

Pretty easy to use! This is fairly clear and much more extensible. Let’s pat ourselves on the back.

On the other hand, we should be upfront about the costs, since there is no free lunch. This version of combineAggregationsReduction builds quite a few intermediate arrays due to its use of the spread operator. At scale, this could induce some non-trivial memory pressure! A possibly more efficient approach would be to allow local mutation and use Array.prototype.concat to combine results. Another approach would be to lean on a utility library like Ramda that has an efficient zipWith operator (or implement it yourself) and rework the approach a bit. There’s a lot of wiggle room in the design space here, but it is something to be aware of with regard to the relatively naive version presented here.

From a readability perspective, the arrays might not be the ideal return type, but they do have some advantages. While we lose the context of keys, and the readability that comes with them, it’s also impossible to assume anything about the schema (it’s just an array), which is useful when dynamically determining which aggregates to run. That’s not ideal for all use cases! That said, it is absolutely possible to adapt this machinery to return objects, by enriching our aggregation objects with a key under which each aggregate saves its state. I encourage the interested reader to give it a try (it’s much easier if you allow yourself a utilities library to manage merging objects)!

Coda

Hopefully, this study made the meaning of the applicative style a little more clear and was an interesting study in some strategies to write composable and functionally styled code. If you’re hungry for more, keep an eye out for the sequel!

If this kind of stuff interests you, Earnest is hiring and we’d love to talk!

And, of course, you can always reach out to me @Slazasaurus on Twitter. Thanks for reading!
