How to `traverse’ sequentially

Using free monads to control the execution of asynchronous tasks in Scala

This blog post will assume terminology from the functional scala library Cats. It is easily applicable to Scalaz, too.

Libraries like Cats and Scalaz give you a verb ‘traverse’, for easier handling of combinations of wrapper types.

It allows you to turn something clunky like this:

into this:

That is, it does the map and the sequence for you. It applies the function to the collection, and pulls the wrapper type through to the outside.

Of course it’s not limited to Future and Seq. The ‘collection’ type can be anything that is “traversable” (lists, options, matrices, tuples, trys — it’s a supertype of foldable). The inner type can be any “applicative” (future/task/IO, matrices, validated/either/disjunction, etc).

A problem arises however when your ‘collection’ is large, and your function is resource intensive. For example connecting to a database, performing a query, and disconnecting. If your collection had 100,000 elements in it, this would hammer the DB into submission and almost certainly throw an exception somewhere as every single element would try to connect at the same time (up to various pooling limitations).

The problem boils down to management of side-effects. If we know the side-effect is resource intensive, and we know the collection might be large, we can predict this problem. Ideally we’d like to be able to swap out traverse with a very similar method that guarantees the elements will be executed sequentially rather than in parallel. This would allow our fragile external services a much easier time of it.

Via batching, it would also allow us to manage concurrent access and keep it reasonably fast.

We could call this method sequentialTraverse. It would have (nearly) the same signature as traverse, because all it’s doing is executing side effects in a different way. It would do exactly the same thing to the types at compile time.

I say it would have “nearly” the same signature as traverse, because what we want to do — ensure sequentiality — is inherently a property of a Monad. Our ‘inner’ type will have this stricter condition of sequentiality and so must be a Monad rather than an Applicative. If there is no way to enforce the sequentiality of a series of statements (ie an asynchronous Applicative with no Monad instance) then there certainly is no way to sequentially traverse a structure with it.

Naive approach

Well, we can jump right in and have something that works but is pretty naive:

We’ve dived right in and just implemented a traverse that must be sequential, because we use flatMap inside the sequence’s foldLeft.

This works, but obviously is pretty weak since it works only on Seq and Future. It wouldn’t work on Task or IO, or even a different collection like Matrix. Ideally, we would want to be able to use sequentialTraverse in the case where we were working off an arbitrary monad rather than a fixed Future, just in case it might be an asynchronous monad.

Slightly less naive approach

I mentioned above that traverse works on any collection which has a Traverse instance and any wrapper type that has an Applicative instance. But we apparently can’t use the traverse method, since there’s no guarantee it’s sequential. So how would we generalise the above naive attempt?

We fold the sequence: this means we need a Foldable. We flatMap the future: this means we need a Monad. And lastly we add an element to the sequence: this means we need a Monoid.

So:

This is almost verbatim just a generalisation of the naive approach.

And it works! The flatMap ensures sequentiality.

However, there’s a problem.

When we break down a collection using Foldable, there’s no guarantee we do it in the same way that the Monoid builds it back up again.

For example, the normal Foldable[Seq] uses Seq.foldLeft, and the normal Monoid[Seq] uses :+ (append). This would work, since they map onto our original attempt exactly.

But if we had a custom Monoid[Seq], which built a sequence using prepend rather than append, then our resulting traversed sequence would be in reverse order!

So this approach is no good either, there’s no guarantee it will work in all cases. There’s no connection at compile time between the way we fold a structure and the way we build the same structure back up.

A different approach

We need something that guarantees the breaking-down and building-up are inverses of each other.

Unfortunately, this is the Traverse typeclass itself. We seem to have come full circle; in order to traverse predictably in all situations we need to use Traverse. This appeared to be a no-go because it would greedily execute our asynchronous monads. But all is not lost yet — let’s have a look at the components we have:

We know from our above explorations that traverse has to stay. So what else has changed? The only thing different from the basic traverse use-case is our M is a monad, not an applicative.

Doesn’t a monad encode sequentiality? Why can’t we just do s.traverse(f)? Won’t this be sequential because M is a monad?

Unfortunately not, because its underlying Applicative instance (all monads are applicatives) could still be parallel; there’s no guarantee it uses flatMap and therefore be sequential.

You could by all means override the underlying Applicative methods to force them to be sequential, but that is not a good user experience. Picking a Monad and hoping it’s correct or risk having your code explode. We would rather force the code to work the way we need it to with no choice to be made.

So the only thing we can do is ensure that when we traverse our s, we ‘trick’ it into not actually executing anything — ie make traverse lazy -
and then to somehow execute the result ourselves in a way that ensures sequentiality. That way, the default behaviour of traverse is fine and does not interfere with any potentially external resources.

As our s gets broken down by traverse, our new lazy construct, whatever it is, would get accumulated — and then we would take our construct apart in a sequential way which we manage ourselves.

This new object would be a series of lazy instructions built by traverse which
we execute afterwards.

Free Monad

This sounds like a free monad, which is from a certain point of view a way of encoding pure programming instructions free from any execution context. Once you have a free monad, you can execute it with any context you like.

This sounds like exactly what we need.

What is a free monad? Loosely, it’s a way of turning a Functor into a Monad, by ignoring nesting during the flatMap operation. The specifics aren’t too important: What we need is that we can get a Monad from a Functor (for free!)

So let’s quickly recap what we’re working with:

One thing immediately jumps out as being a candidate for a free monad: M. It’s already a Monad, but that means it’s also a Functor. But there’s actually another option: f, our function we wish to traverse with.

Function1[X, Y] is a Functor in the second argument. Could we use this? Nearly. Remember we want this to be lazy once we apply it, and a function is anything but. As soon as it’s applied, it’s… well, applied.

Let’s invent something quickly:

This is a functor in X — it just delegates to Function1s instance and keeps a fixed. This type allows us to combine a function with its argument and not execute immediately. Sounds very promising.

Notice that I’ve included an M in the definition. This is purely because we are always working with a function that produces a monad in this blog post.
There’s nothing special about the M here at the moment. The class will be defined inside a context where we have M so it needn’t be on the class definition.

Now let’s create a free monad from LazyFunction:

FreeLazyFunction is an alias to our Free type, to make the type argument lists a little nicer. lift elevates a function to our Free monad.

And… that’s all we need. FreeLazyFunction is a Monad already. So let’s use it:

and we have it! We have built our series of instructions using traverse, and we haven’t yet executed anything.

We just need a way to collapse our free monad now. This is done via a mapping between our inner Free type and a Monad
in this case M. This mapping is called a natural transformation and looks like this:

It’s just finally applying the LazyFunction, and executing the underlying function/argument pairs. Now we can run our free monad:

The foldMap guarantees that each step of our free monad is only executed (using LazyFunction.apply) once the previous step is finished, using M.flatMap.

This is precisely what we wanted, forced sequential execution inside traverse with no extra overhead for the end developer!

The only thing left is the syntax:

You can have a quick test that this works as expected:

sequentialTraverse is minimal — it requires only a Traverse and a Monad instance, and is usable without any extra cognitive overhead or effort from future developers. It ensures sequentiality in the case of an asynchronous monad, and behaves the same as normal traverse in the case where we have a synchronous monad, making it safe for a developer to accidentally misuse, too.

Like what you read? Give James Phillips a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.